Fixed
Status Update
Comments
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #4
The following revision refers to this bug:
https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3
commit f9579994763988b068c0f9b1f7a278ff66b780e3
Author: David Ostrovsky <david@ostrovsky.org>
Date: Wed Oct 16 08:26:23 2019
Migrate change index to use dimensional numeric types
Lucene 6.x deprecated IntField and replaced it with IntPoint, which
uses a different backend storage [1]. Instead of continuing to
represent numeric data using a structure specifically designed and
tuned for text, the Bkd implementation introduced the first flexible
tree structure designed specifically for indexing discrete numeric
points [1]. While the new data types are mostly drop-in replacements
for the old IntField and LongField types, the new types cannot be
used for document id types.
The previous migration from Lucene 5 to Lucene 6 switched to using the
deprecated LegacyIntField type. In the next Lucene release, 7, this
class and friends were extracted from the Lucene distribution and moved
for one release to the Apache Solr library. So theoretically we could
add Apache Solr as a dependency to Gerrit and continue to use the
old/deprecated/removed data types for one more major Lucene release.
We prefer a forward migration strategy and switch to using a string
field type as document id for the account, change and group indexes.
Only the change index is handled in this commit. The other indexes are
handled in follow-up changes.
To support online migration, the legacy numeric field types are still
used in the old index schema version, while the new dimensional point
field types are used in the new schema version. The old integer
document id field type is replaced with a string type id in the new
change index schema. Therefore, different code paths must decide
whether to use the legacy numeric field types or the new dimensional
point field types, depending on the index schema version currently in
use. To support this logic, a new attribute is added to the index
schema class: useLegacyNumericFields. While this approach temporarily
complicates the code, it can be removed once the next Gerrit version
is released. Until then the deprecated type classes are still used.
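
The schema-dependent choice described above can be illustrated with a
minimal Java sketch. It is not Gerrit's code: FieldTypeSketch,
SchemaVersion, FieldKind and the version numbers 1 and 2 are
hypothetical, and only the useLegacyNumericFields attribute name comes
from the commit message.

```java
// Hypothetical sketch; none of these classes are Gerrit's or Lucene's real API.
final class FieldTypeSketch {
  /** The kinds of index field the sketch chooses between. */
  enum FieldKind { LEGACY_INT, INT_POINT, STRING }

  /** Minimal stand-in for one version of an index schema. */
  static final class SchemaVersion {
    final int version; // assumed version number, for illustration only
    final boolean useLegacyNumericFields;

    SchemaVersion(int version, boolean useLegacyNumericFields) {
      this.version = version;
      this.useLegacyNumericFields = useLegacyNumericFields;
    }
  }

  /** Non-id numeric fields: legacy type on old schemas, point type on new ones. */
  static FieldKind numericFieldKind(SchemaVersion schema) {
    return schema.useLegacyNumericFields ? FieldKind.LEGACY_INT : FieldKind.INT_POINT;
  }

  /** Document id: legacy integer on old schemas, string on new ones. */
  static FieldKind idFieldKind(SchemaVersion schema) {
    return schema.useLegacyNumericFields ? FieldKind.LEGACY_INT : FieldKind.STRING;
  }

  public static void main(String[] args) {
    SchemaVersion oldSchema = new SchemaVersion(1, true);
    SchemaVersion newSchema = new SchemaVersion(2, false);
    for (SchemaVersion s : new SchemaVersion[] {oldSchema, newSchema}) {
      System.out.println(
          "schema " + s.version + ": id=" + idFieldKind(s) + ", numeric=" + numericFieldKind(s));
    }
  }
}
```

Keeping the decision in one place per field category means the branch
can be deleted in a single step once the old schema version is dropped.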
Non-id fields are replaced with the new IntPoint and LongPoint fields
so that we do not use any deprecated or removed Lucene features and
can easily upgrade to the next major Lucene release without relying on
a third-party dependency (Apache Solr).
One side effect of this change is that ChangeQueryBuilder in
AbandonUtil must be used with a Guice provider. The reason is that the
index collection must be accessed to retrieve the schema instance in
order to detect the useLegacyNumericFields attribute. AbandonUtil is
bound in singleton scope, and the index collection is only provided
once the multiversion index module has started. When support for the
legacy numeric fields is removed in a later Gerrit release, this
change can be reverted.
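
The provider pattern mentioned above can be sketched in plain Java as
follows. This is an assumption-laden illustration, not the actual
Gerrit change: java.util.function.Supplier stands in for a Guice
Provider, and IndexCollection, QueryBuilder and AbandonJob are
hypothetical simplifications of the real classes.

```java
import java.util.function.Supplier;

// Hypothetical sketch; Supplier stands in for a Guice Provider.
final class DeferredLookupSketch {
  /** Stand-in for the index collection; unusable until the index module starts. */
  static final class IndexCollection {
    private Boolean useLegacyNumericFields; // null until start() is called

    void start(boolean useLegacy) {
      this.useLegacyNumericFields = useLegacy;
    }

    boolean useLegacyNumericFields() {
      if (useLegacyNumericFields == null) {
        throw new IllegalStateException("index not started");
      }
      return useLegacyNumericFields;
    }
  }

  /** Stand-in for a query builder that reads the schema flag when constructed. */
  static final class QueryBuilder {
    final boolean legacy;

    QueryBuilder(IndexCollection indexes) {
      this.legacy = indexes.useLegacyNumericFields();
    }
  }

  /** Stand-in for a singleton that is created before the index module starts. */
  static final class AbandonJob {
    private final Supplier<QueryBuilder> queryBuilderProvider;

    AbandonJob(Supplier<QueryBuilder> queryBuilderProvider) {
      this.queryBuilderProvider = queryBuilderProvider;
    }

    /** The builder is only created here, after the index is expected to be up. */
    boolean runUsesLegacyFields() {
      return queryBuilderProvider.get().legacy;
    }
  }

  public static void main(String[] args) {
    IndexCollection indexes = new IndexCollection();
    // Constructing the job is safe: the supplier has not read the index yet.
    AbandonJob job = new AbandonJob(() -> new QueryBuilder(indexes));
    indexes.start(false);
    System.out.println(job.runUsesLegacyFields()); // prints "false"
  }
}
```

Injecting the builder directly would force the schema lookup at
singleton construction time, which in this sketch throws because the
index has not been started yet.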
[1] https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
Bug: https://crbug.com/gerrit/11643
Change-Id: Icbc80d8a775a6ffea97e99717b24d3e8cacaee14
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/elasticsearch/ElasticChangeIndex.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/index/Schema.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/index/SchemaUtil.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/lucene/AbstractLuceneIndex.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/lucene/ChangeSubIndex.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/lucene/LuceneChangeIndex.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/lucene/QueryBuilder.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/change/AbandonUtil.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/index/IndexUtils.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/index/change/ChangeField.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/index/change/ChangeIndex.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/index/change/ChangeSchemaDefinitions.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/query/change/ChangeQueryBuilder.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/query/change/CommentPredicate.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/query/change/ConflictsPredicate.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/query/change/FuzzyTopicPredicate.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/query/change/InternalChangeQuery.java
[add]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/query/change/LegacyChangeIdStrPredicate.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/java/com/google/gerrit/server/query/change/MessagePredicate.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/javatests/com/google/gerrit/server/index/change/FakeChangeIndex.java
[modify]https://gerrit.googlesource.com/gerrit.git/+/f9579994763988b068c0f9b1f7a278ff66b780e3/javatests/com/google/gerrit/server/query/change/AbstractQueryChangesTest.java
ek...@google.com <ek...@google.com> #7
[Monorail components: Backend]
ek...@google.com <ek...@google.com> #8
[Monorail components: -Lucene]
Description
and range types were replaced with IntPoint and LongPoint and range
types with a different backend representation. See these issues for
more details: [1], [2], [3] and this in-depth overview: [4].
The deprecated types were renamed in Lucene 6.x to LegacyIntField and
LegacyLongField and removed in Lucene 7.x, so we cannot upgrade any
further and must migrate to the new dimensional numeric types.
[1]
[2]
[3]
[4]