Status Update
Comments
da...@gmail.com <da...@gmail.com>
da...@gmail.com <da...@gmail.com> #2
Here's a more detailed timeline from the change metadata:
- Dec 30 01:04:10 - patch set 24 was rebased by Marcin Czech, creating patch set 25
- Dec 30 01:17:19 - patch set 26 was pushed by Luca Milanesio
- Dec 30 01:17:24 - 2 comments by Luca Milanesio were published on patch set 25
- Dec 30 01:17:24 - attention set on patch set 26 was updated
Step #3 has broken the account mapping. On a closer look, that single publication of two comments is more complex than it looks. From the diff, we can tell that it is adding comments on two patch sets:
- patch set 23 (commit id
d3877e6
) - all good here, no changes to account ids - patch set 3 (commit id
4566262
) - introduces breaking changes, allcomment.author.id
values were changed to current server account id from the migrated server account id.
Let's now have a closer look at how the external to current account ID mapping works. The external account ID mapping is handled when metadata are read from git. The NoteDB stores comments in JSON files, which are first parsed, and then the comment author's IDs are "negotiated", meaning that a mapping between the current instance account ID and external account ID is found. From that point, all (in memory) comments are pointing to account IDs from the current instance (whereas NoteDB data stays unchanged and uses external account id).
This looks to me as a potential serialization problem. We either haven't "reverse" mapped the account IDs or used a list of comments "from memory" instead of "from git" when creating a commit object.
The other question is why, this happens on that single occasion, as we haven't observed similar issues in other changes.
I'll continue investigating this issue.
da...@gmail.com <da...@gmail.com> #3
One suspicious area is com.google.gerrit.server.notedb.ChangeUpdate#getRevisionNoteMap()
. The ChangeUpdate
class is not threaded safe, as stated in the class javadoc, additionally the getRevisionNoteMap()
, can potentially reuse previously loaded ChangeRevisionNote
, which makes me wonder if it would be possible that the reused ChangeRevisionNote
could have mapped account IDs, this could potentially lead to the result we observed in this bug report, but I don't know how (or even if) it is technically possible to happen.
da...@gmail.com <da...@gmail.com> #4
I was able to reproduce it locally and it's a caching-related problem.
For the reproduction an imported project with external account ID mapping and a change with comments created on the old server instance is required.
- Create a draft commit on an imported patch set with comments created on the other instance
- Connect the debugger and put a thread breakpoint in
com.google.gerrit.server.notedb.ChangeNotesCache:405
- Submit draft comment created in step #1
- Each time the breakpoint is hit, flush all caches over SSH
- Investigate the
meta
branch for change and observe changedcomment.author.id
values
Disabling change_notes
should also result in a similar outcome.
What is happening here?
First of all, as explained in the second comment, we keep the original account ID in the meta
branch, but when we read (or cache) the data, we automatically map that account id to the current one and store that in the cache.
Looking at a comment in ChangeNotesCache
L375 gives a hint of what is actually going on. Another part of the puzzle is in ChangeUpdate
L652, where we use cached note
when notes.revisionNoteMap
is not null
(remember that the cached value has mapped account IDs). The final part is in the RevisionNoteBuilder
L170, where we just serialize Comments
objects.
Let's now put it all together. When we don't find an entry in the change_notes
cache, we'll load it from the meta
branch, migrate the account IDs and keep that modified value in memory. Now when a comment is added and change notes were just evicted from the cache, the ChangeUpdate
will load it. Then in ChagneUpdate.getRevisionMap()
we'll use the cached value. Finally when we serialize data back to JSON format to store it in the meta
branch, it the RevisionNoteBuilder
we don't use any GSON mapping from Comment.author
field which will just store the current server account ID.
This issue will only occur when change note is evicted from cache "just" before ChangeUpdate
is creating the revision map.
ap...@google.com <ap...@google.com> #5
Branch: stable-3.7
commit e2e7d68c093fd790c60b976736a05d8962a6fe98
Author: Luca Milanesio <luca.milanesio@gmail.com>
Date: Thu Jan 11 23:43:23 2024
Complete comment identity with the associated serverId in comments
Change note-db /meta ref contains the list of change notes associated,
including all the comments were added to files and commit messages.
The remapping of account-ids performed with I4b0fe6689 has correctly
adapted the comment's author account-ids to the local associated
accounts, however, it failed to update the comment serverId, leaving
the rest of Gerrit with an inconsistent identity.
Even though the identity was not causing apparent damage, mostly
because the serverId was ignored mainly by Gerrit; it left the cached
entities with the incorrect data and, in some cases, persisted back
to NoteDb (see
Rectify the situation by updating the serverId when accounts are
remapped.
Release-Notes: Fix serverId in identity parsing for imported human comments
Forward-Compatible: checked
Bug:
Change-Id: I5904dc46120858beb54e20ac339f10a02d785d3a
M java/com/google/gerrit/server/notedb/ChangeNotesParser.java
M java/com/google/gerrit/server/notedb/NoteDbUtil.java
M javatests/com/google/gerrit/server/notedb/ImportedChangeNotesTest.java
Description
*** !!!! THIS BUG TRACKER IS FOR GERRIT CODE REVIEW !!!! *** Do not submit bugs for chrome/android and issues with your company's *** Gerrit setup here. Those issues belong in different issue trackers.
Background
The project
eclipse-jgit/jgit
on eclipse.gerrithub.io was initially developed on git.eclipse.org and then moved to GerritHub.io on the 20th of November 2023.Some of the changes started on git.eclipse.org and then were moved to eclipse.gerrithub.io. The identities of the authors on git.eclipse.org have been remapped to other accounts on gerrithub.io.
Example: David Ostrovsky has accountId=1000177 on eclipse.gerrithub.io and accountId=1684 on git.eclipse.org .
The serverId of eclipse.gerrithub.io is
d5d70762-12d0-45a1-890d-524b12d3f735
, the serverId of git.eclipse.org is97ee7c02-f12f-4043-b43e-dea463d88b31
, therefore David Ostrovsky is identified as1000177@d5d70762-12d0-45a1-890d-524b12d3f735
or1684@97ee7c02-f12f-4043-b43e-dea463d88b31
in change messages.With regards to change comments, they are stored in NoteDb (the
refs/changes/NN/NNN/meta
ref) with account=1684 when serverId=97ee7c02-f12f-4043-b43e-dea463d88b31 or account=1000177 when serverId=d5d70762-12d0-45a1-890d-524b12d3f735.What steps will reproduce the problem?
Sat Dec 30 01:17:24 2023
I have published two new comments to the changeWhat is the expected output? The updated change notes
refs/changes/40/194140/meta
should have contained two more comments.What do you see instead? The updated change notes (Commit SHA1=
f62b199bd2b00a3f73f11fa97392fc6a65270690
) had David Ostrovsky's comments associated with accountId=1684
on serverId=97ee7c02-f12f-4043-b43e-dea463d88b31
rewritten as accountId=1000177
.Additional information below.
See below the diff of the changes introduced on
refs/changes/40/194140/meta
with the commitf62b199bd2b00a3f73f11fa97392fc6a65270690
:It looks like the ChangeUpdate.java class had rewritten the author of some of its comments instead of just reading the existing ones on NoteDb.
The only method that allows to rewrite the author in a comment is in Comment.setRealAuthor():
It is unclear how publishing a draft comment for me could have triggered the call to
setRealAuthor()
for a series of other changes.