Status Update
Comments
sy...@gmail.com <sy...@gmail.com> #2
[Empty comment from Monorail migration]
sy...@gmail.com <sy...@gmail.com> #3
What are the remaining steps before we can announce and release this?
Is this waiting on somebody verifying the changes athttps://gerrit-review.googlesource.com/q/hashtag :"jgitupgradedec2018"?
Is this waiting on somebody verifying the changes at
lu...@gmail.com <lu...@gmail.com> #4
[Empty comment from Monorail migration]
sy...@gmail.com <sy...@gmail.com> #5
[Empty comment from Monorail migration]
lu...@gmail.com <lu...@gmail.com> #6
lu...@gmail.com <lu...@gmail.com> #7
Sorry about that. I've added you as reviewer on the other changes in that set.
ma...@gmail.com <ma...@gmail.com> #8
> Is this waiting on somebody verifying the changes at https://gerrit-review.googlesource.com/q/hashtag :"jgitupgradedec2018"?
Yes.
Given that it's the same fix in all of them, it should be safe enough to verify on one of them and then carry the score over to all the others.
I can try to test it tomorrow if you can provide steps how to do it.
It would be better though if we can write an integration test on the earliest stable branch and then merge that up through the subsequent branches.
Yes.
Given that it's the same fix in all of them, it should be safe enough to verify on one of them and then carry the score over to all the others.
I can try to test it tomorrow if you can provide steps how to do it.
It would be better though if we can write an integration test on the earliest stable branch and then merge that up through the subsequent branches.
sy...@gmail.com <sy...@gmail.com> #9
[Empty comment from Monorail migration]
lu...@gmail.com <lu...@gmail.com> #10
Removing the visibility restriction now that the issue is fixed.
sy...@gmail.com <sy...@gmail.com> #11
[Empty comment from Monorail migration]
ma...@gmail.com <ma...@gmail.com> #12
[Comment Deleted]
ma...@gmail.com <ma...@gmail.com> #13
[Monorail components: Backend]
na...@codeaurora.org <na...@codeaurora.org> #14
@Matthias, did the defaults for receive.autogc or other autogc configs change in JGit used in 3.2 vs 3.3?
@Luca, does this test pass on 3.2 (with no jgit config)?
@Luca, does this test pass on 3.2 (with no jgit config)?
ma...@gmail.com <ma...@gmail.com> #15
- stable-3.2 uses jgit 5.8.1
- stable-3.3 uses jgit 5.9.0
- autoGc and default for receive.autogc was introduced in 4.6.0
https://git.eclipse.org/r/c/jgit/jgit/+/70714
- running autoGc in background was added in 4.7.1
https://git.eclipse.org/r/c/jgit/jgit/+/98474
- last change regarding autogc I found was in 4.8.0
$ git log -i --oneline -S autogc
8c59e6441 Use a dedicated executor to run auto-gc in command line interface
6b1e3c58b Run auto GC in the background
f8ac03459 Fix loop in auto gc
64a404803 Implement auto gc
- stable-3.3 uses jgit 5.9.0
- autoGc and default for receive.autogc was introduced in 4.6.0
- running autoGc in background was added in 4.7.1
- last change regarding autogc I found was in 4.8.0
$ git log -i --oneline -S autogc
8c59e6441 Use a dedicated executor to run auto-gc in command line interface
6b1e3c58b Run auto GC in the background
f8ac03459 Fix loop in auto gc
64a404803 Implement auto gc
ma...@gmail.com <ma...@gmail.com> #16
I repeated the tests two more times and still couldn't reproduce this issue
sy...@gmail.com <sy...@gmail.com> #17
Thanks for trying this out Matthias.
I just wanted to clarify a couple of things:
- The issue here is solely the "Missing Unknown", the "want not valid" is not an issue.
As I understand it, after talking to luca, it is a security protection that forbids the server to expose SHA1s that clients should not see (i.e. the client asked for an invalid ref).
This happens across different branches (tested with 3.3 and 3.2) and it is expected behaviour.
The "Missing Unknown" errors I can see when testing with Gatling are an _overwhelming_ majority (as you can see from the attached logs).
This ticket is solely about the "Missing Unknown" issue (Updated summary).
- I tried multiple times to reproduce this issue on stable-3.2 with no success: stable-3.2 seems unaffected.
- I tried multiple times to reproduce this issue on gerrit-3.3-0-rc5 with no success: gerrit-3.3-0-rc5 seems unaffected.
- I can still consistently reproduce this on (3.3.0-rc4-17-gbef704ebfa7) (bef704ebfa77934d235208509e64ad7ec40019b4)
@Matthias would you please try to replicate this on 3.3.0-rc4 too?
I just wanted to clarify a couple of things:
- The issue here is solely the "Missing Unknown", the "want not valid" is not an issue.
As I understand it, after talking to luca, it is a security protection that forbids the server to expose SHA1s that clients should not see (i.e. the client asked for an invalid ref).
This happens across different branches (tested with 3.3 and 3.2) and it is expected behaviour.
The "Missing Unknown" errors I can see when testing with Gatling are an _overwhelming_ majority (as you can see from the attached logs).
This ticket is solely about the "Missing Unknown" issue (Updated summary).
- I tried multiple times to reproduce this issue on stable-3.2 with no success: stable-3.2 seems unaffected.
- I tried multiple times to reproduce this issue on gerrit-3.3-0-rc5 with no success: gerrit-3.3-0-rc5 seems unaffected.
- I can still consistently reproduce this on (3.3.0-rc4-17-gbef704ebfa7) (bef704ebfa77934d235208509e64ad7ec40019b4)
@Matthias would you please try to replicate this on 3.3.0-rc4 too?
lu...@gmail.com <lu...@gmail.com> #18
My test was on stable-3.3 and, when autogc is enabled, failed.
The only way to make it work is to disable autogc, which is fair but tells me that something isn't really working as it should.
The only way to make it work is to disable autogc, which is fair but tells me that something isn't really working as it should.
sy...@gmail.com <sy...@gmail.com> #19
I confirm I reproduced this on stable-3.3-RC5.
I have also used Matthias's latest jgit version [1] for testing.
Attaching some additional logs
[1]https://review.gerrithub.io/c/GerritForge/gatling-git/+/505686
I have also used Matthias's latest jgit version [1] for testing.
Attaching some additional logs
[1]
ma...@gmail.com <ma...@gmail.com> #20
I could reproduce MissingObjectExceptions occurring on gerrit e6b548b3324309c940b489c132d9bdea4d2d2598 (stable-3.3)
I tried to bisect this problem traversing jgit versions between v5.8.1.202007141445-r and v5.9.0.202009080501-r.
Bisect identified this changehttps://git.eclipse.org/r/c/jgit/jgit/+/168247 to be the culprit.
Though this change is only changing version numbers and cannot be the root cause for MissingObjectException.
I ran the load test once more using the same version and could not reproduce the problem.
Will now try to bisect Gerrit versions between stable-3.2 and stable-3.3.
I tried to bisect this problem traversing jgit versions between v5.8.1.202007141445-r and v5.9.0.202009080501-r.
Bisect identified this change
Though this change is only changing version numbers and cannot be the root cause for MissingObjectException.
I ran the load test once more using the same version and could not reproduce the problem.
Will now try to bisect Gerrit versions between stable-3.2 and stable-3.3.
ma...@gmail.com <ma...@gmail.com> #21
here is the log of bisecting jgit versions:
$ git bisect log
git bisect start
# bad: [dd169769bf42115e1dee749efeecab84544b28c4] JGit v5.9.0.202009080501-r
git bisect bad dd169769bf42115e1dee749efeecab84544b28c4
# good: [246954e0d66a1e38282d0786f10df8da54911628] JGit v5.8.1.202007141445-r
git bisect good 246954e0d66a1e38282d0786f10df8da54911628
# good: [72b111ecd7ddd5fb980767e87ff5515c05a48dbc] Update javadoc for RemoteSession and SshSessionFactory
git bisect good 72b111ecd7ddd5fb980767e87ff5515c05a48dbc
# bad: [14d5a7497f2f599d6bc24dbccd90a2a22ccbd599] Merge changes Ib0b2dc7c,I827da397 into stable-5.9
git bisect bad 14d5a7497f2f599d6bc24dbccd90a2a22ccbd599
# bad: [bf6b2b93147b822baf465fb1ab1b70ba71768906] JGit v5.9.0.202008260805-m3
git bisect bad bf6b2b93147b822baf465fb1ab1b70ba71768906
# good: [eec9b55dcf8fc08f431f16eaf139ab009453d445] FS: don't cache fallback if running in background
git bisect good eec9b55dcf8fc08f431f16eaf139ab009453d445
# good: [e9c7ba6fdccf64b16fdadd74106175f95454ec4f] Do not prematurely create directory of jgit's XDG config file
git bisect good e9c7ba6fdccf64b16fdadd74106175f95454ec4f
# good: [0220f32e5a60d3f0ac4acd3d2f35fd5a2a44809a] Fix possible NegativeArraySizeException in PackIndexV1
git bisect good 0220f32e5a60d3f0ac4acd3d2f35fd5a2a44809a
# first bad commit: [bf6b2b93147b822baf465fb1ab1b70ba71768906] JGit v5.9.0.202008260805-m3
$ git bisect log
git bisect start
# bad: [dd169769bf42115e1dee749efeecab84544b28c4] JGit v5.9.0.202009080501-r
git bisect bad dd169769bf42115e1dee749efeecab84544b28c4
# good: [246954e0d66a1e38282d0786f10df8da54911628] JGit v5.8.1.202007141445-r
git bisect good 246954e0d66a1e38282d0786f10df8da54911628
# good: [72b111ecd7ddd5fb980767e87ff5515c05a48dbc] Update javadoc for RemoteSession and SshSessionFactory
git bisect good 72b111ecd7ddd5fb980767e87ff5515c05a48dbc
# bad: [14d5a7497f2f599d6bc24dbccd90a2a22ccbd599] Merge changes Ib0b2dc7c,I827da397 into stable-5.9
git bisect bad 14d5a7497f2f599d6bc24dbccd90a2a22ccbd599
# bad: [bf6b2b93147b822baf465fb1ab1b70ba71768906] JGit v5.9.0.202008260805-m3
git bisect bad bf6b2b93147b822baf465fb1ab1b70ba71768906
# good: [eec9b55dcf8fc08f431f16eaf139ab009453d445] FS: don't cache fallback if running in background
git bisect good eec9b55dcf8fc08f431f16eaf139ab009453d445
# good: [e9c7ba6fdccf64b16fdadd74106175f95454ec4f] Do not prematurely create directory of jgit's XDG config file
git bisect good e9c7ba6fdccf64b16fdadd74106175f95454ec4f
# good: [0220f32e5a60d3f0ac4acd3d2f35fd5a2a44809a] Fix possible NegativeArraySizeException in PackIndexV1
git bisect good 0220f32e5a60d3f0ac4acd3d2f35fd5a2a44809a
# first bad commit: [bf6b2b93147b822baf465fb1ab1b70ba71768906] JGit v5.9.0.202008260805-m3
sy...@gmail.com <sy...@gmail.com> #22
Just an update on this, perhaps it's useful.
I have tried to replicate the issue by disabling the autogc in jgit.config, and running GC it as an SSH command over a fix period of time:
```
watch -n xxx 'ssh -p 29418 admin@localhost gerrit gc load-test --show-progress'
```
where xxx was: 3 secs, 10 secs, 30 secs, 60 secs.
Under these conditions I wasn't able to cause the Missing Unknown error.
I have tried to replicate the issue by disabling the autogc in jgit.config, and running GC it as an SSH command over a fix period of time:
```
watch -n xxx 'ssh -p 29418 admin@localhost gerrit gc load-test --show-progress'
```
where xxx was: 3 secs, 10 secs, 30 secs, 60 secs.
Under these conditions I wasn't able to cause the Missing Unknown error.
lu...@gmail.com <lu...@gmail.com> #23
I tried to put some logging on what happens during an autoGc, and I see:
[2020-11-23T15:35:46.745Z] [JGit-WorkQueue] INFO org.eclipse.jgit.internal.storage.file.GC : GC startedorg.eclipse.jgit.internal.storage.file.GC@340aaa24
[2020-11-23T15:35:47.801Z] [ReceiveCommits-2[java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@457b16cb[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@4f837dbb[Wrapped task = com.google.gerrit.server.logging.LoggingContextAwareRunnable@41449a3b]]]-for-SSH git-receive-pack /test-repo (admin)] ERROR org.eclipse.jgit.util.FS : /Users/lucamilanesio/gerrit-3.3.0/git/test-repo.git/refs/changes/22
java.nio.file.NoSuchFileException: /Users/lucamilanesio/gerrit-3.3.0/git/test-repo.git/refs/changes/22
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
at java.base/sun.nio.fs.BsdFileStore.findMountEntry(BsdFileStore.java:61)
at java.base/sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:67)
at java.base/sun.nio.fs.BsdFileStore.<init>(BsdFileStore.java:40)
at java.base/sun.nio.fs.BsdFileSystemProvider.getFileStore(BsdFileSystemProvider.java:46)
at java.base/sun.nio.fs.BsdFileSystemProvider.getFileStore(BsdFileSystemProvider.java:34)
at java.base/sun.nio.fs.UnixFileSystemProvider.getFileStore(UnixFileSystemProvider.java:373)
at java.base/java.nio.file.Files.getFileStore(Files.java:1487)
at org.eclipse.jgit.util.FS$FileStoreAttributes.getFileStoreAttributes(FS.java:360)
at org.eclipse.jgit.util.FS$FileStoreAttributes.get(FS.java:344)
at org.eclipse.jgit.util.FS.getFileStoreAttributes(FS.java:908)
at org.eclipse.jgit.internal.storage.file.FileSnapshot.<init>(FileSnapshot.java:224)
at org.eclipse.jgit.internal.storage.file.FileSnapshot.<init>(FileSnapshot.java:205)
at org.eclipse.jgit.internal.storage.file.FileSnapshot.save(FileSnapshot.java:102)
at org.eclipse.jgit.internal.storage.file.RefDirectory.scanRef(RefDirectory.java:1123)
at org.eclipse.jgit.internal.storage.file.RefDirectory$LooseScanner.scanOne(RefDirectory.java:509)
at org.eclipse.jgit.internal.storage.file.RefDirectory$LooseScanner.scanTree(RefDirectory.java:477)
at org.eclipse.jgit.internal.storage.file.RefDirectory$LooseScanner.scanTree(RefDirectory.java:475)
at org.eclipse.jgit.internal.storage.file.RefDirectory$LooseScanner.scanTree(RefDirectory.java:475)
at org.eclipse.jgit.internal.storage.file.RefDirectory$LooseScanner.scan(RefDirectory.java:431)
at org.eclipse.jgit.internal.storage.file.RefDirectory.getRefs(RefDirectory.java:363)
at org.eclipse.jgit.internal.storage.file.PackedBatchRefUpdate.checkConflictingNames(PackedBatchRefUpdate.java:217)
at org.eclipse.jgit.internal.storage.file.PackedBatchRefUpdate.execute(PackedBatchRefUpdate.java:132)
at org.eclipse.jgit.lib.BatchRefUpdate.execute(BatchRefUpdate.java:602)
at com.google.gerrit.git.RefUpdateUtil.executeChecked(RefUpdateUtil.java:56)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:382)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:332)
at com.google.gerrit.server.update.BatchUpdate$ChangesHandle.execute(BatchUpdate.java:588)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:148)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:412)
at com.google.gerrit.server.git.receive.ReceiveCommits.insertChangesAndPatchSets(ReceiveCommits.java:958)
at com.google.gerrit.server.git.receive.ReceiveCommits.processCommandsUnsafe(ReceiveCommits.java:691)
at com.google.gerrit.server.git.receive.ReceiveCommits.processCommands(ReceiveCommits.java:607)
at com.google.gerrit.server.git.receive.AsyncReceiveCommits.lambda$3(AsyncReceiveCommits.java:370)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ ... ]
[2020-11-23T15:35:48.729Z] [JGit-WorkQueue] INFO org.eclipse.jgit.internal.storage.file.GC : GC finishedorg.eclipse.jgit.internal.storage.file.GC@340aaa24 with 2 packfiles
It looks like it is simply the process of putting loose refs into a packed ref that creates troubles. Can we just catch the error during the JGit's 'scanRef' and retry?
I believe we would be best to fix the issue, rather than assuming the problem is autoGC and ignore it.
[2020-11-23T15:35:46.745Z] [JGit-WorkQueue] INFO org.eclipse.jgit.internal.storage.file.GC : GC startedorg.eclipse.jgit.internal.storage.file.GC@340aaa24
[2020-11-23T15:35:47.801Z] [ReceiveCommits-2[java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@457b16cb[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@4f837dbb[Wrapped task = com.google.gerrit.server.logging.LoggingContextAwareRunnable@41449a3b]]]-for-SSH git-receive-pack /test-repo (admin)] ERROR org.eclipse.jgit.util.FS : /Users/lucamilanesio/gerrit-3.3.0/git/test-repo.git/refs/changes/22
java.nio.file.NoSuchFileException: /Users/lucamilanesio/gerrit-3.3.0/git/test-repo.git/refs/changes/22
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
at java.base/sun.nio.fs.BsdFileStore.findMountEntry(BsdFileStore.java:61)
at java.base/sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:67)
at java.base/sun.nio.fs.BsdFileStore.<init>(BsdFileStore.java:40)
at java.base/sun.nio.fs.BsdFileSystemProvider.getFileStore(BsdFileSystemProvider.java:46)
at java.base/sun.nio.fs.BsdFileSystemProvider.getFileStore(BsdFileSystemProvider.java:34)
at java.base/sun.nio.fs.UnixFileSystemProvider.getFileStore(UnixFileSystemProvider.java:373)
at java.base/java.nio.file.Files.getFileStore(Files.java:1487)
at org.eclipse.jgit.util.FS$FileStoreAttributes.getFileStoreAttributes(FS.java:360)
at org.eclipse.jgit.util.FS$FileStoreAttributes.get(FS.java:344)
at org.eclipse.jgit.util.FS.getFileStoreAttributes(FS.java:908)
at org.eclipse.jgit.internal.storage.file.FileSnapshot.<init>(FileSnapshot.java:224)
at org.eclipse.jgit.internal.storage.file.FileSnapshot.<init>(FileSnapshot.java:205)
at org.eclipse.jgit.internal.storage.file.FileSnapshot.save(FileSnapshot.java:102)
at org.eclipse.jgit.internal.storage.file.RefDirectory.scanRef(RefDirectory.java:1123)
at org.eclipse.jgit.internal.storage.file.RefDirectory$LooseScanner.scanOne(RefDirectory.java:509)
at org.eclipse.jgit.internal.storage.file.RefDirectory$LooseScanner.scanTree(RefDirectory.java:477)
at org.eclipse.jgit.internal.storage.file.RefDirectory$LooseScanner.scanTree(RefDirectory.java:475)
at org.eclipse.jgit.internal.storage.file.RefDirectory$LooseScanner.scanTree(RefDirectory.java:475)
at org.eclipse.jgit.internal.storage.file.RefDirectory$LooseScanner.scan(RefDirectory.java:431)
at org.eclipse.jgit.internal.storage.file.RefDirectory.getRefs(RefDirectory.java:363)
at org.eclipse.jgit.internal.storage.file.PackedBatchRefUpdate.checkConflictingNames(PackedBatchRefUpdate.java:217)
at org.eclipse.jgit.internal.storage.file.PackedBatchRefUpdate.execute(PackedBatchRefUpdate.java:132)
at org.eclipse.jgit.lib.BatchRefUpdate.execute(BatchRefUpdate.java:602)
at com.google.gerrit.git.RefUpdateUtil.executeChecked(RefUpdateUtil.java:56)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:382)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:332)
at com.google.gerrit.server.update.BatchUpdate$ChangesHandle.execute(BatchUpdate.java:588)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:148)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:412)
at com.google.gerrit.server.git.receive.ReceiveCommits.insertChangesAndPatchSets(ReceiveCommits.java:958)
at com.google.gerrit.server.git.receive.ReceiveCommits.processCommandsUnsafe(ReceiveCommits.java:691)
at com.google.gerrit.server.git.receive.ReceiveCommits.processCommands(ReceiveCommits.java:607)
at com.google.gerrit.server.git.receive.AsyncReceiveCommits.lambda$3(AsyncReceiveCommits.java:370)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ ... ]
[2020-11-23T15:35:48.729Z] [JGit-WorkQueue] INFO org.eclipse.jgit.internal.storage.file.GC : GC finishedorg.eclipse.jgit.internal.storage.file.GC@340aaa24 with 2 packfiles
It looks like it is simply the process of putting loose refs into a packed ref that creates troubles. Can we just catch the error during the JGit's 'scanRef' and retry?
I believe we would be best to fix the issue, rather than assuming the problem is autoGC and ignore it.
lu...@gmail.com <lu...@gmail.com> #24
@Matthias I believe I’ve found the culprit of the problem
see this piece of code in JGit:
private static FileStoreAttributes getFileStoreAttributes(Path dir) {
FileStore s;
try {
if (Files.exists(dir)) {
s = Files.getFileStore(dir);
FileStoreAttributes c = attributeCache.get(s);
if (c != null) {
return c;
}
if (!Files.isWritable(dir)) {
// cannot measure resolution in a read-only directory
LOG.debug(
"{}: cannot measure timestamp resolution in read-only directory {}", //$NON-NLS-1$
Thread.currentThread(), dir);
return FALLBACK_FILESTORE_ATTRIBUTES;
}
} else {
// cannot determine FileStore of an unborn directory
LOG.debug(
"{}: cannot measure timestamp resolution of unborn directory {}", //$NON-NLS-1$
Thread.currentThread(), dir);
return FALLBACK_FILESTORE_ATTRIBUTES;
}
as you can see, there isn’t any locking between the Files.exist(dir) and Files.getFileStore(dir)
so, basically, it may happen that the GC is getting rid of the directory during the GC phase
and that disappears between the Files.exist(dir) and Files.getFileStore(dir) right?
I believe the fix is super-easy: just catch java.nio.file.NoSuchFileException and return the FAILLBACK_FILESTORE_ATTRIBUTES
Does that make sense to you?
see this piece of code in JGit:
private static FileStoreAttributes getFileStoreAttributes(Path dir) {
FileStore s;
try {
if (Files.exists(dir)) {
s = Files.getFileStore(dir);
FileStoreAttributes c = attributeCache.get(s);
if (c != null) {
return c;
}
if (!Files.isWritable(dir)) {
// cannot measure resolution in a read-only directory
LOG.debug(
"{}: cannot measure timestamp resolution in read-only directory {}", //$NON-NLS-1$
Thread.currentThread(), dir);
return FALLBACK_FILESTORE_ATTRIBUTES;
}
} else {
// cannot determine FileStore of an unborn directory
LOG.debug(
"{}: cannot measure timestamp resolution of unborn directory {}", //$NON-NLS-1$
Thread.currentThread(), dir);
return FALLBACK_FILESTORE_ATTRIBUTES;
}
as you can see, there isn’t any locking between the Files.exist(dir) and Files.getFileStore(dir)
so, basically, it may happen that the GC is getting rid of the directory during the GC phase
and that disappears between the Files.exist(dir) and Files.getFileStore(dir) right?
I believe the fix is super-easy: just catch java.nio.file.NoSuchFileException and return the FAILLBACK_FILESTORE_ATTRIBUTES
Does that make sense to you?
lu...@gmail.com <lu...@gmail.com> #25
[Empty comment from Monorail migration]
ma...@gmail.com <ma...@gmail.com> #26
I guess it's auto-gc triggered by an earlier receive-pack causing that while another PackedBatchRefUpdate checks if there are refs conflicting with new ones which are potentially created by the ref update at hand. I think it cannot be caused by the same PackedBatchRefUpdate instance since it starts auto gc at the end of its service method after the ref updates were already processed.
This IOException is handled in RefDirectory$LooseScanner.scanOne() when PackedBatchRefUpdate looks for conflicting refs. If this exception is caught scanOne assumes this loose ref has gone missing and removes it from its cache of loose refs. Not sure how this could cause a MissingObjectException when PackedBatchRefUpdate checks for conflicting refs.
This IOException is handled in RefDirectory$LooseScanner.scanOne() when PackedBatchRefUpdate looks for conflicting refs. If this exception is caught scanOne assumes this loose ref has gone missing and removes it from its cache of loose refs. Not sure how this could cause a MissingObjectException when PackedBatchRefUpdate checks for conflicting refs.
ma...@gmail.com <ma...@gmail.com> #27
This particular exception can be completely avoided using patch [1] which caches the FileStore of all files underneath
a given repository assuming no repository spans more than one filesystem volume.
[1]https://git.eclipse.org/r/c/jgit/jgit/+/170138
a given repository assuming no repository spans more than one filesystem volume.
[1]
lu...@gmail.com <lu...@gmail.com> #28
Thanks, Matthias, for the feedback.
The [1] looks definitely interesting and useful, however, it is currently in Merge conflict and it is quite big :-(
Do you believe there is space for rebasing it, reviewing and merge it by today?
If not, can we just delay the release for one more week and get [1] merged?
WDYT?
The [1] looks definitely interesting and useful, however, it is currently in Merge conflict and it is quite big :-(
Do you believe there is space for rebasing it, reviewing and merge it by today?
If not, can we just delay the release for one more week and get [1] merged?
WDYT?
lu...@gmail.com <lu...@gmail.com> #29
I've posted a simple and small self-contained fix at:
https://git.eclipse.org/r/c/jgit/jgit/+/172695
I'm testing it locally to see if that alone can resolve the issue.
I'm testing it locally to see if that alone can resolve the issue.
lu...@gmail.com <lu...@gmail.com> #30
I don't think is enough to solve the issue :-(
In another run I got the following error:
Caused by: com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.GitUpdateFailureException: Update failed: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 c9d802aaba28f40fd64a8f73fd2a7e1f4ac89efa refs/changes/12/8612/1 (REJECTED_MISSING_OBJECT)
CREATE: 0000000000000000000000000000000000000000 6d040900a635e13d2830d0acde879cac04b8eec2 refs/changes/12/8612/meta (REJECTED_OTHER_REASON: transaction aborted)
]
The above is quite worrying, and it was in the middle of a GC cycle.
What is basically happening is a MissingObjectException in the middle of a ref-update, which isn't supposed to happen right? It was the creation of a brand-new change, so the object wasn't prunable for sure.
In another run I got the following error:
Caused by: com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.GitUpdateFailureException: Update failed: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 c9d802aaba28f40fd64a8f73fd2a7e1f4ac89efa refs/changes/12/8612/1 (REJECTED_MISSING_OBJECT)
CREATE: 0000000000000000000000000000000000000000 6d040900a635e13d2830d0acde879cac04b8eec2 refs/changes/12/8612/meta (REJECTED_OTHER_REASON: transaction aborted)
]
The above is quite worrying, and it was in the middle of a GC cycle.
What is basically happening is a MissingObjectException in the middle of a ref-update, which isn't supposed to happen right? It was the creation of a brand-new change, so the object wasn't prunable for sure.
lu...@gmail.com <lu...@gmail.com> #31
There is another piece of code that is worrying: the pruning of loose objects after a GC phase. I believe that's the only piece of code that could have removed an 'in-flight' SHA1 object whilst the receive-pack was running.
However, the default prune expire is 2 weeks: how comes that a GC cycle can remove an in-flight object?
@Matthias any light?
However, the default prune expire is 2 weeks: how comes that a GC cycle can remove an in-flight object?
@Matthias any light?
ma...@gmail.com <ma...@gmail.com> #32
> The [1] looks definitely interesting and useful, however, it is currently in Merge conflict and it is quite big :-(
> Do you believe there is space for rebasing it, reviewing and merge it by today?
> If not, can we just delay the release for one more week and get [1] merged?
I rebased [1] onto current master. I can run the load test again using Gerrit stable-3.3 with that JGit version.
Currently Gerrit stable-3.3 uses JGit v5.9.0.202009080501-r.
> I don't think is enough to solve the issue :-(
I agree and I think this fix is worse than the current implementation since it won't remove the
not found loose ref from the cache of loose refs in RefDirectory
which is already handled properly in RefDirectory$LooseScanner.scanOne().
[2] handles the IOException
[3] removes the loose ref from the internal cache if it was known before
[2]https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/refs/tags/v5.9.0.202009080501-r/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/RefDirectory.java#510
[3]https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/refs/tags/v5.9.0.202009080501-r/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/RefDirectory.java#521
> Do you believe there is space for rebasing it, reviewing and merge it by today?
> If not, can we just delay the release for one more week and get [1] merged?
I rebased [1] onto current master. I can run the load test again using Gerrit stable-3.3 with that JGit version.
Currently Gerrit stable-3.3 uses JGit v5.9.0.202009080501-r.
> I don't think is enough to solve the issue :-(
I agree and I think this fix is worse than the current implementation since it won't remove the
not found loose ref from the cache of loose refs in RefDirectory
which is already handled properly in RefDirectory$LooseScanner.scanOne().
[2] handles the IOException
[3] removes the loose ref from the internal cache if it was known before
[2]
[3]
please post the complete stacktrace of this error
> The above is quite worrying, and it was in the middle of a GC cycle.
How do you know that ?
> What is basically happening is a MissingObjectException in the middle of a ref-update, which isn't supposed to happen right? It was the creation of a brand-new change, so > the object wasn't prunable for sure.
Looking at the implementation of PackedBatchRefUpdate.execute() [4] there might be room for races with gc
during the following steps
- checkObjectExistence()
- checkNonFastForwards()
then it packs all refs to ensure they are all (or at least most of them, I think another thread could still create new loose refs) stored in packed-refs
It seems in this error also Gerrit's MetaDataUpdate was involved. I don't know enough about the notedb machinery
to assess if this plays a role in ending up in this exception.
In line 155 PackedBatchRefUpdate.execute() first acquires a reentrant transient lock which is also acquired by GC.packRefs() which ensures that
a gc running in the same process does not repack refs concurrently with PackedBatchRefUpdate.
Then it acquires File locks on all involved loose refs and packed-refs.
I don't know yet where in this sequence the missing object is detected.
The last change in PackedBatchRefUpdate was Han-Wen's fix of a race [5] in 4.11.9 in Sept 2019.
[4]
[5]
ma...@gmail.com <ma...@gmail.com> #33
I think there is no guarantee that all executions of PackedBatchRefUpdate succeed.
If a transaction is aborted the client needs to react accordingly depending on the failure type.
I don't know if the load test tries to handle such failed transactions.
If a transaction is aborted the client needs to react accordingly depending on the failure type.
I don't know if the load test tries to handle such failed transactions.
lu...@gmail.com <lu...@gmail.com> #34
See below the full stack and logs of the in-flight BLOB removed:
[2020-11-23T18:46:44.098Z] [JGit-WorkQueue] INFO org.eclipse.jgit.internal.storage.file.GC : GC started org.eclipse.jgit.internal.storage.file.GC@48799a
[2020-11-23T18:46:44.511Z] [SSH git-receive-pack /test-repo (admin)] WARN com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor worker did not call end() before returning
[2020-11-23T18:46:44.511Z] [SSH git-receive-pack /test-repo (admin)] ERROR com.google.gerrit.server.git.receive.AsyncReceiveCommits : error while processing push
java.util.concurrent.ExecutionException: com.google.gerrit.exceptions.StorageException: Can't insert change/patch set for test-repo
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at com.google.gerrit.server.git.receive.AsyncReceiveCommits.preReceive(AsyncReceiveCommits.java:387)
at com.google.gerrit.server.git.receive.AsyncReceiveCommits.lambda$1(AsyncReceiveCommits.java:332)
at org.eclipse.jgit.transport.ReceivePack.service(ReceivePack.java:2206)
at org.eclipse.jgit.transport.ReceivePack.receive(ReceivePack.java:2120)
at com.google.gerrit.sshd.commands.Receive.runImpl(Receive.java:98)
at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:108)
at com.google.gerrit.sshd.AbstractGitCommand.access$0(AbstractGitCommand.java:97)
at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:73)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:473)
at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:103)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:612)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.google.gerrit.exceptions.StorageException: Can't insert change/patch set for test-repo
at com.google.gerrit.server.git.receive.ReceiveCommits.insertChangesAndPatchSets(ReceiveCommits.java:994)
at com.google.gerrit.server.git.receive.ReceiveCommits.processCommandsUnsafe(ReceiveCommits.java:691)
at com.google.gerrit.server.git.receive.ReceiveCommits.processCommands(ReceiveCommits.java:607)
at com.google.gerrit.server.git.receive.AsyncReceiveCommits.lambda$3(AsyncReceiveCommits.java:370)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at com.google.gerrit.server.util.RequestScopePropagator.lambda$2(RequestScopePropagator.java:182)
at com.google.gerrit.server.util.RequestScopePropagator.lambda$0(RequestScopePropagator.java:170)
at com.google.gerrit.server.util.ThreadLocalRequestScopePropagator.lambda$0(ThreadLocalRequestScopePropagator.java:45)
at com.google.gerrit.server.util.RequestScopePropagator$1.call(RequestScopePropagator.java:85)
at com.google.gerrit.server.util.RequestScopePropagator$2.run(RequestScopePropagator.java:116)
... 8 more
Caused by: com.google.gerrit.extensions.restapi.RestApiException: Error inserting change/patchset
at com.google.gerrit.extensions.restapi.RestApiException.wrap(RestApiException.java:25)
at com.google.gerrit.server.git.receive.ReceiveCommits.asRestApiException(ReceiveCommits.java:304)
at com.google.gerrit.server.git.receive.ReceiveCommits.insertChangesAndPatchSets(ReceiveCommits.java:960)
... 18 more
Caused by: com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.GitUpdateFailureException: Update failed: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 c9d802aaba28f40fd64a8f73fd2a7e1f4ac89efa refs/changes/12/8612/1 (REJECTED_MISSING_OBJECT)
CREATE: 0000000000000000000000000000000000000000 6d040900a635e13d2830d0acde879cac04b8eec2 refs/changes/12/8612/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.server.update.BatchUpdate.wrapAndThrowException(BatchUpdate.java:229)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:172)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:412)
at com.google.gerrit.server.git.receive.ReceiveCommits.insertChangesAndPatchSets(ReceiveCommits.java:958)
... 18 more
Caused by: com.google.gerrit.git.GitUpdateFailureException: Update failed: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 c9d802aaba28f40fd64a8f73fd2a7e1f4ac89efa refs/changes/12/8612/1 (REJECTED_MISSING_OBJECT)
CREATE: 0000000000000000000000000000000000000000 6d040900a635e13d2830d0acde879cac04b8eec2 refs/changes/12/8612/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.git.RefUpdateUtil.checkResults(RefUpdateUtil.java:102)
at com.google.gerrit.git.RefUpdateUtil.executeChecked(RefUpdateUtil.java:57)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:382)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:332)
at com.google.gerrit.server.update.BatchUpdate$ChangesHandle.execute(BatchUpdate.java:588)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:148)
... 20 more
[2020-11-23T18:46:44.098Z] [JGit-WorkQueue] INFO org.eclipse.jgit.internal.storage.file.GC : GC started org.eclipse.jgit.internal.storage.file.GC@48799a
[2020-11-23T18:46:44.511Z] [SSH git-receive-pack /test-repo (admin)] WARN com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor worker did not call end() before returning
[2020-11-23T18:46:44.511Z] [SSH git-receive-pack /test-repo (admin)] ERROR com.google.gerrit.server.git.receive.AsyncReceiveCommits : error while processing push
java.util.concurrent.ExecutionException: com.google.gerrit.exceptions.StorageException: Can't insert change/patch set for test-repo
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at com.google.gerrit.server.git.receive.AsyncReceiveCommits.preReceive(AsyncReceiveCommits.java:387)
at com.google.gerrit.server.git.receive.AsyncReceiveCommits.lambda$1(AsyncReceiveCommits.java:332)
at org.eclipse.jgit.transport.ReceivePack.service(ReceivePack.java:2206)
at org.eclipse.jgit.transport.ReceivePack.receive(ReceivePack.java:2120)
at com.google.gerrit.sshd.commands.Receive.runImpl(Receive.java:98)
at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:108)
at com.google.gerrit.sshd.AbstractGitCommand.access$0(AbstractGitCommand.java:97)
at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:73)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:473)
at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:103)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:612)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.google.gerrit.exceptions.StorageException: Can't insert change/patch set for test-repo
at com.google.gerrit.server.git.receive.ReceiveCommits.insertChangesAndPatchSets(ReceiveCommits.java:994)
at com.google.gerrit.server.git.receive.ReceiveCommits.processCommandsUnsafe(ReceiveCommits.java:691)
at com.google.gerrit.server.git.receive.ReceiveCommits.processCommands(ReceiveCommits.java:607)
at com.google.gerrit.server.git.receive.AsyncReceiveCommits.lambda$3(AsyncReceiveCommits.java:370)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at com.google.gerrit.server.util.RequestScopePropagator.lambda$2(RequestScopePropagator.java:182)
at com.google.gerrit.server.util.RequestScopePropagator.lambda$0(RequestScopePropagator.java:170)
at com.google.gerrit.server.util.ThreadLocalRequestScopePropagator.lambda$0(ThreadLocalRequestScopePropagator.java:45)
at com.google.gerrit.server.util.RequestScopePropagator$1.call(RequestScopePropagator.java:85)
at com.google.gerrit.server.util.RequestScopePropagator$2.run(RequestScopePropagator.java:116)
... 8 more
Caused by: com.google.gerrit.extensions.restapi.RestApiException: Error inserting change/patchset
at com.google.gerrit.extensions.restapi.RestApiException.wrap(RestApiException.java:25)
at com.google.gerrit.server.git.receive.ReceiveCommits.asRestApiException(ReceiveCommits.java:304)
at com.google.gerrit.server.git.receive.ReceiveCommits.insertChangesAndPatchSets(ReceiveCommits.java:960)
... 18 more
Caused by: com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.GitUpdateFailureException: Update failed: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 c9d802aaba28f40fd64a8f73fd2a7e1f4ac89efa refs/changes/12/8612/1 (REJECTED_MISSING_OBJECT)
CREATE: 0000000000000000000000000000000000000000 6d040900a635e13d2830d0acde879cac04b8eec2 refs/changes/12/8612/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.server.update.BatchUpdate.wrapAndThrowException(BatchUpdate.java:229)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:172)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:412)
at com.google.gerrit.server.git.receive.ReceiveCommits.insertChangesAndPatchSets(ReceiveCommits.java:958)
... 18 more
Caused by: com.google.gerrit.git.GitUpdateFailureException: Update failed: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 c9d802aaba28f40fd64a8f73fd2a7e1f4ac89efa refs/changes/12/8612/1 (REJECTED_MISSING_OBJECT)
CREATE: 0000000000000000000000000000000000000000 6d040900a635e13d2830d0acde879cac04b8eec2 refs/changes/12/8612/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.git.RefUpdateUtil.checkResults(RefUpdateUtil.java:102)
at com.google.gerrit.git.RefUpdateUtil.executeChecked(RefUpdateUtil.java:57)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:382)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:332)
at com.google.gerrit.server.update.BatchUpdate$ChangesHandle.execute(BatchUpdate.java:588)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:148)
... 20 more
lu...@gmail.com <lu...@gmail.com> #35
@Matthias I believe I'm getting closer to where the problem is.
I have a default prune time of 2.weeks.ago, so the GC should *NOT* prune objects unless they are 2 weeks old, correct?
When the load-test was running, I ran the following command every sec:
OUT=files-$(date +'%s')
find objects -type f | sort > $OUT
wc -l $OUT
Then, I see the output and, sometimes, the number is going down, instead of increasing ... which is highly unexpected.
See below one example of files removed by the GC phase:
747 files-1606167915
738 files-1606167916
At TS=1606167915 there were 747 files, but at TS=1606167916 there are only 738 files.
The difference between the two is:
diff files-1606167915 files-1606167916
1d0
< objects/03/54207ad3d35df7c8aa75db8ba19824bed8bf1b
3d1
< objects/19/b5c8b58c666a3448bc924952465353708abef6
7d4
< objects/25/308145e2136833ab915c88527c5a44fbaed8d3
12d8
< objects/50/550b83027afc0dada65438290afccb1576683f
14d9
< objects/6a/dada79c4603ec4559c0f9517172c00b2ad4c97
16,17d10
< objects/85/bbfc5ca1901cc99c89c3e684ae025ae248a035
< objects/b1/d5b1b5d811f8e68f3e922ef54a8f588b64fdc4
19d11
< objects/d9/4a43b62baa754fccdb102cc0f088e6a9e02247
21d12
< objects/e7/bbe52b5e1a69f80719ed5943ca7fb6f1a7bd3b
Those are *ALL* loose objects, isn't it?
How come that GC removed them?
Is there any explanation of that?
P.S. It should be fairly easy to reproduce: just run the above script whilst the test is running, with continuous GC happening in parallel.
I have a default prune time of 2.weeks.ago, so the GC should *NOT* prune objects unless they are 2 weeks old, correct?
When the load-test was running, I ran the following command every sec:
OUT=files-$(date +'%s')
find objects -type f | sort > $OUT
wc -l $OUT
Then, I see the output and, sometimes, the number is going down, instead of increasing ... which is highly unexpected.
See below one example of files removed by the GC phase:
747 files-1606167915
738 files-1606167916
At TS=1606167915 there were 747 files, but at TS=1606167916 there are only 738 files.
The difference between the two is:
diff files-1606167915 files-1606167916
1d0
< objects/03/54207ad3d35df7c8aa75db8ba19824bed8bf1b
3d1
< objects/19/b5c8b58c666a3448bc924952465353708abef6
7d4
< objects/25/308145e2136833ab915c88527c5a44fbaed8d3
12d8
< objects/50/550b83027afc0dada65438290afccb1576683f
14d9
< objects/6a/dada79c4603ec4559c0f9517172c00b2ad4c97
16,17d10
< objects/85/bbfc5ca1901cc99c89c3e684ae025ae248a035
< objects/b1/d5b1b5d811f8e68f3e922ef54a8f588b64fdc4
19d11
< objects/d9/4a43b62baa754fccdb102cc0f088e6a9e02247
21d12
< objects/e7/bbe52b5e1a69f80719ed5943ca7fb6f1a7bd3b
Those are *ALL* loose objects, isn't it?
How come that GC removed them?
Is there any explanation of that?
P.S. It should be fairly easy to reproduce: just run the above script whilst the test is running, with continuous GC happening in parallel.
lu...@gmail.com <lu...@gmail.com> #36
I've got the smoking gun ... finally :-)
Test-1: Run 2 concurrent users that are creating changes + concurrent GCs
Problem-1: the objects are pruned even when they should NOT.
Problem-2: if I run the 'git fsck' I see lots of missing objects. That translates in missing unknown from a Gerrit's perspective
Test-2: Comment out the 'prune(Collections.emptySet());' line at GC.java:270
No objects are ever pruned. The repository is always consistent.
--- * ---
I believe there is some bug in the pruning logic, possibly something introduced recently?
Test-1: Run 2 concurrent users that are creating changes + concurrent GCs
Problem-1: the objects are pruned even when they should NOT.
Problem-2: if I run the 'git fsck' I see lots of missing objects. That translates in missing unknown from a Gerrit's perspective
Test-2: Comment out the 'prune(Collections.emptySet());' line at GC.java:270
No objects are ever pruned. The repository is always consistent.
--- * ---
I believe there is some bug in the pruning logic, possibly something introduced recently?
lu...@gmail.com <lu...@gmail.com> #37
I added some logging on the JGit GC code where files are removed, and I do see *plenty* of delete of loose objects:
grep DELETE error_log| grep objects | grep -v pack | grep -v keep | grep -v lock | wc -l
331
Going now to understand why they are removed, as they should not be touched until they expire, isn't it?
grep DELETE error_log| grep objects | grep -v pack | grep -v keep | grep -v lock | wc -l
331
Going now to understand why they are removed, as they should not be touched until they expire, isn't it?
lu...@gmail.com <lu...@gmail.com> #38
OK, it looks like the GC.prunePacked() doesn't make any check on the expiry: if it founds a loose object that is already contained in a packfile, it just gets rid of it, without checking the expiry time.
I'll try to comment that piece of code and see if the problem disappears.
I'll try to comment that piece of code and see if the problem disappears.
lu...@gmail.com <lu...@gmail.com> #39
See below an example of the type of corrupted repository gets created with concurrent GCs and receive-pack:
git fsck | grep -v dangling
Checking object directories: 100% (256/256), done.
Checking objects: 100% (246461/246461), done.
error: refs/heads/branch-a1dba62459b5-3: invalid sha1 pointer 57faebc7496ddb2559858f6208f753c529fce657
error: refs/heads/branch-a1dba62459b5-3: invalid reflog entry 57faebc7496ddb2559858f6208f753c529fce657
See below the content of the reflog for refs/heads/branch-a1dba62459b5-3:
0000000000000000000000000000000000000000 b0d7c3bfd1d2a393ed4154cc357a432bb0c8ce8b Administrator <admin|account-1000000@127.0.0.1> 1606238804 +0000 push
b0d7c3bfd1d2a393ed4154cc357a432bb0c8ce8b 0f69f91cd0a78808c8a62d48da94acbfec5cf954 Administrator <admin|account-1000000@127.0.0.1> 1606238805 +0000 push
0f69f91cd0a78808c8a62d48da94acbfec5cf954 a90120b169eb85a87d53dc0f2094cd650c4724c8 Administrator <admin|account-1000000@127.0.0.1> 1606238863 +0000 push
a90120b169eb85a87d53dc0f2094cd650c4724c8 d48b4d0804fefbd6912c54b1104e9b9d254a41bf Administrator <admin|account-1000000@127.0.0.1> 1606238866 +0000 push
d48b4d0804fefbd6912c54b1104e9b9d254a41bf 5225f8e560f4d0fefd3723d88b7906543ac92212 Administrator <admin|account-1000000@127.0.0.1> 1606238929 +0000 push
5225f8e560f4d0fefd3723d88b7906543ac92212 f9c10f1e8174160d0777fcb9c893b1494c5383a4 Administrator <admin|account-1000000@127.0.0.1> 1606238933 +0000 push
f9c10f1e8174160d0777fcb9c893b1494c5383a4 6a3843505a51720113b807fd377b2e75a21707d7 Administrator <admin|account-1000000@127.0.0.1> 1606238993 +0000 push
6a3843505a51720113b807fd377b2e75a21707d7 bdf602d97629605be574e14d1bbfb384d0bf039a Administrator <admin|account-1000000@127.0.0.1> 1606239003 +0000 push
bdf602d97629605be574e14d1bbfb384d0bf039a c0b29647ddd9a7cd6b737fa01324a9689a140c23 Administrator <admin|account-1000000@127.0.0.1> 1606239056 +0000 push
c0b29647ddd9a7cd6b737fa01324a9689a140c23 e956b41b7f6483a1046300a487af61da489f3d46 Administrator <admin|account-1000000@127.0.0.1> 1606239064 +0000 push
e956b41b7f6483a1046300a487af61da489f3d46 06431329dba5716509e4e982e8676cf24a149c9f Administrator <admin|account-1000000@127.0.0.1> 1606239134 +0000 push
06431329dba5716509e4e982e8676cf24a149c9f 94e8bcd1ccf323b6231785133e51ee4c6155a8e6 Administrator <admin|account-1000000@127.0.0.1> 1606239147 +0000 push
94e8bcd1ccf323b6231785133e51ee4c6155a8e6 bfe613b51d91809d93a4e092af33f2377d6e2d92 Administrator <admin|account-1000000@127.0.0.1> 1606239225 +0000 push
bfe613b51d91809d93a4e092af33f2377d6e2d92 f2683c6c17a8a4d75785679d3b39cee7412867fd Administrator <admin|account-1000000@127.0.0.1> 1606239242 +0000 push
f2683c6c17a8a4d75785679d3b39cee7412867fd 7b604bdbaaf7c01d9963639555886ff64dcddf42 Administrator <admin|account-1000000@127.0.0.1> 1606239330 +0000 push
7b604bdbaaf7c01d9963639555886ff64dcddf42 de6487eeb56e5dfd5aa389e76937a3a143827467 Administrator <admin|account-1000000@127.0.0.1> 1606239347 +0000 push
de6487eeb56e5dfd5aa389e76937a3a143827467 a29f7862a1f89f520291450c16484cb036e77de8 Administrator <admin|account-1000000@127.0.0.1> 1606239413 +0000 push
a29f7862a1f89f520291450c16484cb036e77de8 ebdb79857bb71de650ff41f82bb21553f69e8dfc Administrator <admin|account-1000000@127.0.0.1> 1606239436 +0000 push
ebdb79857bb71de650ff41f82bb21553f69e8dfc 9cc6cb743da09250b826d6e6855cd23ac7e826ec Administrator <admin|account-1000000@127.0.0.1> 1606239502 +0000 push
9cc6cb743da09250b826d6e6855cd23ac7e826ec 3313477fa268096ad390a886b27ef9ceb5e2c1e1 Administrator <admin|account-1000000@127.0.0.1> 1606239531 +0000 push
3313477fa268096ad390a886b27ef9ceb5e2c1e1 2ad2f51efd759b2709b689eb6947a6a6bbf685d7 Administrator <admin|account-1000000@127.0.0.1> 1606239598 +0000 push
2ad2f51efd759b2709b689eb6947a6a6bbf685d7 c128e1bf31a1d5034d427e3faea00f61533a4d4f Administrator <admin|account-1000000@127.0.0.1> 1606239623 +0000 push
c128e1bf31a1d5034d427e3faea00f61533a4d4f d756539838f580616cea34225f984cc53571c49e Administrator <admin|account-1000000@127.0.0.1> 1606239683 +0000 push
d756539838f580616cea34225f984cc53571c49e 57faebc7496ddb2559858f6208f753c529fce657 Administrator <admin|account-1000000@127.0.0.1> 1606239728 +0000 push
The last push (d756539838f580616cea34225f984cc53571c49e => 57faebc7496ddb2559858f6208f753c529fce657) was successful, it updated the ref-log and the target branch. However, the GC removed the BLOB and now the branch and ref-log are pointing to an non-existent object.
git fsck | grep -v dangling
Checking object directories: 100% (256/256), done.
Checking objects: 100% (246461/246461), done.
error: refs/heads/branch-a1dba62459b5-3: invalid sha1 pointer 57faebc7496ddb2559858f6208f753c529fce657
error: refs/heads/branch-a1dba62459b5-3: invalid reflog entry 57faebc7496ddb2559858f6208f753c529fce657
See below the content of the reflog for refs/heads/branch-a1dba62459b5-3:
0000000000000000000000000000000000000000 b0d7c3bfd1d2a393ed4154cc357a432bb0c8ce8b Administrator <admin|account-1000000@127.0.0.1> 1606238804 +0000 push
b0d7c3bfd1d2a393ed4154cc357a432bb0c8ce8b 0f69f91cd0a78808c8a62d48da94acbfec5cf954 Administrator <admin|account-1000000@127.0.0.1> 1606238805 +0000 push
0f69f91cd0a78808c8a62d48da94acbfec5cf954 a90120b169eb85a87d53dc0f2094cd650c4724c8 Administrator <admin|account-1000000@127.0.0.1> 1606238863 +0000 push
a90120b169eb85a87d53dc0f2094cd650c4724c8 d48b4d0804fefbd6912c54b1104e9b9d254a41bf Administrator <admin|account-1000000@127.0.0.1> 1606238866 +0000 push
d48b4d0804fefbd6912c54b1104e9b9d254a41bf 5225f8e560f4d0fefd3723d88b7906543ac92212 Administrator <admin|account-1000000@127.0.0.1> 1606238929 +0000 push
5225f8e560f4d0fefd3723d88b7906543ac92212 f9c10f1e8174160d0777fcb9c893b1494c5383a4 Administrator <admin|account-1000000@127.0.0.1> 1606238933 +0000 push
f9c10f1e8174160d0777fcb9c893b1494c5383a4 6a3843505a51720113b807fd377b2e75a21707d7 Administrator <admin|account-1000000@127.0.0.1> 1606238993 +0000 push
6a3843505a51720113b807fd377b2e75a21707d7 bdf602d97629605be574e14d1bbfb384d0bf039a Administrator <admin|account-1000000@127.0.0.1> 1606239003 +0000 push
bdf602d97629605be574e14d1bbfb384d0bf039a c0b29647ddd9a7cd6b737fa01324a9689a140c23 Administrator <admin|account-1000000@127.0.0.1> 1606239056 +0000 push
c0b29647ddd9a7cd6b737fa01324a9689a140c23 e956b41b7f6483a1046300a487af61da489f3d46 Administrator <admin|account-1000000@127.0.0.1> 1606239064 +0000 push
e956b41b7f6483a1046300a487af61da489f3d46 06431329dba5716509e4e982e8676cf24a149c9f Administrator <admin|account-1000000@127.0.0.1> 1606239134 +0000 push
06431329dba5716509e4e982e8676cf24a149c9f 94e8bcd1ccf323b6231785133e51ee4c6155a8e6 Administrator <admin|account-1000000@127.0.0.1> 1606239147 +0000 push
94e8bcd1ccf323b6231785133e51ee4c6155a8e6 bfe613b51d91809d93a4e092af33f2377d6e2d92 Administrator <admin|account-1000000@127.0.0.1> 1606239225 +0000 push
bfe613b51d91809d93a4e092af33f2377d6e2d92 f2683c6c17a8a4d75785679d3b39cee7412867fd Administrator <admin|account-1000000@127.0.0.1> 1606239242 +0000 push
f2683c6c17a8a4d75785679d3b39cee7412867fd 7b604bdbaaf7c01d9963639555886ff64dcddf42 Administrator <admin|account-1000000@127.0.0.1> 1606239330 +0000 push
7b604bdbaaf7c01d9963639555886ff64dcddf42 de6487eeb56e5dfd5aa389e76937a3a143827467 Administrator <admin|account-1000000@127.0.0.1> 1606239347 +0000 push
de6487eeb56e5dfd5aa389e76937a3a143827467 a29f7862a1f89f520291450c16484cb036e77de8 Administrator <admin|account-1000000@127.0.0.1> 1606239413 +0000 push
a29f7862a1f89f520291450c16484cb036e77de8 ebdb79857bb71de650ff41f82bb21553f69e8dfc Administrator <admin|account-1000000@127.0.0.1> 1606239436 +0000 push
ebdb79857bb71de650ff41f82bb21553f69e8dfc 9cc6cb743da09250b826d6e6855cd23ac7e826ec Administrator <admin|account-1000000@127.0.0.1> 1606239502 +0000 push
9cc6cb743da09250b826d6e6855cd23ac7e826ec 3313477fa268096ad390a886b27ef9ceb5e2c1e1 Administrator <admin|account-1000000@127.0.0.1> 1606239531 +0000 push
3313477fa268096ad390a886b27ef9ceb5e2c1e1 2ad2f51efd759b2709b689eb6947a6a6bbf685d7 Administrator <admin|account-1000000@127.0.0.1> 1606239598 +0000 push
2ad2f51efd759b2709b689eb6947a6a6bbf685d7 c128e1bf31a1d5034d427e3faea00f61533a4d4f Administrator <admin|account-1000000@127.0.0.1> 1606239623 +0000 push
c128e1bf31a1d5034d427e3faea00f61533a4d4f d756539838f580616cea34225f984cc53571c49e Administrator <admin|account-1000000@127.0.0.1> 1606239683 +0000 push
d756539838f580616cea34225f984cc53571c49e 57faebc7496ddb2559858f6208f753c529fce657 Administrator <admin|account-1000000@127.0.0.1> 1606239728 +0000 push
The last push (d756539838f580616cea34225f984cc53571c49e => 57faebc7496ddb2559858f6208f753c529fce657) was successful, it updated the ref-log and the target branch. However, the GC removed the BLOB and now the branch and ref-log are pointing to an non-existent object.
lu...@gmail.com <lu...@gmail.com> #40
> The last push (d756539838f580616cea34225f984cc53571c49e => 57faebc7496ddb2559858f6208f753c529fce657) was successful, it updated the ref-log and the target branch. However, the GC removed the BLOB and now the branch and ref-log are pointing to an non-existent object.
After more investigation, we realised that the BLOB *hasn't been* removed, it is still there in its packfile. However, that file has no index associated and therefore isn't discoverable.
How come that a packfile can be created *without an index* OR a GC phase prevent a packfile from being indexed?
I re-created the index manually and the BLOB is reachable again and the repository isn't corrupted anymore.
After more investigation, we realised that the BLOB *hasn't been* removed, it is still there in its packfile. However, that file has no index associated and therefore isn't discoverable.
How come that a packfile can be created *without an index* OR a GC phase prevent a packfile from being indexed?
I re-created the index manually and the BLOB is reachable again and the repository isn't corrupted anymore.
lu...@gmail.com <lu...@gmail.com> #41
I found a place where GC.java may actually remove an index:
/**
* Deletes orphans
* <p>
* A file is considered an orphan if it is either a "bitmap" or an index
* file, and its corresponding pack file is missing in the list.
* </p>
*/
private void deleteOrphans() {
Path packDir = repo.getObjectDatabase().getPackDirectory().toPath();
List<String> fileNames = null;
try (Stream<Path> files = Files.list(packDir)) {
fileNames = files.map(path -> path.getFileName().toString())
.filter(name -> (name.endsWith(PACK_EXT)
|| name.endsWith(BITMAP_EXT)
|| name.endsWith(INDEX_EXT)))
.sorted(Collections.reverseOrder())
.collect(Collectors.toList());
} catch (IOException e1) {
// ignore
}
if (fileNames == null) {
return;
}
String base = null;
for (String n : fileNames) {
if (n.endsWith(PACK_EXT)) {
base = n.substring(0, n.lastIndexOf('.'));
} else {
if (base == null || !n.startsWith(base)) {
try {
Files.delete(packDir.resolve(n));
} catch (IOException e) {
LOG.error(e.getMessage(), e);
}
}
}
}
}
Note above the "name.endsWith(INDEX_EXT)" in the checks. Going to add more logging there and see if that's the case.
/**
* Deletes orphans
* <p>
* A file is considered an orphan if it is either a "bitmap" or an index
* file, and its corresponding pack file is missing in the list.
* </p>
*/
private void deleteOrphans() {
Path packDir = repo.getObjectDatabase().getPackDirectory().toPath();
List<String> fileNames = null;
try (Stream<Path> files = Files.list(packDir)) {
fileNames = files.map(path -> path.getFileName().toString())
.filter(name -> (name.endsWith(PACK_EXT)
|| name.endsWith(BITMAP_EXT)
|| name.endsWith(INDEX_EXT)))
.sorted(Collections.reverseOrder())
.collect(Collectors.toList());
} catch (IOException e1) {
// ignore
}
if (fileNames == null) {
return;
}
String base = null;
for (String n : fileNames) {
if (n.endsWith(PACK_EXT)) {
base = n.substring(0, n.lastIndexOf('.'));
} else {
if (base == null || !n.startsWith(base)) {
try {
Files.delete(packDir.resolve(n));
} catch (IOException e) {
LOG.error(e.getMessage(), e);
}
}
}
}
}
Note above the "name.endsWith(INDEX_EXT)" in the checks. Going to add more logging there and see if that's the case.
lu...@gmail.com <lu...@gmail.com> #42
There is also another possibility: the packfile-*.idx was never created. That is unlikely because the receive-pack was successful, the ref was updated, the reflog also was updated. It is impossible that the destination packfile + idx would have NOT been created.
The only rationale of the index file missing, is that it was removed afterwards.
@Matthias do you have any other possible explanations?
The only rationale of the index file missing, is that it was removed afterwards.
@Matthias do you have any other possible explanations?
lu...@gmail.com <lu...@gmail.com> #43
I have added the explicit logging, in the deleteOrphans(), of what file has been deleted. I have reproduced again the "missing object" and see below what I saw:
[2020-11-25T11:14:08.929Z] [JGit-WorkQueue] INFO org.eclipse.jgit.util.FileUtils : DELETE /Users/lucamilanesio/gerrit-3.3.0/git/test-repo.git/objects/pack/pack-809098556d7f46e05ffa408ecb14aea33e1e03fe.idx
And, no surprise, in the repository I do see now the packfile alone, without his own index file:
ls -l pack-809098556d7f46e05ffa408ecb14aea33e1e03fe*
-r--r--r-- 1 lucamilanesio staff 12575 25 Nov 11:14 pack-809098556d7f46e05ffa408ecb14aea33e1e03fe.pack
That's the final smoking gun of the root cause of the problem.
@Matthias what do you think?
[2020-11-25T11:14:08.929Z] [JGit-WorkQueue] INFO org.eclipse.jgit.util.FileUtils : DELETE /Users/lucamilanesio/gerrit-3.3.0/git/test-repo.git/objects/pack/pack-809098556d7f46e05ffa408ecb14aea33e1e03fe.idx
And, no surprise, in the repository I do see now the packfile alone, without his own index file:
ls -l pack-809098556d7f46e05ffa408ecb14aea33e1e03fe*
-r--r--r-- 1 lucamilanesio staff 12575 25 Nov 11:14 pack-809098556d7f46e05ffa408ecb14aea33e1e03fe.pack
That's the final smoking gun of the root cause of the problem.
@Matthias what do you think?
lu...@gmail.com <lu...@gmail.com> #44
Pinging here the suggestion I've received from @Matthias on how to repair the situation with git porcelain commands:
Matthias Sohn 9:57 PM
You can create the missing pack index using git index-pack [1] and list the objects in the pack using git verify-pack -v [2]
[1]https://git-scm.com/docs/git-index-pack
[2]https://git-scm.com/docs/git-verify-pack
Matthias Sohn 9:57 PM
You can create the missing pack index using git index-pack [1] and list the objects in the pack using git verify-pack -v [2]
[1]
[2]
lu...@gmail.com <lu...@gmail.com> #45
I've checked the change where the deleteOrphans() was introduced and commented there:
https://git.eclipse.org/r/c/jgit/jgit/+/89395
@Matthias feel free also to follow-up on that change.
It is unclear to me, at the moment, how that started to have an impact just now and *NOT* in stable-3.2.
@Matthias feel free also to follow-up on that change.
It is unclear to me, at the moment, how that started to have an impact just now and *NOT* in stable-3.2.
sy...@gmail.com <sy...@gmail.com> #46
@Matthias, @Luca,
Just a quick update on my side that I also managed to reproduce this missing .idx scenario.
1 - Run gatling-test against a local stable-3.3-rc6
2 - Get failures:
java.util.concurrent.ExecutionException: com.google.gerrit.exceptions.StorageException: Can't insert change/patch set for load-test
...
Caused by: com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.GitUpdateFailureException: Update failed: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 6187448f478d1ca27986f6c7ea77dd97a62e2e2a refs/changes/98/198/1 (REJECTED_MISSING_OBJECT)
CREATE: 0000000000000000000000000000000000000000 238e09c2d220d33df96e945e37f9871f96fea207 refs/changes/98/198/meta (REJECTED_OTHER_REASON: transaction aborted)
]
3 - Confirmed I have a missing `pack-<>.idx` file:
➜ pack git:(master) for i in `find . -name "*.pack" | cut -d'-' -f2 | cut -d '.' -f1`; do ls pack-$i.idx > /dev/null; done;
ls: pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.idx: No such file or directory
Hope it helps
Just a quick update on my side that I also managed to reproduce this missing .idx scenario.
1 - Run gatling-test against a local stable-3.3-rc6
2 - Get failures:
java.util.concurrent.ExecutionException: com.google.gerrit.exceptions.StorageException: Can't insert change/patch set for load-test
...
Caused by: com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.GitUpdateFailureException: Update failed: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 6187448f478d1ca27986f6c7ea77dd97a62e2e2a refs/changes/98/198/1 (REJECTED_MISSING_OBJECT)
CREATE: 0000000000000000000000000000000000000000 238e09c2d220d33df96e945e37f9871f96fea207 refs/changes/98/198/meta (REJECTED_OTHER_REASON: transaction aborted)
]
3 - Confirmed I have a missing `pack-<>.idx` file:
➜ pack git:(master) for i in `find . -name "*.pack" | cut -d'-' -f2 | cut -d '.' -f1`; do ls pack-$i.idx > /dev/null; done;
ls: pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.idx: No such file or directory
Hope it helps
lu...@gmail.com <lu...@gmail.com> #47
@Tony you can now try to re-create the index and display its content, and you should see the missing object 6187448f478d1ca27986f6c7ea77dd97a62e2e2a .
fa...@gmail.com <fa...@gmail.com> #48
[Empty comment from Monorail migration]
sy...@gmail.com <sy...@gmail.com> #49
@Luca, I can confirm running index-pack allows object 6187448f478d1ca27986f6c7ea77dd97a62e2e2a to be visible and reachable again
➜ pack git:(master) git index-pack -v pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.pack
Indexing objects: 100% (6/6), done.
747f5ddb392953266e488c707bfdc18ad1150dc5
➜ pack git:(master) ls pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.idx
pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.idx
➜ pack git:(master) git verify-pack -v pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.pack
6187448f478d1ca27986f6c7ea77dd97a62e2e2a commit 339 210 12
067d3c7cb5c300c83bc2bc8acb11617bdb84567c tree 608 557 222
55fdf43c72c8fcce54220a187e744160ee8d35f7 blob 6263 4742 779
036cc076b547b418f19b700d3129d53d6f87a544 blob 7012 5307 5521
da4b1aac0c78b4be6c8a874a6cd072c966ef9f19 blob 6469 4897 10828
bb65799e67b5c489a7a4d2fd608616d9f3808706 blob 4149 3153 15725
non delta: 6 objects
pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.pack: ok
➜ pack git:(master) git index-pack -v pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.pack
Indexing objects: 100% (6/6), done.
747f5ddb392953266e488c707bfdc18ad1150dc5
➜ pack git:(master) ls pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.idx
pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.idx
➜ pack git:(master) git verify-pack -v pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.pack
6187448f478d1ca27986f6c7ea77dd97a62e2e2a commit 339 210 12
067d3c7cb5c300c83bc2bc8acb11617bdb84567c tree 608 557 222
55fdf43c72c8fcce54220a187e744160ee8d35f7 blob 6263 4742 779
036cc076b547b418f19b700d3129d53d6f87a544 blob 7012 5307 5521
da4b1aac0c78b4be6c8a874a6cd072c966ef9f19 blob 6469 4897 10828
bb65799e67b5c489a7a4d2fd608616d9f3808706 blob 4149 3153 15725
non delta: 6 objects
pack-acb5b64c72b8067a35134e3dab7388c639b44c2e.pack: ok
lu...@gmail.com <lu...@gmail.com> #50
Sounds good, I believe we have *finally* the root cause, isn't it?
ma...@gmail.com <ma...@gmail.com> #51
here is my analysis of the root cause in GC.deleteOrphans
When ReceivePack receives a pack file it does the following steps:
all files are located in "objects/pack/"
file endings relevant in "pack" dir:
.pack
.keep
.idx
.bitmap
ReceivePack.service():
receivePackAndCheckConnectivity():
receivePack():
receive the pack
parse the pack, returns packLock (.keep file)
PackInserter.flush():
write tmpPck file: "insert_<random>.pack"
write tmpIdx file: "insert_<random>.idx"
real pack name: "pack-<SHA1>.pack"
real index name: "pack-<SHA1>.idx"
atomic rename tmpPack to realPack
atomic rename tmpIdx to tmpIdx
execute commands
unlockPack() (removes .keep file)
autoGc() (triggers concurrent gc if autogc is on)
When PackInserter.flush() renames the temporary pack to the final pack-xxx.pack file
the temporary pack index file "insert_xxx.idx" has no matching .pack file with the same
base name for a short period of time.
If GC.deleteOrphans runs at exactly that time it will deduce the pack index file is orphaned and delete it.
Please review the fix:https://git.eclipse.org/r/c/jgit/jgit/+/172829
When ReceivePack receives a pack file it does the following steps:
all files are located in "objects/pack/"
file endings relevant in "pack" dir:
.pack
.keep
.idx
.bitmap
ReceivePack.service():
receivePackAndCheckConnectivity():
receivePack():
receive the pack
parse the pack, returns packLock (.keep file)
PackInserter.flush():
write tmpPck file: "insert_<random>.pack"
write tmpIdx file: "insert_<random>.idx"
real pack name: "pack-<SHA1>.pack"
real index name: "pack-<SHA1>.idx"
atomic rename tmpPack to realPack
atomic rename tmpIdx to tmpIdx
execute commands
unlockPack() (removes .keep file)
autoGc() (triggers concurrent gc if autogc is on)
When PackInserter.flush() renames the temporary pack to the final pack-xxx.pack file
the temporary pack index file "insert_xxx.idx" has no matching .pack file with the same
base name for a short period of time.
If GC.deleteOrphans runs at exactly that time it will deduce the pack index file is orphaned and delete it.
Please review the fix:
sy...@gmail.com <sy...@gmail.com> #52
I have added some log lines to jgit in order to dump the list of files that exist on disk before the deletion of the orphans happen,
Basically dumping the list of:
fileNames = files.map(path -> path.getFileName().toString())
.filter(name -> (name.endsWith(PACK_EXT)
|| name.endsWith(BITMAP_EXT)
|| name.endsWith(INDEX_EXT)))
I can confirm that the .idx files that gets deleted does not have a matching .pack file, so it is *momentarily* an orphan.
That’s why it gets deleted.
Somehow the .idx file gets written and flushed to disk _before_ its associated .pack file.
If I understand @Matthias explanation, that’s because the .pack file is actually written as .keep file first and then atomically renamed,
So it might happen that the GC checks happens after the .idx has been created and _before_ the renaming.
Do I understand this correctly?
Basically dumping the list of:
fileNames = files.map(path -> path.getFileName().toString())
.filter(name -> (name.endsWith(PACK_EXT)
|| name.endsWith(BITMAP_EXT)
|| name.endsWith(INDEX_EXT)))
I can confirm that the .idx files that gets deleted does not have a matching .pack file, so it is *momentarily* an orphan.
That’s why it gets deleted.
Somehow the .idx file gets written and flushed to disk _before_ its associated .pack file.
If I understand @Matthias explanation, that’s because the .pack file is actually written as .keep file first and then atomically renamed,
So it might happen that the GC checks happens after the .idx has been created and _before_ the renaming.
Do I understand this correctly?
da...@gmail.com <da...@gmail.com> #53
[Empty comment from Monorail migration]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #54
The following revision refers to this bug:
https://gerrit.googlesource.com/gerrit.git/+/401e01fd0605689129a159dc8188c0ebf91cdf31
commit 401e01fd0605689129a159dc8188c0ebf91cdf31
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Thu Nov 26 22:57:30 2020
Upgrade JGit to 5.3.8.202011260953-r
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Release Notes:
https://projects.eclipse.org/projects/technology.jgit/releases/5.1.14
Bug:https://crbug.com/gerrit/13544
Change-Id: I81272f4cac9923b63b0966bcf227325efbf7d0e9
[modify]https://gerrit.googlesource.com/gerrit.git/+/401e01fd0605689129a159dc8188c0ebf91cdf31/lib/jgit/jgit.bzl
commit 401e01fd0605689129a159dc8188c0ebf91cdf31
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Thu Nov 26 22:57:30 2020
Upgrade JGit to 5.3.8.202011260953-r
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Release Notes:
Bug:
Change-Id: I81272f4cac9923b63b0966bcf227325efbf7d0e9
[modify]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #55
The following revision refers to this bug:
https://gerrit.googlesource.com/gerrit.git/+/1dea2aa41583829cf2668ea58695710201c52941
commit 1dea2aa41583829cf2668ea58695710201c52941
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Fri Nov 27 11:41:50 2020
Upgrade JGit to 5cd485e5
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Bug:https://crbug.com/gerrit/13544
Change-Id: I656ba9b6a718857742fb4744a32dac8db654fa07
[modify]https://gerrit.googlesource.com/gerrit.git/+/1dea2aa41583829cf2668ea58695710201c52941/modules/jgit
commit 1dea2aa41583829cf2668ea58695710201c52941
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Fri Nov 27 11:41:50 2020
Upgrade JGit to 5cd485e5
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Bug:
Change-Id: I656ba9b6a718857742fb4744a32dac8db654fa07
[modify]
ma...@gmail.com <ma...@gmail.com> #56
[Empty comment from Monorail migration]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #57
The following revision refers to this bug:
https://gerrit.googlesource.com/gerrit.git/+/ac1f09139cd665e29f4d4b2a50e3004c76f92eee
commit ac1f09139cd665e29f4d4b2a50e3004c76f92eee
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Thu Nov 26 12:09:13 2020
Upgrade JGit to 5.1.14.202011251942-r
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Release Notes:
https://projects.eclipse.org/projects/technology.jgit/releases/5.1.14
Bug:https://crbug.com/gerrit/13544
Change-Id: Ieeb5a883bcb487a4d45f299aec5b31475002cdd3
[modify]https://gerrit.googlesource.com/gerrit.git/+/ac1f09139cd665e29f4d4b2a50e3004c76f92eee/lib/jgit/jgit.bzl
commit ac1f09139cd665e29f4d4b2a50e3004c76f92eee
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Thu Nov 26 12:09:13 2020
Upgrade JGit to 5.1.14.202011251942-r
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Release Notes:
Bug:
Change-Id: Ieeb5a883bcb487a4d45f299aec5b31475002cdd3
[modify]
ma...@gmail.com <ma...@gmail.com> #58
[Empty comment from Monorail migration]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #59
The following revision refers to this bug:
https://gerrit.googlesource.com/gerrit.git/+/ac1f09139cd665e29f4d4b2a50e3004c76f92eee
commit ac1f09139cd665e29f4d4b2a50e3004c76f92eee
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Thu Nov 26 12:09:13 2020
Upgrade JGit to 5.1.14.202011251942-r
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Release Notes:
https://projects.eclipse.org/projects/technology.jgit/releases/5.1.14
Bug:https://crbug.com/gerrit/13544
Change-Id: Ieeb5a883bcb487a4d45f299aec5b31475002cdd3
[modify]https://gerrit.googlesource.com/gerrit.git/+/ac1f09139cd665e29f4d4b2a50e3004c76f92eee/lib/jgit/jgit.bzl
commit ac1f09139cd665e29f4d4b2a50e3004c76f92eee
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Thu Nov 26 12:09:13 2020
Upgrade JGit to 5.1.14.202011251942-r
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Release Notes:
Bug:
Change-Id: Ieeb5a883bcb487a4d45f299aec5b31475002cdd3
[modify]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #60
The following revision refers to this bug:
https://gerrit.googlesource.com/gerrit.git/+/ac1f09139cd665e29f4d4b2a50e3004c76f92eee
commit ac1f09139cd665e29f4d4b2a50e3004c76f92eee
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Thu Nov 26 12:09:13 2020
Upgrade JGit to 5.1.14.202011251942-r
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Release Notes:
https://projects.eclipse.org/projects/technology.jgit/releases/5.1.14
Bug:https://crbug.com/gerrit/13544
Change-Id: Ieeb5a883bcb487a4d45f299aec5b31475002cdd3
[modify]https://gerrit.googlesource.com/gerrit.git/+/ac1f09139cd665e29f4d4b2a50e3004c76f92eee/lib/jgit/jgit.bzl
commit ac1f09139cd665e29f4d4b2a50e3004c76f92eee
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Thu Nov 26 12:09:13 2020
Upgrade JGit to 5.1.14.202011251942-r
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Release Notes:
Bug:
Change-Id: Ieeb5a883bcb487a4d45f299aec5b31475002cdd3
[modify]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #61
The following revision refers to this bug:
https://gerrit.googlesource.com/gerrit.git/+/401e01fd0605689129a159dc8188c0ebf91cdf31
commit 401e01fd0605689129a159dc8188c0ebf91cdf31
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Thu Nov 26 22:57:30 2020
Upgrade JGit to 5.3.8.202011260953-r
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Release Notes:
https://projects.eclipse.org/projects/technology.jgit/releases/5.1.14
Bug:https://crbug.com/gerrit/13544
Change-Id: I81272f4cac9923b63b0966bcf227325efbf7d0e9
[modify]https://gerrit.googlesource.com/gerrit.git/+/401e01fd0605689129a159dc8188c0ebf91cdf31/lib/jgit/jgit.bzl
commit 401e01fd0605689129a159dc8188c0ebf91cdf31
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Thu Nov 26 22:57:30 2020
Upgrade JGit to 5.3.8.202011260953-r
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Release Notes:
Bug:
Change-Id: I81272f4cac9923b63b0966bcf227325efbf7d0e9
[modify]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #62
The following revision refers to this bug:
https://gerrit.googlesource.com/gerrit.git/+/a60b03950903bc7a644e86225102c9e4c3972ede
commit a60b03950903bc7a644e86225102c9e4c3972ede
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Fri Nov 27 11:23:12 2020
Upgrade JGit to 5cd485e5
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Bug:https://crbug.com/gerrit/13544
Change-Id: I5a86a2bf615e6defe7b64119155affc988b96218
[modify]https://gerrit.googlesource.com/gerrit.git/+/a60b03950903bc7a644e86225102c9e4c3972ede/modules/jgit
commit a60b03950903bc7a644e86225102c9e4c3972ede
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Fri Nov 27 11:23:12 2020
Upgrade JGit to 5cd485e5
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Bug:
Change-Id: I5a86a2bf615e6defe7b64119155affc988b96218
[modify]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #63
The following revision refers to this bug:
https://gerrit.googlesource.com/gerrit.git/+/3deba8778f9bb416e36a1e769a07b054530d56cf
commit 3deba8778f9bb416e36a1e769a07b054530d56cf
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Fri Nov 27 11:35:14 2020
Upgrade JGit to ad902087
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Bug:https://crbug.com/gerrit/13544
Change-Id: I7266f7b0c164826140726b939a647489902633b9
[modify]https://gerrit.googlesource.com/gerrit.git/+/3deba8778f9bb416e36a1e769a07b054530d56cf/modules/jgit
commit 3deba8778f9bb416e36a1e769a07b054530d56cf
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Fri Nov 27 11:35:14 2020
Upgrade JGit to ad902087
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Bug:
Change-Id: I7266f7b0c164826140726b939a647489902633b9
[modify]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #64
The following revision refers to this bug:
https://gerrit.googlesource.com/gerrit.git/+/e78e7154a1c2c14954cbc1adbb583815385c1eaa
commit e78e7154a1c2c14954cbc1adbb583815385c1eaa
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Fri Nov 27 11:37:32 2020
Upgrade JGit to d1801402
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Bug:https://crbug.com/gerrit/13544
Change-Id: Ib543e9b8f094b8065e9ba4174953ab52be6eb43e
[modify]https://gerrit.googlesource.com/gerrit.git/+/e78e7154a1c2c14954cbc1adbb583815385c1eaa/modules/jgit
commit e78e7154a1c2c14954cbc1adbb583815385c1eaa
Author: Matthias Sohn <matthias.sohn@sap.com>
Date: Fri Nov 27 11:37:32 2020
Upgrade JGit to d1801402
This version contains the following fix:
Ensure that GC#deleteOrphans respects pack lock
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
Bug:
Change-Id: Ib543e9b8f094b8065e9ba4174953ab52be6eb43e
[modify]
Description
*************************************************************************
*** !!!! THIS BUG TRACKER IS FOR GERRIT CODE REVIEW !!!!
*** Do not submit bugs for chrome/android and issues with your company's
*** Gerrit setup here. Those issues belong in different issue trackers.
*************************************************************************
Affected Version: stable-3.3, master
What steps will reproduce the problem?
1. Run a gerrit instance: docker run -ti -p 8080:8080 -p 29418:29418 gerritcodereview/gerrit
2. points sbt-gatling-test to it for 10 minutes [1]
What is the expected output?
All tests pass
What do you see instead?
Around 1000 changes gerrit starts throwing issues similar to
ssh://admin@localhost:29418/load-test: internal server error - org.eclipse.jgit.errors.RemoteRepositoryException: ssh://admin@localhost:29418:29418/load-test: internal server error
want eb0630607d2719a48b86a1956657dd7c032a7977 not valid - org.eclipse.jgit.errors.TransportException: want eb0630607d2719a48b86a1956657dd7c032a7977 not valid
want 3bd884b554ee26875621b26d52c1ff74c18b235e not valid - org.eclipse.jgit.errors.TransportException: want 3bd884b554ee26875621b26d52c1ff74c18b235e not valid
want a307972b5af19eb8a028e583004306326efe6b28 not valid - org.eclipse.jgit.errors.TransportException: want a307972b5af19eb8a028e583004306326efe6b28 not valid
want 94c9248fb9810457160301f15b9bffd545fad2e1 not valid - org.eclipse.jgit.errors.TransportException: want 94c9248fb9810457160301f15b9bffd545fad2e1 not valid
Please provide any additional information below.
Deployed gerrit straight from sources as such:
java -jar bazel-bin/release.war init -d /tmp/gerrit-3.3 --install-all-plugins --dev --batch --no-auto-start
Run gatling as:
make JOBS=2 parallel-run
No jgit.config.
[1]https://github.com/GerritForge/gatling-sbt-gerrit-test