Fixed
Status Update
Comments
sv...@axis.com <sv...@axis.com> #2
"The method `com.google.gerrit.server.submit.RebaseSorter.isAlreadyMerged(RebaseSorter.java:117)` checks for each incoming commit whether it is already merged into a branch. "
This isn't true, the check isn't performed for each incoming commit, it only performs an extra check for those commits that have been identified as a missing dependency to catch the cases where the commit is in fact already merged into a branch.
"I have prepared a workaround [2], that specifically checks whether the change series that is to be submitted contains a merge and only then runs isAlreadyMerged(). This should fix the issue for us."
IIUC this shouldn't fix the issue for you. As you stated you don't submit changes that are merges so if this was the only fix in your case it would just skip the extra check and the submit would fail with "Missing Dependecy".
The check for toSortContainsMerge is simply a fail-fast since we cannot recover from this situation so it doesn't make sense to do the extra check.
What actually should help you is the check before that "isMergedInBranchAsSubmittedChange" [1] that does a quick index lookup, this is were the commits that have slipped through in your case would be caught.
[1]https://gerrit-review.googlesource.com/c/gerrit/+/364134/13/java/com/google/gerrit/server/submit/RebaseSorter.java#104
This isn't true, the check isn't performed for each incoming commit, it only performs an extra check for those commits that have been identified as a missing dependency to catch the cases where the commit is in fact already merged into a branch.
"I have prepared a workaround [2], that specifically checks whether the change series that is to be submitted contains a merge and only then runs isAlreadyMerged(). This should fix the issue for us."
IIUC this shouldn't fix the issue for you. As you stated you don't submit changes that are merges so if this was the only fix in your case it would just skip the extra check and the submit would fail with "Missing Dependecy".
The check for toSortContainsMerge is simply a fail-fast since we cannot recover from this situation so it doesn't make sense to do the extra check.
What actually should help you is the check before that "isMergedInBranchAsSubmittedChange" [1] that does a quick index lookup, this is were the commits that have slipped through in your case would be caught.
[1]
th...@gmail.com <th...@gmail.com> #3
> This isn't true, the check isn't performed for each incoming commit, it only performs an extra check for those commits that have been identified as a missing dependency to catch the cases where the commit is in fact already merged into a branch.
True that was not well phrased from my side. I mention these checks later. These checks should have evaluated to false, i.e. the commit being looked at is not a missing dependency. In some cases, we see however that these conditions wrongly identified the commit as a missing dependency. In case of the mentioned debug session, where I was able to reproduce it, it was the latest patchset of the change that was the tip of the change series. This commit was clearly mergeable and also an "incoming" commit. And the checks correctly evaluated the commit not to be a missing dependency seconds later during the debug session, when triggering the debugger to rerun them.
> The check for toSortContainsMerge is simply a fail-fast since we cannot recover from this situation so it doesn't make sense to do the extra check.
Yes, you are right. But that would still improve the situation by quite a lot, since the use gets faster feedback that something went wrong and will not block threads by retrying the submit that will eventually fail, since the original submit eventually succeeded.
> What actually should help you is the check before that "isMergedInBranchAsSubmittedChange" [1] that does a quick index lookup, this is were the commits that have slipped through in your case would be caught.
I don't get this. "isMergedInBranchAsSubmittedChange" finds commits that are already submitted as changes into the target branch of the current submit, i.e. if the commit that falsely ended up in this check, i.e. it is the latest patchset of the change to be submitted, is used for this check, the `byBranchCommit()` query will return a list containing the change to be submitted. The check will then check whether the change was merged, which is not the case, and will return false and continue with the `(toSortContainsMerge && isAlreadyMergedInAnyBranch(c))` condition.
True that was not well phrased from my side. I mention these checks later. These checks should have evaluated to false, i.e. the commit being looked at is not a missing dependency. In some cases, we see however that these conditions wrongly identified the commit as a missing dependency. In case of the mentioned debug session, where I was able to reproduce it, it was the latest patchset of the change that was the tip of the change series. This commit was clearly mergeable and also an "incoming" commit. And the checks correctly evaluated the commit not to be a missing dependency seconds later during the debug session, when triggering the debugger to rerun them.
> The check for toSortContainsMerge is simply a fail-fast since we cannot recover from this situation so it doesn't make sense to do the extra check.
Yes, you are right. But that would still improve the situation by quite a lot, since the use gets faster feedback that something went wrong and will not block threads by retrying the submit that will eventually fail, since the original submit eventually succeeded.
> What actually should help you is the check before that "isMergedInBranchAsSubmittedChange" [1] that does a quick index lookup, this is were the commits that have slipped through in your case would be caught.
I don't get this. "isMergedInBranchAsSubmittedChange" finds commits that are already submitted as changes into the target branch of the current submit, i.e. if the commit that falsely ended up in this check, i.e. it is the latest patchset of the change to be submitted, is used for this check, the `byBranchCommit()` query will return a list containing the change to be submitted. The check will then check whether the change was merged, which is not the case, and will return false and continue with the `(toSortContainsMerge && isAlreadyMergedInAnyBranch(c))` condition.
sv...@axis.com <sv...@axis.com> #4
My bad, I thought that the incoming commits would never end up in that code-path but:
if (!c.has(canMergeFlag) || !incoming.contains(c)) {
Shouldn't this really be (if canMergeFlag is missing and it's not in "incoming")?:
if (!c.has(canMergeFlag) && !incoming.contains(c)) {
Why would we consider a commit that we want to submit a missing dependency?
If I'm correct this would have worked because the order of `sort` is correct in most cases (so that parent commits are checked first).
For your corner-cases it might be that they were caused by `sort` not being sorted like that so that a child was checked before it's parent in which case the parent, event though it was part of what we wanted to submit, would have been considered a missing-dependency.
I'm not quite clear how the submits in your case could eventually pass if the commit wasn't merged into another branch.
@Edwin, could you shed some light as to why OR is used?
https://gerrit-review.googlesource.com/c/gerrit/+/37780/10/gerrit-server/src/main/java/com/google/gerrit/server/git/RebaseSorter.java
if (!c.has(canMergeFlag) || !incoming.contains(c)) {
Shouldn't this really be (if canMergeFlag is missing and it's not in "incoming")?:
if (!c.has(canMergeFlag) && !incoming.contains(c)) {
Why would we consider a commit that we want to submit a missing dependency?
If I'm correct this would have worked because the order of `sort` is correct in most cases (so that parent commits are checked first).
For your corner-cases it might be that they were caused by `sort` not being sorted like that so that a child was checked before it's parent in which case the parent, event though it was part of what we wanted to submit, would have been considered a missing-dependency.
I'm not quite clear how the submits in your case could eventually pass if the commit wasn't merged into another branch.
@Edwin, could you shed some light as to why OR is used?
sv...@axis.com <sv...@axis.com> #5
W.r.t. toSortContainsMerge, I think I phrased things unclearly. To clarify:
I think it's good to fail fast so it's a good optimization.
By "since we cannot recover from this situation so it doesn't make sense to do the extra check" I was referring to how it works, I wasn't trying to say that the `toSortContainsMerge` check doesn't make sense.
I think it's good to fail fast so it's a good optimization.
By "since we cannot recover from this situation so it doesn't make sense to do the extra check" I was referring to how it works, I wasn't trying to say that the `toSortContainsMerge` check doesn't make sense.
th...@gmail.com <th...@gmail.com> #6
> I'm not quite clear how the submits in your case could eventually pass if the commit wasn't merged into another branch.
Yes, this is indeed strange (why didn't I see this?). But I might have an explanation. I rechecked the thread dumps and logs and found that actually 5 requests were sent to submit the change for the case that I had thread dumps for. The thread dumps captured the last three requests and all were spending time in isAlreadyMerged(). In this case the submit actually didn't go through. I originally assumed that was the case, because the user who did inform me, said that he did cancel the request eventually. It was not an ideal situation, but so far the affected users did not alert me again in time for me to collect more data.
I will now collect more data from the logs to create a timeline for this case to get a better idea of what happened and will report the data here. This seems to be rather a race between multiple submit requests on the same change.
Yes, this is indeed strange (why didn't I see this?). But I might have an explanation. I rechecked the thread dumps and logs and found that actually 5 requests were sent to submit the change for the case that I had thread dumps for. The thread dumps captured the last three requests and all were spending time in isAlreadyMerged(). In this case the submit actually didn't go through. I originally assumed that was the case, because the user who did inform me, said that he did cancel the request eventually. It was not an ideal situation, but so far the affected users did not alert me again in time for me to collect more data.
I will now collect more data from the logs to create a timeline for this case to get a better idea of what happened and will report the data here. This seems to be rather a race between multiple submit requests on the same change.
sv...@axis.com <sv...@axis.com> #7
Am I understanding correctly that there were multiple submit-requests an one of them was successful?
Then my theory might hold up. It could be that `sort` wasn't sorted "correctly" for the requests that you got metrics from but was sorted "correctly" in another request that succeeded.
Then my theory might hold up. It could be that `sort` wasn't sorted "correctly" for the requests that you got metrics from but was sorted "correctly" in another request that succeeded.
th...@gmail.com <th...@gmail.com> #8
In the described case the submit was unfortunately not successful. Here is the data about all 5 submit requests for that change:
Submit Request 1:
startTime: 13:56:40
gatewayTimeout: 14:01:40
stopTime: 14:19:06
protocol: HTTP
statusCode: 503
error: LockFailureException [1]
user: User1
latency: 1346005 ms (~22 min)
memAlloc: 28.74 Gb
Submit Request 2:
startTime: 13:58:15
gatewayTimeout: 14:03:15
stopTime: 14:20:10
protocol: HTTP
statusCode: 503
error: LockFailureException [2]
user: User2
latency: 1315772 ms (~22 min)
memAlloc: 27.59 Gb
Submit Request 3:
startTime: 14:43:50
gatewayTimeout: 14:48:50
stopTime: 14:48:50
protocol: SSH
statusCode: killed
error:
user: User2
latency: 299996 ms (~5 min)
memAlloc: -148.00 Gb
Submit Request 4:
startTime: 14:51:51
gatewayTimeout: 14:56:51
stopTime: 14:56:51
protocol: SSH
statusCode: killed
error:
user: User2
latency: 299999 ms (~5 min)
memAlloc: -125.46 Gb
Submit Request 5:
startTime: 14:53:43
gatewayTimeout: 14:58:43
stopTime: 15:14:08
protocol: HTTP
statusCode: 503
error: LockFailureException [3]
user: User1
latency: 1224896 ms (~20 min)
memAlloc: 28.74 Gb
I did some thread dumps, after the user informed me. This captured SubmitRequests 3-5, which were all working on `isAlreadyMerged()`. These are far from ideal data and I asked the reporting users to add tracing to future submits and inform me. But so far I did not get a response on that, although I still see long running submits in the logs (mostly those submits take about 6 min, which is already very long, but there are also submits taking 20+ min as this one. A very similar occurance was
[1]
[2023-03-14T14:19:06.496+01:00] [HTTP POST /changes/project~1385603/revisions/1/submit (User1 from 1.1.1.1)] ERROR com.google.gerrit.httpd.restapi.RestApiServlet : Error in POST /changes/project~1385603/revisions/1/submit [CONTEXT project="project" request="REST /changes/*/revisions/*/submit" ]
com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 6de95694fb9bb301d8fffb1024dfb8cbbfd38389 refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 0e6fca9f7d1bc3899115febbe427813ed1593d20 6de95694fb9bb301d8fffb1024dfb8cbbfd38389 refs/heads/llvm (LOCK_FAILURE)
UPDATE: 3bb0885dc3e51df4355909310b371ee7bcd5ffd7 a754cc410f6d8212ab19e431688c882cac9e004a refs/changes/03/1385603/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.server.update.BatchUpdate.wrapAndThrowException(BatchUpdate.java:235)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:178)
at com.google.gerrit.server.update.SubmissionExecutor.execute(SubmissionExecutor.java:61)
at com.google.gerrit.server.submit.MergeOp.integrateIntoHistory(MergeOp.java:667)
at com.google.gerrit.server.submit.MergeOp.lambda$merge$1(MergeOp.java:535)
at com.google.gerrit.server.update.RetryableChangeAction.lambda$new$0(RetryableChangeAction.java:48)
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at com.github.rholder.retry.Retryer.call(Retryer.java:160)
at com.google.gerrit.server.update.RetryHelper.executeWithTimeoutCount(RetryHelper.java:564)
at com.google.gerrit.server.update.RetryHelper.execute(RetryHelper.java:507)
at com.google.gerrit.server.update.RetryableAction.call(RetryableAction.java:173)
at com.google.gerrit.server.update.RetryableChangeAction.call(RetryableChangeAction.java:84)
at com.google.gerrit.server.submit.MergeOp.merge(MergeOp.java:547)
at com.google.gerrit.server.restapi.change.Submit.mergeChange(Submit.java:214)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:191)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:88)
....
Caused by: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 6de95694fb9bb301d8fffb1024dfb8cbbfd38389 refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 0e6fca9f7d1bc3899115febbe427813ed1593d20 6de95694fb9bb301d8fffb1024dfb8cbbfd38389 refs/heads/llvm (LOCK_FAILURE)
UPDATE: 3bb0885dc3e51df4355909310b371ee7bcd5ffd7 a754cc410f6d8212ab19e431688c882cac9e004a refs/changes/03/1385603/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.git.RefUpdateUtil.checkResults(RefUpdateUtil.java:100)
at com.google.gerrit.git.RefUpdateUtil.executeChecked(RefUpdateUtil.java:57)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:386)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:336)
at com.google.gerrit.server.update.BatchUpdate$ChangesHandle.execute(BatchUpdate.java:614)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:150)
... 90 more
[2]
[2023-03-14T14:20:10.910+01:00] [HTTP POST /changes/project~1385603/revisions/1/submit (User2 from 1.1.1.1)] ERROR com.google.gerrit.httpd.restapi.RestApiServlet : Error in POST /changes/project~1385603/revisions/1/submit [CONTEXT project="project" request="REST /changes/*/revisions/*/submit" ]
com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 7356ce4f8a42d05a2f2891ecc21ce437b557855b refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 0e6fca9f7d1bc3899115febbe427813ed1593d20 7356ce4f8a42d05a2f2891ecc21ce437b557855b refs/heads/llvm (LOCK_FAILURE)
UPDATE: 3bb0885dc3e51df4355909310b371ee7bcd5ffd7 c269ec4b48312258ff162a60ad2c235810bdb8e4 refs/changes/03/1385603/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.server.update.BatchUpdate.wrapAndThrowException(BatchUpdate.java:235)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:178)
at com.google.gerrit.server.update.SubmissionExecutor.execute(SubmissionExecutor.java:61)
at com.google.gerrit.server.submit.MergeOp.integrateIntoHistory(MergeOp.java:667)
at com.google.gerrit.server.submit.MergeOp.lambda$merge$1(MergeOp.java:535)
at com.google.gerrit.server.update.RetryableChangeAction.lambda$new$0(RetryableChangeAction.java:48)
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at com.github.rholder.retry.Retryer.call(Retryer.java:160)
at com.google.gerrit.server.update.RetryHelper.executeWithTimeoutCount(RetryHelper.java:564)
at com.google.gerrit.server.update.RetryHelper.execute(RetryHelper.java:507)
at com.google.gerrit.server.update.RetryableAction.call(RetryableAction.java:173)
at com.google.gerrit.server.update.RetryableChangeAction.call(RetryableChangeAction.java:84)
at com.google.gerrit.server.submit.MergeOp.merge(MergeOp.java:547)
at com.google.gerrit.server.restapi.change.Submit.mergeChange(Submit.java:214)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:191)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:88)
...
Caused by: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 7356ce4f8a42d05a2f2891ecc21ce437b557855b refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 0e6fca9f7d1bc3899115febbe427813ed1593d20 7356ce4f8a42d05a2f2891ecc21ce437b557855b refs/heads/llvm (LOCK_FAILURE)
UPDATE: 3bb0885dc3e51df4355909310b371ee7bcd5ffd7 c269ec4b48312258ff162a60ad2c235810bdb8e4 refs/changes/03/1385603/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.git.RefUpdateUtil.checkResults(RefUpdateUtil.java:100)
at com.google.gerrit.git.RefUpdateUtil.executeChecked(RefUpdateUtil.java:57)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:386)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:336)
at com.google.gerrit.server.update.BatchUpdate$ChangesHandle.execute(BatchUpdate.java:614)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:150)
... 90 more
[3]
[2023-03-14T15:14:08.182+01:00] [HTTP POST /changes/project~1385603/revisions/1/submit (User1 from 1.1.1.1)] ERROR com.google.gerrit.httpd.restapi.RestApiServlet : Error in POST /changes/project~1385603/revisions/1/submit [CONTEXT project="project" request="REST /changes/*/revisions/*/submit" ]
com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 fe1938a609b3d909c11a0dd26702a03bfb12ce2f refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 3acb4a36a56d0936125616ec8169381eaded320d fe1938a609b3d909c11a0dd26702a03bfb12ce2f refs/heads/llvm (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: f2ae8430e5842697c9a7cb68f84af97fed61094b 72dbdbe53da77a63b7f3ca27fcc7d38a8c772fff refs/changes/03/1385603/meta (LOCK_FAILURE)
]
at com.google.gerrit.server.update.BatchUpdate.wrapAndThrowException(BatchUpdate.java:235)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:178)
at com.google.gerrit.server.update.SubmissionExecutor.execute(SubmissionExecutor.java:61)
at com.google.gerrit.server.submit.MergeOp.integrateIntoHistory(MergeOp.java:667)
at com.google.gerrit.server.submit.MergeOp.lambda$merge$1(MergeOp.java:535)
at com.google.gerrit.server.update.RetryableChangeAction.lambda$new$0(RetryableChangeAction.java:48)
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at com.github.rholder.retry.Retryer.call(Retryer.java:160)
at com.google.gerrit.server.update.RetryHelper.executeWithTimeoutCount(RetryHelper.java:564)
at com.google.gerrit.server.update.RetryHelper.execute(RetryHelper.java:507)
at com.google.gerrit.server.update.RetryableAction.call(RetryableAction.java:173)
at com.google.gerrit.server.update.RetryableChangeAction.call(RetryableChangeAction.java:84)
at com.google.gerrit.server.submit.MergeOp.merge(MergeOp.java:547)
at com.google.gerrit.server.restapi.change.Submit.mergeChange(Submit.java:214)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:191)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:88)
Caused by: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 fe1938a609b3d909c11a0dd26702a03bfb12ce2f refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 3acb4a36a56d0936125616ec8169381eaded320d fe1938a609b3d909c11a0dd26702a03bfb12ce2f refs/heads/llvm (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: f2ae8430e5842697c9a7cb68f84af97fed61094b 72dbdbe53da77a63b7f3ca27fcc7d38a8c772fff refs/changes/03/1385603/meta (LOCK_FAILURE)
]
at com.google.gerrit.git.RefUpdateUtil.checkResults(RefUpdateUtil.java:100)
at com.google.gerrit.git.RefUpdateUtil.executeChecked(RefUpdateUtil.java:57)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:386)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:336)
at com.google.gerrit.server.update.BatchUpdate$ChangesHandle.execute(BatchUpdate.java:614)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:150)
... 90 more
Submit Request 1:
startTime: 13:56:40
gatewayTimeout: 14:01:40
stopTime: 14:19:06
protocol: HTTP
statusCode: 503
error: LockFailureException [1]
user: User1
latency: 1346005 ms (~22 min)
memAlloc: 28.74 Gb
Submit Request 2:
startTime: 13:58:15
gatewayTimeout: 14:03:15
stopTime: 14:20:10
protocol: HTTP
statusCode: 503
error: LockFailureException [2]
user: User2
latency: 1315772 ms (~22 min)
memAlloc: 27.59 Gb
Submit Request 3:
startTime: 14:43:50
gatewayTimeout: 14:48:50
stopTime: 14:48:50
protocol: SSH
statusCode: killed
error:
user: User2
latency: 299996 ms (~5 min)
memAlloc: -148.00 Gb
Submit Request 4:
startTime: 14:51:51
gatewayTimeout: 14:56:51
stopTime: 14:56:51
protocol: SSH
statusCode: killed
error:
user: User2
latency: 299999 ms (~5 min)
memAlloc: -125.46 Gb
Submit Request 5:
startTime: 14:53:43
gatewayTimeout: 14:58:43
stopTime: 15:14:08
protocol: HTTP
statusCode: 503
error: LockFailureException [3]
user: User1
latency: 1224896 ms (~20 min)
memAlloc: 28.74 Gb
I did some thread dumps, after the user informed me. This captured SubmitRequests 3-5, which were all working on `isAlreadyMerged()`. These are far from ideal data and I asked the reporting users to add tracing to future submits and inform me. But so far I did not get a response on that, although I still see long running submits in the logs (mostly those submits take about 6 min, which is already very long, but there are also submits taking 20+ min as this one. A very similar occurance was
[1]
[2023-03-14T14:19:06.496+01:00] [HTTP POST /changes/project~1385603/revisions/1/submit (User1 from 1.1.1.1)] ERROR com.google.gerrit.httpd.restapi.RestApiServlet : Error in POST /changes/project~1385603/revisions/1/submit [CONTEXT project="project" request="REST /changes/*/revisions/*/submit" ]
com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 6de95694fb9bb301d8fffb1024dfb8cbbfd38389 refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 0e6fca9f7d1bc3899115febbe427813ed1593d20 6de95694fb9bb301d8fffb1024dfb8cbbfd38389 refs/heads/llvm (LOCK_FAILURE)
UPDATE: 3bb0885dc3e51df4355909310b371ee7bcd5ffd7 a754cc410f6d8212ab19e431688c882cac9e004a refs/changes/03/1385603/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.server.update.BatchUpdate.wrapAndThrowException(BatchUpdate.java:235)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:178)
at com.google.gerrit.server.update.SubmissionExecutor.execute(SubmissionExecutor.java:61)
at com.google.gerrit.server.submit.MergeOp.integrateIntoHistory(MergeOp.java:667)
at com.google.gerrit.server.submit.MergeOp.lambda$merge$1(MergeOp.java:535)
at com.google.gerrit.server.update.RetryableChangeAction.lambda$new$0(RetryableChangeAction.java:48)
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at com.github.rholder.retry.Retryer.call(Retryer.java:160)
at com.google.gerrit.server.update.RetryHelper.executeWithTimeoutCount(RetryHelper.java:564)
at com.google.gerrit.server.update.RetryHelper.execute(RetryHelper.java:507)
at com.google.gerrit.server.update.RetryableAction.call(RetryableAction.java:173)
at com.google.gerrit.server.update.RetryableChangeAction.call(RetryableChangeAction.java:84)
at com.google.gerrit.server.submit.MergeOp.merge(MergeOp.java:547)
at com.google.gerrit.server.restapi.change.Submit.mergeChange(Submit.java:214)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:191)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:88)
....
Caused by: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 6de95694fb9bb301d8fffb1024dfb8cbbfd38389 refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 0e6fca9f7d1bc3899115febbe427813ed1593d20 6de95694fb9bb301d8fffb1024dfb8cbbfd38389 refs/heads/llvm (LOCK_FAILURE)
UPDATE: 3bb0885dc3e51df4355909310b371ee7bcd5ffd7 a754cc410f6d8212ab19e431688c882cac9e004a refs/changes/03/1385603/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.git.RefUpdateUtil.checkResults(RefUpdateUtil.java:100)
at com.google.gerrit.git.RefUpdateUtil.executeChecked(RefUpdateUtil.java:57)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:386)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:336)
at com.google.gerrit.server.update.BatchUpdate$ChangesHandle.execute(BatchUpdate.java:614)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:150)
... 90 more
[2]
[2023-03-14T14:20:10.910+01:00] [HTTP POST /changes/project~1385603/revisions/1/submit (User2 from 1.1.1.1)] ERROR com.google.gerrit.httpd.restapi.RestApiServlet : Error in POST /changes/project~1385603/revisions/1/submit [CONTEXT project="project" request="REST /changes/*/revisions/*/submit" ]
com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 7356ce4f8a42d05a2f2891ecc21ce437b557855b refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 0e6fca9f7d1bc3899115febbe427813ed1593d20 7356ce4f8a42d05a2f2891ecc21ce437b557855b refs/heads/llvm (LOCK_FAILURE)
UPDATE: 3bb0885dc3e51df4355909310b371ee7bcd5ffd7 c269ec4b48312258ff162a60ad2c235810bdb8e4 refs/changes/03/1385603/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.server.update.BatchUpdate.wrapAndThrowException(BatchUpdate.java:235)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:178)
at com.google.gerrit.server.update.SubmissionExecutor.execute(SubmissionExecutor.java:61)
at com.google.gerrit.server.submit.MergeOp.integrateIntoHistory(MergeOp.java:667)
at com.google.gerrit.server.submit.MergeOp.lambda$merge$1(MergeOp.java:535)
at com.google.gerrit.server.update.RetryableChangeAction.lambda$new$0(RetryableChangeAction.java:48)
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at com.github.rholder.retry.Retryer.call(Retryer.java:160)
at com.google.gerrit.server.update.RetryHelper.executeWithTimeoutCount(RetryHelper.java:564)
at com.google.gerrit.server.update.RetryHelper.execute(RetryHelper.java:507)
at com.google.gerrit.server.update.RetryableAction.call(RetryableAction.java:173)
at com.google.gerrit.server.update.RetryableChangeAction.call(RetryableChangeAction.java:84)
at com.google.gerrit.server.submit.MergeOp.merge(MergeOp.java:547)
at com.google.gerrit.server.restapi.change.Submit.mergeChange(Submit.java:214)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:191)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:88)
...
Caused by: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 7356ce4f8a42d05a2f2891ecc21ce437b557855b refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 0e6fca9f7d1bc3899115febbe427813ed1593d20 7356ce4f8a42d05a2f2891ecc21ce437b557855b refs/heads/llvm (LOCK_FAILURE)
UPDATE: 3bb0885dc3e51df4355909310b371ee7bcd5ffd7 c269ec4b48312258ff162a60ad2c235810bdb8e4 refs/changes/03/1385603/meta (REJECTED_OTHER_REASON: transaction aborted)
]
at com.google.gerrit.git.RefUpdateUtil.checkResults(RefUpdateUtil.java:100)
at com.google.gerrit.git.RefUpdateUtil.executeChecked(RefUpdateUtil.java:57)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:386)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:336)
at com.google.gerrit.server.update.BatchUpdate$ChangesHandle.execute(BatchUpdate.java:614)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:150)
... 90 more
[3]
[2023-03-14T15:14:08.182+01:00] [HTTP POST /changes/project~1385603/revisions/1/submit (User1 from 1.1.1.1)] ERROR com.google.gerrit.httpd.restapi.RestApiServlet : Error in POST /changes/project~1385603/revisions/1/submit [CONTEXT project="project" request="REST /changes/*/revisions/*/submit" ]
com.google.gerrit.server.update.UpdateException: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 fe1938a609b3d909c11a0dd26702a03bfb12ce2f refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 3acb4a36a56d0936125616ec8169381eaded320d fe1938a609b3d909c11a0dd26702a03bfb12ce2f refs/heads/llvm (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: f2ae8430e5842697c9a7cb68f84af97fed61094b 72dbdbe53da77a63b7f3ca27fcc7d38a8c772fff refs/changes/03/1385603/meta (LOCK_FAILURE)
]
at com.google.gerrit.server.update.BatchUpdate.wrapAndThrowException(BatchUpdate.java:235)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:178)
at com.google.gerrit.server.update.SubmissionExecutor.execute(SubmissionExecutor.java:61)
at com.google.gerrit.server.submit.MergeOp.integrateIntoHistory(MergeOp.java:667)
at com.google.gerrit.server.submit.MergeOp.lambda$merge$1(MergeOp.java:535)
at com.google.gerrit.server.update.RetryableChangeAction.lambda$new$0(RetryableChangeAction.java:48)
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at com.github.rholder.retry.Retryer.call(Retryer.java:160)
at com.google.gerrit.server.update.RetryHelper.executeWithTimeoutCount(RetryHelper.java:564)
at com.google.gerrit.server.update.RetryHelper.execute(RetryHelper.java:507)
at com.google.gerrit.server.update.RetryableAction.call(RetryableAction.java:173)
at com.google.gerrit.server.update.RetryableChangeAction.call(RetryableChangeAction.java:84)
at com.google.gerrit.server.submit.MergeOp.merge(MergeOp.java:547)
at com.google.gerrit.server.restapi.change.Submit.mergeChange(Submit.java:214)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:191)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:88)
Caused by: com.google.gerrit.git.LockFailureException: Update aborted with one or more lock failures: PackedBatchRefUpdate[
CREATE: 0000000000000000000000000000000000000000 fe1938a609b3d909c11a0dd26702a03bfb12ce2f refs/changes/03/1385603/2 (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: 3acb4a36a56d0936125616ec8169381eaded320d fe1938a609b3d909c11a0dd26702a03bfb12ce2f refs/heads/llvm (REJECTED_OTHER_REASON: transaction aborted)
UPDATE: f2ae8430e5842697c9a7cb68f84af97fed61094b 72dbdbe53da77a63b7f3ca27fcc7d38a8c772fff refs/changes/03/1385603/meta (LOCK_FAILURE)
]
at com.google.gerrit.git.RefUpdateUtil.checkResults(RefUpdateUtil.java:100)
at com.google.gerrit.git.RefUpdateUtil.executeChecked(RefUpdateUtil.java:57)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:386)
at com.google.gerrit.server.notedb.NoteDbUpdateManager.execute(NoteDbUpdateManager.java:336)
at com.google.gerrit.server.update.BatchUpdate$ChangesHandle.execute(BatchUpdate.java:614)
at com.google.gerrit.server.update.BatchUpdate.execute(BatchUpdate.java:150)
... 90 more
th...@gmail.com <th...@gmail.com> #9
Sent the comment to early:
one other occurrence was:
Submit Request 1:
startTime: 15:45:43
gatewayTimeout: 15:50:43
stopTime: 16:14:43
protocol: HTTP
statusCode: 200
error: EarlyEofException
user: User1
latency: 1739178 ms (~29 min)
memAlloc: 29.35 Gb
Submit Request 2:
startTime: 15:51:43
gatewayTimeout: 15:56:43
stopTime: 16:18:35
protocol: HTTP
statusCode: 503
error: LockFailureException
user: User1
latency: 1612394 ms (~26 min)
memAlloc: 27.44 Gb
Submit Request 3:
startTime: 16:09:16
gatewayTimeout: 16:14:16
stopTime: 16:32:43
protocol: HTTP
statusCode: 503
error: LockFailureException
user: User1
latency: 1407806 ms (~23 min)
memAlloc: 27.41 Gb
I don't have traces or thread dumps, so I don't know where the time was spend.
But these long submits only happen on branches using RebaseIfNecessary, which is only a handful in the project. All other branches are using MergeIfNecessary.
one other occurrence was:
Submit Request 1:
startTime: 15:45:43
gatewayTimeout: 15:50:43
stopTime: 16:14:43
protocol: HTTP
statusCode: 200
error: EarlyEofException
user: User1
latency: 1739178 ms (~29 min)
memAlloc: 29.35 Gb
Submit Request 2:
startTime: 15:51:43
gatewayTimeout: 15:56:43
stopTime: 16:18:35
protocol: HTTP
statusCode: 503
error: LockFailureException
user: User1
latency: 1612394 ms (~26 min)
memAlloc: 27.44 Gb
Submit Request 3:
startTime: 16:09:16
gatewayTimeout: 16:14:16
stopTime: 16:32:43
protocol: HTTP
statusCode: 503
error: LockFailureException
user: User1
latency: 1407806 ms (~23 min)
memAlloc: 27.41 Gb
I don't have traces or thread dumps, so I don't know where the time was spend.
But these long submits only happen on branches using RebaseIfNecessary, which is only a handful in the project. All other branches are using MergeIfNecessary.
sv...@axis.com <sv...@axis.com> #10
Are these stacks of changes or single changes?
Do you have the opportunity to add extra logging?
It would be interesting to know the states of:
* `incoming`
* `n`
* `c`
And their relationship when
if (!c.has(canMergeFlag) || !incoming.contains(c)) {
evaluates to true.
I still think it should be:
if (!c.has(canMergeFlag) && !incoming.contains(c)) {
Do you have the opportunity to add extra logging?
It would be interesting to know the states of:
* `incoming`
* `n`
* `c`
And their relationship when
if (!c.has(canMergeFlag) || !incoming.contains(c)) {
evaluates to true.
I still think it should be:
if (!c.has(canMergeFlag) && !incoming.contains(c)) {
th...@gmail.com <th...@gmail.com> #11
In both cases described above it was a single change. And at least for the second described case, a rebase was necessary.
I can add additional logging. It will however take until tomorrow to be deployed and the n some time to find an occurrence of that issue.
> I still think it should be:
> if (!c.has(canMergeFlag) && !incoming.contains(c)) {
If I follow the code correctly, all commits in `incoming` will have the canMergeFlag. The flag is only set in `MergeOp.validateChangeList`, which sets the flag to all commits that validated ok and returns only these commits as a BranchBatch, which will be used to create the `incoming`-Set in MergeOp.getSubmitStrategies(). Thus, if we know that `c` is in incoming, we know that it has the `canMergeFlag`. This also holds true the other way around, if `c` has the `canMergeFlag`, it has to be in the `incoming`-Set, because the flag is set nowhere else in the current code. So both variants of the condition should work, but I think OR is the better version, because if we have an incoming commit that does not have the flag for whatever reason something went wrong and we shouldn't go ahead with the submit.
I can add additional logging. It will however take until tomorrow to be deployed and the n some time to find an occurrence of that issue.
> I still think it should be:
> if (!c.has(canMergeFlag) && !incoming.contains(c)) {
If I follow the code correctly, all commits in `incoming` will have the canMergeFlag. The flag is only set in `MergeOp.validateChangeList`, which sets the flag to all commits that validated ok and returns only these commits as a BranchBatch, which will be used to create the `incoming`-Set in MergeOp.getSubmitStrategies(). Thus, if we know that `c` is in incoming, we know that it has the `canMergeFlag`. This also holds true the other way around, if `c` has the `canMergeFlag`, it has to be in the `incoming`-Set, because the flag is set nowhere else in the current code. So both variants of the condition should work, but I think OR is the better version, because if we have an incoming commit that does not have the flag for whatever reason something went wrong and we shouldn't go ahead with the submit.
th...@gmail.com <th...@gmail.com> #12
I pushed a change adding some logs: https://gerrit-review.googlesource.com/c/gerrit/+/366336
I will apply this change to our internal version. It should be deployed tomorrow morning.
I will apply this change to our internal version. It should be deployed tomorrow morning.
sv...@axis.com <sv...@axis.com> #13
Since you hinted that branches in the same repository has different submit-strategies, are you using prolog to set submit-strategy per branch?
If so, has this changed recently? I.e. has submit ever worked for these branches?
Was this part of a `submitWholeTopic` operation?
"but I think OR is the better version, because if we have an incoming commit that does not have the flag for whatever reason something went wrong and we shouldn't go ahead with the submit."
FWIW, Edwin doesn't recall why it's `||` and not `&&` other than that was how it was done in MergeSorter (which is understandable since it was 10+ years since he wrote the RebaseSorter ;-) ) [1]
The condition was added to MergeSorter here [2] and seems to be added for the reverse case "If the commit has canMerge flag but isn't a part of `incoming`".
I'm now a proponent of the even simpler condition:
if (!incoming.contains(c)) {
I'm not smart enough to figure out in what sort of situation we would set the flag but not add the commit to `submitted` or vice versa.
But I trust `contains` (harder to get wrong) more than `has(flag)` since (IIRC) `has(flag)` is dependent on that your are checking against the same object that the flag was set on and with the same instantiation of the flag that was set. It is fairly non trivial to get a different object or check with a different instantiation of the flag you want to check for (at least I have managed to mess this up on multiple occassions).
[1]https://gerrit-review.googlesource.com/c/gerrit/+/37780/10/gerrit-server/src/main/java/com/google/gerrit/server/git/RebaseSorter.java#59
[2]https://gerrit-review.googlesource.com/c/gerrit/+/37754/5/gerrit-server/src/main/java/com/google/gerrit/server/git/MergeSorter.java
If so, has this changed recently? I.e. has submit ever worked for these branches?
Was this part of a `submitWholeTopic` operation?
"but I think OR is the better version, because if we have an incoming commit that does not have the flag for whatever reason something went wrong and we shouldn't go ahead with the submit."
FWIW, Edwin doesn't recall why it's `||` and not `&&` other than that was how it was done in MergeSorter (which is understandable since it was 10+ years since he wrote the RebaseSorter ;-) ) [1]
The condition was added to MergeSorter here [2] and seems to be added for the reverse case "If the commit has canMerge flag but isn't a part of `incoming`".
I'm now a proponent of the even simpler condition:
if (!incoming.contains(c)) {
I'm not smart enough to figure out in what sort of situation we would set the flag but not add the commit to `submitted` or vice versa.
But I trust `contains` (harder to get wrong) more than `has(flag)` since (IIRC) `has(flag)` is dependent on that your are checking against the same object that the flag was set on and with the same instantiation of the flag that was set. It is fairly non trivial to get a different object or check with a different instantiation of the flag you want to check for (at least I have managed to mess this up on multiple occassions).
[1]
[2]
th...@gmail.com <th...@gmail.com> #14
> Since you hinted that branches in the same repository has different submit-strategies, are you using prolog to set submit-strategy per branch?
Yes, the project uses a prolog rule to do so.
> If so, has this changed recently? I.e. has submit ever worked for these branches?
The prolog rule is in place for years and the branch affected most uses it since 2021. It worked fine until a few weeks ago. The issue started occurring together with general performance issues, which are related to issues with Java GC (Parallel GC), where we see long-running Young Generation GCs with long STW-pauses. So it seems that the general performance issues might also lead to this issue. The performance issues started shortly after upgrading to 3.6.
> Was this part of a `submitWholeTopic` operation?
No. This feature is not enabled.
Yes, the project uses a prolog rule to do so.
> If so, has this changed recently? I.e. has submit ever worked for these branches?
The prolog rule is in place for years and the branch affected most uses it since 2021. It worked fine until a few weeks ago. The issue started occurring together with general performance issues, which are related to issues with Java GC (Parallel GC), where we see long-running Young Generation GCs with long STW-pauses. So it seems that the general performance issues might also lead to this issue. The performance issues started shortly after upgrading to 3.6.
> Was this part of a `submitWholeTopic` operation?
No. This feature is not enabled.
sv...@axis.com <sv...@axis.com> #15
Thanks!
This might not make sense but since you mentioned Java GC issues:
Do you know what type of reference the flags or the commits of a revwalk have?
Could it be that one of these java-objects was garbage-collected and then recreated?
Because if that would be possible it would fail the `c.has(canMergeFlag)` if I'm not mistaken.
This might not make sense but since you mentioned Java GC issues:
Do you know what type of reference the flags or the commits of a revwalk have?
Could it be that one of these java-objects was garbage-collected and then recreated?
Because if that would be possible it would fail the `c.has(canMergeFlag)` if I'm not mistaken.
ma...@gmail.com <ma...@gmail.com> #16
- flags are stored as bits of an integer field of RevObject which is the base class of RevCommit
- the RevWalk has a collection "roots" strongly referencing all RevCommits marked via #markStart or #markUninteresting
- during the walk commits are loaded from disk and parsed into transient RevCommits
- RevCommits have strong references to their parent commits
- during the walk flags can be carried over to their parent commits
- this means all RevCommits already walked form a connected object graph which cannot be garbage collected by the Java GC as long as the
RevWalk object is in the live-set of the JVM
- the RevWalk has a collection "roots" strongly referencing all RevCommits marked via #markStart or #markUninteresting
- during the walk commits are loaded from disk and parsed into transient RevCommits
- RevCommits have strong references to their parent commits
- during the walk flags can be carried over to their parent commits
- this means all RevCommits already walked form a connected object graph which cannot be garbage collected by the Java GC as long as the
RevWalk object is in the live-set of the JVM
ma...@gmail.com <ma...@gmail.com> #17
- calling #dispose clears the roots collection which removes all references to RevCommits held by the RevWalk which means they can then be garbage collected if there are no other references keeping them alive
sv...@axis.com <sv...@axis.com> #18
Thanks for the Explanation Matthias.
TLDR; Seems like I was barking up the wrong tree, unless there's some other type of flag-mixup.
What I was thinking of is that the flags are unique for each RevWalk instance [1], so that you may allocate a "CAN_MERGE" flag in one RevWalk (RevFlag flag1 = rw.newFlag("CAN_MERGE")), set that flag on a RevCommit (`c1.add(flag1)` ) and allocate the flag "again" on the same RevWalk (RevFlag flag2 = rw.newFlag("CAN_MERGE") and it will not be represented by the same bit. So that `c1.has(flag2)` is false.
But FWICT canMergeFlag is only allocated once [2].
[1]https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/refs/heads/master/org.eclipse.jgit/src/org/eclipse/jgit/revwalk/RevWalk.java#1331
[2]https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/submit/MergeOpRepoManager.java#83
TLDR; Seems like I was barking up the wrong tree, unless there's some other type of flag-mixup.
What I was thinking of is that the flags are unique for each RevWalk instance [1], so that you may allocate a "CAN_MERGE" flag in one RevWalk (RevFlag flag1 = rw.newFlag("CAN_MERGE")), set that flag on a RevCommit (`c1.add(flag1)` ) and allocate the flag "again" on the same RevWalk (RevFlag flag2 = rw.newFlag("CAN_MERGE") and it will not be represented by the same bit. So that `c1.has(flag2)` is false.
But FWICT canMergeFlag is only allocated once [2].
[1]
[2]
th...@gmail.com <th...@gmail.com> #19
Another idea we had in our team was to just mark all branch tips as uninteresting in the first place, since this would completely avoid the check whether the commit was already merged into any branch. I found that this was actually the case until https://gerrit-review.googlesource.com/c/gerrit/+/90314 . We discussed that in the team and came to the conclusion that the use case that was fixed by that change is only rarely used and should get a special treatment, but the optimization of marking all branch tips as uninteresting should be the default. This would help our case, even if we can't explain how we ended up in that check at all.
@Sven since you implemented change 90314, do you agree?
@Sven since you implemented change 90314, do you agree?
sv...@axis.com <sv...@axis.com> #20
No I don't agree.
That would re-introduce the same bug again.
It's a fairly common use-case at $DAY_JOB to have a temporary branch that you use in parallel to uploading changes (FWIW I don't work like this myself but many of our Gerrit users work like this and aren't willing to change).
Furthermore, wouldn't it negate the project option "Create a new change for every commit not in the target branch"? I.e. you could create the change but you could never submit it?
If you change the condition to be:
if (!incoming.contains(c)) {
You won't have any more issues, right?
The canMergeFlag is set for all commits in `incoming`and I cannot find an instance in the source-code when it's ever removed.
Can you explain what issues you fear will happen if we stop checking the canMergeFlag here and instead trust the `incoming` collection?
That would re-introduce the same bug again.
It's a fairly common use-case at $DAY_JOB to have a temporary branch that you use in parallel to uploading changes (FWIW I don't work like this myself but many of our Gerrit users work like this and aren't willing to change).
Furthermore, wouldn't it negate the project option "Create a new change for every commit not in the target branch"? I.e. you could create the change but you could never submit it?
If you change the condition to be:
if (!incoming.contains(c)) {
You won't have any more issues, right?
The canMergeFlag is set for all commits in `incoming`and I cannot find an instance in the source-code when it's ever removed.
Can you explain what issues you fear will happen if we stop checking the canMergeFlag here and instead trust the `incoming` collection?
ma...@gmail.com <ma...@gmail.com> #21
Our proposal is to only mark all branch tips as uninteresting if the project option
"Create a new change for every commit not in the target branch" is switched off.
Hence, I think this shouldn't affect your use case where this option is switched on.
"Create a new change for every commit not in the target branch" is switched off.
Hence, I think this shouldn't affect your use case where this option is switched on.
th...@gmail.com <th...@gmail.com> #22
Sorry when my proposal was not clear. It was meant like Matthias says. I have pushed a change (still WIP) that would implement that: https://gerrit-review.googlesource.com/c/gerrit/+/366462 . I will add tests to verify that it works with "Create a new change for every commit not in the target branch".
th...@gmail.com <th...@gmail.com> #23
As an additional finding, today we again had a 20 min+ submit. This time there was only a single submit request, so the issue is not being caused by multiple submit requests being sent at the same time. I am nevertheless starting to work on a change to prohibit multiple concurrent submit requests, since that leads to unnecessary work. We should fail early, if a change is already processed by a MergeOp.
> Can you explain what issues you fear will happen if we stop checking the canMergeFlag here and instead trust the `incoming` collection?
I personally don't see any issue, but I might have missed something there. It is still worth a shot. Let me try it and see whether at least all tests are still successful. This might be a historic artefact, where there were other places that could set the flag. Or since the OpenRepo-object containing the RevWalk is cached, the flag is shared between MergeOps.
On the other hand, I am also not 100 % sure that the flag is our issue. This just seems the most likely candidate considering what I learned by debugging and reading the code.
> Can you explain what issues you fear will happen if we stop checking the canMergeFlag here and instead trust the `incoming` collection?
I personally don't see any issue, but I might have missed something there. It is still worth a shot. Let me try it and see whether at least all tests are still successful. This might be a historic artefact, where there were other places that could set the flag. Or since the OpenRepo-object containing the RevWalk is cached, the flag is shared between MergeOps.
On the other hand, I am also not 100 % sure that the flag is our issue. This just seems the most likely candidate considering what I learned by debugging and reading the code.
sv...@axis.com <sv...@axis.com> #24
"Our proposal is to only mark all branch tips as uninteresting if the project option
"Create a new change for every commit not in the target branch" is switched off."
Makes much more sense.
"This might be a historic artefact, where there were other places that could set the flag. Or since the OpenRepo-object containing the RevWalk is cached, the flag is shared between MergeOps."
I also had the feeling that solving the problem with RevFlags was first chosen but then the incoming collection needed to be added (possibly because flags leaked from other MergeOp:es". But the problem could have been solved by the incoming collection all the time.
I don't know for certain that there aren't any unconsidered edge-cases, but I know that historically people (me included) haven't been overly eager to change or review the submit-code due to it's complexity and potential impact. A behavior that promotes dead or unnecessary code-paths.
"Create a new change for every commit not in the target branch" is switched off."
Makes much more sense.
"This might be a historic artefact, where there were other places that could set the flag. Or since the OpenRepo-object containing the RevWalk is cached, the flag is shared between MergeOps."
I also had the feeling that solving the problem with RevFlags was first chosen but then the incoming collection needed to be added (possibly because flags leaked from other MergeOp:es". But the problem could have been solved by the incoming collection all the time.
I don't know for certain that there aren't any unconsidered edge-cases, but I know that historically people (me included) haven't been overly eager to change or review the submit-code due to it's complexity and potential impact. A behavior that promotes dead or unnecessary code-paths.
gi...@appspot.gserviceaccount.com <gi...@appspot.gserviceaccount.com> #25
The following revision refers to this bug:
https://gerrit.googlesource.com/gerrit/+/4f12dcc3748e195a8bd788bc382fb3deb70c4e71
commit 4f12dcc3748e195a8bd788bc382fb3deb70c4e71
Author: Thomas Draebing <thomas.draebing@sap.com>
Date: Tue Mar 28 18:59:04 2023
Set uninteresting branches based on project configuration
The set of uninteresting branches varies depending on whether the
"Create a new change for every commit not in target" is set or not.
If it is not set all branches are uninteresting since you aren't
allowed to push the same commit for review on two different branches.
If it is set there are a lot more edge-cases to consider which
potentially requires a lot of rev-walking which in turn potentially
comes with a significant performance impact.
Not having to take the penalty for uninteresting edge-cases is the
preferred behavior.
Bug:https://crbug.com/gerrit/16814
Release-Notes: Improve Rebase-SubmitStrategy performance in some cases
Change-Id: I2338c0a94304b7f899d446afa23fde920dc4639c
[modify]https://gerrit.googlesource.com/gerrit/+/4f12dcc3748e195a8bd788bc382fb3deb70c4e71/java/com/google/gerrit/server/submit/RebaseSorter.java
[modify]https://gerrit.googlesource.com/gerrit/+/4f12dcc3748e195a8bd788bc382fb3deb70c4e71/java/com/google/gerrit/server/submit/SubmitStrategy.java
commit 4f12dcc3748e195a8bd788bc382fb3deb70c4e71
Author: Thomas Draebing <thomas.draebing@sap.com>
Date: Tue Mar 28 18:59:04 2023
Set uninteresting branches based on project configuration
The set of uninteresting branches varies depending on whether the
"Create a new change for every commit not in target" is set or not.
If it is not set all branches are uninteresting since you aren't
allowed to push the same commit for review on two different branches.
If it is set there are a lot more edge-cases to consider which
potentially requires a lot of rev-walking which in turn potentially
comes with a significant performance impact.
Not having to take the penalty for uninteresting edge-cases is the
preferred behavior.
Bug:
Release-Notes: Improve Rebase-SubmitStrategy performance in some cases
Change-Id: I2338c0a94304b7f899d446afa23fde920dc4639c
[modify]
[modify]
th...@gmail.com <th...@gmail.com> #26
[Empty comment from Monorail migration]
Description
*************************************************************************
*** !!!! THIS BUG TRACKER IS FOR GERRIT CODE REVIEW !!!!
*** Do not submit bugs for chrome/android and issues with your company's
*** Gerrit setup here. Those issues belong in different issue trackers.
*************************************************************************
What steps will reproduce the problem?
This issue is not easily reproducible. We face this issue on a Gerrit server that anyway suffers from performance issues during peak usage and only on a huge mono repo.
This issue only occurs on branches using RebaseIfNecessary as a submit strategy (set by submit_type prolog rule). Some submits take 20+ min. Users see a 504 Gateway timeout that is returned by the proxy, but the submit finishes successfully eventually. This issue happens already with the simple case of submitting a single non-merge change.
What is the expected output?
The submit should be done in a reasonable amount of time. The Gateway timeout of 5 min should be more than enough.
What do you see instead?
The submit takes 20+ min.
Please provide any additional information below.
We were able to perform thread dumps during one such occurrence and found that Gerrit spend a lot of time in this state:
"HTTP POST /changes/project~1234/revisions/1/submit (user from 1.1.1.1)" #110184 prio=5 os_prio=0 cpu=2720524.70ms elapsed=9239.51s allocated=131G defined_classes=18 tid=0x00007f5ee0016000 nid=0x160c runnable [0x00007f63e15a7000]
java.lang.Thread.State: RUNNABLE
at org.eclipse.jgit.revwalk.DateRevQueue.everbodyHasFlag(DateRevQueue.java:155)
at org.eclipse.jgit.revwalk.MergeBaseGenerator._next(MergeBaseGenerator.java:143)
at org.eclipse.jgit.revwalk.MergeBaseGenerator.init(MergeBaseGenerator.java:76)
at org.eclipse.jgit.revwalk.StartGenerator.next(StartGenerator.java:72)
at org.eclipse.jgit.revwalk.RevWalk.next(RevWalk.java:615)
at com.google.gerrit.server.git.CodeReviewCommit$CodeReviewRevWalk.next(CodeReviewCommit.java:85)
at com.google.gerrit.server.git.CodeReviewCommit$CodeReviewRevWalk.next(CodeReviewCommit.java:68)
at org.eclipse.jgit.revwalk.RevWalk.isMergedInto(RevWalk.java:440)
at com.google.gerrit.server.submit.RebaseSorter.isAlreadyMerged(RebaseSorter.java:117)
at com.google.gerrit.server.submit.RebaseSorter.sort(RebaseSorter.java:81)
at com.google.gerrit.server.submit.RebaseSubmitStrategy.buildOps(RebaseSubmitStrategy.java:61)
at com.google.gerrit.server.submit.SubmitStrategy.addOps(SubmitStrategy.java:263)
at com.google.gerrit.server.submit.MergeOp.getSubmitStrategies(MergeOp.java:757)
at com.google.gerrit.server.submit.MergeOp.integrateIntoHistory(MergeOp.java:644)
at com.google.gerrit.server.submit.MergeOp.lambda$merge$1(MergeOp.java:535)
at com.google.gerrit.server.submit.MergeOp$$Lambda$2753/0x00007f6f92ac5168.call(Unknown Source)
at com.google.gerrit.server.update.RetryableChangeAction.lambda$new$0(RetryableChangeAction.java:48)
at com.google.gerrit.server.update.RetryableChangeAction$$Lambda$1852/0x00007f6790c158b0.call(Unknown Source)
at com.google.gerrit.server.update.RetryHelper$$Lambda$1093/0x00007f6de0f90058.call(Unknown Source)
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at com.github.rholder.retry.Retryer.call(Retryer.java:160)
at com.google.gerrit.server.update.RetryHelper.executeWithTimeoutCount(RetryHelper.java:564)
at com.google.gerrit.server.update.RetryHelper.execute(RetryHelper.java:507)
at com.google.gerrit.server.update.RetryableAction.call(RetryableAction.java:173)
at com.google.gerrit.server.update.RetryableChangeAction.call(RetryableChangeAction.java:84)
at com.google.gerrit.server.submit.MergeOp.merge(MergeOp.java:547)
at com.google.gerrit.server.restapi.change.Submit.mergeChange(Submit.java:214)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:191)
at com.google.gerrit.server.restapi.change.Submit.apply(Submit.java:88)
...
The methodhttps://gerrit-review.googlesource.com/c/gerrit/+/91610 ). This is done to identify commits coming from changes with merge commits. This method will loop over all branches (1000+ in our case) and do a revwalk to check whether the commit is reachable from the respective branch tip.
com.google.gerrit.server.submit.RebaseSorter.isAlreadyMerged(RebaseSorter.java:117)
checks for each incoming commit whether it is already merged into a branch. This was introduced in Gerrit 2.12 (However, in our case, where we only submit changes that are not merges, we should not even execute the check in [1] should have resulted as false. I was able to reproduce this behaviour once in a local test system. I ended up within the if-block, but when I ran the condition code again in the debugger it correctly evaluated to false. Thus, I assume that this is an issue with thread safety and a race condition. This is also supported by the fact, that this code is present in Gerrit for a very long time, but the issue only occurs now that we see general performance issues.
I have prepared a workaround [2], that specifically checks whether the change series that is to be submitted contains a merge and only then runs isAlreadyMerged(). This should fix the issue for us, but does not solve the underlying issue that we falsely end up in the if-block containing this check.
I went through the submit code to identify possible places where issues with thread safety in regards to the condition could occur. For
!incoming.contains(c)
, the incoming-set seems to not be changed after it is initially filled and it is only used by a single thread as far as I can tell (We should consider making it an ImmutableSet).The more likely candidate would be
!c.has(canMergeFlag)
, since the RevWalk providing commit c is part of the cached OpenRepo. While debugging two concurrent submits on the same repo, both RevWalks were different objects however, so it looks like the OpenRepo-cache is not shared between threads, but further investigation is required.I would appreciate any input on this issue especially from someone who is familiar with the submit code.
If there are no vetos, I would also appreciate, if we could submit my workaround for now.
Thanks,
Thomas
(Adding Edwin and Youssef in CC due to there relatively recent contributions to MergeOp)
[1]https://gerrit.googlesource.com/gerrit/+/refs/heads/stable-3.6/java/com/google/gerrit/server/submit/RebaseSorter.java#80 https://gerrit-review.googlesource.com/c/gerrit/+/364134
[2]