Description
**What steps will reproduce the problem?**
--------------------------------------
Same steps as
Using SAML is just one way to reproduce the problem.
**1.** Configure SAML authentication with Google Cloud Identity
**2.** Log in as a user who has non-ASCII ISO-8859-1 characters in the name attribute
As Google Cloud Identity does not have a full name attribute, I was using computedDisplayName = true
**What is the expected output?**
----------------------------
The name is displayed correctly and stored correctly in All-Users.git repository.
**What do you see instead?**
------------------------
See the attached gerrit-saml-name-bug.png; in my case the letter 'ö' (LATIN SMALL LETTER O WITH DIAERESIS, Unicode code point U+00F6) is replaced with a box containing a `?`.
**Additional information**
----------------------
In All-Users.git repository the account.config file contains
fullName = Janne R�nkk�
where each 'ö' is replaced with the three bytes 0xef 0xbf 0xbd, which is the UTF-8 encoding of U+FFFD (REPLACEMENT CHARACTER).
For background on this read up on
[1] encodes a string to ISO-8859-1 and decodes it back as UTF-8. This breaks any string whose byte representation differs between the two encodings, which includes every non-ASCII ISO-8859-1 character. The commit was submitted with no test case, so we can only guess at the intent; our best guess is that it was meant to represent characters outside the ISO-8859-1 character set as `?`. However, the way this was done also broke valid ISO-8859-1 characters.
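To make the failure mode concrete, here is a minimal sketch of the round trip described above. The class and method names (`RoundTripBug`, `encodeLatin1DecodeUtf8`, `hex`) are made up for illustration and are not taken from the Gerrit source; only the encode-as-ISO-8859-1, decode-as-UTF-8 pattern comes from the description of [1].

```java
import java.nio.charset.StandardCharsets;

final class RoundTripBug {
  // Hypothetical helper (not Gerrit code): encode as ISO-8859-1, then decode as UTF-8.
  static String encodeLatin1DecodeUtf8(String s) {
    byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
    // The byte 0xf6 (Latin-1 for U+00F6) is not valid UTF-8, so the decoder
    // substitutes U+FFFD for it.
    return new String(latin1, StandardCharsets.UTF_8);
  }

  // Formats bytes as space-separated lowercase hex.
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String name = "Janne R\u00f6nkk\u00f6";
    String broken = encodeLatin1DecodeUtf8(name);
    // Both o-with-diaeresis characters have become U+FFFD.
    System.out.println(broken.equals("Janne R\uFFFDnkk\uFFFD")); // prints "true"
    // Written out as UTF-8, each U+FFFD is the three bytes seen in account.config.
    System.out.println(hex("\uFFFD".getBytes(StandardCharsets.UTF_8))); // prints "ef bf bd"
  }
}
```

Pure ASCII names pass through this round trip unchanged, which may be why the problem went unnoticed.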
A proposed fix is to decode back as ISO-8859-1 instead of UTF-8, so that characters outside the ISO-8859-1 set are represented as question marks and valid characters remain unchanged. Arguably this is too defensive and should happen upstream, or at Gerrit's borders (I don't know Gerrit well enough to know whether that would be here), but this fix would: fix characters that are within the ISO-8859-1 set, replace characters outside said set with `?`, and provide a guard for Gerrit against invalid characters.
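A minimal sketch of the proposed behavior, assuming the fix amounts to using ISO-8859-1 for both the encode and the decode step. The names `Latin1Sanitizer` and `toLatin1Safe` are hypothetical and do not refer to any existing Gerrit class; the checks in `main` indicate the kind of cases a test could cover.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Sanitizer {
  // Hypothetical helper (not Gerrit code): round-trip through ISO-8859-1 only.
  // String.getBytes(Charset) replaces unmappable characters with the charset's
  // replacement byte, which is '?' for ISO-8859-1.
  static String toLatin1Safe(String s) {
    byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
    return new String(latin1, StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) {
    // Valid ISO-8859-1 characters survive unchanged.
    String name = "Janne R\u00f6nkk\u00f6";
    System.out.println(toLatin1Safe(name).equals(name)); // prints "true"

    // Characters outside ISO-8859-1 (here two CJK characters) become '?'.
    System.out.println(toLatin1Safe("\u4e2d\u6587 ok")); // prints "?? ok"
  }
}
```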
We should submit the change with a test this time :)
[1]