Opened 4 years ago

Last modified 18 months ago

#2757 new enhancement

Use chardet on incoming bytestrings

Reported by: dustin
Owned by:
Priority: major
Milestone: 0.9.+
Version: 0.8.8
Keywords: encoding


There's a library, chardet, which can do a reasonable job of guessing the charset of a bytestring.

There are a number of places in Buildbot where incoming data is a bytestring. Most of those allow the user to specify an encoding, and default to UTF-8. For example, change sources generally get bytestrings for commit comments, authors, and so on.

In the default case, it may be more convenient for users if we dynamically detect the character encoding of these strings. This would amount to "doing the right thing" when possible, with the fallback option for users to supply an explicit encoding.
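A minimal sketch of what "detect by default, explicit encoding wins" could look like, written for Python 3. The helper name decode_bytestring, its parameters, and the 0.5 confidence threshold are assumptions made for illustration and are not existing Buildbot APIs. The only chardet call used is chardet.detect(), which returns a dict with 'encoding' and 'confidence' keys.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:  # keep the sketch usable when chardet is absent
    chardet = None


def decode_bytestring(data, encoding=None, fallback="utf-8", min_confidence=0.5):
    """Hypothetical helper, not an existing Buildbot function.

    Decode `data` to text. An explicit `encoding` always wins; otherwise
    chardet's guess is tried, and `fallback` is the last resort.
    """
    if isinstance(data, str):
        return data  # already text, nothing to do
    if encoding is not None:
        # The user configured an encoding: trust it over any guess.
        return data.decode(encoding, "replace")
    if chardet is not None and data:
        guess = chardet.detect(data)
        name = guess.get("encoding")
        if name and (guess.get("confidence") or 0.0) >= min_confidence:
            try:
                return data.decode(name)
            except (LookupError, UnicodeDecodeError):
                pass  # unknown codec name or a wrong guess
    # Last resort: never raise, accept replacement characters instead.
    return data.decode(fallback, "replace")
```

A change source could then call decode_bytestring(comment_bytes, encoding=self.encoding), where self.encoding stays None unless the user set it. Short inputs give chardet little to work with, so the guess is only as good as the amount of text available.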

Chardet would also be useful in the ascii2unicode method, which currently accepts only ASCII bytestrings. With detection in place, a little mojibake becomes the unlikely worst case, rather than an exception.
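As a rough illustration of that idea, the following Python 3 sketch tries strict ASCII first and only consults chardet when that fails. The name ascii2unicode_lenient is hypothetical, and this is not Buildbot's actual ascii2unicode implementation; passing None and text through unchanged is also an assumption.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:  # keep the sketch usable when chardet is absent
    chardet = None


def ascii2unicode_lenient(x):
    """Hypothetical variant of ascii2unicode that never raises on bad bytes."""
    if x is None or isinstance(x, str):
        return x  # assumption: None and text pass through unchanged
    try:
        return x.decode("ascii")  # the common case: plain ASCII
    except UnicodeDecodeError:
        pass
    if chardet is not None:
        name = chardet.detect(x).get("encoding")
        if name:
            try:
                # A wrong guess yields mojibake, not an exception.
                return x.decode(name, "replace")
            except LookupError:
                pass  # chardet named a codec Python does not know
    # No usable guess: keep the ASCII part, mark the rest as U+FFFD.
    return x.decode("ascii", "replace")
```

Pure ASCII input behaves as before, so callers that already pass ASCII see no change; only the inputs that used to raise take the detection path.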

Change History (1)
