Ticket #407 (closed defect: fixed)
{{{darcs_buildbot}}} uses {{{.encode('ascii')}}, but {{{.encode('utf-8')}}} works better
| Reported by: | zooko | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | undecided |
| Version: | 0.7.9 | Keywords: | darcs |
| Cc: | dustin |
Description
I committed a darcs patch written by François Deppierraz. The darcs_buildbot script failed with:
Traceback (most recent call last):
File "/usr/local/bin/darcs_buildbot", line 164, in <module>
Finished applying...
sendChanges(MASTER)
File "/usr/local/bin/darcs_buildbot", line 131, in sendChanges
changes = findNewChanges()
File "/usr/local/bin/darcs_buildbot", line 114, in findNewChanges
changes = getSomeChanges(lookback)
File "/usr/local/bin/darcs_buildbot", line 100, in getSomeChanges
return getChangesFromCommand(cmd, count)
File "/usr/local/bin/darcs_buildbot", line 95, in getChangesFromCommand
changes.append(makeChange(p))
File "/usr/local/bin/darcs_buildbot", line 59, in makeChange
comments = comments.encode("ascii")
UnicodeEncodeError: 'ascii' codec can't encode character u'xe7' in position 217: ordinal not in range(128)
Posthook failed!
Apply failed!
Then I hacked that script by replacing all instances of the string 'ascii' with the string 'utf-8' and it worked.
Change History
comment:2 Changed 4 years ago by zooko
I don't understand the specific situation, other than that I tried this change and it worked for me.
I have the vague notion that utf-8 is always better than ascii and that everything would have been a lot easier if Python had just made utf-8 its default encoding instead of ascii, but I don't know whether that general notion is true in this case.
comment:3 Changed 4 years ago by dustin
Yeah, Py3k makes utf-8 the default encoding, but that's really another story.
In this case, I see:
60 # note that these are all unicode. Because PB can't handle unicode, we 61 # encode them into ascii, which will blow up early if there's anything we 62 # can't get to the far side. When we move to something that *can* handle 63 # unicode (like newpb), remove this.
which means that we're stuck with the ascii charset -- we just can't get anything that's not ascii over to the other side. The other side is not expecting utf-8-encoded data, and will not know to decode such data back to a unicode string. Adding "replace" just means that any non-ascii characters are handled gracefully (replaced with a placeholder character) instead of triggering a traceback. It's not pretty, but it gets the commit into buildbot.
Obviously, the larger solution is to support unicode universally, and make arrangements to encode and decode strings when they pass through ascii-only channels, but that's going to be a much more significant task.
What do you think?
comment:4 Changed 4 years ago by zooko
Hm, but when I told it to encode this patch with utf-8, it worked, even though it contained a non-ascii char (in the comment, toward the end, in the name "François")
http://allmydata.org/trac/tahoe/changeset/3274
Buildbot received the notification about that patch and triggered appropriately, and even included the sedilla:
http://allmydata.org/buildbot/changes/694
Could it be that Twisted PB has fixed this limitation since that buildbot code comment was written?
comment:5 Changed 4 years ago by zooko
Whoops, sorry, wrong hyperlink. The changeset patch in question was this one:
http://allmydata.org/trac/tahoe/changeset/3309
Which led to this buildbot event (already hyperlinked correctly in my previous message):
![[Buildbot Logo]](/chrome/site/header-text-transparent.png)
I think that, for the moment, the more appropriate solution is to add the "replace" error-handling flag to the encode() call. Presumably someone added this encode() because the target does not expect unicode strings.
comments = comments.encode("ascii", "replace")What do you think?