Ticket #827 (closed defect: fixed)

Opened 21 months ago

Last modified 5 months ago

ChangeSources may produce changes containing non-ascii bytestrings

Reported by: dustin Owned by:
Priority: critical Milestone: 0.8.5
Version: 0.7.12 Keywords: encoding database
Cc: sobucni@…

Description

We need to be careful and flexible in how bytestrings are handled when they come from pollers and commit hooks. I'm not sure what happens right now, but we should be applying some kind of decoding to them.

Change History

comment:1 Changed 21 months ago by thebrasse

I just want to point out that git_buildbot.py has been modified (in  f79aa7e5e0d6da468c8f) to use unicode strings for comments and such.

This might no be relevant since a more general approach would be to make buildbot decode the strings as they come in.

comment:2 Changed 21 months ago by dustin

thebrasse: actually, that's exactly the approach I want to take - the changesource itself should be responsible for decoding into unicode strings.

comment:3 Changed 21 months ago by dustin

  • Priority changed from major to critical
  • Milestone changed from 0.8.0 to 0.8.1

I have a fix for this at

 http://github.com/djmitche/buildbot/commits/bug827

but I think that if I merge it I will open the unicode floodgates and have to unicodify the rest of Buildbot - something I'm not ready to do in 0.8.0.

comment:4 Changed 21 months ago by willw

I think our p4poller just ran into this issue yesterday - causing it to stop working:

{ 2010-05-18 12:28:36+0200 [-] adding change, who claire, 1 files, rev=400373, branch=trunk repository=, comments Change 400373 by claire@fft_claire on 2010/05/18 12:25:54

!B - Fix Review: ID 22330 by Márcio Martins

, category None

2010-05-18 12:28:36+0200 [-] P4 poll failed

Traceback (most recent call last):

File "/usr/lib/python2.5/site-packages/Twisted-10.0.0-py2.5-linux-x86_64.egg/twisted/internet/defer.py", line 354, in _startRunCallbacks

self._runCallbacks()

File "/usr/lib/python2.5/site-packages/Twisted-10.0.0-py2.5-linux-x86_64.egg/twisted/internet/defer.py", line 371, in _runCallbacks

self.result = callback(self.result, *args, kw)

File "/usr/lib/python2.5/site-packages/Twisted-10.0.0-py2.5-linux-x86_64.egg/twisted/internet/defer.py", line 330, in _continue

self.unpause()

File "/usr/lib/python2.5/site-packages/Twisted-10.0.0-py2.5-linux-x86_64.egg/twisted/internet/defer.py", line 325, in unpause

self._runCallbacks()

--- <exception caught here> ---

File "/usr/lib/python2.5/site-packages/Twisted-10.0.0-py2.5-linux-x86_64.egg/twisted/internet/defer.py", line 371, in _runCallbacks

self.result = callback(self.result, *args, kw)

File "/usr/lib/python2.5/site-packages/buildbot-0.8.0rc2-py2.5.egg/buildbot/changes/p4poller.py", line 222, in _process_describe

self.parent.addChange(c)

File "/usr/lib/python2.5/site-packages/buildbot-0.8.0rc2-py2.5.egg/buildbot/changes/manager.py", line 113, in addChange

self.parent.addChange(change)

File "/usr/lib/python2.5/site-packages/buildbot-0.8.0rc2-py2.5.egg/buildbot/master.py", line 1035, in addChange

self.db.addChangeToDatabase(change)

File "/usr/lib/python2.5/site-packages/buildbot-0.8.0rc2-py2.5.egg/buildbot/db/connector.py", line 306, in addChangeToDatabase

self.runInteractionNow(self._txn_addChangeToDatabase, change)

File "/usr/lib/python2.5/site-packages/buildbot-0.8.0rc2-py2.5.egg/buildbot/db/connector.py", line 210, in runInteractionNow

return self._runInteractionNow(interaction, *args, kwargs)

File "/usr/lib/python2.5/site-packages/buildbot-0.8.0rc2-py2.5.egg/buildbot/db/connector.py", line 235, in _runInteractionNow

result = interaction(c, *args, kwargs)

File "/usr/lib/python2.5/site-packages/buildbot-0.8.0rc2-py2.5.egg/buildbot/db/connector.py", line 324, in _txn_addChangeToDatabase

t.execute(q, values)

pysqlite2.dbapi2.ProgrammingError?: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

}

Is it safe for us just to apply the fix above?

comment:5 Changed 21 months ago by dustin

  • Summary changed from ChangeSources may produce changes containing non-unicode strings to ChangeSources may produce changes containing non-ascii bytestrings

Probably not - I pretty badly misunderstood the encoding requirements when I wrote that patch.

You should probably adjust the P4 changesource to produce unicode objects by decoding from whatever encoding is appropriate. If you make that encoding configurable, it would be a great change to merge upstream.

comment:6 Changed 21 months ago by dustin

  • Keywords encoding added

comment:7 Changed 21 months ago by dustin

  • Keywords database added

comment:8 Changed 20 months ago by willw

Any luck finding a workaround for this? It's giving me some major headaches at the moment, and unfortunately I'm not to sure where to begin in order to solve it myself...

comment:9 Changed 20 months ago by dustin

No, I still haven't tackled the encoding issues. I'll admit I'm still pretty baffled about them.

comment:10 Changed 19 months ago by incubos

  • Cc sobucni@… added

comment:11 Changed 18 months ago by dustin

  • Milestone changed from 0.8.2 to 0.8.3

We seem to be limping along without solving encoding issues so far..

comment:12 Changed 14 months ago by ayust

  • Milestone changed from 0.8.3 to 0.8.+

comment:13 Changed 8 months ago by dustin

  • Status changed from new to closed
  • Resolution set to fixed

Now that everything's in the db, this is pretty much solved by force (since the db's store unicode). There are a few remaining bugs to get e.g., PBChangeSource to do the right thing, but there's no need for a general framework.

comment:14 Changed 7 months ago by birbilakos

I'm having a similar issue with 0.8.4p1 and P4 changesource:

File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/base.py", line 1491, in _execute_clauseelement

compiled_sql, distilled_params

File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/base.py", line 1599, in _execute_context

context)

File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/base.py", line 1592, in _execute_context

context)

File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/default.py", line 325, in do_execute

cursor.execute(statement, parameters)

sqlalchemy.exc.ProgrammingError?: (ProgrammingError?) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. u'INSERT INTO changes (author, comments, is_dir, branch, revision, revlink, when_timestamp, category, repository, project) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)' ('aluo', 'Change 83431 by uname_1.1 on 2011/07/12 02:13:52\n\n\tbug39265: - there should be warning message for changing System IP.\n\t\n\nJobs fixed ...\n\nbug39265 by uname@… *RESOLVED-FIXED*\n\n\t- there should be warning message for changing \x93System IP\x94.\n\n', 0, '1.1', '83431', , 1310462032, None, , )

I think the special characters: \x93\x94 are causing these issues.

comment:15 Changed 7 months ago by dustin

Can you paste the whole traceback? Thanks

comment:16 Changed 7 months ago by birbilakos

Sure,

Here's a full traceback of this issue. It happens every day depending on the special characters that get passed to the Perforce changelists.

2011-07-15 12:31:01+0300 [-] P4 poll failed
	Traceback (most recent call last):
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 269, in errback
	    self._startRunCallbacks(fail)
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 312, in _startRunCallbacks
	    self._runCallbacks()
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 328, in _runCallbacks
	    self.result = callback(self.result, *args, **kw)
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 630, in gotResult
	    _deferGenerator(g, deferred)
	--- <exception caught here> ---
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 605, in _deferGenerator
	    result = g.next()
	  File "/usr/local/lib/python2.6/site-packages/buildbot-0.8.4p1-py2.6.egg/buildbot/changes/p4poller.py", line 186, in _poll
	    wfd.getResult()
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 584, in getResult
	    self.result.raiseException()
	  File "/usr/local/lib/python2.6/site-packages/twisted/python/threadpool.py", line 210, in _worker
	    result = context.call(ctx, function, *args, **kwargs)
	  File "/usr/local/lib/python2.6/site-packages/twisted/python/context.py", line 59, in callWithContext
	    return self.currentContext().callWithContext(ctx, func, *args, **kw)
	  File "/usr/local/lib/python2.6/site-packages/twisted/python/context.py", line 37, in callWithContext
	    return func(*args,**kw)
	  File "/usr/local/lib/python2.6/site-packages/buildbot-0.8.4p1-py2.6.egg/buildbot/db/pool.py", line 112, in thd
	    rv = callable(conn, *args, **kwargs)
	  File "/usr/local/lib/python2.6/site-packages/buildbot-0.8.4p1-py2.6.egg/buildbot/db/changes.py", line 141, in thd
	    project=project))
	  File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/base.py", line 1358, in execute
	    params)
	  File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/base.py", line 1491, in _execute_clauseelement
	    compiled_sql, distilled_params
	  File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/base.py", line 1599, in _execute_context
	    context)
	  File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/base.py", line 1592, in _execute_context
	    context)
	  File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/default.py", line 325, in do_execute
	    cursor.execute(statement, parameters)
	sqlalchemy.exc.ProgrammingError: (ProgrammingError) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. u'INSERT INTO changes (author, comments, is_dir, branch, revision, revlink, when_timestamp, category, repository, project) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)' ('build', 'Change 83699 by build@build1.1 on 2011/07/14 03:37:24\n\n\tIntegrating //depot/1.1_maint/... to //depot/1.1/... as of change 83680\n\nJobs fixed ...\n\nbug31060 by relengr *REOPENED*\n\n\Release Bug\n\nbug36707 by xxx *VERIFIED-FIXED*\n\n\tPlayback fails for MBM file http://sph_breast_MBM.mp4\n\nbug36759 by xxx *VERIFIED-FIXED*\n\n\tChanges needed in Flexible Redirection to support Day Pass at Alltel.\n\nbug36767 by xxx *VERIFIED-FIXED*\n\n\t\x91307\x92 responses from the server are sent to the client as \x91206\x92.\n\nbug36796 by xxx *RESOLVED-FIXED*\n\n\tRadius enhancement to be able to handle and log the ULI VSA\n\nbug36833 by xxx *VERIFIED-FIXED*\n\n\tKPI report inconsistent throughput with KPI Parser log errors\n\nbug36972 by xxx *RESOLVED-FIXED*\n\n\tWrong value for 3GPP-User-Location-Info registered in the sdb\n\n', '83699', '', 1310639844, None, '', '')
Last edited 7 months ago by dustin (previous) (diff)

comment:17 Changed 7 months ago by dustin

  • Status changed from closed to reopened
  • Resolution fixed deleted
  • Milestone changed from 0.8.+ to 0.8.5

comment:18 Changed 6 months ago by dustin

OK, I tried to make the following change to the tests to replicate this:

  • master/buildbot/test/unit/test_changes_p4poller.py

    diff --git a/master/buildbot/test/unit/test_changes_p4poller.py b/master/buildbot/test/unit/test_changes_p4poller.py
    index 100ce29..766aacf 100644
    a b third_p4changes = \ 
    3333""" 
    3434 
    3535change_4_log = \ 
    36 """Change 4 by mpatel@testclient on 2006/04/13 21:55:39 
     36u"""Change 4 by mpatel@testclient on 2006/04/13 21:55:39 
    3737 
    38        short desc truncated because this is a long description. 
    39 """ 
     38       short desc truncated because this is a long \N{SNOWMAN}. 
     39""".encode('utf8') # test decoding unicode 
    4040change_3_log = \ 
    4141"""Change 3 by bob@testclient on 2006/04/13 21:51:39 

and I couldn't replicate it. I don't have access to a perforce instance, so I don't know exactly what the various pieces are, but this seems to be the same kind of response that contains the \x93\x94 you're seeing. What have I missed?

My plan to fix this is to add an encoding parameter to the P4Poller, and decode the responses from 'p4' using that encoding. But I need to replicate the error first.

comment:19 Changed 6 months ago by birbilakos

Hello dustin,

Thank you for looking into this issue. Here is a changelist example that is causing this kind of error:

p4 change -o 85902

Change 85902 by asd@asd.asdf.11 on 2011/08/11 03:18:06

	bug 40295: 
	1. Reason: When mousing down the update image button (not click), its icon name will be suffixed to "Down" such as "xxxDown.jpg", so the button seems to disappear. 
	2. Fix: using setShowDown(false) when creating imageButton.

Jobs fixed ...

bug40295 by asd@asdf.com *CLOSED-FIXED*

	ASDF-GUI-P3-�Upgrade Icon� disappears sometimes.

Affected files ...

... //depot/....

The problem comes from those special characters in the buglist (which are actually quote chars: “). I imagine those are not the only ones. In order to replicate this, please try to use the “ chars in the changelist description or anywhere else actually in the p4 change.

comment:20 Changed 6 months ago by birbilakos

And here is the tail from twisted.log:

2011-08-23 12:32:35+0300 [-] P4 poll failed
	Traceback (most recent call last):
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 269, in errback
	    self._startRunCallbacks(fail)
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 312, in _startRunCallbacks
	    self._runCallbacks()
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 328, in _runCallbacks
	    self.result = callback(self.result, *args, **kw)
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 630, in gotResult
	    _deferGenerator(g, deferred)
	--- <exception caught here> ---
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 605, in _deferGenerator
	    result = g.next()
	  File "/usr/local/lib/python2.6/site-packages/buildbot-0.8.4p2-py2.6.egg/buildbot/changes/p4poller.py", line 186, in _poll
	    wfd.getResult()
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 584, in getResult
	    self.result.raiseException()
	  File "/usr/local/lib/python2.6/site-packages/twisted/python/threadpool.py", line 210, in _worker
	    result = context.call(ctx, function, *args, **kwargs)
	  File "/usr/local/lib/python2.6/site-packages/twisted/python/context.py", line 59, in callWithContext
	    return self.currentContext().callWithContext(ctx, func, *args, **kw)
	  File "/usr/local/lib/python2.6/site-packages/twisted/python/context.py", line 37, in callWithContext
	    return func(*args,**kw)
	  File "/usr/local/lib/python2.6/site-packages/buildbot-0.8.4p2-py2.6.egg/buildbot/db/pool.py", line 114, in thd
	    rv = callable(conn, *args, **kwargs)
	  File "/usr/local/lib/python2.6/site-packages/buildbot-0.8.4p2-py2.6.egg/buildbot/db/changes.py", line 141, in thd
	    project=project))
	  File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/base.py", line 1358, in execute
	    params)
	  File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/base.py", line 1491, in _execute_clauseelement
	    compiled_sql, distilled_params
	  File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/base.py", line 1599, in _execute_context
	    context)
	  File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/base.py", line 1592, in _execute_context
	    context)
	  File "/usr/local/lib/python2.6/site-packages/SQLAlchemy-0.7.1-py2.6-linux-i686.egg/sqlalchemy/engine/default.py", line 325, in do_execute
	    cursor.execute(statement, parameters)
	sqlalchemy.exc.ProgrammingError: (ProgrammingError) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. u'INSERT INTO changes (author, comments, is_dir, branch, revision, revlink, when_timestamp, category, repository, project) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)' ('ihu', 'Change 85902 by asd@asd.asdf.11 on 2011/08/11 03:18:06\n\n\tbug 40295: \n\t1. Reason: When mousing down the update image button (not click), its icon name will be suffixed to "Down" such as "xxxDown.jpg", so the button seems to disappear. \n\t2. Fix: using setShowDown(false) when creating imageButton.\n\t \n\nJobs fixed ...\n\nbug40295 by asd@asdf.com *CLOSED-FIXED*\n\n\ASDF-GUI-P3-\x93Upgrade Icon\x94 disappears sometimes.\n\n', 0, 'asd/1.1', '85902', '', 1313057886, None, '', '')

comment:21 Changed 6 months ago by dustin

Ah, I wasn't testing correctly. Thanks!

comment:22 Changed 6 months ago by Dustin J. Mitchell

  • Status changed from reopened to closed
  • Resolution set to fixed

Add an encoding parameter to P4Poller

This allows the poller to decode encoded data from p4, and insert actual unicode strings into the db. Fixes #827.

Changeset: 4871d5c2b0350c3a5e4b8aa7f9588a634a3415cd

comment:23 Changed 6 months ago by dustin

Incidentally, let's open new bugs for any further problems like this..

comment:24 Changed 6 months ago by birbilakos

Strangely, I've used the latest code from the git and it seems that p4poller fails to poll good changes too now:

2011-08-24 10:57:24+0300 [-] added change Change(revision=u'86748', who=u'asdfg', branch=u'asdfgh/1.1', comments=u"Change 86748 by asdfg@asdfg_gui_ga_11 on 2011/08/24 01:06:03\n\n\tchange description of ASD/FGH performance data: remove 'blade' in all their counters\n\t[ http://asdfghj.asdfghjkla.com/r/9044 - Reviewed By: asdf ] \n\nJobs fixed ...\n\nbug38762 by asdfg@asdfghjkla.com *RESOLVED-FIXED*\n\n\tA4200 GUI: Most performance counter descriptions should change\n\n", when=1314137163, category=None, project=u'', repository=u'') to database
2011-08-24 10:57:24+0300 [-] P4 poll failed
	Traceback (most recent call last):
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 243, in callback
	    self._startRunCallbacks(result)
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 312, in _startRunCallbacks
	    self._runCallbacks()
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 328, in _runCallbacks
	    self.result = callback(self.result, *args, **kw)
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 630, in gotResult
	    _deferGenerator(g, deferred)
	--- <exception caught here> ---
	  File "/usr/local/lib/python2.6/site-packages/twisted/internet/defer.py", line 605, in _deferGenerator
	    result = g.next()
	  File "/usr/local/lib/python2.6/site-packages/buildbot-0.8.4_pre_1121_g3c95c91-py2.6.egg/buildbot/changes/p4poller.py", line 147, in _poll
	    result = result.decode(self.encoding)
	  File "/usr/local/lib/python2.6/encodings/utf_8.py", line 16, in decode
	    return codecs.utf_8_decode(input, errors, True)
	exceptions.UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 327: unexpected code byte

The changelist though seems good though:

p4 change -o 86748
Change:	86748

Date:	2011/08/24 01:06:03

Client:	asdfg_gui_ga_11

User:	asdfg

Status:	submitted

Description:
	change description of ASD/FGH performance data: remove 'blade' in all their counters
	[ http://asdfghj.asdfghjkla.com/r/9044 - Reviewed By: asdf ] 

Jobs:
	bug38762	# A4200 GUI: Most performance cou

comment:25 Changed 6 months ago by birbilakos

# buildbot --version Buildbot version: 0.8.4-pre-1121-g3c95c91 Twisted version: 8.2.0

comment:26 Changed 6 months ago by dustin

Can you find out what's at position 327? 0x92 is definitely not ascii, so this isn't a "good" change in that sense.

comment:27 Changed 6 months ago by birbilakos

I think that the ' char makes it choke. I've observed that consistently since yesterday with all the changelists containing this char (don't know it there are more chars though).

Other changelists seem to be parsed normally.

comment:28 Changed 5 months ago by dustin

0x92 is not '.

Basically, your p4 is sending output in some encoding *other* than utf8, so you need to figure out what that is and set the encoding parameter accordingly.

Note: See TracTickets for help on using tickets.