Ticket #176 (closed defect: fixed)

Opened 4 years ago

Last modified 3 years ago

'buildbot reconfig' causes WebStatus to give tracebacks for a while

Reported by: bhearsum
Owned by:
Priority: major
Milestone: 0.7.10
Version: 0.7.6
Keywords:
Cc: dustin, thatch, ijon

Description

After doing a reconfig, even one that doesn't change anything, WebStatus stops working for a few minutes. It then magically starts working again. There's nothing in the log to indicate how it recovered. Here's the traceback:

    File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/web/server.py", line 160, in process
      self.render(resrc)
    File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/web/server.py", line 167, in render
      body = resrc.render(self)
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 210, in render
      data = self.content(request)
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 245, in content
      data += self.fillTemplate(s.header, request)
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 239, in fillTemplate
      values['title'] = self.getTitle(request)
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/waterfall.py", line 417, in getTitle
      status = self.getStatus(request)
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 220, in getStatus
      return request.site.buildbot_service.getStatus()
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/baseweb.py", line 458, in getStatus
      return self.parent.getStatus()
    <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'getStatus'

Change History

comment:1 Changed 4 years ago by bhearsum

It turns out that I can't consistently reproduce this. It only seems to happen with one of my Buildbots.

comment:2 Changed 4 years ago by dustin

  • Cc dustin added

The reconfig operation is pretty dark magic. It involves divorcing 'old' objects from the object graph, but if they are still in use (e.g., by the web), then problems will ensue. In this case, for example, the web service has been divorced from its parent service. I'm not sure there's a good fix for this problem.
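
The failure mode described here can be reproduced in miniature. The sketch below uses hypothetical stand-in classes (`FakeMaster`, `FakeWebStatus`), not Buildbot's or Twisted's real ones, and assumes only that detaching a child service resets its `parent` to `None` while something else (such as an open web connection) still holds the old object.

```python
class FakeMaster:
    """Hypothetical stand-in for the buildmaster service."""

    def getStatus(self):
        return "status-ok"


class FakeWebStatus:
    """Hypothetical stand-in for the web status service."""

    def __init__(self):
        self.parent = None

    def attach(self, parent):
        self.parent = parent

    def detach(self):
        # Assumed reconfig behaviour: the old object is cut out of the graph.
        self.parent = None

    def getStatus(self):
        # Mirrors the last frame of the traceback: delegate to the parent.
        return self.parent.getStatus()


def demo_stale_reference():
    web = FakeWebStatus()
    web.attach(FakeMaster())
    before = web.getStatus()
    web.detach()  # reconfig happened, but a client still holds `web`
    try:
        web.getStatus()
    except AttributeError as exc:
        return before, str(exc)
    return before, None
```

Calling `demo_stale_reference()` returns the status obtained before detachment together with the text of the `AttributeError` raised afterwards, which matches the shape of the error reported in this ticket.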

comment:3 Changed 4 years ago by thatch

  • Cc thatch added

comment:4 Changed 4 years ago by ijon

  • Cc ijon added

Identical to #139.

comment:5 Changed 4 years ago by bhearsum

I've now noticed that if I reload a ton of times (by holding down the keyboard shortcut for 'reload') - it comes back immediately.

comment:6 Changed 4 years ago by warner

I'm seeing this a lot at work too.

comment:7 Changed 4 years ago by dbailey

I get this relatively consistently.

The most recent occurrence was when I updated the master.cfg file to change the FileUpload step on the 3 builders defined to use a WithProperties to set the filename.

The only solution in most of the cases I encounter is to completely restart the buildbot master.

comment:8 Changed 3 years ago by dustin

I see this too. My theory is that my browser is using HTTP/1.1 with connection caching, and I'm still connected to the old status object. I'm not sure there's a good solution to this.

comment:9 Changed 3 years ago by dbailey

A Cache-Control directive may solve the problem.

See section 14.9 of http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

I haven't checked to see if the necessary HTTP headers can be set by buildbot, but since it's using twisted for its own web server, I'm assuming it should be possible.

I haven't read the options in detail to see if there is a nice option to inform browsers that they should ignore any cached output prior to a given time/date (i.e., update that value after any reconfig).

The alternative is to request the browser to disable caching.
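
As a rough sketch of the "disable caching" alternative: Twisted's web request objects provide `setHeader(name, value)`, so a resource could set the header before rendering. The helper name `disable_response_caching` and the `RecordingRequest` stub are hypothetical and exist only to keep the example self-contained. Note that Cache-Control governs cached responses, not persistent connections, so this may not address the connection reuse suspected in comment:8.

```python
def disable_response_caching(request):
    """Ask clients and proxies to revalidate instead of reusing this response.

    `request` is assumed to offer setHeader(name, value), as Twisted's
    web request objects do.
    """
    request.setHeader("Cache-Control", "no-cache")


class RecordingRequest:
    """Hypothetical stub that records headers, for trying the helper out."""

    def __init__(self):
        self.headers = {}

    def setHeader(self, name, value):
        self.headers[name] = value


def demo_headers():
    req = RecordingRequest()
    disable_response_caching(req)
    return req.headers
```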

comment:10 Changed 3 years ago by dustin

hmm, I don't like the idea of disabling connection caching altogether just to fix this bug. If anything, this is a bug in twisted -- not terminating existing connections when the service is shut down.

Another solution may be to delay removing the old WebStatus object from the service hierarchy for some longish time like 5 minutes.

comment:11 Changed 3 years ago by dustin

  • Milestone changed from undecided to 0.7.10

let's see if we can fix this in 0.7.10, eh?

comment:12 Changed 3 years ago by dustin

  • Status changed from new to closed
  • Resolution set to fixed
commit 48a0947ad8e829963f9564ab27848a66230f381a
Author: Dustin J. Mitchell <dustin@zmanda.com>
Date:   Wed Feb 25 13:22:05 2009 -0500

    (refs #176) use buildmaster_service.master, not ..parent, so that cached web connections can still get reasonable info
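
The idea in this commit message can be illustrated with a minimal sketch. It is not the actual patch: `FakeMaster` and `FakeWebStatus` are hypothetical stand-ins, and the only assumption is that the web service keeps its own `master` reference, which is left in place when the service is detached from its parent.

```python
class FakeMaster:
    """Hypothetical stand-in for the buildmaster service."""

    def getStatus(self):
        return "status-ok"


class FakeWebStatus:
    """Hypothetical stand-in for the web status service."""

    def __init__(self):
        self.parent = None
        self.master = None

    def attach(self, parent):
        self.parent = parent
        self.master = parent  # separate reference, kept after detachment

    def detach(self):
        # Only the hierarchy link is cleared; self.master is left untouched.
        self.parent = None

    def getStatus(self):
        # Go through the retained master reference rather than self.parent.
        return self.master.getStatus()


def demo_cached_connection():
    web = FakeWebStatus()
    web.attach(FakeMaster())
    web.detach()  # reconfig happened, but a cached connection still uses `web`
    return web.getStatus()
```

In this sketch a request served by the detached object still gets a status back instead of raising `AttributeError`.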