Ticket #176 (closed defect: fixed)
'buildbot reconfig' causes WebStatus to give tracebacks for a while
| Reported by: | bhearsum | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 0.7.10 |
| Version: | 0.7.6 | Keywords: | |
| Cc: | dustin, thatch, ijon |
Description
After doing a reconfig, even one that doesn't change anything, WebStatus stops working for a few minutes. It then magically starts working again. There's nothing in the log to indicate how it recovered. Here's the traceback:
    File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/web/server.py", line 160, in process
      self.render(resrc)
    File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/web/server.py", line 167, in render
      body = resrc.render(self)
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 210, in render
      data = self.content(request)
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 245, in content
      data += self.fillTemplate(s.header, request)
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 239, in fillTemplate
      values['title'] = self.getTitle(request)
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/waterfall.py", line 417, in getTitle
      status = self.getStatus(request)
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 220, in getStatus
      return request.site.buildbot_service.getStatus()
    File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/baseweb.py", line 458, in getStatus
      return self.parent.getStatus()
    <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'getStatus'
Change History
comment:2 Changed 4 years ago by dustin
- Cc dustin added
The reconfig operation is pretty dark magic. It involves divorcing 'old' objects from the object graph, but if they are still in use (e.g., by the web), then problems will ensue. In this case, for example, the web service has been divorced from its parent service. I'm not sure there's a good fix for this problem.
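For readers unfamiliar with the failure mode, here is a minimal sketch of what "divorced from its parent" means in practice. It does not use Twisted or Buildbot; the `Service`, `Master`, and `WebStatus` classes and the `set_parent`/`disown_parent` methods are hypothetical stand-ins (Twisted's real service API uses `setServiceParent` and `disownServiceParent`). It only illustrates how a detached object that is still reachable ends up raising the `AttributeError` from the traceback.

```python
class Service:
    """Hypothetical stand-in for a service that has a parent link."""

    def __init__(self, name):
        self.name = name
        self.parent = None

    def set_parent(self, parent):
        self.parent = parent

    def disown_parent(self):
        # After this call the object is out of the service hierarchy,
        # but anything still holding a reference can keep calling it.
        self.parent = None


class Master(Service):
    def getStatus(self):
        return "status of %s" % self.name


class WebStatus(Service):
    def getStatus(self):
        # Same shape as the failing line in the traceback.
        return self.parent.getStatus()


def demo():
    master = Master("master")
    web = WebStatus("web")
    web.set_parent(master)
    before = web.getStatus()

    # A reconfig detaches the old WebStatus while a client still uses it.
    web.disown_parent()
    try:
        web.getStatus()
    except AttributeError as exc:
        return before, str(exc)
    return before, None
```

Calling `demo()` returns the status string obtained before the detach, plus the message of the `AttributeError` raised afterwards.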
comment:5 Changed 4 years ago by bhearsum
I've now noticed that if I reload a ton of times (by holding down the keyboard shortcut for 'reload') - it comes back immediately.
comment:7 Changed 4 years ago by dbailey
I get this relatively consistently.
The most recent occurrence was when I updated the master.cfg file to change the FileUpload step on the 3 builders defined to use a WithProperties to set the filename.
The only solution in most of the cases I encounter is to completely restart the buildbot master.
comment:8 Changed 3 years ago by dustin
I see this too. My theory is that my browser is using HTTP/1.1 with connection caching, and I'm still connected to the old status object. I'm not sure there's a good solution to this.
comment:9 Changed 3 years ago by dbailey
A Cache-Control directive may solve the problem.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html Look for 14.9
I haven't checked to see if the necessary HTTP headers can be set by buildbot, but since it's using twisted for its own web server, I'm assuming it should be possible.
I haven't read the options in detail to see if there is a nice option to inform browsers that they should ignore any cached output prior to a given time/date (i.e., update that value after any reconfig).
The alternative is to request the browser to disable caching.
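As a rough sketch of what asking the browser not to cache could look like: Twisted's request objects have a `setHeader(name, value)` method, so a resource could set the headers before rendering. The `add_no_cache_headers` helper and the `FakeRequest` class below are hypothetical and exist only so the example runs without Twisted installed. Note that these headers govern caching of responses, not reuse of an open HTTP/1.1 connection.

```python
def add_no_cache_headers(request):
    """Ask clients and proxies not to reuse a stored response.

    `request` is assumed to offer setHeader(name, value), as Twisted's
    web request objects do.
    """
    request.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
    request.setHeader("Pragma", "no-cache")  # for HTTP/1.0 clients
    return request


class FakeRequest:
    """Hypothetical stand-in for a Twisted request, for illustration only."""

    def __init__(self):
        self.headers = {}

    def setHeader(self, name, value):
        self.headers[name.lower()] = value
```

In a real resource, the helper would be called at the top of `render()` with the incoming request.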
comment:10 Changed 3 years ago by dustin
hmm, I don't like the idea of disabling connection caching altogether just to fix this bug. If anything, this is a bug in twisted -- not terminating existing connections when the service is shut down.
Another solution may be to delay removing the old WebStatus object from the service hierarchy for some longish time like 5 minutes.
comment:11 Changed 3 years ago by dustin
- Milestone changed from undecided to 0.7.10
let's see if we can fix this in 0.7.10, eh?
comment:12 Changed 3 years ago by dustin
- Status changed from new to closed
- Resolution set to fixed
commit 48a0947ad8e829963f9564ab27848a66230f381a
Author: Dustin J. Mitchell <dustin@zmanda.com>
Date: Wed Feb 25 13:22:05 2009 -0500
(refs #176) use buildmaster_service.master, not ..parent, so that cached web connections can still get reasonable info
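The diff itself is not shown in this ticket, so the following is only a sketch of the idea the commit message describes: look up status through a direct reference to the master instead of through the parent link that reconfig clears. The class and method names (`attach`, `detach`) are hypothetical, and whether the real code keeps the `master` reference after detaching in exactly this way is an assumption.

```python
class Master:
    def getStatus(self):
        return "master status"


class WebStatus:
    def __init__(self):
        self.parent = None
        self.master = None

    def attach(self, master):
        self.parent = master
        # Direct reference, held independently of the parent link.
        self.master = master

    def detach(self):
        # Assumption of this sketch: only the parent link is cleared.
        self.parent = None

    def getStatus(self):
        # Going through self.master keeps working for connections that
        # still point at this (old) WebStatus after a reconfig.
        return self.master.getStatus()
```

With this arrangement, a request served by the old object after `detach()` still gets a usable status instead of an `AttributeError`.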
It turns out that I can't consistently reproduce this. It only seems to happen with one of my Buildbots.