Opened 9 years ago

Closed 9 years ago

#957 closed defect (wontfix)

None.change_svc dereferenced in ConsoleStatusResource

Reported by: mhagger Owned by:
Priority: major Milestone: 0.8.+
Version: 0.8.1 Keywords: web reconfig
Cc: mhagger@…

Description

Trying to view the "console" web page sometimes results in the following error:

2010-08-10 15:22:49+0200 [HTTPChannel,8,192.168.100.37] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.5/site-packages/twisted/protocols/basic.py", line 231, in dataReceived
            why = self.lineReceived(line)
          File "/usr/lib/python2.5/site-packages/twisted/web/http.py", line 1067, in lineReceived
            self.allContentReceived()
          File "/usr/lib/python2.5/site-packages/twisted/web/http.py", line 1108, in allContentReceived
            req.requestReceived(command, path, version)
          File "/usr/lib/python2.5/site-packages/twisted/web/http.py", line 626, in requestReceived
            self.process()
        --- <exception caught here> ---
          File "/usr/lib/python2.5/site-packages/twisted/web/server.py", line 150, in process
            self.render(resrc)
          File "/usr/lib/python2.5/site-packages/twisted/web/server.py", line 157, in render
            body = resrc.render(self)
          File "/usr/lib/python2.5/site-packages/buildbot/status/web/base.py", line 230, in render
            data = self.content(request, ctx)
          File "/usr/lib/python2.5/site-packages/buildbot/status/web/console.py", line 613, in content
            source = self.getChangeManager(request)
          File "/usr/lib/python2.5/site-packages/buildbot/status/web/console.py", line 96, in getChangeManager
            return request.site.buildbot_service.parent.change_svc
        exceptions.AttributeError: 'NoneType' object has no attribute 'change_svc'

The error is not 100% reproducible. Reloading the page soon after the last page load sometimes results in an error but more often in a correct console page (maybe error 10% of the time?), even when there is no activity in our repository. It seems like errors are much more likely if there was a pause of more than 10 s or so since the page was last loaded. For example, the autorefresh of the console web page is quite likely to result in an error.

The buildmaster is running on a relatively quiet Linux VM under a Xen hypervisor on a physical computer that is sometimes busy. The repository is configured with two change sources: a SVNPoller and a PBChangeSource. If you need any more information about our configuration, please let me know.

Change History (12)

comment:1 follow-up: Changed 9 years ago by dustin

Is the buildmaster being reconfigured during this time? This sort of transient error often occurs during reconfiguration.

comment:2 in reply to: ↑ 1 Changed 9 years ago by mhagger

  • Cc mhagger@… added

Replying to dustin:

> Is the buildmaster being reconfigured during this time? This sort of transient error often occurs during reconfiguration.

No, the error occurred an hour after the last reconfiguration.

comment:3 Changed 9 years ago by dustin

  • Keywords web added
  • Milestone changed from undecided to 0.8.+

So buildbot_service.parent should be the buildmaster. I'm guessing that the web service is keeping an HTTP connection open to your client, and that is holding a reference to an old Site object.

We had some commits to try to fix this a while back, but they weren't perfect.

Can you turn off the HTTP connection keepalive somehow?
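
To make the suspected failure mode concrete, here is a minimal sketch in plain Python. It does not use Twisted or buildbot; `Master`, `WebStatus`, `Site`, `attach`, `detach` and `simulate_reconfig` are hypothetical stand-ins, and the assumption that a reconfig detaches the old web status service (leaving its `parent` as `None`) is taken from the guess above, not confirmed against the buildbot source.

```python
class Master:
    """Hypothetical stand-in for the buildmaster, which owns the change service."""

    def __init__(self):
        self.change_svc = "change manager"


class WebStatus:
    """Hypothetical stand-in for the web status service.

    `parent` is the master while the service is attached, and None otherwise.
    """

    def __init__(self):
        self.parent = None

    def attach(self, master):
        self.parent = master

    def detach(self):
        self.parent = None


class Site:
    """Hypothetical stand-in for the Site object that a connection keeps using."""

    def __init__(self, buildbot_service):
        self.buildbot_service = buildbot_service


def get_change_manager(site):
    # Same attribute chain as console.py line 96 in the traceback.
    return site.buildbot_service.parent.change_svc


def simulate_reconfig():
    """Return (old_site, new_site) as they would look after a reconfig."""
    master = Master()

    old_status = WebStatus()
    old_status.attach(master)
    old_site = Site(old_status)  # a pooled connection still refers to this

    # Assumed reconfig behaviour: the old service is detached and a new one
    # is attached, but the open connection is not told about the new Site.
    old_status.detach()
    new_status = WebStatus()
    new_status.attach(master)
    new_site = Site(new_status)

    return old_site, new_site


if __name__ == "__main__":
    old_site, new_site = simulate_reconfig()
    print(get_change_manager(new_site))  # request on a fresh connection
    try:
        get_change_manager(old_site)  # request on a connection opened earlier
    except AttributeError as exc:
        print("stale Site:", exc)
```

If this is what is happening, only requests arriving on connections opened before the reconfig would fail, which would fit the intermittent errors in the report.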

comment:4 Changed 9 years ago by mhagger

I just set Firefox's network.http.keep-alive setting to false (I suppose that this does what you mean), but the error is still there.

I should mention that the buildbot website is being accessed over https, in case that is relevant.

comment:5 Changed 9 years ago by mhagger

When I saw the errors, I had been reconfiguring a running buildmaster (kill -HUP). I just stopped and restarted the buildmaster, and so far I haven't seen the error again. So maybe the bug is triggered by a problem with reconfiguration.

comment:6 Changed 9 years ago by dustin

I'm certain it's reconfiguration-related. The question is, how is that lasting so long? If you're accessing over SSL, then are you using a frontend proxy?

If you're curious, this was the fix we applied here -- you can see what a hack it is.

4a01b0f2db29f5a5a282206ae8ff3e11fee2ad85

comment:7 Changed 9 years ago by mhagger

I'm no Apache wizard, but here is a snippet from the config that is hopefully illuminating:

<VirtualHost 192.168.100.37:443>
        [...]
        SSLEngine On
        SSLCACertificateFile /etc/apache2/ssl/XXXX.crt
        SSLCertificateFile /etc/apache2/ssl/YYYY.crt
        SSLCertificateKeyFile /etc/apache2/ssl/YYYY.key

        ProxyRequests Off
        ProxyPass / http://192.168.100.37:8010/
        ProxyPassReverse / http://192.168.100.37:8010/

        <Proxy *>
          Order deny,allow
          Allow from all
        </Proxy>

        [...]
</VirtualHost>

comment:8 Changed 9 years ago by dustin

Ah, check out:

http://httpd.apache.org/docs/current/mod/mod_proxy.html#proxypass

In particular, it looks like Apache will keep pooled connections, presumably indefinitely. You probably want to add disablereuse=On to your ProxyPass directive. It doesn't look like there's a way to convince Apache to close the connections after some configurable timeout, but I may be missing something in the potential Apache config.

I realize that this does not really "solve" the underlying problem in buildbot, but I'm not entirely convinced that it can be solved -- at least not without a major redesign. So hopefully this fixes the problem for you?
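
As a sketch of that suggestion, the proxy lines from comment:7 might look like this with connection reuse disabled. This is untested against the reporter's setup; the address and port are copied from the posted snippet, and the rest of the VirtualHost is assumed to stay unchanged.

```apache
# Sketch only: proxy lines from comment:7 with backend connection reuse disabled.
ProxyRequests Off
# disablereuse=On asks mod_proxy to close the backend connection after each
# request instead of returning it to the pool.
ProxyPass / http://192.168.100.37:8010/ disablereuse=On
ProxyPassReverse / http://192.168.100.37:8010/
```

The parameter goes on ProxyPass only; ProxyPassReverse just rewrites response headers and does not take connection parameters. The likely cost is a new connection to the buildmaster for every proxied request.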

comment:9 Changed 9 years ago by dustin

  • Keywords reconfig added

comment:10 Changed 9 years ago by mhagger

I can definitely live with the status quo. Feel free to close the bug.

But maybe first one last thing: it would be helpful if the documentation would warn that "buildbot reconfig" is a bit fragile (maybe there is such a warning already, but I didn't see one). For example, the "Loading the Config File" chapter would be a good place to mention it. I'd submit a doc patch myself, but I don't understand the details well enough to explain the situation.

Thanks for the help!

comment:11 Changed 9 years ago by Dustin J. Mitchell

Mention weaknesses in 'buildbot reconfig'

Refs #957

Changeset: df9054012ae0161889399b4cb4245963e2059868

comment:12 Changed 9 years ago by mhagger

  • Resolution set to wontfix
  • Status changed from new to closed

Thanks!
