Ticket #102 (closed defect: fixed)
reconfig buildbot, then 'reload' in browser causes exception
Reported by: warner
The new WebStatus page gets rebuilt on each reconfig (because it's too hard to figure out whether it has changed or not). This has the unfortunate side effect of exposing a problem with cached (persistent) connections: any browser that was talking to the WebStatus before the reconfig will continue to talk to the *old* WebStatus and won't see the new one. The problem persists until either the server times out the persistent connection (twisted.web does this after 12 hours) or the browser decides to drop the connection on its own (after 2 to 5 minutes, in my experiments).
To deal with this, I'm adding some code to the web-page classes to keep track of all the HTTPChannels that have been used by a given WebStatus object (using a WeakKeyDictionary). When the WebStatus shuts down (because it's being replaced by a new one), it goes through the list and kills those connections first.
The rest of this ticket holds my random notes on this topic.
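A minimal sketch of that bookkeeping follows. It assumes hypothetical stand-in classes so it runs without Twisted: `FakeChannel` and `FakeTransport` substitute for twisted.web's HTTPChannel and its transport, and `ChannelTrackingStatus` stands in for WebStatus. It is an illustration, not buildbot's actual code. Each page would call `track()` with the request's channel, and `stopService()` closes whichever tracked channels are still alive.

```python
import weakref


class FakeTransport:
    """Hypothetical stand-in for a Twisted transport; records whether it was closed."""

    def __init__(self):
        self.disconnected = False

    def loseConnection(self):
        self.disconnected = True


class FakeChannel:
    """Hypothetical stand-in for twisted.web.http.HTTPChannel."""

    def __init__(self):
        self.transport = FakeTransport()


class ChannelTrackingStatus:
    """Hypothetical WebStatus-like object that remembers which channels used it."""

    def __init__(self):
        # Weak keys: a channel that closes on its own and is garbage-collected
        # simply drops out of this dictionary.
        self.channels = weakref.WeakKeyDictionary()

    def track(self, channel):
        # Assumed to be called from each page's render(), with request.channel.
        self.channels[channel] = 1

    def stopService(self):
        # Sever every cached connection still pointing at this (old) object,
        # so the browser has to reconnect and reach the new WebStatus.
        for channel in list(self.channels.keys()):
            channel.transport.loseConnection()


status = ChannelTrackingStatus()
kept, dropped = FakeChannel(), FakeChannel()
status.track(kept)
status.track(dropped)
dropped_transport = dropped.transport
del dropped  # the browser already closed this one; CPython reclaims it at once
status.stopService()
```

Using weak keys means a connection the browser has already dropped disappears from the dictionary by itself, so the old WebStatus never keeps a dead channel alive just to close it later.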
Browsers cache connections, so if we've recently reloaded the config file, a browser might still be talking to the previous Site. That works for some things, but breaks when a page tries to reach through our now-empty .parent attribute (usually via HtmlResource.getStatus(), which does request.site.buildbot_service.parent). The result is a big ugly exception on pretty much any page in the following sequence: the browser hits a buildbot page, then the buildbot is reconfigured, then the user tells the browser to reload (or visits another page on the same buildmaster). The fact that we create a new WebStatus instance on every reconfig (not just those that modify the WebStatus parameters) makes this even worse.
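The failure can be reproduced without Twisted. In this sketch the classes are hypothetical stand-ins, and `get_status()` assumes the parent exposes a `getStatus()` method reached through `request.site.buildbot_service.parent`. Setting `.parent` to None (as Twisted's `disownServiceParent()` does when the old WebStatus is removed) makes the next request on the cached connection fail.

```python
class FakeMaster:
    """Hypothetical stand-in for the buildmaster that owns the WebStatus."""

    def getStatus(self):
        return "status object"


class FakeService:
    """Hypothetical stand-in for the WebStatus service."""

    def __init__(self, parent):
        self.parent = parent


class FakeSite:
    def __init__(self, service):
        self.buildbot_service = service


class FakeRequest:
    def __init__(self, site):
        self.site = site


def get_status(request):
    # Mirrors the lookup chain described for HtmlResource.getStatus().
    return request.site.buildbot_service.parent.getStatus()


service = FakeService(FakeMaster())
request = FakeRequest(FakeSite(service))  # arrives on a cached connection
before = get_status(request)

service.parent = None  # reconfig: the old WebStatus is detached from the master

try:
    get_status(request)
    after = "ok"
except AttributeError:
    # 'NoneType' object has no attribute 'getStatus'
    after = "AttributeError"
```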
This is most annoying when you're hacking on your config and want to see the changes you've just made.
The connection will be kept open until either the server or the browser decides to close it. My copy of Firefox appears to keep it alive for about two minutes. The twisted.web.server.Site (an HTTPFactory subclass) sets a server-side timeout, which drops the connection if it has been up for more than 12 hours.
Unfortunately, the factory doesn't keep a reference to the HTTPChannels that it creates, so we don't have anything to track down and break at reconfig time (unless we were willing to use gc.get_referrers() on the WebStatus.site that we just removed, and I'm not).
I can think of the following ways to deal with this:
- keep the old .parent link alive, allowing the cached connection to continue to work. However, if the reconfig changed the WebStatus, the browser will continue to show the old behavior, which will be confusing and annoying.
- use gc.get_referrers() on the old WebStatus.site to find all the HTTPChannels that refer to it, and force them to shut down their connections. Not likely.
- subclass HTTPChannel and override checkPersistence() to disable persistent connections entirely. Seems heavy-handed.
- in render(), set request.channel.persistent = False. Also heavy-handed: I'm OK with persistence, as long as it stops when we shut down the WebStatus.
- lower the HTTPFactory timeout from 12 hours to something like 30 seconds. The timeout must be longer than it takes to render any actual page, since it will sever the connection even while the connection is still in use by page rendering.
- restart the web browser
- restart the buildmaster
I'm going to use weakrefs to let the WebStatus keep track of all the channels that are still open, and then have its stopService() method shut them all down.