Ticket #2272 (closed defect: worksforme)
Eviction of BuildStatus objects
Reported by: szager | Owned by:
The memory usage of our buildbot master seems to grow linearly with time (which corresponds pretty well to the number of builds). There doesn't seem to be any limit to this growth: if we don't restart the master for a long time, it grows until the OS starts swapping, which effectively kills the master and forces a restart. That happens once the master process is using more than 10GB on a 12GB system.
I inspected all the live instances of buildbot.status.builder.BuilderStatus and looked at their buildCache fields. For most builders, len(builder_status.buildCache.cache) equals the maximum allowed size of the cache, but len(builder_status.buildCache.weakrefs) is typically five times that size!
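To see why the gap between those two numbers matters, here is a minimal sketch of a cache with the same two-level shape. It assumes (this is not taken from buildbot's source) that `cache` holds strong references to the most recently used entries and `weakrefs` holds a weak reference to every entry that is still alive anywhere in the process. `HypotheticalBuildCache` and `FakeBuild` are illustrative names only. An entry that falls out of `cache` should vanish from `weakrefs` as soon as nothing else refers to it, so a `weakrefs` map several times larger than `cache` means something outside the cache is keeping those objects alive.

```python
import weakref
from collections import OrderedDict


class FakeBuild:
    """Hypothetical stand-in for a build status object."""

    def __init__(self, number):
        self.number = number


class HypotheticalBuildCache:
    """Strong LRU cache of max_size entries plus weak refs to every live entry."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.cache = OrderedDict()                     # strong refs, LRU order
        self.weakrefs = weakref.WeakValueDictionary()  # weak refs to all live entries

    def add(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        self.weakrefs[key] = value
        # Evict least recently used entries beyond max_size; they remain in
        # `weakrefs` only while something else still references them.
        while len(self.cache) > self.max_size:
            self.cache.popitem(last=False)


cache = HypotheticalBuildCache(max_size=2)
leaked = []  # simulates an outside reference that keeps old builds alive
for n in range(10):
    build = FakeBuild(n)
    cache.add(n, build)
    if n % 2 == 0:
        leaked.append(build)
del build

print(len(cache.cache), len(cache.weakrefs))  # prints "2 6"
```

Here the extra references come from an explicit list, so clearing `leaked` shrinks `weakrefs` back to the two cached entries; in this ticket the suspected source of the extra references is reference cycles among status objects.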
So I used a heap profiler to see where references to BuildStatus objects were being held. As far as I can tell, circular references between BuildStatus, BuildStepStatus, and LogFile (and maybe some other status-related objects) are the issue here. Theoretically, Python's garbage collector should be able to detect reference cycles among unreachable objects and clean them up, but in practice it is not a complete safety net: a cycle is only reclaimed when a collection of the right generation runs, and in Python 2 cycles containing objects with __del__ methods are never freed at all (they end up in gc.garbage).
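A small sketch of the kind of parent/child cycle described above, using hypothetical `Build` and `Step` classes rather than buildbot's real ones. With the automatic collector disabled, dropping the last outside reference does not free the build, because reference counting alone cannot reclaim a cycle; an explicit `gc.collect()` does. This illustrates that a cycle at best delays reclamation until the cycle collector runs.

```python
import gc
import weakref


class Build:
    """Hypothetical parent object holding its steps."""

    def __init__(self, number):
        self.number = number
        self.steps = []


class Step:
    """Hypothetical child object with a strong pointer back to its parent."""

    def __init__(self, build):
        self.build = build          # strong parent pointer -> reference cycle
        build.steps.append(self)


gc.disable()  # keep the automatic collector from running mid-demo
try:
    build = Build(1)
    Step(build)
    probe = weakref.ref(build)  # lets us observe whether the Build is alive

    del build
    alive_before_collect = probe() is not None  # the cycle keeps Build alive

    gc.collect()
    alive_after_collect = probe() is not None   # the cycle collector freed it
finally:
    gc.enable()

print(alive_before_collect, alive_after_collect)  # prints "True False"
```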
First of all, I'd like to know whether this all sounds plausible.
Secondly, I'd like to hear opinions on a proposed fix, which is to use weakrefs for all the parent pointers in status objects; e.g., in buildbot/status/buildstep.py:
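    import weakref

    # BuildStepStatus methods
    def __init__(self, parent, master, step_number):
        # Weak reference to the parent build, so a step does not keep it alive
        self.build = weakref.ref(parent)
        # Enough information to reload the build if it has been evicted
        self.builder = parent.getBuilder()
        self.build_number = parent.getNumber()
        ...

    def getBuildStatus(self):
        result = self.build()
        if result is None:
            # The build was garbage-collected; reload it via the builder
            result = self.builder.getBuildByNumber(self.build_number)
            self.build = weakref.ref(result)
        return result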
And, obviously, every use of stepstatus.build would need to be changed to stepstatus.getBuildStatus().
- Keywords performance added
- Type changed from undecided to defect
- Milestone changed from undecided to 0.8.7