Ticket #2272 (closed defect: worksforme)
Eviction of BuildStatus objects
| Reported by: | szager | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 0.8.7 |
| Version: | 0.8.6p1 | Keywords: | performance |
| Cc: |
Description
The memory usage of our buildbot master seems to grow linearly with time (which corresponds pretty well to number of builds). There doesn't seem to be any limit to this behavior; if we don't restart the master for a long time, it will grow until the OS starts swapping, which effectively kills the master and forces a restart. That happens when the master process is using >10GB on a 12GB system.
I've done a bit of heap profiling on a running master, and it seems like there are *way* too many BuildStatus?, BuildStepStatus?, StepProgress?, and LogFile? objects in memory.
I inspected all the live instances of buildbot.status.builder.BuilderStatus? and looked at the buildCache fields. What I noticed is that for most builders, len(builder_status.buildCache.cache) is equal to the maximum allowed size of the cache, but len(builder_status.buildCache.weakrefs) is typically five times the maximum allowed size of the cache!
So, I used the heap profiler to see where references to BuildStatus? objects were being held. From what I can tell, it seems like circular references between BuildStatus?, BuildStepStatus?, and LogFile? (and maybe some other status-related objects) are the issue here. Theoretically, the python garbage collector should be able to detect circular references between unreachable objects and clean them up; but it's not 100% efficient (for reasons of algorithmic complexity).
First of all, I'd like to know whether this all sounds plausible.
Secondly, I'd like to hear opinions on a proposed fix, which is to use weakrefs for all the parent pointers in status objects; e.g., in buildbot.status.buildstep.py:
class BuildStepStatus?(styles.Versioned):
def init(self, parent, master, step_number):
self.build = weakref.ref(parent) self.builder = parent.getBuilder() self.build_number = parent.getNumber() ...
def getBuildStatus(self):
result = self.build() if result is None:
result = self.builder.getBuildByNumber(self.build_number) self.build = weakref.ref(result)
return result
And, obviously, all usage of stepstatus.build would need to be changed to stepstatus.getBuildStatus().
Thoughts?
Thanks,
Stefan
Change History
comment:2 Changed 14 months ago by dustin
- Keywords performance added
- Type changed from undecided to defect
- Milestone changed from undecided to 0.8.7
That sounds like a good solution to me!
comment:3 Changed 12 months ago by dustin
Have you determined that this fix eliminates the circular references? Python's GC really only gets into trouble when one of the objects in a cycle has a __del__ method -- which pb.Referencable objects do. None of the classes you implicated above are referencable, but perhaps that wasn't a complete list of the cycle?
comment:4 Changed 12 months ago by szager
In hindsight, I believe the circular references are a red herring. The real fix is this one:
commit ff8480d1fb3aa2330d89bdc4bca874e6c6775a75 Author: Stefan Zager <szager@…> Date: Tue May 1 15:05:37 2012 -0700
Don't cache json build status objects; they are never cleaned up.
This change can be discarded.
Thanks,
Stefan
![[Buildbot Logo]](/chrome/site/header-text-transparent.png)
The code snippet came out a bit garbled I think; here it is again:
class BuildStepStatus(styles.Versioned): def __init__(self, parent, master, step_number): self.build = weakref.ref(parent) self.builder = parent.getBuilder() self.build_number = parent.getNumber() def getBuildStatus(self): result = self.build() if result is None: result = self.builder.getBuildByNumber(self.build_number) self.build = weakref.ref(result) return result