Ticket #2272 (closed defect: worksforme)

Opened 14 months ago

Last modified 12 months ago

Eviction of BuildStatus objects

Reported by: szager Owned by:
Priority: major Milestone: 0.8.7
Version: 0.8.6p1 Keywords: performance
Cc:

Description

The memory usage of our buildbot master seems to grow linearly with time (which corresponds pretty well to number of builds). There doesn't seem to be any limit to this behavior; if we don't restart the master for a long time, it will grow until the OS starts swapping, which effectively kills the master and forces a restart. That happens when the master process is using >10GB on a 12GB system.

I've done a bit of heap profiling on a running master, and it seems like there are *way* too many BuildStatus?, BuildStepStatus?, StepProgress?, and LogFile? objects in memory.

I inspected all the live instances of buildbot.status.builder.BuilderStatus? and looked at the buildCache fields. What I noticed is that for most builders, len(builder_status.buildCache.cache) is equal to the maximum allowed size of the cache, but len(builder_status.buildCache.weakrefs) is typically five times the maximum allowed size of the cache!

So, I used the heap profiler to see where references to BuildStatus? objects were being held. From what I can tell, it seems like circular references between BuildStatus?, BuildStepStatus?, and LogFile? (and maybe some other status-related objects) are the issue here. Theoretically, the python garbage collector should be able to detect circular references between unreachable objects and clean them up; but it's not 100% efficient (for reasons of algorithmic complexity).

First of all, I'd like to know whether this all sounds plausible.

Secondly, I'd like to hear opinions on a proposed fix, which is to use weakrefs for all the parent pointers in status objects; e.g., in buildbot.status.buildstep.py:

class BuildStepStatus?(styles.Versioned):

def init(self, parent, master, step_number):

self.build = weakref.ref(parent) self.builder = parent.getBuilder() self.build_number = parent.getNumber() ...

def getBuildStatus(self):

result = self.build() if result is None:

result = self.builder.getBuildByNumber(self.build_number) self.build = weakref.ref(result)

return result

And, obviously, all usage of stepstatus.build would need to be changed to stepstatus.getBuildStatus().

Thoughts?

Thanks,

Stefan

Change History

comment:1 Changed 14 months ago by szager

The code snippet came out a bit garbled I think; here it is again:

class BuildStepStatus(styles.Versioned):
  def __init__(self, parent, master, step_number):
    self.build = weakref.ref(parent)
    self.builder = parent.getBuilder()
    self.build_number = parent.getNumber()

  def getBuildStatus(self):
    result = self.build()
    if result is None:
      result = self.builder.getBuildByNumber(self.build_number)
      self.build = weakref.ref(result)
    return result

comment:2 Changed 14 months ago by dustin

  • Keywords performance added
  • Type changed from undecided to defect
  • Milestone changed from undecided to 0.8.7

That sounds like a good solution to me!

comment:3 Changed 12 months ago by dustin

Have you determined that this fix eliminates the circular references? Python's GC really only gets into trouble when one of the objects in a cycle has a __del__ method -- which pb.Referencable objects do. None of the classes you implicated above are referencable, but perhaps that wasn't a complete list of the cycle?

comment:4 Changed 12 months ago by szager

In hindsight, I believe the circular references are a red herring. The real fix is this one:

commit ff8480d1fb3aa2330d89bdc4bca874e6c6775a75 Author: Stefan Zager <szager@…> Date: Tue May 1 15:05:37 2012 -0700

Don't cache json build status objects; they are never cleaned up.

This change can be discarded.

Thanks,

Stefan

comment:5 Changed 12 months ago by dustin

  • Status changed from new to closed
  • Resolution set to worksforme
Note: See TracTickets for help on using tickets.