Opened 3 years ago

Last modified 3 years ago

#3272 new defect

Buildbot start-up time starts to tip over once there are logs for ~3000 builds in a builder (especially if a build has lots of steps)

Reported by: vlovich Owned by:
Priority: major Milestone: 0.8.x
Version: 0.8.10 Keywords: performance
Cc:

Description

The algorithm for determineNextBuildNumber is O(MN), where M is the number of builds and N is the number of steps per build, because every file in the builder's directory is examined. Even with only 3439 builds there are 108556 files in a builder's directory (~32 files per build).
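
To illustrate why the cost follows the total file count, here is a minimal sketch of a flat-directory scan. It is not Buildbot's actual code: the function name and the assumed file naming (a bare build number, or a build number followed by "-" and a suffix) are hypothetical.

```python
import os
import re

# Assumed naming: "<build number>" or "<build number>-<anything>".
_BUILD_FILE = re.compile(r"^(\d+)(?:-.*)?$")


def next_build_number_flat(builder_dir):
    """Return the highest build number found in builder_dir, plus one."""
    highest = -1
    # listdir returns every entry, so this loop runs once per file:
    # roughly M builds times N files per build.
    for name in os.listdir(builder_dir):
        match = _BUILD_FILE.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest + 1
```

With ~32 files per build, the loop body runs about 32 times per build even though only one number per build matters.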

Change History (7)

comment:1 Changed 3 years ago by vlovich

First, if each build number were a folder that contains the build's other files, that would cut this down to an O(M) problem.

If the next build number were stored in the DB, it would be O(1) in the common case. On startup, if the build with that number already exists, search up until you find the next free one. If it doesn't, search down until you find the last build that does exist.
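
The search-up/search-down idea could look roughly like the sketch below. The stored hint and the `build_exists` callback are assumptions for illustration; neither is an existing Buildbot API.

```python
def next_build_number_from_hint(hint, build_exists):
    """Find the next free build number starting from a stored hint.

    hint: next build number recorded earlier (hypothetical DB value).
    build_exists: callable taking an int, True if that build is on disk.
    """
    n = max(hint, 0)
    if build_exists(n):
        # The hint is stale: builds were written after it was saved.
        while build_exists(n):
            n += 1
        return n
    # The hint may be too high: walk down to just above the newest build.
    while n > 0 and not build_exists(n - 1):
        n -= 1
    return n
```

When the hint is accurate this costs two existence checks. Note that the upward search stops at the first missing number, so gaps in the numbering would need separate handling.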

Even if the number were to be recorded in the DB though there would still need to be a cleanup to provide some more structure. To that end, I would recommend that build logs are re-organized to live under: YYYYMM/DD/Build #/...

That will keep the number of files in a given directory much more manageable. It would probably be necessary to keep some information in the DB to get at the YYYYMM/DD information given a build #.
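
As a sketch of the proposed layout, the path for a build could be derived from its start date as follows. The function name and the choice of the start date as the key are assumptions.

```python
import datetime
import os


def dated_build_dir(builder_dir, build_number, started):
    """Return builder_dir/YYYYMM/DD/<build number> for a build.

    started: a datetime.date or datetime.datetime (assumed to be the
    build's start time, which would have to be looked up from the DB).
    """
    return os.path.join(
        builder_dir,
        started.strftime("%Y%m"),  # e.g. "201506"
        started.strftime("%d"),    # e.g. "03"
        str(build_number),
    )
```

For example, `dated_build_dir("builder", 42, datetime.date(2015, 6, 3))` yields `builder/201506/03/42` on POSIX systems.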

Alternatively, it might be possible to get the improvement with a simpler change: have some kind of nesting value (probably configurable, but defaulting to something big like 5000 or 10000) that controls how many builds live at a given level.

Even if only one level is implemented, at 10,000 builds per level you're looking at 100 million builds (10,000 directories of 10,000 builds each) with logfiles still around before you have to support more levels (or re-architect again).
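
A minimal sketch of one level of nesting, assuming a hypothetical `builds_per_dir` setting and per-build directories:

```python
import os


def nested_build_dir(builder_dir, build_number, builds_per_dir=10000):
    """Return builder_dir/<bucket>/<build number>.

    Builds 0..9999 land in bucket 0, 10000..19999 in bucket 1, and so on,
    so no directory holds more than builds_per_dir build entries.
    """
    bucket = build_number // builds_per_dir
    return os.path.join(builder_dir, str(bucket), str(build_number))
```

If the top level is also capped at 10,000 buckets, the capacity is 10,000 × 10,000 = 100,000,000 builds.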

comment:2 Changed 3 years ago by vlovich

Another idea: perhaps you can just take a build number, increment it manually, and do an isdir() on each candidate. That might make it O(M) too. I don't know how big a concern gaps in the logfiles are.

You'd have to add a dependency on the scandir module though, since there's no other efficient mechanism in Python to iterate incrementally through a directory.
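
For reference, the same interface has been available in the standard library as `os.scandir` since Python 3.5. The sketch below streams directory entries without building the full list first. It assumes a hypothetical layout with one directory per build, named by its number.

```python
import os


def highest_build_dir(builder_dir):
    """Return the highest numeric directory name, or -1 if there is none."""
    highest = -1
    # scandir yields entries lazily; the with-form needs Python 3.6+.
    with os.scandir(builder_dir) as entries:
        for entry in entries:
            # is_dir() can usually be answered from the directory listing
            # itself, without an extra stat call per entry.
            if entry.name.isdecimal() and entry.is_dir():
                highest = max(highest, int(entry.name))
    return highest
```

This still visits every entry once, so it only helps if the directory holds one entry per build rather than one per log file.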

comment:3 Changed 3 years ago by vlovich

I don't think this last idea would work though. My directory has lots of gaps for whatever reason & I haven't even removed anything manually so that means it's likely buildbot creates those gaps.

comment:4 Changed 3 years ago by dustin

  • Milestone changed from undecided to 0.8.x

comment:5 Changed 3 years ago by jaredgrubb

Yes, it does seem that a flat directory does not scale well.

comment:6 Changed 3 years ago by tardyp

I had the same problem in my prod. I made a quick hack to fix the issue, which I did not upstream. I prefer working on a long-term solution with buildbot nine.

There are several scaling limitations on eight.

  • It stores a lot of files in the same directory (all files related to a builder are stored in one dir, which is problematic on several filesystems)

A good solution for this is to store them in multiple directories (encoded with the lower bits of the build number).

  • determineNextBuildNumber is fundamentally flawed, as it parses all the build files at startup. A good solution for this would be to store the number either in the builder pickle itself, or in its own file (so that one can override the next build number easily)
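
The "own file" variant could be sketched as below. The file name `next_build_number` and both function names are hypothetical; a caller would fall back to scanning the directory when the file is missing or unreadable.

```python
import os


def read_next_build_number(builder_dir, filename="next_build_number"):
    """Return the stored number, or None if the file is missing or invalid."""
    path = os.path.join(builder_dir, filename)
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def write_next_build_number(builder_dir, number, filename="next_build_number"):
    """Store the number, replacing any previous value in one rename."""
    path = os.path.join(builder_dir, filename)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write("%d\n" % number)
    # Rename over the old file so readers never see a half-written value.
    os.replace(tmp, path)
```

Because the value lives in a plain text file, an administrator could override the next build number by editing it while the master is stopped.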

comment:7 Changed 3 years ago by vlovich

What does the roadmap for 0.9 look like? Even if it were made available next week, I wouldn't be able to upgrade right away & would have to validate & fix any incompatibility issues.

Would it be possible to make some improvements to this issue in 0.8? I guess there are really two: 100k files in a directory is one and the slowness of determineNextBuildNumber is the other.
