Ticket #26 (closed enhancement: fixed)

Opened 6 years ago

Last modified 4 years ago

compress buildstep logs

Reported by: warner Owned by: Pike
Priority: minor Milestone: 0.7.10
Version: Keywords:
Cc: s.lipnevich@…, pw@…, dustin, Pike

Description

Darragh Bailey suggests that the buildstep logfiles could be compressed on disk to good effect. This is probably not too difficult to implement: basically in LogFile.init we'd use gzip.open() instead of the regular open(). The tricky part would be compatibility with existing logfiles: this might require a special extension for the new compressed logs, which would require checking for both kinds of filenames when looking for logs.

Attachments

buildbot-gzip.patch Download (2.6 KB) - added by slinkp 5 years ago.
first attempt at gzip support patch
buildbot-gzip2.patch Download (3.7 KB) - added by slinkp 5 years ago.
second attempt, more complex, still many test failures

Change History

comment:1 follow-up: ↓ 2 Changed 6 years ago by warner

Darragh had another good idea: just compress the logfile after it is closed, rather than trying to do something incremental. When looking for a logfile, take the .gz form if present, otherwise fall back to the non-.gz form.

There might be a tricky bit w.r.t. windows buildmasters, if someone is trying to read from the status file while we're writing the compressed form, but I have a feeling that this problem either doesn't actually exist or is completely masked by some other problem.

comment:2 in reply to: ↑ 1 Changed 6 years ago by bear

Replying to warner:

Darragh had another good idea: just compress the logfile after it is closed, rather than trying to do something incremental. When looking for a logfile, take the .gz form if present, otherwise fall back to the non-.gz form.

There might be a tricky bit w.r.t. windows buildmasters, if someone is trying to read from the status file while we're writing the compressed form, but I have a feeling that this problem either doesn't actually exist or is completely masked by some other problem.

Could this be as simple as changing line 236 to read:

self.openfile = gzip.open(fn, 'w+')

and line 269:

return gzip.open(self.getFilename(), "r")

within buildbot/status/builder.py ?

the gzip.open() returns a fully functional file handle so it should be able to replace the two existing open() calls.

comment:3 Changed 6 years ago by warner

possibly. If gzip.open() is given a non-gzipped file, does it fall back to behaving just like regular open()? (we need to keep working with old logfiles). If not, it might be sufficient to wrap the gzip.open()s in a try: block that falls back to regular open() in case of some hypothetical NotAGZipFileError exception.

Also, it would be nice if the files on disk were obviously compressed, but since they aren't entirely plain text that probably isn't too big of a deal.

There are two other open() calls, at lines 269 (LogFile?.getFile) and 450 (LogFile?.upgrade) that need the same treatment.

Could somebody see if the unit tests pass with this change? Note that we must still manually test the handle-old-non-compressed-logfiles properly, since there are no unit tests to verify this. If it works, we should write a unit test to check the backwards-compatibility (by creating some logfiles, closing them, sneaking around and uncompressing one of them, then reading their contents back).

comment:4 Changed 6 years ago by warner

  • Milestone set to 0.8.0

comment:5 Changed 6 years ago by warner

  • Milestone changed from 0.8.0 to 0.9.0

comment:6 Changed 5 years ago by sergey

  • Cc s.lipnevich@… added

comment:7 Changed 5 years ago by slinkp

  • Cc pw@… added

I would love to see this feature. gzip.open() claims to work on non-gzipped files, but you get an IOError when you call its read method. So you may want something like:

>>> def open2(fname):
...     f = gzip.open(fname)
...     try:
...          f.read(1)
...          f.seek(0)
...          return f
...     except IOError:
...          del f
...          return open(fname)
... 
>>> open2('stdio2.gz').read() == open2('stdio').read()
True

comment:8 Changed 5 years ago by dustin

  • Cc dustin added

comment:9 Changed 5 years ago by dustin

Let's someone post a patch so that others can improve on it..

comment:10 Changed 5 years ago by Pike

  • Cc Pike added

Changed 5 years ago by slinkp

first attempt at gzip support patch

comment:11 Changed 5 years ago by slinkp

Patch against current CVS attached; it doesn't work, lots of tests fail due to calls like f.seek(0, 2) which gzip files don't support.

I'm gonna leave it at that; anybody else, feel free to find a solution.

comment:12 follow-up: ↓ 15 Changed 5 years ago by slinkp

I played around with another patch, but there are two problems that look insurmountable without totally rethinking the logging system:

  • GzipFile? doesn't support seeking backward when opened in write mode; eg. you can never seek(0) if you've already seeked to any positive integer.
  • GzipFile? doesn't really support "w+" or "r+" mode; it pretends to, but in "w+" mode it raises errors if you call read(), and in "r+" mode it raises errors if you call write().

Changed 5 years ago by slinkp

second attempt, more complex, still many test failures

comment:13 Changed 5 years ago by slinkp

I should note that I have only python 2.4 handy. Is the gzip module any more compatible with normal file objects in 2.5?

comment:14 Changed 5 years ago by Pike

Here is what I was thinking:

Keep the logfile structure as it is, and just glue the chunks together and gzip or bz2 them, and to do that after the log is completed.

Not exactly sure if there's a trap to go into if the logs are actually already compressed data themselves, at that point, autodetecting by just trying wouldn't work.

Regarding python versions, I think that buildbot still aims at being compatible with python 2.3.

comment:15 in reply to: ↑ 12 Changed 5 years ago by bbl

Replying to slinkp:

I played around with another patch, but there are two problems that look insurmountable without totally rethinking the logging system:

  • GzipFile? doesn't support seeking backward when opened in write mode; eg. you can never seek(0) if you've already seeked to any positive integer.

This is obvious beacause seeking backwards and writing something to that place would force to recompress that whole file starting from the position you seeked to - but this seems not to be implemented in GzipFile? because it would probably force duplicating the file opened which is very ineffective

  • GzipFile? doesn't really support "w+" or "r+" mode; it pretends to, but in "w+" mode it raises errors if you call read(), and in "r+" mode it raises errors if you call write().

This also seems to be very ineffective to implement, too.

Therefore you should probably write the logs the way you do now and create a compressed version of them (and delete the uncompressed one) on closing the log...

Another idea is to save the compressed logs with .gz/.bz2 extension so that you can easily distinguish if a log is compressed or not because if not, as Pike said, in very rare cases, an uncompressed log might be recognised as a compressed log...

comment:16 Changed 4 years ago by Pike

  • Owner changed from warner to Pike

I have a branch for this on  http://github.com/Pike/buildbot/commits/bug26.

Not really sure yet on whether it's cool to compress all logs without limits, and whether to do that in one blocking operation post-facto.

Doing this during logging would be less blocking, but raises the normal problems of seek() not being good on compressed files.

The other way to do it would be to deferToThread the compression piece, but that makes it hard to test, as I had to add a callback, and there's no deferred handed through, and I think the impact would be less if I only compress after logFinished. At which point I kinda loose the observers :-/.

comment:17 Changed 4 years ago by Pike

The sporadic testfailure I mention in my check-in comment seems to happen on master, too, and I suspect it's a fixed twisted bug,  http://twistedmatrix.com/trac/ticket/2419, as I can't reproduce this on my 8.2.0 setup.

comment:18 Changed 4 years ago by dustin

Hmm, the sporadic test failure worries me, since we still support 2.5.0. Is it possible to make this feature disable itself with that version of twisted?

comment:19 Changed 4 years ago by dustin

  • Keywords review added

comment:20 Changed 4 years ago by Pike

  • Status changed from new to assigned

OK, branch seems to be good now, and ready for review.

comment:21 Changed 4 years ago by dustin

  • Keywords review removed
  • Status changed from assigned to closed
  • Resolution set to fixed

Lookin' good. The only thing I see is that in BuildMaster?, known_keys specifies logSizeLimit, while the key is logCompressionLimit. I fixed that and merged to master.

Hooray! This bug has been around for a long time. Thanks!

comment:22 Changed 4 years ago by dustin

  • Milestone changed from 0.9.0 to 0.7.10
Note: See TracTickets for help on using tickets.