Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#3101 closed defect (fixed)

exception while rendering logs

Reported by: sa2ajj Owned by: dustin
Priority: major Milestone: 0.9.0
Version: master Keywords: web
Cc:

Description

From twistd.log:

2014-12-08 18:26:39+0000 [-] while handling API request
        Traceback (most recent call last):
          File "/usr/home/bbmaster/master/lib/python2.7/site-packages/twisted/internet/defer.py", line 1097, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/usr/home/bbmaster/master/lib/python2.7/site-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
          File "/usr/home/bbmaster/buildbot/master/buildbot/www/rest.py", line 374, in renderRest
            request.write(data)
          File "/usr/local/lib/python2.7/contextlib.py", line 35, in __exit__
            self.gen.throw(type, value, traceback)
        --- <exception caught here> ---
          File "/usr/home/bbmaster/buildbot/master/buildbot/www/rest.py", line 111, in handleErrors
            yield
          File "/usr/home/bbmaster/buildbot/master/buildbot/www/rest.py", line 308, in renderRest
            data = yield ep.get(rspec, kwargs)
          File "/usr/home/bbmaster/master/lib/python2.7/site-packages/twisted/internet/defer.py", line 1097, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/usr/home/bbmaster/master/lib/python2.7/site-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
          File "/usr/home/bbmaster/buildbot/master/buildbot/data/logchunks.py", line 78, in get
            logid, firstline, lastline)
          File "/usr/home/bbmaster/master/lib/python2.7/site-packages/twisted/python/threadpool.py", line 196, in _worker
            result = context.call(ctx, function, *args, **kwargs)
          File "/usr/home/bbmaster/master/lib/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/usr/home/bbmaster/master/lib/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
            return func(*args,**kw)
          File "/usr/home/bbmaster/buildbot/master/buildbot/db/pool.py", line 191, in __thd
            rv = callable(arg, *args, **kwargs)
          File "/usr/home/bbmaster/buildbot/master/buildbot/db/logs.py", line 83, in thd
            idx = content.rindex('\n', 0, idx - 1)
        exceptions.ValueError: substring not found

The outcome seems to be 500 Internal Server Error by nginx.

Attachments (1)

3101.dump (31.8 KB) - added by dustin 4 years ago.
sqlite-dump.txt

Download all attachments as: .zip

Change History (11)

comment:1 Changed 4 years ago by Ben

That happened visiting http://nine.buildbot.net/#/builders/1/build/24/step/3/log/problems.

Browser shows: GET http://nine.buildbot.net/api/v2/logs/1572/contents?limit=16&offset=24 500 (Internal Server Error)

Response from server was: {"error": "ValueError('substring not found',)"}

comment:2 Changed 4 years ago by Ben

Trying to reproduce this, I wrote that test:

    @defer.inlineCallbacks
    def test_getLogLines_one_chunk_per_5_lines(self):
        yield self.insertTestData(self.backgroundData + [
            fakedb.Log(id=201, stepid=101, name=u'stdio', slug=u'stdio',
                       complete=0, num_lines=200, type=u's')] + [
            fakedb.LogChunk(logid=201, first_line=i, last_line=i+4, compressed=0,
                            content="chunk %d\n\n\n\n" % i) for i in range(0, 200, 5)
        ])
        self.assertEquals((yield self.db.logs.getLogLines(201, 42, 48)), u'\n\n\nchunk 45\n\n\n\n')

Which give different results on the real and fake db ...

EDIT: Fixed the off-by-one in the test case, results are now consistent.

Last edited 4 years ago by Ben (previous) (diff)

comment:3 Changed 4 years ago by Ben

We've been trying to pinpoint the cause of this.

Same trouble happens there: http://nine.buildbot.net/#/builders/7/build/19/step/4/log/problems and the log is smaller.

That is log #1488, the build page shows it having 210 lines. According to @sa2ajj, there are 211 lines in that log in the db, that logs ends with '\n\ (i.e. line #212 is empty.)

Same as been observed with the log #1470, this one has 11 lines according to the build page, and that one has 12 lines (the 13th is empty). Only it doesn't show the trouble as the chunk is smaller that the asked part of the logs.

Those logs are coming from the Trial step (buildbot.step.python_twisted). It's one big addCompletelog with a string that ends with '\n'. (Well, PR-1435 tries to address that, I hope it just doesn't removes the problem we have at hand ...)

comment:4 Changed 4 years ago by Ben

Logs in the db don't end with a '\n' (See https://github.com/buildbot/buildbot/blob/master/master/buildbot/db/logs.py#L107). So in order to have one actually ending with a '\n', the original chunk of text must have ended with two (i.e. an empty last line).

It is possible to download the full problematic logs via the 'download' button, which makes analyse a bit easier

According to the 'Trial' step, the 'problems' logs are a subset from the 'stdio' logs (the last part starting from '=' * 60), so we know pretty much what should be expected there.

The 'stdio' don't have a trailing '\n' (ends with an empty line), which make me think that the one in 'problems' doesn't belong there. I just don't know who added it / where does it comes from.

About the test-case above, the test case itself was wrong (thanks @dustin).

Changed 4 years ago by dustin

sqlite-dump.txt

comment:5 Changed 4 years ago by dustin

The attachment contains the relevant DB rows from comment 4.

comment:6 Changed 4 years ago by dustin

  • Owner set to dustin
  • Status changed from new to assigned

I wrote up a quick verification script:

import sqlalchemy as sa

db_url = "sqlite:////A/bbrun/state.sqlite"

def verify():
    eng = sa.create_engine(db_url)
    conn = eng.connect()

    for log_row in conn.execute("select id, num_lines, type from logs"):
        print "verifying log", log_row.id
        assert log_row['type'] in 'sth'

        line_ranges = []
        for logchunk_row in conn.execute("select first_line, last_line, content from logchunks where logid=%d" % log_row['id']):
            content, first, last = logchunk_row['content'], logchunk_row['first_line'], logchunk_row['last_line']
            assert last >= first
            assert str(content).count('\n') == last - first
            line_ranges.append((logchunk_row['first_line'], logchunk_row['last_line']))

        next_line = 0
        for f, l in sorted(line_ranges):
            assert f == next_line
            next_line = l + 1

        assert log_row['num_lines'] == next_line

if __name__ == "__main__":
    verify()

and all logs in the DB (including 1470 and 1488) pass without issue.

Last edited 4 years ago by dustin (previous) (diff)

comment:7 Changed 4 years ago by dustin

Clicking through various URLs like that in comment 1, I see that

etc. work just fine, but once you add limit=1 things go haywire. I think there's a bug in the Data API implementation here.

comment:9 Changed 4 years ago by Dustin J. Mitchell <dustin@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 2add1d92b0643abf62621d71f4d1928326220c2f:

Fix off-by-one error in log chunk handling

The extra - 1 here caused single blank lines to be skipped when
chopping un-requested lines of the end of a response. Earlier tests,
even if they had actually been running, would not have discovered this
as they did not test blank lines. Fixes #3101.

comment:10 Changed 4 years ago by Mikhail Sobolev <mss@…>

In 8b7fec7aa46c56f9b64ce27784edae44709ba3ef:

Merge pull request #1436 from djmitche/bug3101

Fix off-by-one error in log handling of blank lines

Note: See TracTickets for help on using tickets.