Opened 5 years ago

Last modified 4 years ago

#2551 new defect

Triggered builds containing steps synchronized by a SlaveLock (counting or exclusive) are sometimes not run.

Reported by: cmumford
Owned by:
Priority: major
Milestone: 0.9.+
Version: 0.8.7p1
Keywords: locks
Cc:

Description

I have a test buildbot configuration with two builders (that build), one of which triggers six other test builds. There are only two test build slaves, which use a SlaveLock to ensure that a slave only runs one test at a time. In this configuration, if I make changes to the monitored Git repository, which in turn fires off four builds, there will often be several test builds that are never run. There is no evidence that I can see in either the master or slave logs to indicate the cause.

Steps to reproduce:

  1. Expand the attached archive.
  2. $ cd bbtest/bbtest
  3. $ make # This starts both server and a few slaves.
  4. $ make bumpseveral; sleep 8; make bumpseveral
  5. Wait for builds/tests to finish (approx 1-2 minutes).
  6. $ tree /tmp/build_artifacts

Each android build artifact should have six test results files, like so:

/tmp/build_artifacts
├── build_0
│   ├── android
│   │   ├── test1
│   │   ├── test1_results.txt
│   │   ├── test2
│   │   ├── test2_results.txt
│   │   ├── test3
│   │   ├── test3_results.txt
│   │   ├── test4
│   │   ├── test4_results.txt
│   │   ├── test5
│   │   ├── test5_results.txt
│   │   ├── test6
│   │   └── test6_results.txt
│   └── gtk
│       ├── test1
│       ├── test2
│       ├── test3
│       ├── test4
│       ├── test5
│       └── test6

But sometimes you will see a build with one or more tests which are not run:

├── build_2
│   ├── android
│   │   ├── test1
│   │   ├── test2
│   │   ├── test3
│   │   ├── test4
│   │   ├── test4_results.txt
│   │   ├── test5
│   │   ├── test5_results.txt
│   │   ├── test6
│   │   └── test6_results.txt
│   └── gtk
│       ├── test1
│       ├── test2
│       ├── test3
│       ├── test4
│       ├── test5
│       └── test6

If you look in the waterfall or grid you will see that the test is not run.

If I bump the maxCountForSlave values from 1 to 10, then I cannot reproduce the problem.
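For readers without the attachment, the lock setup described above might look roughly like the following master.cfg fragment. This is only a sketch against the Buildbot 0.8.x `buildbot.locks` API; the lock name and slave names are hypothetical and are not taken from the attached bbtest project.

```python
# master.cfg fragment (Buildbot 0.8.x). Names below are hypothetical.
from buildbot import locks

# One lock instance per slave; at most one holder on each slave.
# Raising these limits from 1 to 10 is the change that made the
# problem stop reproducing for the reporter.
test_lock = locks.SlaveLock(
    "one_test_per_slave",  # hypothetical lock name
    maxCount=1,            # default limit for slaves not listed below
    maxCountForSlave={"testslave1": 1, "testslave2": 1},
)

# "counting" admits holders up to the limit; "exclusive" admits one.
test_lock_access = test_lock.access("counting")
```

The resulting access object would then be passed as `locks=[test_lock_access]` to the test steps or to the test builders' `BuilderConfig`.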

Attachments (1)

bbtest.tgz (174.9 KB) - added by cmumford 5 years ago.
A bitbake test project (bbtest) and accompanying test project to reproduce the bug.


Change History (4)

Changed 5 years ago by cmumford

A bitbake test project (bbtest) and accompanying test project to reproduce the bug.

comment:1 Changed 5 years ago by dustin

  • Keywords locks added; Triggerable removed
  • Milestone changed from undecided to 0.8.+

Do the builds *never* run? One possibility here is that the release of the locks is not correctly initiating a check for runnable build requests. But if that's the case, then eventually (I think at a 30-minute interval) the builds would be run anyway. The 30-minute timer is a kind of backup process for missed triggers like this.

Otherwise, something deeper is wrong :(
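The hypothesis above can be illustrated with a small toy model. This is not Buildbot code: `ToyCountingLock`, `simulate`, and the tick-based timing are all made up for illustration. It only shows how a missed "check for runnable requests" on lock release would stall queued requests until a periodic backup check picks them up.

```python
from collections import deque


class ToyCountingLock:
    """Toy counting lock: at most max_count holders at once."""

    def __init__(self, max_count=1):
        self.max_count = max_count
        self.holders = 0

    def try_acquire(self):
        if self.holders < self.max_count:
            self.holders += 1
            return True
        return False

    def release(self):
        self.holders -= 1


def simulate(num_requests, notify_on_release, backup_interval=None,
             max_count=1, max_ticks=100):
    """Return the tick at which each request started (None if never)."""
    lock = ToyCountingLock(max_count)
    pending = deque(range(num_requests))
    running = {}   # request id -> tick at which it finishes
    started = {}   # request id -> tick at which it started

    def start_runnable(tick):
        # Start as many pending requests as the lock allows.
        while pending and lock.try_acquire():
            req = pending.popleft()
            started[req] = tick
            running[req] = tick + 1  # every build takes one tick

    start_runnable(0)  # check made when the requests are submitted
    for tick in range(1, max_ticks + 1):
        finished = [r for r, end in running.items() if end <= tick]
        for r in finished:
            del running[r]
            lock.release()
        if finished and notify_on_release:
            start_runnable(tick)  # release triggers a new check
        if backup_interval and tick % backup_interval == 0:
            start_runnable(tick)  # periodic backup check
    return [started.get(r) for r in range(num_requests)]


print(simulate(3, notify_on_release=True))                       # [0, 1, 2]
print(simulate(3, notify_on_release=False))                      # [0, None, None]
print(simulate(3, notify_on_release=False, backup_interval=30))  # [0, 30, 60]
```

In this model, a missed notification with no backup check leaves the queued requests unstarted indefinitely, while a backup check every 30 ticks eventually starts them. That is the distinction the question about whether the builds *never* run is meant to probe.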

comment:2 Changed 5 years ago by cmumford

Sorry to say that I waited more than 90 minutes and there was still no change in the run/not-run status of the builders.

comment:3 Changed 4 years ago by dustin

  • Milestone changed from 0.8.+ to 0.9.+

Ticket retargeted after milestone closed
