Opened 9 years ago

Closed 7 years ago

#1854 closed defect (fixed)

FileUpload never times out — at Version 14

Reported by: exarkun
Owned by: juanl
Priority: critical
Milestone: 0.8.+
Version: 0.8.2
Keywords: transfer master-slave sprint
Cc:

Description (last modified by dustin)

If a slave loses its connection to the master without sending a FIN or RST (e.g., because of network problems) while a FileUpload step is running, the build that step is part of never finishes, the master never notices that the slave has disconnected, and the slave cannot reconnect until the master is restarted. After the master is restarted, the build is left in an inconsistent state where it appears incomplete rather than failed.

Change History (14)

comment:1 Changed 9 years ago by dustin

  • Milestone changed from undecided to 0.8.4
  • Priority changed from major to critical
  • Type changed from undecided to defect

comment:2 Changed 9 years ago by dustin

  • Keywords sprint added

comment:3 Changed 8 years ago by dustin

  • Keywords transfer added; sprint removed
  • Milestone changed from 0.8.4 to 0.8.+

comment:4 Changed 8 years ago by dustin

  • Keywords sprint added

This should be relatively easy to replicate (e.g., using iptables). I think that the fix is to add a keepalive or timeout to the step.
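Because the slave vanishes without a FIN or RST, nothing ever arrives on the master's socket, so its TCP stack has no reason to report an error. A transport-level keepalive addresses that case: once enabled, the kernel sends probes on an idle connection and, if they go unanswered, reports the connection as dead to the application. The sketch below shows the idea with Python's standard `socket` module; `enable_keepalive` and its timing values are illustrative assumptions, not Buildbot's or Twisted's code. The tuning options `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are platform-specific, so they are set only where the module exposes them. Without them, Linux waits two hours of idle time (`net.ipv4.tcp_keepalive_time` defaults to 7200 seconds) before sending the first probe.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the kernel to probe an idle connection so a silently dead peer
    is eventually reported as an error. Values here are illustrative."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe (not available everywhere).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the connection is declared dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


# Usage: enable keepalive on a fresh TCP socket before it is connected.
s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM), idle=30)
s.close()
```

Keepalive only detects a peer that has disappeared; a slave that is still reachable but has stalled would additionally need the application-level timeout suggested above.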

comment:5 Changed 8 years ago by exarkun

A document covering good testing practices, in particular testing of code involving protocols and timing, has recently been added to the Twisted documentation:

http://twistedmatrix.com/documents/current/core/howto/trial.html#auto4

It may be helpful in developing good unit tests for this functionality.
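The Twisted testing document recommends driving timing-dependent code from a fake clock instead of real sleeps; Twisted provides one as `twisted.internet.task.Clock`, with `callLater()` and `advance()`. The sketch below applies that idea to the step timeout suggested in comment 4, using only the Python standard library so it runs without Twisted. `FakeClock` is a minimal hand-rolled stand-in for `task.Clock`, and `UploadWatchdog` is a hypothetical per-step inactivity timer, not part of Buildbot: it is re-armed whenever a chunk arrives and fails the step if the upload goes silent for `timeout` seconds.

```python
import heapq
import itertools


class FakeClock:
    """Deterministic scheduler: time moves only when advance() is called."""

    def __init__(self):
        self.now = 0.0
        self._queue = []  # heap of [when, seq, callback, active]
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def call_later(self, delay, callback):
        entry = [self.now + delay, next(self._seq), callback, True]
        heapq.heappush(self._queue, entry)
        return entry

    def cancel(self, entry):
        entry[3] = False  # skipped when its deadline is reached

    def advance(self, seconds):
        self.now += seconds
        # Run every still-active callback whose deadline has passed, in order.
        while self._queue and self._queue[0][0] <= self.now:
            _, _, callback, active = heapq.heappop(self._queue)
            if active:
                callback()


class UploadWatchdog:
    """Hypothetical helper: fail a step after `timeout` seconds without progress."""

    def __init__(self, clock, timeout, on_timeout):
        self.clock = clock
        self.timeout = timeout
        self.on_timeout = on_timeout
        self._pending = None
        self.progress()  # arm the timer when the step starts

    def progress(self):
        # Called for each chunk received: push the deadline back.
        self.stop()
        self._pending = self.clock.call_later(self.timeout, self._expire)

    def stop(self):
        # Called when the upload finishes normally (and before re-arming).
        if self._pending is not None:
            self.clock.cancel(self._pending)
            self._pending = None

    def _expire(self):
        self._pending = None
        self.on_timeout()


# Usage: a slave sends one chunk and then silently disappears.
clock = FakeClock()
results = []
watchdog = UploadWatchdog(clock, timeout=30, on_timeout=lambda: results.append("EXCEPTION"))
clock.advance(20)
watchdog.progress()  # chunk arrives at t=20; new deadline is t=50
clock.advance(20)    # t=40: only 20 s of silence, step still running
assert results == []
clock.advance(15)    # t=55: 35 s of silence, step is failed instead of hanging
assert results == ["EXCEPTION"]
```

The test finishes instantly and deterministically, because no real time passes; the same structure would carry over to `task.Clock` in a trial test.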

comment:6 Changed 8 years ago by juanl

  • Owner set to juanl
  • Status changed from new to accepted

comment:7 Changed 8 years ago by Ben

I ran into the same problem with a full hard disk. The builder remained stuck in the upload step, reporting "uploading" (yellow: step in progress), even though the logs clearly reported the OSError (in err.html only; err.txt was empty). The build status at the top of the waterfall also reported the exception (purple), but attributed it to a different step of the build.

I cleanly shut down the master; after the reboot, the build in question was nowhere to be seen (it had completely disappeared).
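Comment 7 shows the same symptom arising from a write error, presumably on the master, which receives the uploaded file: the OSError was logged, but the step never reached a final state. The general remedy is to make every error path finish the step. The sketch below illustrates that rule in Python; `receive_chunk`, the `finish_step` callback and the `"EXCEPTION"` result string are hypothetical names chosen for illustration, not Buildbot's API, and `FullDisk` is a test double that simulates `ENOSPC`.

```python
import errno
import io
import os


def receive_chunk(dest, data, finish_step):
    """Hypothetical handler for one chunk of an uploaded file.

    An OSError while writing (ENOSPC on a full disk, EIO, ...) is turned into
    a terminal step result instead of only being logged, so the step cannot
    stay in the "uploading" state forever.
    """
    try:
        dest.write(data)
        dest.flush()
    except OSError as exc:
        dest.close()
        reason = os.strerror(exc.errno) if exc.errno else str(exc)
        finish_step("EXCEPTION", reason)
        return False
    return True


class FullDisk:
    """Test double whose writes fail the way a full disk does."""

    closed = False

    def write(self, data):
        raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))

    def flush(self):
        pass

    def close(self):
        self.closed = True


# Usage: a normal write succeeds; a write to a full disk finishes the step.
outcomes = []
record = lambda result, reason: outcomes.append((result, reason))
disk = FullDisk()
ok = receive_chunk(io.BytesIO(), b"chunk", record)
failed = receive_chunk(disk, b"chunk", record)
```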

comment:8 Changed 7 years ago by tom.prince

  • Milestone changed from 0.8.+ to 0.8.8

comment:9 Changed 7 years ago by tom.prince

  • Keywords slave-proto added; sprint removed
  • Milestone changed from 0.8.8 to 0.8.+

This is probably best handled during the slave-proto work.

comment:10 Changed 7 years ago by tom.prince

  • Keywords master-slave added; slave-proto removed

comment:11 Changed 7 years ago by dustin

  • Keywords sprint added

comment:12 Changed 7 years ago by juanl

Related conversation on the mailing list: http://sourceforge.net/mailarchive/message.php?msg_id=29042423

comment:13 Changed 7 years ago by juanl

It appears that this ticket might no longer be valid.

I installed a v0.8.2 master and slave on the same machine, started a FileUpload with a large file, brought down the loopback interface, and waited for the master to detect the slave disconnection. With this version the build step hangs, remaining in the in-progress (yellow) state. Even after the slave reconnects, the step stays in progress, although new builds can be started.

With a v0.8.7p1 master and slave and the same procedure, the build is properly interrupted and marked as having encountered an exception.

Here is a bash script that sets up a v0.8.2 or v0.8.7p1 test environment for recreating this issue. http://pastebin.mozilla.org/2226893

comment:14 Changed 7 years ago by dustin

  • Description modified
  • Resolution set to fixed
  • Status changed from accepted to closed

I don't think there's any "might" involved. You've demonstrated conclusively that this is no longer a problem.
