Opened 8 years ago

Last modified 8 years ago

#1699 new enhancement

Add support for wild-card matching for logfiles parameter in build steps

Reported by: lantictac Owned by:
Priority: minor Milestone: 2.0.+
Version: 0.8.2 Keywords:
Cc: morgan@…, willw@…, tom.prince@…

Description

We currently have a set of test steps in our build process that contain many test "runners" that process in parallel within the same build step. Each test runner outputs its own log and we want to follow them all via the waterfall.

However one issue is that we need to explicitly declare the logs we want to follow in the master.cfg when in reality this list is defined externally and dynamically. This messy task could be avoided if the build step could support wild cards matching for log files.

ie.

f.addStep(ShellCommand(command=cmd,
    description=['testing'],
    descriptionDone=['test'],
    logfiles={'* output' : '*_test.log'}))

So: "foo_test.log" -> "foo output"

If I get the chance I might take a look myself... But the question is, does this sound like a sensible addition?

Change History (6)

comment:1 Changed 8 years ago by dustin

It sounds sensible. Off the top of my head, I'm not sure how difficult it will be - the globbing will obviously need to be done on the slave, but I think the protocol currently specifies the master as the one to enumerate the logs.

If this requires a protocol change, it will need to be done in a completely backward-compatible fashion, which could be tricky.

Give it a shot, and do drop by #buildbot with any questions!

comment:2 Changed 8 years ago by dustin

  • Cc morgan@… willw@… added; morgan@… removed
  • Keywords log removed
  • Milestone changed from undecided to 1.0.+

Any progress on this?

comment:3 Changed 8 years ago by lantictac

Sorry for the incredibly slow response (production deadlines got the better of me). As it happens I did manage to get a hacky implementation working, which we're using to dynamically pick up multiple log files without explicitly declaring them in the master. Unfortunately I might be missing some key details but here's the basis for the implementation. Make of it (and take of it) what you will...

On the slave side in runprocess.py...

# Used to watch for any number of files matching a given wild-card
class WildcardLogFileWatcher:
    LIST_INTERVAL = 4

    def __init__(self, command, name, logfile, follow=False):
        self.command = command
        self.name = name
        self.logfile = logfile
        self.dirPath = os.path.split(logfile)[0]

        log.msg("WildcardLogFileWatcher created to watch %s dir %s" % (logfile, self.dirPath))

        self.started = False

        # follow the file, only sending back lines
        # added since we started watching
        self.follow = follow

        # every few seconds we check on the directory again
        self.dirPoller = task.LoopingCall(self.dirPoll)
        
        # maintain a list of child log file watches (initially empty)
        self.logFileWatchers = []
        
        # maintain a set of found log files to compare with the latest matches
        self.logFileSet = set()

    def start(self):
        self.dirPoller.start(self.LIST_INTERVAL).addErrback(self._cleanupListPoll)

    def _cleanupListPoll(self, err):
        log.err(err, msg="Polling error for file listing")
        self.dirPoller = None

    def stop(self):
        self.dirPoll()
        if self.dirPoller is not None:
            self.dirPoller.stop()
        for w in self.logFileWatchers:
            w.stop()

    def listDir(self):
        if os.path.exists(self.dirPath):
            return set(glob.glob(self.logfile))
        return set()

    def dirPoll(self):
        fileSet = self.listDir()
        
        if fileSet == self.logFileSet:
            return # not started yet
        
        # The fileSet is different so add the new files
        newFileSet = fileSet - self.logFileSet
        
        for filePath in newFileSet:
            # Generate a unique name based on format: "logname;realfilename"
            uniqueName = "%s;%s" % (self.name, os.path.split(filePath)[1])
            log.msg('WildcardLogFileWatcher found new file "%s" and assigned it the name "%s"' % (filePath, uniqueName))
            w = LogFileWatcher(self.command,
                    uniqueName,
                    filePath,
                    follow=self.follow)
            
            w.start()
            self.logFileWatchers.append(w)
        
        self.logFileSet = self.logFileSet | newFileSet

And later in RunProcess? init()

            w = None
            
            if filename.find('*') == -1:
                w = LogFileWatcher(self, name,
                                   os.path.join(self.workdir, filename),
                                   follow=follow)
            else:
                w = WildcardLogFileWatcher(self, name,
                                   os.path.join(self.workdir, filename),
                                   follow=follow)
            
            self.logFileWatchers.append(w)

While on the master side in buildstep.py:

    def addToLog(self, logname, data):
    
        oldname = logname
        realname = logname
        isWildCardPath = False
        
        # Wild-card matches are returned encoded in the logname as "oldname;realname"
        # This allow a degree of backwards compatibilty and avoids altering the protocol
        if logname.find(';') != -1:
            splitname = logname.split(';')
            oldname = splitname[0]
            realname = splitname[1]
            assert oldname != realname
            isWildCardPath = True
    
        # Activate delayed logs on first data.
        if oldname in self.delayedLogs and not (realname in self.logs):

            (activateCallBack, closeWhenFinished) = self.delayedLogs[oldname]
            
            # Only wild-card based delayed logs should remain around to deal with additional matches
            if not isWildCardPath:
                del self.delayedLogs[oldname]
            
            # Allow the build step to create a new local log instance for the status
            loog = activateCallBack(self, realname)
            self.logs[realname] = loog
            self._closeWhenFinished[realname] = closeWhenFinished
            
        if realname in self.logs:
            self.logs[realname].addStdout(data)
        else:
            log.msg("%s.addToLog: no such log %s" % (self, realname))

Pretty ugly I'm sure you'll agree but the best I could throw together at the time. It seems to work most of the time but I don't doubt there are issues with this as it stands. Feel free to use and abuse if it might be of use. Either way I'd love to see official support for this feature in the near future - it certainly made managing our master.cfg vastly easier.

Last edited 8 years ago by lantictac (previous) (diff)

comment:4 Changed 8 years ago by dustin

I, too, think this would be well-received. A few concerns:

  • you say it works most of the time - what is the failure condition?
  • I'm worried about using ';' as a separator (as it might appear in existing filenames), and also about assuming any filename with '*' is a glob. The latter is both too conservative (doesn't notice '?') and too liberal (precludes filenames containing '*')

I'd rather see a master-side configuration that explicitly asks for globbing, and explicit support for it in the master/slave protocol (including using a tuple to avoid trying to encode two filenames in one with an unquoted separator). There is a facility in place already that can be used to determine whether slaves support a certain feature, so it's a simple matter for the master to error out if the user requests globbing but the slave does not support it.

Do you think you could put together a patch against master that addresses these concerns?

comment:5 Changed 8 years ago by tom.prince

  • Cc tom.prince@… added

comment:6 Changed 8 years ago by verm

I will address the issues raised in this ticket as I complete #593.

Note: See TracTickets for help on using tickets.