Opened 6 years ago

Closed 5 years ago

#2587 closed defect (fixed)

Buildbot doesn't accept changes done by the same user for different repositories

Reported by: jollyroger Owned by: dustin
Priority: major Milestone: 0.8.9
Version: 0.8.8 Keywords: database users
Cc: lorenzo.battistini@…

Description

If you try to push two similar changes (for example, with changesource command) that differ only by VCS type (passed as --vc argument, for example, git and svn), the second one won't be added and master will raise the following exception:

2013-11-16 00:39:36+0200 [Broker,1,127.0.0.1] perspective_addChange called
2013-11-16 00:39:36+0200 [-] Peer will receive following PB traceback:
2013-11-16 00:39:36+0200 [-] Unhandled Error
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 781, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 808, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 761, in run
self.__target(*self.__args, **self.__kwargs)
--- <exception caught here> ---
File "/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 172, in _worker
result = context.call(ctx, function, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
return func(*args,**kw)
File "/usr/lib/python2.7/dist-packages/buildbot/db/pool.py", line 184, in __thd
rv = callable(arg, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/buildbot/db/users.py", line 67, in thd
return thd(conn, no_recurse=True)
File "/usr/lib/python2.7/dist-packages/buildbot/db/users.py", line 52, in thd
r = conn.execute(tbl.insert(), dict(identifier=identifier))
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 662, in execute
params)
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement
compiled_sql, distilled_params
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 874, in _execute_context
context)
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1024, in _handle_dbapi_exception
exc_info
File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 196, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 867, in _execute_context
context)
File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 324, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (IntegrityError) column identifier is not unique u'INSERT INTO users (identifier) VALUES (?)' (u'User Name <user_name@example.com>',)

Database will not contain anything about second user as well as the change info. Possible database contents:

CREATE TABLE users (
uid INTEGER NOT NULL,
identifier VARCHAR(256) NOT NULL, bb_username VARCHAR(128), bb_password VARCHAR(128),
PRIMARY KEY (uid)
);
INSERT INTO users VALUES(1,'User Name <user_name@example.com>',NULL,NULL);
CREATE TABLE users_info (
uid INTEGER NOT NULL,
attr_type VARCHAR(128) NOT NULL,
attr_data VARCHAR(128) NOT NULL,
FOREIGN KEY(uid) REFERENCES users (uid)
);
INSERT INTO users_info VALUES(1,'git','User Name <user_name@example.com>');

Comments from djmitche:

basically, Buildbot's unable to find the git user with the svn informatoin
but both have the same identifier
it really should invent a new and different identifier in that case

Change History (9)

comment:1 Changed 6 years ago by dustin

  • Milestone changed from undecided to 0.8.9

In more detail:

  • when the first user, git in this case, is added, it defaults to using the full git username as the identifier
  • when the second user is added, svn in this case, the lookup in the users_info table doesn't find the existing record, because the attr_type column differs.
  • consequently, it attempts to add a new, unrelated user, although one that the admin may later want to join to the existing user
  • that new, unrelated user also uses the full username <email> pair as the identifier, but it happens that's already taken.

I can see cases where it will obviously make sense to merge users when this happens. But I can also see cases where it won't (e.g., Dustin Sallings and I are 'dustin' on various services like github and sourceforge, and we wouldn't want our accounts combined). So I think that the best thing to do is take the safe default, and when identifiers clash, add some suffix to disambiguate them, e.g., 'User Name (1) <user_name@example.com>'.

comment:2 Changed 6 years ago by gracinet

Wouldn't that be reasonible to consider valid email addresses to be unique enough ? In that case, the Python library has methods to parse them that could help. Of course that'd stay an external input, therefore someone could cheat by submitting bogus author information in their changesets.

Is this a security concern, though? Any implications on authentication ?

comment:3 Changed 6 years ago by dustin

Not all version-control systems supply email addresses, though, nor do IRC networks. So, for example, if Buildbot sees 'dustin' as a subversion username, then 'dustin' seems a reasonable identifier. If it later sees 'dustin' as a github username, I'd very much like that it not assume that's me (it's not), so it should create a new user identity in Buildbot. But the 'dustin' identity is already taken, and no email address is readily available. In this case, using 'dustin (2)' would, I think, be a good solution.

That identifier is only ever used for display purposes (it's separate from the username), and thus can be changed easily. Well, it could if the 'buildbot user' command worked better :)

Buildbot authentication always happens with a username and password. Users should only rely on identifiers from version control systems if they know that those version control systems are secure. For example, if SSH is configured to use the system username, and commits are only allowed by SSH, that's probably safe. On the other hand, trusting committer/author information from a simple git repository is most definitely not safe.

comment:4 Changed 6 years ago by gracinet

Sorry I haven't been really clear, I was suggesting that in case the identifier looks to be an email address (which the library can help us to check), it could be assumed to be unique.

Any way, if this is only for display purposes, I agree there's not much point about guesswork, and a simple disambiguation system would at least solve the blocking. Buildmaster admins can then very well perform afterwards merging with whatever tool or decide not to care.

The pattern you suggest above User Name (1) <user_name@example.com>' means that the disambiguation part should anyway use the address parsing utilities.

comment:5 Changed 6 years ago by jollyroger

I've remembered another possible caveat with using email address. Even buildbot has some examples: simply run git log | grep Author: | sort | uniq . So how many emails does Dustin have? And the same picture is for lots of other repositories. This is mostly a problem to make a precise statistics per person but we still could use this to have a description and email to blame for a broken build. I've seen Redmine project has a possibility to map repository id's to the id's of the accounts inside this software. Some semi-automatic matching rules are available there (if email matches strictly, for example) but you always can change things manually. Once this is done, association of the commits in VCS and Redmine user allows to have a better navigation and linking.

comment:6 Changed 6 years ago by dustin

Yes - that's the idea we're going for here. It just needs to get implemented :)

Know any students that want to apply to GSoC?

comment:7 Changed 5 years ago by eLBati

  • Cc lorenzo.battistini@… added

comment:8 Changed 5 years ago by dustin

  • Owner set to dustin
  • Status changed from new to assigned

comment:9 Changed 5 years ago by Dustin J. Mitchell <dustin@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 0efac3e18392fc3860277b8de5594691ba514d4f:

Unique-ify user identifiers when attributes do not agree.

This solves the problem when two users happen to have the same
"suggested" identifier (the first argument to findUserByAttr).
Fixes #2587.

Note: See TracTickets for help on using tickets.