herraiz.org | Blog


Software projects alzheimer: Julian Assange's lost contributions

A couple of weeks ago, I attended the Canadian Summer School on Practical Analyses of Software Engineering Data, where the organizers proposed a research challenge using data from some selected open source projects. The challenge was to say something meaningful about a software project based on that data and any additional information that could be found publicly on the Internet. We had 12 hours to complete the task.

Martin Beck and I started to play with one of the datasets, which contained the log of the PostgreSQL git repository. At first glance, we realized that Julian Assange was one of the committers in the dataset.

Our report tracked all of Julian's activity, which took place during a single month in the summer of 1996. In that month, Julian merged several large patches into the main development trunk of PostgreSQL. But the biggest surprise came when we read a thread on the pgsql-hackers mailing list, where the repository admins were asking for confirmation of the different email addresses used in the old CVS repository. They found that one of the committers was Julian Assange, but no one could remember his contributions to the project, or even that he had once been part of it. They even wondered whether they should make that information public (it was irremediably public already, as it was in the public CVS repository of PostgreSQL). This is very surprising. Software repositories track the whole history of a project: they make it possible to obtain the codebase as it was at any point in the past, and to recover what changes were made by whom. Yet in spite of all that precise information, the developers could not remember Julian Assange's contributions.
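To illustrate how a repository can answer the "what was changed by whom" question, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the analysis from the report. It assumes the log was exported with something like `git log --numstat --date=short --format="@%H|%an|%ad"`, where the leading `@` is a marker chosen here so that commit headers cannot be confused with numstat rows. The function name `summarize_authors` and the layout of the result are hypothetical.

```python
def summarize_authors(log_text):
    """Summarize a git log export per author.

    Assumed input format (one commit header, then numstat rows):
        @<sha>|<author name>|<YYYY-MM-DD>
        <added>\t<deleted>\t<path>
    Returns {author: {"commits", "lines", "first", "last"}}.
    """
    stats = {}
    current = None  # entry of the author whose commit we are reading
    for line in log_text.splitlines():
        if line.startswith("@"):
            # Header line: the sha comes first and the date last, so an
            # author name containing "|" would still be kept intact.
            _sha, rest = line[1:].split("|", 1)
            author, date = rest.rsplit("|", 1)
            current = stats.setdefault(
                author,
                {"commits": 0, "lines": 0, "first": date, "last": date},
            )
            current["commits"] += 1
            # ISO dates sort correctly as plain strings.
            current["first"] = min(current["first"], date)
            current["last"] = max(current["last"], date)
        elif current is not None and line.strip():
            parts = line.split("\t")
            # Binary files are reported as "-" and are skipped here.
            if len(parts) >= 3 and parts[0].isdigit() and parts[1].isdigit():
                current["lines"] += int(parts[0]) + int(parts[1])
    return stats
```

Sorting the result by number of commits, or by the gap between the first and last date, makes short-lived committers with large changes easy to spot.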

One possible explanation could be that his contributions were casual and not very important. However, one of his commits was over 500 lines of code, and consisted of a patch that affected the innermost parts of PostgreSQL. Why was a casual contributor making large commits and merging patches in PostgreSQL? If he had gained the right to make such bold commits, he probably had to be close to the core members of the project. But if he was close to the core, why did he make only six commits and then disappear? Moreover, there is no trace of his activities in any other mailing list or repository. Why was his activity so silent even though it was a large rewrite of one of the main parts of PostgreSQL?

And above all, why do the current developers not remember him?

Written on Jul 07 2011 | Tags: #research, #pased, #msr