Open Sources

Free software projects are a growing and important source of software. They range in size from one developer to hundreds. Unlike traditional models of software development, they use the Internet to collaborate in a decentralized, self-organizing fashion. It is important to understand how such communities can prosper without traditional centralized control. Open Sources is a visualization that operates on existing code bases and mailing list archives in order to understand the relationship between a community's communication patterns and its code. It attempts to reveal characteristic signatures that may result in successful projects, as well as the otherwise hidden history behind code and mailing list archives.

The main interface shows a series of stacked, colored horizontal layers across time (x-axis). Each layer represents a developer and is drawn in a color that is locally unique, that is, distinct from the colors of its neighbors. The height of a layer corresponds to the amount of that contributor's code in the repository at that time. An individual's position in the stack reflects the order in which they submitted code relative to the group. Colors are assigned from a small repeating palette and carry no meaning of their own; they serve only to differentiate individuals stacked next to one another. Every color in the palette was chosen to have the same perceived luminosity as the rest, so that no individual appears more or less important for artificial reasons.
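One way to build such a palette can be sketched as follows. This is only an illustration, not the method used in Open Sources: it approximates perceived luminosity with the sRGB relative luminance formula, and the function names, the palette size, the saturation, and the target luminance are all assumptions chosen for the example.

```python
import colorsys

def srgb_to_linear(c):
    # Undo the sRGB transfer curve for one channel in [0, 1].
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    # Apply the sRGB transfer curve to one linear channel in [0, 1].
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

def relative_luminance(rgb):
    # Relative luminance of an sRGB color, used here as a stand-in
    # for "perceived luminosity" (an assumption of this sketch).
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def equal_luminance_palette(n_colors=6, saturation=0.5, target=0.25):
    # Spread hues evenly, then darken each color in linear light
    # until its relative luminance equals the shared target.
    palette = []
    for i in range(n_colors):
        base = colorsys.hsv_to_rgb(i / n_colors, saturation, 1.0)
        scale = target / relative_luminance(base)
        if scale > 1.0:
            raise ValueError("target luminance too high for this hue")
        linear = [srgb_to_linear(c) * scale for c in base]
        palette.append(tuple(linear_to_srgb(c) for c in linear))
    return palette

def color_for(stack_index, palette):
    # Repeat the palette up the stack so adjacent layers always differ.
    return palette[stack_index % len(palette)]

palette = equal_luminance_palette()
layer_colors = [color_for(i, palette) for i in range(10)]
```

Because the palette repeats, a color identifies a developer only relative to nearby layers, which matches the locally unique coloring described above.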

On the left of the interface is a list of the developers, which can be used to explore the dataset further. Below this list is a set of buttons that allow the user to filter the dataset by selected individuals. One mode highlights the selected individuals by creating empty space between their layers and the surrounding ones. Another mode draws only the layers of the selected individuals, which is useful for comparing them.
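The two filter modes can be sketched for a single point in time as follows. This is a minimal illustration under stated assumptions: the function name, the mode names "all", "highlight", and "isolate", and the gap size are hypothetical and do not come from the Open Sources implementation.

```python
def stack_layers(heights, selected=(), mode="all", gap=5.0):
    # heights: list of (developer, height) pairs in stack order, bottom first.
    # Returns a list of (developer, bottom, top) for the layers that are drawn.
    selected = set(selected)
    layout = []
    y = 0.0
    previous_selected = False
    for developer, height in heights:
        is_selected = developer in selected
        if mode == "isolate" and not is_selected:
            # Draw only the selected individuals.
            continue
        if mode == "highlight" and layout and (is_selected or previous_selected):
            # Open empty space between a selected layer and its neighbor.
            y += gap
        layout.append((developer, y, y + height))
        y += height
        previous_selected = is_selected
    return layout

snapshot = [("alice", 10.0), ("bob", 20.0), ("carol", 5.0)]
highlighted = stack_layers(snapshot, selected=["bob"], mode="highlight")
isolated = stack_layers(snapshot, selected=["bob"], mode="isolate")
```

In the highlight mode every layer stays on screen and only the spacing changes, whereas the isolate mode restacks the selected layers directly on top of one another.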

Communication events are depicted as small circles. Because we are looking at emails to a mailing list as activity indicators, further metadata is not necessary. Each event is shown first in the middle of the layer that corresponds to its sender.

Showing the communication data only in the author's layer was not sufficient for our goals. First, when there is a lot of activity, it unnecessarily increases the cognitive effort needed to form a global picture. Second, depending on the scale, a participant with little code might not have visible messages even though they may support a lot of collaboration. Third, individuals who communicated before committing any code would be filtered out.

We decided to include a complementary area above the graph where all communication events are also represented. This technique gives the user a linear representation across time that can be visually filtered by manipulating colors to highlight particular individuals. If a short period of time contains multiple messages, we avoid overlap by stacking the quantized messages vertically. In a mode that highlights selected individuals, the non-selected authors' communications are made transparent, which preserves the global reference while giving them less emphasis. Should a stack reach the edge of its container, all the dots are scaled down to ensure a proper fit. Thus, by looking at the height and density of a vertical strip, users can easily spot and estimate communicative activity.
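The stacking and scaling described above can be sketched as follows. This is a minimal illustration, not the Open Sources code: the function name, the fixed-width time bins used for quantization, and the parameter names are assumptions.

```python
from collections import defaultdict

def layout_messages(timestamps, bin_width, dot_diameter, container_height):
    # Quantize each message into a time bin and give it the next free
    # row in that bin, so messages in the same bin stack vertically.
    rows_used = defaultdict(int)
    slots = []
    for t in sorted(timestamps):
        column = int(t // bin_width)
        slots.append((column, rows_used[column]))
        rows_used[column] += 1

    # If the tallest stack would overflow the container, shrink every
    # dot by the same factor so that stack fits exactly.
    tallest = max(rows_used.values(), default=0)
    needed = tallest * dot_diameter
    scale = 1.0 if needed <= container_height else container_height / needed
    diameter = dot_diameter * scale

    # Return (x center, y center, diameter) for each message.
    return [
        (column * bin_width + bin_width / 2, (row + 0.5) * diameter, diameter)
        for column, row in slots
    ]

dots = layout_messages([0, 1, 2, 10], bin_width=5, dot_diameter=4,
                       container_height=8)
```

In this example three messages fall into the first bin, so their stack would need 12 units of height; every dot is therefore scaled by 2/3 to fit into the 8 units available.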