The GNU Project FTP Site:
A Digital Collection Supporting a Social Movement

This paper was written for University of Michigan School of Information 504: Social Systems and Collections, taught by Profs. Michael Cohen and David Wallace and GSI JoAnn Brooks, fall 1999. Thanks to Richard Stallman, Len Tower, Brian Youmans and Jonas Oberg for their enthusiastic participation in this research.

Introduction
Digital collections, whether complements to collections in the physical world or freestanding collections in their own right, are increasingly visible in the online world. Their nature as collections, however, is often obscured by the interest in the content they deliver, whether by email, HTTP, or FTP. As the ranks of digital collections and their users grow daily, analyzing these resources as collections with relationships to particular social systems becomes increasingly important. One prominent and fascinating digital collection is the FTP site hosted by the GNU Project of the Free Software Foundation, which distributes free ("open source") software to its users, with a dose of free software philosophy. Richard Stallman, founder of the Free Software Foundation, disagrees with the notion of a "user community" around the GNU system, arguing that the concept assumes homogeneity of experience or opinion across one group of users.1 However, it is precisely because of this heterogeneity of experiences and opinions that a study of the GNU Project's FTP site is valuable. Analysis offers insights into access points and barriers for users, who all approach the collection with different capabilities, experiences, and expectations. In addition, the GNU Project FTP site illustrates one path for "grassroots" digital collection development.

Methodology
I chose to study this collection because of my personal interest in the philosophy behind free software, as well as the limited volume of academic discussion around this type of collection. Staff members at the FSF were receptive to my research and generously participated in phone, email, and in-person interviews. My primary source for this study is the collection itself, available via FTP at ftp.gnu.org. Supplementary sources include staff interviews, materials on free software and the history of the GNU Project available on their website, and limited academic sources.

Major Concepts
A complete discussion of the GNU Project, the Free Software Foundation, and the social systems in which they exist is too vast for the scope of this paper. I will first offer some preliminary background and history to place the collection in historical context. I will then discuss some of the most illuminating concepts from the collection. The organizing principles of the collection provide a solid starting point for examining the relationship between the collection and its community. The concept of document genre then allows me to explore that relationship in more detail by focusing on the README file. Moving to a view of the collection with larger granularity, I will discuss routines for maintenance and the role of the server mirrors. The mailing list archives, a gem of the collection, illuminate some other current uses. Finally, I will examine the materials funding and volunteer work that make the collection possible.

What is GNU and "free" software?
GNU is as much a computer operating system as a philosophy. GNU is part of the GNU/Linux operating system, technically similar to the UNIX system. It was started in 1983 by Richard Stallman, an operating systems developer working at the MIT Artificial Intelligence Lab. Stallman has described a community of hackers at the MIT AI Lab who lived the GNU philosophy - sharing source code - before it was even articulated as such.2 Software was so precious that it was always shared among programmers. But, as more machines were developed and the software market began to grow, software sharing began to die.3 Stallman's belief that software sharing had a viable future drove the publication of what is now known as the GNU Manifesto. On Sept. 27, 1983, Stallman announced through ARPANET mail his plan to build a "complete Unix-compatible software system called GNU (for GNU's Not Unix), and give it away free to everyone who can use it".4 In order to accomplish such an ambitious goal, he called on programmers to contribute programs and time, and computer manufacturers to contribute computers and funding. In 1985, he founded the Free Software Foundation as a non-profit entity to head up the creation of the GNU operating system.5 That the development of the GNU system was originally conceived as a non-profit venture is fundamental to understanding the philosophy behind GNU, and by extension, the collection as a whole.

The mission of the GNU Project was to create a "free" operating system - not a matter of price, but a matter of liberty. The GNU website defines free software as follows:

Free software is software that comes with permission for anyone to use, copy, and distribute, either verbatim or with modifications, either gratis or for a fee. In particular, this means that source code must be available. "If it's not source, it's not software." 6

The ambiguity of the term "free software" is clarified and contextualized by Len Tower, an early staff member at the FSF. He described the term as derived from the popular names of social movements of the 1960s and 1970s that focused on civil rights and personal liberties.7 Software freedom is understood as pertaining to licensing terms in a broader social context; it challenges proprietary software by making the source code available for any user to modify and redistribute. The FSF created the GNU General Public License (GPL) to implement "copyleft", an alternative to the traditional use of copyright that allows modification and redistribution but forbids adding restrictions to the licensing terms of any derivative software.8 While the GNU Project was always oriented towards the production of an entirely free operating system, the FSF also took on advocacy for free software and open licensing terms as an area for social activism. The GNU Project is characterized by its fundamental assumption that licensing is a matter for a social movement; the complexities of its FTP collection bear this relationship out.

Nature of the Collection
The FTP site has many notable characteristics, several of which I will analyze further on with regard to their implications for the social system of users. Essentially, the FTP site is a digital collection of digital resources. The collection is composed mainly of software, but there is also a significant archive of mailing lists, documents on the GNU philosophy, software documentation, and the text of the GPL. Software is written and contributed by programmers around the world, not exclusively by FSF staff. The collection is made available to users on computers with Internet connections via File Transfer Protocol (FTP) at ftp.gnu.org. The site allows anonymous FTP connections, so users do not need explicit access granted by GNU server administrators. Anonymous FTP is a key access point; users can freely download files to their own computers, and little information about their transactions is logged or archived. Each file in the collection is displayed in the directory hierarchy alphabetically by name; its size, last date of modification, and file type are also listed. In addition to being available at ftp.gnu.org, the site is also mirrored extensively. Mirroring produces an exact replica of the site on another computer, which in turn serves files to its geographic community. The GNU Project FTP site is mirrored throughout the U.S. and in thirty-five other countries, many of which host more than one mirror. The original FTP site is located on a computer in Sunnyvale, California, in the offices of VA Research. GNU server administrators perform their administration duties remotely from the FSF office in Boston, Massachusetts, or from their homes around the world.
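
As a rough illustration of the anonymous access and directory listings described above, the following Python sketch logs in to an FTP server without credentials and parses a UNIX-style listing into name, size, modification date, and file type. It uses the standard ftplib module. The names parse_listing_line and list_directory, the Entry record, and the assumed "ls -l" listing layout are illustrative assumptions of mine, not details taken from the GNU server; real servers may format their listings differently.

```python
from ftplib import FTP
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    size: int
    modified: str  # kept as text, e.g. "Nov 18 1999"
    kind: str      # "directory", "link", or "file"


# The first character of a UNIX long-format listing line gives the file type.
_KINDS = {"d": "directory", "l": "link", "-": "file"}


def parse_listing_line(line: str) -> Optional[Entry]:
    """Parse one `ls -l` style line; return None if it does not fit (assumed layout)."""
    parts = line.split(None, 8)  # eight metadata fields, then the name
    if len(parts) < 9 or parts[0][0] not in _KINDS or not parts[4].isdigit():
        return None
    name = parts[8].split(" -> ")[0]  # drop the target of a symbolic link
    return Entry(name, int(parts[4]), " ".join(parts[5:8]), _KINDS[parts[0][0]])


def list_directory(host: str, path: str = "/") -> list:
    """Log in anonymously and return the entries of `path`, sorted by name."""
    lines = []
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        ftp.cwd(path)
        ftp.retrlines("LIST", lines.append)
    entries = [e for e in map(parse_listing_line, lines) if e is not None]
    return sorted(entries, key=lambda e: e.name)
```

Only list_directory touches the network; a call such as list_directory("ftp.gnu.org", "/gnu") would attempt a live connection, and whether a given host still accepts anonymous FTP is not something this sketch assumes. The parser can be tried on its own against a sample line.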

History of the Collection
It is important to place the collection in its historical context in order to draw out the purposes for which it was originally intended and is now used. From the beginning, the audience for the collection was mainly UNIX programmers. In 1985, Stallman made a version of his popular editing environment software EMACS available via FTP from his work computer at prep.ai.mit.edu. He had also distributed this software manually on tape, charging $150.9 The need for two distribution schemes highlights the differing capabilities of contemporary users; only those with Internet access could download the software, so many potential users could not. After Stallman left MIT to focus exclusively on GNU, MIT generously continued to donate use of prep.ai.mit.edu for serving free software.10 As the first programs necessary for the GNU operating system were developed, they were also made available through the FTP site. As the FSF began to grow and gain the attention of more UNIX programmers, more software was developed (both there and by other interested programmers) and made accessible through the FTP site. As development blossomed, the collection blossomed along with it, with the frequent addition of more software and more users. A complete free operating system was finally achieved in the early 1990s, when the Linux kernel, developed by Linus Torvalds in 1991, was combined with the GNU system. Since then, even more software has been developed for the GNU/Linux platform and added to the FTP collection.

Organizing Principles
The FTP site is explicitly organized around UNIX directory and file naming conventions. The opening page is prefaced with a brief welcome message that gives a general overview of the structure of the FTP site and links to frequently sought locations. The four main directories follow.

Early in the collection's history, the FTP site consisted of a single flat directory, with all the programs and associated text files mingling together.11 As more files were added to the site, the collection was "redisorganized" 12 into the current directory structure. The first directory, /bin/, provides "binary" (executable) files of the archive, compression, and directory listing programs tar, gzip, and ls used frequently within the collection.13 The following directory, /gnu/, is the largest. In /gnu/, there is one directory per software program, as well as the directories /GNUinfo/, /GNUsBulletins/, and /Manuals/, among others. /GNUinfo/ contains the text of the GNU General Public License, the GNU Manifesto, and instructions on acquiring software through FTP. /GNUsBulletins/ provides back issues of the now defunct print newsletter. /Manuals/ contains documentation for software hosted on the FTP site or provides links to sites that maintain current documentation for a program. The following directory back on the main level, /lib/, contains libraries of code which programmers can use to extend the functionality of software they are developing. The final directory, /pub/, is not actually a directory, but rather a link to the isolated area on the server that has been defined as the FTP site.

As indicated above, the organizing principles for the site are based on traditional UNIX directory structure. This organization presumes audience familiarity with such principles, which seems at odds with the recognition that the user community is heterogeneous. While intuitive to users with UNIX experience, this organization presents an access barrier to new users who do not approach the collection with that knowledge in hand. For example, the welcome message uses UNIX command jargon (cd, ls) as verbs in a colloquial way common among UNIX users. Meaning is assumed to be shared around those terms and around the meanings of the directories that follow. When asked how this organization is useful or not to members of the community, Len Tower emphasized that the organization is rooted in a UNIX heritage of small programs.14 As more programs are developed, he suggested that a future organization of files into directories by functional group - one directory for editors, another for compilers, another for utilities - might prove useful as well. Evaluation of the site's organization continues as the audience expands.

Document Genre
Document genre, like organization, is also an important element of the FTP site collection. Brown and Duguid in their essay "The Social Life of Documents" describe the role of document genre within FTP sites:

Within a community highly condensed forms of communication, which rely on the shared assumptions of the community, work well. Between communities these must be elaborated, often to the exasperation of the original community, whose members can see the elaboration as redundant. Anyone who has used the Internet much has probably come across the different approaches. Most ftp sites, which are usually constructed primarily for use within a known community, are almost completely inscrutable -- a collection of files with semiliterate names. Successful websites designed to engage people from different communities have, by contrast, a much more public face.15

README files, which appear frequently throughout the GNU Project FTP site, have a particular social context in the programming world. The README is usually a plain text file that accompanies source code or pre-compiled software. Within the user community, there are shared assumptions about the appropriate contents of the README file - usually it contains installation instructions or compiling tips, a list of recent updates or bug fixes, and contact information for questions or additional fixes. The README arises from a documentation tradition among programmers, and is viewed as part of the responsibility programmers have to other programmers, particularly in the free software community, where one can assume that software will be modified later by other users. Like other genres, the README is invoked as part of particular recurring actions, like software creation or modification through updates or bug fixes.

Although the README serves information needs within the user community, it also keeps some boundaries between experienced and newer users intact. Further on in their essay, Brown and Duguid note that

...documents can patrol community boundaries rather than cross them. Strange formats, unexplained generic conventions, jargon, abbreviations, allusions, as well as private languages are all examples of ways in which documents keep people out as much as bring them in.16

The FTP site is intended to share, not occlude, information, and it is important to place the site within its social context and draw out elements, like document genre, which are common among the social system of GNU/Linux programmers. Looking at document genre allows for a view of the interactions between the collection, its content, and the social system in which they exist.

Routines for Maintenance of the Collection
Within the routines for maintenance of the collection, influences of the specifically GNU social sphere are also evident. In this case, I define maintenance as the addition of new software, deletion of outdated materials, organization of the materials, and access management to the site itself. Consistent with the themes of sharing, cooperation, and non-profit work that run through the GNU history, most collection maintenance is done by GNU volunteers. Jonas Oberg, a GNU webmaster and volunteer located in Sweden, provided me with background on collection maintenance. He describes "teams" of volunteers working together, coordinating work through a volunteer mailing list. For the FTP site, the "ftp-upload crew" consists of three volunteers who post new software and allocate user accounts to programmers who make frequent updates to programs hosted on the GNU FTP site. The mailing list archives, which I discuss further below, are rotated automatically at the end of each month by the mailing list server, which adds the new archive to the FTP site.17 Decisions about collection organization appear to be made in teams as well, but they seem to arise less frequently than the practicalities of moving software onto the site. In addition, README files and welcome messages like the one that greets users at ftp.gnu.org can also be updated by users with access to add files to that area of the server. These routines, and the volunteer work that supports them, emphasize the cooperative nature of the GNU Project itself.

Role of Server Mirrors
In addition to the forms of sharing that happen internally to the FTP site, the nature of the site itself also reflects sharing as a goal. Sharing of software is difficult, however, from just one server, for congestion and accessibility reasons. The server mirrors, defined in the Nature of the Collection section above, expand access to the software and improve service to all users. The main server at ftp.gnu.org often supports 80-90 concurrent users, so additional access points for acquiring GNU software are necessary.18 The GNU FTP site has mirrors nationally and internationally in over thirty-five countries, most of which host more than one mirror.19 The availability of server mirrors internationally is crucial to making the acquisition of free software a viable option in many countries with steep Internet access fees and slow international network connections. Jonas Oberg explains that users sometimes have to pay for each megabyte of traffic transferred internationally, so "it's very important to have local mirrors so that users in that country can access our information without paying any significant fees".20 The server mirrors, then, play an integral role in expanding free software philosophy by providing additional access points. For users who host a server mirror, it is also a tangible way to show support for GNU and free software 21 while supporting the needs of their geographic communities. In this case, the user network and the physical network work in concert to expand access to the collection.

The Mailing List and Its Archives
The user network has been essential to spreading the free software philosophy since the beginning of the FTP site, and it still occupies an important position. Mailing lists are valued highly for promoting dynamic user-driven content 22 and for the opportunity they give users to interact and share with other users. The mailing list archives house a rich organizational memory that reflects the participatory, community-oriented nature of the collection itself. The first mailing lists were started in the mid-1980s with the rise of Usenet news; GNU was even the first special-interest hierarchy name in Usenet.23 The emails from the lists are archived as plain text files and have been made available via FTP since their inception. They are now accessible from a link on ftp.gnu.org to alpha.gnu.org, a computer located at the University of Massachusetts Boston.24 The earliest archives show fourteen different mailing lists, each with a different orientation; the latest archives from May 1999 show over seventy.25 The lists are oriented mainly towards bug reports for GNU software and free software hosted on the FTP site; announcements of updates, new software, and fixes; help with particular software; and general information about GNU and free software. The FSF staff often refer general emails from the gnu@gnu.org email address to the mailing lists for further information, or even to the archives so that users can get a sense of the topics discussed on each.26 27 That the FTP site is a collection is truly exemplified in this area; not only does it collect software, it also collects its own history.

Materials Funding
By examining the collection in larger granularity, we can gain additional insight into the social context in which it is located. Materials funding for the GNU Project, as for other non-profits, was immediately important to its creation and success. Stallman explicitly requested donations of hardware from computer manufacturers in the GNU Manifesto.28 When the Free Software Foundation was created as a tax-exempt not-for-profit organization in 1985, the FTP site was still running off of Stallman's former computer at MIT, prep.ai.mit.edu.29 MIT allowed him to continue to use the computer to serve the fledgling FTP site. The ability to offer the FTP site from a computer no longer physically accessible underscores the importance of information as the collection's major resource. The nature of FTP as a communications protocol, as well as the ability to grant user accounts with particular rights, allows the stewards of the collection to interact with it remotely, just as users do. The same system of interaction with the collection is still in place; however, the FTP site is now hosted on a different computer. A few years ago, VA Research, a Linux systems distributor, donated a computer and bandwidth to serve both the FTP and WWW sites from their offices in Sunnyvale, California.30 31 There is a recursive irony here; the FSF receives donations from the manufacturer of a computer system that would not be possible without its work. This irony is not lost on the FSF. As a non-profit, the FSF has not reaped the kinds of financial benefit from the explosion of GNU/Linux systems that a company like Red Hat has. However, Len Tower says that the FSF prefers to "run lean".32 They accept donations of computers and other hardware, and even provide a way for users to donate by purchasing "deluxe distributions" of compiled GNU software and source code.33

Essentially, the FSF relies on donations to be able to continue to offer the FTP site collection. Critical observers might suggest that this gets to the heart of the problem with free software - no long-term support - and makes free software appear to be an unstable choice. As an organization supporting a collection like the FTP site, however, the FSF bears more similarities to other grassroots non-profits arising from the needs of a particular community. For the FSF, free software is not a business; it is social activism. When asked whether he considers himself a hacker or an activist, Stallman did not hesitate before replying - "An activist!" 34 The issue of materials funding for the collection, then, seems appropriate and germane to such grassroots non-profit organizations.

Volunteer Work
Another concept germane to the non-profit world is the important role played by volunteers, and this is no less true in the case of GNU. The majority of collection maintenance is done by volunteer workers in the U.S. and abroad. The webmaster volunteer team is responsible for server administration, HTML work, and public relations through the web and FTP sites. They also mentor other volunteers by guiding them into the GNU Project and maintain a list of tasks for volunteers on the GNU website.35 This allows volunteers who cannot make a long-term commitment to contribute by completing a discrete project. Of course, long-term volunteers are critical as well. The visibility and number of volunteers was initially enigmatic to me; why would hackers who have such financially valuable skills choose to use them volunteering for the GNU Project? Stallman believes that the volunteers firmly support the goals of the FSF and volunteer to demonstrate it. But more than that, they do it because it's fun.36 This perspective is echoed by Jonas Oberg, volunteer webmaster, who says his job is "incredibly fun...I don't think there goes a day without learning something new".37 The number of dedicated volunteers seems to reflect a claim to the FTP site as an important part of the social system. Conversely, without the volunteers, it seems unlikely that the collection could continue to be the robust hub of information sharing it is.

Conclusions
As a case study for the interactions between collections and their social systems, the GNU Project FTP site could easily fill a volume. The social system of hackers in which it evolved has expanded to include a social system of activists, and this is manifested in multiple ways throughout the collection itself. Anonymous FTP and server mirrors are two significant examples of the effort to broaden access and draw the community into the collection. The nature of the collection itself as a repository for free software programs written by anyone, not just FSF staff, also explicitly involves the community in collection development. Within the community, there are shared meanings around documents and directory structures, and these are mirrored in the collection as well. Within the organization, staff recall and refer to their history through the mailing list archives. Volunteer work and donation of materials have been critical to the viability of the collection, and have supported it admirably. The FTP site is a fascinating example of digital collection development taking place within the non-profit world, and there is certainly more to be written on this theme. For the Free Software Foundation, the FTP site is important for distributing software, but that function is secondary. The real value of the collection is that it is not merely a clearinghouse for software, but a clearinghouse for ideas.


1 Richard Stallman, personal interview. Conducted 12/4/99.
2 Stallman, Richard. The GNU Project. Available HTTP www.gnu.org/gnu/thegnuproject.html. Accessed 11/18/99.
3 Len Tower, personal interview. Conducted 11/21/99.
4 Stallman, Richard. Initial Announcement. Available HTTP www.gnu.org/gnu/initial-announcement.html. Accessed 11/18/99.
5 Stallman, Richard. The GNU Project. Available HTTP www.gnu.org/gnu/thegnuproject.html. Accessed 11/18/99.
6 The GNU Project. Categories of Free and Non-Free Software. Available HTTP www.gnu.org/philosophy/categories.html. Accessed 11/1/99.
7 Len Tower, personal interview. Conducted 11/21/99.
8 Stallman, Richard. The GNU Project. Available HTTP www.gnu.org/gnu/thegnuproject.html. Accessed 11/18/99.
9 Stallman, Richard. The GNU Project. Available HTTP www.gnu.org/gnu/thegnuproject.html. Accessed 11/18/99.
10 Stallman, Richard. The GNU Project. Available HTTP www.gnu.org/gnu/thegnuproject.html. Accessed 11/18/99.
11 Len Tower, personal interview. Conducted 11/21/99.
12 I could not determine who originated the phrase "redisorganized", nor what exactly it meant. Richard Stallman liked the term, however, and thought it was an appropriate description of the FTP site.
13 Thanks to Ali Asad Lotia, a dedicated GNU/Linux systems user, for giving me a UNIX tutorial.
14 Len Tower, personal interview. Conducted 11/21/99.
15 Brown, John Seely and Duguid, Paul. "The Social Life of Documents," First Monday 1 (No. 1, May 6, 1996).
16 Brown, John Seely and Duguid, Paul. "The Social Life of Documents," First Monday 1 (No. 1, May 6, 1996).
17 Jonas Oberg, personal interview. Conducted 12/16/99.
18 Jonas Oberg, personal interview. Conducted 11/20/99.
19 The GNU Project. Mirrors of www.gnu.org. Available HTTP www.gnu.org/server/list-mirrors.html. Accessed 11/21/99.
20 Jonas Oberg, personal interview. Conducted 11/20/99.
21 Jonas Oberg, personal interview. Conducted 11/20/99.
22 Len Tower, personal interview. Conducted 11/21/99.
23 Len Tower, personal interview. Conducted 11/21/99.
24 Brian Youmans, personal interview. Conducted 11/23/99.
25 The GNU Project. Archive for the GNU Mailing Lists. Available FTP ftp-mailing-list-archives.gnu.org. Accessed 12/16/99.
26 Brian Youmans, personal interview. Conducted 11/23/99.
27 I used a similar method to discover the contents of each archive, and even found an early mention of librarians' support for the goals of free software on the info-gnu list.
28 Stallman, Richard. Initial Announcement. Available HTTP www.gnu.org/gnu/initial-announcement.html. Accessed 11/18/99.
29 Stallman, Richard. The GNU Project. Available HTTP www.gnu.org/gnu/thegnuproject.html. Accessed 11/18/99.
30 Jonas Oberg, personal interview. Conducted 11/20/99.
31 All FSF computers, naturally, run only free software.
32 Len Tower, personal interview. Conducted 11/21/99.
33 Brian Youmans, personal interview. Conducted 11/23/99.
34 Richard Stallman, personal interview. Conducted 12/4/99.
35 Jonas Oberg, personal interview. Conducted 11/20/99.
36 Richard Stallman, personal interview. Conducted 12/4/99.
37 Jonas Oberg, personal interview. Conducted 11/21/99.

Copyright (C) 1999-2000 Michelle Bejian. Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.
