Democratizing the Archive: An Open Interface for Mediation
"Effective democratization can always be measured by this essential criterion: the participation in and the access to the archive, its constitution, and its interpretation." Jacques Derrida, Mal d'Archive (Archive Fever)
We address specific problems with media archives of contemporary U.S. House and Senate floor proceedings through the implementation of an alternative archive. Metavid shows how contemporary archives impede democratic access to the production of meaning around context-specific online [re]presentations of elected Representatives. Contemporary archives act as gatekeepers to meaning production by implementing costly permission-based access to public media assets; promoting the production of static, opaque, consumable mediations; and engaging in proprietary encapsulation for self-preservation. These problems are traced to the application of restrictive broadcast production metaphors to the internet, where less restrictive forms of participation are possible. The conditions of contemporary archives are generalized as a consequence of operating in a culture of consumption: a culture where meaning is produced to be consumed. We present an alternative to the consumer-based broadcast model in the online context. By making all document and software source[s] freely available for reuse through free licenses, alternative modes of archival engagement become possible.
Archival engagement is understood as mediation. Mediation of an archive negotiates large sets of source material into shorter consumable pieces of meaning. Remediation is the process of re-negotiating the meaning produced by an archival mediation. To address the problems of contemporary online archives of audio/video legislative proceedings, source media assets should be made available for participant [re]mediation. To ensure continued transparency and malleability, participant mediations should be open to, and contestable through, non-destructive [re]mediation.
We borrow the notion of praxis from Paulo Freire's account of liberatory education and use it to shape the role of open source development processes in relation to 'participants'. The notion of praxis and its application to open source and free software systems informs both the practice and the framework of the metavid project. The guiding praxis is a continuous process of collaborative, free and open reimplementation of the tools for meaning production.
The praxis of Free/Open Source Software (F/OSS) is used to enable this alternative archive. We encode legislative audio/video media assets in open formats and make them available online for reuse. The software for mediating this archive is open source. By building on the praxis of free and open source content/software systems, metavid addresses specific problems with contemporary archives. The praxis of open source dictates alternate modes of production; users produce malleable content and software for participation rather than consumption. Mediations of the metavid archive made with the metavid software are also transparent, contestable and collectively owned by participants. Free software and online spaces such as Wikipedia inform this alternate praxis. The adoption of participant-centered systems as an underlying framework for the construction of the metavid archive addresses our specific criticisms of contemporary U.S. legislative proceedings archives. We explore potential participation scenarios and ongoing participation with the metavid archive.
Students, artists, open source programmers, bloggers, university professors and intellectual property lawyers actively collaborate to address, implement and sustain alternatives to the status quo of contemporary archives. This diverse group enables the metavid project to address the obstacles to implementing alternatives to contemporary audio/video legislative archives. The metavid project comprises two major efforts: (1) The leveraging of institutional positions to de-encapsulate proprietary footage and return it to the public domain. We convert consumable audio/video records into media assets for online participation. (2) The construction of a system for archival mediation that reflects the praxis of open source. We create, and enable the creation of, archival mediations that are contestable, transparent and reusable. A criterion for evaluation is established and disadvantages of the praxis of open source approach are discussed. The future of legislative archives is contemplated along with the possibilities for ongoing participation with the metavid project.
Motivations for Metavid and its Contested Legality
Before delving into the problematics of contemporary legislative archives, we present the basic motivation for metavid and the initial legal obstacles to its implementation. Before metavid existed, audio/video legislative archives were non-malleable, very expensive, and very restricted in use. C-SPAN, the primary source for legislative media assets, required any media object produced from their existing archives to be licensed for particular usage contexts and restricted citizens from sharing these media assets online. These restrictions seemed inappropriate for content that is a public media asset. The metavid project returned these media assets to the public domain and, through the praxis of open source, developed means to democratize the mediation of these media assets.
Since the late 1970s, footage of the US Congress has been produced by employees of the US House and Senate, and as such is legally a 'Government Work' and is in the Public Domain. From the cameras on the floor, footage is fed to media outlets which have press credentials and access to the press gallery. Most citizens experience this content mediated by commercial news and television networks as they take snippets from selected speeches to summarize the proceedings into “news”. (Figures 3-4) The one station that rebroadcasts the press gallery feed in its entirety is the National Cable Satellite Corporation (C-SPAN), a non-profit entity run by the cable industry.
As metavid did not have access to the raw feeds from the House or Senate press gallery, the path of least resistance was to appropriate C-SPAN's broadcast footage. The major obstacle with this approach, as documented in the discussion of problems with contemporary archives, is that C-SPAN sends legal threats to those who use their footage. Investigations showed that C-SPAN’s distribution control over government produced content rested on their encapsulation of the footage with their trademark. This was further apparent in the text of the C-SPAN classroom copyright page, which stated “As public domain material, the video coverage of the floor proceedings of the U.S. House of Representatives and of the U.S. Senate is not subject to this license”. Trademark law dictates that a failure to defend a trademark results in the loss of rights to it; consequently, C-SPAN litigates if their trademark is rebroadcast. Rather than working with C-SPAN to attain a license to rebroadcast their trademark (future conversations revealed they would not be interested in such an arrangement), we chose to cover C-SPAN broadcasts with a graphic that states “public domain”.
Metavid’s Legal Framework
This graphic is applied to the video files as the metavid system transcodes them. By removing the trademark from the video, metavid de-encapsulates the footage. De-encapsulation is the process in which the original public domain material is copied out of a copyrightable encapsulation and republished without licensing the material from the publisher of the encapsulation. While our particular practice has not been tested in court, other cases may establish its validity. In the 1991 Supreme Court case Feist Publications v. Rural Telephone Service, Rural's claim to copyright of telephone listings was dismissed. Prior to this case, the subsistence of copyright for non-expressive work followed the “sweat of the brow” doctrine, which gave copyright to anyone who invested time and energy into the republication of factual or public knowledge. Assessment Technologies v. WIREdata reinforced this decision. This case established that a copyright holder in a compilation of public domain data cannot use that copyright to prevent others from using the underlying public domain data, but can only restrict the specific format of the compilation, if that format is itself sufficiently creative.
With the legal support of the University and the previously mentioned cases, we moved forward with our de-encapsulation approach. Figures 1 and 2 illustrate this simple process of covering the C-SPAN trademark with a public domain graphic. The supplemental informational graphics, such as the speaker’s title, state and party, are factual and should be similarly legally usable.
C-SPAN Contacts Metavid
The praxis of open source dictated an open development environment, and an early version of the site made legislative proceedings publicly accessible. C-SPAN legal counsel located these early versions of the site and contacted Professor Warren Sack, advisor of the project. C-SPAN requested a phone conversation, and Warren replied that we would like to include a team of people in the conversation. C-SPAN legal counsel discouraged any dialog about the content in question with the following:
“Given the nature of C-SPAN's legal rights, I don't think there is a need to assemble a team of people on your end. In our phone conversation I will describe to you the nature of the legal harm and the practical harm the website is causing C-SPAN so that you will willingly agree to halt the infringements. My experience over the last 25 years has been that many otherwise informed people are under the false impression that C-SPAN's video product is somehow in the public domain and available for any use. We have been very successful over the years in persuading such unintentional infringers to respect our intellectual property rights, often without resort to litigation. My hope is that we can reach the same result here.” (E-mail correspondence with C-SPAN's general counsel.)
Conversations with other unauthorized users of C-SPAN broadcasts and our own conversations with C-SPAN highlight C-SPAN’s tactical approach to bringing down unauthorized mediators of government proceedings. C-SPAN blurs the line between C-SPAN produced content and encapsulated government produced media, and thus directs online participants to forfeit the use of these otherwise public domain media assets. The conversation between university legal counsel and C-SPAN legal counsel resulted in the removal of a small portion of the archive consisting of the Judge Alito confirmation hearing, which C-SPAN produced with their own cameras. The rest of the government produced House and Senate floor footage remained online, and metavid’s de-encapsulation method was not legally challenged by C-SPAN. Later in the paper we will look at conversations among un-authorized mediators as C-SPAN shut them down via legal threats. The environment and metaphors under which C-SPAN operates will also be considered, to explain the public good they do within the broadcasting context and how their service is problematized in the online environment. In the concluding section I will discuss current conversations we are having with these un-authorized mediators.
Democratizing the mediation
At the same time as we worked to return this content to the public domain, we also worked to democratize the mediation of this content. As detailed in the technical overview section of this document, metavid’s development choices forged open connections between open projects and applied the social constructs of contestable, collectively owned participatory spaces. This praxis of open source proposed many avenues of development.
Initially metavid was simply an interface for search and retrieval. Participants’ engagement primarily consisted of entering words or phrases. The metavid system would perform basic queries against the database of closed caption transcripts. This satisfied the goal of making this footage available. By adopting the praxis of open source, additional weaknesses of contemporary archives were addressed. The interface for search and retrieval and the database files it queries are open for inspection, alteration and contestation. This openness attracted other university students to work on the project, and we oversaw two undergraduate developer participants who worked on the metavid code base.
Development is ongoing, and developer/participants continue to create a number of ways to mediate the metavid archive. These mediations include: an enhanced search which allows web participants to query for videos of their representative, a system for displaying campaign finance data for a given speaker, spaces to document and describe clips, a system for building open sequences that can be remixed by other participants, a profile view of particular politicians, and a drag and drop sequence system. All these mediations are built from the open source framework of the project, allowing these systems to be reworked, remixed and rerun. High level participation such as building a sequence with the web interface also reflects adherence to the open source praxis by providing easy access to the “source” of any sequence that is shared between participants. All mediations that are produced in metavid display their source upon request, making mediations contestable and re-publishable. A detailed overview of the functionality of these components will be addressed later in this document.
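The basic search described above can be illustrated with a minimal sketch. It is not the metavid implementation: the table name, the column names and the use of SQLite are assumptions made purely for illustration. It shows only the general idea of matching a participant's phrase against stored closed caption text and returning pointers into the video record.

```python
import sqlite3


def build_index(rows):
    """Load caption rows into an in-memory database.

    Hypothetical schema: each row is (clip_id, speaker, start_sec, text).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE captions ("
        "clip_id TEXT, speaker TEXT, start_sec INTEGER, text TEXT)"
    )
    conn.executemany("INSERT INTO captions VALUES (?, ?, ?, ?)", rows)
    return conn


def search(conn, phrase):
    """Return (clip_id, speaker, start_sec) for captions containing phrase.

    Uses a parameterized LIKE query. SQLite's LIKE ignores case for ASCII
    text. Wildcard characters (% and _) inside the phrase are not escaped
    in this sketch.
    """
    pattern = "%" + phrase + "%"
    cur = conn.execute(
        "SELECT clip_id, speaker, start_sec FROM captions "
        "WHERE text LIKE ? ORDER BY clip_id, start_sec",
        (pattern,),
    )
    return cur.fetchall()
```

Because both the query and the data it runs against are plain and inspectable, a participant could in principle read, alter and contest how results are selected, which is the property the paragraph above emphasizes.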
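The idea of a sequence that exposes its own “source” and can be remixed without destroying the original can be sketched as follows. This is an illustrative model only: the class names, field names and the JSON representation are hypothetical and are not taken from the metavid code base.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """A pointer into archive footage (hypothetical fields)."""
    clip_id: str
    start_sec: int
    end_sec: int

    def __post_init__(self):
        if self.end_sec <= self.start_sec:
            raise ValueError("segment must end after it starts")


@dataclass(frozen=True)
class Sequence:
    """An ordered, immutable list of segments with a readable source."""
    title: str
    author: str
    segments: Tuple[Segment, ...]
    derived_from: Optional[str] = None  # title of the remixed sequence, if any

    def source(self) -> str:
        """Return the full description of this mediation as JSON text."""
        return json.dumps(asdict(self), indent=2)


def remix(original, author, title, segments):
    """Build a new sequence from an existing one.

    The original is left untouched (non-destructive remediation) and the
    new sequence records where it came from.
    """
    return Sequence(
        title=title,
        author=author,
        segments=tuple(segments),
        derived_from=original.title,
    )
```

Making the sequence immutable and recording `derived_from` is one way to keep every mediation inspectable and every remediation traceable; other designs, such as a full revision history, would serve the same goal.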
Defining the problem: Force fitting the broadcast model to the Internet medium.
To better understand the driving methodology of participation in the metavid system, we should first consider the status quo of video archives of legislative proceedings, the structural choices made in the construction of these archives, and the difficulties for particular participant mediations in these archives. A discussion of these choices and difficulties for participants will inform the praxis of open source selected for the development of the metavid project.
Citizens can attend the House or Senate proceedings, in physical visits that are mediated by rules, regulations and metal detectors, and witness proceedings. Obviously this is not a practical means of engagement for the large and dispersed population that elected legislative members ostensibly represent.
C-SPAN provides televised broadcasts which empower citizens to engage legislative proceedings on a level that was not previously possible. However, the opportunity to monitor elected representatives is tainted by practical considerations such as the conflict between the average work day and the legislative schedule, and the mundaneness of day in and day out coverage. As a result, very few citizens invest time in viewing the C-SPAN broadcast coverage.
Rather than constantly watching C-SPAN, citizens rely on newspapers and television networks to monitor "news" happenings and keep them informed. These networks mediate the original record with advertisements, commentary, coverage choices and the selection of commentator voices, transforming bits and pieces of legislative proceedings and other events into consumable news objects. (Figures 3-4)
Compiling Closed Source News
News networks compile source material, making it proprietary, consumable and static. The source material is encapsulated in the network news’ product, a particular mediation of source media assets. These mediations are less contestable as the apparatus of reception (television, newspapers, and radio receiver) differs from the apparatus of production (network news studio, print production house, radio studio). Furthermore, broadcasting licenses, high production costs, and broadcast technology limitations restrict production to a select few. The broadcast model dictates that only official information producers need reusable access to source media assets. The model is reflected in the government’s distribution strategy for House and Senate footage, which provides access to source media assets to those with official press credentials.
The limited access to source materials and the compiled news product limit citizens' ability to renegotiate these mediations, and impede democratic access to the production of meaning. This broadcast model problem has been challenged by culture jammers and media remixers such as EBN. With the advent of network neutral interconnected information systems (the Internet) and open computational systems (the PC), the base apparatus for both production and reception shifts the broadcast-consumer relationship and opens up remediation to a larger set of participants.
In the broadcast framework, citizens cannot mediate the original source material because they don’t have access to it, so they re-mediate produced content. Because mediations are copyrighted, the only legal recourse for un-funded or unapproved re-mediators is fair use. Unfortunately, the space of fair use online is very narrow. Powerful information distributors, such as Fox News, CNN or C-SPAN, can simply threaten legal action to achieve their aims without necessarily having to argue their case. One example of C-SPAN's threats against bloggers is detailed later in this paper.
Official mediations representing the point of view of newspaper and television network writers and commentators no longer represent the end of techno-mediated meaning production. In the context of legislative proceedings, let us consider the openness of the text record as a point of contrast with the audio/video record.
The open legislative text record
Since January of 1995, the U.S. government has published detailed text records of legislative proceedings through the Library of Congress and the THOMAS web site. Sites which are not sponsored by the government or the established news media have sprung up and made use of these source documents to build their own mediations of the legislative archive. GovTrack, Project Vote Smart, Open Secrets, Congresspedia, and many other sites make use of the repository of public domain text material which is made available by the government. The wide availability of this source material and the structural qualities of the internet medium enable archival systems to flourish. These mediation services, created by individuals and organizations, operate with various degrees of openness. Open Secrets republishes their analysis for a small fee, but this publication does not impede the availability of the source data, which can be acquired directly from the election commission web site. In contrast, there is no such public reusable repository of audio/video records; in its place there was, until recently, only C-SPAN’s online offerings.
The C-SPAN archive
C-SPAN began cable broadcasts of House proceedings on March 19th, 1979 and of Senate proceedings in June of 1986. In a broadcast-centric model of distribution, C-SPAN has provided a valuable service for citizens to view proceedings in real time and in their entirety. C-SPAN's television offerings are commercial-free and their web offerings are inclusive of all their programming. Videos are available on their web site for around two weeks, with some programs being permanently viewable. The temporary web archive of video available on c-span.org hosts the C-SPAN content in Real Media format. The Real Media format generally enforces the broadcaster mentality in its structure; videos are not downloaded, rather they are “streamed” from the broadcaster. These files are highly compressed in a way that makes it difficult to edit or reuse the online clips. Additionally, these clips are encoded at very low bitrates and at postage stamp size, under the assumption that people will only view these proceedings rather than want to pull out segments and recontextualize them. The software makes the copying of the video for remix and reuse difficult. C-SPAN is a nonprofit “offered as a public service”, so what is their motivation for these restrictions? Our legal correspondence with C-SPAN is illustrative of their position.
“We copyright our product so that we can prevent others from copying it and giving it away. The Internet, unfortunately, has made such copying and distribution nearly cost-free. I will explain that our product is not in the public domain, even though many people wrongly think so. Others may want to "support" our public service mission by making our product available by other means, but they do so at the risk of undermining our ability to create the product in the first place. When we allow such alternate distribution modes, we do so by negotiated licenses which include license fees” (E-mail correspondence with C-SPAN's General Counsel; bold added.)
C-SPAN sees the value of the archive in terms of its ability to deliver widely available viewing access while maintaining sustainability. C-SPAN copyrights, and applies their trademark to, all the material that it broadcasts. In the case of C-SPAN produced content, such as Washington Journal and Booknotes, this is a logical approach to support C-SPAN’s production costs. Unfortunately, a similar model is applied to government produced content via trademark encapsulation. This enables C-SPAN to sell legislative proceedings alongside coffee mugs and other C-SPAN produced content as a means to support C-SPAN's archiving and broadcasting efforts.
We argue that artificial scarcity should not be applied to public domain source content. These restrictions and licensing agreements transform public assets into consumable objects that cannot be reworked, reused and mediated. By condition of C-SPAN's license, these consumable objects cannot be transformed after purchase:
“Requests to obtain videotapes in formats other than VHS and DVD and for uses beyond viewing or research require C-SPAN approval. C-SPAN zealously and actively monitors and protects its intellectual property, including the video it produces and C-SPAN registered service marks and logos. C-SPAN is a private, nonprofit organization. It does not, and never has, received any government funding. C-SPAN video is not in the public domain. C-SPAN specifically does not allow airing of the content on public access or local cable television, posting or streaming from an internet site, fundraising use or commercial advertising.” (C-SPAN licensing and use, http://www.c-spanstore.org/shop/index.php?main_page=specialuse; bold in the original.)
These restrictions result in particular types of mediations being possible while making others very difficult. Participant-centered mediation via segmentation and republication in participant-centered contexts such as a blog or website is specifically prohibited. Open-ended participant-centered mediations of the entirety of the archive are made very difficult by these restrictions, and are only really possible via dedicated funding and access to the C-SPAN Archives located in Purdue Research Park in West Lafayette, Indiana (http://www.c-spanarchives.org/Info/aboutarchives.php). For example, the metavid system of participant generated archival mediations is not possible without full access to all the metadata and public domain audio/video material. As a consequence of C-SPAN’s closed archival structure, participants cannot interrogate or rework the algorithms used to search, retrieve and present the archive.
Beyond these technical and legal restrictions, cultural conditioning frames production as consumable mediation rather than a negotiated communication, and cultivates a broadcast medium mindset. This affects citizens’ perception of how to engage the cultural media assets by bounding cultural engagement with the broadcast mentality. This limiting effect is a consequence of a culture of consumption; where production is understood as a “one to many” relationship. Media theorist Marshall McLuhan warned that “forcing new media to do the work of the old” can have undesirable side effects. These undesirable side effects of applying the broadcast model to the Internet medium have resulted in a cultural bounding of creative inquiry.
C-SPAN and Intellectual Property as Documented by the un-sanctioned re-mediators:
Rather than laying out the legal threats used to shut down web sites which engaged in unsanctioned mediations, I will draw from internet posts that have interpreted C-SPAN’s takedown requests. This aims to let the un-sanctioned mediators speak for themselves, and to allow the reader to construct a picture of the restrictions faced by un-sanctioned re-mediators of contemporary audio/video legislative archives. These posts illustrate the confusion faced by these online participants as C-SPAN makes legal threats to shut them down.
We start with the C-SPAN intellectual property section of Wikipedia, which I have participated in writing. This collective text represents a form of community documentation of the contemporary legislative archive problem. In the concluding section I will discuss ongoing conversations with these unsanctioned mediators to re-enable their usage of the public legislative proceedings videos via the metavid website. The following is an excerpt from the intellectual property section of the C-SPAN article as featured on Wikipedia as of May 24th, 2006. The latest version in all likelihood does not match the excerpt below:
“C-SPAN maintains an intellectual property enforcement policy and ‘zealously and actively monitors and protects its intellectual property’. For example, in another section of the C-SPAN site they state that all ‘C-SPAN video is not in the public domain’. In what would initially appear to be a contradiction, C-SPAN's teachers resources page states that that the House and Senate footage is available in the public domain. However, even though the House and Senate footage itself is in the public domain, the House and Senate coverage shown on the C-SPAN networks is not according to C-SPAN. By applying C-SPAN logos and graphics to the proceedings of Congress, C-SPAN makes a claim of copyright to these audio/video documents. Unlike the modern interpretations and performances of Bach and Tolstoy which constitute expression and are therefore copyrightable, C-SPAN copyright’s are applied to documents of public record made available by the government. It has not been tested in court whether C-SPAN's graphics and the factual information such as who is on screen constitute a form of expression. The use of C-SPAN’s trade mark is different from copyright and is afforded different forms of protection. C-SPAN has engaged in numerous actions to stop parties from making unauthorized uses of their content, even in cases where the footage is the House and Senate proceedings. (encapsulated public domain content) C-SPAN has not prevented the use of their footage by the major cable television networks. However, C-SPAN has targeted online sites for similar usage. Some senators believe C-SPAN's coverage to be in the public domain. Websites that have copied unauthorized C-SPAN clips of government proceedings have received requests to take down the content. For example, Dem Bloggers received a take down request for some clips they had posted. Additionally in February of 2006, WRPI's Dennis Karius was fired for airing audio from C-SPAN's web stream on his radio program.
In May 2006, C-SPAN requested the removal of the Stephen Colbert performance at the White House Correspondent's Dinner from You Tube while allowing it to remain on Google Video. The removal caused a bit of controversy in the blogosphere, including a mention on the Boing Boing weblog. The video was available directly from the C-SPAN website.”
As featured on Wikipedia: http://en.wikipedia.org/wiki/C-span#C-SPAN_and_Intellectual_Property
Dem Bloggers :: No more C-SPAN Videos
Dem Bloggers was a community blogging site which posted segments of C-SPAN videos for bloggers to comment on. On June 28th, 2005, C-SPAN contacted Dem Bloggers and threatened legal action unless the video clips were removed from the site. What follows are some selected postings from the announcement that these videos had to be taken down.
“Sorry to hear about the trouble with CSPAN. I just signed up here anyway because you guys have done such a great job, and I know you will continue to even without the CSPAN videos. I understand CSPAN's concern technically (legally) I think, but I also think it is time that the American people had better access to what goes on in our government. It's totally unfair that Bush gets all the free airtime he wants, repeated ad nauseum in the media, but those of us who work during the day are at the mercy of CSPAN's decisions on what they will rebroadcast from the day's house and senate sessions. Not that it could happen in this Congress, but given the state of technology today, why shouldn't we expect video recordings of Congressional proceedings to be made available free on the net, like text is on Thomas? Oh, right. Santorum and cronies are even trying to take away free weather reports from the National Weather Service. (never mind that our taxes already paid for it). Speaking of which, does CSPAN or cable operators get subsidies from the gov't (i.e. our taxes) in consideration of them providing coverage of Congress?” by MH in PA on Tue Jun 28th, 2005 at 05:06:53 PM EST
Re: No More C-SPAN Videos
I think I'd be asking a lawyer to explain just what your rights are in this regard, and whether a hoity threat from CSPAN is nothing more than an act of intimidation. Did you contact anyone to make sure they are within their rights to ask you to stop hosting excerpts of taxpayer-funded programs?
by ddpoli on Tue Jun 28th, 2005 at 09:11:05 PM EST
Re: No More C-SPAN Videos
the problem is that I don't really have any "lawyer friends". I really don't want to declare anything and I have fully complied with their request. This thread is moving down fast. Let's continue the discussion via e-mail. (use the contact form). Talk to you soon!
by Mark Williams on Tue Jun 28th, 2005 at 09:16:44 PM EST
Re: No More C-SPAN Videos
That's really a shame. I don't have cable TV and I really started to enjoy watching the actual people talk about what's going on instead of hearing it from some news source's perspective.
Is there a way where you can put a link to C-SPAN when a relevent video is posted by them? I went to their website and it's poorly designed. You take one look at all the millions of links and a headache ensues.
by unlickedcub on Wed Jun 29th, 2005 at 02:54:30 PM EST
Designing openness; the praxis of open source and Metavid
The defining characteristics of the praxis of open source borrow heavily from a Freirean notion of praxis. This Freirean definition is clearly articulated by Peter McLaren in “Icon of Liberation – Knowing Freire”.
“Praxis is the complex activity by which individuals create culture and society, and become critically conscious human beings. Praxis comprises a cycle of action-reflection-action which is central to liberatory education. Characteristics of praxis include self-determination (as opposed to coercion), intentionality (as opposed to reaction), creativity (as opposed to homogeneity), and rationality (as opposed to chance)”.
The metavid archive seeks to embody qualities that enable critical engagement and the construction of conscious beings as the non-coerced producers of meaning. The never ending development cycle of open source software is appropriated as an action-reflection-action state in which software remains malleable for reflection and re-implementation. The process of learning to engage with metavid code base, is a process of liberatory education; which is not bounded to the instructors/developers original implementation, rather the code is left open to renegotiation in collaboration with the learner. By building the metavid software on an entirely open source base but not exclusive its functionally to this base  participants are free to engage with a productive level of transparency and metavid does not limit its pupils participation through free software purism.. The rational approach is informed by efforts to broadly shift cultural production from within the current cultural context of proprietary systems. The action reflection action approach enables a pedagogical process that builds off the pupil’s current working knowledge and familiarity with propitiatory software. This enables shifts in their cultural practice from production to be consumed to a production for participation.
As previously mentioned, the praxis of open source was applied to address both access and the democratization of mediations of the archive. The following sections focus on the informed implementation of this praxis: first the elements of reflection that inform reusable access to the archive, then the action of implementation, followed by informed reflection on how access can be further improved by extending additional open source frameworks.
Beyond view only access; making the archive reusable
In making the archival content available, the praxis of open source was informed by the related theory and practice of free culture. In his 2004 book Free Culture, Lawrence Lessig argues that our culture is becoming less free as powerful corporations use law and technology to lock down culture and control creativity (Lessig). The free culture framework is an extension or derivative of Richard Stallman’s concept of free software, and advocates a derivative-friendly copyright environment where individuals can create culture and society by building on shared cultural assets.
“Free cultures are cultures that leave a great deal open for others to build upon; unfree, or permission, cultures leave much less. Ours was a free culture. It is becoming much less so.” (Lessig pg 30)
The metavid project builds on the free culture framework to address the question of access. For the metavid project, access to legislative proceedings is not bounded by a view-only relationship; rather, access is understood to include the ability to reuse the source footage. The question of access is restricted to the specific domain of online access to legislative proceedings. Larger questions of access, socio-economic inequity and the digital divide, which negotiate the conditions and implementations of intellectual property, are not addressed and are considered out of the scope of this iteration of the metavid project. Metavid is a specific application of the free culture framework to online offerings of audio video legislative proceedings. In this context the public is restricted to online participants with open computational devices. Public media assets are digital files which have no restrictions on their copying and reuse. The archive of legislative proceedings is initially produced by the U.S. tax-paying public. Without a framework which makes these public media assets available for reuse, “the public” is very narrowly defined as the organizations and institutions with press credentials that appropriate and own these assets. With an online framework for copying and reuse, these assets are owned by a broader online public.
As described in the legal section, the metavid project de-encapsulated the C-SPAN broadcast. The result of negotiations with C-SPAN was not a limited license for university use. Instead, the content adheres to the free culture framework: these media assets are available for arbitrary online participants who are not associated with the university to view and reuse. Support from faculty and collaboration with the university intellectual property lawyer were critical in these negotiations. As mentioned earlier in this document, sites which do not have institutional support have been forced to comply with take-down requests.
Our solution of covering the trademark is a short-term solution which was productive given the initial scope of the metavid project. Future versions of metavid or similar archival systems should seek direct access to the material (i.e. not through C-SPAN). The process of de-appropriation is quite laborious. Beyond overlaying the C-SPAN trademark, footage must be manually scrubbed through to eliminate C-SPAN proprietary programming that is interjected at various moments throughout the day. This condition of the footage limits real-time mediations, which were tested with the metavid technology but are limited by the C-SPAN zero-tolerance copyright policy, which would bring the site down if proprietary content slipped through. Additionally, the de-encapsulation method has not been proven in court, which may limit usage of the footage by cautious participants. In order to legally sustain public rights to reuse this footage, the above-mentioned issues will need to be addressed.
Metavid Technical Overview
Beyond legal obstacles to making the footage available, there were also technical obstacles in building a massive audio video archive with relatively few resources and the time constraints of a graduate student workload. The following technical overview guides the reader through the technology and design choices, how the project built on other open source projects, and the massive programming project (10,000+ lines) to develop the capture and mediation software. The praxis of open source informs specific design processes. A do-it-yourself (DIY) process is adopted for building the computer servers for the project. Also discussed is the selection of a software license for the code base.
Strictly Open Source
Metavid makes use of many bleeding-edge open source projects and technologies. The system is in constant flux, and technology choices reflect the praxis of open source. This praxis, as described in Designing openness, consists of an initial developer’s action of implementing or extending an existing software framework. The developer then reflects with participants who are learning to use and extend this software mediation framework. The new participants and the original developer freely collaborate in a development-action-reflection framework, or the new participants freely extend and/or re-implement the software in an alternate context of engagement, where the process repeats anew. These processes are neither mutually exclusive nor limited to one-to-one relationships; extension, reimplementation and participation happen simultaneously.
The strict adherence to free and open source software may be a limiting factor for other projects which are not operating under the praxis of open source but seek to emulate portions of the metavid project. Additionally, because strict adherence to open source and free software can have consequences for access, a detailed account of why the praxis of open source dictated the solutions we used is warranted.
While the strict adherence to open source has posed challenges, it allows the entire system to be reimplemented without licensing costs or limitations imposed by the original developer. As this technology overview will highlight, every element of the system is transparent for inspection and reuse. This does not negate the fact that such reuse is restricted by technical literacy, coding ability and time constraints. Metavid does not aim to restrict participation to programmers or highly technical users, but using all open solutions may conflict with the goal of easy access. Using proprietary software may in some cases be more effective at facilitating easy access. But proprietary software comes with restrictions on distribution and sharing, a lack of transparency, and reduced participant control over the software that mediates their experiences and the production of meaning.
These conditions are balanced to maximize effective engagement for particular contexts. In the context of metavid it was possible to use entirely open source infrastructure and software. The basic use of the metavid archive is comparable to other online web services for searching and retrieving results. The complexity that is not hidden from users of the metavid archival video may include installing the relatively popular Firefox browser and the relatively unknown annodex player. Given the issues of access involved in using the Theora codec, a detailed explanation of why metavid chose Theora is provided.
There is a wide range of codecs and players with various levels of adoption. If a project aimed to maximize ease of access for the widest range of participants, there are many common codecs and containers to choose from. A project could still adhere to the use of entirely open source tools and free software encoders while using “commercial” codecs. Dedicated video hackers have reverse engineered and re-implemented the encoding and decoding of many commercial and standards-compliant audio/video systems. The primary open source project in this area is ffmpeg and its libavcodec library, which includes support for Macromedia’s Flash FLV, Apple’s QuickTime Sorenson 3 codec, Microsoft’s Windows Media codecs and many more. Projects such as Xvid have successfully built entirely open source MPEG-4 implementations. The metavid project did not use any of these codecs, even though doing so would have made access simpler for online users by not requiring the installation of additional software. Metavid used a little-known and not yet widely used codec named Theora.
The praxis of open source dictated that an ideal codec for the metavid system would be free for reuse without licensing costs and be usable in free and open source video editing software. While the above implementations are open source, licensing and patent issues complicate their usage. Commercial uses of MPEG-4 based codecs, for example, require a license for the Moving Picture Experts Group (MPEG) standard, and Flash-based video encoding is wrapped in a closed source container format that promotes view-only compiled access to online multimedia. Streaming software systems such as Real Media and Windows Media (currently used by C-SPAN), while widely propagated, are surrounded by patent issues and make reuse of the footage difficult (as documented in the problems with status quo archives section).
The Theora codec and Ogg container are free from patent issues. The original creator, On2 Technologies, has “irrevocably given royalty-free license of the VP3 patents to all of humanity”. Theora is quickly becoming more usable in a wide range of contexts. Because metavid selected the Theora codec for its archive, online participants who visit metavid will inevitably expand Theora's usage. Similarly, the Ogg Vorbis audio codec is used for the audio portion of the streams because it has no licensing or patent issues.
The selection of Theora is not designed to enforce open source purism on users of the archive; this would limit accessibility and be contrary to the goals of the praxis we have adopted. Rather, we aim to make the archive as accessible as possible without sacrificing transparency and access to the archive's construction. Use of Theora was motivated by the availability of an open source browser plug-in for Firefox on Linux, Windows and OS X. This plug-in does not substantially impede access to the archive because the necessary software can be installed with a few mouse clicks. Future developments should make Theora-encoded audio/video files even more accessible. For example, a Theora extension for QuickTime has been developed, allowing QuickTime users and video artists to view and work with Theora-encoded content via QuickTime-enabled artist tools (such as Final Cut Pro and iMovie).
Licensing, Openness, the aGPL, and the metavid code base
Licensing the metavid code base was an important element of the praxis of open source. We considered several licenses to secure the freedom of participants to re-negotiate the metavid software. BSD-style licenses are generally considered the most liberal licenses and are comparable to releasing code into the public domain. Releasing the metavid code base under a BSD license would mirror the public domain qualities of the House and Senate proceedings that make up the metavid archive. We did not select a BSD-style license because metavid participants did not want mediations based on the metavid code base to become closed source and lose the qualities that open source praxis promotes. Under a BSD-style license, future versions of the software could be closed source, making mediation opaque, un-contestable and privately owned. Allowing derivatives to abandon the democratizing principles of the archival mediation software was not desirable. The GPL, or General Public License, was considered a good fit as it would preserve access to the source code for derivative works.
We closely investigated the current version of the GPL (v2.0) in the context of online services and discovered that providing a network service is not considered a form of software distribution. Since it was important that all participants in metavid have access to the source code regardless of whether the software was installed locally or run over a network, we chose the aGPL (the Affero General Public License). This little-known license is nearly identical to the GPL, except that online usage of aGPL code must include a link to the source. This addresses the issue of web services avoiding distribution of the alterations they have made to the code base, by declaring that “any user interacting with the Program [is] given the opportunity to request transmission to that user of the Program's complete source code” (aGPL section 2d).
Metavid was a do-it-yourself (DIY) archive because we built all the computers ourselves, using off-the-shelf components and self-researched technical know-how rather than a professional service. Two computers were initially built for the metavid project, together costing less than three thousand dollars. The web server consists of dual 1.8GHz Opterons with 1 terabyte of RAID 5 storage (5x 250GB drives) and 2GB of RAM. Linux software RAID 5 offers cheap redundancy with multi-disk read performance. This ensures relative archival stability: if one drive dies, the system will continue uninterrupted. The capture box has a single Athlon 64 3800+ processor and two PVR-250 MPEG-2 capture cards. The hardware-encoding capture cards enable the metavid capture box to capture two streams at once.
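The storage figure quoted for the web server can be checked with a line of arithmetic. The following is a minimal Python sketch, assuming the standard RAID 5 layout in which one drive's worth of space across the array holds parity; the function name is hypothetical and not part of the metavid code base.

```python
def raid5_usable_gb(drive_count, drive_gb):
    """Usable capacity of a RAID 5 array: one drive's worth of space holds parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_gb

# Five 250GB drives leave four drives' worth of usable space.
print(raid5_usable_gb(5, 250))
```

The remaining drive's worth of space is what allows the array to keep running after a single drive failure.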
Data enters the metavid capture box via cable TV broadcast. The capture box runs Fedora Core Linux (2.6.15-1.1833_FC4) with the ivtv modules inserted. The use of bleeding-edge capture drivers is not recommended, as software bugs are still being worked out (there are no alternatives, but the drivers are improving rapidly). The capture software is a set of scripts written in PHP which screen scrape publicly available TV listings to know when to start capturing content. The video and every few lines of closed caption text are pulled from the capture card and inserted into the local file system and the MySQL transcript tables respectively. After the stream is finished for the day, another set of scripts handles post-processing of the video.
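The listing-driven capture schedule described above can be illustrated with a short sketch. This is a Python illustration rather than the project's PHP scripts, and everything specific in it is an assumption: the sample listing markup, the regular expression and the find_capture_windows helper are hypothetical, since the text does not say which listing page was scraped or how it was marked up.

```python
import re
from datetime import datetime

# Hypothetical listing markup; the real TV listing page is not described in the text.
SAMPLE_LISTING = """
<tr><td>09:00</td><td>12:30</td><td>Senate Session</td></tr>
<tr><td>12:30</td><td>13:00</td><td>Washington Journal</td></tr>
<tr><td>14:00</td><td>18:15</td><td>House Session</td></tr>
"""

# One table row: start time, end time, programme title.
ROW = re.compile(
    r"<tr><td>(\d{2}:\d{2})</td><td>(\d{2}:\d{2})</td><td>([^<]+)</td></tr>"
)

def find_capture_windows(html, keywords=("Senate", "House")):
    """Return (start, end, title) tuples for listings that look like floor proceedings."""
    windows = []
    for start, end, title in ROW.findall(html):
        if any(word in title for word in keywords):
            windows.append((
                datetime.strptime(start, "%H:%M").time(),
                datetime.strptime(end, "%H:%M").time(),
                title.strip(),
            ))
    return windows

for start, end, title in find_capture_windows(SAMPLE_LISTING):
    print(f"capture {title}: {start} to {end}")
```

A scheduler could then start and stop the capture card at the returned times; that step is omitted here because it depends on the capture driver.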
Frames are analyzed for on-screen text using gocr, which allows the metavid database to store who was talking when. At the same time, these frames are inserted into the database to allow arbitrary frame time requests from the web server. Participants can then pull up frames just by altering a URL request. For example,
http://metavid.ucsc.edu/image_media/senate_proceeding_04-27-06?size=large&t=00:01:00 pulls up an image of senate_proceeding_04-27-06 at 1 minute into the day, and http://metavid.ucsc.edu/image_media/senate_proceeding_04-27-06?size=large&t=00:03:00 pulls up an image 3 minutes into the day.
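The frame requests above follow a simple pattern: a stream name, a size, and a time offset written as HH:MM:SS. A minimal Python sketch of building such URLs from an offset in seconds follows. The base URL and the size and t parameters are taken from the examples above; the helper names are hypothetical, and the sketch assumes offsets are counted from the start of the stream.

```python
from urllib.parse import urlencode

BASE = "http://metavid.ucsc.edu/image_media"

def to_timestamp(seconds):
    """Format a non-negative offset in seconds as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def frame_url(stream, seconds, size="large"):
    """Build a frame request URL in the form shown in the examples above."""
    # safe=":" keeps the colons in the timestamp unescaped, as in the examples.
    query = urlencode({"size": size, "t": to_timestamp(seconds)}, safe=":")
    return f"{BASE}/{stream}?{query}"

# One frame per minute for the first three minutes of the stream.
for offset in (60, 120, 180):
    print(frame_url("senate_proceeding_04-27-06", offset))
```

Stepping the offset in a loop like this is one way a participant could assemble a contact sheet of a proceeding without downloading any video.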
Once the post-processing scripts complete, a manual review of the footage takes place to remove any proprietary C-SPAN content. C-SPAN often intersperses its own programming when the House and Senate take breaks or when there is little activity on the floor. The manual review sets in and out points for these segments. The valid segments are transcoded to the open source Theora codec via the ffmpeg2theora encoder. The vhook extension is used to mask the C-SPAN trademark. Two versions are encoded: a high quality version for broadcast usage and a low bitrate version for online streaming. The high quality version is encoded at 526x368 at 900kbps, which is comparable to somewhere between VHS and DVD quality, and the streaming version is encoded at 320x240 at 300kbps, slightly below VHS quality.
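The relationship between the reviewer's in and out points and the segments that go on to transcoding can be sketched as an interval computation. The following Python sketch is illustrative only: it assumes times are held as seconds from the start of the day's stream, and the valid_segments function is hypothetical rather than taken from the metavid scripts.

```python
def valid_segments(stream_length, removed):
    """Return (start, end) pairs, in seconds, that remain after cutting `removed`.

    `removed` is a list of (in_point, out_point) pairs marking proprietary
    programming; overlapping or unsorted pairs are tolerated.
    """
    kept = []
    cursor = 0  # start of the next stretch of footage that may be kept
    for cut_in, cut_out in sorted(removed):
        # Clamp the cut to the bounds of the stream.
        cut_in = min(max(cut_in, 0), stream_length)
        cut_out = min(max(cut_out, 0), stream_length)
        if cut_in > cursor:
            kept.append((cursor, cut_in))
        cursor = max(cursor, cut_out)
    if cursor < stream_length:
        kept.append((cursor, stream_length))
    return kept

# A 10 hour stream with two stretches of proprietary programming marked.
print(valid_segments(36000, [(7200, 9000), (0, 600)]))
```

Each returned pair would then be handed to the encoder as the start and end of one transcoding job.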
Once transcoded, the video is inserted into the metavid database. The valid time segments are synchronized with the database values for screenshots, closed captions and person tags. The admin interface manages hundreds of streams and the hundreds of thousands of associated images and text (figure 8-9). These files are then made available through the web site. Each stream can be up to 10 hours long. This poses an obstacle to making the footage available to online participants: it would require a great deal of bandwidth to download a whole day of footage, and special systems would have to be installed to search the video locally. To address this obstacle we implemented the open source mod_annodex Apache extension.
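A rough calculation shows the scale of the bandwidth obstacle. The Python sketch below assumes the quoted bitrates are kilobits per second, that the bitrate is constant, and that audio and container overhead can be ignored, so the results are lower bounds; the function name is hypothetical.

```python
def video_bytes(duration_seconds, video_kbps):
    """Approximate video payload in bytes for a constant bitrate stream."""
    return duration_seconds * video_kbps * 1000 / 8

TEN_HOURS = 10 * 3600

# Whole-day downloads at the streaming and high quality bitrates.
print(video_bytes(TEN_HOURS, 300) / 1e9, "GB at 300 kbit/s")
print(video_bytes(TEN_HOURS, 900) / 1e9, "GB at 900 kbit/s")

# A 10 second excerpt at the streaming bitrate, for comparison.
print(video_bytes(10, 300) / 1e3, "kB for a 10 second clip")
```

Under these assumptions a full day is over a gigabyte even at the streaming bitrate, while a short excerpt is a few hundred kilobytes, which is why serving arbitrary segments matters.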
Mod_Annodex and dynamic participant based segmentation
Mod_annodex allows arbitrary key frame reconstruction and segment streaming based on URL time requests. Mod_annodex is an extension to the open source Apache web server software. It allows the URL request string to specify arbitrary time segments from a given video file. Since metavid's archival records of Senate and House proceedings are long, it was important to allow participants to query for smaller, digestible segments. In other words, web clients can arbitrarily select a segment to view. For example, a 10 second clip 1 hour 35 minutes into the Senate proceeding for November 10th, 2005 can be called with:
Once this URL is entered, we receive a clip of Senator Jon Kyl discussing the outrageous requests made by prisoners, including a request for a particular type of peanut butter. The open source praxis dictated that we would not pre-select the segments for participants to view; rather, participants can alter the URL string to request arbitrary segments. For example, the same URL mentioned above could have the end time altered to see the full context of Jon Kyl’s arguments.
Mod_annodex is also consistent with the praxis of open source because it allows segments of large files to be downloaded to disk, not just streamed in the web browser or video application. Web pages are dynamically generated for arbitrary clip requests, and a link is provided to download the segment locally, enabling participants to work with the file in a traditional digital video editing environment.
A technical overview of the metavid reMediator system
The metavid reMediator system is by far the most complex component of the metavid code base. This overview is followed by a detailed guide to usage that highlights the system's functions. We skip over common functions such as login and session handling. The reMediator system runs in the Firefox web browser and makes use of the annodex video plug-in.
We built the metavid reMediator framework around modular components called overlays. These overlays operate as tools for searching and mediating the archive. The search overlay can query metadata such as who is currently speaking on screen, closed caption transcripts, and other data retrieved from online sources. Each overlay can be operated with other overlays in the reMediator context or requested as a stand-alone web page. The stand-alone display is friendlier to linking, allowing participants to link to a stand-alone version of most overlays they interact with in the reMediator system.
The Archival Browser Overlay (.1)
The Archival Browser Overlay facilitates searching the archive. The advanced search functionality updates results in real time as web participants add or remove filters. Any filtered search can be turned into a stand-alone web page and/or RSS feed that is continuously updated as new streams match the given filter set. Currently only the date and closed caption metadata can be searched and filtered. As more filters are programmed into the advanced search system, the complexity of participant-constructed queries will increase along with the dataset. The archival browser also facilitates browsing the archive by on-screen appearances of congressmen.
The Person Overlay (.2)
Currently the person overlay only supports the display of Open Secrets campaign contribution data (pictured in figure 7). The framework, however, supports the inclusion of many external web sources. The person overlay updates as different congressmen and senators appear in the video playback, retrieving the associated campaign contribution information.
The Video Playback and Controls Overlays (.3, .4)
The video playback and controls enable video playback functions such as pause, stop, and full screen display. The video control also allows users to request arbitrary portions of the given stream by specifying a starting point and duration. A link is provided to display the video in a stand-alone web page.
The Sequence Builder Overlay (.5)
The sequence builder allows participants to build sequences of clips. Central to the construction of sequences is the ability for them to be reworked by future participants that view them.
The Close Caption Overlay (.6)
The close caption overlay displays close caption text. The close caption overlay updates the displayed text relative to the position of the video playback.
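Keeping caption text in step with playback amounts to finding the most recent caption at or before the current playback position. The following is a minimal Python sketch of that lookup, not the overlay's actual code: the caption rows are hypothetical sample data, and the sketch assumes captions are stored as offsets in seconds sorted in ascending order.

```python
from bisect import bisect_right

# Hypothetical caption rows as (offset_seconds, text), sorted by offset.
CAPTIONS = [
    (0, "THE SENATE WILL COME TO ORDER."),
    (12, "THE CLERK WILL READ THE JOURNAL."),
    (30, "MR. PRESIDENT, I RISE TODAY..."),
]

def caption_at(captions, position):
    """Return the most recent caption at or before `position` seconds, or None."""
    offsets = [offset for offset, _ in captions]
    # bisect_right finds the first offset strictly after the position.
    index = bisect_right(offsets, position) - 1
    return captions[index][1] if index >= 0 else None

print(caption_at(CAPTIONS, 15))
```

A binary search keeps the lookup cheap even for a full day of transcript, so it can be repeated every time the playback position changes.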
While the majority of the code was written by metavid’s principal developer, metavid’s open source approach and premise attracted participation by two undergraduate Computer Science students for class credit. While the student components are still under development, their participation exemplifies the modular design and open source infrastructure that was specifically developed in the metavid system to encourage participant-programmed overlays.
Infrastructure blog, wiki and GForge
The supplemental metavid infrastructure is a crucial component in enabling the open source praxis adopted by the project. The blog serves as a platform to announce metavid features and comment on net happenings. The wiki documents the project's development and enables participants to describe clips. The UCSC GForge system hosts the code development and facilitates public access to the metavid code base. Each of these components is publicly accessible; for a more detailed view of how they are being used in the metavid project, visit the respective pages of the metavid site.
"If it is correct, as I believe it is, that a fundamental element of human nature is the need for creative work or creative inquiry, for free creation without the arbitrary limiting effects of coercive institutions, then of course it will follow that a decent society should maximize the possibilities for this fundamental human characteristic to be realized." Noam Chomsky
Chomsky generalizes the human condition and a means to address impediments to its realization. Chomsky’s broad framework is aligned with the implementation specifics of Freire’s pedagogical praxis to maximize non-coercive creativity. Attributes of open source development and free culture are interpreted as essential elements of a praxis designed to address context-specific coerced creativity. The restriction and commodification of legislative cultural assets is shown to be an oppressive force that coerces creative engagement with the archives of audio video records of legislative proceedings. The praxis of open source is an effective means of implementing alternatives. This paper points to, but does not prove, the broad applicability of the praxis to other contexts. A broader application of this praxis would be weak without a full consideration of the specific socio-economic conditions that negotiate its applicability. The cycle of action-reflection-action continues with the metavid project as we dialog with un-sanctioned mediators and online citizens to [re]enable their participation with the legislative archive. Participant groups will be free to participate in the evolution of the metavid project, or to take what we have done and build on it or implement it in other contexts. In addition to presenting at several events and maintaining a public presence through the site blog, we have started to contact online communities to inform them of the existence of the metavid archive. We have posted on the C-SPAN sucks community forums and Dem Bloggers (the group referenced in the un-sanctioned mediators section of the paper), and others have posted about metavid in various places. We have attempted to promote the site in a way that does not exceed our server capacity. Metavid usage has been steadily increasing, from 60 average daily visits in March to 300 average daily visits in early June. We have been able to participate in the system that we created as well. 
We document clips we find and share them with our friends; participating in the mediation of the legislative archive along with others.
The end goal of archival systems should not be the sustainability of the institution around these archives, but rather the creation of open systems that enable free association and participation. Participation is coerced and democratization is impaired when the archive limits access to its construction and interpretation. The archive and its archivists should not be uniquely qualified to produce meaning for others; rather, they should seek to build open archives where ongoing participation is sustained.
There are many references embedded in the text and footnotes; any further resources of interest can be added here:
Robert A. Baron: Reconstructing the public domain, metaphor as polemic in the intellectual property wars.
In Defense of the Digital Divide as Paralogy (v 1.0) by Ulises A. Mejias
Why does multimedia specifically need open source? by Monty
- ↑ A new version of a digital mediation need not destroy the original. Participant mediations can be open to reinterpretation while preserving the original mediation or participant production of meaning.
- ↑ The conjoining of Free and Open Source Software is not meant to negate their differences. Both practices make up the praxis. Ideologically, the praxis is aligned with the free software movement in its view of non-free software as a social problem. Issues of ensuring access dictate the relative approach used by the Open Source development methodology, which prioritizes open source options over proprietary ones. For a more in-depth discussion of the differences between Open Source and Free Software see the FSF article “Why Free Software is better than Open Source”.
- ↑ Wikipedia is the free-document online encyclopedia. Encyclopedia articles are created, edited and contested by online participants. All contributed information is under a free-document license. The MediaWiki software which mediates these contributions is open source under the GPL. The entire archive is available for download and reimplementation.
- ↑ The October 1977 passing of H. Res 866 permitted broadcast coverage of House proceedings. For an overview of the history of Congress and television see “U.S congress and Television” by Ron Garay.
- ↑ Transcoding is the process of transforming a digital video file into another format; in the case of the metavid system we transcode MPEG-2 video files to Ogg Theora.
- ↑ The word compile is used here to draw a parallel between produced news content and the compiled binaries of closed source software. In closed source software the source is not visible and the software mediation cannot be easily altered.
- ↑ The cultural consequences of the broadcast model, and of television specifically, have been written about in volumes, which I won’t attempt to condense here; the television section of Wikipedia is a good start for further reading (feel free to add any resource that’s missing).
- ↑ Net neutrality is an important and contested element of how the internet is currently operated. It dictates that all network traffic is treated with the same priority, and that any user can connect to any other end user. For a good overview, again see Wikipedia’s network neutrality article.
- ↑ The open architecture of the PC is also being contested by technologies such as Digital Rights Management (DRM) and Trusted Computing. These technologies apply broadcast model restrictions to produced media, making it very difficult to remediate.
- ↑ Unlike the EBN mediators, contemporary remixers do not have to purchase editing equipment, editing software, distribution licenses etc. Open source video editors run on open computational systems, and one can freely distribute content on network-neutral information distribution systems.
- ↑ Un-sanctioned mediators are any person or group who can’t afford a license to use content or whose activities are not approved by the license issuer; this is broadly inclusive of the activities of many artists, educators, critics, and citizen media activists.
- ↑ There is also FedNet, which sells a web version of the Senate proceedings back to congressmen and news producers for web posting. The for-profit, pay-for-access model used at FedNet is even more restrictive than C-SPAN's offerings. FedNet is used much less by online participants and is not the focus of the status quo of online access to House and Senate proceedings.
- ↑ Artificial Scarcity describes the use of technology to limit the availability of digital information. For more info see 
- ↑ The metavid application has been tested and developed on top of proprietary operating systems such as Windows and OS X.
- ↑ Free software purism would enforce client usage of a GNU free software stack for development and/or client access. In a free software stack every piece of underlying code is free software. Linux distributions such as Debian adhere to 100% free software, and metavid is compatible with such an environment while also being compatible with proprietary stacks such as Windows XP with Microsoft Internet Explorer 7 with Microsoft Java or VLC ActiveX.
- ↑ Richard Stallman, a free software programmer/philosopher, is often labeled the grandfather of the free software movement. Stallman sees free software as a means of building a more just society. For more information visit the philosophy section of GNU: http://www.gnu.org/philosophy
- ↑ There are many interesting engagements with the issue of how intellectual property is negotiated by socio-economic conditions. Kimberly Christen’s Gone Digital: Aboriginal Remix and the Cultural Commons, for example, includes a critique of free culture’s universality. Also, In Defense of the Digital Divide as Paralogy (v 1.0) by Ulises A. Mejias offers new ways to conceptualize the digital divide that address the commodification of knowledge that technology promotes.
- ↑ For a detailed overview of why open source media formats are important in the contemporary corporate environment, read the Xiph about page: http://www.xiph.org/about/
- ↑ For an up-to-date list of supported playback software for Theora-encoded streams see “playing theora videos” on Wikipedia.
- ↑ Berkeley Software Distribution, or BSD, is the Unix variant that came out of the University of California, Berkeley in the 1970s; the BSD license is the license applied to it. For a full history see BSD on Wikipedia.
- ↑ The General Public License, under which a large portion of open source software is licensed. For more information visit 
- ↑ DIY culture is broadly related to open source, home brewing and forms of anarchical political activism. DIY is when people do things "themselves" rather than pay for a proprietary or professional service. For an overview see DIY culture.
- ↑ Screen scraping is the process in which a program grabs a web page as if it was a normal user and then runs some algorithms (usually regular expressions) on the page to extract relevant data
- ↑ This infrastructure includes software such as version control, bug tracking systems and email lists. This infrastructure was provided by the university's GForge code development support software. GForge is a web-based collaborative open source development environment similar to SourceForge. Other open source infrastructure components are discussed in the infrastructure section.
- ↑ The C-SPAN sucks community is an alternative community forum that was formed after C-SPAN decided to eliminate the community forum on C-Span.org.