Job Opportunities at the Internet Archive

Web Crawl Engineer

Web Archiving Software Engineer

Head of Digitization

Book Scanner

Senior Application Developer: Archive.org

Senior Engineer: Wayback Machine

About the Internet Archive

The Internet Archive is a non-profit with a huge mission, to give everyone access to all knowledge: the books, web pages, audio, television and software of our shared human culture. Forever. Based in San Francisco and with satellites around the world, the Internet Archive staffers are building the digital library of the future, a place where anyone can go to learn and explore. Our 160 engineers, book scanners, librarians, designers and team members have built roughly the 250th most popular website in the world (https://archive.org). Internet Archive is a non-profit digital library offering free universal access to books, movies and music, as well as 464 billion archived web pages.

Web Crawl Engineer

Location: Inner Richmond, San Francisco, CA or Remote

Job Classification: Full-time, Exempt

Job Summary: The Internet Archive is seeking a Web Crawl Engineer for its Web Archiving Group. Our crawl engineering team is responsible for capturing and managing the highest quality content from the web. An ideal candidate demonstrates independence and initiative, is a problem solver, works well autonomously, and is technologically savvy. Additionally, the ideal candidate is open to being trained on, and helping advance, best practices and standards around large-scale web harvests, web data processing and engineering, and contributing to the development of new harvesting, access, and analysis tools.

The position will work in the Web Archiving Group in support of web harvesting services and programs, working with partners ranging from national libraries and archives to collaborative international initiatives that support the collection, preservation, and accessibility of web content. The role will help design the strategy and implementation of web archiving services using open source technologies and platforms, and develop harvest techniques and tools to enable archival capture and re-rendering of rich media, streaming content, and social media, as well as traditional web page content. The position will also create tools, services, and workflows to improve crawl analysis, reports, data management and derivation, and identify technical, operational and data analysis requirements. This role contributes to defining deployment architectures and workflows, managing data at scale, and monitoring production systems.

Essential Job Functions:

Running large-scale web harvests on global and national domain levels and focused and specialized crawls using Heritrix, our open-source crawler, as well as other open-source technologies developed internally, including Umbra, Brozzler, warcprox and others.
Configuration, monitoring, and improvement of large-scale, multi-machine web crawls to ensure their quality and timely completion.
Processing, analysis and quality assurance of archived web content to ensure it is complete and of the highest quality.
Contribute to development of tools for automated analysis and reporting of crawl material, and to development projects focused on crawling, processing, and access.
Manage both large ingests and exports of web data, derivatives, logs, and reports.
Demonstrated experience of delivering on commitments with deadlines and project time lines and working in a collaborative team of engineers and project/product managers.

Minimum Qualifications:

Experience with web crawlers or scrapers, especially Heritrix
Proven experience in Unix shell scripting and Python coding required
Solid experience in Internet protocols (HTTP is a must). Strong knowledge of HTML, JavaScript and Web technologies in general
Knowledge of building and deploying web applications, databases, web-host services, and knowledge of basic Linux system administration
Ability to work in, and enjoy, a loosely structured work environment

Preferred Qualifications:

Cluster computing experience is preferred, especially familiarity with Hadoop and related technologies and tools
Experience or familiarity with Java strongly preferred
Experience with applications designed to display archived web content, especially server-side apps and Wayback
Experience with development environments and system monitoring/administration tools
Experience with open source practices, version control, and code review
Experience with Atlassian tool sets
Flexibility and a sense of humor are a plus

Requirements: Bachelor's Degree in Computer Science or a related field, and five years of progressively responsible experience in software development.

Reporting Structure: The Web Crawl Engineer reports to the Director of Web Archiving and works closely with other departments. The position works alongside other web archiving engineers as well as program staff in the Web Archiving Group and with the broader Internet Archive infrastructure and engineering teams.

To Apply: Please send your resume and cover letter to jobs+crawlengineer@archive.org with the subject line "Web Crawl Engineer."

Internet Archive reserves the right to revise job descriptions or work hours as required.

Internet Archive is an Equal Opportunity Employer and a 501(c)(3) non profit library founded in 1996.

The Archive will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the Fair Chance Ordinance.

Web Archiving Software Engineer

Location: Inner Richmond, San Francisco, CA or Remote

Job Classification: Full-time, Exempt

Job Summary: The Internet Archive has over 24PB of unique digital information, all running across an integrated cluster of over 700 VMs on 500+ bare-metal hosts in 3 data centers. We are looking for a smart engineer with experience in defining and building service APIs. The ideal candidate will also have experience creating software that interacts with systems at high transaction rates while delivering reliability and performance of both internal and public-facing web applications. All candidates must be able to work collaboratively within our Web Archiving team of talented engineers and program staff.

Essential Job Functions:

Build, test, and package APIs for the transfer of data out of a repository of web archive files
Consume external APIs to enable the ingest of external data into web archive files
Deploy, administer, and tune tools that support the software development infrastructure and data management and processing environments used within the Web Archiving group
Analyze, manage, transfer, and maintain large amounts of archival data in multiple environments
Participate in monitoring, maintaining, and restoring the health of the storage and computer cluster and key processes and services related to crawling, indexing, and access to archived web content

Minimum Qualifications:

Fluency in Linux environments, scripting and/or programming skills, development of custom tool integrations

Proven experience in Unix shell scripting and Python required

Demonstrated experience building or working with APIs
Experience deploying and administering database, search, and web-host services

Proven experience with open source practices, participation in open source forums, and staying current with industry trends
BS in Computer Science, or equivalent work experience

Preferred Qualifications:

Familiarity with the configuration of software development environments and cluster administration tools, including Git, the ELK stack, and monitoring tools such as Nagios, Graphite, and Grafana
Knowledge of evolving database or analytics tools, especially Hadoop, Druid, or RethinkDB
Experience or familiarity with Java is a plus
Experience with Atlassian tool sets
MS in Computer Science or equivalent work experience
Flexibility and a sense of humor

Reporting Structure: The Web Archiving Software Engineer reports to the Director of Engineering and works closely with the Director, Web Archiving Programs. The position will also work alongside other systems, applications, and QA engineers as well as program staff in the Web Archiving Programs team.

To Apply: Please send your resume and cover letter to jobs+webarchivingengineer@archive.org with the subject line "Web Archiving Software Engineer."

Head of Digitization

Location: San Francisco, CA, remote possible.

Job Classification: Full-time, Exempt

Senior management position to manage and expand our digitization of millions of books, audio records, films and videotapes to build one of the world's largest digital libraries.

Reporting to the Digital Librarian, the Head of Digitization will have overall strategic and operational responsibility for Internet Archive's 70+ digitization staff in 8 countries, programs, expansion, and execution of the group's mission.

This requires managing people, setting up facilities, creating production processes, and working through process improvements.

Responsibilities

Triple production rates and expand the range of media types digitized efficiently

Build production processes and manage to them

Manage 70+ staff and volunteers working in libraries in multiple countries

Manage the contracts and operate a couple of remote "super scanning centers"

Work closely with our partner libraries and vendors

Identify and resolve bottlenecks within the workflow that hinder quality, partner satisfaction, and efficiency.

Develop new and interesting partnerships

Track and communicate production throughput and productivity.

Project manage hardware/software releases across all scanning operations

Develop strong working relationships with engineering, finance, administration, and HR teams

Qualifications:

Engineering mindset and approach to production processes

Worked internationally in setting up and operating factories

Track record of effectively leading and scaling a performance-based organization and staff

Unrelenting commitment to quality and efficiency

Desire to travel

Ability to work effectively with diverse groups of employees and library partners

Passion, integrity, positive attitude, mission-driven, and self-directed

Engineering degree with at least 5 years of senior management experience.

To Apply: Please send your resume and cover letter to hr@archive.org with the subject line "Head of Digitization."

Book Scanner

Location: Washington DC area

Job Classification: Full-time, Non-Exempt

The Book Scanning Operator "Scanner" digitizes and helps de-bug the scanning process in the Internet Archive scanning centers. The Internet Archive has an immediate opening for a Scanner in the Washington DC area.

Desired Qualifications:

High tolerance for repetitive tasks.

Attention to detail.

Ability to assess image quality and notice when a page has been skipped.

Average computer skills.

Willingness to do first level of troubleshooting.

Ability to communicate with others about problems or solutions.

Must be able to sit/stand at a scanning device constantly.

Patience and a natural curiosity about how things work are required.

Previous imaging experience is not necessary.

This is a non-exempt hourly position. Benefits include medical, dental, FSA/DCA, 403B, LTD, and life insurance.

To Apply: Please send your resume and cover letter to hr@archive.org with the subject line "Scanner DC."

Senior Application Developer: Archive.org

Location: San Francisco, CA

Job Classification: Full-time, exempt

Job Summary: The Internet Archive has a huge corpus of digital information. Every day, our team of development engineers creates tools and applications that help our users to access and work with 22 petabytes of content that includes millions of books and texts, millions of hours of video, millions of audio tracks, and over 450 billion web captures. We are looking for smart engineers to help develop the next generation of web-based applications and tools that will be used by libraries and archives around the world to build and manage curated collections of books, texts, web, and image content. The ideal candidate will be a strong programmer who has successfully led and completed several projects involving large or intricate web applications or services, and who works collaboratively with talented engineering colleagues.

Key Responsibilities:

The responsibilities of this position are to be part of the team that will maintain and evolve the Archive.org web site. More specifically, this means:
Work at the direction of the technical project lead to continue to evolve and enhance the next generation of the archive.org web site.

Minimum Qualifications:

Passion for delivering delightful end-user experiences when interacting with delivered web applications and services.
Extensive work experience with JavaScript, HTML5, and CSS.
Extensive experience developing applications and websites in PHP.
Work history that includes integrating front-end user interfaces with search, database, and business logic to create integrated applications and services.
Experience working with digital media files and metadata structures
Experience developing and maintaining structured APIs
Good understanding of latest web framework technologies and protocols
Fluency in Linux environments
Flexibility and a sense of humor

Preferred Qualifications:

Strong programming experience in Python.
Experience with open source practices and participation in open source forums
Experience working with time-based digital media (audio and video).
Specific experience with Atlassian tool sets (Jira, Confluence)

Reporting Structure: The Web Application Developer reports to the Director of Engineering and will work closely with the web archiving and TV archiving teams. The entire staff is guided by founder and Digital Librarian, Brewster Kahle.

To Apply: Please send your resume and cover letter to Jobs+Seniorapplicationdeveloper@archive.org with the subject line "AE-106: Web Application Developer."

Senior Engineer: Wayback Machine

Location: San Francisco, CA

Job Classification: Full-time, exempt

Job Summary: The Internet Archive's Wayback Machine is the world's largest public archive of historical web sites. Have you ever wanted to work with 450 billion things at once? Would you like to serve 1,500 requests per second? How about having your service referred to regularly in news articles and blog posts across the web? You can work on a challenging and popular project and help the world at the same time.

We are looking for a smart, collaborative and resourceful engineer to help develop the next version of the Wayback Machine. The ideal candidate will want to work collaboratively with a small internal team and a large, vocal and active user community, and will demonstrate independence, creativity, initiative and technological savvy, in addition to being a great programmer/architect.

Minimum Qualifications:

2-3 years of work experience in Python or a similar language
Experience working in Linux environments
Familiarity with Java (current deployment is written in Java)
Good understanding of latest web framework technologies and aspects of web technology and protocols
Flexibility and a sense of humor
BS Computer Science, or equivalent work experience

Preferred Qualifications:

Experience with web crawlers and/or applications designed to display archived web content (especially server-side apps)
Cluster computing experience
Open source practices experience

To Apply: Please send your resume and cover letter to Jobs+SeniorWaybackEngineer@archive.org with the subject line "Wayback Machine Senior Engineer."
