Archiving a Web Page

Good website governance includes the task of archiving Web pages. Archiving a Web page is a fundamental task of Web content management and falls near the end of most models of the Web content lifecycle. View it as a part of your complete content management strategy. Managing the information you intend to deliver to audiences is like tending a garden: you need to pick out the weeds and the dead or dying parts so that what remains flourishes.

Archiving is a necessary process as Web pages are reviewed and updated; as new information architectures are being considered and then put into place; or as content audits uncover duplicate, orphan, or little-visited pages (your lonely pages). For some organizations, archiving a website may be governed by laws and regulations, and professional services are available to support these activities. If this is the case, you might want to consult your internal legal or records management staff for guidance.

If you are a Web Content Manager and do not have access to professional support services, and you are not regularly auditing your pages and archiving unnecessary pages off your site, you may find yourself trying to manage an unwieldy mess of pages, folders, and related information assets (documents, images, etc.). Now imagine the experience your visitors are having as they try to navigate to the information they need to do the tasks they want to accomplish at your website. They might feel like they’re hacking their way through a forest of outdated or misplaced content in search of a clear path.

Realizing the importance of archiving Web pages, I developed a process anyone can use for sites built with static HTML pages.  Typically, the task of archiving Web pages falls to a Web Content Manager, not software engineers or developers on the IT staff. I hope you’ll find this to be a simple, content-friendly process or one that you can take and modify to suit your needs.

Definition of Archiving a Web Page

Archiving a Web page might mean one of two things:

1. Placing “Notice of Archive” information on a Web page to inform visitors that the page is no longer being updated by website owners and is being left in place in the site architecture for some reason (historical, informational, etc.).


2. Completely removing a page from a website.

Why Archive a Web Page?

Managing a website means managing content all the way through the Web content lifecycle. Stages in the lifecycle include content creation, revision, distribution, and, finally, some form of archiving for (possible) re-use.

Archiving should be done as needed to help maintain the integrity of a website. Websites age, content owners and projects come and go, and Web Managers remain challenged with keeping content fresh and engaging. Archiving removes unnecessary pages from your website, helps visitors accomplish tasks more easily on your site, and supports site optimization. Archiving (in the sense of completely removing a page) keeps a record of every page that has been on your website in its original information architecture, so if you ever need to repurpose or roll back content, this can be readily done.

Which Archiving Choice is Right for a Page?

As a Web Content Manager, you’ll want to consider the needs of your audiences when considering what type of archiving a page needs.

A “Notice of Archive” may be appropriate for a page with “timely” information, that is, related to a project. When the project is completed, what should be done with the page(s)? Remove them completely? Or “freeze” them in time on the website but with a “Notice of Archive” placed on the page?

Complete removal of a page is needed for duplicate pages, orphan pages, or pages that have been determined to be no longer useful to the website’s information architecture.

“Notice of Archive” Process

Placing a “Notice of Archive” on a page is done by drafting appropriate language and placing it on the page where visitors can easily see it. Between these steps you’ll want to vet the notice through internal governance structures, of course, including legal and marketing departments. Going forward, you’ll want to make sure that the page content truly is frozen. Consult with internal IT staff on how best to accomplish this.

Examples of “Notice of Archive” language include the following:

  • This page is obsolete and kept for historical purposes only.
  • This page has been in “online archive” status since [date] and is kept within our site architecture for historical purposes only. See current [page].
  • Notice of Archive. The contents on this page remain on our website for informational purposes only. Content on this page will not be reviewed or updated from [date].
  • This page is available primarily for archival purposes. Page and contact information may be out of date; please [Contact Us] for current information.

You may want to place the notice within a standard image or box, so consult with a graphic designer or Web developer as needed. Make sure the text is retrievable through your site search engine.
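For a site built with static HTML pages, placing the notice can be scripted. The following is a minimal sketch, not a prescribed method: the function name `add_archive_notice`, the `archive-notice` class name, and the notice wording are assumptions made for illustration. It inserts the notice as plain text right after the opening body tag, so the text stays visible to visitors and to a site search engine.

```python
import re

# Hypothetical notice wording and CSS class; adapt both to your own site.
NOTICE_TEMPLATE = (
    '<div class="archive-notice">'
    "Notice of Archive. Content on this page will not be reviewed "
    "or updated from {date}."
    "</div>"
)


def add_archive_notice(html: str, date: str) -> str:
    """Return the page source with the notice placed after the <body> tag."""
    notice = NOTICE_TEMPLATE.format(date=date)
    # Find the opening <body ...> tag, whatever its case or attributes.
    match = re.search(r"<body\b[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        # No <body> tag found: fall back to the very top of the source.
        return notice + "\n" + html
    return html[:match.end()] + "\n" + notice + html[match.end():]
```

You would read a page's source from its file, pass it through the function, and write the result back. Review the changed page in a browser before publishing it.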

Complete Removal (Archiving) Process

Pages removed from a website contain information that might be needed later for any number of reasons. You may want to re-purpose content or other information assets (documents, images), you may need a record of project activities, or you may even need the pages for legal purposes. Whatever the need, the archive should contain everything as it was presented on the live website. So you’ll want to replicate three areas: the content (including documents and images), the code, and the site architecture. For your own benefit, the replicated archive should be easily searchable.

First, re-create your website’s folder structure in a secure environment:

[Screen shot: simple folder structure]
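If you have a local copy of the site, the folder structure can also be re-created with a short script instead of by hand. This is a sketch under assumptions: `mirror_folders` is a hypothetical helper, and `site_root` and `archive_root` stand for wherever your site copy and your secure archive location happen to live. It creates folders only and copies no files.

```python
import os
from pathlib import Path


def mirror_folders(site_root: str, archive_root: str) -> list[str]:
    """Create every folder found under site_root beneath archive_root.

    Only folders are created; no files are copied.
    Returns the relative folder paths that were mirrored, sorted.
    """
    site = Path(site_root)
    archive = Path(archive_root)
    archive.mkdir(parents=True, exist_ok=True)
    mirrored = []
    for dirpath, dirnames, _filenames in os.walk(site):
        for name in dirnames:
            # Path of this folder relative to the top of the site.
            rel = (Path(dirpath) / name).relative_to(site)
            (archive / rel).mkdir(parents=True, exist_ok=True)
            mirrored.append(rel.as_posix())
    return sorted(mirrored)
```

Keep the archive location outside the site folder so the script does not walk into its own output.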

Next, use MS Word documents to capture each Web page’s information assets. At the top of each document, record the page URL, the date the page was removed from the live site, the documents and images linked from the Web page, content owners, etc. A screen shot of the page as it presents to visitors on the live site can be a useful record, as can the complete source code for the page. Simply copy and paste the source code from your browser’s View > Page Source menu option.
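The same record can be drafted automatically as plain text and then pasted into the MS Word document. The sketch below swaps in a plain-text record purely for illustration; `AssetCollector`, `build_record`, and the field labels are assumptions, not part of the process described above. It lists every `href` and `src` value found in the page source, so you will still want to pick out the documents and images by eye and add the screen shot yourself.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect every href and src value found in a page's source code."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.assets.append(value)


def build_record(url: str, removed_on: str, owner: str, source: str) -> str:
    """Return a plain-text archive record for one removed page."""
    collector = AssetCollector()
    collector.feed(source)
    lines = [
        f"Page URL: {url}",
        f"Date removed: {removed_on}",
        f"Content owner: {owner}",
        "Linked documents and images:",
    ]
    lines += [f"  - {asset}" for asset in collector.assets]
    # The complete source code goes last, unchanged.
    lines += ["", "Source code:", source]
    return "\n".join(lines)
```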

During this process you’ll want to check all links (internal, external, and inbound). Archiving can wreck your SEO if it results in “404 Page Not Found” errors. (Use the Google Webmaster Tools internal link checker for this.) And you don’t want to break links to sites with which you have established relations. Inbound links to a page can be checked through Google. Just type “link:” followed by the URL, and Google will present a list of sites linking to that URL. With this information you can devise a plan: will you create re-direct pages, move certain page content elsewhere on your site, or contact sites linking to only your most popular pages? The upshot is: Think about links before you archive a page, work with current and future content stakeholders, and find appropriate content-based solutions.
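For the internal links, a small script can list the pages on your own site that still point to the page you plan to remove. This is a rough sketch that covers internal links only; `pages_linking_to` is a hypothetical helper, and it assumes you have already gathered each page's full URL and HTML source into a dictionary. External and inbound links still need the checks described above.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pages_linking_to(target_url: str, pages: dict[str, str]) -> list[str]:
    """Return the URLs of pages that link to target_url.

    `pages` maps each page's full URL to its HTML source.
    """
    linking = []
    for page_url, source in pages.items():
        if page_url == target_url:
            continue  # a page linking to itself is not a problem
        collector = LinkCollector()
        collector.feed(source)
        for href in collector.links:
            # Resolve relative links and ignore any #fragment part.
            resolved, _fragment = urldefrag(urljoin(page_url, href))
            if resolved == target_url:
                linking.append(page_url)
                break
    return sorted(linking)
```

Each page in the result is a candidate for a link update or a re-direct before the target page comes down.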

Place the MS Word document in its proper folder, and name it according to its complete Web file name.

[Screen shot: index file in folder structure]

If there are documents or images that exist as links only on the page being archived, then those documents and images should be removed from the live website and placed in the archived site in their replicated locations.
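Working out which documents and images are linked only from the page being archived is a set comparison. As a hedged illustration, suppose you have a mapping from each page to the assets it links to (the name `page_assets` and the helper `assets_safe_to_move` are assumptions). The assets to move are those of the archived page minus anything another page still uses.

```python
def assets_safe_to_move(archived_page: str, page_assets: dict) -> list:
    """Return assets linked from archived_page and from no other page.

    `page_assets` maps each page to the documents and images it links to.
    """
    used_elsewhere = set()
    for page, assets in page_assets.items():
        if page != archived_page:
            used_elsewhere |= set(assets)
    own_assets = set(page_assets.get(archived_page, ()))
    # Anything still used by another page must stay on the live site.
    return sorted(own_assets - used_elsewhere)
```

For example, if the archived page links to a report, a chart, and the site logo, and the home page also uses the logo, only the report and the chart come back as safe to move.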

A final step of documenting this process would be to make appropriate notations in your content inventory workbook. Create a tab in the workbook inventory for your archived pages and note the date of archive along with any other information you need to track.


Strategy plays a role in archiving, too, because what you archive affects the site’s overall content strategy. If you manage a particularly large website, you may want to develop an archiving policy and process as part of your website governance documentation before beginning any archiving work. You’ll want to sort through content review cycles, content ownership issues (individually and across departments), and time limits for justifying the continued existence of old content.


Archiving Web pages from a website is part of best practices of website governance. Without proper archiving of individual Web pages, your site can become unmanageable to you and a frustrating experience for your audiences.

The process presented here is easy to implement and will keep a complete record of your website’s code, content, and information architecture, for historical purposes or future re-use. As one stage of the Web content lifecycle, archiving (when done properly) can be a vital part of how you manage your website and its information assets.


Thank you to David Harbottle at Typeclear and Pete Stevens for their reviews and helpful comments.

Robert Jacoby is a past Website Manager for the Johns Hopkins School of Public Health and Editor-in-Chief of the American Medical Writers Association Journal. He is currently enrolled in the Master of Information Management program at the University of Maryland’s College of Information Studies.  You can reach Robert by email at rajacoby at gmail dot com.

Posted August 19, 2010.

1 Comment »

  1. Robert, I really enjoyed your article. I am often preaching the same thing to clients and potential clients about the value of website archiving.

    Our company SiteQuest Technologies offers a product called Compliance WatchDog that automatically performs this service at a very affordable rate. It not only makes a daily snapshot of the site, but it consumes and stores every element of your site so that a manual process is no longer necessary.

    Our service runs less than $50/month for a typical site, which is cheaper than the time spent doing it manually. We also have audit reports that quickly show you which pages haven’t changed in a while, which might be the pages that need to be updated or retired.

    Thanks for the great post Robert!


    Comment by James Cella — August 26, 2010 @ 6:49 pm
