The Technium

Movage

Digital continuity is a real problem. Digital information is very easy to copy within short periods of time, but very difficult to copy over long periods of time. That is, it is very easy to make lots of copies now, but very difficult to get the data to copy over a century, for two reasons:

1) Formats change. Because of rapid technological evolution, the "language" that a storage medium speaks can become obsolete (incomprehensible) in only a few years. Or the hardware that speaks that language becomes so rare that the data cannot be accessed. Who can read the data on ten-year old floppy disks?

2) The storage medium itself can decay. Turns out that paper is much more stable over the long term than most digital media. Magnetic surfaces flake, peel, shatter. And the supposedly durable CDs and DVDs aren't very stable either.


As an example of the latter, here's New York Times tech guru David Pogue lamenting the unadvertised short lifespans of homemade DVDs:

I’ve got all of the original iMovie projects backed up on DVD, in clear cases, neatly arrayed in a drawer next to my desk.

Guess what? On the Mac I use for video editing, most of the DVD’s were unreadable. They’re less than four years old!

I know, of course, that home-burned DVD’s, which rely on organic dye that deteriorates with time, are nowhere near as long-lived as commercially pressed discs. But man. Four years? Scared the bejeezus out of me.

OK, listen up people!

The only way to archive digital information is to keep it moving. I call this movage instead of storage. Proper movage means transferring the material to current platforms on a regular basis -- that is, before the old platform completely dies, and it becomes hard to do. This movic rhythm of refreshing content should be as smooth as a respiratory cycle -- in, out, in, out. Copy, move, copy, move.

In other words, anything you want moved to the future has to be given attention to keep it moving forward.

We don't know what the natural movage respiration cycle is for digital media yet since it is still very new, but I suspect the cycle is much shorter than we think. I would guess it is 5 years. No matter what digital format you have your precious data stored on, you should expect to move it onto new media in five years -- and five years after that, forever!
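A five-year cycle is easy to forget, so it may help to turn it into a reminder. What follows is only a sketch: the function names and the archive labels are made up for illustration, and the interval is the five-year guess above, not a measured figure.

```python
from datetime import date

# Assumption: the five-year refresh cycle guessed at in the post.
MOVAGE_INTERVAL_YEARS = 5


def next_move_due(last_moved: date, interval_years: int = MOVAGE_INTERVAL_YEARS) -> date:
    """Return the date by which an archive should be copied to fresh media."""
    target_year = last_moved.year + interval_years
    try:
        return last_moved.replace(year=target_year)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap target year; use Feb 28.
        return last_moved.replace(year=target_year, day=28)


def overdue(archives: dict[str, date], today: date) -> list[str]:
    """Names of archives whose refresh date has passed, oldest copy first."""
    late = [(moved, name) for name, moved in archives.items()
            if next_move_due(moved) <= today]
    return [name for _, name in sorted(late)]


if __name__ == "__main__":
    # Hypothetical archive labels and dates, purely for illustration.
    archives = {
        "imovie-dvds": date(2004, 12, 11),
        "photo-drive": date(2007, 6, 1),
    }
    print(overdue(archives, date(2010, 1, 1)))
```

Run on a schedule, a check like this flags what needs to move before the old platform has completely died.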

Move it, move it, move it.

Posted on December 11, 2008 at 9:24 AM | Comments (21)



Comments

Besides physical media, there is the problem of software. I’m thinking about people who created documents in now defunct programs. How do you open those old files unless you convert them to more recent formats?

Posted by Jonathan on December 11, 2008 at 10:43 AM

here is how i solved the hardware part of the problem for myself: i keep all my data on the built-in hard drive of my laptop. no dvds, no external drives. this way i can always be sure the physical media is still working. whenever i run out of disk space i buy a larger hard disc (and one for backup of course) and copy everything over.

in addition i push the really important stuff to amazon s3 regularly - in the cloud there are no physical media. oh wonderful world.

Posted by alex on December 11, 2008 at 11:44 AM

How do you know that files don’t die on S3? After all, S3 doesn’t guarantee permanence…

Posted by Grok2 on December 11, 2008 at 1:21 PM

A thought just struck me. Is this the beginning of digital Darwinism? To survive, information has to grab attention so as to be moved to the next media type. Information which can’t do this, effectively dies.

Posted by Chris Collins on December 12, 2008 at 3:13 AM

@Jonathan: you convert them to more recent formats, otherwise you have to store the context in which they were readable, i.e. the hardware (or at least an emulator), the OS, and the program needed to read it.

This means that for pictures and video, you should be storing them in lossless formats, which means little or no compression. Luckily storage keeps getting cheaper.

Also, I side with Alex: I keep all my data on my hard-drive and a copy on Amazon S3, automatically updated every night. I never even considered burning dvds; backup should be completely automated or it will fail. Nobody is that disciplined.

Posted by jungle on December 12, 2008 at 3:35 AM

There’s a presumption here that digital immortality is a universal good. I suspect perfect digital memory isn’t desirable in all cases. There’s a social, not to mention economic, benefit that comes from forgetfulness. Certainly on an individual scale one wants to retain data. On a larger time scale, universal permanence seems to pose problems. But as we’re apparently doomed to forget the past and to make the same mistakes over and over, it seems certain that data will be lost in the process. Someone at least will be employed remaking that which has been forgotten.

Posted by Thomas Claburn on December 12, 2008 at 7:04 AM

Chris Collins - yes indeed, in fact it’s not “the beginning” of information Darwinism, it’s part of a long continuum. Think of all the writings and creations of centuries past, what has survived to the present? Those items which were deemed valuable enough to have many copies made, and to be translated into new languages as they emerged.

What has changed in the digital world is a vast expansion in the amount of “content” we create, and an acceleration of the process of obsolescence. But the process is essentially the same: those things that are perceived to be of most value on a continuing basis are preserved.

Now, there is real loss here, because information items which are not seen as of great value for many years may suddenly become much more important decades later. Old scientific data sometimes needs to be re-examined in light of new insights, but its value may be essentially zero for a long time, and the data is likely to be lost first. The early digital footprint of a future celebrity has great value only after fame arrives.

What should be preserved, when we will never be able to preserve everything? Somebody has to make judgments about the value of items now, just as in the past; the only difference is those judgments are now on the petabyte scale, rather than the kilobytes of the ancient past.

Posted by Arthur Smith on December 12, 2008 at 7:42 AM

Without commenting on the life span of the dyes, I’ve had dvds and cds that were unusable minutes after being burned that the burning software said were fine: The software didn’t check to see whether the disk had the correct data -- it just checked to see if the data was readable and so was evilly saying a corrupted burn was fine.

One must do a bit for bit compare of the newly written data to the original data. I assume there is lotsa s’ware that can; I don’t recommend Compupic anymore, which does a safe move and assumes that if source and destination files are identical you don’t want to be bothered.

Posted by ST on December 12, 2008 at 11:32 AM
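The bit-for-bit compare ST describes can be sketched in a few lines. This is a minimal illustration, not any particular burning tool's behavior: the function names and the chunk size are assumptions, and on a real system a read made straight after a write may be served from the operating system's cache rather than from the disc itself.

```python
import shutil
from pathlib import Path

# Assumption: 1 MiB chunks, so large files are never loaded into memory whole.
CHUNK_SIZE = 1024 * 1024


def identical(a: Path, b: Path) -> bool:
    """True only if both files contain exactly the same bytes."""
    if a.stat().st_size != b.stat().st_size:
        return False
    with a.open("rb") as file_a, b.open("rb") as file_b:
        while True:
            block_a = file_a.read(CHUNK_SIZE)
            block_b = file_b.read(CHUNK_SIZE)
            if block_a != block_b:
                return False
            if not block_a:  # both files reached the end together
                return True


def copy_and_verify(source: Path, destination: Path) -> bool:
    """Copy a file, then re-read both copies and compare every byte."""
    shutil.copyfile(source, destination)
    return identical(source, destination)
```

A copy counts as good only when copy_and_verify returns True; a False result means the new copy should be redone before the original is retired.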

Move it or lose it. Still true.

Posted by Kent Schnake on December 12, 2008 at 6:16 PM

Future business opportunity: archive numerous old digital technologies so you can offer translating and moving services years from now.

Posted by Jeff Lindsay on December 13, 2008 at 11:40 AM

I suppose this is as good an answer as any to the old question “Why do we die?”

Our egos may not like it, but we evolved as vehicles for our genes. A regular cycle of death and replacement by offspring doesn’t serve our purposes, but it exactly suits our genes’ purposes. We are the genes’ DVDs and we are programmed to make copies of ourselves after only 20 years or so, even though we may last several times as long. And that’s with humans; with many life forms, it’s more like “don’t wait more than six hours to make a replacement!”

Posted by Tom Buckner on December 14, 2008 at 8:06 AM

Sometimes preserving data means preserving a system. In order to achieve more stable types of formats, sometimes it is enough to keep the functions of the program instead of the structure. Mechanisms capable of simulating the situations are very useful if the original is not available. Still we should start looking for naturally stable structures if we want to save some data from the mass produced discs…

Posted by 2008Af on December 14, 2008 at 3:25 PM

Here’s a thought: Your distinction of storage and movage echoes a live debate in the world of genetic conservation. Institutional attempts to conserve genetic diversity (eg by the gene-banks of the CGIAR) often revolve around so-called ex-situ conservation - basically storage of seeds in large freezers. Farmers have argued that the only way to conserve seeds is to use and replicate them in the field every year (in situ conservation) - movage if you like.

Sure enough a worrying proportion of seeds in gene banks lose their viability and won’t plant out after a few years. One reason may be because the environment itself is changing (just as the computer/software environment is changing for digital media) but also because of physical degradation (just like those CD’s).

Recently the pinnacle of the ex-situ (storage) approach was implemented with the setting up of the Svalbard ‘doomsday’ seed vault in the high arctic. Interestingly it has been argued for using digital/computer metaphors - i.e. as the ultimate backup for our food supply - like backing up your data to some remote hard drive. If it now turns out the metaphor is faulty, should we be worried about entrusting the future of our food supply and genetic resources to storage rather than movage?

Posted by jim thomas on December 14, 2008 at 9:24 PM

One project that has thought a bit about data repair and updating over time is the Tahoe project (http://allmydata.org). We’ve developed an open-source distributed file system that takes advantage of whatever storage media happen to be available. From our experience managing a few grids ourselves, we have been able to update and upgrade hardware gradually while maintaining an intact (and growing) data set on top of it. Data death is managed by requiring users to keep leases current on their data sets, and if they do not then the data slowly degrades until there is not enough redundancy left to recover it.

Posted by secorp on December 14, 2008 at 10:39 PM
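The idea that data survives only while enough redundancy remains can be shown with a toy example. The sketch below is not Tahoe's actual encoding, which the comment does not describe; it is a deliberately simple stand-in that splits data into two halves plus one XOR parity share, so any two of the three shares are enough. All names are hypothetical.

```python
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_shares(data: bytes) -> list[bytes]:
    """Split data into two halves plus one XOR parity share."""
    if len(data) % 2:
        raise ValueError("this toy version needs an even number of bytes")
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return [first, second, xor_bytes(first, second)]


def recover(shares: list[Optional[bytes]]) -> bytes:
    """Rebuild the data from three share slots; None marks a lost share."""
    if sum(share is None for share in shares) > 1:
        raise ValueError("not enough redundancy left to recover the data")
    first, second, parity = shares
    if first is None:
        first = xor_bytes(second, parity)
    elif second is None:
        second = xor_bytes(first, parity)
    return first + second
```

Losing one share is harmless here, but a second loss is fatal, which is why shares have to be checked and re-made while enough of them still exist.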

Great point - movage is IMHO a key piece of the emerging Global SuperOrganism you talked about.

If we are to treat collections of machines on the Web, including all of them, as entities, then such entities are DISTRIBUTED. In such a system, the data does not reside in a specific place, instead it is constantly copied and replicated as peer machines go in and out of being part of the system (entity). Think of P2P e.g. BitTorrent as an example - there is an enormous amount of data stored there, but not in any specific place.

One of the key properties of a useful distributed system is to be capable of self-replication, synchronization and consistency of data. In essence, for an emerging distributed system to keep evolving, it has to continue progressing in a sequence of globally consistent states.

Another example is a project I am working on, a distributed P2P search engine. The consistent state I am talking about is the state of the global index, which is fully distributed. Its consistency is manifested by responses to queries. It evolves when new links are added to the system.

Posted by Borislav Agapiev on December 14, 2008 at 10:53 PM

Looks like you rediscovered the age-old saying of the storage world: “if it’s not spinning, it’s dead”.

Posted by Jean-Marc Liotier on December 15, 2008 at 2:54 AM

so…it was a bad idea to move all my 78s to digital format?

I’ve spent the last few years engaged in this movage. It’s funny because before that, I was engaged in moving crates of records to each new home I moved into. So many of them were broken in the process that I gave up. But the physical act of moving them has its analogy here, too. As I transferred all of my vinyl to CD, so now I transfer all of my CD data to HD storage, and now into the cloud. Same with the old boxes of photograph albums, now in digital format after scanning them…the video, the writing that I used to do longhand or through the typewriter…everything being changed in format just so I can carry it along with me.

I’m hoping that eventually it gets automated, or that there arrives an affordable moving van and company to carry them into the future for me.

Posted by resonanteye on December 24, 2008 at 6:27 PM

The right measure of this problem is the amount of energy necessary to keep a set of data readable. Use as a metric something like kilowatt-hours per gigabyte-year.

This effort then becomes a matter of tradeoffs - do you pay premium prices to have the data accessible in online or near-line storage, which requires constant power? Do you go to offline storage, where the power costs are mostly environmental, but you can’t tell if you’ve had a failure until a relatively long time from the time you need it? Or do you do massive replication of data, where you consume power to make replica artifacts?

There is some regime in all of this where climate controlled storage of good quality paper is energy efficient, or at least that is what intuition would say.

I haven’t run any numbers yet.

Posted by Edward Vielmetti on December 29, 2008 at 6:50 AM
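Edward's proposed metric is simple to write down even before anyone has real numbers for it. The sketch below only shows the arithmetic: the function name and parameters are assumptions, and the inputs in the usage lines are placeholders, not measurements of any actual device.

```python
HOURS_PER_YEAR = 24 * 365


def kwh_per_gb_year(avg_power_watts: float, capacity_gb: float) -> float:
    """Energy needed to keep one gigabyte readable for one year, in kWh."""
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    kwh_per_year = avg_power_watts * HOURS_PER_YEAR / 1000
    return kwh_per_year / capacity_gb


if __name__ == "__main__":
    # Placeholder inputs for illustration only: a 10 W device holding 1000 GB.
    print(kwh_per_gb_year(avg_power_watts=10, capacity_gb=1000))
```

Offline media would enter the same formula through the average power spent on climate control and periodic re-copying, spread over the capacity being kept.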

“Who can read the data on ten-year old floppy disks?”

Actually, I can — and I do, quite regularly. I have many disks for my old Apple II computers that were first written in 1982-1984, nearly all of which are quite readable. I still have 5.25 and 3.5 diskette drives on some older but functional PCs, and hundreds of disks that work in them. It costs only a bit of physical storage space to keep these old systems around, which is much cheaper than the effort to move that data.

I also have 12+ year-old burned CDs that work fine, and some burned DVDs that are over 7 years old.

The key to media life, whether paper, magnetic, or optical, is environment. Keep it cool, and watch the moisture. Keep it out of the sun. Dryish, but not too dry, is usually best. Avoid burning your house down, the media doesn’t like that.

I’m sure there will be a time when all of my media will deteriorate beyond recall. Those things that are important enough to use often have indeed been moved onto one or more hard drives. Others get reburned onto new CDs or DVDs every so often.

But, I don’t agree with the notion that media becomes obsolete quite as fast as is implied in this article. You can still buy brand-new phonograph turntables, floppy disc drives, and optical drives that will read all the most common media. Choose your media wisely, take care of your equipment, make lots of backups, and you’ll have access to your data for a long, long, long time.

Posted by Dev Rossik on December 29, 2008 at 9:05 AM

@Dev, you are a shining counterexample. I found keeping old readers (disk players) more of a technical challenge than I wanted to assume. Taking it from a serial port to usb, for instance. There’s a niche business in transferring old data, as your local photo scanning place will show. I have a whole bunch of Word files on my Mac that are now unreadable because of continual migration from older Macs. They sit on my hard disk, so media is not the problem. I just can’t open them, even with MacLinkPlus. I have no idea why, nor what to do about them. So even movage is not enough unless you occasionally open them up and resave them.

Posted by Kevin Kelly on December 29, 2008 at 9:40 AM

@Edward: That is a fantastic metric. If you do run some numbers I’d love to see them.

Posted by Kevin Kelly on December 29, 2008 at 9:42 AM

