Saturday, June 26, 2010

"Saving" Broken Links with the Internet Archive Wayback Machine

The Internet Archive's Wayback Machine gives us snapshots of web sites as they were on previous dates. This can be a very useful tool for people who are looking for web sites that they visited in the past. For genealogy, the archived web pages can serve as both good and bad research tools. A person's genealogical research naturally evolves over time. New information is found, mistakes are corrected, and data is edited. In theory, a person's genealogy web site should also evolve over time. So, a glimpse at an archived copy of a web page can help you see how the evolution of a web site—and through that the evolution of the research—happened. The Wayback Machine serves as a bad research tool if the reader forgets that they are looking at an old and possibly outdated version of the author's research. It serves as a good research tool for those who use it to help them track down the original author or use it as a lead for possible research avenues to follow.

I've decided to use the Wayback Machine to help me "save" broken links. In maintaining Cyndi's List, broken links are the bane of my existence. They create more than half of my workload. Links become broken when a person does any of the following:
  • moves their web site to a new address
  • deletes their web site from the Internet
  • changes or rearranges the layout, and thus the page addresses, of their web site
Cyndi's List is 14 years old. With more than 280,000 links, it isn't possible to avoid broken links. I started the site in 1996, so I've seen many web hosting services come and go. Most of those closures happened sporadically, giving me enough time to keep up with the address fixes. Recently, however, several popular hosting services have rapidly and completely gone away:
  • GeoCities (and Yahoo! GeoCities)
  • AOL Hometown
  • AT&T WorldNet
  • MSN
  • CompuServe
  • Some, but not all, Prodigy sites
For all Personal Home Pages and Surname sites on Cyndi's List, I've done a mass-replace on addresses at most of the hosting services listed above so that they point to the Wayback Machine version of those sites instead. As a result, users of Cyndi's List won't receive a broken link error for those addresses. Instead, they'll be taken to an index for those sites on the Wayback Machine. For example, instead of this address:
http://www.geocities.com/donmacnab/
they will be pointed to this address:
http://web.archive.org/web/*/http://www.geocities.com/donmacnab/
From there, they can see versions of that web page dating from August 2000 through February 2005. All links on Cyndi's List that point to the Wayback Machine are clearly labeled.
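
For readers who maintain their own link lists, the same kind of replacement can be sketched in a few lines of Python. This is only an illustration, not the process actually used on Cyndi's List: the function name to_wayback and the DEFUNCT_HOSTS set are made up for the example, and the set holds only the GeoCities host from the address above. The one detail taken from this post is the Wayback Machine index prefix.

```python
from urllib.parse import urlsplit

# Index prefix taken from the example address in this post.
WAYBACK_INDEX = "http://web.archive.org/web/*/"

# Hypothetical host list for illustration; extend it with the
# hosts of whichever defunct services appear in your own links.
DEFUNCT_HOSTS = {"www.geocities.com", "geocities.com"}


def to_wayback(url, defunct_hosts=DEFUNCT_HOSTS):
    """Return the Wayback Machine index address for a URL on a
    defunct host; return any other URL unchanged."""
    host = (urlsplit(url).hostname or "").lower()
    if host in defunct_hosts:
        return WAYBACK_INDEX + url
    return url


print(to_wayback("http://www.geocities.com/donmacnab/"))
```

Because the check looks only at the host name of the link itself, an address that already points to web.archive.org is left alone, so running the replacement twice does no harm.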

In doing this I am solving two problems. First, it is a quick way to fix several thousand broken links all at once. Second, it "saves" the links for potential viewers. Instead of my deleting them, you will all have the chance to view what used to be online. Hopefully, this will help your research.

3 comments:

Deason Hunt said...

A great idea.

Philip said...

I think you should take a look at reocities.com, which saved a lot of the geocities web sites from oblivion. Basically, you just switch the g to an r in the URL, and in most cases it will work. Not all sites were saved, unfortunately, but if it was saved it will probably offer a better experience than redirecting to the wayback machine.

cjblythe said...

Hi Cyndi and thanks for including a link to my site in your database.

This is a great idea! I too am starting to have issues with broken links and had no idea of this. I will definitely look into it.

Christine Blythe
http://www.emptynestgenealogy.emptynestheritage.com