Saturday, March 7, 2009

To Use the Wayback Machine or Not Use the Wayback Machine...

I've got a dilemma, something I can't seem to make a decision on: whether or not I should link to archived pages in the Wayback Machine's Internet archives. I'll elaborate on my dilemma in a moment.

First, visit the Wayback Machine (part of the Internet Archive) to see what I'm referring to. The Internet Archive was founded in 1996 with the mission to archive all web pages on the Internet. There are many pros and cons to the idea of archiving web pages. You can find conversations on this topic throughout the message boards on the Internet Archive web site. Briefly: once a web page has disappeared from the Internet, it is nice to have an old, archived copy to view in order to find information you might need. As a genealogical researcher you might find important data that had been published in the past but is no longer available on a current web page. Handy. However, on the con side we run into privacy and intellectual property issues. Does a third-party web site have the right to archive your written works and republish them on its own site without your permission?

You won't find Cyndi's List on the Wayback Machine. Early on I ran into a problem with the archives. People started reporting problems to me regarding broken links to their web sites. I looked on Cyndi's List and found that the links were not broken. Upon further investigation I found that they were referring to outdated links that they found on the Wayback Machine on archived versions of Cyndi's List. Of course, I ran into this same problem with cached versions of the site found on Google, and I periodically get reports of broken links that people find in the RootsWeb mailing list archives for the CyndisList Mailing List. Each time I have to explain to these people that they are viewing archived versions of links, not the live versions found on my site. To resolve this problem I have had Cyndi's List removed from archiving on Wayback and Google. The very nature of my site is to be as current and up-to-date as I can make it, so there is no need for archived versions with outdated links.

So, here is my problem. Tonight I'm working on a broken link to a census transcript for a small village in England. The page is no longer online, but it is archived at the Wayback Machine. To date I have always tried to find a replacement URL for a broken link. If there are no replacement addresses available I will delete the link. Instead, should I be updating the link with a URL that points to an archived version on Wayback? On one hand, I want to provide the link to get people to the genealogical information they need in their research. That is my purpose in maintaining Cyndi's List. On the other hand, I realize that the owner of the original web site must have had a reason for removing the page from the Internet. Maybe they lost interest in genealogy. Or they died. Or they didn't feel like sharing anymore. Or maybe the data was no longer free for them to publish. Perhaps they didn't have permission to publish it in the first place. Or maybe they donated it to someone else to publish. I could go on and on with possible reasons. Is it up to me to point people to an archived copy of a web page that the original author may or may not want shared?

The genealogist in me wants to help others and point to the archives. The writer/publisher in me doesn't want to set a precedent by linking to the archives only to find out that it becomes a problem down the road when someone objects. Further, if there are any copyright issues involved, I could be held responsible for linking to the archives, because the link facilitates the propagation of the copied work by sending people in that direction.

Of course, more problems may arise when it comes to fixing broken links in the future if the Wayback Machine web programmers make changes that alter the configuration of the URLs, causing all of the archive links to break. Sigh.


Drew Smith said...

While the Wayback Machine may change its URL structure in the future, that doesn't put you in any different situation than you already face when some other website changes its URL structure.

I don't see how linking to material in the Wayback Machine puts you in any kind of copyright peril different from what you already face for linking to things on the regular Web. For all you know, there may be copyright issues there, too.

The situation with the Wayback Machine seems analogous to the relationship between in-print and out-of-print books. As a writer, you might cite an out-of-print work, knowing that it is still in at least some libraries or can be obtained from a used bookstore. You'd still point people to those resources.

Cyndi Ingle said...

Drew - in regard to copyright peril, my thought is that a site may have been taken offline because of copied material or copyright infringement of some sort. With an archived copy being online I run the risk of linking to a problem.

Cyndi Ingle said...

Drew - I like your analogy with out-of-print books. And I think you're right on the other points too. I'm going to give this a try and see how it goes.

brown-eyed said...

I wouldn't link from the Wayback Machine. Much of it is outdated or corrected or redesigned.

Cyndi Ingle said...

brown-eyed - by nature the Wayback Machine is outdated because it is an archive of web pages from the past. The point of my linking to something on Wayback would be for when a page has disappeared from the web and I can't get the info any other way than from an archived copy. This would be most useful for things like census transcriptions, etc.

Michelle Nichols said...

Check out the Australian version called "PANDORA: Australia's web archive." It is maintained by the National Library of Australia.

It was established by the National Library of Australia in 1996, and now collaborates with nine other Australian libraries and cultural collecting organisations.

The name, PANDORA, is an acronym that encapsulates our mission: Preserving and Accessing Networked Documentary Resources of Australia

Ken Rury said...

I just checked the PANDORA site, and PANDORA is a selective archive of significant Australian online publications and web sites considered to be of long-term research value. It covers less than 1% of the .au domain.

I was able to find the immigration information and the birth, death, and marriage records of a family I have recently been researching there. So thanks for mentioning it.

I don't have an issue with you linking to historical information that is no longer posted, though it may be inaccurate, so it should be noted as point-in-time info. I think if you were linking to Wayback to embarrass someone, that would be a different issue.