Reconstructing web sites from cached data

This paper, written by a quartet of researchers from the CS department at Old Dominion University, is quite fascinating. They present a software tool for recovering and reconstructing web sites from web archives and cached search engine data. The tool is called Warrick and, while I expect I’ll never have to use it, it’s nice to know it’s there if needed.

If nothing else, the paper gives a good overview of how web site caching actually works, and the description of the tool is intriguing. The authors must have spent a good deal of time on it.