I’ve seen several articles on this in the past week or so, and hadn’t thought much about it. Then I was looking at Maximum PC this morning, and another article appeared there.
I then realized why I was not very excited about this concept. The idea is that it gives a page a single identity, so that it won’t be cached in more than one form, or more than once.
This lack of redundancy runs against everything I was ever taught about computers. Redundancy is a good idea. It simply makes sense, and it was in use long before the days of computers (ever heard of forms in triplicate?). Keeping a copy of important things in another location is a safeguard against disaster, like keeping copies of your important documents in a safe deposit box at the bank. If you’ve ever had a fire at home, or possibly a flood, you know this is a good idea.
The same applies to computer files. Not that long ago, I was looking for a copy of a CSV viewer made by Diamond Computers in Australia. When the Trans-Pac cable was cut, it was impossible to get a copy, because all the places it was ‘stored’ were simply pointers to the original file, sitting on an unreachable server in Australia. For a few weeks, I was stuck. This sort of problem is relatively new; file repositories used to exist for the exact purpose of redundancy. Search engine result pages, pointing to several locations and with cached copies of the pages, serve this same purpose.
When you consider the complexity of modern web pages, it’s actually a bit of a miracle that search engines work as well as they do. Dealing with duplicate links, especially on sites such as Amazon that may promote an individual product a thousand times or more, has always been a challenge. Finally, after years of debate, Google, Yahoo and Microsoft are putting the past behind them to solve this age-old issue. The solution is a simple value, “canonical”, added to the standard link tag.
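As a rough sketch of what that looks like in practice: each duplicate copy of a page carries a link element in its head pointing at the one preferred URL. The helper and URL below are illustrative assumptions, not part of any announced specification:

```python
# Minimal sketch: building the canonical <link> element for a page's <head>.
# build_canonical_link is a hypothetical helper; the example URL is made up.
from html import escape

def build_canonical_link(preferred_url: str) -> str:
    """Return the <link rel="canonical"> element pointing at the preferred URL."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'

# Every duplicate of the product page would carry the same hint:
tag = build_canonical_link("http://www.example.com/product/widget")
print(tag)
# <link rel="canonical" href="http://www.example.com/product/widget" />
```

The point is that the page itself tells the engine which URL is the “real” one, instead of the engine guessing.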
The tag is designed to solve issues caused by multiple URLs pointing to the same page, but may also be helpful when multiple versions of a page exist. Currently, the search engines use a process that examines the structure of URLs and looks for similarities. This generally works pretty well, but is far from perfect. It is considered somewhat rare for the search engines to come together on any issue, but it isn’t unprecedented. In 2006 they joined forces to put unanimous support behind sitemaps.org, and in June of 2008 they jointly announced new standards for robots.txt directives. Matt Cutts of Google and Nathan Buggia of Microsoft claim this new approach should help reduce the clutter on the web and improve the accuracy of all the search engines.
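To make the URL-similarity idea concrete, here is a toy sketch of that kind of heuristic: collapse obvious variants of one page into a single key by stripping parameters that don’t change the content. The parameter names (ref, sessionid, utm_source) are my own illustrative guesses, not any real engine’s list:

```python
# Toy version of a URL-structure heuristic: normalize away cosmetic
# differences so duplicate URLs collapse to one key. Tracking-parameter
# names below are assumptions for illustration only.
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

TRACKING_PARAMS = {"ref", "sessionid", "utm_source", "utm_medium"}

def normalize(url: str) -> str:
    parts = urlsplit(url)
    # Keep only query parameters that plausibly change the page's content.
    query = sorted((k, v) for k, v in parse_qsl(parts.query)
                   if k.lower() not in TRACKING_PARAMS)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, urlencode(query), ""))

variants = [
    "http://www.example.com/dp/B000?ref=sidebar",
    "http://WWW.EXAMPLE.COM/dp/B000?sessionid=123",
    "http://www.example.com/dp/B000",
]
keys = {normalize(u) for u in variants}
print(keys)  # all three variants collapse to one normalized URL
```

Real engines doing this at web scale have to guess which parameters matter, which is exactly why it is “far from perfect” and why an explicit hint from the page helps.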
Even though the tag won’t completely solve all the duplicate problems found on the web, it should significantly enhance the indexing performance of search engines, particularly on e-commerce sites. The new tag will be discussed in depth at this year’s Ask the Search Engines panel at SMX West.
Disk space is cheap, and once something is gone, it’s gone forever. Redundancy is also good in times of peak usage.