I'm an avid user of Microsoft OneNote 2007. I keep all my notes in it. I even wrote the outline and first draft for this post using it. I upgraded to the 2007 version specifically because it allowed the creation of hyperlinks between documents. Unfortunately, those hyperlinks aren't worth a darn: if you move a page, every link to anything on that page breaks, even links from within the same page. Many links don't even work correctly when you first make them. It is incredibly frustrating.
So, I have been stewing on a way to create a note-taking application based on HTML rather than Microsoft's proprietary format. I quickly realized that links created within this new application would also break as soon as the user moved a page in the collection. Sure, I could require the user to move pages only through the application and have the app update every link to a page whenever it is moved. However, this would fail the first time the user forgot and simply moved a page manually. It would also make moving pages pretty darn slow, because the app would have to search through every page in the system to find links to update. Therefore, I have been trying to think of a way to quickly find a page's new location and update the links as necessary.
In what seems like a separate issue, I have noticed that academic papers often exist in multiple locations all over the internet. Sometimes the file is named appropriately, but oftentimes it is not. Sometimes there is good descriptive text surrounding the link to the file, but oftentimes not. Sometimes the original file can still be found exactly where you first referenced it five years ago, but usually not. This means that finding a current copy of an academic paper for which you only have old citation information can be quite daunting. So I have also been trying to think of a way to use a single link to refer to any one of the multiple identical copies of a document, no matter where each is actually located on the internet, and instantly retrieve that document even if the original is no longer in place.
I had been thinking about using some kind of indexing system to enable one (or one's browser) to find these moved web pages. This morning, as I was waking up, it finally hit me how to solve both of these problems and eliminate the vast majority of 404 errors at the same time. I call this system "Self Healing Hyperlinks."
The basis of the system is to insert additional information into the URL in a link so that either the target web server or the user's browser can find that target even if it has been moved. This additional information consists of HTML element ID values that are unique within the domain and/or globally, and that are included as attributes in the elements of the link target. The system also requires an indexing engine, installed as a plug-in for the web-server software, to index and look up these element IDs. When a broken link sends a browser to the target web site, that web server can look up the new location in its index rather than return a 404 error. One or more global indexing servers would also be set up to crawl the internet looking for documents that contain these special element IDs. Then, when a browser cannot find a target that was linked to using this additional information, and the target web server did not return a replacement page, the browser can query the global link database and still find the document. The system does not require any additional scripting in the web pages or on the server. The web server and browser plug-ins would do all the work.
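To make the lookup order concrete, here is a rough sketch in Python of how a broken link might be resolved: first the page itself, then the site's own index, then a global index. Everything named here is my own placeholder and not part of any real server or browser: the `shid` query parameter (the post does not yet say how the ID is carried in the URL), the `LinkIndex` class, and the `resolve` function are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical name of the URL query parameter that carries the element ID.
ID_PARAM = "shid"


class LinkIndex:
    """Toy stand-in for an index that maps element IDs to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, element_id, location):
        # Record (or update) where the element with this ID currently lives.
        self._locations[element_id] = location

    def lookup(self, element_id):
        # Return the known location, or None if the ID is not indexed.
        return self._locations.get(element_id)


def resolve(url, existing_paths, site_index, global_index):
    """Return a working location for url, or None if it cannot be healed.

    existing_paths is the set of paths the site currently serves.
    """
    parts = urlsplit(url)
    if parts.path in existing_paths:
        return url  # the link is not broken, nothing to heal

    ids = parse_qs(parts.query).get(ID_PARAM)
    if not ids:
        return None  # an ordinary link with no extra information: a plain 404

    element_id = ids[0]
    # Ask the site's own index first, then fall back to the global index.
    return site_index.lookup(element_id) or global_index.lookup(element_id)
```

For example, if the site index maps the ID `a1b2` to `/archive/paper.html#a1b2`, then resolving `https://example.com/old/paper.html?shid=a1b2` returns the new location instead of a 404. In the real system the first two steps would run inside the web-server plug-in and the last inside the browser plug-in; the sketch collapses them into one function only to show the order of the fallbacks.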