Tuesday, February 7, 2012

dSRCI - .sci Top Level Domain

In my last post, I outlined a new infrastructure (the distributed Scientific Research Collaboration Infrastructure, or dSRCI) which I believe will help facilitate a rich and wonderful new way for scientists to collaborate over the internet. In addition, this infrastructure will enable future employers, granting agencies and connectome researchers to analyze the patterns of collaboration by bringing the metadata about these collaborations to the surface for easy indexing and searching. One of the requirements for that infrastructure is that each scientist have a unique identifier that they can use to tag all of their work and “Artifacts of Collaboration” (AoCs). This unique identifier will be based on one simple idea: a new .sci top-level domain, under which unique domain names will be issued to scientists. These domain names will exist in perpetuity, even after the death of the scientist.

A New Domain:

Each scientist will be given their own perpetual domain name under a new .sci top-level domain. This domain name would be provided either at no cost or for a single, one-time fee and would last forever, even after the death of the scientist. This will provide that single, unique, perpetual identifier for each and every scientist in existence. While most, if not all, current scientists now have a “home page” on the web site of their current institution, the URL associated with that home page is subject to the whims of every webmaster who will ever work on that site, now or in the future. In addition, most scientists do not work at one institution all of their lives, and many are now forced to work at more than one just to earn a decent income.

Many may say that the Semantic Web allows for multiple different ways to refer to the same resource, even if one has to go through many RDF linkages to arrive at the original or primary URI for that resource, and therefore a special, assigned URI is not necessary. However, the ethereal nature of RDF provides no means to prevent inappropriate duplication. Two different scientists may choose to call themselves JoeSmith, and then RDF mining software would need to incorporate extraordinary measures to differentiate the two (or the hundreds of) duplicate URIs. I believe it is this very ethereal nature of RDF that causes scientists, and most others, to shy away from using it. What good does it do to assign “my URI” to something if that URI may change in just a couple of years or may be accidentally mistaken for someone else’s URI? Yes, software is becoming more and more powerful, but do we really want to give it ten thousand times more work to do just to avoid the “restrictiveness” of an assigned domain for use with URIs? Actors live with this “restriction” all the time; many even have to legally change their names in order to avoid duplication with any other actor who has ever been a member of the Screen Actors Guild. If actors can change their names, then scientists can register for a unique domain name. In the future, I expect it to become a badge of honor, something bestowed upon a scientist when they receive their PhD or other credentials.

Of course there would have to be some rules to ensure that the domain names were actually meaningful and easily discoverable. I don’t think “joethedinosaurhunter.sci” would be appropriate. Though many female scientists may not like this idea, I think the best system would be to simply use someone’s full, given name along with the year they were born. This should provide enough uniqueness (within the narrow scope of scientists) that there would only need to be a few alternates to avoid duplication. These alternates could consist of using the scientist’s middle name or appending the month or even the day of their birth. I would like to avoid things like simply tacking on an A or B to the end of their names, as this leads to ambiguity. This naming scheme would also provide valuable information indicating the era in which a scientist lived. Three hundred years from now, it will be important to be able to easily spot the difference between alberteinstein1879.sci and alberteinstein2275.sci. Hey, it could happen.

Mentioning Albert Einstein brings me to another point: All past scientists will be assigned their own domain names as well, following the same naming convention as for living scientists. Then, every time someone mentions a scientist, living or dead, they can insert the URI for that scientist within a metadata tag. Then search engines can index that reference so anyone looking for any references to that particular scientist anywhere on the internet can have one single, unique search term to look for. (I will address possible abuses of the system in yet another post.)

URIs, of course, can also be used as URLs. URLs under the .sci TLD will be the perfect place for scientists to place web pages about themselves and their work. Here, too, it would be helpful to have some consistent structure. So I propose a basic hierarchy of directory names to contain some basic info about a scientist. For instance: ScientistNameYear.sci/cv, ScientistNameYear.sci/bio, ScientistNameYear.sci/currentwork, etcetera. We can work out a full structure later. Sure, scientists could follow any structure they want, but consistency makes them easily discoverable. Plus, why reinvent the wheel? Everyone can just download and copy the standard template and away they go. And there is no need for anyone to design their web page to look just like anyone else’s. All that is necessary is to embed the proper RDF tags on the proper pages for people and search engines to find. Everything else is gravy.
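
To make the idea of a standard hierarchy a little more concrete, here is a minimal Python sketch that builds the well-known URLs for a given scientist. The section list contains only the three examples mentioned above, and the function name and the default "http" scheme are my own assumptions; the full structure is still to be worked out.

```python
# Only the three sections named in this post; the real list is undecided.
STANDARD_SECTIONS = ("cv", "bio", "currentwork")


def standard_urls(domain: str, scheme: str = "http") -> dict[str, str]:
    """Map each standard section name to its URL under a scientist's domain."""
    return {section: f"{scheme}://{domain}/{section}" for section in STANDARD_SECTIONS}


if __name__ == "__main__":
    for section, url in standard_urls("alberteinstein1879.sci").items():
        print(section, url)
```

A search engine or any other tool that knows the domain name could then guess where to look for a CV or a bio without crawling the whole site.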

Just as any other domain name can be hosted on any server, these .sci domain names can be hosted anywhere the scientist chooses. They can be on the scientist’s university’s or company’s server or on a personally maintained server. The “site” can then be moved to any server in the world, as necessary, and the infrastructure will remain undisturbed. The question now arises as to who would host the domains for scientists who are no longer “with us,” either dead or retired. I expect that certain famous scientists will have many institutions clamoring to host those domains, if only for the recognition. Therefore I propose a bidding process. Institutions would bid against each other for the privilege of hosting the sites of these famous scientists. However, rather than bidding money, they will offer to host the sites of less popular scientists. So, an institution that wants to host AlbertEinstein1879.sci may need to host the sites of tens of thousands of other dead scientists in exchange. Remember, it is not as if these “charity” sites will take up a lot of space or bandwidth, so it shouldn’t really be much of a problem.

I understand that other, non-scientist, people may want to collaborate with scientists as well. However, I do not think it would be appropriate for just anyone to be allowed to register for a .sci domain name. Only individuals with a certain level of bona fides should be allowed to register. Whether that should include only those with PhDs or also allow others established in their fields, I cannot say. I will leave it up to the scientists to hash out the particulars of what qualifies as a real scientist within their particular fields. There is one thing I am adamant about here: Corporations are not people and, therefore, they cannot be scientists. Even though a corporation may own the intellectual property of the scientists who work for it, it is the individual scientists who have made the contributions. Those contributions are what we want to track, and so only the scientists should be able to get a .sci domain name.

I understand that this new top-level domain, with its special considerations, would require both an act of Congress and international treaties. However, the potential value gained from it would make it worth the trouble. Some may argue that maintaining such a long list of domain names would be too expensive. Seriously?! Just keeping a domain name in a list on a few servers would cost too much? The importance of the advancement of science is enshrined in our Constitution. The USPTO and Library of Congress cost billions per year. A little bit of bandwidth on a few servers spread out throughout the world would amount to less than a Higgs boson within an atom in a molecule in a drop in that bucket. Besides, the revenues from the exponentially growing ranks of new scientists registering for their domains will easily pay for the exponentially shrinking costs of maintaining the lists of all the previous scientists.

Now, the entire dSRCI system is not utterly dependent upon the approval of this new top-level domain, though it would certainly make things much easier. Scientists could register domains under the .name TLD, or simply choose any domain name they, personally, control. The problem with this is the impermanent nature of these registrations. If the registrants or their heirs do not keep up the yearly payments, then the domain name is up for grabs by anyone who wants to capitalize on the scientists’ good names. Perhaps some registrars could be persuaded to offer perpetual registrations for a large enough up-front fee. Unfortunately, without an adequate legal contract, I would still be suspicious as to the actual longevity of said domain name registration. This is an issue for another blog post, but perhaps we could get some lawyers to figure out the proper language to ensure that a registrar (and any entity that ever receives its assets) will be required to maintain said contracted registrations in perpetuity. Perhaps something similar to liens on property. Heck, if corporations can be people, and simple, obvious ideas can be inviolable property, then domain names can be property to be protected in perpetuity too, by gum!

Another alternative to the new .sci TLD would be for scientists to simply start using these scientistname.sci URIs in their citations and in the metadata on their web sites. DNS would not resolve these URIs to actual URLs until the .sci TLD was approved, but search engines would still be able to index the citations. If it turns out there are legal issues with using the .sci suffix in these temporarily imaginary URIs, then it would also be possible to use dSRCI.net/scientistname.sci instead. If “dSRCI” were trademarked, then the dSRCI organization would be able to deal with abusers within the regular legal system. I would recommend against the dSRCI organization hosting any web pages pointed to by these URIs, however. I would not want any one organization to have that much control or to become a potential choke point for oppressive governments to use for censorship. In this context, I believe a search-engine-based “replacement” for DNS may be more robust and more resilient to change than the current DNS. But that is yet another topic for yet another separate post.

Yet another alternative, though my least favorite, would be for scientists to take the string which would be used as their domain name under this system and start using it as the parent folder for their professional web site. For instance, if the university where they work provides them with a folder such as www.university.edu/people/~userName/ then the scientist could create a folder called www.university.edu/people/~userName/scientistNameYear.sci/ and place all their content under there. The file www.university.edu/people/~userName/index.html would simply redirect to www.university.edu/people/~userName/scientistNameYear.sci/index.html . This way, the scientist could move that folder anywhere he or she wanted, and search engines would still be able to find it when people search on the “scientistNameYear.sci” string.
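
For anyone who wants to try this folder approach, here is a minimal Python sketch that creates the identifier folder and writes a redirecting index.html next to it. It assumes a plain HTML meta refresh is an acceptable way to redirect (a server-side redirect would also work, if the university allows one), and the function and template names are made up for this example.

```python
from pathlib import Path

# A bare-bones page that sends visitors (and crawlers) to the identifier folder.
REDIRECT_TEMPLATE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="0; url={target}">
    <title>Redirecting</title>
  </head>
  <body>
    <p>This page has moved to <a href="{target}">{target}</a>.</p>
  </body>
</html>
"""


def write_redirect(user_root: Path, identifier: str) -> Path:
    """Create user_root/identifier/ and a user_root/index.html that redirects into it."""
    site_dir = user_root / identifier
    site_dir.mkdir(parents=True, exist_ok=True)
    # A relative target keeps working if the whole user folder is moved.
    target = f"{identifier}/index.html"
    index = user_root / "index.html"
    index.write_text(REDIRECT_TEMPLATE.format(target=target), encoding="utf-8")
    return index


if __name__ == "__main__":
    # "public_html" stands in for whatever folder the university provides.
    print(write_redirect(Path("public_html"), "scientistNameYear.sci"))
```

The scientist’s real content, including its own index.html, then lives entirely inside the scientistNameYear.sci folder and can be copied elsewhere in one piece.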

So, I guess all I need to do now is form a non-profit to lobby for a new law creating the perpetual .sci TLD as well as the treaties necessary to make it international. Anyone want to help with that?

In my next post I will discuss the data standards and citation format necessary to bring all this data to the surface for ease of analysis.

  1. I have since figured out that I don't necessarily need to create a new TLD. Instead, I could register a URI scheme (URI schemes are registered with IANA, not the W3C) and then do what I want under there. I will look further into this possibility.