Wednesday, July 15, 2009

Got GOT? : Using a multi-faceted, multi-leveled, multi-graph to analyze the interconnectedness of all things in a gigantic Graph-O'-Things

In this post I propose a simple - if tedious - means of calculating the interconnectedness between any two nodes in a multi-connected, multi-leveled, multi-faceted graph. The strength of a relationship or the amount of connectedness between two nodes can be calculated based on how many other - closely related - things each of the two have that are also related to each other. The calculation method is essentially nothing more than the same method used in calculating the total resistance in a network of simple electronics resistors. However, the vast size of the graph as well as the multi-leveled, multi-faceted nature of it both increase the number of the calculations and the complexity of the task. At the same time, the clustered nature of most multi-graphs which represent the real world provides a means to simplify the calculations and perhaps take a few shortcuts.

I will expound upon this idea by elucidating several possible real-world uses for this technique, starting with the purpose for which it was invented: Organizing the graph of interconnectedness between all the possible topics in the vast tree of all educational content that will eventually make up the DEMML™ content library. Then I will explain how additional levels (or 'branes) can be added to this gigantic multi-graph to facilitate the analysis and visualization of the interconnectedness between any and all research papers ever published as well as the authors who wrote them and the institutions in which they worked at the time.


Introduction

While working on the DEMML™ standard and system, I have often been stymied by the problem of how to visualize the entire "Tree of Knowledge" in the DEMML™ classification system and the interconnectedness of all the different branches and nodes on the tree. It is very easy to simply expand a huge hierarchical tree and see how each subject is a parent for sub-subjects and so on, down to the smallest individual topic. However, many topics are also related in some way to other topics that aren't direct parents or descendants. Sometimes, topics may be related to other topics way over on the other side of the "tree." You can think of this as a very large tree with lots and lots of clusters of spider webs built within it. These clusters are then connected by thicker strings that tie the entire tree together into one large, interconnected mass.

I have been fetching about in my mind for a way to show all of these connections as well as how strongly all of those individual topics are connected or related to each other. I knew that I would need a vastly interconnected multi-graph overlain on top of the original Tree of Knowledge. I also knew that I would need a means of showing the strength of the connection or the relationship between each two topics. The two most obvious options are physical spacing and thickness of connecting lines. However, what I couldn't figure out was how to calculate a number representing the strength of those relationships or connections. In particular, although I knew that many topics would form clusters of topics (which are referred to as "subjects" in the DEMML™ system), I did not know how I would calculate the strength of the relationship between any two clusters of topics.

Yesterday afternoon it occurred to me how to visualize and analyze this graph of connections. I have been writing furiously since then. Please note: The ideas often come to me faster than I can write them down so this is just a rough outline of my thoughts. I have been trying to get them out of my head and down "on paper" as quickly as possible. I may clean it up and turn it into a real "paper" later.

Things, topics, facets, and the GOT

Before I go too much further I feel it is prudent to introduce a few terms and the specific definitions I will be using in this "paper." DEMML™ is an acronym for Distributable Educational Material Markup Language™. You can learn more about it at www.demml.org. As stated on the DEMML™ site, a "topic" is a very specific set of information about a very specific thing. A topic is atomic in the sense that it is the smallest collection of information that makes any sense all by itself. For instance, one could list a date all by itself or a place all by itself and they would be almost meaningless. However, if that date and place were listed as the time and place of the beginning of a particular war then they would have meaning. The beginning of the war is the topic. The date and place are facts within that topic. The facts are important but meaningless outside of the context of the topic. Therefore, even though the topic is atomic, it is composed of multiple facts, just as atoms are composed of sub-atomic particles. A subject is a collection of closely related topics. How closely related is relative. One could discuss the subject of war or the subject of history. War would be a sub-subject of history and the beginning of the particular war discussed earlier would be a topic within the subject of war.

A facet is an aspect of a particular topic. This term is borrowed from the notion of faceted classification systems used in the field of library science. For instance, when discussing wheat, one could discuss the history of wheat or the methods of propagating wheat. They are both directly related to wheat but are different sub-subjects within the main subject of wheat itself. Usually a topic will discuss only one aspect of a particular subject but it is still possible for there to be some topics that discuss or specifically relate two different aspects of the same general subject.

In order to generalize this discussion I will use the term "thing" to refer to any thing that could be represented by the node of a graph. I will only use the term "node" when referring to the graph itself. This is merely for the convenience of those who are not used to thinking of physical objects or contextual topics as nodes on a graph. I will first discuss the technique proposed in this paper in very general terms, continuously referring to "things." Only when discussing the individual applications of the technique will I switch to using the terms "topic" or "subject." Just as a "subject" is a cluster of closely related topics I will often refer to a cluster of things as simply a "cluster" for simplicity.

Finally, just for the fun of it, I will refer to the vast, multi-faceted, multi-graph of all "Things" in the universe being discussed as the "Graph-O'-Things" or the GOT for short.

Analogy between connected web of things and a resistor network.

When considering a large weighted multi-graph it is possible to analogize the graph as an electronic circuit comprised of nothing more than a bunch of resistors. In electronics class they call this a "resistor network" and students are tasked with calculating the total resistance from one point to another in that network. The calculation is relatively simple, if tedious. When considering multiple resistors wired in series, one simply adds the value of all the resistors. Three 10 ohm resistors wired in series would have a combined resistance of 30 ohms. However, when considering several resistors connected in parallel, one adds the reciprocal of each resistor's value then takes the reciprocal of the sum. For instance, three resistors of 10 ohms each, wired in parallel would have a combined resistance of 3 1/3 ohms. This is simply because the electricity now has three equal paths over which to travel. You can use the analogy of a narrow water hose just as effectively if that is easier for you.
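As a minimal sketch of the two rules just described, the arithmetic can be written in a few lines of Python. The function names `series` and `parallel` are illustrative only, and exact fractions are used so that 3 1/3 ohms comes out as 10/3 rather than a rounded decimal.

```python
from fractions import Fraction

def series(*resistances):
    """Resistors wired end to end: the values simply add."""
    return sum(Fraction(r) for r in resistances)

def parallel(*resistances):
    """Resistors wired side by side: reciprocal of the sum of reciprocals."""
    return 1 / sum(1 / Fraction(r) for r in resistances)

print(series(10, 10, 10))    # 30
print(parallel(10, 10, 10))  # 10/3, i.e. 3 1/3 ohms
```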

If the resistor network is complex then one must look for small subsets of the network that are wired in parallel or series and calculate the combined resistance for that small part. Then use that small part as if it were a single resistor in the larger network and look for other parts that can be considered to be simple serial or parallel configurations. By working one's way up from the smallest to the largest, conglomerating as one goes, it is possible to calculate the exact resistance between any two points in most resistor networks. (A few configurations, such as bridge circuits, cannot be reduced by series and parallel steps alone and need extra tools such as star-delta transformations or node-voltage analysis. The combined resistance between any two points is still well defined, regardless of how complicated the network is.)
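For a network of any shape, a computer does not have to hunt for series and parallel pieces by hand. The sketch below swaps in a different but equivalent technique: it grounds one point, injects one unit of current at the other, and solves Kirchhoff's current equations for the resulting voltage. The function name, the edge-list format, and the example values are assumptions made for illustration, and the network is assumed to be connected.

```python
from fractions import Fraction

def effective_resistance(edges, a, b):
    """Combined resistance between nodes a and b of a connected network.

    edges is a list of (node, node, ohms) triples; parallel edges are allowed.
    """
    if a == b:
        return Fraction(0)
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    unknowns = [n for n in nodes if n != b]   # b is grounded at 0 volts
    index = {n: i for i, n in enumerate(unknowns)}
    size = len(unknowns)
    # Augmented matrix: one current-balance equation per non-grounded node.
    m = [[Fraction(0)] * (size + 1) for _ in range(size)]
    for u, v, ohms in edges:
        g = 1 / Fraction(ohms)                # conductance of this edge
        for p, q in ((u, v), (v, u)):
            if p == b:
                continue
            m[index[p]][index[p]] += g
            if q != b:
                m[index[p]][index[q]] -= g
    m[index[a]][size] = Fraction(1)           # one unit of current enters at a
    # Gauss-Jordan elimination with exact fractions.
    for col in range(size):
        pivot = next(r for r in range(col, size) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(size):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return m[index[a]][size]                  # volts per unit current = ohms

# A bridge of five 1 ohm resistors, which series/parallel steps alone cannot reduce.
bridge = [("A", "C", 1), ("A", "D", 1), ("C", "B", 1), ("D", "B", 1), ("C", "D", 1)]
print(effective_resistance(bridge, "A", "B"))  # 1
```

Exact fractions keep the small example readable. For a graph of realistic size, a floating-point sparse solver would be the more practical choice.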

Relationship between differences, resistance, and relatedness.

In electronics, the amount of electricity that can pass through a part of a circuit is inversely proportional to the resistance of that part. In our analogy, the only type of "part" we are concerned with is a resistor. {Physics and electronics students will cringe at the over-simplification of this phenomenon. However, I am merely drawing an analogy and am trying to explain it to those who may not be familiar with the science behind it. Therefore, you will just have to grin and bear it for now. You will soon see that the analogy is adequate for the task at hand and there is no reason to complicate things with extreme scientific exactitude.}

If two things - whether they be topics, animals, or phenomena - are considered closely related then it can be assumed this means the difference between them is small. In other words, the "relatedness" is inversely proportional to the amount of difference between two things. Because we will use the analogy of resistance to calculate the degree of "relatedness" between two things, it is important to keep in mind that the resistance in our analogy is actually analogous to the difference between those two things. So, if two things are closely related then that would be analogous to there being a low resistance value between them. In this paper I often refer to the relatedness of two things and then instantly start talking about the resistance between the nodes. This is because we are most interested in the relatedness between the things but the term commonly used for the inverse of resistance, "conductance," is not as well known. Just remember: Closely related = small difference ≈ low resistance. When looking at the actual numerical values calculated, keep in mind that smaller numbers mean less difference and therefore a closer relationship between our things or the points in the graph.

One may think that in a vastly interconnected resistor network the total resistance between any two points would be essentially zero. While it is true that the resistance is very low when compared to the individual resistors, the combined resistance is never zero. Because, in our graph, we can choose any range of numerical values we like to represent the difference between two things, we can scale the values chosen such that the calculated values of the combined differences are numbers that are within a useable range. We can even introduce multiplication factors or create new units of measurement to bring the numerical values to within a range that is easy for humans to comprehend and deal with. For instance, a large resistor network comprised of 1,000 ohm resistors can result in a total combined resistance between any two points of less than one ohm. That is OK. We can just pretend that the resistors were all 100,000 ohm resistors and now our numerical result is in a usable range of between 0 and 100. Or we could have simply started measuring the resistance in milliohms and gotten the same range of numerical values. In the end it does not matter. It is only the proportional resistance between various pairs of points within the network - the proportional difference between the relatedness of two things or clusters of things - that we are really concerned about.
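The scaling argument can be stated compactly. The notation here is introduced only for illustration: R_eff^{XY} stands for the combined resistance between points X and Y, and k is any positive scale factor applied to every resistor in the network.

```latex
R_{\mathrm{eff}}^{XY}(kR_1, \dots, kR_n) = k \, R_{\mathrm{eff}}^{XY}(R_1, \dots, R_n), \qquad k > 0

\frac{R_{\mathrm{eff}}^{AB}(kR_1, \dots, kR_n)}{R_{\mathrm{eff}}^{CD}(kR_1, \dots, kR_n)}
  = \frac{R_{\mathrm{eff}}^{AB}(R_1, \dots, R_n)}{R_{\mathrm{eff}}^{CD}(R_1, \dots, R_n)}
```

The first line holds because the series rule (a sum) and the parallel rule (a reciprocal of summed reciprocals) both scale by k when every input scales by k. The second line follows directly: the proportions between pairs of points do not depend on the units chosen.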

Calculating differences in the network

Basics of how to calculate the differences between things in a network

As explained earlier, the strength of a relationship or the amount of connectedness between two things can be calculated based on how many other - closely related - things each of the two have that are also related to each other. In effect, each thing will have a cluster of related things around it. The size and shape of the cluster will be based on the number of closely-related things and in which "direction" they are different from the things under consideration. These "directions" will be N-dimensional based on the number of taxonomic or organizational facets the two things have in common. All these closely-related things within the cluster will each have their own networks of closely-related and peripherally-related things. For instance, if thing A has a lot of closely-related things that also happen to be related peripherally to a lot of things that are closely-related to thing B then A and B can be considered related even though there is nothing specifically about A or B that relates them. In other words, A and B can be considered "related" even if they are not directly connected. Second cousins, if you will. This can also apply to things that are only peripherally-related through peripherally-related things.

For any two things in the GOT one can calculate a score that reflects the degree of the relatedness between the two. This "relatedness score" will be based on:

  • The number of different paths that can be taken to "get" from A to B.
  • The strength of all the relationships necessary to connect thing A with thing B.
  • The total number of relationships in each of those "connection chains."

Again, this can be calculated exactly the same way as calculating the combined resistance between any two points in a network of resistors. Because the resistance is analogous to the difference between the two things, when calculating the relatedness between two things we are really calculating the difference and then taking the inverse. A difference of 0 implies a relatedness of infinity, or that the things are actually the same. This is analogous to two points in a circuit being considered "electrically the same." However, we will never see two different topics that have a difference of 0 because that would make them the exact same thing. We may, however, see two synonymous terms show up as having a difference of 0 because they both actually refer to the same thing. In this case one term would need to be chosen over the other as the term to use in the taxonomy, just as specific terms are considered "official" in the Library of Congress Subject Headings. The same terms in two different languages will often have a difference of 0 but not always exactly 0. It is possible - and often likely - that the language differences will also indicate or impart a slightly different interpretation of the term or thing and therefore the difference will be slightly more than 0. In many cases the differences will be so slight as to be considered "essentially zero" for most purposes.

Facets

Basics

Facets are various aspects of a thing. As a simplified example consider two different fruits, the banana and the lemon. There are many different aspects or characteristics by which one can classify these fruit: the color, thickness of the peel, taste of the peel, proportions, juiciness, and sweetness, to name just a few. Both fruit are yellow and both have medium thick peels so the difference between them would be small for these facets. They both have bitter peels but not the same kind of bitterness so the difference would be slightly higher with respect to that facet. Finally the proportions, juiciness, and sweetness of the two fruits get even more drastically different as we go along so the numerical value for their differences would be higher still. Now, we could simply say these two fruit are kind of the same and kind of different but this "kind-of" summarization can become quite vague at times. Besides, it is difficult for a computer to help us visualize something based on thousands of very vague "kind-ofs."

Facets are the cornerstone of the Colon Classification system, invented by Shiyali Ramamrita Ranganathan in the 1930s and widely used in India. As explained here, the hierarchical nature of the Library of Congress system was chosen as the basis for the DEMML™ Classification System (DEMCS™) because it had to be stored in the hierarchical folder structure of a hard drive on a server. However, facets do play a very important role in how people organize almost all but the most simple of things. Therefore, facets are brought to bear when determining just how strongly two different things are related.

When taking facets into consideration we must remember a few things: There can be more than one facet through which two things are related. Each different facet can exhibit a different degree of relatedness between the two things. There can be an infinite number of different "facets" with each thing having its own subset of facets that can be applied to it, and each pair of things having a different union of those subsets. Therefore, when we apply the concept of facets to our GOT (Graph-O'-Things) we add many additional layers of complexity.

The best way to wrap your mind around all these layers of complexity is to build up an image in your head. First, imagine a large set of disconnected nodes (the vertices of the graph that represent our "things") scattered throughout a space. Next, pick your favorite color to represent one facet or characteristic by which we can classify these things. Now, connect all the things that have some similarity based on that facet or characteristic together with lines of your chosen color. Remember, not all of the things will be related to each other through that facet so a lot of the nodes will remain disconnected. Next, pick another color to represent another facet and connect all the appropriate nodes with that color. Do it again and again with different colors for each facet you can think of.

Now that you have built up this beautiful, multi-colored web you can visualize it in more than one way. You can simply mix all the lines together in one mass of lines and see just how interconnected everything is as a whole. Or you can separate each different color onto a different layer in your mind. Each layer could be on a different plane so that you could see them independently, or you could simply fade some colors out to focus on just some subsets of the colors. If you can visualize in N-dimensions as I can (I'll teach you some time. It's easier than people make it out to be.) then you can imagine all the layers together but on separate "'branes" as the cosmologists call them. 'Brane is simply short for membrane and cosmologists use the term to refer to different planes of existence that exist in the same space but are actually separate from each other. I prefer the 'brane metaphor to layers because, as you will soon see, all these facets still connect the same N-dimensional set of nodes (all the things in our Graph-O'-Things) and they actually interact in complicated ways. If you try to separate the facets into different layers, each on its own plane, then it will be difficult to imagine connecting everything together without lots of extra connecting lines that get in the way and cross over each other.

Another way of thinking of these facets as means of connecting various sets of things is to think of them as a set of several different and distinct networks of resistors. One network carries electrons and another network carries "goobatrons" (or what-have-you) and still another network carries some other kind of 'trons. Each different network is made of wires that only carry one kind of 'tron. And it has resistors that resist the passage of that kind of 'tron. All those different 'trons can only travel on their own networks so the resistance between the points in the network (or relatedness of the things in the context of that facet) can be calculated separately… Or can they???

Semi-related facets

As it turns out, the facets themselves can have varying degrees of relatedness. In thinking back to our fruit example, the taste of the peel and the taste of the fruit itself are different characteristics or facets of the fruit. However, they both involved taste so those two facets were more similar to each other than they were to the color facet. In this case, and switching back to our resistance analogy, it may be possible for the 'trons from some foreign network to pass over our network, albeit with some difficulty. The additional difficulty with which those foreign 'trons could pass would be expressed as a form of additional resistance that would be exhibited by our resistors against the foreign 'trons. The amount or proportion of that additional resistance would reflect the degree of relatedness between the two facets.

So, when calculating the total, combined difference between any two things in the GOT, we have to take into consideration all the different ways that those two things may be connected and how much difference is reflected in each of those connections. These different ways may simply be different nodes through which the chain of connections must pass (à la the Kevin Bacon game). But they may likely also include all the various facets that can be used to connect the two things. Each facet will still need to be calculated separately. However, when calculating one facet we must be sure to take into consideration the other facet networks that may be similar to the one under consideration. Back to our 'tron analogy (no, not the movie), we would need to calculate the resistance for each type of 'tron separately. On that 'tron's home network we would simply use the defined resistance (difference) values for those connections. All the foreign networks that the 'trons would be able to traverse would temporarily be treated as if they were part of the current network except that they will have an additional resistance factor applied.

For example, 'tron (facet) type A can pass over a network designed for 'tron type B but with only 10% efficiency. So simply treat network B as if it were all part of network A except that all the resistors in network B have ten times the resistance. Because in reality we would be considering many different overlapping facets while making the calculations for any one facet, it is likely that two directly connected things will be connected via two different facets. Simply follow the rule for calculating the resistance for resistors in parallel as given in the introduction. Now calculate the total combined resistance between all the pairs of points in the whole network. This gives you the difference or relatedness map for all the points in terms of facet A. Do the same for all the different facets. Yes, I know, it is a lot of calculations. That is what super-computers are for.
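The "treat network B as part of network A with ten times the resistance" step can be sketched as follows. Everything named here is hypothetical: the `merge_facets` function, the dictionary layout, and the lemon/banana difference values are made up for illustration and are not part of DEMML™.

```python
from fractions import Fraction

def merge_facets(home, networks, efficiency):
    """Build the edge list as seen by the home facet.

    networks maps facet name -> list of (thing, thing, difference) edges.
    efficiency maps a foreign facet name -> how well the home facet's 'trons
    pass over it (0 < value <= 1); facets not listed cannot be traversed.
    """
    merged = list(networks[home])
    for facet, edges in networks.items():
        if facet == home or facet not in efficiency:
            continue
        penalty = 1 / Fraction(efficiency[facet])   # 10% efficiency -> x10
        merged += [(u, v, Fraction(d) * penalty) for u, v, d in edges]
    return merged

def parallel(*resistances):
    """Reciprocal of the sum of reciprocals."""
    return 1 / sum(1 / Fraction(r) for r in resistances)

# Made-up values: the same two things are directly connected via two facets.
networks = {
    "A": [("lemon", "banana", 4)],
    "B": [("lemon", "banana", 2)],
}
seen_by_a = merge_facets("A", networks, {"B": Fraction(1, 10)})
print(seen_by_a[1][2])                                # 20
print(parallel(*(d for _, _, d in seen_by_a)))        # 10/3
```

The merged edge list is an ordinary resistor network, so the rest of the calculation for facet A proceeds exactly as for a single-facet graph.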

Fortunately, certain facets will only be applicable in certain niche areas of the GOT. In addition, once all this calculation is done for the massive GOT then additional calculation runs will not be necessary unless the graph is changed appreciably. Finally, once the calculations are made, clusters of things will show up. The aggregate relatedness between these clusters can be calculated by averaging the differences between all the pairs of things in the two clusters. Once this is done, we don't really need to trouble ourselves with all the different individual differences throughout the GOT. We can just use these aggregate differences for most basic analyses. We would only need to use the specific differences between two specific things when doing higher level analysis or trying to spot more fine scale patterns in the graph.
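The cluster-to-cluster averaging described above is short enough to sketch directly. The cluster contents and the pairwise differences below are invented for illustration, and `cluster_difference` is a hypothetical helper name.

```python
from itertools import product
from statistics import mean

def cluster_difference(cluster_a, cluster_b, difference):
    """Average the differences over every pair drawn from the two clusters.

    difference is any callable returning the combined difference of a pair.
    """
    return mean(difference(a, b) for a, b in product(cluster_a, cluster_b))

# Made-up pairwise differences, already computed for the whole graph.
table = {
    frozenset(("lemon", "banana")): 4,
    frozenset(("lemon", "mango")): 6,
    frozenset(("lime", "banana")): 5,
    frozenset(("lime", "mango")): 7,
}
citrus = ["lemon", "lime"]
tropical = ["banana", "mango"]
print(cluster_difference(citrus, tropical, lambda a, b: table[frozenset((a, b))]))  # 5.5
```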

Visualization

The names of these visualization models are also stolen from cosmology. One is called the "Pure Gravity Model" based on the now-defunct cosmological theory that all of the universe interacts based purely on gravity. The "Gravity + Expansion Model" for visualization is based on the current theory in cosmology that for "relatively closely" spaced objects such as galaxies and clusters of galaxies (those quotes are there for a reason) gravity still controls most of their relative motion and position. However, for objects spaced much further away, gravity has far less of an effect and the expansion of time and space within the universe has a much greater effect.

Pure Gravity Model (PGM):

In this model the distance between any two points in the visualization is directly proportional to the combined, aggregate difference between them. This would look like a big cloud of points with lots of lines between them. The cloud would be thicker in some areas but it may be difficult to visually recognize the clustering.

Gravity + Expansion Model (GEM):

In this model the various clusters would be separated more, proportionally, than the things within the clusters. Below a certain threshold of difference, the "gravity" of the individual differences would take precedence. However, above that threshold, the "expansion" factor would take precedence, thus separating the clusters more in proportion and making them easier to see. Various algorithms could be tried to see which gives the best visualization. It could even be possible for the user to adjust these thresholds so as to arrive at the best visualization for their needs.

The clusters will then be shown as connected by "average difference lines" that indicate the average difference between any two things in the two separate clusters. The thicker the line the less the difference. Just like a wire. The user would be able to adjust: The average thickness of the lines; The "contrast" between the thickness of the lines; Thresholds for when lines are visible or not based on various factors; The relative importance of various facets.

In both of these models the distance between things can be in direct or logarithmic proportion to the difference between them. The less difference (or resistance in our calculations) the closer the two things in the graph. Clusters of things can be shown as blurry clouds rather than attempting to show each and every individual dot.
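One possible way to turn a difference into an on-screen distance under the two models is sketched below. The formulas are assumptions chosen only to illustrate the idea: the text above leaves the actual algorithm open, and the `threshold`, `expansion`, and `scale` parameters stand in for the user-adjustable settings.

```python
import math

def pgm_distance(difference, scale=1.0, logarithmic=False):
    """Pure Gravity Model: distance follows the difference directly,
    or its logarithm (log1p keeps a difference of 0 at distance 0)."""
    value = math.log1p(difference) if logarithmic else difference
    return scale * value

def gem_distance(difference, threshold, expansion=3.0, scale=1.0):
    """Gravity + Expansion Model: below the threshold, same as the PGM;
    the part of the difference above the threshold is stretched."""
    if difference <= threshold:
        return scale * difference
    return scale * (threshold + expansion * (difference - threshold))

print(pgm_distance(2))               # 2.0
print(gem_distance(2, threshold=5))  # 2.0  (inside a cluster: unchanged)
print(gem_distance(8, threshold=5))  # 14.0 (between clusters: pushed apart)
```

The piecewise form is continuous at the threshold, so things do not jump on screen when a difference crosses it.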

Different facets can be shown in different colors. Since most clusters of things will share a limited subset of facets within the cluster, only a limited number of colors would be needed to see all the different facets within a given cluster. Just as the geometry of a geographic map limits the number of colors necessary to distinguish between states or countries and the colors can be reused in other areas without fear of accidentally using the same color for two adjoining states, it should be possible to find multiple sets of colors that can be used to distinguish between different facets within a cluster and still not overlap the sets of colors that are used within neighboring clusters. Granted, since the intent of this project is to show the vast interconnectedness of all things, lots more than 6 colors will be needed and some overlap is to be expected. The user should be able to highlight different facets (or families of facets) to see how things are connected based on those facets. They should also be able to set the system to ignore certain facets or families of facets and recalculate the connectedness of the GOT given enough computing power.

Uses

General classification aid

This system can be used to look at the interrelations between all kinds of things. Granted, it is a little much for a small collection of simple things. However, its use may reveal nuances that were not observable without it. Rather than only look at obvious similarities, we can look for groupings or clusters of things in a richly connected, multi-dimensional graph of connections between things. This can be used for chemistry, astronomy, or any field where there are lots of different kinds of things or phenomena that are difficult to classify. Given that we are now learning that some genes can actually jump from one species to another, thereby at least partially diminishing the efficacy of the pure tree model of evolution, it can even be used for biology.

The DEMML™ Standard

Of course the primary use to which I intend to put the system is to aid in classifying topics in the DEMML™ system and then to allow students to surf between related topics that they might not have thought were related. When originally building the giant Tree of Knowledge in which to place all the educational content in the world (yes, yet another gigantic project) I expect that many organizations will vie for control of who gets to place which topics where in the tree. People will naturally want to place the subjects in which they have the most concern at higher positions in the tree (closer to the root directory) so that they will have more prominence. By using this technique and several robust rounds of mass-collaborative voting, I expect to take the majority of the politics out of the process. Thereby, I hope to create a tree structure that accurately reflects the true connections between all the material instead of merely replicating all the archaic academic classification systems that have been passed down through the ages.

Cross relationships between documents, researchers, and institutions

Another very important way that this system could be used is in finding, mapping and visualizing cross relationships between all kinds of documents (particularly research papers), the authors of those documents, and the institutions in which they work. Research papers act as containers for multiple topics. Those topics are sometimes only related to each other by the document that contains them. The document may explain why they are related when no one previously thought they were. Or the document may explain a new and innovative way to combine two or more things that had never been thought of before. In the same way, much of the rest of the real world consists of things that are only related because of the container or set that contains them. The following techniques can be used to map the additional interconnectedness for any sets of things that are only related by the fact that they exist in the same container. Think of the documents as residing on a somewhat higher plane, or perhaps a different 'brane than the GOT or any of the facets in the GOT.

Cross relationships between research papers:

For this discussion I will assume that we are talking about research papers, for consistency. Again, this modification of the technique can be used for any types of things where some things are only related because of the container they are in.

Each paper can be considered a little network or multi-graph of its own. It has multiple different "things" (topics) in it that are all interrelated by how they are discussed in the document. Thing A may normally not be considered to be connected to thing B. However, in this document they are discussed together, perhaps as both affecting thing C or through some new proposed relationship that no one else had previously considered. The topics in the document will need to be matched up to things in the GOT. We don't actually draw connection lines at this point in the process. We just determine which things in the GOT the topics in the document refer to.

It is possible that some of the topics in the paper match up to multiple different things in the GOT. This is sometimes an indication that there are not enough things in the GOT and that it needs to be expanded and enriched. This could also mean that the topic in the document actually discusses two things and is therefore actually two different topics that are being discussed in tandem. Split the document's topic up into two separate topics to better align them with the names of things in the GOT.

The "relatedness factor" of each pair of topics in the paper can be calculated based on: How often they are discussed together; How closely they are discussed within the paper; and several other factors. Researchers are currently working on means to automatically determine these relationships simply by having computers analyze the documents. I will certainly not try to duplicate their work here. However, in some cases we may need to depend on human readers or upon the authors of documents to insert additional XML tags to make the intended relationships more clear. Use the same multi-faceted, resistance-network type of calculation as above to map out the interconnectedness between all the topics in the document.

Next, we need to determine the importance of each topic (and each cluster of topics) in the document based on how much of the paper is devoted to the topic, internal clues such as the table of contents and heading titles, and XML tags indicating the importance of various parts of the document. I know such tags do not exist yet, but they could perhaps become popular in the future. Naturally, we would eventually have to watch out for "search engine optimization" techniques that may be used to tie papers to more topics, or to more important topics, than they deserve. Using this "importance" information, we can assign multiplication factors to each topic within the paper so that the important topics will have the most effect in the subsequent steps.

After we have done all this we can then "plug" the "circuit" represented by the document into the "circuit" made up of all the interconnections between all the things in the GOT. Make a connection between each topic in the document and the matching thing in the GOT. This connection will be assigned a "resistance" based on the importance of that particular topic within the document and the overall importance of the document. The latter can be measured by how many other documents cite the document, how much of those other documents is devoted to discussing those citations, and a voting process in which documents can be voted higher even if they aren't cited, or voted lower if they are excessively cited for little reason or have fallen out of favor. Only vetted experts should be allowed to vote.
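The plug-in step can be sketched as adding resistor edges to an edge list. The representation here is assumed purely for illustration: edges are (node, node, resistance) tuples, importance values are positive weights, and the resistance of each connection is taken as the reciprocal of topic importance times document importance. The names connection_resistance and plug_in are hypothetical.

```python
def connection_resistance(topic_importance, document_importance):
    """Resistance of the link between a document topic and its GOT thing.

    Assumption: both inputs are positive weights where larger means more
    important, so a more important link gets a lower resistance.
    """
    if topic_importance <= 0 or document_importance <= 0:
        raise ValueError("importance weights must be positive")
    return 1.0 / (topic_importance * document_importance)


def plug_in(got_edges, doc_id, topic_to_thing, topic_importance,
            document_importance):
    """Return a new edge list: the GOT plus the document's connections."""
    edges = list(got_edges)  # copy, so the original GOT is left untouched
    for topic, thing in topic_to_thing.items():
        r = connection_resistance(topic_importance[topic], document_importance)
        # Prefix the topic with the document id so it stays a separate node
        # on the document 'brane rather than merging with the GOT thing.
        edges.append((f"{doc_id}:{topic}", thing, r))
    return edges


# Hypothetical GOT with three things and one document covering two of them.
got = [("A", "B", 1.0), ("B", "C", 1.0)]
plugged = plug_in(
    got,
    doc_id="doc1",
    topic_to_thing={"a": "A", "c": "C"},
    topic_importance={"a": 2.0, "c": 0.5},
    document_importance=2.0,
)
```

The document's own internal network of topic-to-topic resistances would be appended to the same edge list, which is what lets current flow from A through the document to C.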

Once "plugged in" to the GOT, the document will create a kind of short-circuit, forming new connections between things in the GOT and changing the combined resistance between various points in the GOT. When multiple documents are "plugged in" they will dramatically affect the landscape of the GOT and its "resistance network." We will then be able to see the interconnectedness of the documents through the GOT. Attempting to see this by looking at the documents alone can leave gaps. To derive any real meaning from the documents, we must show how they relate through the interconnectedness of all the other things in the universe and how they change that interconnectedness.

"Plugging" documents in to the GOT will affect the visualization of the GOT in several ways. Using the pure gravity model, the things in the GOT that are now more closely related (via these documents) will be more closely spaced in the visualization. Using the "expanding universe" model, the lines between clusters of things would get thicker. In fact, many lines may "jump up" to the 'brane occupied by the documents so that we can see how the documents or document clusters provide the linkages between subjects that may not previously have been thought of as connected.

We should even be able to develop techniques to highlight the differences between the GOT before and after. This could be done for each document individually or for a cluster of documents contributed by the same authors or fields of study. The difference between the before and after (without and with the documents) can be a strong indicator of the actual contribution made by the document. These before and after analyses can be done in two ways. The first is chronological: each document is added in the order it was written and the analysis is done after adding each document. This will show the contribution the document made at the time it was published. Alternatively, all the documents can be added and then each can be "removed" in turn and the change in the GOT analyzed. This will show the importance of any document in the current grand scheme of things.
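Both kinds of before and after analysis can be sketched with a small effective-resistance routine. This is a minimal sketch under stated assumptions: the network is connected, edges are (node, node, resistance) tuples, each document is collapsed to a single node for brevity, and the contribution of a document is measured as the change in resistance between one chosen pair of things. The function name effective_resistance and the toy data are hypothetical.

```python
def effective_resistance(edges, source, target):
    """Effective resistance between two nodes of a connected resistor network."""
    if source == target:
        return 0.0
    nodes = sorted({n for a, b, _ in edges for n in (a, b)})
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    # Build the weighted graph Laplacian from conductances (1 / resistance).
    lap = [[0.0] * size for _ in range(size)]
    for a, b, r in edges:
        g = 1.0 / r
        i, j = index[a], index[b]
        lap[i][i] += g
        lap[j][j] += g
        lap[i][j] -= g
        lap[j][i] -= g
    # Ground the target (drop its row and column) and inject one unit of
    # current at the source; the source voltage is then the resistance.
    keep = [i for i in range(size) if i != index[target]]
    aug = [[lap[i][j] for j in keep] + [1.0 if i == index[source] else 0.0]
           for i in keep]
    m = len(aug)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(m):
            if row != col:
                f = aug[row][col] / aug[col][col]
                aug[row] = [x - f * y for x, y in zip(aug[row], aug[col])]
    s = keep.index(index[source])
    return aug[s][m] / aug[s][s]


got = [("A", "B", 1.0), ("B", "C", 1.0)]
documents = {  # listed in order of publication
    "doc1": [("A", "doc1", 1.0), ("doc1", "C", 1.0)],
    "doc2": [("A", "doc2", 2.0), ("doc2", "C", 2.0)],
}

# Chronological analysis: add each document in order and record the drop.
chronological = {}
edges = list(got)
for doc_id, doc_edges in documents.items():
    before = effective_resistance(edges, "A", "C")
    edges = edges + doc_edges
    chronological[doc_id] = before - effective_resistance(edges, "A", "C")

# Leave-one-out analysis: start with everything, remove each document in turn.
everything = effective_resistance(edges, "A", "C")
leave_one_out = {}
for doc_id in documents:
    rest = got + [e for d, es in documents.items() if d != doc_id for e in es]
    leave_one_out[doc_id] = effective_resistance(rest, "A", "C") - everything
```

In this toy network the first document adds a second path of resistance 2 in parallel with the existing path of resistance 2, halving the resistance between A and C. The two analyses give different numbers for the same document because the leave-one-out version measures it against everything published since. A real analysis would look at many pairs of things at once rather than a single pair.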

Cross relationships between researchers/authors:

Researchers can provide yet another level (or 'brane) for looking at interconnectedness. Each document is written by one or more researchers. Determine the contribution made to each document by each author by any of several means, such as XML tags placed in documents by the authors (I know this would upset many who depend on attaching their names to documents to which they made little or no contribution) or perhaps a vetted voting mechanism.

Connect the researcher to each document with an associated "resistance" based on their contribution. Then redo the total resistance calculations based on these new connections. This will show how researchers are connected to various things within the GOT and how they are interconnected to each other. This may even reveal previously un-thought-of possible collaborations between researchers.

Cross relationships between institutions:

This would be calculated similarly to the relationships between the researchers. Note that this should not be treated as an additional level on top of the researchers because the researchers may be at different institutions at different times. Nor should the institutions be connected directly to the GOT. Rather, they should be connected on top of the documents similar to the way the researchers are connected. This puts the institutions in a kind of sibling relationship to the researchers.

Use the same calculations as for the authors (including weighting based on actual contribution), with one exception: if more than one author is from a given institution, calculate the total contribution for the institution as the sum of the contributions from the individual authors. Depending on needs, it may be better to view the authors and institutions separately or in a mixed configuration. On the one hand, we shouldn't consider the contributions of the authors and the institutions simultaneously because that would be redundant. On the other hand, one could increase the resistance factors for both the author 'brane and the institution 'brane such that their combined contribution comes out the same as that of either one alone. This may need to be adjusted on a case-by-case basis depending on how much influence each institution has on the work of the authors. Again, this may have to be determined by some kind of vetted voting process. For instance, one could ask, "How much of what author X was able to contribute was due to the institution they were affiliated with?" That institution then gets a proportional share of what the author "claims" to have contributed to the document. For this type of calculation it may prove informative to consider funding providers as well. It is possible that a particular author X was working under a grant from foundation Y and that institution I had very little to do with allowing that author to complete their work. In this case, assign the foundation an appropriate proportion of the credit.
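The summing rule for institutions described above can be sketched as follows. The contribution shares, the author-to-institution mapping, and the choice of resistance as the reciprocal of the share are all assumptions made only for this example, and the function name institution_contributions is hypothetical.

```python
from collections import defaultdict


def institution_contributions(author_share, author_institution):
    """Sum each institution's contribution over its authors on one document."""
    totals = defaultdict(float)
    for author, share in author_share.items():
        totals[author_institution[author]] += share
    return dict(totals)


# Hypothetical document with three authors, two from the same institution.
author_share = {"X": 0.5, "Y": 0.25, "Z": 0.25}
author_institution = {"X": "I1", "Y": "I1", "Z": "I2"}

totals = institution_contributions(author_share, author_institution)

# Assumption: a larger contribution means a lower resistance on the
# connection between the institution and the document.
institution_resistance = {inst: 1.0 / share for inst, share in totals.items()}
```

A funding provider could be handled the same way by moving part of an author's share to the foundation before the totals are computed.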


The content of this post is Copyright © 2009 by Grant Sheridan Robertson.
However, anyone is free to use this idea for research purposes as I will likely not have much time to do anything with it any time soon. You must simply give attribution that this is where you got the idea and notify me of your use.
