Saturday, July 25, 2009

Intelligent Epidemic Routing, Preliminary Notes

Ever since I invented DEMML™, I have been stewing on how to actually transmit and/or transfer the content by means other than a direct internet connection. While it would be easy to simply copy files into a directory, I wanted a means whereby someone could simply plug a thumb-drive into a computer and the computer would transfer the appropriate information. I recently began searching for "store and forward" technologies on the internet and came across the following article:
Amin Vahdat and David Becker, Epidemic Routing for Partially-Connected Ad Hoc Networks, Duke Technical Report CS-2000-06, Jul. 2000, available from http://issg.cs.duke.edu/epidemic/epidemic.pdf
In this article, the authors discuss a method of transferring messages across disconnected systems of devices by simply copying the messages to every device encountered until the message finally reaches its destination. Naturally, the method is a little more sophisticated than that but you get the idea. The article does also allude to some additional measures that could be taken to make the system more efficient.
For example, under certain circumstances there may be locality to the movement patterns of mobile nodes. In this case, it would be worthwhile to exchange a list of the last n nodes encountered by a host during anti-entropy. This information can be utilized to once again identify appropriate carriers under the principal that if a particular host has been seen recently, it will be seen again in the near future. (page 12 of the .PDF)
Unfortunately, (or perhaps fortunately) I was not satisfied with the level of completeness in that statement. In fact, before even reading it, several ideas had already been simmering in my mind.
Over the course of several days I have attempted to organize those ideas into an outline. I had hoped to be able to write them up into a slightly more readable form but I just don't have the time or energy right now as I have home projects to do and a GRE to study for. (A chore which I am sure I will have more to say about later.) Therefore, I have decided to simply post the raw notes up as they sit right now. The following is nothing more than an outline for an idea. However, it is a rather thorough outline. Printed, it takes up about 12 pages. For those people who are interested in such esoteric things, I hope this helps. I hope my ideas can help spur more ideas for you or help in your research.


Intelligent Epidemic Routing

Types of Devices:

Before one can really discuss routing of data between multiple disparate devices, one must first asses exactly what types of devices will likely come into use within this context. I have divided the devices into two major categories: "Carriers" (those devices that actually store a message and transport it to the next device by way of the movement of either of the two devices) and "Connection Services" (those devices that can only transmit a message but cannot store it for any length of time).
  • Carriers
    • Can store and forward a message.
    • Any device may or may not have any of these features at any given time or in any given location. (For any device to be considered to have any of these capabilities it would need to have both the requisite hardware and software. Therefore, there is no need to list software drivers separately from the hardware component.)
      • Mobility
      • Connected to a Connection Service (CS)
      • Wireless transmission capability of various protocols
        • Each will have its own coverage area.
        • Each transmission system may have it's own separate availability data. It is possible for Bluetooth to be on all the time but only have the Wi-Fi turned on when needed. Therefore, it would only be possible to do a "drive-by" transmission over Bluetooth.
        • Wi-Fi
          • a, b, g, n
        • BlueTooth
          • 1.0, 2.0
        • Cell
          • CDMA, GMS, etc
      • Line of sight transmission capabilities
        • Laser links
          • Infrared or visible
        • Light beams
        • Audio via parabolic dishes
        • Heck, two cans and a string if that is all you got.
      • Hard-Wired connection abilities:
        • Cannot necessarily assume that the proper cable is carried with the device. If the cable is carried with the device then we still have to consider which other devices have the appropriate connectors and software to communicate with the device.
        • Custom Syncing cradles
          • Palm
          • Windows Mobile
          • Bare hard disk in a docking station.
        • USB
          • Thumb drives or any other USB connected media.
        • Serial
        • Any new devices invented to take advantage of this technology.
      • Computational capabilities
        • Various devices will have various computational capabilities. Devices with better computational capabilities can do some of the heavy calculation work for their lesser capable peers, thus increasing overall efficiency. I call this Mentoring.
        • If a device is a carrier but can not compute then it is simply a storage device.
        • If that storage device is connected to another device that knows how to read and write data to it then the storage device can be a carrier as well. (Call it an intelligent floppy shuffle.)
      • Storage capabilities
        • By definition, a carrier will have some storage. "How much?" is the question.
  • Connection Services (CS)
    • Can only transmit data but cannot store a message in and of themselves.
    • Cannot store information about itself. Therefore, all the other devices will have to store and track the information about the CS, its capabilities, and availability.
    • Stationary
      • Internet
      • Wi-Fi access points.
      • Telephone systems
    • Mobile
      • Wi-Fi access points on vehicles
      • Mobile device with Wi-Fi acting as a router in an Ad-Hoc network.
      • Any other mobile ad-hoc network.

Chronologically and Spatially Dynamic, Probabilistic Routing Map

  • Definition:
    • Chronologically and spatially dynamic, probabilistic map of where each known device is expected to be available and when, based on where they have been in the past. This can be matched up to where the current device is expected to be and when.
  • Uses more resources on the device.
    • Each device would need to store a separate version of this map for each device it is concerned about. That could be a lot of maps. However, in regions of the world where this technology will prove most useful, there are likely to only be a few devices that any one device is concerned about.
    • The designer of each system needs to determine how much benefit can be derived from the use of these extra resources.
    • As devices become more powerful and have more memory, this kind of technology may be possible on the smallest of cell phones.
    • Density and accuracy of the map can be scaled to match the available resources on the device.
    • It is best to start developing this technology before it is actually possible to do so that the technology will be ready when the devices are.
      • Remember, Bill Gates once said no one would ever need more than 640K of RAM.
  • Ways to represent the map in memory:
    • Representing Geographical areas
      • Vector map of outlines of coverage areas.
        • May work best to represent coverage areas for CS or ad-hoc networks rather than for separate mobile devices.
        • May require multiple, separately-outlined areas to track different availabilities for those areas
      • Grids
        • Each cell in the grid represents a geographic area.
        • Each cell contains probability data for the availability of each device within that geographic area.
        • Requires less memory and computational power than a vector map.
        • I am currently disregarding the vertical dimension of space. If this needs to be considered in the future (for high-rise building or mining operations) then an additional dimension can easily be added to the system.
        • Full coverage Grid
          • Complete grid lain over the entire geographic area where the device may interact.
            • This means if any other device that this one is concerned about roams into an area that this device may be concerned about then this device must have a grid large enough to accommodate it.
            • If any monitored device roams out of that area then the grid must be expanded.
          • Devices with more memory can store finer grid with smaller cells for more accuracy.
          • Single field stores GPS coordinates for origin of grid and size of each cell. All other cell locations can be calculated from that.
        • Intermittent Grids
          • Smaller grids that are treated just the same as the supposed full-coverage grid.
          • More than one of these of various resolutions can be used to give adequately detailed coverage of some areas where there is more activity and
        • Independent Grid Cells
          • Used for when the device is only concerned about a small proportion of the covered geographic area.
          • A certain grid size is specified and an origin indicated, however, only certain grid cells exist at all. Each cell has a key indicating how many over and down from the origin it is located.
      • Scattered Spots
        • Individual GPS locations are recorded with time-coded availability data.
      • Any of these methods can be dynamically mixed and matched (even overlapped) as needs dictate.
        • A vector map can cover most of the area for a CS while a full grid can be placed over the most traveled areas and a few independent grid cells and/or spots can cover scattered outlying areas.
          • More detailed information would always take precedence.
            • A fine grid can be overlain on top of a course grid in the areas of most activity.
              • When the fine grid is available then the course grid data for that area is ignored.
            • If there is a vector map but there is also a grid or spot in the middle of that vector map indicating less availability then the vector map would be ignored in that area and the grid or spot information would be used.
              • This could be used to indicate dead spots within an otherwise even coverage area.
    • Representing Time:
      • Because devices (carriers or CS) may be available at different times of the day, week, or month we need a way to indicate this.
      • Using the Z-Dimension
        • Stretch the geographic map vertically. Time is measured along the Z-axis.
        • Essentially, we have all the same options for representing time as we do for distance.
          • We can use full grids, intermittent grids, blobs, or any combination of the three.
        • If a device is likely to be at a particular location but only within a given time range then the device is only marked on the 3-D map for those time ranges.
        • Since time moves forward and the past is less and less relevant, we could represent the most recent time as 0 on the z-axis and past times as negative values.
          • Devices could decide for themselves how much of the past they want to track.
        • Naturally, devices can only appear at any one place at any one time.
          • So, the 3-D view of that data structure would look kind of like a mold of worm-holes in wood or soil where the worm always traveled in a general downward direction. If the device were turned off at various times then that worm hole would seal up for those times, almost as if there had been a cave in of the tunnel.
      • Time-slices:
        • Time is divided up into equal slices of various duration.
        • All of time is simply represented by a stack of slices. Each layer in the slice represents the geographic area during that particular time slice.
        • When using a grid based geographic model we end up with a simple 3-D grid made up of stacks of cubes.
          • Availability probability data stored at any cube position would indicate that the device is likely to be available at that position and time.
          • If a device just sits in one position for a while then it will create a vertical stack of cubes. If it moves to other geographic regions then it will show up as a diagonal line of cubes.
            • The algorithms should be designed so that the diagonal lines do not show breaks if the device moves around too fast.
            • The only time breaks should show up is if the device is simply off or unavailable.
      • Intermittent Time cells
        • Just as with geography, it is possible that only certain segments of time have any activity in them at all.
        • Rather than create a memory hogging 3-D array of all possible times for all possible places, just use stacks of time cells to cover the availability times for specific geographic regions.
        • It would look like stacks of blocks hovering in space.
          • Each stack of blocks would only have to store the origin and block size once.
            • Block size would indicate horizontal distance values and vertical time values.
          • Then the blocks could simply be addressed as offsets from there.
            • The offsets would be simple integer values representing the number to multiply by the size to get the distance from the origin.
            • The offsets could be both for space and time.
            • Thus, one origin value (GPS + Time) plus a series of integer offsets would adequately describe all the cubes within a virtual grid-space where the device could be expected to be found.
      • Time blobs:
        • Just as with geographic data, It is possible to represent availability over time as a bell curve.
        • In this case the standard deviation is expressed in units of time.
        • Since time is one dimensional we don't have to consider an "astigmatism factor" as described in the probabilities section.
      • Because patterns are most likely to develop over days, weeks, or months we could use different sets of time slices.
        • One set, representing hourly statistics could have 24 slices, one for each hour in the day. Another set, representing daily statistics would have seven slices, one for each day in the week. We could have two sets for months, one with five slices for weeks within a month, and another with 31 slices for days within the month. I guess some situations would also call for yet another set with 12 slices representing monthly statistics within the year.
  • Representing Probabilities:
    • Single Probability number for each cell:
      • A common method for storing this information would likely be to store one probability number for each grid cell, thus creating a grid of rectangular columns. At any point within the geographic area covered by that grid cell the expected probility would be considered to be the same.
      • While this provides for simple calculation of the probability at any one point it is not very accurate with large grid cells.
      • In order to increase the accuracy, one would have to decrease the size of the cells, thus increasing memory requirements dramatically.
    • Bell-Curve Representation
      • Rather than store a number indicating the probability of making a connection in every small section of the geographic area, it is possible to simply specify a point to represent the center of a bell-curve, the height of the bell curve, and the standard deviation. This will define the cross-sectional shape of the curve. The maximum height of the curve indicates the maximum probability of getting a connection at that point at that time. The standard deviation is expressed in physical distance units rather than statistical sample units. The probability of getting a connection will fall off as you get farther from the central point at the rate specified by the standard deviation value. Note: the standard deviation is not necessarily calculated by standard statistical methods. It is merely used to indicate how wide that bell curve is, or rather, how fast the probability of getting a connection falls off as one gets farther away from the center. The height of the curve at any one point represents the probability of getting a connection at that point.
        • Yes, I know. In effect we are expressing the probability of a probability. What can I say?
      • This provides a lot of information about the availability of the device over quite a wide area using just two numbers.
        • In this way our grid cells can be much larger. A five-by-five section of a grid containing 25 cells and requiring 25 probability values can be replaced by one grid cell requiring only two numbers.
      • On the other hand, the work needed to calculate the probability of a connection at any given point would be slightly higher.
        • The best method may be determined by the resources available on each device. A device with lots of memory but low computational ability might use a finer grid and simple probability numbers. Whereas a device with a fast processor but little memory to spare may choose the bell-curve method.
        • Algorithms can be devised to translate back and forth between the two methods so that a device using one method can still share data with a device using another method.
      • Normally, a bell curve would just cover a circular geographic area. However, one could also use an "astigmatism" factor, essentially how much the circle is squashed, and an orientation angle to specify that the bell curve actually covers an oval shaped area. Adding just two numbers could allow one set of data to replace several.
        • When bell curves are used within a grid then it might be prudent to use a slight morphing algorithm to stretch the bell curve to fit the square shape of the grid. Otherwise the probability shown at all the grid intersections (where four squares meet) will show a probability lower than reality simply because the corners are farther away from the center than the edges. However, this may prove to use too much computing power than it is worth. Perhaps a quick and dirty formula can be devised that provides a decent morph at a low computing cost.
      • Representing larger areas with good coverage:
        • It is even possible to fudge the way we use the bell curve a bit to define a larger area (in space or time) that has good coverage.
        • Simply specify a maximum availability of over 100% and then truncate the curve at 100% when reading the map.
          • This will produce a flat topped "curve" that still tapers off at the edges as determined by the standard deviation.
          • Remember, this "standard deviation" is only used to give a shape to a curve using as few numbers as possible. It is not really a true statistical value.
      • If two bell-curves overlap, as they will likely do when using a full or intermittent grid, then one of two methods can be used to decide what to do where the two coverage areas meet:
        • Which system is used in any particular map or part of a map can be indicated by a flag.
        • The bell curves can simply be cut off clean at the overlap line.
          • In this case the standard deviation for, say, Area A would have been calculated so that the height of the bell curve just inside the border of Area A will exactly reflect the expected probability of getting a connection there. The same for any of its neighbors.
        • The bell curves can be additive in the overlap areas.
          • The standard deviations would calculated such that the sum of the heights of the overlapped areas represents the total expected probability of getting a connection at any point in those overlapped areas.
            • Now, we aren't expecting any of this to be incredibly exact. We aren't measuring atomic reactions here. So, it is OK if they don't add up to be exactly the predicted value. Pretty close is good enough for our purposes.
    • Using a bell curve with a vector map of a region:
      • Assigning a single (max-probability, standard deviation) pair to any one vector map region can indicate the rate of fall-off around the entire, convoluted perimeter of the region.
      • Again, this could reduce the memory required for the map at the cost of more computing time. Rather than use a very detailed vector map or a series of vector maps each with a different probability value, one could delineate a smaller, less detailed region and then specify how fast the probability of a connection falls off outside of that region.
      • In fact, in some cases, it may be possible to simply indicate a line through a geographical area and assign it a (max-probability, standard deviation) pair. That would then indicate the coverage for a large area along and on either side of that line.
        • Using the chop-off-the-top method we could even cover a much wider area while using less memory.

  • The map is derived from the following types of data::
    • Per device preference settings.
      • Each device can have its own set of preferences concerning how much space or other resources it is willing to devote to messages of various sizes.
      • Then, each node decides if that priority rating is high enough for it to accept the message. Rather than simply accepting or denying based on the value of the hop count number, the node could determine the probability of successfully transmitting the message to its destination based on the node's current version of the dynamic routing map and how it expects other nodes to react to that message along the way.
    • Time and GPS stamped availability data.
      • Each device keeps a log of their location and availability throughout the day.
        • That coverage area may likely diminish in strength or reliability with increasing distance from the device. It may also diminish throughout the day as batteries are used up. The coverage area may even change as the device is taken into different terrain. Therefore, the map should not assume that a carrier's coverage area will always be a specific circle around it.
        • strength of the devices signal throughout the day.
          • May change due to battery loss
          • May change due to weather depending on the transmission medium.
          • May change due to location in difficult terrain.
      • Transmitted during the hand-shaking or "anti-entropy" exchange between carriers.
      • This log data can be kept for a specified or algorithmically determined length of time.
        • Or it can simply be incorporated into the probability data. If it reinforces the probability data then it will be like adding an average value to a statistical pool. It reduces the standard deviation. If the data is different then it could modify the average and increase the standard deviation value.
    • Type of Coverage area of the devices.
      • Mobile devices with coverage area that travels with them.
        • Cell phones with Wi-Fi or Bluetooth
        • Wi-Fi access points on mobile transportation.
      • Mobile devices with no coverage area of their own.
        • PDA with no wireless and requires a sync cradle.
        • These devices (such as PDAs that require synching cradles) will require connection interface hardware. If that hardware is only located at certain places and cannot be carried with the actual "carrier" device, then that device cannot be considered to be able to connect to the service at any location where the service may normally be available. In these cases, the device must be transported back to it's interface hardware in order to connect to that service.
        • There may be some devices that can connect to a service at multiple different (but specific) locations. In these cases the device will simply have multiple locations
      • Stationary devices connected to Wide-Area-Connection-Service (WACS).
        • If a device (such as a computer in an office) is permanently connected to a service then the device simply has a very large coverage area.
      • Stationary devices not connected to Wide-Area-Connection-Service (WACS).
        • especially ones with much higher carrying capacity
        • These should be treated just like any other mobile device except a special flag will indicate that they don't move.
        • When these devices transmit their availability data it will only contain chronological information as GPS data is not necessary.
    • Wide-Area-Connection-Services (WACS)
      • Coverage area availability probabilities.
        • Geographic
        • Chronological
        • Logged by all devices every time they connect.
        • Shared in handshaking.
      • Information about these systems would be stored separately from the carriers that are connected to them.
        • This is for data normalization purposes. There is no need to store the geographic coverage area of an entire cell phone system redundantly for each separate device that permanently or temporarily connects to that system.
      • May or may not be a "Carrier."
        • Some protocols on some WACS can actually store and forward messages within the system.
          • SMS messages on cell systems.
          • These systems merely forward the message to a specific other device. They cannot process the message in any way.
        • Some protocols appear to allow a WACS to be a carrier but do not.
          • NNTP & SMTP on internet
          • However, the device must have an account on some server. This would actually mean that the server is the carrier not the internet.
          • In this case treat the server as a stationary carrier that is always connected to the WACS.
      • The WACS will simply extend the coverage area of a device when that device happens to be connected to the service.
      • It should never be assumed that if the message reaches a WACS that the message will necessarily reach it's destination. The destination may also be a mobile device that may or may not ever come into range of the WACS.
      • Devices must consider the up-time of that system in their calculations.
    • Transmission Capabilities
      • Keep in mind that some devices will not be able to connect directly to some services or other devices.
    • Ad-Hoc Networks
      • A series of WiFi enabled cell phones could create an ad-hoc network that connected someone to the cell system via daisy chaining even if they are not within range of the cell system themselves. Each WiFi enabled phone in the system would act as a router or a non-carrier, Small Area Connection Service (SACS).
        • All the phones in this ad-hoc network would need to be on a list of allowable phones to use as routers. A trusted network. Perhaps some messages could travel over un-trusted, or semi-trusted networks.
        • Like the rest of this system, you should not necessarily need to actually have a connection to the cell system at that time. Using the dynamic routing map, the phones would negotiate as to which phone is most likely to connect to a stationary node and then that phone would be willing to devote more of its resources to carrying the message.
    • Security Clearances
      • A message could also contain a list of hosts that it is allowed to even be transmitted on. The current carrier could then determine if it was safe to pass the message along to a particular subsequent carrier.
      • Knowledge of which carriers have sufficient security clearance to carry the messages contained in the device. Again, this info is collected from past encounters as well as a list that is exchanged among carriers. Kind of like a tree of trust relationships that may be passed verbally among spies or mobsters. "I trust Joe but not Jim. Pass it on."
    • Resource Information
      • Knowledge of which carriers have how much in the way of resources and may be willing to carry messages of what priorities.
      • If carrier C only has enough resources for small, high-priority messages and the current device (A) knows that carrier B only ever connects to carrier C then A may not even bother transmitting a large, low-priority message to B because A knows that message will die at C.
      • Again, based on info exchanged in handshaking.
    • Efficacy History
      • Knowledge of which routes have been most effective and efficient in the past.
      • Based on Ack messages sent back from the ultimate receivers of the messages as well as shared connection logs.
      • Nodes could also track how often and how recently they have connected to a particular other node as well as how often and how recently they have successfully transmitted a packet from a particular node. This would require an acknowledgment to be sent back along the successful route. In lieu of that, the nodes could simply transmit their most recent connection logs to every node they connect to (probably similar to how routers currently work) and then build a "routing" table of the routes most likely to be successful. In this case, rather than only sending the data along that route, the system could still send the data along multiple routes but it could carry a flag to indicate that it expects to be most successful along a particular route or set of routes. The nodes that are on that route would then know to devote more resources to that message than to messages that are flagged as being less likely to be successfully routed through that node.
    • Input from the user:
      • The user can enter an expected itinerary so the device can better know where the device will be to better predict which nodes, or "carriers" it is most likely to come into contact with.
        • With appropriate touch screen technology, the user could simply draw a line on a map indicating where they expect to be and when.
      • The device can also display a prioritized list of messages that it is storing along with a map of which devices it would most like to connect with in order to transmit those messages most efficiently. The user could then adjust their itinerary if necessary to ensure the successful and prompt transmission of an important message or set of messages.
        • All the devices could be given human readable names (if they are known) so that the user can more easily choose among these devices.
      • Prompt the user for routing preferences

Gracefully Scaling the routing map

  • Some devices will have limited resources available for computing or storing this map.
  • The map can be scaled to lower resolution or limited to certain areas or time ranges for smaller devices.
  • Updating map data between devices with different amounts of resources.
    • When a device with a highly detailed map receives less detailed data it can incorporate that data into it's map.
      • There is no need to simply overwrite the highly detailed data with less detailed data.

Priority Rating

  • Rather than a simple hop-count, messages could be assigned a priority rating (perhaps with multiple facets).
  • Then, each node decides if that priority rating is high enough for it to accept the message. Rather than simply accepting or denying based on the value of the hop count number, the node could determine the probability of successfully transmitting the message to its destination based on the nodes current version of the dynamic routing table and how it expects other nodes to react to that message along the way.

Mentoring

  • This is nothing other than having a device with more computing power do some or all of the calculations for a slower device.
  • Rather than wait for a slow device to recalculate all of these maps and tables the faster one will simply take the data it has already gathered from the slow device, do all the calculations, and pass the new tables back to the slower device.
  • The fast device may be able to use much more detailed maps, etc. but it will calculate maps that it knows the slow device will be able to easily use.
  • If slower devices know that they will be in contact with a smarter device within a reasonable period then they can simply defer their calculations until they are connected to the smarter devices and let the smarter devices do the work.
    • This is analogous to graduate students waiting for their advisors to give them research topics.

Storage devices as "carriers"

  • Even though a storage device has no computing capability it will always be connected to some other, smarter device when in use.
  • That smart device can temporarily substitute it's computing abilities (brains) for those lacking in the storage device. It is nothing but Mentoring taken to the nth degree.
    • (I will refer to the smart device as the "brain" and the storage device as simply the "storage" for brevity.)
    • In other words, when the storage is connected to the brain, the brain does all the calculations that any other device would have done for itself (if it had had brains) and then the brain stores the results of those calculations back on the storage.
    • The end result is exactly the same as if the storage device had had computing abilities but then they were removed after being disconnected from the smart device.
  • General Procedure:
    • Upon connection the brain determines that storage designated as a "carrier" has been detected.
      • This can be based on some encryption code or simply the presence of all the necessary files on the storage device.
      • Naturally, almost any brain should also have the ability to "format" a storage device as a new carrier.
    • The brain reads the messages off of the storage and writes messages to the storage just as if they had been transmitted by Wi-Fi or cable.
    • The brain may interact with the user asking them where they intend to go and/or suggesting where they might think of going in order to transmit the messages most efficiently.
    • The brain updates it's log files, monitored device information, connection probability maps, and routing maps based on what it found on the storage.
    • The brain then updates that same data on the storage but based on the messages now on the storage and where they need to go.
    • The brain dismounts or ejects the storage if necessary and informs the user that the transaction is finished.
    • The user takes the storage, puts it in their pocket ,and goes on about their merry way.
  • Wireless capable storage:
    • Sure, it's possible. RFID chips anyone?
    • The brain does all the above just the same. It may or may not need or want to interact with the user.
    • An entire data transmission could take place in the time it takes a user to walk past the front of a store.
    • As with all wireless technologies, the bulk of the security would be up to the wireless technology rather than this technology. Watch out for Blue-Snarfing.
  • Will have to design a protocol for adding messages to and receiving.
  • Any kind of storage medium could be used as long as the devices know how to read and write to it.
    • Thumb-Drives
    • Bare hard drives
    • External hard drives
    • CD-ROM
    • DVD
    • RFID
    • Bar codes on a piece of paper.
      • As long as the brain can print a new piece of paper.
    • Knots on a piece of string.
      • OK, a lot of knots.

Shared Databases

  • Another extension of this technology would be if all or some of the devices used some other database that they all shared. Replication updates could be passed between the devices and all would keep their version of the database relatively up to date. Especially compared to waiting till each device could connect to some primary server and replicate the entire database at that time.
  • Yet another similar use would be if all or some of the devices used some shared content library.
    • Not all of the devices would necessarily need the same parts of the content library at the same time but there is a pretty good possibility for a lot of overlap. Over time, the devices might also decide they need different parts of the library than they had before. A good example of this would be DEMML content. As users learn they require access to different parts of the library, more content. As these users come into contact with other DEMML users their devices can query each other and ask for various parts of the library. "Have you got this part?" "No, have you got that part?" "I have some of what you want. Be my guest. Oh, by the way, some other guy I know needs this other part. Have you got that?" "Sure, send it along with my regards." You get the idea. Its all like a giant game of Go Fish where people have to walk around to different rooms to play. If and when any of those devices (even, perhaps a storage-only device) makes it to either an internet connection or a repository, it can then gather up a large chunk of the library based on requests it has brought with it. Then, the device can distribute all those parts of the library to whomever requested them using the same epidemic routing technology.
    • In some cases a small user or group of users will use a single storage device to pass requests and content back and forth between themselves in their remote village and a single repository located in a larger town. That repository may be nothing more than an old PC set up in the corner of the post office or school room. It would only need enough storage to hold all the content that was most likely to be needed by the users in its area. Even if that repository PC could only connect to the internet once a week via dial-up-modem, that would likely be often enough to keep its part of the library up to date and handle any requests for any content that it didn't already have stored.
    • The primary difference between this technique and transmitting simple messages is that a message is usually directed from one device toward another specific user or device. In this case multiple devices will be carrying around lots of chunks of the library just in case anyone else needs them. The devices can then distribute those chunks to whomever happens to ask. Requests are not sent specifically to the repository either. If any device passes a request to another device that has that part of the content, then that content will be gleefully handed over. Plus, if content is passed as a result of a third-party request, then that content will be flagged as being directed towards the requesting party. Everyone can snatch up a copy of it if they want, but everyone will also try to get that content to the original requester as soon as possible. That is when the simple message passing version of the Episodic Routing will take over. Naturally, an attempt will always be made to get those requests to the repository as soon as possible rather than pass requests around in a circle ad-infinitum. Then the responses to those requests will also be flagged as directed to the original requester.
    • In case anyone is wondering: Yes, I invented all the rest of this technology just for this one purpose; to distribute DEMML content to areas that do not have full time internet connections. I have been stewing on this problem off and on for about two years. Only after reading about Epidemic Routing was I able to make the mental leap necessary. I'm telling you! Read lots of research papers!

Possible Uses

  • If this technology were pre-installed on all cell phones, then when a hurricane takes out all the cell phone towers, people will still be able to get messages through.
  • Remember, any one person may carry multiple devices.
    • It is especially easy to carry thumb drives which have lots of memory capacity.
      • This would enable any one person to be a "carrier" of lots of data even if their cell phone doesn't have a lot of memory itself.
  • Getting information into or out of regions where the government attempts to limit access to information.
    • No one would have to smuggle any single thing (such as a video tape or DVD) across a border. All they would have to do is get near enough to a receiving station to transmit that message.
    • By severely limiting how wide-spread a message is allowed to be, and/or by breaking up messages and spreading out the "packets" over different regions, it would be hard for someone to piece together "captured" information.
    • Granted, it could also be used by terrorists. But then we could also use the wireless chatter to track their activity.
  • It may eventually be possible to design devices specifically to make use of this technology.
    • There could be a separate devices that use special hardware to calculate all these probability and routing maps, store and transmit the data.
    • Chipsets could be made to easily and inexpensively add these capabilities to all kinds of devices.
    • Thumb-drives could be designed to mate head-to head and instantly exchange data.
      • They could include a chipset as described above.
      • Yes, they could be shaped like the money exchanging devices in William Shatner's TekWar series.
      • They would have lots of storage capacity but little computing capacity.

As I said at the beginning, I hope these ideas can help researchers find solutions to the problem of getting information into and out of areas where normal internet protocols do not suffice. Technologies such as this can be used to get educational content to areas of the globe that are not covered by the internet. Closer to the current headlines, these technologies could also have been used to get information out of Iran after the election or to get information about democracy into China.
I am posting these ideas now as an attempt, however futile, to provide published "prior art" and, hopefully, prevent large corporations from patenting this technology. I do this because I believe technologies such as these should belong to the world and should be used to help everyone rather than simply to enrich the few.

The content of this post is Copyright © 2009 by Grant Sheridan Robertson. However, anyone is free to use these ideas for research purposes.

Update -11/02/09:

I came across a post on ars technica about something called Open Data Kit being created by people at the University of Washington. They use it to collect data on cell phones and then transmit it to centralized servers. I thought this idea might be of use to them so I sent them a link to this post. We'll see if it goes anywhere. As I said in my e-mail to them, "I will likely never do anything with this idea so I just want someone to take it and run with it. If it eventually helps people then I will be happy."

1 comment:

  1. I am working on a similar system although not quite as smart as this. you can find it at teotwawki.steubentech.com

    ReplyDelete