
Tuesday, December 29, 2009

The coolest non-gadget things for thinking people.

OK, I'm not usually one for product recommendations on blogs, and I haven't actually used either of these products myself. I just found them on the internet after reading this LifeHacker post. But I am so excited just knowing that they exist. The products are:

  • A "magnetic" paint from a company called Magamagic™.
    • Actually it has iron filings in it so magnets can stick to it.
    • I'm just guessing but it probably blocks WiFi and other microwave frequencies as well.
  • Whiteboard paint from a company called ideapaint.
    • Although certain people I know might not like the idea of dry-erase drawings all over the house, I would definitely like to have some major surface area covered with this stuff.
    • It comes in more colors than just white. They are actually pretty nice, subtle colors.

I freaking LOVE whiteboards. I used whiteboards extensively when working out various aspects of DEMML™ and my Trinary Space concepts (not posted yet). And being able to stick a magnet to the wall anywhere? How cool is that?


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.

Monday, December 21, 2009

Non-Profit Health Insurance

So, why is it that we currently seem to think that insurance companies must be some kind of giant, for-profit, mega-corporation or run by the government? Until recently, a lot of companies were self-insured. Those insurance programs were usually run as a non-profit-earning segment of the company. Well, why not go one step further? Why can't someone start up a health-insurance company that is entirely not-for-profit? The only real difference would be that the company doesn't earn more than it spends and pay that difference out to investors. Customers would still pay premiums and co-pays. They would simply pay based on what they could afford rather than how badly the insurance company wanted to ream them for profit. The company would still earn a lot of its money by investing in other things, just like a regular insurance company and just like a non-profit endowment.

But where would such a company get the initial seed money to start operations? Just like any other company, they would get that money from investors. Except those investors would be foundations and individuals who were not expecting a cash return on their investment. As with other non-profits, this is usually called a donation.

But here is the twist: Rather than those "investors" simply giving money to the Non-Profit Insurance Company (NPIC), they would buy what I am going to call "Non-Profit Stock" (NPS). That stock purchase would not be directly tax deductible. Instead, the NPIC would calculate how much money they would have paid out in dividends based on the insurance premiums they received, the claims they paid out, and the average profit earned by other, for-profit, insurance companies with similar premium/claims ratios and volumes. Then, the NPIC would offer that "Non-Profit Dividend" (NPD) back to the "Non-Profit Investor" (NPI) with a choice. The "investor" could use the "dividend" to "purchase" more "stock" in the NPIC, thus ensuring that they would receive an even larger "dividend" in the future. Or the "investor" could simply give that "dividend" to the NPIC as a pure donation, thus taking the tax deduction on the cash value of the "dividend." So, it is a way to defer tax deductions into the future while ensuring that the potential deduction will grow over time.

I realize that this may require a change in the tax code but I think it would be worth it to promote the creation of many different, competing, NPICs. The tax code could even allow for other types of "Non-Profit Stock" in other types of non-profit companies that would normally earn a profit but choose not to in order to serve more of the public. Heck, the code could even allow for people to sell their "stock", thus allowing someone else to take that tax deduction in the future.


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.
However, anyone is welcome to use this idea to go out and reform health care using the principles of competition that the conservatives pretend to be so fond of.

Monday, November 23, 2009

Spiderman?

I am often anxious about everything that will be involved when I finally start getting some attention for DEMML and start actually implementing it. When I feel this way I am reminded of a classic Spiderman line and encourage myself with my own modification:

With a great idea comes great responsibility.


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.

Conflating Two Freedoms

Freedom of the people should always trump businesses' freedom to make money. Conflating the two inevitably leads to reversing them.


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.

Saturday, November 21, 2009

A word to describe me.

In my graphic design class one of my first assignments was to take a word that I felt described me and stylize it in some aesthetically pleasing way. Most of the other students quickly sat down, typed a word, tried different fonts, and then used some of the fancy tools in Adobe Illustrator to mess with the outline of the word and add drop shadows and such. I went home to think.

I really don't believe that I can be described in only one word. On top of that, most people see me one way based on my outward appearance but I think I am actually quite different and more nuanced than they usually think. Sure, most everyone thinks they are unique. But I am a pretty unassuming guy and I tend to get pigeon-holed quite a lot. And, if you have read any of the other posts on this blog, you will see that I am not your average Joe either. So I decided that people usually think I am predictable while I feel that I am actually pretty indecipherable. So I set out to design a graphic that made that point. Here is what I came up with:


Predictable - Indecipherable

(You can click on the picture for a full sized view.)
At first glance it looks as if the word is "Predictable." But, if you look closely, you can find the word "Indecipherable."

In critique, my teacher said that, although the design was simple, it was the only one that was actually "Designed" and that was what the class was really about. I thought that was pretty cool.

The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.

Thursday, November 5, 2009

HTMLzip

You know how some "books" are published as a folder full of HTML files with an index.html at the root of that set of folders? That makes for a heck of a lot of files that are really compressible, just sitting there on your hard drive uncompressed. This is necessary because browsers can't see into .ZIP files. Well, I say, why the heck not? The compression algorithms seem to be everywhere except in the browsers. We could zip up a folder full of HTML files (and their accompanying images, etc.), give it an extension like htmlzip, and then just point the browser to that file. The browser would open the index.html file by default, and there you go. An HTML book all in one file simply by zipping it up and changing the extension.
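As a rough sketch of what I mean (the .htmlzip extension and the behavior described in the comments are my proposal, not anything browsers actually support today):

    <!-- An ordinary link to the packaged book. A browser that understood
         .htmlzip would open the archive and display its index.html. -->
    <a href="http://example.com/books/field-guide.htmlzip">Read the field guide</a>

    <!-- Inside the archive, relative links and images would keep working
         exactly as they did in the uncompressed folder: -->
    <a href="chapter02/diagrams.html">Chapter 2 diagrams</a>
    <img src="images/figure-01.png" alt="Figure 1">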

I know there are programs that will convert a set of HTML files to a .chm help file and various other things. But these are often proprietary and platform specific. This would provide a completely open, cross-platform, and really convenient way to do the same thing.


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.
However, anyone is welcome to incorporate this idea into their browser. In fact, please do. Thanks.

Tuesday, November 3, 2009

reCAPTCHA Suggestions

I was reading about the reCAPTCHA™ project this morning. On their page about High Transcription Accuracy I saw that some words are simply digitized so poorly that even humans can't make them out. However, I noticed that they kind of give up there. If humans can't make out the whole word then the transcription just keeps some garbled nonsense word and leaves it at that. For instance, look at the last error in their example. I have included screen shots from their page for easier comparison:

Unsolved reCAPTCHA errors (screen shot from their page)
Original scanned word: [image of the scanned word]
"Solution": buub r

OK, now I want you to notice a few things:

  • The original word is clearly one word with a comma after it but the transcription shows a word and a letter separated by a space with no comma.
  • In the original word it is pretty darn clear that the first letter is a capital 'R' but that is not included in the transcribed word.
  • In the context of the rest of the document the word is obviously the name of a yacht.
  • It is pretty obvious that the second and third letters are not the same, yet in the transcription they are both 'u's.
  • The second letter in the original looks like an 'a'; it definitely does not have an ascender.
  • The third letter definitely does have an ascender and yet the letter in the transcribed word does not.
  • Depending on what the third letter is, the mark to its right may be part of that third letter or it may be a completely separate letter. One thing is for sure, that mark is at exactly the same angle as the third letter.
  • The last letter is definitely an 'r' followed by a comma.
  • The two marks before the 'r' are either a single letter with a left ascender, such as a 'b' or 'h', or they are two letters, most likely a lower-case 'L' followed by a letter that curves on the left and has no ascender.
  • The space between the last 'r' and that tall letter before it seems to be just a little too wide for the letter before it to be something like a 'b' or 'h'. It seems more likely that there are two letters before the 'r'. Something like 'le' or 'lc' perhaps.
  • When was the last time you saw a word that ended in 'br' or 'hr'?

As a matter of fact, after really looking at the word and seeing it in context, my best guess is that the word is 'Rattler,' (with the comma).

So I have a few suggestions as to how reCAPTCHA™ can improve their digitization:

  • For these hard to decipher words, they should include more of the context surrounding the target word. People often need context in order to correctly transcribe a word. Without the context we have far less to go on.
    • Include a picture of a much larger part of the original scan and highlight the part that is in question. This will give users a better feel for how certain letters show up in that scan. Different books use different fonts and different typesetters have different levels of consistency in how the type was placed during printing. Humans can intuitively use that information to help them figure out a word or a letter.
    • Include a few sentences of the text that has been transcribed with high confidence surrounding the target word. This will give humans a context within which to work. Only by looking at the whole sentence was I able to determine that the word in question was the name of a Yacht. That narrows down the list of possibilities considerably.
  • We need an XML standard for marking up these 'iffy' words so that we can at least capture and store what information we do have about them. Then, even if we can't figure out exactly what the word is, intelligent search engines can locate it as a possible match to something else.
    • For instance, someone may be searching for a yacht named 'Rattler.' No search engine would have matched to 'buub r.' However, if the search engine could know that the word in this document started with a capital 'R', ended with a lower-case 'r', had from five to seven letters in it, that two or three of the letters had ascenders, and it was likely the name of a yacht, then the engine could show it as a possible match. If the search engine were integrated with the reCAPTCHA™ engine, then once the user had determined that the word very likely was 'Rattler,' reCAPTCHA could update its information about the word and make it even easier to find and transcribe later.

I do not propose to devise that XML standard here but it should at least be able to do what I have mentioned here as well as list all the most likely choices as entered by reCAPTCHA™ users. As it stands all we get is a garbled aggregate of what people guessed based on the severely limited information they had to work with. Not that reCAPTCHA™ isn't genius and doing wonderful work. But in these situations there is definitely room for improvement.
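Just to illustrate the kind of thing I mean (every element name, attribute name, and value below is made up on the spot; this is not a proposed schema), markup for the word above might look something like:

    <uncertain-word id="doc1234-page87-word16">
      <raw-transcription votes="5">buub r</raw-transcription>
      <best-guess votes="1">Rattler,</best-guess>
      <constraint type="first-letter">R (capital)</constraint>
      <constraint type="last-letter">r (lower case, followed by a comma)</constraint>
      <constraint type="length" min="5" max="7"/>
      <constraint type="ascenders" min="2" max="3"/>
      <constraint type="context">likely the name of a yacht</constraint>
    </uncertain-word>

A search engine that understood markup like this could treat the document as a possible hit for 'Rattler' even though the stored transcription still reads 'buub r.'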

Again, as they say on Usenet, "I hope that helps."


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson. However, reCAPTCHA™ is welcome to use these ideas to improve the quality of their fine work.

Wednesday, October 21, 2009

Response to: What is the big deal about Meghan McCain's tits?

While wasting time on DIGG I noticed the above-linked blog post and decided to respond. I realize that this is not my usual fare of technology ideas, but I do have opinions about other things in this world.

Well, the fact of the matter is that we are all sexual beings in that sex and sexual attraction is part of our makeup. We evolved that way so that we would actually reproduce. Women evolved to not show outward signs of "estrus" so that men would be attracted to them all the time. This was beneficial at some time in our distant past. Now that our culture is evolving faster than our bodies, this whole thing has us rather confused. But there is no denying that - other than some exceptions and differences in degree - people are sexually attracted to the bodies and behaviors of others.

There is also no denying that some of us indulge this urge more than others. And that is perfectly natural too. The problem comes when those people who - through either nature or nurture - have less of a tendency toward this indulgence feel it is incumbent on them to force others to behave as they see fit. That is when calling someone a "slut" shifts from a statement of statistical fact (meaning that person is a statistical outlier in the indulgence bell curve) to being an insult (meaning that person is somehow bad for simply being who they are).

About all one can do about those people who would choose to control others is to tell them to shut the hell up and then ignore them as best as one can. If they attempt to use force - through the imposition of laws - then we have to work together to shut them up, either through publicly exposing their attempts to control or by other political means.

However, licking a book while making an expression that looks as if the book either tastes absolutely horrible or that one is being forced to do so at gunpoint doesn't really seem to help much. If you want to illustrate that you like the book then just hold it and smile. Look as if you actually enjoy reading the book. Trying - unsuccessfully, I might add - to look as if you would enjoy having sexual congress with the book simply makes you appear to be quite the statistical outlier indeed.


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.

Sunday, October 18, 2009

Information Processing in Nerve Cells, My Own Theory

Way back when I was a kid, I think this was when I was in Junior High or about 1972, my science teacher showed us a movie about how nerve impulses traveled up a nerve fiber. I remember that they took a live but "pithed" frog (they had destroyed the brain with a needle) and strapped its leg onto a board. They had dissected out one of the nerve fibers in the leg and connected an electrode to the nerve. That electrode was then connected to an oscilloscope (an electronic device used to look at the shape of electrical signals). I know that all sounds pretty gross but all they showed us in the movie was the leg strapped down with the electrode coming out. Anyway, they also had a device that could be made to touch the frog's foot with a specific amount of pressure. They proceeded to touch the foot and show the signal that was detected in the nerve fiber.

Now most people think of a nerve signal traveling up a nerve fiber as a singular event. They think only one pulse travels up the nerve for any one event sensed by the nerve endings. After all, all the biology textbooks ever describe is how one pulse travels up the fiber. However, when the device touched the frog's foot a series of pulses were detected by the oscilloscope. What's more, the pulses were not regular. They were separated by varying spaces like the lines on a bar code. This is very similar to what radio control hobbyists call Pulse Code Modulation, or PCM. In PCM the differences in the spacing between a set of pulses provide information to the device on the receiving end, such as the R/C airplane. In addition, the pattern of pulses detected in the frog's nerve was repeated every time the device touched it with the same pressure. However, when the device was set to touch the foot with a different pressure, a different pattern was detected, which also showed up exactly the same every time that particular amount of pressure was applied. This is again similar to PCM in that the same pattern always means the same thing to the receiving end and different patterns mean different things. Usually different patterns in PCM mean to set the flaps on an airplane to a different angle, or to turn the wheels of a car to a different degree.

Wednesday, October 7, 2009

Multi-Hyperlinks

OK, as always, I have no idea whether someone has already thought of this or patented the idea, but here is an idea for making one <a> tag refer to multiple different URLs and having those choices show up in a menu when right-clicking the link.

There are two different approaches to doing this:

  • Multiple additional attributes in the <a> tag itself.
    • Unfortunately, attribute names must be unique so something would have to create the new attributes and make sure that no names were replicated. This would require additional code in an HTML editor or extra work on the part of the web designer.
  • Inserting a sequence of additional elements within the content of the <a> tag.
    • Those elements would be empty content tags that only contain a href= attribute for the additional hyperlink.
    • It appears (after cursory experimentation) that empty <a> tags could be used.

Then all that would be needed is a plug-in for the browser to insert these hyperlinks into the context menu when a user right-clicked on the link. Eventually, this functionality could be incorporated into all browsers, eliminating the need for the plug-in. For those whose browser does not have this functionality, the primary href in the enclosing <a> tag will still work when left-clicked.
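Here is a rough sketch of what the markup might look like for each approach. The attribute names in the first example are pure invention on my part, and the second example relies on how browsers actually tolerate nested anchors (HTML validators will certainly complain), so treat both as illustrations rather than a spec:

    <!-- Approach 1: extra attributes in the <a> tag itself. -->
    <a href="http://example.com/article.html"
       alt-href-1="http://mirror.example.org/article.html"
       alt-href-2="http://archive.example.net/article.html">Read the article</a>

    <!-- Approach 2: empty <a> elements nested inside the main link, each
         carrying only an href for one alternate URL. A plug-in would list
         these in the context menu; browsers without the plug-in would just
         follow the outer link on a left-click. -->
    <a href="http://example.com/article.html">
      <a href="http://mirror.example.org/article.html"></a>
      <a href="http://archive.example.net/article.html"></a>
      Read the article
    </a>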

Test Link

The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.
However, the idea (if it does not already belong to someone else) is hereby declared to be in the public domain. Anyone can use this idea for any purpose.

Wednesday, September 30, 2009

Really?! Microsoft Update really needed to screw with my networking settings?

I have a desktop and a Tablet PC which I network together via a crossover cable. I use the NWLink IPX/SPX NetBIOS Compatible Transport Protocol and the NWLink NetBIOS protocol rather than the standard Microsoft NetBIOS because I find it to be far more reliable, and a little bit faster.

This morning I ran the Microsoft Update on my Tablet PC. When it was finished the network connection between my Tablet PC and desktop wouldn't work. A quick look revealed that Microsoft Update had disabled my NWLink protocols and installed Microsoft's NetBIOS protocols instead.

Are they that freaking determined that everyone on the planet use their products that they have to screw up my network connections just to increase the proportion of computers that are using NetBIOS? Really!?

And the bizarre thing is that the NWLink protocols were written by Microsoft too. The technology just wasn't invented by Microsoft so they gotta try to shut it down. Talk about poor sports.

Unfortunately, I am far too dependent on certain Windows compatible programs to jump ship over to Linux. And Apple is even worse than Microsoft when it comes to consumer lock-in. Not going there either.


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.

Tuesday, September 29, 2009

Acrobat Clippings

This is my idea for what I call Acrobat Clippings.

Currently, all Acrobat documents must be viewed in a separate window via the Acrobat browser plug-in. Why? Every other type of browser plug-in can display the specified content inline with other HTML content on the page. Why must all Acrobat content be kept segregated? I understand that most Acrobat documents are just that: entire documents. However, if the Acrobat browser plug-in could display documents in-line (on the same page and in the same window as other HTML content), then a wider array of applications opens up for the Acrobat format.

I propose that Adobe develop a means by which the Acrobat browser plug-in can display content in-line with other HTML content. In addition, I propose that they develop a means by which users can select segments of a page in an Acrobat document and quickly save just that selected area of the page as a separate Acrobat "Clipping."

Then, people could include just the parts of an Acrobat document that cannot easily be displayed as HTML directly in-line with their HTML content. Yes, I know it is possible to save a selected part of an Acrobat document as a graphic file by using the snapshot tool and pasting that into a graphics program. However, this greatly degrades the quality of the image and throws away all the additional features offered in Acrobat, such as linking or forms. Using this technology, web designers and others wishing to document complicated information could place diagrams on their web pages with all the functionality of the Acrobat format. Imagine diagrams with links, without all the complicated area mapping required in HTML.
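Purely as a sketch of where such a clipping would sit in a page, and assuming the plug-in were willing to render it in-line, the markup might be nothing fancier than an ordinary embedded object (the file names here are placeholders):

    <p>The wiring for the left channel is shown below; every component in the
       diagram links to its data sheet.</p>
    <object data="clippings/left-channel-wiring.pdf" type="application/pdf"
            width="480" height="320">
      <!-- Fallback for browsers or plug-ins that cannot render it in-line -->
      <a href="clippings/left-channel-wiring.pdf">Left-channel wiring diagram (PDF)</a>
    </object>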


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.
However, I give anyone permission to use this idea in their product, especially Adobe.

Monday, September 28, 2009

Save my spot!

Here is an idea for programs like Adobe Acrobat (both the reader and the full version). Allow users to set a "Current Page" bookmark. I know Acrobat will remember the last view of a document. But sometimes that is not where the user left off reading. They may have simply looked at another page for reference. Besides, that is saved on the computer rather than within the document. Therefore, if the user is synchronizing the document between multiple machines, as I often do, then when they pick up reading on the other machine they will be taken back to where they left off on that particular machine. This is not ideal.

All these programs need is a special bookmark, built with whatever technology they currently use for bookmarks, whose target is simply updated to the current page whenever the appropriate button is pressed or menu item is chosen.


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.

Sunday, September 27, 2009

Black Box Intermediaries

A recent posting on Slashdot mentions an article about a bank accidentally sending customer data to some unknown gmail account. The bank is suing to get that account disabled and to get the owner's personal information. Most of the people on Slashdot just seem to be moaning and complaining. So I posted the following idea:


Why can't the courts in these cases set up third-party intermediaries to receive the information that the plaintiffs are asking for (such as someone's personally-identifying information) and then have all communications go through that intermediary? This is just the same as e-mails from Craig's List users going through Craig's List instead of directly between the users. It could even be a system where no human ever sees the information. Instead it could be encrypted such that no one would ever be able to dig it out. Then the plaintiff could contact the individual and they could carry on a conversation and straighten things out, without the individual's identifying information ever being disclosed.

Perhaps what we need is a government sponsored but publicly run (and open-source developed) central system to provide this service. It would have to be open source so that anyone could check to make sure that the system didn't have any back doors.

Without a system like this, the technique used by this bank could become a powerful tool to do an end-run around privacy laws. If I want to find out the personal information about someone, or even shut down their e-mail accounts or all of their internet access, all I have to do is claim to have accidentally sent them private information about someone else. Heck, I could just make up bogus info and send it to the individual. Who would know, because that info would be kept sealed "for the privacy of the people in the list."


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.

Wednesday, September 23, 2009

Steady Misrepresentation

I was reading the January 2009 Scientific American, a special issue about Charles Darwin and evolution. On page 99 they have a quote from Darwin that I think is especially prescient:

"Great is the power of steady misrepresentation; but the history of science shows that fortunately this power does not long endure."

Charles Darwin

Sunday, September 20, 2009

Self Healing Hyperlinks

I'm an avid user of Microsoft OneNote 2007. I keep all my notes in it. I even wrote the outline and first draft for this post using it. I upgraded to the 2007 version specifically because it allowed the creation of hyperlinks between documents. Unfortunately those hyperlinks aren't worth a darn because if you move a page then all the links to anything on that page get broken, even links from within the same page. Many links don't even work correctly when you first make them. It is incredibly frustrating.

So, I have been stewing on a way to create a note-taking application based on HTML rather than Microsoft's proprietary format. I quickly realized that links created within this new application would also break as soon as the user moved a page in the collection. Sure, I could require the user to always and only use the application to move the pages then have the app update all the links to a page whenever it is moved. However, this would only work if the user made sure to use the application to move the pages and never forgot and simply moved them manually. It would also make moving pages pretty darn slow because it would have to search through every page in the system to find links to update. Therefore, I have been trying to think of a way to quickly find the new page location and update the links as necessary.

In what seems like a separate issue, I have noticed that academic papers often exist in multiple different locations all over the internet. Sometimes the file is named appropriately but oftentimes it is not. Sometimes there is good descriptive text surrounding the link to the file but oftentimes not. Sometimes the original file can still be found exactly where you first referenced it five years ago but usually not. This means that finding a current reference to an original academic paper for which you only have old citation information can be quite daunting. So I have been also trying to think of a way so that one could use a single link to refer to any one of the multiple identical copies of that document no matter where it was actually located on the internet and instantly retrieve that document, even if the original was no longer in place.

I had been thinking about using some kind of indexing system to enable one (or one's browser) to find these moved web pages. This morning, as I was waking up it finally hit me how to solve both of these problems and eliminate the vast majority of 404 errors at the same time. I call this system "Self Healing Hyperlinks."

The basis of the system is to insert additional information into the URL in a link so that either the target web server or the user's browser can find that target even if it has been moved. This additional information consists of domain and/or globally unique HTML element ID values which are included as attributes in the elements of the link target. The system also requires an indexing engine to be installed as a plug-in for the web-server software in order to index and look up these element IDs. When a broken link sends a browser to the target web site, that web server can look up the new location in its index rather than return a 404 error. One or more global indexing servers would also be set up to crawl the internet looking for documents that contain these special element IDs. Then, when a browser cannot find a target that was linked to using this additional information and the target web server did not return a replacement page, the browser can query the global link database and still find the document. The system does not require any additional scripting in the web pages or on the server. The web server and browser plug-ins would do all the work.
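To make the idea a little more concrete, a link carrying this extra information might look something like the sketch below. The attribute names are pure invention on my part; the point is only that the link records a globally unique ID for its target in addition to the ordinary URL, and that the same ID appears on the target element itself:

    <!-- The ordinary href still works as long as the page has not moved. -->
    <a href="http://example.com/notes/project-alpha.html#overview"
       target-doc-id="example.com:9f2c7a1e-note-0042"
       target-element-id="overview-uid-0007">Project Alpha overview</a>

    <!-- On the target page, the same unique ID is an attribute of the element
         being linked to, so an indexer can find it wherever the page ends up. -->
    <h2 id="overview" unique-id="overview-uid-0007">Overview</h2>

If the href comes back as a 404, the target web server's index plug-in (or, failing that, a global index) looks up those IDs and answers with the page's new location.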

Wednesday, September 16, 2009

My First Flash Project

Here is the first Adobe Flash animation that I made for my AR399 class (Animation and Gaming). I have attempted to recreate the experience I had when I was a child and first learned that a "famous artist" could possibly make a mistake. This painting is what made art seem accessible to me. I have been a big Bruegel fan ever since.

Unfortunately, I had a hell of a time getting Blogger to display my flash properly. So I have posted it up on the tiny bit of personal web space I have with my ISP. Just click the picture to go to the actual animation.

Bruegel's Peasant Wedding

The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.
Naturally, this only includes the Flash animation. The rights information for the original image that I used is at http://commons.wikimedia.org/wiki/File:Pieter_Bruegel_d._%C3%84._011b.jpg

Saturday, August 22, 2009

Oops! I Didn't Know It Meant That!

I am trying out the new Google Reader "Send To" feature that allows me to easily blog about articles I read via Google Reader. Here is the comment I posted about the following article:


Oops! I Didn't Know It Meant That!:

"In my previous post, I wrote about how “advanced” vocabulary words that can be perfect when used in writing are often awkward or pretentious if used in speech. "Advanced” vocabulary words always take up residence in educated minds. They're wonderful, since they bring liveliness to writing by breaking up sentences that otherwise contain mostly ordinary words, and, if used right, they..."


I am currently studying for the GRE. I consider myself to be well read, but many of the GRE vocabulary words just seem completely unnecessary to me. When I read them -or worse- hear them in speech, I always find myself thinking, "Ah, someone else who had to study those damned vocabulary words." Now that they know them they seem compelled to use them, as if to continuously prove that they did, in fact, pass the GRE.

Personally, I think the GRE is ruining academic writing. I have never believed in an ivory tower. All writing should be as accessible as is reasonably possible considering the topic at hand. Peppering our writing with what I alternatively call "GRE-speak" or "Graduatese" does not serve to "spice it up." It only serves to add another layer of separation between academics and the general public. In an age when many in the public (and in pundit land) are inclined to dismiss science and academics, spicing up our writing and speech with incomprehensible words - especially ones that can easily be replaced with common words - only adds to the divisiveness.

I, for one, intend to memorize those infernal vocabulary words, take the test, and then promptly forget them. When I communicate, I prefer to communicate with as many people as possible. Not just those who have been forced to waste their time memorizing a bunch of extraneous words.


The contents of this post are Copyright © 2009 by Grant Sheridan Robertson.

Wednesday, August 5, 2009

Screw Web 3.0: Whatever Happened to Web 0.0?

I recently read an article posted on ARS Technica and mentioned on Slashdot about replacing MS Word with a wiki for internal business intelligence. Rather than a cogent discussion of how this could be accomplished, I found lots of naysaying and complaining about how that was old news. However, just because some problem is old news is no reason to ignore it. So I posted the following reply on Slashdot.

I agree that Word and other word processors are not as useful as they used to be in an age where many documents are not necessarily printed out. However, that does not mean that nothing will ever be printed again. I also agree that a wiki is a great way to store business intelligence; however, MediaWiki does not have a very easy-to-use editor. Other wiki servers offer much better editors. I also agree with many posters on Slashdot and ARS that, without the ability to easily embed things like spreadsheets into wiki pages, we will still need word processors to generate the documents the way we really want them to look and then post those as .PDF files.

But why is everyone just sitting around whining about how his idea won't work instead of getting together and figuring out a way to make it work?

Thursday, July 30, 2009

Two kinds of people?

There are two kinds of people in the world. Those who think there are two kinds of people in the world, and those who know better.


The content of this post is Copyright © 2009 by Grant Sheridan Robertson.

Tuesday, July 28, 2009

Box? What box?

Think as if the box were never there.

The content of this post is Copyright © 2009 by Grant Sheridan Robertson.

Sunday, July 26, 2009

Every Man For Himself?

I am often amazed by the simplistic thinking that goes into what some apparently believe to be "deep thoughts." In the signature line of this reply on Slashdot I found the following:

In free countries, how did the powerful become powerful? Have they done something you couldn't do?

To which I reply: Often they have done something (or a series of things) that most people wouldn't do, and that many believe one shouldn't do. It is rarely ever a simple matter of the ones with the power having been the ones who were merely more capable. Free countries still have social norms, standard ethical codes, and even laws that a few choose to ignore. That those few who choose to ignore the norms, codes, and laws sometimes gain power is not an excuse for the rest of us to ignore them as well. An "every man for himself" culture often sounds great until that "every man" happens to be someone who is willing or able to take from you to get what they want.


Update: 7/28/09 - I had a nice exchange of messages with the user of the signature and, as it turns out, he/she was attempting to spur others to think how they could work to become more powerful. He/she had no intention of encouraging others to cheat in order to gain power. So we discussed possible alternatives and he/she settled on "In free countries, how did the powerful become powerful? Have they done something you couldn't do (honorably)?" It may not be perfect but it does prevent him/her from appearing to condone cheating to get ahead. Kudos to him/her for encouraging others to improve their lives.


The content of this post is Copyright © 2009 by Grant Sheridan Robertson.

Saturday, July 25, 2009

Intelligent Epidemic Routing, Preliminary Notes

Abstract:
Previous work by two Duke professors proposes to transmit data between intermittently-connected, mobile devices by transmitting said data to all devices encountered, and also passing data along from previously encountered devices, such that the data will eventually get to its desired destination. This is called an "epidemic routing protocol" in the paper. Retransmission of data is limited to a specified number of "hops" in order to avoid overloading the "system" with infinite recopying of said data.

This post is an embellishment of that idea, using a rather convoluted means of tracking when and where other known devices may likely be encountered in the future. This allows the data to only be retransmitted to devices that have a high likelihood of successfully shuttling the data to its destination … eventually. This increases the "bandwidth" of the system at the cost of additional memory and compute time used on the individual mobile devices. In this system, content is transmitted as a unit, rather than being packetized, and it is transmitted in a store-and-forward manner, similar to the old Fidonet protocol. Therefore, each device carries a cache of data for possible retransmission to other encountered devices and the data is distributed according to algorithms that determine the best selection of other devices to retransmit said data to.

The proposed system, called an "intelligent epidemic routing protocol," is in no way meant to replace the internet. Nor is it expected to be fast or efficient. But it is a possible solution for situations where the internet may be unavailable (even through mesh networks), very intermittent, filtered (as in China, North Korea, or a future Texas), or untrustworthy (as in … um … now). It is accepted that it may take quite some time for requests to make their way through the system and then for the content to make its way back to the requester (on the order of weeks). The system works best for content that is not time sensitive and may be requested by more than one device but not necessarily all devices, such as for transmitting educational content that could likely be used by more than one person in the underserved region.


Ever since I invented DEMML™, I have been stewing on how to actually transmit and/or transfer the content by means other than a direct internet connection. While it would be easy to simply copy files into a directory, I wanted a means whereby someone could simply plug a thumb-drive into a computer and the computer would transfer the appropriate information. I recently began searching for "store and forward" technologies on the internet and came across the following article:

Amin Vahdat and David Becker, Epidemic Routing for Partially-Connected Ad Hoc Networks, Duke Technical Report CS-2000-06, Jul. 2000, available from http://issg.cs.duke.edu/epidemic/epidemic.pdf

In this article, the authors discuss a method of transferring messages across disconnected systems of devices by simply copying the messages to every device encountered until the message finally reaches its destination. Naturally, the method is a little more sophisticated than that but you get the idea. The article also alludes to some additional measures that could be taken to make the system more efficient:

For example, under certain circumstances there may be locality to the movement patterns of mobile nodes. In this case, it would be worthwhile to exchange a list of the last n nodes encountered by a host during anti-entropy. This information can be utilized to once again identify appropriate carriers under the principal that if a particular host has been seen recently, it will be seen again in the near future. (page 12 of the .PDF)

Unfortunately (or perhaps fortunately), I was not satisfied with the level of completeness in that statement. In fact, before even reading it, several ideas had already been simmering in my mind.

Over the course of several days I have attempted to organize those ideas into an outline. I had hoped to be able to write them up into a slightly more readable form but I just don't have the time or energy right now as I have home projects to do and a GRE to study for. (A chore which I am sure I will have more to say about later.) Therefore, I have decided to simply post the raw notes up as they sit right now. The following is nothing more than an outline for an idea. However, it is a rather thorough outline. Printed, it takes up about 12 pages. For those people who are interested in such esoteric things, I hope this helps. I hope my ideas can help spur more ideas for you or help in your research.

Thursday, July 16, 2009

Prior Art Combinator: A tool to preemptively invalidate troll patents.

As I understand it, in order to invalidate a claim in a patent, all the elements of the claim must appear together in the same published article. From what I have read, blog posts count as well. I have not been able to determine if all the concepts listed together in a book would count. I imagine that the Patent and Trademark Office (PTO) would require that the book specifically mention that all the elements listed in the book actually belong to the same invention.

Also, as I understand it, a patent claim may still be considered valid if it lists a limitation that is not listed in the prior art. What I can't figure out so far is: if the prior art lists more elements than the patent claim, does that invalidate the claim, or can the very act of not listing elements be considered a limitation within the claim? In effect, that would allow patent trolls to simply patent anything and everything they find in prior art simply by removing one element from the prior art and calling that their claim. This does not seem reasonable, but then very little in patent law these days does.

Finally, I have not seen anything that says the prior art publication had to be written by a human.

So I have an invention to prevent the patenting of inventions. While that may sound onerous, I suspect that it will actually alleviate a lot of the trouble we have been having with patent trolls and return patents to the realm of real innovation.

I propose that we create a database of all possible discrete elements and limitations that have appeared in all existing patents and any other documentation anywhere. Heck, people could even add any additional elements or limitations as they see fit. Then use a simple computer program to generate and "publish" articles with every possible combination of element and limitation. I know, we are talking about trillions and trillions of articles. The program could never finish.

Therefore, I also propose the following additional, optional features:

  • Users of the system could vote on which would be the most important elements and limitations to make sure are covered in our "articles."
  • Users could add tags to elements indicating which fields they are most likely to be used in. This would reduce the number of "articles" about "Umbrellas with USB connectors sewn into the seats of their pants for control and monitoring of garage doors."
  • Users could then also vote on which tags are the most important to focus on.

I realize that this would likely require more web document storage space than is currently indexed by Google. Therefore, I also propose the following additional options for storing and "publishing" the "articles":

  • The web pages could actually be automatically generated by PHP or JSP scripts using variables to represent and insert the particular combination of elements and limitations specified for that "article." Each element and limitation in the database would necessarily have a unique code number or index key. The body of the web page could then simply list the codes as parameters and call the function to build the page. The script would then simply insert the appropriate text from the database in the appropriate location in the web page and send it out. Presto, we have a web page listing a specific set of elements and limitations. As far as the browser is concerned, the web page is static and has been "published."
  • It would also be possible to just use server-side includes and generate static web pages with various combinations of elements and limitations included. Naturally, this would require that all the elements and limitations be in separate .shtml files on the web server.
  • Here is an even more efficient, if ethereal, proposition: Simply create one web page with some PHP or JSP in it to generate a page based on the codes given as parameters, just as before. However, the codes for the elements and limitations would actually be part of the URL and sent to the server in an HTTP GET request (see the sketch after this list). The script then takes the codes specified in the URL and automatically generates whatever combination of elements and limitations are specified in the URL. We get an actual web page sent back even though one never really existed before. Simply by posting that one web page and putting the elements and limitations in the database, we have statistically published all possible combinations at once. Anyone who wants to view an article with a particular combination of elements and limitations simply constructs the proper URL and they can see that the page is "there." It's like quantum computing without all those pesky sub-atomic particles. Like Schrodinger's Cat, every combination would "exist" simultaneously but only by observing it does it then become, well, observable. For prior art dating purposes, I suppose we would also have to list when each element and limitation was added to the database. Only if all the elements predate the offending patent would the "Schrodinger's Article" in question apply. Now, I don't know if the PTO would go for this reasoning but it would be worth a try. And it would certainly be fun.
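As a sketch of that "Schrodinger's Article" version (the domain name, parameter names, and code numbers below are all placeholders I made up), a request and the page it conjures into existence might look like:

    <!-- Request: the element and limitation codes live entirely in the URL. -->
    http://priorart.example.org/article.php?elements=10384,22751,90112&limitations=4407

    <!-- Response: the script looks up each code in the database and emits an
         ordinary-looking "published" page listing that combination. -->
    <html>
      <head><title>Prior Art Combination 10384-22751-90112-4407</title></head>
      <body>
        <p>The following elements and limitations are disclosed in combination
           as a single invention:</p>
        <ul>
          <li>Element 10384: ...</li>
          <li>Element 22751: ...</li>
          <li>Element 90112: ...</li>
          <li>Limitation 4407: ...</li>
        </ul>
        <p>Date first available: the date each element was added to the database.</p>
      </body>
    </html>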

Finally, assuming that the PTO would not fall for "Schrodinger's Article" idea, and that we might not be able to store all of these different combinations of elements on one server, we could spread the love around by providing a web site where users can generate source code for "articles" using any of the dynamic or static methods described above, then allow users to copy and paste the HTML code for that particular combination onto a page on their own web site or blog. People all over the world could set aside some spare space on their web server to "publish" a bunch of these articles. The database could remain on one central server (with read-only access allowed by all) so that volunteers would only need to devote the storage space for the codes rather than the entire elements. Volunteers who cannot use PHP on their site can download a static version of the article.


Unfortunately, I do not have the web-design or coding skills necessary to create the proposed database or web site. I don't know if anyone else has thought of this before. Hopefully, it hasn't been patented. Therefore, as far as I am legally able and as long as the idea doesn't already "belong" to someone else, I am hereby authorizing anyone to use this idea to build such a site. The text of the article is released under the following Creative Commons license:
Prior Art Combinator: A tool to preemptively invalidate troll patents. by Grant Sheridan Robertson is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
Permissions beyond the scope of this license may be available at www.ideationizing.com.

Wednesday, July 15, 2009

Got GOT? : Using a multi-faceted, multi-leveled, multi-graph to analyze the interconnectedness of all things in a gigantic Graph-O'-Things

In this post I propose a simple - if tedious - means of calculating the interconnectedness between any two nodes in a multi-connected, multi-leveled, multi-faceted graph. The strength of a relationship or the amount of connectedness between two nodes can be calculated based on how many other - closely related - things each of the two has that are also related to each other. The calculation method is essentially nothing more than the same method used in calculating the total resistance in a network of simple electronic resistors. However, the vast size of the graph as well as the multi-leveled, multi-faceted nature of it both increase the number of calculations and the complexity of the task. At the same time, the clustered nature of most multi-graphs which represent the real world provides a means to simplify the calculations and perhaps take a few shortcuts.
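To make the resistor analogy concrete with a toy example (the exact mapping can be tuned; this is just the basic arithmetic): if each shared, closely related neighbor is treated as a unit resistor joining the two nodes, then three independent shared neighbors act like three unit resistors in parallel, giving an equivalent resistance of 1 / (1/1 + 1/1 + 1/1) = 1/3, while a connection that exists only through a chain of two intermediaries acts like two unit resistors in series, giving 1 + 1 = 2. The lower the equivalent resistance, the stronger the connectedness between the two nodes.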

I will expound upon this idea by elucidating several possible real-world uses for this technique, starting with the purpose for which it was invented: Organizing the graph of interconnectedness between all the possible topics in the vast tree of all educational content that will eventually make up the DEMML™ content library. Then I will explain how additional levels (or 'branes) can be added to this gigantic multi-graph to facilitate the analysis and visualization of the interconnectedness between any and all research papers ever published as well as the authors who wrote them and the institutions in which they worked at the time.

Saturday, July 4, 2009

Picture Tests 2

I am doing some tests to determine why Blogger sometimes does not show pictures. I think it has to do with both the type of link and how the URL was generated. If you don't care about such things then you can just ignore this post. In this second experiment I am making sure to use a different image file for each test. In my previous test I just used the same image file for all the different kinds of links and I couldn't get it to break. I think this is because, once the image file had been downloaded for a good link, the browser may have just used the good image for all the other links as well. I don't know enough of the details of how browsers choose to pull files out of their cache to say for sure. So, this time, I am using separate images to avoid that possibility.


NO a tag, blogspot.com URL

Ebbinghaus Chart 1

YES a tag, blogspot.com URL

This is the standard form of link that you get when you insert a picture the normal way that Blogger expects you to: Compose tab, Insert Picture button, Upload a file from your computer. Blogger sticks that file in a special, unlisted, album in your Picassa account and then links to it there. Interestingly, if you look at the HTML of the link below, you will see that the URL of the image comes from the blogspot.com domain instead of picassa or even picasaweb.google.com. In contrast, if you copy the URL from the Picassa web site (as I did for the next two tests) you get a URL from ggpht.com. I can only guess that this is a domain used by Google for serving up these files via some kind of server redirection or virtual server technology which I won't get into here.

Ebbinghaus Curve 1

(If you already have posts where the images sometimes disappear and those images are on Picassa, then you can use the following trick:

  1. Download those images from Picassa to your computer.
  2. Delete those images from Picassa.
  3. Make sure the files on your computer have meaningful names.
  4. Create a new, bogus, post in Blogger. (You will not be publishing this post.)
  5. Insert all the images into the bogus post by going to the Compose tab, clicking on the Add Image button, then uploading the image. (This will then stick the image back into your Picassa album reserved for this blog but Blogger will then know the special URL that goes through the blogspot.com domain instead of that crazy ggpht.com domain.)
    • Blogger creates an <a> tag with some weird code in it and buries an <img> tag within that. You will want to use this entire thing in the next step.
  6. Copy those <a> tags with the <img> tag inside them and paste them into your original post in the appropriate locations.
    • I paste them above the original <img> tag, then check to make sure I have the same height and width specified before deleting the original <img> tag.)


NO a tag, ggpht.com URL

This image link is likely to be broken intermittently. For some reason Blogger does not always like accessing images from the ggpht.com domain. To ensure reliability, do not use this technique.

Ebbinghaus Chart 2

YES a tag, ggpht.com URL

This image link is likely to be broken intermittently. For some reason Blogger does not always like accessing images from the ggpht.com domain. To ensure reliability, do not use this technique.

Ebbinghaus vs. Calculated

Using the Embed link provided by Picassa:

From Ideationizing

Using a simple img tag with ggpht.com URL:

This image is stored on Picassa but in the album called Blogger rather than the album named for my blog. This seems to make a difference.


Using a simple img tag from some other domain:

This image is from my web site about the XML standard I am working on (www.demml.org). I am using it because I know I have the right to link to it.


Link created by referring to web based image in Compose tab insert image window, using URL from ggpht.com domain:

I used the ggpht.com URL obtained by right-clicking on the image in Picassa. Then I pasted that URL into the field where Blogger's Compose tab ; insert image dialog lets you paste a URL. If you look at the HTML below you will see that Blogger did not modify that URL in any way. The question is whether this image will show up later.

Update: July 9, 2009: This link did eventually break. This image is stored in my album in Picassa that is named for my blog. I have found that if I right click on any images that are in this album and use the resulting URL that the link eventually breaks. However, if I use an image that is in any of my other albums then the link doesn't break.


Notes:

  • It seems as if there are only two major ways to get images in a Blogger blog such that they will not eventually disappear.
    • Upload the file from your computer using the Compose tab's Add Image button. This will put the image in your Picassa album named for your blog and will use the blogspot.com domain in the link.
    • Create your own <img> tag in the Blogger post-editor's Edit HTML tab. However, be sure to use an image that is on Picassa but not stored in the album named for your blog. I have found that using the Picassa album called Blogger seems to work reliably. In this case, you can simply find the image in Picassa, then right-click on it to get its URL.
  • I have found that images which are stored on other web sites do not show up when using Internet Explorer.

You are free to copy this post and use it to do experiments on your own blog. Just make sure to change the links to images you control, rather than leaving the links pointing to my images. Please also give full attribution and link back to my original page.
Thank you.

Friday, July 3, 2009

A Word About Copyright and Plagiarism

Unless otherwise specified in an individual post, I, Grant Sheridan Robertson, retain the copyright for all of my posts. As many of these posts are papers I have written in college, some may be tempted to take these posts and turn them in as their own work. I warn you not to do it. If you can find this web-site then so can your teacher. In addition, I have a rather unique writing style and your teacher will probably recognize that my style does not match your own personal style. You are welcome to quote these papers, although none of them should be considered "academic" or "peer reviewed" because they have not been published in any official academic journals. However, you should feel free to use my list of references to add to your own research. That is always OK - and even encouraged - in academia.


This post is Copyright © 2009 by Grant Sheridan Robertson.

Hard Drive Troubles

I am making this post primarily to warn others about a particular model of hard disk drive. I realize that not many people will actually read this post but, hopefully, someone will find it when searching for information about this drive.

The drive in question is the Western Digital Raptor WD360ADFD 36GB 10000 RPM 16MB Cache SATA 1.5Gb/s 3.5" Hard Drive - OEM. I purchased two of these drives back in January of 2007. Both have now bitten the dust. The first died this April. The second (my C: drive) just died today. Western Digital no longer makes this drive but someone may still be selling them used or from back stock. DO NOT BUY THIS DRIVE. Granted, I am working from a statistical sample of N=2. But when 100% of the drives purchased go bad within months of each other, that is a really bad sign.

If you do have this drive then be sure to back up your data almost every day and perhaps think about replacing it soon. I had no warning that the drive was going bad. It worked fine last night and then this morning all it did was click when I booted my computer.

Fortunately, Western Digital's tech support is very nice and did not hassle me at all. When I told them the sound it was making the young man simply said, "Wow" and started processing my RMA.

When I RMA'd the last drive they sent me a larger but slower drive. I did not mind. I expect that they will do the same and now I am glad because I don't want another of these drives in my computer again. When I purchased these drives I also purchased a similar model but in a 74 GB capacity. Now I am worried that it will also go bad on me. That one would be more trouble because it is the drive where I keep all my data. I guess I will have to start backing up a lot more often.

Additional Keywords: problem croaked issues

This post is Copyright © 2009 by Grant Sheridan Robertson.

Thursday, July 2, 2009

Basics of Blogger Template Structure

After digging through the Blogger help files I finally figured out what I think are the basics of the structure of a Blogger template. A Blogger template basically consists of regular HTML with special elements which the server will replace with content from the database. I have summarized the basic structure as a nested outline to better illustrate what fits where. Each level of nesting represents an element that must be put into the template as a unit, with comments about each element indented underneath it.

 

HTML
    Any amount of HTML code of whatever type you like can go here, before the first <b:section> element.

<b:section …>   (one or more)

    <b:widget …>   (one or more)

        <b:includable id='main'>   (required, only one)

            Any mix of HTML and any of the following:

            <data:.../>
                This is where the magic happens. The <data> elements are replaced with actual content from the Blogger database.

            <b:loop var='identifier' values='set-of-data'>
                [repeated content goes here]
            </b:loop>

            <b:if cond='condition'>
                [content to display if condition is true]
            <b:else/>
                [content to display if condition is false]
            </b:if>

            <b:include name='IdOfExistingIncludable' data='i'/>
                name attribute must match the id attribute of one of the subsequent includables in this widget.
                Value of data attribute is passed like a function parameter to the included-includable and is inserted anywhere the value of that included-includable's var attribute appears.

        </b:includable>
            All the real work of bringing in the data from the Blogger database and arranging it is done inside the <b:includable> element.

        <b:includable id='post' var='p'>
            Same content allowed as in main includable.
        </b:includable>
            var 'p' is replaced with value passed as data when this includable is included.

    </b:widget>

</b:section>
    There can not be any HTML inside a <b:section> or <b:widget> element unless it is inside a <b:includable> element.

More HTML and/or sections.

Notes:

  • You can include as much more HTML and/or as many more <b:section> elements (as defined above) as necessary to complete the design of your template.
  • The <b:section> elements can be nested within HTML <div> tags or tables or whatever you prefer to use in order to position the <b:section> elements the way you want on the page.

Naturally, this post is not an attempt to explain everything about how Blogger templates work. However, I have yet to see any other description that lays things out graphically so that it is easy to see the whole thing at a glance. I hope this will be helpful for people who are having trouble putting all the pieces together.
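To make the nesting more concrete, here is a minimal sketch of what a skeleton template might look like. The ids, the widget type, and the <data:…/> tags are only illustrative examples, and a real template also includes things this sketch leaves out (such as the XML namespace declarations and the <b:skin> section), so treat it purely as an illustration of how the pieces fit together:

    <html>
      <head>
        <title><data:blog.pageTitle/></title>
      </head>
      <body>
        <!-- Any plain HTML can go here, before the first section. -->
        <b:section id='main-section'>
          <b:widget id='Blog1' type='Blog'>
            <b:includable id='main'>
              <!-- Loop over the posts the server supplies for this page. -->
              <b:loop var='post' values='data:posts'>
                <b:include name='post' data='post'/>
              </b:loop>
            </b:includable>
            <b:includable id='post' var='p'>
              <!-- 'p' stands in for whatever was passed in as data. -->
              <h2><data:p.title/></h2>
              <data:p.body/>
            </b:includable>
          </b:widget>
        </b:section>
        <!-- More plain HTML and/or more sections can follow here. -->
      </body>
    </html>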


This post is Copyright © 2009 by Grant Sheridan Robertson.

Grantism - 1

Sometimes I think of or just say (without thinking) weird but oddly profound things. Often they seem appropriate for a T-shirt or bumper sticker. I will just call them "Grantisms" for lack of any better term. Here is one I found while cleaning off my desk:

Some people think bein' smart's stupid, but they're dumb.

This post is Copyright © 2009 by Grant Sheridan Robertson.

Way back last September, I saw a great piece on Nightline about an isolated Amazonian tribe that was suffering because of ranchers destroying the tribe's lands. (Jungle Journey: Living With an Isolated Amazon Tribe: Exclusive Access to a Remote Amazon Village Where No Reporter Has Been Before By DAN HARRIS Sept. 22, 2008). After watching the story, I came up with an idea that might help the tribe. I quickly wrote it down on a piece of paper and just as quickly lost it on my desk under a pile of other notes and magazines. So, I was cleaning off my desk today (Yes, it has been over a year since I last undertook this arduous task.), found the notes, and finally posted my comment. Here is what I said:
I realize that this comment is a little late in coming. I just recently found my notes about this article while trying to dig out from under the pile of paper on my desk. Some members of the tribe should put together a presentation (à la An Inconvenient Truth) and go on the lecture circuit, Al Gore style. Who better to tell the world about their way of life, their philosophy, and what is happening to them than actual members of the tribe? They could use the money they earn from lecture fees and all the extra donations they receive to fight the ranchers and the corrupt governments that let the ranchers ruin their land. Who knows, they and their philosophy of "Take only what you need and share it with others" may even help the rest of us save our world too.

This post is Copyright © 2009 by Grant Sheridan Robertson.

Monday, June 29, 2009

Why Standards Fail: An Open Letter to OAG and Other Standards Committees

I wrote this letter in response to an article I had read in some computer magazine decrying the failure of some standard which had been championed by the author. I actually don't recall which magazine or which standard. I don't think the letter was ever published. As you can see, I have been quite interested in successful standards design for quite some time.



Why Standards Fail
An Open Letter to OAG and Other Standards Committees.
By Grant Robertson
(written sometime between 1993 and 1996)

Why is it that, often, even good standards fail while, at the same time, obviously inferior ones thrive? Perhaps it is because those creating the standards forget three very important factors for success in the world today: Marketing, marketing, and, of course, marketing. What many don't realize is that a standard is a product that must be sold like anything else. It must be sold to the vendors who will implement it and it must be sold to the customers who will pay for it. What good does it do to make the world's biggest mouse trap if there are no mice that big? And how do you expect to sell the world's best snargthik traps if no one would know what a snargthik was even if they saw one?

It is the same with standards. You could create a huge, voluminous standard. But if it has to be implemented on an all-or-nothing basis then most vendors won't touch it, or will only implement random parts of it, and most customers wouldn't want to pay for it if they did. Why spend a fortune on software that adheres to some new standard if it's going to fall flat because no other vendors adopted it? And why pay any money at all for software that does something no one has heard of or thinks they'll ever need, just because it does so in a standardized way?

Sunday, June 28, 2009

Importing Microsoft Word Documents Into Blogger

I have a lot of old papers that I decided to post here on my blog. They are all Word 2003 documents and many of them have quite a bit of formatting which I did not want to replicate by hand in HTML. So, I have tried a bunch of weird tricks and figured out a system for importing Word documents, preserving the formatting in the post while avoiding messing up the CSS styles in the rest of the blog page. This is a relatively convoluted procedure. However, once you try it a couple of times it will work relatively quickly, at least compared to reformatting a 20 page research paper by hand.

One of the steps in this procedure is made a little easier if you have Adobe DreamWeaver. I use CS3 but I don’t know for sure which older versions have the features you will need. It is not absolutely necessary and I will explain a workaround at the appropriate place. Also, I use Microsoft Windows XP. Part of this procedure depends on the behavior of the Windows clipboard. I cannot guarantee that it will work in Mac OS or Linux. However, some of the CSS editing tricks will work for anyone.

Basic Procedure

  1. Copy and paste the document from Word into the Blogger editor's Compose tab.
  2. Isolate the imported Word document by surrounding its HTML code with a <span class="UniqueName"> and </span> set of tags in the Blogger editor’s HTML tab.
  3. Fix the Word CSS stylesheet by inserting "span.UniqueName " before all of the CSS selectors (without the quotes but with the trailing space), thereby creating descendant selectors that will only apply within the imported Word document and will not affect the formatting of the rest of your blog (see the sketch after this list).
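To make the basic procedure more concrete, here is a rough sketch of what the top of the post's HTML ends up looking like after Steps 2 and 3. "UniqueName" stands for whatever unique class name you chose, and the two style rules are only stand-ins for whatever styles Word actually generates for your document:

    <span class="UniqueName">
      <style>
        span.UniqueName p.MsoNormal { margin: 0in; font-size: 12.0pt; }
        span.UniqueName h1 { font-size: 16.0pt; }
      </style>
      <p class="MsoNormal">The text imported from Word goes here...</p>
    </span>

Everything between the opening and closing <span> tags keeps Word's formatting, while everything outside of it keeps your blog's normal styles.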

Detailed Procedure

  1. Open the Word document.
  2. Open the Blogger editor to the HTML tab.
  3. If you want to type some introductory text, explaining a bit about what the paper is about or why you wrote it, then do it in the Blogger editor now. Surround each paragraph with <p></p> tags.
    • If I do this step then I also put a horizontal rule <hr/> under my initial comments and before the actual paper or Word document I intend to import.
  4. Prepare the Word Document:
    • Often academic papers will have been formatted with double line spacing. Usually this does not read well online so you will want to remove it. If the spacing is set in the styles then you should edit the styles rather than simply selecting all of the text and setting it to single line spacing. The latter results in Word placing a style code in each and every major tag in the document.
    • Text boxes get exported as pictures, so if there are any of these in your document, you need to pull the text out of the text box and place it where it will be appropriate within the main body of the document.
    • Remember, as long as you do not save the modified Word document, these changes will not hurt anything.
  5. Select all of the text in the Word document that you want in your blog post and copy it.
  6. Go to the Blogger post editor and switch to the Compose tab.
  7. Place the cursor under any introductory text you may have entered in Step 3 and paste the Word document there.
    • When you do this paste, either Word or the Windows Clipboard will automatically convert the Word formatting to CSS styles and assign HTML element class names to each of the paragraphs, which will be enclosed in <p> tags.
    • If you now switch to the Blogger editor’s HTML tab you will be able to see quite a lot of extraneous HTML elements. Some we will keep and some we will either get rid of or ignore. Part of this mess is the CSS stylesheet for this particular document. You will notice that it is now contained in what will be the body of your final blog web page rather than within the <head> tag as is normally the case. This is apparently OK. It is also what will allow your post to keep the same formatting as the original document.
    • If you were to publish your post at this point it is entirely possible that some of the CSS styles listed in Word’s CSS stylesheet would conflict with the CSS stylesheet in your Blogger template. The next steps will rectify that situation.
  8. Clean up extraneous HTML tags.
    • If you have DreamWeaver you can use it to modify the HTML that resulted from the previous paste.
      1. Copy all the HTML from the Blogger editor HTML tab and paste it into a new DreamWeaver HTML document between the <body> and </body> tags in the code editor (NOT the design view).
      2. In DreamWeaver choose { Commands / Clean Up Word HTML… ; <basic> } and select the following check boxes:
        • Remove all Word specific markup
        • Clean up <font> tags
        • Fix invalidly nested tags
        • Apply source formatting
        • (Do NOT select “Clean up CSS.”)
      3. Click [OK].
      4. Select all of the HTML code between the <body> and </body> tags in the DreamWeaver code editor.
      5. Copy that and paste it in place of ALL the HTML code that is in the Blogger editor's HTML tab.
    • If you do not have DreamWeaver you can safely ignore the extraneous HTML tags if you want. Or you can edit out whatever bits that Blogger complains about when you try to post later. I have found that anything between and tags can go. Also all of the meta tags can go. This is what Blogger will complain about the most.
  9. Fix the CSS so that Word's CSS styles do not conflict with Blogger's CSS styles.
    • You may not even have to do anything for this step. Most of the HTML element class names and associated CSS selectors created by Word have unique names that will be very unlikely to conflict with the CSS in your Blogger template. In fact, if all of the styles in your Word document were ones you created yourself then it is almost certain that none of their names will conflict. However, if you simply used the Normal style or any of the heading styles without creating your own, uniquely named, version of them then you will probably have problems. This is because Word sometimes uses a simple p, h1, or h2 element selector in the CSS styles. Since most Blogger templates use <div> tags with specific element id attributes to associate HTML elements with CSS styles, the simple p or h1 selectors in the Word CSS stylesheet take precedence.
    • There is an easy way to check for these problems. Simply publish your post then look at your blog. Check to see if any of the formatting of the text outside of your post has changed. Check both on your blog’s home page and on the post’s individual page. If nothing is wrong then you are done.
    • If some of the CSS styles have conflicted then follow these steps:
      1. Go to the Blogger editor’s HTML tab and surround all of the HTML code in the post with a <span class="UniqueName"> and </span> set of tags.
        • This sets the post off with its own unique class that will apply to the entire imported Word document but not to any of the rest of the blog page or any of your other posts. If you entered introductory text in Step 3 then you may want to place the first <span> tag just after that text and the optional <hr/> tag. This will cause that introductory text to retain the same styling as the rest of your blog, thereby visually setting it apart from the text imported from Word.
        • You must use a unique class name for each of your posts so that one post won't interfere with another.
      2. Look for Word’s CSS stylesheet near the top of the post.
        • If you were able to use DreamWeaver to clean up the HTML then it will be right at the top of your HTML code (or right under your introductory text from Step 3). If you are choosing to just ignore the extraneous HTML then the part you want may be buried a little bit from the top. It will be between a <style> and </style> tag and NOT within any and tags.
        • You will notice that this CSS stylesheet is not formatted nicely at all. All the CSS rules are just strung together on one line of text.
      3. Insert "span.UniqueName " before each of the CSS selectors.
        • Do not insert the quotes but make sure to include the space between the inserted text and the existing selector.
        • This creates what they call a descendant selector. This means that the simple selector will only apply if it is also found buried somewhere within the specified ancestor element. In this case, this means that the simple element selector will only apply to elements that are inside our <span class="UniqueName"> and </span> set of tags, which means only within our imported Word document.
        • Remember to insert the "span.UniqueName " before each selector in grouped selectors separately or it will only apply to the first one. (Grouped selectors are a list of selectors separated by commas; see the example after this list.)
        • Word puts HTML comments around its CSS stylesheet. In DreamWeaver this makes it all light gray. It is safe to remove the HTML comments from around the CSS rules within the <style> tags. This allows DreamWeaver to use its syntax highlighting which makes it much easier to find all of the CSS selectors.
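As an example of that last point about grouped selectors, a rule that came from Word as:

    h1, h2, h3 { font-family: Arial; }

needs to become:

    span.UniqueName h1, span.UniqueName h2, span.UniqueName h3 { font-family: Arial; }

If you only prefix the first selector, the h2 and h3 parts of the rule will still apply to the entire page, not just to your imported post. (This particular rule is made up for illustration; use whatever selectors Word actually put in your stylesheet.)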

Pictures

If your Word document has any pictures in it then you will have to do “a few” additional steps. Again, it goes pretty quickly once you get used to it.

  1. In Word { File / Save as Web Page… ; File name: = “whatever.htm” ; Save as type: = “Web Page (*.htm, *.html)”[v] ; [OK] }.
    • This is just to get the pictures out of the file quickly and easily. You should not use the resulting HTML code to paste into Blogger. For some reason copying and pasting from Word into Blogger’s Compose tab cleans up a few things that are difficult to clean up any other way. (At least with my limited knowledge.)
    • Word will create a folder and stick all the files in it.
    • You can just save this to the desktop because you will be deleting it when you are through.
  2. Go through the image files that Word created and rename them to something more meaningful.
  3. Upload all these images to your preferred file or image hosting service. You could just use Picasa if you want.
  4. Go through all the HTML that was the result of pasting the document into Blogger and edit the <img> tags one by one.
    1. Locate the uploaded image on your image hosting service.
    2. Right click on it and choose “Copy Image Location” or some similar menu item.
      • You want to copy the image’s URL to the clipboard. Sometimes copying the URL from the browser’s address bar does not give the correct URL. Always get it by right clicking on the image itself.
    3. In the appropriate <img> tag, select all of the text between the quotes for the src attribute and paste the new URL there (see the example after this list).
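As a purely hypothetical example, an <img> tag whose src still points at a file on your own computer (the exact path will vary), such as:

    <img src="whatever_files/image001.png" width="480" height="320" />

would be edited so that the src attribute holds the hosted image's URL instead:

    <img src="http://example.com/url-copied-from-your-image-host/image001.png" width="480" height="320" />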

Notes

  • If you look at the source for this page you will see that it was originally created in Word. While I could have created it directly in DreamWeaver, I wanted you to be able to look at the code as an example.
  • I have found that it is easier to do all of this editing in DreamWeaver. I just copy all the HTML from Blogger’s HTML tab and paste it between the <body> tags in a new DreamWeaver HTML file. Then it is easy to resize the images if necessary without guessing at widths and heights. If I have to resize an image appreciably (which may be necessary if your blog’s main <div> is pretty narrow, as in many templates) then I turn that image into a link to the full-size version of the image for the user’s convenience (see the example at the end of these notes). All you have to do is paste that same URL into the link field in the image properties bar in DreamWeaver and it automatically wraps the image in an <a> tag. When I am finished I just copy all the HTML code between the <body> tags and paste it into Blogger’s HTML tab, replacing the previous code.
  • In some of these instructions I have used Grant's Concise GUI Notation System (GCGUINS) which I have described in a separate blog post.
  • I should warn you that things do not always come out peachy keen.
    • If you have a deeply indented hierarchical list in your Word document then some levels of it will get turned into HTML lists and some of them will simply be turned into paragraphs with <p> tags instead.
    • If you have any sample HTML tags in your document (like I have lots of in this one) then pasting the text to Blogger's Compose tab will not convert those to appropriate HTML entities (&lt; or &gt; etc.). You will have to dig through the HTML code and switch them around. The easiest way to do that is to cut them from the code window in DreamWeaver and then paste them in the Design window. DreamWeaver will then convert them to the appropriate entities in the HTML code so they show up properly in the post.
    • If you subsequently edit your post in the Blogger editor's Compose tab then it may strip out some or all of the formatting. So, once you have done this you should always use Blogger's HTML tab or an external HTML editor to edit the post rather than the Compose tab.
  • I am sure that others will find many other problems with this technique. Sometimes it goes quickly and sometimes, as with this particular document, it takes a lot of additional editing. Only you can decide whether it will be faster to use this technique or to completely reformat your document in HTML. Feel free to post comments here or in the Blogger Help Forum about any other problems you find.
  • Even if you don't copy and paste from Word, some of the other tricks here could help.
    • Using the <span class="UniqueName"> and </span> tags around the entire post to isolate it and then using span.UniqueName to create descendant selectors in your CSS will allow you to use different formatting in your post.
    • If you no longer have access to the original pictures used in your document then the technique shown here can get them out quickly and easily.
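For example, the link-wrapped image mentioned in the note about resizing ends up looking something like this, with placeholder URLs:

    <a href="http://example.com/full-size/MyDiagram.png">
      <img src="http://example.com/full-size/MyDiagram.png" width="320" height="213" />
    </a>

The <img> is displayed at the smaller size, while the surrounding <a> tag links to the same hosted file at its full size.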

This post is Copyright © 2009 by Grant Sheridan Robertson.