Monday, March 28, 2011

How to remove Renderable Text from .PDF files to allow OCR

For all those people out there - students, academics, archivists, and eBook readers - who have been stymied by Adobe® Acrobat®'s stubborn refusal to perform optical character recognition (OCR) on a document, with the message "Acrobat could not perform recognition (OCR) on this page because: This page contains renderable text." - I believe I have found a workable solution. Notice that I am not saying it is "The" solution; that would require Adobe® to fix its software. I simply think it is much better than the "save to TIFF and rebuild from there" approach offered by Adobe®. Using this technique, it is possible to obtain a searchable document with selectable text while preserving the original image of the scanned document, if desired.


  1. Print the "malfunctioning" .PDF file to the "Microsoft XPS Document Writer" printer driver (which you will need to install if it is not already present).
  2. Convert the resulting .XPS file to an Acrobat® .PDF file.
  3. Perform OCR in Acrobat® using one of the three available output styles depending on the type of document you have and the results you want.
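
Before going through the steps above, it can help to confirm that a file really does carry a text layer. What follows is only a rough sketch in Python (standard library only), not how Acrobat® itself decides: it assumes that "renderable text" shows up as text-showing operators (Tj or TJ inside a BT ... ET block) in a content stream that is either uncompressed or Flate-compressed. The function name has_renderable_text is made up for this illustration. Files that use other stream filters or encryption may be misjudged, and binary streams such as embedded fonts could occasionally trigger a false positive.

```python
import re
import zlib

# One indirect object: "12 0 obj ... endobj"
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
# Split an object into its dictionary (header) and its stream payload
STREAM_RE = re.compile(rb"(.*?)stream\r?\n(.*)endstream", re.DOTALL)
# A string operand followed by a text-showing operator, inside BT ... ET
TEXT_RE = re.compile(rb"\bBT\b.*?[)\]>]\s*T[jJ]\b.*?\bET\b", re.DOTALL)


def has_renderable_text(pdf_bytes):
    """Heuristic: True if any non-image stream appears to draw text."""
    for obj in OBJ_RE.finditer(pdf_bytes):
        match = STREAM_RE.match(obj.group(1))
        if match is None:
            continue  # object has no stream
        header, raw = match.group(1), match.group(2)
        if b"/Image" in header:
            continue  # scanned page images are not text
        try:
            # decompressobj tolerates the end-of-line bytes after the data
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # assume the stream is stored uncompressed
        if TEXT_RE.search(data):
            return True
    return False
```

To try it, read the file in binary mode and pass the bytes in, for example has_renderable_text(open("scan.pdf", "rb").read()), where scan.pdf stands for your own file. A True result suggests Acrobat® may refuse to OCR at least one page; a False result is not a guarantee either way.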