David's technobabble

searchable PDF

OCR’ing all of the PDF files in a SharePoint Document Library using PowerShell and Solid PDF Tools

A recent review of the PDF documents in our Document Control Library revealed that most were “image only” PDFs. We’ve run our document control system on different versions of SharePoint technologies since SharePoint Portal Server 2001, and we are currently running SharePoint 2007. I’m surprised that no one noticed earlier that most of our PDF files were not showing up in search results.

The question is: “How can we get all of these PDFs reprocessed so that they are searchable, at a reasonable cost?”

A walkthrough of the code to extend the HP Digital Sending Software

Recently, I wrote about extending the HP Digital Sending Software to “close the gap” and email the OCR’ed result. I received a request to release the code, so I’ve decided to do so with a bit of documentation.

Making OCR’ed PDFs using the HP Digital Sending Software

We recently became aware that our fancy HP workgroup printers, which can copy a document and email the result as a PDF to a set of email addresses, only create image PDFs. None of the text in these PDFs is searchable. After some investigation, we discovered that we needed to install the HP Digital Sending Software and let it perform OCR post-processing on the image PDFs created by the printer.
