We recently became aware that our fancy HP workgroup printers, which can copy a document and email the result as a pdf to a set of email addresses, create only image pdfs. None of the text in the pdf is searchable. After some investigation, we discovered that we needed to install the HP Digital Sending Software and let it perform OCR post-processing on the image pdf created by the printer.
This works well enough, except that the HP Digital Sending Software places the OCR’ed file in a single configurable folder on the print server’s local disk. You might be wondering: “What’s the problem with that? Just share the folder to all users and be done with it.” Well, my answer is: “Not so fast. Much of the material being scanned is confidential and should not be placed where an unauthorized user might be able to access it.”
After further analysis of how the HP Digital Sending Software works, we determined that two files are created for every document scanned. One is the OCR’ed pdf file and the other is an informational xml file. The xml file contains the information that the user provided during the scan. This gave us access to the “from email address”, the “to email address(es)”, and the OCR’ed pdf file name.
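To make the idea concrete, here is a minimal sketch of pulling those three values out of the xml file. The post does not show the actual schema, so the element names `From`, `To`, and `FileName` are hypothetical placeholders, as is the assumption that multiple recipients are separated by `;` or `,`. Check a real metadata file and adjust.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// The three values we need from the informational xml file.
public sealed class ScanJob
{
    public ScanJob(string from, IReadOnlyList<string> to, string pdfFileName)
    {
        From = from;
        To = to;
        PdfFileName = pdfFileName;
    }

    public string From { get; }
    public IReadOnlyList<string> To { get; }
    public string PdfFileName { get; }
}

public static class ScanMetadata
{
    // ASSUMPTION: "From", "To" and "FileName" are made-up element names.
    // Replace them with whatever the HP software actually writes.
    public static ScanJob Parse(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Find the first element with the given local name, ignoring namespaces.
        string Required(string name) =>
            doc.Descendants().FirstOrDefault(e => e.Name.LocalName == name)?.Value.Trim()
            ?? throw new FormatException("Missing <" + name + "> element.");

        // ASSUMPTION: several recipients are separated by ';' or ','.
        List<string> to = Required("To")
            .Split(new[] { ';', ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(a => a.Trim())
            .Where(a => a.Length > 0)
            .ToList();

        return new ScanJob(Required("From"), to, Required("FileName"));
    }
}
```

Matching on the local name keeps the sketch working whether or not the real file declares an xml namespace.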
With this information, it was pretty straightforward to write a FileWatcher service in C# that would:
- Handle the file changed events with a filter on “*.xml” files.
I initially thought that I would be processing a file created event, but the HP software creates the xml file, then updates it.
- Generate an email from the “from email address” to the “to email address(es)” with the OCR’ed pdf file as an attachment.
- Delete the processed xml and pdf files from the folder.
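The steps above can be sketched roughly as follows. This is not our production service, just an outline built on `FileSystemWatcher` and `System.Net.Mail`. The xml element names (`From`, `To`, `FileName`), the recipient separators, the subject and body text, and the `folder` and `smtpHost` parameters are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net.Mail;
using System.Xml;
using System.Xml.Linq;

public static class ScanMailer
{
    static readonly object Gate = new object();

    // Starts watching the folder. Keep the returned watcher alive and dispose it on shutdown.
    public static FileSystemWatcher Start(string folder, string smtpHost)
    {
        var watcher = new FileSystemWatcher(folder, "*.xml")
        {
            NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size
        };
        // Changed, not Created: the HP software creates the xml file and then updates it.
        watcher.Changed += (sender, e) => TryProcess(e.FullPath, smtpHost);
        watcher.EnableRaisingEvents = true;
        return watcher;
    }

    static void TryProcess(string xmlPath, string smtpHost)
    {
        lock (Gate) // Changed often fires more than once per write; handle one event at a time.
        {
            try
            {
                XDocument doc = XDocument.Load(xmlPath);
                // ASSUMPTION: hypothetical element names, adjust to the real schema.
                string pdfPath = Path.Combine(Path.GetDirectoryName(xmlPath), Value(doc, "FileName"));

                using (MailMessage message = BuildMessage(Value(doc, "From"), Value(doc, "To"), pdfPath))
                using (var smtp = new SmtpClient(smtpHost))
                {
                    smtp.Send(message);
                } // Disposing the message closes the attachment's handle on the pdf.

                File.Delete(pdfPath);
                File.Delete(xmlPath);
            }
            catch (Exception ex) when (ex is IOException || ex is XmlException || ex is FormatException)
            {
                // File still being written, or already removed by an earlier event: skip it.
            }
            catch (SmtpException ex)
            {
                // Leave both files in place so the scan is not lost.
                Console.Error.WriteLine("Send failed for " + xmlPath + ": " + ex.Message);
            }
        }
    }

    // Builds the email with the OCR'ed pdf attached. The caller must dispose the message.
    public static MailMessage BuildMessage(string from, string to, string pdfPath)
    {
        var message = new MailMessage
        {
            From = new MailAddress(from),
            Subject = "Scanned document: " + Path.GetFileName(pdfPath), // assumed wording
            Body = "The searchable pdf is attached."                    // assumed wording
        };
        // ASSUMPTION: several recipients are separated by ';' or ','.
        foreach (string address in to.Split(new[] { ';', ',' }, StringSplitOptions.RemoveEmptyEntries)
                                     .Select(a => a.Trim()).Where(a => a.Length > 0))
        {
            message.To.Add(new MailAddress(address));
        }
        message.Attachments.Add(new Attachment(pdfPath));
        return message;
    }

    static string Value(XDocument doc, string name)
    {
        XElement element = doc.Descendants().FirstOrDefault(e => e.Name.LocalName == name);
        if (element == null) throw new FormatException("Missing <" + name + "> element.");
        return element.Value.Trim();
    }
}
```

In a Windows service you would call `ScanMailer.Start(...)` from `OnStart` and dispose the watcher in `OnStop`. One known weakness of this sketch: if the last changed event arrives while the file is still locked, nothing retries it, so a real service would want a short retry loop or a periodic sweep of the folder.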
It was pretty easy to plug this gap. However, without our little “gap filler”, the HP Digital Sending Software is just technology and not a solution for a company. I wonder why HP doesn’t already provide tools with their Digital Sending Software to perform this function.