<div dir="ltr"><div class="gmail_extra">I don't know if the list allows attachments, but attached is the text generated by tesseract 3.05 (current development head) out of the box for the scanned image PDF here: <a href="http://bitsavers.trailing-edge.com/pdf/dec/pdp15/DEC-15-GXZC-D_MUMPS_Apr72.pdf">http://bitsavers.trailing-edge.com/pdf/dec/pdp15/DEC-15-GXZC-D_MUMPS_Apr72.pdf</a></div><div class="gmail_extra"><br></div><div class="gmail_extra">No OCR is going to be good enough for applications like text-to-speech without additional manual correction, but for the purposes of helping search engines find the PDFs in the first place, and then easily searching within them once they're found, I think the current generation of OCR is more than adequate.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Tom</div></div>