[Simh] OSs with accessible documentation

Paul Koning paulkoning at comcast.net
Sat Feb 6 16:05:53 EST 2016


> On Feb 6, 2016, at 2:28 PM, Tom Morris <tfmorris at gmail.com> wrote:
> 
> ...
> I think Tesseract is pretty close to the quality of ABBYY.  Google has trained it on a very large corpus and it's used for Google Books, Google Drive OCR, etc., so it gets a fair amount of attention.  Of course, a lot of the training effort has gone into training it for over 100 languages, which isn't really relevant to old computer documentation, but even for plain English, it's received lots of training attention.

Is Tesseract open source?  It sounds vaguely like the one I tried, but I'm not sure; I remember something that felt more like a toolkit than like an application.

Google's OCR is pretty lousy in many cases.  Perhaps that's because they just feed it stuff without ever looking at the result.  There are plenty of Google books that have errors in the majority of the words.

	paul