[Simh] OSs with accessible documentation

Paul Koning paulkoning at comcast.net
Sat Feb 6 14:01:03 EST 2016


> On Feb 5, 2016, at 6:10 PM, Timothe Litt <litt at ieee.org> wrote:
> 
> Some of the PDFs on bitsavers are searchable.  It would be a good
> project to OCR the rest into searchable pdfs - as that also means that
> the text can be extracted.   OCR is getting good enough (finally) that
> it's feasible.  I'm sure that they'd be accepted back into bitsavers  -
> searchable is good for everyone.

Some disapprove of OCR for reasons I don't really understand.

A problem with OCR is that it's hard to find a good program.  I dabbled with an OCR plugin that Adobe once offered (free, and worth about that).  I also once tried an open source OCR program, which was far worse still.

But commercial OCR programs exist that do a decent job, especially if the scanned material is clean, as is the case for much of what is on Bitsavers.  I use ABBYY FineReader, which I rather like, but I expect there are other good ones out there too.

One key point is that you typically need to spend some time "training" the program on the particular type of material -- typeface etc. -- that you're working with.  The default settings are rarely adequate.

	paul


