[Simh] OSs with accessible documentation
Paul Koning
paulkoning at comcast.net
Sat Feb 6 14:01:03 EST 2016
> On Feb 5, 2016, at 6:10 PM, Timothe Litt <litt at ieee.org> wrote:
>
> Some of the PDFs on bitsavers are searchable. It would be a good
> project to OCR the rest into searchable pdfs - as that also means that
> the text can be extracted. OCR is getting good enough (finally) that
> it's feasible. I'm sure that they'd be accepted back into bitsavers -
> searchable is good for everyone.
Some disapprove of OCR for reasons I don't really understand.
A problem with OCR is that it's hard to find a good program. I dabbled with an OCR plugin that Adobe once offered (free, and worth about that). I also once tried an open source OCR program, which was vastly inferior even to that.
But commercial OCR programs exist that do a decent job, especially if the scanned material is clean, as is the case for much of what is on Bitsavers. I use ABBYY FineReader, which I rather like, but I expect there are other good ones out there too.
One key point is that you typically need to spend some time "training" the program on the particular type of material -- typeface etc. -- that you're working with. The default settings are rarely adequate.
paul