I am playing around with the OCR Demo, using a PDF file that contains an invoice. I am wondering why it does not recognize close to 100% of the contents.
I am working with the default settings of the demo application, except the default language is now set to German.
The PDF file I am using is the one I already sent for another support case (the case where the PDF characters were read in a weird order).
After some more investigation I got the following interesting results:
Small segments that contain only one font, and perhaps only one font size as well, seem to be recognized much better than large segments.
Non-standard fonts that are not script fonts seem to be a general problem (by "standard" I mean fonts such as Arial, Courier New, or Times New Roman). My sample uses a font called Eurostile.