I am looking for software that recognizes text within images. I tried out all of the tools mentioned here (gocr, fuzzyocr, libhocr0, ocrad, ocrfeeder, ocropus, tesseract-ocr, cuneiform). My input was a photograph of a printed document, so no handwriting, just printed letters. Of all the tools, tesseract-ocr was the most accurate in my tests, but it still produces many errors. So scanning a document to an image file and then indexing it or running NLP on the extracted text is sadly not an option: the error rate is too high.
So, given the age of the above-mentioned posting, are there any better tools now for extracting text from images or photographs?
EDIT 1:
By "image containing text" I mean that I have a PNG/JPG/BMP file as the source, and I want to extract the text rendered as pixels within it and get ASCII/UTF-8 text as the output.
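For reference, the extraction step described above (one PNG/JPG/BMP file in, UTF-8 text out) can be scripted around the Tesseract command-line tool. This is a minimal Python sketch, assuming the `tesseract` binary is installed and on `PATH` with the English (`eng`) language data available; the helper names `build_tesseract_command` and `image_to_text` are hypothetical. It only automates the step and does nothing about the accuracy problem.

```python
import subprocess
from pathlib import Path

# Restricted to the formats mentioned in the question (Tesseract itself reads more).
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def build_tesseract_command(image_path, lang="eng"):
    """Build the argv for the tesseract CLI (hypothetical helper)."""
    path = Path(image_path)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported image type: {path.suffix!r}")
    # Using "stdout" as the output base makes tesseract print the recognized
    # text instead of writing it to a .txt file.
    return ["tesseract", str(path), "stdout", "-l", lang]


def image_to_text(image_path, lang="eng"):
    """Run tesseract on one image file and return the recognized text (hypothetical helper)."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    # Tesseract writes its text output as UTF-8.
    return result.stdout.decode("utf-8")
```

Calling `image_to_text("page.png")` would return the recognized text as a Python string, ready to be passed on to indexing or NLP code.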