I installed gocr with the command suggested by the Ubuntu terminal (sudo apt install gocr) in order to run OCR on the text in a PDF file. How can I use it? I couldn't find a tutorial for this.
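As far as I know, gocr works on bitmap images (the PNM family), not on PDFs directly, so one common approach is to rasterize the PDF pages first with pdftoppm from poppler-utils and then run gocr on each page image. A minimal sketch, assuming poppler-utils is installed and with input.pdf as a placeholder file name:

```shell
# Rasterize each PDF page to a grayscale PGM image at 300 DPI
# (pdftoppm is part of the poppler-utils package).
pdftoppm -r 300 -gray input.pdf page

# Run gocr on each generated page image; gocr prints the
# recognized text to stdout, so collect it into one file.
for img in page-*.pgm; do
    gocr "$img" >> output.txt
done
```

Depending on the pdftoppm version the pages come out as page-1.pgm, page-01.pgm, and so on; the glob covers both.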
When I try to detect text in my JPEG, it correctly shows all the areas where it suspects text and images, but when I export to ODT it only creates an ODT with empty text and image frames.
Do I have to configure tesseract somehow?
(I use Ubuntu 14.10 32bit)
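One way to narrow this down is to check whether tesseract itself recognizes the text, which would point the finger at OCRFeeder's ODT export rather than the engine. A small sketch, with scan.jpg as a placeholder file name:

```shell
# Run tesseract directly on the scan; it writes the recognized
# text to scan.txt (the .txt extension is appended automatically).
tesseract scan.jpg scan

# If this file contains your text, the engine is fine and the
# problem lies in OCRFeeder's export step.
cat scan.txt
```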
Tesseract now creates an .hocr file rather than an .html file for its OCR output, but that is not exactly what is at issue here. Since the upgrade, when hocr2pdf uses this output it renders a large text size into small bounding boxes. Most of the text doesn't even appear in the resulting PDF, and what little does appear is unreadable and unselectable.
I'm using a script that goes through each .tif file in the directory and runs OCR on each one, with a for loop like this:
for page in "$dir"/*page*.tif
do
    base="${page%.tif}"
    tesseract "$page" "$base" -l eng hocr
    hocr2pdf -i "$page" -o "$base.pdf" < "$base.hocr"
done
I also tried specifying the resolution with a -r 400 switch to hocr2pdf, but that made no difference. I can only assume that the current version of tesseract is not producing output that hocr2pdf can work with.
Tesseract is my only OCR option because it handles Icelandic and Old Norse very well, so moving to another OCR tool is probably not an option.
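If I remember correctly, ExactImage's hocr2pdf has a sloppy-text option (-s) that places whole words instead of individual glyphs, which can sometimes work around bad per-glyph bounding boxes; treat the flag name as an assumption and check hocr2pdf --help. A variation of the loop above:

```shell
# Same loop as before, but ask hocr2pdf to place whole words
# (-s, sloppy text) and state the scan resolution explicitly (-r).
for page in "$dir"/*page*.tif
do
    base="${page%.tif}"
    tesseract "$page" "$base" -l eng hocr
    hocr2pdf -s -r 400 -i "$page" -o "$base.pdf" < "$base.hocr"
done
```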
I'm using the OCR utility of OCRFeeder, which uses the tesseract engine. I have installed the language packs needed for tesseract. How can I set the language so that tesseract uses the right language file when converting the scanned document to text?
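Assuming OCRFeeder passes its engine arguments straight through to the tesseract command line, the language is selected with tesseract's -l flag plus a matching traineddata package; the nld (Dutch) code below is just an example. A sketch of the command-line side:

```shell
# List the language files tesseract can currently find.
tesseract --list-langs

# Install an additional language pack; on Ubuntu the package
# names follow the pattern tesseract-ocr-<lang>.
sudo apt-get install tesseract-ocr-nld

# Use that language explicitly when recognizing a scan.
tesseract scan.png scan -l nld
```

If that works from the terminal, the remaining step is to add the same -l argument to the tesseract engine entry in OCRFeeder's engine settings.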
I'd like to scan a good number of papers I have lying around, with the least possible hassle. I would like to convert them to images using Simple Scan, then convert those to text using OCR. Is there a good OCR app with a GUI that will give me good results at the push of a button?