I am trying to use the best model from tesseract. However, I am getting the following error:
tesseract sample.jpg stdout --tessdata-dir tessdata/
Error opening data file tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
Here is the folder structure:
.
├── sample.jpg
└── tessdata
    └── eng.traineddata
Ubuntu Version:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
tesseract version:
tesseract 4.0.0-beta.1
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
I had the same problem. I spent a fair while looking for a solution, and the ones I found looked complicated and were not always successful. Then I realised the problem was actually rather simple: the error message is explicit about where the files are expected to be, namely in the parent folder of tessdata. In my case the message was:
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory
It seems the configuration expects the files to be one level up, so in my case in /usr/share/tesseract-ocr/4.00/
I fixed it by copying the language files and the training data (in my case eng.traineddata and osd.traineddata) from the tessdata folder /usr/share/tesseract-ocr/4.00/tessdata to its parent folder, one level up.
After this, tesseract had no more problems.
These were the correct locations for an Ubuntu 19.10 installation.
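The copy step described above could be sketched roughly as follows. This is only an illustration, assuming a POSIX shell; copy_traineddata_up is a hypothetical helper name, and the path in the example call is the one from my Ubuntu 19.10 installation, which may differ on other systems.

```shell
# Hypothetical helper: copy every *.traineddata file from a tessdata
# directory into its parent directory (one level up).
copy_traineddata_up() {
    tessdata_dir="$1"
    parent_dir="$(dirname "$tessdata_dir")"
    for f in "$tessdata_dir"/*.traineddata; do
        # Skip the unexpanded pattern when no files match.
        [ -e "$f" ] || continue
        cp -- "$f" "$parent_dir"/
    done
}

# Example call (system directories will likely need root, e.g. via sudo):
# copy_traineddata_up /usr/share/tesseract-ocr/4.00/tessdata
```

Copying rather than moving leaves the original files in place, so nothing breaks if another tool still expects them inside tessdata.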
You do not seem to have set the TESSDATA_PREFIX variable. Edit ~/.bashrc with any text editor, e.g. nano ~/.bashrc, and add the line
export TESSDATA_PREFIX='<absolute path to tessdata>'
where I assume tessdata refers to the folder you mentioned. Then run
source ~/.bashrc
once you have edited and saved .bashrc. Hope that helps!
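The two error messages quoted in this thread disagree on whether TESSDATA_PREFIX should point at the tessdata folder itself or at its parent, so it may be worth checking which layout actually holds the language file before changing anything. A minimal sketch, assuming a POSIX shell; find_traineddata is a hypothetical helper name and not part of tesseract:

```shell
# Hypothetical helper: report where <lang>.traineddata lives relative to a
# given prefix, trying both layouts mentioned in the error messages.
find_traineddata() {
    prefix="${1%/}"      # drop one trailing slash, if any
    lang="${2:-eng}"     # default to English
    if [ -f "$prefix/$lang.traineddata" ]; then
        # The prefix is the tessdata directory itself.
        echo "$prefix/$lang.traineddata"
    elif [ -f "$prefix/tessdata/$lang.traineddata" ]; then
        # The prefix is the parent of the tessdata directory.
        echo "$prefix/tessdata/$lang.traineddata"
    else
        echo "$lang.traineddata not found under $prefix" >&2
        return 1
    fi
}

# Example call, once TESSDATA_PREFIX is exported:
# find_traineddata "$TESSDATA_PREFIX" eng
```

If the helper prints a path, the file is readable from that prefix under at least one of the two layouts; if it fails, the exported path is probably wrong or not absolute.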