I am trying to train Tesseract on Ubuntu 20.04.1 LTS. I have downloaded Tesseract and the required training tools.
For the training data I am using jTessBoxEditor. I have the .tiff files, but I am unable to make the .box files. When I type the following in my terminal:
tesseract --psm 6 --oem 3 Liberation_serif.font.exp0.tif Liberation_serif.font.exp0 makebox
I get the following error:
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
I have tried downloading eng.traineddata from GitHub and copying it into tessdata, but I got the same error message. Then I changed TESSDATA_PREFIX several times to make it point to tessdata, but I got the same error message again. How do I resolve this?
Edit: The tesseract executable and the Tesseract source code I downloaded are in different locations.
I had downloaded Tesseract to two locations. The location that TESSDATA_PREFIX was pointing to did not have eng.traineddata. I downloaded it into that directory from GitHub and used
cat >> .pam_environment
again to make TESSDATA_PREFIX point to that location. I logged in again and I am now able to make .box files.
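
For anyone hitting the same error, the check that would have saved me time can be sketched roughly as below. This is only a sketch: it assumes a Tesseract version where TESSDATA_PREFIX names the tessdata directory itself (as the error message suggests), the helper names has_traineddata and pam_env_line are made up for illustration, and the DEFAULT= form is the pam_env.conf-style syntax that I believe ~/.pam_environment accepts on Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given directory holds <lang>.traineddata
# (lang defaults to eng).
has_traineddata() {
    dir="$1"
    lang="${2:-eng}"
    [ -f "$dir/$lang.traineddata" ]
}

# Hypothetical helper: print the line to append to ~/.pam_environment
# (assumed pam_env.conf-style syntax; adjust if your system differs).
pam_env_line() {
    printf 'TESSDATA_PREFIX DEFAULT=%s\n' "$1"
}

# Check the directory that TESSDATA_PREFIX currently points to.
prefix="${TESSDATA_PREFIX:-}"
if [ -n "$prefix" ] && has_traineddata "$prefix"; then
    echo "eng.traineddata found in $prefix"
else
    echo "eng.traineddata not found in '$prefix'"
fi
```

If the check fails, either copy eng.traineddata into that directory or append the output of pam_env_line for the correct directory to ~/.pam_environment and log in again. Running tesseract --list-langs afterwards should show which languages the executable can actually see.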