Tesseract now creates an .hocr file rather than an .html file for its OCR output, but that is not the issue here. Since the upgrade, when hocr2pdf consumes this output it renders the text at a large size inside small bounding boxes. Most of the text doesn't appear in the resulting PDF at all, and the little that does is unreadable and unselectable.
I'm using a script that goes through each .tif file in the directory and runs OCR on it, using a for loop like this:
    for page in "$dir"/*page*.tif
    do
        base="${page%.tif}"
        tesseract "$page" "$base" -l eng hocr
        hocr2pdf -i "$page" -o "$base.pdf" < "$base.hocr"
    done
I also tried specifying the resolution by passing -r 400 to hocr2pdf, but this made no difference. I can only assume that the current version of tesseract is not producing output that hocr2pdf can work with.
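One way to test that assumption might be to look at the page-level bounding box that tesseract writes into each .hocr file and compare it with the pixel dimensions of the scan. The sketch below is only a rough check: page_bbox is a hypothetical helper name, and it assumes the hOCR marks the page with class ocr_page and a "bbox x0 y0 x1 y1" entry in the title attribute, as the hOCR format describes.

```shell
# Hypothetical helper: print the "bbox x0 y0 x1 y1" entry of the
# ocr_page element in an hOCR file (assumes it sits on one line).
page_bbox() {
    grep -m1 'ocr_page' "$1" \
        | grep -o 'bbox [0-9]* [0-9]* [0-9]* [0-9]*' \
        | head -n 1
}

# Report the page bounding box for every hOCR file from the loop above.
# "$dir" is the same directory variable used in the script; it falls
# back to the current directory if unset.
for hocr in "${dir:-.}"/*page*.hocr
do
    [ -e "$hocr" ] || continue    # skip if the glob matched nothing
    printf '%s: %s\n' "$hocr" "$(page_bbox "$hocr")"
done
```

If the last two numbers differ from the width and height of the .tif in pixels, the coordinates in the hOCR are on a different scale than hocr2pdf expects, which would be consistent with the misplaced text.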
Tesseract is my only OCR option because it handles Icelandic and Old Norse very well, so switching to another OCR tool is probably not possible.