I used tesseract to produce the hOCR output (the special HTML that hocr2pdf takes as input) from a multi-page TIFF.
I then tried using hocr2pdf to produce a "sandwich PDF" (image + hidden text layer).
However, hocr2pdf produces a single-page PDF with all the pages superimposed on one another.
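For reference, the steps described above can be sketched roughly as follows. This is only an illustration: the file names `scan.tif` and `scan` are placeholders rather than the actual files, and the output extension is an assumption that depends on the Tesseract version (older releases write `.html` instead of `.hocr`).

```shell
#!/bin/sh
# Placeholder names (assumptions); substitute your own multi-page TIFF.
input="scan.tif"
base="scan"

# Only run when both tools and the input file are actually present.
if command -v tesseract >/dev/null 2>&1 \
   && command -v hocr2pdf >/dev/null 2>&1 \
   && [ -f "$input" ]; then
    # Run OCR with the hocr config; this writes $base.hocr
    # (older Tesseract releases write $base.html instead).
    tesseract "$input" "$base" hocr
    # hocr2pdf reads the hOCR from stdin, takes the image via -i
    # and writes the PDF to the file given with -o.
    hocr2pdf -i "$input" -o "$base.pdf" < "$base.hocr"
else
    echo "skipping: tesseract, hocr2pdf, or $input not available"
fi
```

With a multi-page TIFF as input, the second command is the step that yields the superimposed single-page result.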
Is there a way to solve this problem or an alternative solution?
I found a workaround for this issue. hocr2pdf has trouble producing multi-page PDFs, so I produced single-page TIFFs, ran tesseract-ocr on each one, ran hocr2pdf on each result, and then combined the resulting PDFs with the following script: