I want to convert a .pdf file to an .odt file so that I can further convert it to a .doc file. Is there any software/script that can do this? I have tried copying the content of the .pdf file and pasting it into LibreOffice Writer, but the formatting isn't preserved.
The document is confidential so I'd prefer not to use any on-line service for the conversion.
Any help is highly appreciated.
You could take a look at PDF Utilities (poppler-utils via Synaptic or apt-get), which includes pdftotext:
Of course, success will depend on how the PDF file was generated. If you get what you want as a text file, you could then save that as an .odt file.
Edit: I forgot to provide the source for the quote. It's from the description tab in Synaptic for PDF Utilities (based on Poppler).
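As a rough illustration of the pdftotext route, here is a minimal sketch. The function name `pdf_to_text` and the file name `input.pdf` are placeholders invented for this example, and the follow-up `soffice` call assumes LibreOffice is installed and on the PATH.

```shell
#!/bin/sh
# Sketch only: extract the text of a PDF with pdftotext (poppler-utils).
# pdf_to_text is a hypothetical helper name, not part of poppler-utils.
pdf_to_text() {
    pdf=$1
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
    txt="${pdf%.*}.txt"
    # -layout asks pdftotext to keep the physical layout (columns, indents)
    # as far as it can; drop it for plain reading-order text.
    pdftotext -layout "$pdf" "$txt" || return 1
    # Print the name of the text file that was written.
    echo "$txt"
}

# Possible usage (input.pdf is a placeholder):
#   txt=$(pdf_to_text input.pdf) && soffice --headless --convert-to odt "$txt"
```

The resulting .odt contains only plain text, so this is mainly useful when the wording matters more than the layout.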
I was annoyed by the lack of a free PDF to ODT converter too. I didn't even need anything complicated. Just a tool that generates ODT files that I can then annotate in LibreOffice (e.g. to fill out forms).
I know how to do this manually, by converting the PDF document into graphics files and then importing them into LibreOffice, but that gets tedious quite fast.
So, I finally wrote a quick little shell script that does all the required steps automatically. You can find it at https://github.com/gutschke/pdf2odt
It can take any number of PDF and image files as input and generates an ODT file that can be opened and edited in LibreOffice. Images show up as page backgrounds, so you can write over them freely. Each image is associated with its own page style; keep that in mind when inserting page breaks, and adjust the page style as necessary.
I tested the script on both Linux and Mac. Given that it only needs a handful of reasonably standard tools, it should be quite portable.
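For comparison, the first manual step mentioned above (turning PDF pages into graphics files) might look like the sketch below. It uses pdftoppm from poppler-utils, which is an assumption on my part; the pdf2odt script may well use different tools. The function name `pdf_to_pngs`, the 150 DPI resolution and the `page` prefix are illustrative choices.

```shell
#!/bin/sh
# Sketch only: render every page of a PDF to a PNG image with pdftoppm.
# pdf_to_pngs is a hypothetical helper; this is not how pdf2odt is implemented.
pdf_to_pngs() {
    pdf=$1
    prefix=${2:-page}
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
    # -r sets the resolution in DPI, -png selects PNG output.
    # Files are written as <prefix>-<page number>.png (the page number
    # may be zero-padded for longer documents).
    pdftoppm -r 150 -png "$pdf" "$prefix"
}

# Possible usage (form.pdf is a placeholder):
#   pdf_to_pngs form.pdf page
```

Each image would then still have to be inserted into LibreOffice by hand, which is the tedious part the script automates.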
LibreOffice is capable of importing .pdf files. Simply open the file in a current version of LibreOffice for best results. It will, however, open the document as a drawing, and you will be able to export it only to one of the supported image formats, not save it as a Writer document. Naturally, not all formatting is preserved, but at least some.
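If you prefer to start this from a terminal, something like the following should work. It assumes the `soffice` launcher is on the PATH; the function name `open_pdf_in_draw` is made up for this example.

```shell
#!/bin/sh
# Sketch only: open a PDF in LibreOffice Draw from the command line.
# open_pdf_in_draw is a hypothetical helper name.
open_pdf_in_draw() {
    pdf=$1
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
    # --draw starts the Draw component with the given document.
    soffice --draw "$pdf"
}

# Possible usage (input.pdf is a placeholder):
#   open_pdf_in_draw input.pdf
```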
Try Calibre. It converts to HTML and then into other formats. It did a pretty good job on a large (183 pages) file I would have otherwise had to print.
In my case I converted it to an EPUB, but for fun I also converted it to a .docx, which turned out very well.
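Calibre also ships a command-line tool, ebook-convert, so the same conversion can probably be scripted. This is a minimal sketch assuming Calibre is installed; the function name `pdf_to_docx` and the file names are placeholders.

```shell
#!/bin/sh
# Sketch only: convert a PDF to DOCX with Calibre's ebook-convert tool.
# pdf_to_docx is a hypothetical helper name.
pdf_to_docx() {
    pdf=$1
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
    docx="${pdf%.*}.docx"
    # ebook-convert picks the output format from the file extension.
    ebook-convert "$pdf" "$docx"
}

# Possible usage (book.pdf is a placeholder):
#   pdf_to_docx book.pdf
```

The .docx can then be opened in LibreOffice Writer and saved as .odt or .doc if needed.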
If the poppler-utils package is installed, the Nautilus script below will help convert a PDF file to HTML. Place it in the ~/.gnome2/nautilus-scripts folder as an executable file. The "-i" option can be deleted to include images as well. The resulting HTML can then be opened with LibreOffice Writer and saved as ODT, although how well the formatting converts depends very much on how the PDF was created.
http://ubuntuone.com/6xI1afyu6QdQvgdCGn0kym
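In case the link is unavailable, here is a minimal sketch of what such a Nautilus script could look like. It is not the script from the link; the function name `convert_pdf_to_html` is invented for this example, and it relies on Nautilus passing the selected files as arguments.

```shell
#!/bin/sh
# Sketch only: a Nautilus script that converts each selected PDF to HTML
# with pdftohtml (poppler-utils). Not the original linked script.
convert_pdf_to_html() {
    pdf=$1
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
    # -i ignores images (remove it to include them);
    # -noframes writes a single HTML file instead of a frameset.
    # The second argument is the output base name; ".html" is appended.
    pdftohtml -i -noframes "$pdf" "${pdf%.*}"
}

# Nautilus hands the selected files to the script as arguments.
for f in "$@"; do
    convert_pdf_to_html "$f"
done
```

After saving it in the scripts folder and making it executable with chmod +x, it should appear in the right-click Scripts menu.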