Asked: 2012-06-13 14:36:28 +0800 CST

Extracting embedded images from a PDF

772

Before I started using Ubuntu I used Nitro PDF reader to automatically extract images from PDF files. Is there a PDF reader for Linux that does this?

I would like to be able to extract images faster/easier than when taking a snapshot.

11 Answers

pl1nk · Answer 1 · 2012-06-13T15:06:49+08:00

Use `pdfimages`

pdfimages is a PDF image extractor that saves the images in a PDF file as PPM, PBM, JPEG, or JPEG 2000 files.

It's a part of the poppler-utils package, which you'll need to install.
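
A quick way to check whether the tool is already available, as a minimal sketch: `have_cmd` is a hypothetical helper name, and the suggested `apt` command assumes an Ubuntu or Debian system.

```shell
# Hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdfimages; then
    echo "pdfimages is available"
else
    # Assumes Ubuntu/Debian; other distributions use a different package manager.
    echo "pdfimages not found; try: sudo apt install poppler-utils"
fi
```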

Usage: pdfimages [options] <PDF-file> <image-root>

option -all will extract images in original format.
option -j will extract images as .jpg (caveat: images are converted and usually size is larger than original)

Example 1: The following extracts all images from a PDF file, saving them in their original format.

pdfimages -all in.pdf /tmp/out

Example 2: The following extracts all images from a PDF file, saving them in JPEG format.

pdfimages -j in.pdf /tmp/out

This saves the images from the PDF file in.pdf to files named /tmp/out-000.jpg (or /tmp/out-000.pbm; see below), /tmp/out-001.jpg, etc.

The pdfimages man page explains:

-j:  Normally, all images are written as PBM (for monochrome images) or PPM (for
     non-monochrome images) files. With this option, images in DCT format are
     saved as JPEG files. All non-DCT images are saved in PBM/PPM format as usual.
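
Because pdfimages writes one file per image, it can help to send the output to its own directory. The following is only a sketch: `extract_images` is a hypothetical wrapper name, the `img` prefix is arbitrary, and it assumes a poppler-utils version that supports `-all`.

```shell
# Hypothetical wrapper: extract every embedded image from a PDF into a directory.
# Usage: extract_images <pdf-file> <output-dir>
extract_images() {
    pdf=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # -all keeps each image in its stored format; "img" is the image-root prefix.
    pdfimages -all "$pdf" "$outdir/img"
}
```

For example, `extract_images in.pdf /tmp/out` should leave files named /tmp/out/img-000, /tmp/out/img-001, and so on, each with an extension that depends on the image's format.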

Gremlin · Answer 2 · 2014-09-12T05:12:52+08:00

I often use Inkscape for this. Load the page, and delete all the other stuff. The advantage is that you can get vector images in SVG and modify them as you choose.

33

Gabriel Staples · Answer 3 · 2019-11-11T20:26:41+08:00

Note that this question is specifically asking about "Extracting embedded images from a PDF". The keyword is extracting! That means: I have a PDF; it has some images embedded within it; how do I get them out!? If that is your question, use pdfimages as the main answer by @pl1nk states.

How to convert a PDF into a bunch of images:

Many people Googling around and landing on this question (myself included), however, are actually asking a slightly different question, and don't realize the difference until hours of frustration later. So, if you are looking for "How to convert a PDF into a bunch of images" instead, which is NOT the same thing as "how to extract images from a PDF", here's how: use pdftoppm. "PPM" here is an image format, so this simply means "PDF to image". It works extremely well, albeit slowly on a modern multi-core system, since it's a single-threaded application and doesn't take advantage of multiple cores.

Ubuntu 18.04 comes with pdftoppm version 0.62.0. Check your version with pdftoppm -v:

$ pdftoppm -v
pdftoppm version 0.62.0
Copyright 2005-2017 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC

Read the manual pages with man pdftoppm to see all of its many useful features.

Supported output image formats:

As the man pages show, pdftoppm allows you to output images in the following formats:

PPM (default)
PNG (with -png)
JPEG (with -jpeg)
TIFF (with -tiff)

It also allows you to specify output in monochrome (-mono) or grayscale (-gray) (default is color), to specify page numbers, to place output images into a folder, to crop and resize, specify resolution, specify jpeg quality (between 0 and 100), specify TIFF compression, process only even or odd-numbered pages, etc. It works extremely well and is EXTREMELY USEFUL!

Here's some examples of how to use `pdftoppm` to convert a PDF to a bunch of image files:

1. Output ppm files as pg-1.ppm, pg-2.ppm, pg-3.ppm, etc, in default 150 DPI x and y resolution:
```
pdftoppm mypdf.pdf pg
```
2. Same as 1, except place all of the output files in a folder called images:
```
mkdir -p images && pdftoppm mypdf.pdf images/pg
```
3. [My favorite] Output images into "images" folder in jpeg format with 300 DPI x & y resolution instead of the default 150 DPI. Note that the output images are at some default jpeg compression level, and will take up approximately 0.1~1 MB in space per file for 300 DPI resolution and assuming standard 8.5" x 11" PDF pages.
```
mkdir -p images && pdftoppm -jpeg -r 300 mypdf.pdf images/pg
```
4. Output images into "images" folder in jpeg format with 300 DPI x & y resolution, at the highest quality jpeg level possible! Quality values can range from 0 to 100. See the man pages. With quality set to 100 and resolution set to 300 DPI, expect each jpeg file to take up 2x the storage as above, with sizes ranging from 0.2~2 MB, depending on the content, and assuming 8.5" x 11" PDF pages.
```
mkdir -p images && pdftoppm -jpeg -jpegopt quality=100 -r 300 mypdf.pdf images/pg
```
5. Output uncompressed .tif images with 300 DPI x & y resolution. Output file sizes will be approximately 25 MB for 300 DPI and 8.5" x 11" PDF pages.
```
mkdir -p images && pdftoppm -tiff -r 300 mypdf.pdf images/pg
```
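
Since page numbers can also be specified, here is a sketch that converts only a range of pages. The function name `pdf_pages_to_jpeg` and its argument order are made up for illustration; `-f` and `-l` are pdftoppm's first-page and last-page options.

```shell
# Hypothetical wrapper: convert pages FIRST..LAST of a PDF to 300 DPI JPEG files.
# Usage: pdf_pages_to_jpeg <pdf-file> <first-page> <last-page> <output-dir>
pdf_pages_to_jpeg() {
    pdf=$1
    first=$2
    last=$3
    outdir=$4
    mkdir -p "$outdir" || return 1
    pdftoppm -jpeg -r 300 -f "$first" -l "$last" "$pdf" "$outdir/pg"
}
```

For example, `pdf_pages_to_jpeg mypdf.pdf 2 5 images` would convert only pages 2 through 5 into the images folder.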

Note that outputting each page above at 300 DPI takes 15~45 seconds on my slow computer, meaning that a 100-page PDF could take as long as 100 x 45/60 = 75 minutes or so for 300 DPI jpeg images, for example.
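
Because pdftoppm is single-threaded, one possible workaround is to run a separate pdftoppm process per page and let several run at once. This is a sketch under assumptions, not part of the original answer: `pdf_to_jpeg_parallel` is a hypothetical name, the default of 4 jobs is arbitrary, the page count is read from pdfinfo (also in poppler-utils), and `xargs -P` is a GNU/BSD extension rather than POSIX.

```shell
# Hypothetical wrapper: convert each page in its own pdftoppm process,
# running up to JOBS processes at a time.
# Usage: pdf_to_jpeg_parallel <pdf-file> <output-dir> [jobs]
pdf_to_jpeg_parallel() {
    pdf=$1
    outdir=$2
    jobs=${3:-4}   # arbitrary default; tune to your number of cores
    mkdir -p "$outdir" || return 1
    # pdfinfo prints a "Pages:" line containing the page count.
    pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
    [ -n "$pages" ] || return 1
    # One pdftoppm call per page: -f N -l N restricts it to page N.
    seq 1 "$pages" | xargs -P "$jobs" -I{} \
        pdftoppm -jpeg -r 300 -f {} -l {} "$pdf" "$outdir/pg"
}
```

For example, `pdf_to_jpeg_parallel mypdf.pdf images 8` would keep up to 8 conversions running at once. Whether this is actually faster depends on your CPU, memory, and disk.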

To time how long the process takes on your computer, simply place the time command in front of the pdftoppm portion of any of the commands above. Ex: here's the output from converting a PDF which had 3 pages:

$ mkdir -p images && time pdftoppm -tiff -r 300 testpdf.pdf images/pg

real    1m47.572s
user    1m45.675s
sys     0m1.536s

This means it took a total real-life clock time of 1m47.572s, or 60 + ~48 = 108 sec, which is 108/3 = 36 seconds per page.
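
The arithmetic used in these estimates can be reproduced in the shell. This is a minimal sketch: `estimate_minutes` and `seconds_per_page` are hypothetical helper names, and because shell arithmetic is integer-only, results are rounded down.

```shell
# Hypothetical helper: estimated total minutes for PAGES pages at SECS seconds each.
# Usage: estimate_minutes <pages> <seconds-per-page>
estimate_minutes() {
    echo $(( $1 * $2 / 60 ))
}

# Hypothetical helper: average seconds per page.
# Usage: seconds_per_page <total-seconds> <pages>
seconds_per_page() {
    echo $(( $1 / $2 ))
}

estimate_minutes 100 45   # 100 pages at 45 s/page: prints 75
seconds_per_page 108 3    # 108 s total for 3 pages: prints 36
```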

Related:

How to turn a PDF into a searchable PDF w/pdf2searchablepdf: How to turn a pdf into a text searchable pdf?
How to convert PDF to Image?
https://stackoverflow.com/questions/6605006/convert-pdf-to-image-with-high-resolution/58795684#58795684
https://www.linuxuprising.com/2019/03/how-to-convert-pdf-to-image-png-jpeg.html
How to programmatically determine DPI of images in PDF file?

To Do · Answer 4 · 2012-06-14T09:18:09+08:00

You may also try pdfmod. It is a GUI (graphical interface) which can extract images and do other basic pdf manipulation.

6

Pantelis Sopasakis · Answer 5 · 2016-04-13T08:50:03+08:00

I have a double-column PDF file with embedded images created with LaTeX where the original images were provided as EPS. I tried the proposed solution based on pdfimages, but unfortunately, it didn't return any images. I tried then to use Inkscape, but the SVG images it generated were distorted and I had no luck exporting them as EPS either.

The software that worked for me was Master PDF Editor.

Here is the procedure:

1. Open your file using Master PDF Editor.
2. Use the edit tool (Alt+1) to select the image you need to extract.
3. Copy the figure (Ctrl+C).
4. Click on the dashed frame surrounding the image, then in the right sidebar (Object Inspector) click on "Geometry". There you can see the size of your selection.
5. Create a new file (Ctrl+N). It will prompt you for the page size. Provide the exact size of your image and create the new file.
6. Now it's a bit tricky: paste the image (Ctrl+V). The image may not show in the new file. Use the arrow keys to move it until you can see it.
7. Use the arrow keys to centre the image in the new page.
8. Save as PDF.

The result is of very high quality, but the software is not free of charge. There is a demo version which "allows you to try all features," but comes with "the addition of a watermark on output file." To be frank, I didn't notice any watermark in the produced PDF.

macieksk · Answer 6 · 2014-06-13T06:17:07+08:00

If what you need is a cropped image in pdf/eps format, then extract a page with the image using pdfmod (as suggested by To Do).

Then, using pdfcrop, you can crop it properly, setting the margins by trial and error:

pdfcrop --margins "-15 -50 0 -140" extracted_page.pdf

3

DafyddG · Answer 7 · 2015-01-07T15:54:05+08:00

With pdfimages the extracted image may be in two or more parts. A simple way to put them together again with no worries about extracted formats is to import the parts into LibreOffice Draw, crop with the image crop dialogue, position the parts, adjust the page size and export in whatever format you prefer.

2

user203413 · Answer 8 · 2015-07-07T07:09:50+08:00

If you want to crop an image from a PDF with a PDF viewer, you can try Okular. It can crop anything (text or images) and save it in png or jpeg format. If you want to extract images in png format from a PDF, you can do it with a minimal command using pdftohtml, which converts the PDF to HTML plus images. You can find an example here: https://www.youtube.com/watch?v=CG1rf7k3xo8 . If you want to extract many images from a PDF, I suggest trying this.
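
A minimal sketch of the pdftohtml approach follows. The wrapper name `pdf_to_html_images` and the `page` base name are hypothetical, and it assumes pdftohtml (from poppler-utils) accepts an output base name that includes a directory.

```shell
# Hypothetical wrapper: convert a PDF to HTML; pdftohtml writes the images
# it finds as separate files alongside the HTML output.
# Usage: pdf_to_html_images <pdf-file> <output-dir>
pdf_to_html_images() {
    pdf=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # Second argument is the base name for the generated HTML files.
    pdftohtml "$pdf" "$outdir/page"
}
```

For example, `pdf_to_html_images in.pdf out` should fill the out folder with HTML files and the extracted image files.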

1

Yash · Answer 9 · 2018-10-05T09:49:47+08:00

Software used: Xreader
OS: Antergos

Steps:

Open PDF
Right click on image
Select Save Image As..
Input file name and extension.
Save.

1

orthodoxpirate · Answer 10 · 2013-07-24T17:39:33+08:00

I use pdfimages, which is a command-line tool, and it works great for me. It is very easy to use, and you can use the --help option to learn more about its usage. I use Ubuntu and it comes pre-installed. If your PDF file is encrypted or password-protected, there are options for that too, so this tool works great.

0
