Script for converting PDF pages to images
Jorge Martínez Garrido
May 24, 2024
Recently, I created some beautiful images from a PDF document; see the image below:
Those images were generated by taking a manual snapshot of every page and adding a shadow to each one. This is not ideal: because I crop the region by hand, every snapshot may end up with a different size.
So I wondered whether a program already existed for this. It should accept the path to a PDF file, a page number, and the output path where the generated image is saved.
pdftoppm: easy PDF to PNG converter
I was able to find pdftoppm. This program is part of the Poppler library, which ships other utilities for extracting images, inspecting fonts, or splitting pages, among others:
- pdfdetach -- lists or extracts embedded files (attachments)
- pdffonts -- font analyzer
- pdfimages -- image extractor
- pdfinfo -- document information
- pdfseparate -- page extraction tool
- pdfsig -- verifies digital signatures
- pdftocairo -- PDF to PNG/JPEG/PDF/PS/EPS/SVG converter using Cairo
- pdftohtml -- PDF to HTML converter
- pdftoppm -- PDF to PPM/PNG/JPEG image converter
- pdftops -- PDF to PostScript (PS) converter
- pdftotext -- text extraction
- pdfunite -- document merging tool
The only catch is that pdftoppm generates images in PPM format by default. Despite its name, modern versions of the program include a -png flag that switches the output to PNG. Another flag, -singlefile, is required to prevent pdftoppm from appending the page number to the output filename.
# -f and -l select the first and last page to convert
# -png converts to PNG format instead of PPM
# -singlefile does not write numbers in the output filename
pdftoppm \
-f <first-page> -l <last-page> \
-png -singlefile \
<file.pdf> <outfile-prefix>
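As a concrete illustration, here is a minimal sketch of how those placeholders might be filled in. The file names document.pdf and page-3.png are hypothetical, and the snippet only prints the resulting command, so it can be tried without Poppler installed. Since pdftoppm appends the .png extension itself, the extension is stripped from the target name with the ${output%.*} parameter expansion.

```shell
#!/bin/bash
# Hypothetical inputs, chosen only for illustration
pdf="document.pdf"
page=3
output="page-3.png"

# pdftoppm adds ".png" on its own, so pass the name without its extension
prefix="${output%.*}"

# Print the command instead of running it; remove "echo" to execute it
echo pdftoppm -f "$page" -l "$page" -png -singlefile "$pdf" "$prefix"
```

Without the echo, this would render page 3 of document.pdf and write it to page-3.png.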
ImageMagick: editing digital images
So far so good: I was able to generate a PNG image from a PDF page. However, I also wanted to apply a shadow effect to the image. The solution was to use ImageMagick.
It turns out that ImageMagick provides lots of command-line tools, including some for editing images and applying styles.
In particular, the convert command has a -shadow flag for simulating an image shadow. A great set of code samples is provided in the official documentation.
The shadow flag accepts various parameters following the format:
-shadow <opacity>x<blur><+-xoffset><+-yoffset>
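To make the format more tangible, the sketch below assembles the argument from its four parts. The helper name shadow_geometry is hypothetical and not part of ImageMagick. It assumes the opacity is a percentage, the blur is the sigma of the blur, and both offsets are whole pixels; the sign of each offset is always written out, as the format expects.

```shell
#!/bin/bash
# shadow_geometry OPACITY BLUR XOFFSET YOFFSET
# Hypothetical helper: prints a value suitable for the -shadow flag.
shadow_geometry() {
    # %+d forces an explicit sign, so 15 becomes +15 and -5 stays -5
    printf '%sx%s%+d%+d\n' "$1" "$2" "$3" "$4"
}

shadow_geometry 40 10 15 15    # prints 40x10+15+15
shadow_geometry 40 10 -5 -5    # prints 40x10-5-5
```

Positive offsets push the shadow to the right and downwards, while negative ones push it to the left and upwards.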
Based on the samples in the official documentation, the following command can be used:
# The parenthesized group clones the image and turns the clone into a drop shadow layer;
# the remaining options swap the layers, merge them, and remove any canvas offset
convert <input-file> \
\( +clone -background <color-name> -shadow <opacity>x<blur><+-xoffset><+-yoffset> \) \
+swap -background none -layers merge +repage \
<output-file>
Crafting the program
With the main logic in place, I was able to code the following script:
#!/bin/bash
# Ensure correct number of arguments
if [ "$#" -ne 3 ]; then
    echo "Usage: $0 <file.pdf> <page-number> <output.png>" >&2
    exit 1
fi
PDF_FILE=$1
PAGE_NUMBER=$2
OUTPUT_IMAGE=$3
# Check that input file exists
if [ ! -f "${PDF_FILE}" ]; then
    echo "File ${PDF_FILE} not found." >&2
    exit 1
fi
# Convert desired page from PDF file to a PNG image
pdftoppm -f "${PAGE_NUMBER}" -l "${PAGE_NUMBER}" -singlefile -png "${PDF_FILE}" "${OUTPUT_IMAGE%.*}"
# Ensure the intermediate PNG image was created
if [ ! -f "${OUTPUT_IMAGE%.*}.png" ]; then
    echo "Failed to create PNG file." >&2
    exit 1
fi
# Apply two drop shadow effects with some blur using ImageMagick
convert "${OUTPUT_IMAGE%.*}.png" \
    \( -clone 0 -background black -shadow 40x10+15+15 \) \
    +swap -background none -layers merge +repage \
    \( -clone 0 -background black -shadow 40x10-5-5 \) \
    +swap -background none -layers merge +repage \
    "${OUTPUT_IMAGE}"
# Check whether convert exited successfully
if [ $? -eq 0 ]; then
    echo "Image processing complete: ${OUTPUT_IMAGE}"
else
    echo "Image processing failed." >&2
    exit 1
fi
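The script does not check that the requested page actually exists in the document. One possible extension, sketched below, relies on pdfinfo from the list of Poppler utilities, which reports the page count on a line starting with "Pages:". The helpers page_count and is_valid_page are hypothetical names, and the demo feeds them sample text rather than a real PDF so that it runs without Poppler installed.

```shell
#!/bin/bash
# Hypothetical helper: extract the page count from pdfinfo output on stdin
page_count() {
    awk '/^Pages:/ { print $2 }'
}

# Hypothetical helper: succeed when PAGE is an integer between 1 and TOTAL
# Usage: is_valid_page PAGE TOTAL
is_valid_page() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;    # reject empty or non-numeric input
    esac
    [ "$1" -ge 1 ] && [ "$1" -le "$2" ]
}

# Demo with sample text shaped like pdfinfo output (not a real document)
sample='Title:          Example
Pages:          12'
total=$(printf '%s\n' "$sample" | page_count)
if is_valid_page 3 "$total"; then
    echo "Page 3 of ${total} is valid"
fi
```

In the script, the total could be obtained with total=$(pdfinfo "${PDF_FILE}" | page_count) right after the input file check, exiting early when is_valid_page fails.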
Results
Here are some examples generated using the previous script:
Enjoy!