Jorge Martinez

Aerospace Engineer and Senior Software Developer


Script for converting PDF pages to images

Jorge Martínez Garrido

May 24, 2024

algorithms shell linux


Recently, I created some beautiful images from the pages of a PDF document, as shown below:

PDF pages as images

The previous images were generated by taking a manual snapshot of every page and then applying a shadow to it. This is not ideal: every snapshot may have a slightly different size, since I crop each region by hand.

So I wondered whether there was a program that could automate this. It should accept the path to a PDF file, a page number, and the output path where the generated image will be saved.

pdftoppm: easy PDF to PNG converter

I found pdftoppm, which is part of the Poppler library. Poppler also provides other utilities for extracting images, inspecting fonts, and splitting pages, among others:

- pdfdetach -- lists or extracts embedded files (attachments)
- pdffonts -- font analyzer
- pdfimages -- image extractor
- pdfinfo -- document information
- pdfseparate -- page extraction tool
- pdfsig -- verifies digital signatures
- pdftocairo -- PDF to PNG/JPEG/PDF/PS/EPS/SVG converter using Cairo
- pdftohtml -- PDF to HTML converter
- pdftoppm -- PDF to PPM/PNG/JPEG image converter
- pdftops -- PDF to PostScript (PS) converter
- pdftotext -- text extraction
- pdfunite -- document merging tool
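Of these, pdfinfo is handy for checking how many pages a document has before asking for one of them. Its output includes a line such as "Pages:          12". The sketch below is a hypothetical page_count helper (not part of Poppler) that reads pdfinfo output from standard input and prints that number; reading from stdin is a design choice that keeps it testable without a real PDF.

```bash
#!/usr/bin/env bash
# page_count: hypothetical helper, not part of Poppler.
# Reads pdfinfo output on standard input and prints the page count.
# Typical use: pdfinfo file.pdf | page_count
page_count() {
    # pdfinfo prints a line like "Pages:          12";
    # print the second whitespace-separated field of that line.
    awk '/^Pages:/ { print $2 }'
}
```

For example, pdfinfo document.pdf | page_count prints the number of pages of document.pdf, or nothing if no Pages line is found.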

The only issue is that pdftoppm writes PPM images by default. Despite its name, modern versions of this program include a -png flag that produces PNG output instead. Another flag, -singlefile, is required to stop pdftoppm from appending the page number to the output filename. Note that pdftoppm adds the .png extension to the given output prefix itself.

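# -f / -l: first and last page to convert (use the same number for a single page)
# -png: write PNG instead of the default PPM
# -singlefile: write only the first selected page, without a page number in the name
# Output file: <outfile-prefix>.png
pdftoppm -f <first-page> -l <last-page> -png -singlefile <file.pdf> <outfile-prefix>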
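Since -singlefile writes only the first selected page, converting several pages with predictable names takes one pdftoppm call per page. The following is a minimal sketch, assuming bash and a hypothetical helper named render_pages, that renders pages FIRST to LAST of a PDF into PREFIX-<page>.png files:

```bash
#!/usr/bin/env bash
# render_pages PDF FIRST LAST PREFIX
# Hypothetical helper: calls pdftoppm once per page so that each page is
# written to PREFIX-<page>.png. -singlefile stops pdftoppm from adding its
# own page suffix, so the output name is fully under our control.
render_pages() {
    local pdf=$1 page=$2 last=$3 prefix=$4
    while [ "$page" -le "$last" ]; do
        # pdftoppm appends ".png" to the prefix; stop at the first failure
        pdftoppm -f "$page" -l "$page" -png -singlefile "$pdf" "$prefix-$page" || return 1
        page=$((page + 1))
    done
}
```

For example, render_pages report.pdf 2 4 slide would produce slide-2.png, slide-3.png and slide-4.png, assuming report.pdf has at least four pages.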

ImageMagick: editing digital images

So far, so good: I was able to generate a PNG image from a PDF page. However, I also wanted to apply a shadow effect to the image. The solution was to use ImageMagick.

It turns out that ImageMagick provides many command-line tools, including some for editing images and applying visual effects.

In particular, the convert command has a -shadow option for simulating an image shadow. The official documentation provides a great set of examples.

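The -shadow option takes its parameters in the following format, where the opacity is a percentage, the blur is the standard deviation (sigma) of the Gaussian blur, and the offsets are in pixels: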

-shadow <opacity>x<blur><+-xoffset><+-yoffset>
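Because the offsets carry explicit signs, building this argument from plain numbers is slightly fiddly. A minimal sketch, using a hypothetical shadow_geometry helper, relies on the + flag of printf, which always prints the sign of a number (so 15 becomes +15, 0 becomes +0 and -5 stays -5). It assumes integer values for all four parameters.

```bash
#!/usr/bin/env bash
# shadow_geometry OPACITY BLUR XOFF YOFF
# Hypothetical helper: builds the argument for ImageMagick's -shadow option,
# assuming integer inputs. "%+d" always prints the sign of the offset.
shadow_geometry() {
    printf '%dx%d%+d%+d\n' "$1" "$2" "$3" "$4"
}
```

For instance, -shadow "$(shadow_geometry 40 10 15 15)" expands to -shadow 40x10+15+15, the first shadow used in the script later in this post.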

Adapting those examples, the following command adds a drop shadow to an image:

# Clone the image (+clone) and turn the clone into a blurred shadow of the
# given color, put the shadow behind the original (+swap), merge both layers
# onto a transparent background, and reset the virtual canvas (+repage)
convert <input-file> \
    \( +clone -background <color-name> -shadow <opacity>x<blur><+-xoffset><+-yoffset> \) \
    +swap -background none -layers merge +repage \
    <output-file>
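The snippet above uses convert, the ImageMagick 6 command name. ImageMagick 7 provides a single magick front end with the same syntax and keeps convert mainly for backward compatibility. If the command has to work with both versions, one option is a small hypothetical im_cmd helper that prefers magick when it is available and otherwise falls back to convert:

```bash
#!/usr/bin/env bash
# im_cmd: hypothetical helper that prints the ImageMagick command to use.
# Prefers "magick" (ImageMagick 7) when it can be found; otherwise falls
# back to "convert" (ImageMagick 6). It does not check that convert exists.
im_cmd() {
    if command -v magick >/dev/null 2>&1; then
        echo magick
    else
        echo convert
    fi
}
```

The command would then start with "$(im_cmd)" <input-file> instead of convert <input-file>.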

Crafting the program

With the main logic in place, I wrote the following script:

#!/bin/bash

# Ensure correct number of arguments
if [ "$#" -ne 3 ]; then
    echo "Usage: $0 <file.pdf> <page-number> <output.png>" >&2
    exit 1
fi

PDF_FILE=$1
PAGE_NUMBER=$2
OUTPUT_IMAGE=$3

# Check that input file exists
if [ ! -f "${PDF_FILE}" ]; then
    echo "File ${PDF_FILE} not found." >&2
    exit 1
fi

# Convert the desired page to a PNG image; pdftoppm appends ".png" to the prefix
if ! pdftoppm -f "${PAGE_NUMBER}" -l "${PAGE_NUMBER}" -singlefile -png "${PDF_FILE}" "${OUTPUT_IMAGE%.*}"; then
    echo "pdftoppm failed to convert page ${PAGE_NUMBER}." >&2
    exit 1
fi

# Ensure the intermediate PNG image was created
if [ ! -f "${OUTPUT_IMAGE%.*}.png" ]; then
    echo "Failed to create PNG file." >&2
    exit 1
fi

# Apply two drop shadow effects with some blur using ImageMagick
convert "${OUTPUT_IMAGE%.*}.png" \
    \( -clone 0 -background black -shadow 40x10+15+15 \) \
    +swap -background none -layers merge +repage \
    \( -clone 0 -background black -shadow 40x10-5-5 \) \
    +swap -background none -layers merge +repage \
    "${OUTPUT_IMAGE}"

# Report whether convert succeeded, based on its exit status
if [ $? -eq 0 ]; then
    echo "Image processing complete: ${OUTPUT_IMAGE}"
else
    echo "Image processing failed." >&2
    exit 1
fi
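The script passes the page number to pdftoppm without checking it. A hypothetical is_valid_page helper, sketched below in POSIX-style shell, could reject malformed values early. The optional second argument is an upper bound, for instance the page count reported by pdfinfo, and is assumed to be numeric.

```bash
#!/usr/bin/env bash
# is_valid_page PAGE [PAGE_COUNT]
# Hypothetical helper: succeeds only if PAGE is a positive integer without
# leading zeros and, when PAGE_COUNT is given, PAGE <= PAGE_COUNT.
is_valid_page() {
    case "$1" in
        # Empty, contains a non-digit (including a sign), or starts with 0
        '' | *[!0-9]* | 0*) return 1 ;;
    esac
    # Enforce the upper bound only when a page count was supplied
    if [ -n "${2:-}" ] && [ "$1" -gt "$2" ]; then
        return 1
    fi
    return 0
}
```

It could be called right after the arguments are read, for example with is_valid_page "${PAGE_NUMBER}" || exit 1.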

Results

Here are some examples generated using the script above:

Enjoy!