Handling Multiple-Page PDF Document

Overview

Matplotlib provides modules and functions for writing multi-page PDF documents, although the interface is minimal. Plots are saved as pages of a PDF through the class matplotlib.backends.backend_pdf.PdfPages. This article covers how to create a PDF document, add a figure, draw plots on the figure, and write the PDF to disk.

Introduction

Matplotlib can save a plot in a variety of formats (pdf, png, svg, jpg, etc.). This article discusses how to handle multiple-page PDF documents in matplotlib. Keep in mind that Matplotlib is a scientific plotting package, not a document composition system such as LaTeX, so its support for multiple pages is fairly minimal.

Need for Multi-page PDF

Matplotlib provides handy tools for adding multiple plots to a figure, such as Figure.add_subfigure() and Figure.subfigures(). Matplotlib also provides functions to save these plots. By default, a figure is shown in a minimalistic user interface that lets you save it to a file. However, this approach is inconvenient when you generate many figures or large ones. To save multiple plots on separate figures in a single file, we can write them as pages of one PDF document.

How to Handle Multiple-Page PDF Document

How does it Work?

We use the PDF backend in matplotlib to save plots in PDF format. First, we import the PdfPages class from the backend module.

from matplotlib.backends.backend_pdf import PdfPages

Then we can instantiate the PDF document. Here is the syntax for the same.

pdf = PdfPages('Filename.pdf')

For each new page, we create a new Figure instance and add it to the document with pdf.savefig(): one page, one figure. After plotting the graphs on the pages, we must write the PDF document to the disk by closing it. Here is the syntax for the same.

pdf.close()
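The three steps above (instantiate, save one figure per page, close) can be put together in a minimal sketch. The output path, the plotted values, and the use of the non-interactive Agg backend are assumptions made only for this illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Hypothetical output location, chosen only for this sketch.
out_path = os.path.join(tempfile.gettempdir(), "two_pages.pdf")

pdf = PdfPages(out_path)              # 1. instantiate the PDF document

for n in (1, 2):                      # 2. one figure per page
    fig = plt.figure(figsize=(6, 4))
    plt.plot([0, 1, 2], [0, n, 2 * n])
    plt.title(f"Page {n}")
    pdf.savefig(fig)                  # append the figure as a new page
    plt.close(fig)                    # free the figure once it is saved

page_count = pdf.get_pagecount()      # number of pages saved so far
pdf.close()                           # 3. write the document to disk
```

PdfPages can also be used in a with statement (with PdfPages(out_path) as pdf: ...), in which case the document is closed automatically when the block ends.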

Using PdfPages():

Syntax

matplotlib.backends.backend_pdf.PdfPages(filename, keep_empty=True, metadata=None) is a class for writing multi-page PDF documents in matplotlib. It provides the following methods:

  1. attach_note(text, positionRect=[-100, -100, 0, 0])
  2. close()
  3. get_pagecount()
  4. infodict()
  5. savefig(figure=None, **kwargs)
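As a rough illustration of how these methods fit together, the sketch below attaches a note to a page, saves the page, edits the information dictionary, and reads the page count. The file name, note text, and title are made up for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Hypothetical output location for this sketch.
out_path = os.path.join(tempfile.gettempdir(), "methods_demo.pdf")

with PdfPages(out_path) as pdf:
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 8])

    # attach_note() adds a text note to the page that is saved next.
    pdf.attach_note("Doubling sequence")
    pdf.savefig(fig)
    plt.close(fig)

    # infodict() returns a modifiable dictionary of document information.
    info = pdf.infodict()
    info["Title"] = "Method demo"

    # get_pagecount() reports how many pages have been saved so far.
    pages = pdf.get_pagecount()
# Leaving the with block calls close() and writes the file.
```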

Parameters of backend_pdf.PdfPages()

Parameter | Description
filename | The name of the PDF file. A relative name is resolved against the current working directory (often the folder of the Python script); an absolute path saves the file to a specific location.
keep_empty | If set to False, empty PDF files are deleted automatically when the document is closed.
metadata | A dictionary of information about the PDF document, such as its title and author.
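A small sketch of passing filename and metadata follows. The metadata values are invented for the example, and keep_empty is left at its default here (newer Matplotlib releases appear to deprecate it, so check the version you have installed).

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# filename: an absolute path, so the location does not depend on the
# current working directory. The path itself is hypothetical.
out_path = os.path.join(tempfile.gettempdir(), "with_metadata.pdf")

# metadata: document information stored in the PDF (illustrative values).
doc_info = {
    "Title": "Quarterly plots",
    "Author": "Example Author",
    "Subject": "Multi-page PDF demo",
}

with PdfPages(out_path, metadata=doc_info) as pdf:
    fig, ax = plt.subplots()
    ax.bar(["a", "b", "c"], [3, 1, 2])
    pdf.savefig(fig)
    plt.close(fig)

# Read back the first bytes to confirm a PDF file was written.
with open(out_path, "rb") as handle:
    header = handle.read(4)
```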

Code Examples

Example 1:

Output: [Figure: Handling multiple-page PDF document]

Code Explanation:

  • Import the required modules: PdfPages, pyplot, and random.
  • Instantiate the PDF document using the PdfPages class.
  • Define a method that generates one PDF page with a plot on it:
    • random.randint() generates a random integer within a range; it is combined with a for loop to build a list of random numbers.
    • Create the figure using matplotlib.pyplot.figure(), setting its resolution.
    • Clear the current figure using matplotlib.pyplot.clf().
    • Add details to the figure using matplotlib.pyplot.title(), matplotlib.pyplot.xlabel(), and matplotlib.pyplot.ylabel().
    • Plot the data using matplotlib.pyplot.scatter().
    • Save the plot to the PDF using the savefig() method.
  • A for loop calls the method to add pages to the PDF document.
  • Finally, write the PDF file to disk by closing it.
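The steps above can be sketched as follows. This is a reconstruction from the explanation, not the original listing: the file name, the helper name add_page, the value ranges, the figure size, and the number of pages are all assumptions.

```python
import os
import random
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Hypothetical output file for this sketch.
out_path = os.path.join(tempfile.gettempdir(), "scatter_pages.pdf")
pdf = PdfPages(out_path)


def add_page(page_number, n_points=20):
    """Draw one scatter plot of random integers and save it as a page."""
    # A for loop (here a comprehension) over randint builds each list.
    x = [random.randint(0, 100) for _ in range(n_points)]
    y = [random.randint(0, 100) for _ in range(n_points)]

    fig = plt.figure(figsize=(8, 6), dpi=100)  # new figure with a set resolution
    plt.clf()                                  # clear the current figure
    plt.title(f"Scatter plot {page_number}")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.scatter(x, y)

    pdf.savefig(fig)                           # save the figure as a page
    plt.close(fig)


# Call the helper once per page (three pages is an arbitrary choice).
for page in range(1, 4):
    add_page(page)

n_pages = pdf.get_pagecount()
pdf.close()                                    # write the PDF to disk
```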

Example 2:

Output: [Figure: Handling multiple-page PDF document]

Code Explanation:

  • Import the required modules.
  • Generate normally distributed data using the numpy.random.randn() function.
  • Instantiate the PDF document; the file name is Sample_file_1.pdf.
  • The number of pages is five, and the number of plots per page is four.
  • In general, page count = number of plots / plots per page, rounded up.
  • Set a tuple as the grid, with rows = plots per page and columns = 1.
  • Then, in the for loop:
    • The first if block creates a new figure whenever the previous page's plots have all been drawn, and sets the page number using matplotlib.pyplot.suptitle().
    • A bar chart is plotted on each subplot.
    • The second if block saves the figure to the PDF once the page is complete.
  • Then, write the PDF document to the disk.
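A sketch consistent with these steps is shown below. It is a reconstruction, not the original listing: it assumes 20 plots in total (five pages of four plots), 10 samples per plot, an A4-sized figure, and an output file placed in the temporary directory.

```python
import math
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages

n_pages = 5
plots_per_page = 4
n_plots = n_pages * plots_per_page        # assumption: every page is full
data = np.random.randn(n_plots, 10)       # one row of normal samples per plot

# Page count = number of plots / plots per page, rounded up.
pages_count = math.ceil(n_plots / plots_per_page)
grid_size = (plots_per_page, 1)           # rows = plots per page, one column

out_path = os.path.join(tempfile.gettempdir(), "Sample_file_1.pdf")
pdf = PdfPages(out_path)

for i, samples in enumerate(data):
    # First if block: start a new page (figure) when the previous one is full.
    if i % plots_per_page == 0:
        fig = plt.figure(figsize=(8.27, 11.69), dpi=100)
        fig.suptitle(f"Page {i // plots_per_page + 1} of {pages_count}")

    # Draw a bar chart in the next free row of the grid.
    ax = plt.subplot2grid(grid_size, (i % plots_per_page, 0), fig=fig)
    ax.bar(range(len(samples)), samples)

    # Second if block: save the page once it is full or the data runs out.
    if (i + 1) % plots_per_page == 0 or (i + 1) == n_plots:
        fig.tight_layout(rect=(0, 0, 1, 0.96))  # leave room for the suptitle
        pdf.savefig(fig)
        plt.close(fig)

saved_pages = pdf.get_pagecount()
pdf.close()                               # write the PDF document to disk
```

The check on (i + 1) == n_plots matters when the number of plots is not a multiple of plots_per_page: without it, the last, partly filled page would never be saved.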

Conclusion

  • First, we import the PdfPages class.
  • To instantiate the PDF document, we call PdfPages() with a file name.
  • To add a new page to the PDF, we create a new figure and save it using the savefig() method.
  • We can add subfigures and grid specs to a figure to arrange its subplots in different layouts.
  • Finally, we write the PDF data to disk using the pdf.close() method.