Create PDF Tools with PyPDF2

This post shows how to build a useful PDF file editor in a few lines of simple code, using Python and the PyPDF2 module.

Before we start writing our program, we need to set up a few things first:

  • Install Python 3.x.x and add your Python path to the system environment variables.
  • Open a command prompt and type
python -m pip install PyPDF2

This installs PyPDF2 into your Python package library. Please check that it installed correctly.
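
One way to run that check from Python itself is to ask for the installed version. This is a minimal sketch using only the standard library (Python 3.8 or newer); the helper name installed_version is made up for illustration. The version number is worth a look too: the class names used in this post (PdfFileReader, PdfFileWriter, PdfFileMerger) come from the older PyPDF2 API, and as far as I know releases from 3.0 onward renamed them, so a release before 3.0 may be needed to follow along as written.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Prints the version string, or None if the package is not installed.
print(installed_version("PyPDF2"))
```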

Now open your favorite text editor and write the first line of code for our program.

from PyPDF2 import PdfFileReader, PdfFileWriter, PdfFileMerger

This imports three classes from the PyPDF2 module into our program. Next, create our class PdfGetOperator.

from PyPDF2 import PdfFileReader, PdfFileWriter, PdfFileMerger
class PdfGetOperator():
    def __init__(self, file):
        self.file = PdfFileReader(file)
    def numberOfPage(self):
        return self.file.getNumPages()

    def extractPage(self, pageNumList, outputDir="./", combine=False):
        # Accept a single page number as well as a list of them.
        if not hasattr(type(pageNumList), "__iter__"):
            pageNumList = [pageNumList]
        if combine:
            # One writer collects every page; the file is written once at the end.
            pdfWriter = PdfFileWriter()
            for page in pageNumList:
                pdfWriter.addPage(self.file.getPage(page))
            newPdf = open("%s/output-combine.pdf" % outputDir, "wb")
            pdfWriter.write(newPdf)
            newPdf.close()
        else:
            # A fresh writer per page gives one output file per page.
            for page in pageNumList:
                pdfWriter = PdfFileWriter()
                pdfWriter.addPage(self.file.getPage(page))
                newPdf = open("%s/page-%s.pdf" % (outputDir, page), "wb")
                pdfWriter.write(newPdf)
                newPdf.close()

pdfFile = r"Your pdf file path"
pdf = PdfGetOperator(pdfFile)
print(pdf.numberOfPage())
# pdf.extractPage has two default arguments outputDir and combine.
pdf.extractPage([1,3,5])

The first function, __init__, is the constructor of our PdfGetOperator class. It runs when we instantiate the class, which takes one argument (a PDF file).

The second function, numberOfPage, uses the getNumPages function to get the number of pages in the input PDF file. (Note: getNumPages belongs to the PyPDF2 PdfFileReader class; for further details, see the PdfFileReader docs.)
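
The page count is also handy for validating page numbers before extracting them. PyPDF2's getPage counts pages from zero, so a reader who thinks in "page 1, page 3" terms needs a small conversion. The sketch below is plain Python and does not touch PyPDF2; the helper name to_zero_based and its total_pages parameter (the value numberOfPage would return) are assumptions made for illustration.

```python
def to_zero_based(pages, total_pages):
    """Convert 1-based page numbers to the 0-based indexes getPage expects."""
    indexes = []
    for number in pages:
        if not 1 <= number <= total_pages:
            raise ValueError("page %s is outside 1..%s" % (number, total_pages))
        indexes.append(number - 1)
    return indexes

# For a 6-page document, pages 1, 3 and 5 become indexes 0, 2 and 4.
print(to_zero_based([1, 3, 5], 6))
```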

The extractPage function extracts pages from the input file. Its three arguments are:

  1. pageNumList (list of page numbers, counted from zero)
  2. outputDir (output directory; the default "./" is the current working directory)
  3. combine (combine the output pages into a single PDF file)

if not hasattr(type(pageNumList), "__iter__"):
    pageNumList = [pageNumList]

This statement checks whether the pageNumList input is iterable. If it is not, it is wrapped in a list.
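
To see the check in isolation, here is a small sketch that pulls it into a standalone function. The name as_page_list is hypothetical, and unlike the original it copies iterables into a new list. One caveat: a string is iterable as well, so "12" would come back as ['1', '2'] rather than being treated as a single page.

```python
def as_page_list(pageNumList):
    """Wrap a single page number in a list; copy any iterable into a list."""
    if not hasattr(type(pageNumList), "__iter__"):
        return [pageNumList]
    return list(pageNumList)

print(as_page_list(4))          # a bare int is wrapped in a list
print(as_page_list((1, 3, 5)))  # a tuple is copied into a list
```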

if combine:
    pdfWriter = PdfFileWriter()
    for page in pageNumList:
        pdfWriter.addPage(self.file.getPage(page))
    newPdf = open("%s/output-combine.pdf" % outputDir, "wb")
    pdfWriter.write(newPdf)
    newPdf.close()

If combine is True, this piece of code runs. To combine our pages we use a single file writer object and add every page to it. Finally, we create the new PDF document and write our pages to it.

else:
    for page in pageNumList:
        pdfWriter = PdfFileWriter()
        pdfWriter.addPage(self.file.getPage(page))
        newPdf = open("%s/page-%s.pdf" % (outputDir, page), "wb")
        pdfWriter.write(newPdf)
        newPdf.close()

The else clause runs when combine is False. It looks almost the same as the if clause, except that the file writer is created inside the loop and a new PDF is written for each page. That’s all for our first class, PdfGetOperator.
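
The open, write, close steps in both branches can also be wrapped in a with block, which closes the file even if the write fails. The sketch below shows that pattern for the one-file-per-page case. So that it runs without a PDF at hand, it uses a made-up FakeWriter class in place of PdfFileWriter (it only mimics the write(stream) method), and save_page is likewise a hypothetical helper, not part of the program above.

```python
import os
import tempfile

class FakeWriter:
    """Stand-in for PdfFileWriter: only mimics the write(stream) method."""
    def __init__(self, payload):
        self.payload = payload

    def write(self, stream):
        stream.write(self.payload)

def save_page(writer, outputDir, page):
    """Write `writer` to <outputDir>/page-<page>.pdf and return the path."""
    path = os.path.join(outputDir, "page-%s.pdf" % page)
    with open(path, "wb") as newPdf:  # closed even if write() raises
        writer.write(newPdf)
    return path

# Demo in a temporary directory: one output file per requested page.
with tempfile.TemporaryDirectory() as outputDir:
    paths = [save_page(FakeWriter(b"demo"), outputDir, page) for page in [1, 3, 5]]
    print([os.path.basename(p) for p in paths])
```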

The second class, PdfSetOperator, merges PDF files. Add this class to our program.

class PdfSetOperator():
    def __init__(self):
        pass
    def bindPage(self, *files, outputDir="./"):
        merger = PdfFileMerger()
        for file in files:
            # bind files in the order they were passed
            merger.append(file)
        output = open("%s/output-binder.pdf" % outputDir, "wb")
        merger.write(output)
        output.close()

setpdf = PdfSetOperator()
# Pass each file path as its own argument, or unpack a tuple or list with *.
setpdf.bindPage(r"Your first pdf file path", r"Your second pdf file path")

The bindPage function takes two arguments: *files (any number of files) and outputDir (the output directory, "./" by default; because it follows *files, it must be passed by keyword). It instantiates a PdfFileMerger object, appends our files to it, and writes the pages into a new PDF file. That’s the end of our program, and I hope you find it useful.
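
Since bindPage opens each path in turn, a path that does not exist will make the merge fail partway through. A quick check before binding can catch that early. This is only a sketch using the standard library; missing_files is a hypothetical helper, and the demo creates a placeholder file rather than a real PDF.

```python
import os
import tempfile

def missing_files(*files):
    """Return the paths from `files` that do not exist on disk, in order."""
    return [path for path in files if not os.path.isfile(path)]

with tempfile.TemporaryDirectory() as folder:
    real = os.path.join(folder, "a.pdf")
    with open(real, "wb") as handle:
        handle.write(b"placeholder")  # not a real PDF, it only needs to exist
    ghost = os.path.join(folder, "b.pdf")  # never created
    print(missing_files(real, ghost) == [ghost])
```

If the returned list is empty, the same paths can be handed to bindPage.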

You can download this code from here.

 
