This post shows how to build a useful PDF file editor in a few lines of simple Python code using the PyPDF2 module.
Before we start writing our program, we need to set up a few things:
- Install Python 3.x and add Python to your system PATH environment variable.
- Open a command prompt and type
python -m pip install PyPDF2
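This installs PyPDF2 into your Python package library. Please check that it installed correctly. Note that the class names used in this post (PdfFileReader, PdfFileWriter, PdfFileMerger) belong to PyPDF2 releases before 3.0. If pip gives you 3.0 or later, installing with python -m pip install "PyPDF2<3" keeps the code below working.
Now open your favorite text editor and write the first line of code for our program.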
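One way to check the installation is to ask Python itself whether the module can be found. This is a minimal sketch using only the standard library; the helper names is_installed and installed_version are made up for this post and are not part of PyPDF2.

```python
import importlib.util
from importlib import metadata


def is_installed(module_name):
    """Return True if `module_name` can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Both helpers are hypothetical names used only for this check.
    print("PyPDF2 importable:", is_installed("PyPDF2"))
    print("PyPDF2 version:", installed_version("PyPDF2"))
```

If the first line prints False, the install did not reach the Python interpreter you are running.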
from PyPDF2 import PdfFileReader, PdfFileWriter, PdfFileMerger
This imports three classes from the PyPDF2 module into our program. Next, create our class PdfGetOperator.
from PyPDF2 import PdfFileReader, PdfFileWriter, PdfFileMerger

class PdfGetOperator():
    def __init__(self, file):
        self.file = PdfFileReader(file)

    def numberOfPage(self):
        return self.file.getNumPages()

    def extractPage(self, pageNumList, outputDir="./", combine=False):
        # Accept a single page number as well as a list of page numbers.
        if not hasattr(type(pageNumList), "__iter__"):
            pageNumList = [pageNumList]
        if combine:
            # One writer collects every requested page into a single file.
            pdfWriter = PdfFileWriter()
            for page in pageNumList:
                pdfWriter.addPage(self.file.getPage(page))
            newPdf = open("%s/output-combine.pdf" %outputDir,"wb")
            pdfWriter.write(newPdf)
            newPdf.close()
        else:
            # A fresh writer per page produces one file per page.
            for page in pageNumList:
                pdfWriter = PdfFileWriter()
                pdfWriter.addPage(self.file.getPage(page))
                newPdf = open("%s/page-%s.pdf" %(outputDir,page),"wb")
                pdfWriter.write(newPdf)
                newPdf.close()

pdfFile = r"Your pdf file path"
pdf = PdfGetOperator(pdfFile)
print(pdf.numberOfPage())
# pdf.extractPage has two default arguments outputDir and combine.
# Page numbers are zero-based: 1, 3, 5 are the 2nd, 4th and 6th pages.
pdf.extractPage([1,3,5])
The first function, __init__, is the constructor of our PdfGetOperator class. It is called (run) when we instantiate the class, and it takes one argument (a PDF file).
The second function, numberOfPage, uses the getNumPages function to get the number of pages in the input PDF file. (Note: getNumPages belongs to the PyPDF2 PdfFileReader class; see the PdfFileReader docs for details.)
The extractPage function extracts pages from the input file. Its three arguments are:
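- pageNumList (a page number or a list of page numbers; getPage counts pages from 0)
- outputDir (output directory; the default "./" is the current working directory)
- combine (combine the output pages into a single PDF file; the default is False)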
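Because getPage counts pages from 0, page numbers read off a PDF viewer need to be shifted by one before they go into pageNumList. A small sketch of that conversion follows; to_zero_based is a hypothetical helper, not part of PyPDF2 or of the class above.

```python
def to_zero_based(human_pages, num_pages):
    """Hypothetical helper: convert 1-based page numbers to 0-based indices.

    Raises ValueError when a number falls outside 1..num_pages.
    """
    indices = []
    for number in human_pages:
        if not 1 <= number <= num_pages:
            raise ValueError("page %s is outside 1..%s" % (number, num_pages))
        indices.append(number - 1)
    return indices


# Pages 1, 3 and 5 as shown in a viewer, for a 5-page document.
print(to_zero_based([1, 3, 5], 5))
```

The result could then be passed to extractPage, with num_pages taken from numberOfPage().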
if not hasattr(type(pageNumList), "__iter__"):
    pageNumList = [pageNumList]
This statement checks whether the pageNumList input is iterable. If it is not, it is wrapped in a list.
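To see what this check does with different inputs, the same test can be pulled out into a standalone function. The name as_page_list is made up for this illustration.

```python
def as_page_list(pageNumList):
    """Wrap a single page number in a list; leave iterables untouched."""
    if not hasattr(type(pageNumList), "__iter__"):
        pageNumList = [pageNumList]
    return pageNumList


print(as_page_list(3))          # a bare int gets wrapped in a list
print(as_page_list([1, 3, 5]))  # a list passes through unchanged
print(as_page_list((0, 2)))     # so does a tuple
print(as_page_list("12"))       # caution: a str is iterable too, so it is not wrapped
```

The last case is worth keeping in mind: a page number passed as a string would be looped over character by character.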
if combine:
    pdfWriter = PdfFileWriter()
    for page in pageNumList:
        pdfWriter.addPage(self.file.getPage(page))
    newPdf = open("%s/output-combine.pdf" %outputDir,"wb")
    pdfWriter.write(newPdf)
    newPdf.close()
This piece of code runs when combine is True. To combine our pages, we use a single file writer object and add every page to it. Finally, we create a new PDF document and write the pages into it.
else:
    for page in pageNumList:
        pdfWriter = PdfFileWriter()
        pdfWriter.addPage(self.file.getPage(page))
        newPdf = open("%s/page-%s.pdf" %(outputDir,page),"wb")
        pdfWriter.write(newPdf)
        newPdf.close()
The else clause runs when combine is False. It looks almost the same as the if clause, except that the file writer is created inside the loop, so a new PDF is written for each page. That's all for our first class, PdfGetOperator.
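The two branches differ mainly in which files they produce. The sketch below lists those file names without touching any PDF; planned_outputs is a hypothetical helper, and it builds paths with os.path.join rather than the "%s/..." formatting used in the class, which avoids the doubled slash the default "./" would otherwise give.

```python
import os


def planned_outputs(pageNumList, outputDir="./", combine=False):
    """Hypothetical helper: list the files extractPage would write."""
    if combine:
        # One combined file, no matter how many pages are requested.
        return [os.path.join(outputDir, "output-combine.pdf")]
    # Otherwise one file per requested page.
    return [os.path.join(outputDir, "page-%s.pdf" % page) for page in pageNumList]


print(planned_outputs([1, 3, 5]))
print(planned_outputs([1, 3, 5], combine=True))
```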
The second class, PdfSetOperator, merges PDF files. Add this class to our program.
class PdfSetOperator():
    def __init__(self):
        pass

    def bindPage(self, *files, outputDir="./"):
        merger = PdfFileMerger()
        for file in files:
            # bind files
            merger.append(file)
        output = open("%s/output-binder.pdf" %outputDir,"wb")
        merger.write(output)
        output.close()

setpdf = PdfSetOperator()
files = [r"Your first pdf file path", r"Your second pdf file path"]
# Unpack the list so each path becomes its own argument to *files.
setpdf.bindPage(*files)
” bindPage ” function have two arguments, *files (multiple files) and default outputDir (output directory). Instantiate PdfFileMerger object to append our files, and write this pages into new pdf file. That’s the end of our program and i hope you find it useful.
You can download this code from here.