Read, merge, split PDF in Python
GemBox.Pdf is a .NET library that enables you to process PDF files from any .NET application. It is also a COM-accessible library, so you can use it from Python as well.
System Requirements
To use GemBox.Pdf in Python, you'll need to:
- Download and install GemBox.Pdf Setup.
- Expose GemBox.Pdf to COM Interop with the Regasm.exe tool:
:: Add GemBox.Pdf to COM registry for x86 (32-bit) applications.
C:\Windows\Microsoft.NET\Framework\v4.0.30319\RegAsm.exe [path to installed GemBox.Pdf.dll]
:: Add GemBox.Pdf to COM registry for x64 (64-bit) applications.
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\RegAsm.exe [path to installed GemBox.Pdf.dll]
- Install the Python for Windows extensions (pywin32):
:: Install Python extension for Windows.
pip install pywin32
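RegAsm.exe comes in a 32-bit and a 64-bit version, and the registration you perform generally needs to match the bitness of the Python interpreter that will create the COM object. The following is a minimal sketch for checking which one applies. The helper names `python_bitness` and `regasm_path` are hypothetical, and the paths are simply the ones from the registration commands above.

```python
import struct

# RegAsm.exe locations copied from the registration commands above.
REGASM_PATHS = {
    32: r"C:\Windows\Microsoft.NET\Framework\v4.0.30319\RegAsm.exe",
    64: r"C:\Windows\Microsoft.NET\Framework64\v4.0.30319\RegAsm.exe",
}


def python_bitness() -> int:
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


def regasm_path(bits: int) -> str:
    """Return the RegAsm.exe path for the given bitness (32 or 64)."""
    return REGASM_PATHS[bits]


if __name__ == "__main__":
    bits = python_bitness()
    print(f"Python is {bits}-bit; register GemBox.Pdf with: {regasm_path(bits)}")
```

If the printed bitness does not match the RegAsm.exe you ran, a likely symptom is that `Dispatch` fails to find the class even though registration succeeded.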
Working with PDF files in Python
The following example shows how you can read a PDF file from Python, merge multiple PDF files into a single PDF and split a single PDF into multiple PDF files.

import os
import win32com.client as COM
# Create ComHelper object.
comHelper = COM.Dispatch("GemBox.Pdf.ComHelper")
# If using the Professional version, put your serial key below.
comHelper.ComSetLicense("FREE-LIMITED-KEY")
fileNames = ["\\MergeFile01.pdf", "\\MergeFile02.pdf", "\\MergeFile03.pdf"]
################
### Read PDF ###
################
# Load PDF file.
document1 = comHelper.Load(os.getcwd() + fileNames[0])
pages1 = document1.Pages
# Read text content from each PDF page.
for i1 in range(pages1.Count):
    page = pages1.Item(i1)
    print(page.Content.ToString() + "\n")
document1.Dispose()
#################
### Merge PDF ###
#################
# Create PdfDocument object.
document2 = COM.Dispatch("GemBox.Pdf.PdfDocument")
# Merge multiple PDF files into a single PDF file.
for fileName in fileNames:
    sourceDocument = comHelper.Load(os.getcwd() + fileName)
    sourcePages = sourceDocument.Pages
    for i2 in range(sourcePages.Count):
        document2.Pages.AddClone(sourcePages.Item(i2))
    sourceDocument.Dispose()
comHelper.Save(document2, os.getcwd() + "\\Merge Files.pdf")
document2.Dispose()
#################
### Split PDF ###
#################
# Load PDF file.
document3 = comHelper.Load(os.getcwd() + "\\Merge Files.pdf")
pages3 = document3.Pages
# Split a single PDF file into multiple PDF files.
for i3 in range(pages3.Count):
    destinationDocument = COM.Dispatch("GemBox.Pdf.PdfDocument")
    destinationDocument.Pages.AddClone(pages3.Item(i3))
    comHelper.Save(destinationDocument, os.getcwd() + "\\Page" + str(i3) + ".pdf")
    destinationDocument.Dispose()
document3.Dispose()
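The example above calls Dispose() by hand on every document, so an exception raised in between would skip the cleanup. One possible way to make this more robust is a small context manager, sketched below. The name `disposing` is a hypothetical helper, not part of GemBox.Pdf or pywin32; it only assumes the object has a Dispose() method, as the documents in the example do.

```python
from contextlib import contextmanager


@contextmanager
def disposing(com_object):
    """Yield com_object and call its Dispose() method on exit, even on error."""
    try:
        yield com_object
    finally:
        com_object.Dispose()


# Possible usage with the objects from the example above
# (requires Windows, pywin32, and a registered GemBox.Pdf):
#
# with disposing(comHelper.Load(os.getcwd() + fileNames[0])) as document:
#     print(document.Pages.Count)
```

Because the helper relies only on the presence of Dispose(), it can be exercised with a plain Python stand-in object when no COM server is available.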
Wrapper Library
Not all members of GemBox.Pdf are accessible through COM because of limitations such as unsupported static members and overloaded methods. That is why you can use the ComHelper class, which provides alternatives for some members that cannot be called with COM Interop.
However, if you need to use many GemBox.Pdf members from Python, the recommended approach is to create a .NET wrapper library instead. Your wrapper library should do all the work internally and expose a minimal set of classes and methods to unmanaged code.
This will enable you to take advantage of GemBox.Pdf's full capabilities, avoid any COM limitations, and improve performance by reducing the number of COM Callable Wrappers created at runtime.
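To illustrate what the Python side might look like with such a wrapper, here is a rough sketch. Everything specific in it is an assumption: the ProgID `MyCompany.PdfTools.Merger` and the `MergeFiles` method are hypothetical names for a wrapper class you would write and register yourself, and how the list of paths is marshaled depends on the signature you give that method. The `dispatch` parameter is injected so the function can be tried without a COM server.

```python
from typing import Callable, Sequence

# Hypothetical ProgID of a wrapper class that you write and register yourself.
WRAPPER_PROGID = "MyCompany.PdfTools.Merger"


def merge_with_wrapper(
    paths: Sequence[str],
    output_path: str,
    dispatch: Callable[[str], object],
) -> None:
    """Merge PDF files with a single COM call instead of one call per page."""
    wrapper = dispatch(WRAPPER_PROGID)
    # Hypothetical method: the wrapper loops over files and pages inside .NET,
    # so no page or document objects cross the COM boundary.
    wrapper.MergeFiles(list(paths), output_path)


# Possible usage on Windows with pywin32 and a registered wrapper:
#
# import win32com.client
# merge_with_wrapper(["a.pdf", "b.pdf"], "merged.pdf", win32com.client.Dispatch)
```

Compared with the merge loop in the example above, which makes several COM calls per page, this shape keeps the per-page work on the .NET side.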