Read, merge, and split PDF files in Python

GemBox.Pdf is a .NET library that enables you to process PDF files from any .NET application. It is also accessible through COM, so you can use it from Python as well.

System Requirements

To use GemBox.Pdf in Python, you'll need to:

  1. Download and install GemBox.Pdf Setup.
  2. Expose GemBox.Pdf to COM Interop with the Regasm.exe tool:
    :: Add GemBox.Pdf to COM registry for x86 (32-bit) applications.
    C:\Windows\Microsoft.NET\Framework\v4.0.30319\RegAsm.exe [path to installed GemBox.Pdf.dll]
    
    :: Add GemBox.Pdf to COM registry for x64 (64-bit) applications.
    C:\Windows\Microsoft.NET\Framework64\v4.0.30319\RegAsm.exe [path to installed GemBox.Pdf.dll]
  3. Install the Python for Windows extensions (pywin32):
    :: Install Python extension for Windows.
    pip install pywin32
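
The two RegAsm.exe commands in step 2 target 32-bit and 64-bit applications, and here the application is the Python interpreter itself. As a minimal sketch, the hypothetical helper below (regasm_path is an illustrative name, not part of GemBox.Pdf or pywin32) picks the matching RegAsm.exe path from step 2 based on the pointer size of the running interpreter:

```python
import struct


def regasm_path(pointer_bits=None):
    """Return the RegAsm.exe path from step 2 that matches the given bitness.

    pointer_bits defaults to the pointer size of the running Python
    interpreter (32 or 64). This helper is hypothetical and only builds
    a path string; it does not run RegAsm.exe.
    """
    if pointer_bits is None:
        pointer_bits = struct.calcsize("P") * 8
    framework = "Framework64" if pointer_bits == 64 else "Framework"
    return "C:\\Windows\\Microsoft.NET\\" + framework + "\\v4.0.30319\\RegAsm.exe"


# Show which RegAsm.exe corresponds to the interpreter running this script.
print(regasm_path())
```

Running this with the same interpreter that will later call COM.Dispatch should indicate which of the two registrations that interpreter needs.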

Working with PDF files in Python

The following example shows how you can read a PDF file from Python, merge multiple PDF files into a single PDF file, and split a single PDF file into multiple PDF files.

Merging PDF files into one and splitting PDF pages into multiple PDF files in Python
Screenshot of merged PDF files and split PDF pages
import os
import win32com.client as COM

# Create ComHelper object.
comHelper = COM.Dispatch("GemBox.Pdf.ComHelper")
# If using Professional version, put your serial key below.
comHelper.ComSetLicense("FREE-LIMITED-KEY")

fileNames = ["\\MergeFile01.pdf", "\\MergeFile02.pdf", "\\MergeFile03.pdf"]

################
### Read PDF ###
################

# Load PDF file.
document1 = comHelper.Load(os.getcwd() + fileNames[0])
pages1 = document1.Pages

# Read text content from each PDF page.
for i1 in range(pages1.Count):
    page = pages1.Item(i1)
    print(page.Content.ToString() + "\n")

document1.Dispose()

#################
### Merge PDF ###
#################

# Create PdfDocument object.
document2 = COM.Dispatch("GemBox.Pdf.PdfDocument")

# Merge multiple PDF files into a single PDF file.
for fileName in fileNames:
    sourceDocument = comHelper.Load(os.getcwd() + fileName)
    sourcePages = sourceDocument.Pages

    for i2 in range(sourcePages.Count):
        document2.Pages.AddClone(sourcePages.Item(i2))

    sourceDocument.Dispose()

comHelper.Save(document2, os.getcwd() + "\\Merge Files.pdf")
document2.Dispose()

#################
### Split PDF ###
#################

# Load PDF file.
document3 = comHelper.Load(os.getcwd() + "\\Merge Files.pdf")
pages3 = document3.Pages

# Split a single PDF file into multiple PDF files.
for i3 in range(pages3.Count):
    destinationDocument = COM.Dispatch("GemBox.Pdf.PdfDocument")
    destinationDocument.Pages.AddClone(pages3.Item(i3))
    
    comHelper.Save(destinationDocument, os.getcwd() + "\\Page" + str(i3) + ".pdf")
    destinationDocument.Dispose()

document3.Dispose()
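
The split step above writes one file per page. As a variation, the following sketch splits a document into files of several pages each. It uses only the COM members already shown in the example (Load, Pages, Count, Item, AddClone, Save, Dispose). The names page_chunks, split_in_chunks, create_document, and the "Chunk" file prefix are hypothetical and chosen for illustration.

```python
import os


def page_chunks(page_count, pages_per_file):
    """Return a list of ranges of zero-based page indexes, one per output file."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [range(start, min(start + pages_per_file, page_count))
            for start in range(0, page_count, pages_per_file)]


def split_in_chunks(comHelper, create_document, source_path, pages_per_file):
    """Split the PDF at source_path into files of pages_per_file pages each.

    comHelper is the ComHelper object from the example above, and
    create_document is a callable that returns a new PdfDocument, e.g.
    lambda: COM.Dispatch("GemBox.Pdf.PdfDocument"). Both are assumptions
    about how the caller wires this helper up.
    """
    document = comHelper.Load(source_path)
    try:
        pages = document.Pages
        chunks = page_chunks(pages.Count, pages_per_file)
        for number, chunk in enumerate(chunks, start=1):
            destination = create_document()
            try:
                # Clone every page of this chunk into the new document.
                for index in chunk:
                    destination.Pages.AddClone(pages.Item(index))
                target = os.path.join(os.getcwd(), "Chunk" + str(number) + ".pdf")
                comHelper.Save(destination, target)
            finally:
                destination.Dispose()
    finally:
        # Release the source document even if saving a chunk fails.
        document.Dispose()
```

The index arithmetic lives in page_chunks so it can be checked without COM; for example, a five-page document split two pages per file yields the index groups 0-1, 2-3, and 4. The try/finally blocks are a design choice that releases the documents even when a call raises.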

Wrapper Library

Not all members of GemBox.Pdf are accessible through COM because of its limitations, such as the lack of support for static and overloaded methods. For that reason, you can use the ComHelper class, which provides alternatives for some members that cannot be called with COM Interop.

However, if you need to use many GemBox.Pdf members from Python, the recommended approach is to create a .NET wrapper library instead. Your wrapper library should do all the work internally and expose a minimal set of classes and methods to the unmanaged code.

This will enable you to take advantage of GemBox.Pdf's full capabilities, avoid any COM limitations, and improve performance by reducing the number of COM Callable Wrappers created at runtime.

Want more?


Check the next example or select an example from the menu. You can also download our examples from GitHub.


Like it?


If you want to try GemBox.Pdf yourself, you can download the free version. It delivers the same performance and set of features as the professional version, but with some operations limited. To remove the limitations, you need to purchase a license.