File Structure

The PDF file structure determines how objects are stored in a PDF file, how they are accessed, and how they are updated. This structure is independent of the semantics of the objects.

The following section describes how GemBox.Pdf provides loading and saving of a PDF document to a PDF file, efficient random access of objects in a PDF file, incremental updates of a PDF file and other PDF file-related functionalities.

File Structure

The implementation of the PDF file structure is currently not exposed through the GemBox.Pdf interface.

The interface to the PDF file structure in GemBox.Pdf is exposed through the following members:

The following subsections give more details about each PDF file-structure-related operation in GemBox.Pdf.

Creating a new PDF document

To create a new in-memory PDF document, use the PdfDocument constructor.

It will create an empty PDF document. To make the PDF document valid, at least one page should be added to it after its creation.

This PDF document is contained entirely in memory and is not associated with any PDF file.

PdfIndirectObject instances created with GemBox.Pdf will have an Id equal to PdfIndirectObjectIdentifier.Undefined until they are written to a PDF file.
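As a minimal sketch of the steps above, the following creates an in-memory document and adds one page to it. The license key shown is an assumption (the free, page-limited mode); replace it with your own key if you have one.

```csharp
using System;
using GemBox.Pdf;

// Assumption: the free-limited key is sufficient for a one-page document.
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

// The new document exists only in memory and is not tied to any PDF file.
using var document = new PdfDocument();

// An empty document is not a valid PDF, so add at least one page.
document.Pages.Add();

Console.WriteLine(document.Pages.Count);
```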

Loading a PDF document from a PDF file

A PdfDocument can be loaded from a PDF file either by specifying a path to the PDF file via the PdfDocument.Load(String) or PdfDocument.Load(String, PdfLoadOptions) methods, or by specifying a PDF file stream via the PdfDocument.Load(Stream) or PdfDocument.Load(Stream, PdfLoadOptions) methods.

Overloads that do not accept PdfLoadOptions as a parameter use PdfLoadOptions.Default.

A PDF document can be loaded in read-only mode to prevent accidental changes to the PDF file. For more information, see the PdfLoadOptions.ReadOnly property.

PdfIndirectObject instances read from the PDF file will have a unique Id that is different from PdfIndirectObjectIdentifier.Undefined. PdfIndirectObject instances created with GemBox.Pdf will have an Id equal to PdfIndirectObjectIdentifier.Undefined until they are written to a PDF file.

Important

A loaded PDF document is associated with the PDF file from which it was loaded, and the PDF file remains open until the PdfDocument.Close or PdfDocument.Dispose method is called.

Any PDF document that is associated with a PDF file should be closed (disposed); otherwise, memory and resource leaks might occur because the PDF file stream might not be closed until the application exits.
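The loading overloads and the disposal requirement can be sketched as follows. The file name Sample.pdf and the license key are assumptions; the sketch first writes a small file so that it is self-contained. It also assumes that PdfLoadOptions can be created with an object initializer and that its ReadOnly property is settable.

```csharp
using System;
using System.IO;
using GemBox.Pdf;

ComponentInfo.SetLicense("FREE-LIMITED-KEY");

// Create a small one-page file so there is something to load (hypothetical name).
using (var seed = new PdfDocument())
{
    seed.Pages.Add();
    seed.Save("Sample.pdf");
}

// Load from a path; PdfLoadOptions.Default is used implicitly.
using (var document = PdfDocument.Load("Sample.pdf"))
{
    Console.WriteLine(document.Pages.Count);
} // Dispose closes the associated PDF file.

// Load from a stream in read-only mode to guard against accidental changes.
using (var stream = File.OpenRead("Sample.pdf"))
using (var document = PdfDocument.Load(stream, new PdfLoadOptions() { ReadOnly = true }))
{
    Console.WriteLine(document.Pages.Count);
}
```

Wrapping each loaded document in a using statement is one way to make sure the associated file is released even if an exception is thrown.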

Loading a PDF document fully into memory

A PDF file associated with the loaded PdfDocument must remain open because GemBox.Pdf reads the PDF file in a lazy fashion (indirect object values are parsed from the PDF file only when they are requested for the first time). This feature enables GemBox.Pdf to perform fast reading and updating of the PDF file.

If you want to dispose of the associated PDF file but still be able to fully use the PdfDocument instance, the PdfDocument instance must first be fully loaded from the associated PDF file into memory by using the PdfDocument.Load() instance method. The associated PDF file can then be disposed of as explained in the next subsection.

Effectively, this operation will read the PDF file in an eager fashion (values of all indirect objects accessible from the PdfDocument instance will be, if not already, parsed at that point).

If you do not load the PdfDocument instance into memory before disposing of the associated PDF file, an exception might occur when an indirect object's value is requested for the first time and cannot be parsed from the closed PDF file.

Closing the associated PDF file

If a PdfDocument instance will no longer be used but is associated with a PDF file (either because it was loaded from a PDF file or saved to a PDF file), it must be closed or disposed by calling the PdfDocument.Close or PdfDocument.Dispose method.

This operation will close the associated PDF file.

At this point, all PdfIndirectObject instances in the PDF document will have an Id equal to PdfIndirectObjectIdentifier.Undefined until they are written to a PDF file.

The PdfDocument is still fully usable at this point if it was loaded into memory beforehand, as explained in the preceding subsection.

Note

Closing or disposing the PdfDocument instance with the Close or Dispose method does not mean that the PdfDocument instance can no longer be used. It only means that the associated PDF file is closed and that the PdfDocument instance is no longer associated with any PDF file.
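The combination of an eager load followed by closing the file can be sketched like this. The file name Eager.pdf and the license key are assumptions, and the file is created first only to keep the sketch self-contained.

```csharp
using System;
using GemBox.Pdf;

ComponentInfo.SetLicense("FREE-LIMITED-KEY");

// Create a small one-page file so there is something to load (hypothetical name).
using (var seed = new PdfDocument())
{
    seed.Pages.Add();
    seed.Save("Eager.pdf");
}

// Loading is lazy by default: the file stays open and objects are parsed on demand.
var document = PdfDocument.Load("Eager.pdf");

// Eagerly parse every indirect object reachable from the document.
document.Load();

// Close the associated file; the document is no longer tied to it.
document.Close();

// The in-memory document remains usable after the file is closed.
Console.WriteLine(document.Pages.Count);
```

Without the call to the Load instance method, the last line might throw if the page tree had not been parsed before the file was closed.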

Saving the PDF document to a new PDF file

A PdfDocument can be saved to a new PDF file by specifying a path to the PDF file via the PdfDocument.Save(String) method or by specifying a PDF file stream via the PdfDocument.Save(Stream) method.

All save operations on the PdfDocument use the same PdfSaveOptions instance, specified in the PdfDocument.SaveOptions property, to control the details of the output PDF file structure. If the PdfDocument.SaveOptions property is not specified, it will be set to a copy of the current PdfSaveOptions.Default.

Various PDF file structure settings can be specified via PdfSaveOptions. Among them is PdfSaveOptions.CrossReferenceType, which specifies whether the output PDF file is compressed, that is, whether the information about the location of the indirect objects and the indirect objects themselves are written compactly and compressed. For more details, see the PdfCrossReferenceType enumeration. Note that some settings apply only when a PDF document is saved to a new PDF file, while others apply only when it is saved to the same PDF file, as explained in the next subsection.

After the save operation, PdfIndirectObject instances whose Id was PdfIndirectObjectIdentifier.Undefined will have a unique Id that is different from PdfIndirectObjectIdentifier.Undefined.

Important

A saved PDF document is associated with the PDF file to which it was saved, and the PDF file remains open until the PdfDocument.Close or PdfDocument.Dispose method is called.

Any PDF document that is associated with a PDF file should be closed (disposed); otherwise, memory and resource leaks might occur because the PDF file stream might not be closed until the application exits.
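A minimal sketch of saving to a new file with an explicit cross-reference setting might look like the following. The file name Output.pdf and the license key are assumptions. The sketch also assumes that the PdfCrossReferenceType enumeration has a Stream member that selects the compressed form; check the enumeration's documentation before relying on it.

```csharp
using System;
using System.IO;
using GemBox.Pdf;

ComponentInfo.SetLicense("FREE-LIMITED-KEY");

using (var document = new PdfDocument())
{
    document.Pages.Add();

    // Assumption: Stream selects the compact, compressed cross-reference form.
    document.SaveOptions.CrossReferenceType = PdfCrossReferenceType.Stream;

    // Save to a new file; the document is now associated with Output.pdf.
    document.Save("Output.pdf");
} // Dispose closes Output.pdf.

Console.WriteLine(File.Exists("Output.pdf"));
```

The PdfDocument.Save(Stream) overload can be used in the same way when the destination is a stream rather than a path.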

Saving the PDF document to the same PDF file (incremental update)

Changes made to the PdfDocument after the load or the last save operation can be saved to the same PDF file by using the parameterless PdfDocument.Save() method.

GemBox.Pdf is able to automatically determine what objects have been changed, and if no object has been changed, then the PDF file won’t be updated.

An incremental save operation on the PdfDocument uses the PdfSaveOptions instance specified in the PdfDocument.SaveOptions property to control the details of the structure appended to the PDF file together with the changed and new objects. If the PdfDocument.SaveOptions property is not specified, it will be set to a copy of the current PdfSaveOptions.Default.

Note that some settings are applicable only if a PDF document is saved to a new PDF file, while others are applicable only if it is saved to the same PDF file.

Tip

Using the incremental update is the preferred way of making small changes to (potentially) large PDF files because it utilizes less memory.

After the incremental save operation, PdfIndirectObject instances whose Id was PdfIndirectObjectIdentifier.Undefined will have a unique Id that is different from PdfIndirectObjectIdentifier.Undefined.

Important

A PDF document is associated with the PDF file to which it was incrementally updated, and the PDF file remains open until the PdfDocument.Close or PdfDocument.Dispose method is called.

Any PDF document that is associated with a PDF file should be closed (disposed); otherwise, memory and resource leaks might occur because the PDF file stream might not be closed until the application exits.
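An incremental update can be sketched as a load, a change, and a parameterless save. The file name Incremental.pdf and the license key are assumptions, and the file is created first only so that the sketch is self-contained.

```csharp
using System;
using GemBox.Pdf;

ComponentInfo.SetLicense("FREE-LIMITED-KEY");

// Create a small one-page file to update (hypothetical name).
using (var seed = new PdfDocument())
{
    seed.Pages.Add();
    seed.Save("Incremental.pdf");
}

// Load the file, change the document, and append the changes to the same file.
using (var document = PdfDocument.Load("Incremental.pdf"))
{
    document.Pages.Add();

    // Parameterless Save writes only changed and new objects to the loaded file.
    document.Save();
} // Dispose closes Incremental.pdf.

// Reload to inspect the updated file.
using (var check = PdfDocument.Load("Incremental.pdf"))
{
    Console.WriteLine(check.Pages.Count);
}
```

If nothing had been changed between the load and the save, the file would be left untouched, as described above.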

See Also