The PDF document structure specifies how the basic object types are used to represent components of a PDF document: pages, fonts, annotations and so forth.
The following section gives more information about PDF document components currently implemented in GemBox.Pdf.
In GemBox.Pdf, the root of the document structure (see PDF Specification ISO 32000-1:2008, section '7.7 Document Structure') is the PdfDocument type.
PdfDocument currently contains references to the following complex PDF components:
Pages - the page tree node that shall be the root of the document’s page tree (see PDF Specification ISO 32000-1:2008, section '7.7.3 Page Tree').
Form - an interactive form (PDF 1.2) (sometimes referred to as an AcroForm) that is a collection of fields for gathering information interactively from the user (see PDF Specification ISO 32000-1:2008, section '12.7 Interactive Forms').
Outlines - the outline dictionary that shall be the root of the document’s outline hierarchy (see PDF Specification ISO 32000-1:2008, section '12.3.3 Document Outline').
Info - the document’s information dictionary (see PDF Specification ISO 32000-1:2008, section '14.3.3 Document Information Dictionary').
ViewerPreferences - a viewer preferences dictionary specifying the way the document shall be displayed on the screen (see PDF Specification ISO 32000-1:2008, section '12.2 Viewer Preferences').
Id - an array of two byte strings constituting a file identifier for the file (see PDF Specification ISO 32000-1:2008, section '14.4 File Identifiers').
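As a rough orientation, the members listed above can be matched to the dictionary entries that hold them in a PDF file. The sketch below is a plain Python lookup table, not GemBox.Pdf code. The PDF keys come from ISO 32000-1:2008, while the pairing of each GemBox.Pdf member with a key is inferred from the descriptions above and should be read as an assumption.

```python
# Illustrative lookup table (not GemBox.Pdf API).
# Keys are the PdfDocument member names listed above; values give the
# dictionary and entry that ISO 32000-1:2008 defines for that component.
# The member-to-entry pairing is an assumption based on the descriptions.
DOCUMENT_COMPONENTS = {
    "Pages": ("catalog", "/Pages"),
    "Form": ("catalog", "/AcroForm"),
    "Outlines": ("catalog", "/Outlines"),
    "ViewerPreferences": ("catalog", "/ViewerPreferences"),
    "Info": ("trailer", "/Info"),
    "Id": ("trailer", "/ID"),
}


def describe(member):
    """Return a one-line description of where a component is stored."""
    owner, key = DOCUMENT_COMPONENTS[member]
    return f"{member}: {key} entry of the {owner} dictionary"


if __name__ == "__main__":
    for name in DOCUMENT_COMPONENTS:
        print(describe(name))
```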
The pages of a document are accessed through a structure known as the page tree, which defines the ordering of pages in the document.
Using the tree structure, conforming readers using only limited memory can quickly open a document containing thousands of pages.
The tree contains nodes of two types:
intermediate nodes, called page tree nodes, implemented in the PdfPages class, and
leaf nodes, called page objects, implemented in the PdfPage class.
The simplest structure can consist of a single page tree node that references all of the document’s page objects directly.
However, to optimize application performance, a conforming writer can construct trees of a particular form, known as balanced trees.
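To illustrate why a balanced tree helps, here is a minimal Python sketch of a page tree in which every intermediate node stores the number of leaf pages below it (the role of the Count entry in a real file). The classes Page and PagesNode, the helper functions, and the fanout value are hypothetical names chosen for this example. They are not GemBox.Pdf types.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Page:
    """Leaf node (a page object)."""
    label: str


@dataclass
class PagesNode:
    """Intermediate node (a page tree node) with a cached leaf count."""
    kids: List[Union["PagesNode", Page]] = field(default_factory=list)
    count: int = 0


def _size(node):
    # A leaf counts as one page; an intermediate node reports its cached count.
    return node.count if isinstance(node, PagesNode) else 1


def build_balanced(pages, fanout=3):
    """Group nodes bottom-up, at most `fanout` kids per node, until one root remains."""
    level = list(pages)
    if not level:
        return PagesNode()
    while True:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [PagesNode(kids=g, count=sum(_size(k) for k in g)) for g in groups]
        if len(level) == 1:
            return level[0]


def find_page(node, index):
    """Return the page at zero-based `index`, skipping whole subtrees by their count."""
    if not 0 <= index < node.count:
        raise IndexError(index)
    for kid in node.kids:
        size = _size(kid)
        if index < size:
            return kid if isinstance(kid, Page) else find_page(kid, index)
        index -= size


if __name__ == "__main__":
    root = build_balanced([Page(f"p{i}") for i in range(10)])
    print(root.count, find_page(root, 7).label)
```

Because each node knows how many pages it contains, a lookup visits only one path from the root to a leaf instead of walking every page.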
GemBox.Pdf provides two views of the document’s page tree:
tree view - exposing both intermediate and leaf nodes via the PdfPages.Kids property.
The base class for both the intermediate PdfPages node and the leaf PdfPage node is PdfPageObject, which contains the inheritable properties. If a PdfPages or PdfPage node does not set an inheritable property, its value is inherited from the first ancestor PdfPages node that sets it; if no ancestor sets it, the property takes its default value.
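The lookup rule described above can be sketched in a few lines of Python. The Node class, its resolve method, and the DEFAULTS table are hypothetical and only model the rule; they do not reproduce GemBox.Pdf's implementation. Rotate and MediaBox are used because ISO 32000-1:2008 lists both as inheritable page attributes, and Rotate has a default of 0.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Defaults used when no node on the path to the root sets the property.
DEFAULTS = {"Rotate": 0}


@dataclass
class Node:
    """Hypothetical stand-in for a page tree node or a page object."""
    parent: Optional["Node"] = None
    own: Dict[str, Any] = field(default_factory=dict)

    def resolve(self, name):
        """Return the node's own value, else the nearest ancestor's, else the default."""
        node = self
        while node is not None:
            if name in node.own:
                return node.own[name]
            node = node.parent
        return DEFAULTS.get(name)


if __name__ == "__main__":
    root = Node(own={"MediaBox": (0, 0, 612, 792)})
    rotated_group = Node(parent=root, own={"Rotate": 90})
    page = Node(parent=rotated_group)
    print(page.resolve("Rotate"), page.resolve("MediaBox"))
```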
PDF document and page tree nodes can be cloned.
To clone the entire document, use the PdfDocument.Clone method.
To clone a specific page, use either the PdfPages methods AddClone(PdfPage) and InsertClone(Int32, PdfPage) or the PdfPageObjectCollection methods AddClone(PdfPageObject) and InsertClone(Int32, PdfPageObject).
For more information, see the Cloning example.
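Conceptually, cloning produces an independent copy of a node and places it at the end of a collection or at a given position. The Python sketch below models that idea with dictionaries and copy.deepcopy. The function names only echo AddClone and InsertClone for readability, and the assumption that a clone shares no mutable state with its source is this example's, not a statement about GemBox.Pdf internals.

```python
import copy


def add_clone(kids, page):
    """Append an independent deep copy of `page` to `kids` and return the copy."""
    clone = copy.deepcopy(page)
    kids.append(clone)
    return clone


def insert_clone(kids, index, page):
    """Insert an independent deep copy of `page` at `index` and return the copy."""
    clone = copy.deepcopy(page)
    kids.insert(index, clone)
    return clone


if __name__ == "__main__":
    source_page = {"MediaBox": [0, 0, 612, 792], "Rotate": 0}
    target_kids = []

    appended = add_clone(target_kids, source_page)
    appended["Rotate"] = 90  # changing the clone leaves the source untouched

    inserted = insert_clone(target_kids, 0, source_page)
    print(source_page["Rotate"], [p["Rotate"] for p in target_kids])
```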