Document Structure

The PDF document structure specifies how the basic object types are used to represent components of a PDF document: pages, fonts, annotations and so forth.

The following sections give more information about the PDF document components currently implemented in GemBox.Pdf.

Document Structure

In GemBox.Pdf, the root of the document structure, as specified in PDF Specification ISO 32000-1:2008, section '7.7 Document Structure', is the PdfDocument type.

PdfDocument currently contains references to the following complex PDF components:

Pages - the page tree node that shall be the root of the document's page tree (see PDF Specification ISO 32000-1:2008, section '7.7.3 Page Tree').
Form - an interactive form (PDF 1.2) (sometimes referred to as an AcroForm) that is a collection of fields for gathering information interactively from the user (see PDF Specification ISO 32000-1:2008, section '12.7 Interactive Forms').
Outlines - the outline dictionary that shall be the root of the document's outline hierarchy (see PDF Specification ISO 32000-1:2008, section '12.3.3 Document Outline').
Metadata - a metadata stream that shall contain metadata for the document (see PDF Specification ISO 32000-1:2008, section '14.3.2 Metadata Streams').
Info - the document's information dictionary (see PDF Specification ISO 32000-1:2008, section '14.3.3 Document Information Dictionary').
ViewerPreferences - a viewer preferences dictionary specifying the way the document shall be displayed on the screen (see PDF Specification ISO 32000-1:2008, section '12.2 Viewer Preferences').
SecurityStore - document-wide security-related information (see ETSI EN 319 142-1, section '5.4.2 Document Security Store').
Id - an array of two byte strings constituting a file identifier (see PDF Specification ISO 32000-1:2008, section '14.4 File Identifiers').
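PdfDocument exposes all of these components as properties, but in the PDF file they are spread over two dictionaries. Info and Id are entries of the file trailer. The other components are entries of the document catalog, which the trailer references through its /Root key. SecurityStore is stored under the DSS key defined by ETSI EN 319 142-1. The following Python sketch records that mapping. The dictionaries and the locate function are hypothetical illustrations, not part of GemBox.Pdf.

```python
# Hypothetical illustration (not GemBox.Pdf API): where each PdfDocument
# component is stored in the raw PDF file.

# Entries of the document catalog (the dictionary referenced by trailer /Root).
CATALOG_KEYS = {
    "Pages": "Pages",
    "Form": "AcroForm",
    "Outlines": "Outlines",
    "Metadata": "Metadata",
    "ViewerPreferences": "ViewerPreferences",
    "SecurityStore": "DSS",  # Document Security Store, ETSI EN 319 142-1
}

# Entries of the file trailer itself.
TRAILER_KEYS = {
    "Info": "Info",
    "Id": "ID",
}


def locate(component: str) -> str:
    """Return a key path such as 'trailer /Root /Pages' for a PdfDocument property name."""
    if component in CATALOG_KEYS:
        return f"trailer /Root /{CATALOG_KEYS[component]}"
    if component in TRAILER_KEYS:
        return f"trailer /{TRAILER_KEYS[component]}"
    raise KeyError(f"unknown component: {component}")


for name in ["Pages", "Form", "SecurityStore", "Info", "Id"]:
    print(f"{name:18} -> {locate(name)}")
```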

Note

For each PDF component, the underlying PdfDictionary or PdfArray can be accessed, as explained in Objects. However, for some optional PDF components that have not been changed, the underlying PdfDictionary or PdfArray instance might be null.

Page Tree

The pages of a document are accessed through a structure known as the page tree, which defines the ordering of pages in the document.

Using the tree structure, conforming readers using only limited memory can quickly open a document containing thousands of pages.

The tree contains nodes of two types:

intermediate nodes, called page tree nodes, implemented in the PdfPages class, and
leaf nodes, called page objects, implemented in the PdfPage class.

Note

The simplest structure can consist of a single page tree node that references all of the document's page objects directly. However, to optimize application performance, a conforming writer can construct trees of a particular form, known as balanced trees.
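To make the balanced-tree idea concrete, the following Python sketch groups a flat list of pages bottom-up so that no page tree node has more than a fixed number of kids. The Page and Pages classes, the build_balanced and depth functions, and the fan-out of 4 are hypothetical choices for illustration. They are not the GemBox.Pdf PdfPage/PdfPages types and not a description of how any particular writer builds its tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Page:
    """Leaf node (page object) in this hypothetical model."""
    label: str
    parent: Optional["Pages"] = field(default=None, repr=False, compare=False)


@dataclass
class Pages:
    """Intermediate node (page tree node) in this hypothetical model."""
    kids: List[Union["Pages", Page]] = field(default_factory=list)
    parent: Optional["Pages"] = field(default=None, repr=False, compare=False)

    @property
    def count(self) -> int:
        # Mirrors the /Count entry: number of leaf pages beneath this node.
        return sum(k.count if isinstance(k, Pages) else 1 for k in self.kids)


def build_balanced(pages: List[Page], fanout: int = 4) -> Pages:
    """Group nodes level by level so that no node has more than `fanout` kids."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not pages:
        return Pages()
    level: List[Union[Pages, Page]] = list(pages)
    while True:
        parents: List[Pages] = []
        for i in range(0, len(level), fanout):
            node = Pages(kids=level[i:i + fanout])
            for kid in node.kids:
                kid.parent = node  # each kid points back to its page tree node
            parents.append(node)
        if len(parents) == 1:
            return parents[0]
        level = parents


def depth(node: Pages) -> int:
    """Number of page tree node levels from `node` down to the leaves."""
    return 1 + max((depth(k) for k in node.kids if isinstance(k, Pages)), default=0)


root = build_balanced([Page(str(n)) for n in range(1, 101)], fanout=4)
print(root.count, depth(root), len(root.kids))
```

With 100 pages and a fan-out of 4, the levels shrink from 100 pages to 25, 7, 2 and finally 1 root node. Reaching any page therefore takes only a few steps instead of a scan over a single 100-entry Kids array.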

GemBox.Pdf provides two views of the document's page tree:

flattened view - exposing leaf PdfPage nodes directly on the intermediate PdfPages node via interface and other PdfPages members.
tree view - exposing both intermediate and leaf nodes via the Kids property.
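The relationship between the two views can be sketched in Python. A depth-first, left-to-right walk over the Kids arrays yields the pages in document order, which is the flattened view. Indexing into that view can skip whole subtrees by their leaf counts, which is the role of the /Count entry of a page tree node. The classes and functions below (Page, Pages, leaf_count, iter_leaves, page_at) are hypothetical and only model the idea. They are not the GemBox.Pdf implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Page:
    label: str


@dataclass
class Pages:
    kids: List[Union["Pages", Page]] = field(default_factory=list)


def leaf_count(node: Pages) -> int:
    # A real PDF stores this value as /Count; it is recomputed here for brevity.
    return sum(leaf_count(k) if isinstance(k, Pages) else 1 for k in node.kids)


def iter_leaves(node: Pages) -> Iterator[Page]:
    """Flattened view: leaf pages in document order (depth-first, left to right)."""
    for kid in node.kids:
        if isinstance(kid, Pages):
            yield from iter_leaves(kid)
        else:
            yield kid


def page_at(node: Pages, index: int) -> Page:
    """Flattened indexing that skips whole subtrees using their leaf counts."""
    if index < 0:
        raise IndexError("page index out of range")
    for kid in node.kids:
        if isinstance(kid, Pages):
            n = leaf_count(kid)
            if index < n:
                return page_at(kid, index)
            index -= n
        elif index == 0:
            return kid
        else:
            index -= 1
    raise IndexError("page index out of range")


# Tree view: nested Kids arrays of uneven depth, mixing nodes and pages.
root = Pages([
    Pages([Page("1"), Page("2")]),
    Page("3"),
    Pages([Pages([Page("4")]), Page("5")]),
])
print([p.label for p in iter_leaves(root)])
print(page_at(root, 3).label)
```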

The base class for both the intermediate PdfPages node and the leaf PdfPage node is PdfPageObject, which contains the inheritable properties. If a PdfPages or PdfPage node does not set an inheritable property, the property value is inherited from the nearest ancestor PdfPages node that sets it. If no ancestor PdfPages node sets it, the property has its default value.
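The lookup rule for inheritable properties can be expressed as a short walk up the parent chain. The following Python sketch uses a hypothetical Node class and resolve function, not the GemBox.Pdf API. A node's own value wins, then the nearest ancestor's value, then a default. Only the default for Rotate (0, per ISO 32000-1, section 7.7.3) is included in the hypothetical DEFAULTS table. Properties without a default resolve to None here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical default table: Rotate defaults to 0 when no node sets it.
# Other inheritable properties are left out of this sketch.
DEFAULTS: Dict[str, Any] = {"Rotate": 0}


@dataclass
class Node:
    """A page tree node or page; `props` holds only values set directly on it."""
    props: Dict[str, Any] = field(default_factory=dict)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)


def resolve(node: Node, name: str) -> Any:
    """Return the effective value: own value, else nearest ancestor's, else default."""
    current: Optional[Node] = node
    while current is not None:
        if name in current.props:
            return current.props[name]
        current = current.parent
    return DEFAULTS.get(name)


root = Node({"MediaBox": [0, 0, 612, 792]})
rotated = Node({"Rotate": 90}, parent=root)
page_a = Node({}, parent=rotated)                           # inherits both values
page_b = Node({"MediaBox": [0, 0, 595, 842]}, parent=root)  # overrides MediaBox

print(resolve(page_a, "MediaBox"), resolve(page_a, "Rotate"))
print(resolve(page_b, "MediaBox"), resolve(page_b, "Rotate"))
```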

Cloning

Both the PDF document and its page tree nodes can be cloned.

To clone the entire document, use the Clone() method.

To clone a specific page, use either the PdfPages methods PdfPages.AddClone(PdfPage) and PdfPages.InsertClone(Int32, PdfPage), or the PdfPageObjectCollection methods PdfPageObjectCollection.AddClone(PdfPageObject) and PdfPageObjectCollection.InsertClone(Int32, PdfPageObject).
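Cloning a page into a different page tree raises one structural concern. Any inheritable property that the page did not set itself would otherwise be resolved against its new ancestors. The following Python sketch shows one way a cloning routine can handle this. It copies the page's own entries and materializes the inherited values of the inheritable page attributes (Resources, MediaBox, CropBox, Rotate; ISO 32000-1, section 7.7.3), then attaches the clone to the target node. The Node class and add_clone function are hypothetical. This is not a description of how GemBox.Pdf implements AddClone.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Inheritable page attributes (ISO 32000-1, section 7.7.3).
INHERITABLE = ("Resources", "MediaBox", "CropBox", "Rotate")


@dataclass
class Node:
    props: Dict[str, Any] = field(default_factory=dict)
    kids: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)


def add_clone(target: Node, page: Node) -> Node:
    """Hypothetical: deep-copy `page`, pin its inherited values, append it to `target`."""
    clone = Node(props=copy.deepcopy(page.props))
    for name in INHERITABLE:
        if name in clone.props:
            continue  # the page sets this attribute itself
        ancestor = page.parent
        while ancestor is not None:
            if name in ancestor.props:
                # Copy the inherited value so the new parent cannot change it.
                clone.props[name] = copy.deepcopy(ancestor.props[name])
                break
            ancestor = ancestor.parent
    clone.parent = target
    target.kids.append(clone)
    return clone


source_root = Node({"MediaBox": [0, 0, 612, 792], "Rotate": 90})
page = Node({"Contents": "content stream placeholder"}, parent=source_root)
source_root.kids.append(page)

target = Node({"MediaBox": [0, 0, 595, 842]})
cloned = add_clone(target, page)
print(cloned.props)
```

Without the materialization step, the cloned page would silently pick up the target's 595 x 842 MediaBox and lose its 90-degree rotation.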

For more information, see the Cloning example.
