Objects

A PDF document is a data structure composed of a small set of basic types of data objects.

These basic types of objects are implemented in the GemBox.Pdf.Objects namespace.

The following sections describe the essential properties of the basic objects and their implementation in the GemBox.Pdf assembly:

Objects

A PDF includes eight basic types of objects (ordered based on complexity):

  1. PdfBasicObject.Null,

  2. PdfBoolean,

  3. PdfInteger and PdfNumber,

  4. PdfName,

  5. PdfString,

  6. PdfArray,

  7. PdfDictionary and

  8. PdfStream.

Additionally, GemBox.Pdf defines one more basic object type: PdfIndirectObject.

The null object

The null object represents the absence of a value. It is a singleton whose only instance can be obtained with the PdfBasicObject.Null property.

The null object is usually used in PdfArray to specify that the array’s element has no value (depending on the context, it usually means that the default value of the array’s element should be used). The null object is rarely used in PdfDictionary because specifying the null object as the value of a dictionary entry is equivalent to omitting the entry entirely.

Boolean values

Boolean values represent the logical values true and false. They are singletons whose only two instances can be obtained with the PdfBoolean.False and PdfBoolean.True properties.

Their Boolean value can be obtained with the PdfBoolean.Value property.

Integer and Real numbers

Integer numbers represent mathematical integers. They can be created with the PdfInteger.Create(Int32) method, and their Int32 value can be obtained with the PdfInteger.Value property. To minimize memory usage, integer numbers created from an implementation-defined interval around zero will always return the same PdfInteger instance.

Real numbers represent mathematical real numbers. They can be created with the PdfNumber.Create(Double) method, and their Double value can be obtained with the PdfNumber.Value property. If the input Double value is actually an integer (the Double cast to Int32 is equal to the original Double), then the created number will be a PdfInteger instance.

PdfInteger extends PdfNumber so it can be used in any place where PdfNumber can be used.
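The following is a minimal sketch of the behavior described above, assuming the GemBox.Pdf assembly is referenced. The class and method names are illustrative only; the calls used (PdfInteger.Create, PdfNumber.Create and the Value properties) are the ones named in this section.

```csharp
using System;
using GemBox.Pdf.Objects;

static class NumbersSketch
{
    static void Run()
    {
        // An integer number and its Int32 value.
        PdfInteger count = PdfInteger.Create(3);
        Console.WriteLine(count.Value);

        // A real number and its Double value.
        PdfNumber width = PdfNumber.Create(595.276);
        Console.WriteLine(width.Value);

        // According to the text above, an integral Double should yield a PdfInteger instance.
        PdfNumber whole = PdfNumber.Create(3.0);
        Console.WriteLine(whole is PdfInteger);

        // PdfInteger extends PdfNumber, so it can be used wherever a PdfNumber is expected.
        PdfNumber number = count;
        Console.WriteLine(number.Value);
    }
}
```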

Names

A name is an atomic symbol uniquely defined by a sequence of any characters (8-bit values) except null (character code 0).

A name can be created with the PdfName.Create(String) method, and its String value can be obtained with the PdfName.ToString method.

UTF-8 encoding is used to encode the input String value to PdfName and to decode the PdfName to output String value.

Names are predominantly used as keys in PdfDictionary.
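As a small illustration, the sketch below creates names and uses one as a dictionary key. It assumes the GemBox.Pdf assembly is referenced; the class name, method name and the "Type"/"Catalog" values are chosen only for the example.

```csharp
using System;
using GemBox.Pdf.Objects;

static class NamesSketch
{
    static void Run()
    {
        // Create a name from a .NET string (encoded as UTF-8, as described above).
        PdfName typeKey = PdfName.Create("Type");

        // Get the String value of the name back.
        Console.WriteLine(typeKey.ToString());

        // Names are mostly used as dictionary keys, but a name can also be a value.
        PdfDictionary dictionary = PdfDictionary.Create();
        dictionary[typeKey] = PdfName.Create("Catalog");
    }
}
```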

Strings

A string consists of a series of zero or more bytes.

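A string can be created with one of the PdfString.Create overloads, such as PdfString.Create(String) or PdfString.Create(String, IPdfEncoding, PdfStringForm).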

Its String value can be obtained with:

A string can be written in two forms:

  • Literal - as a sequence of literal characters enclosed in parentheses, and

  • Hexadecimal - as hexadecimal data enclosed in angle brackets < >.

Use the PdfString.Create(String, IPdfEncoding, PdfStringForm) method to create a string in the specified form, and use the PdfString.Form property to get the form of the string.
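A short sketch of creating strings in both forms follows, assuming the GemBox.Pdf assembly is referenced. PdfEncoding.ASCII and PdfStringForm.Literal appear in the example at the end of this topic; the PdfStringForm.Hexadecimal member name is an assumption derived from the form names listed above, and the class and method names are illustrative.

```csharp
using System;
using GemBox.Pdf.Objects;
using GemBox.Pdf.Text;

static class StringsSketch
{
    static void Run()
    {
        // Create a string from a .NET string, leaving encoding and form to the library.
        PdfString title = PdfString.Create("Hello");
        Console.WriteLine(title.Form);

        // Create a string written as literal characters enclosed in parentheses.
        PdfString literal = PdfString.Create("Hello", PdfEncoding.ASCII, PdfStringForm.Literal);
        Console.WriteLine(literal.Form);

        // Create a string written as hexadecimal data enclosed in angle brackets.
        // Assumption: the enumeration member is named Hexadecimal.
        PdfString hexadecimal = PdfString.Create("Hello", PdfEncoding.ASCII, PdfStringForm.Hexadecimal);
        Console.WriteLine(hexadecimal.Form);
    }
}
```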

Arrays

An array is a one-dimensional collection of objects arranged sequentially. Arrays may be heterogeneous; that is, an array’s elements may be any combination of PdfNumbers, PdfStrings, PdfDictionaries or any other PdfBasicObjects, including other PdfArrays.

An array may have zero elements. Only one-dimensional arrays are directly supported. Arrays of higher dimensions can be constructed by using arrays as elements of arrays, nested to any depth.

An array can be created with any of the PdfArray.Create methods.

PdfArray implements IList<T>, IList and all of their descendant interfaces that you can use to work with an array.
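The sketch below builds a heterogeneous, nested array through the list interface. It assumes the GemBox.Pdf assembly is referenced, that a parameterless PdfArray.Create overload exists (as it does for PdfDictionary in the example at the end of this topic), and that the element type of the list interface is PdfBasicObject. The class and method names are illustrative.

```csharp
using System;
using GemBox.Pdf.Objects;

static class ArraysSketch
{
    static void Run()
    {
        // Assumption: a parameterless Create overload is available.
        PdfArray destination = PdfArray.Create();

        // Assumption: elements are typed as PdfBasicObject, so any basic object can be added.
        destination.Add(PdfName.Create("XYZ"));
        destination.Add(PdfInteger.Create(72));
        destination.Add(PdfNumber.Create(720.5));

        // The null object marks an element that has no value.
        destination.Add(PdfBasicObject.Null);

        // Arrays can contain other arrays, which is how higher dimensions are built.
        PdfArray outer = PdfArray.Create();
        outer.Add(destination);

        foreach (PdfBasicObject element in destination)
            Console.WriteLine(element.ObjectType);
    }
}
```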

Dictionaries

A dictionary is an associative table containing pairs of objects, known as the dictionary’s entries. The first element of each entry is the key and the second element is the value. The key is always of type PdfName. The value may be any kind of PdfBasicObject, including another PdfDictionary.

A PdfDictionaryEntry whose value is PdfBasicObject.Null is treated as if the entry did not exist. A dictionary may have zero entries. The entries in a dictionary are unordered. Multiple entries in the same dictionary cannot have the same key.

A dictionary can be created with any of the PdfDictionary.Create methods.

PdfDictionary implements IDictionary<TKey, TValue>, IDictionary and all of their descendant interfaces that you can use to work with a dictionary.
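A minimal sketch of working with a dictionary follows, assuming the GemBox.Pdf assembly is referenced. The indexer and PdfDictionary.Create() are used the same way in the example at the end of this topic, which also relies on a missing entry reading as the null object; treat that last behavior as an assumption here. The class name, method name and key names are illustrative.

```csharp
using System;
using GemBox.Pdf.Objects;

static class DictionariesSketch
{
    static void Run()
    {
        PdfDictionary info = PdfDictionary.Create();

        // Keys are always names; values can be any basic object.
        PdfName titleKey = PdfName.Create("Title");
        info[titleKey] = PdfString.Create("Quarterly report");

        // Read the entry back and check its type without casting.
        PdfBasicObject title = info[titleKey];
        Console.WriteLine(title.ObjectType);

        // Assumption: reading a missing entry yields the null object rather than throwing,
        // which matches how an entry with a null value is treated.
        PdfBasicObject author = info[PdfName.Create("Author")];
        Console.WriteLine(author.ObjectType == PdfBasicObjectType.Null);
    }
}
```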

Streams

A stream is a (potentially large) sequence of bytes. A stream can specify filters that indicate whether and how the data in the stream should be transformed (decoded) before it is used. All streams are indirect objects, meaning that they can be contained only as the value of the PdfIndirectObject.Value property.

A stream can be created with the PdfStream.Create method.

A stream’s data (either decoded or encoded) can be read or written with the PdfStream.Open(PdfStreamDataMode, PdfStreamDataState) method. A stream’s extent (the number of bytes of its encoded data) can be obtained with the PdfStream.Length property. A stream’s data can be encoded (usually compressed) by specifying various PdfFilters in the PdfStream.Filters property. Other stream properties can be obtained from the PdfStream.Dictionary property.

GemBox.Pdf takes care to maintain integrity between a stream’s data and the Filters required for decoding and encoding the data. If the Filters with which the data was encoded are changed before the decoded data is read, an InvalidOperationException is thrown. Additionally, PdfStream.Length is updated automatically when the stream’s data is written.

Note

GemBox.Pdf currently does not directly support streams whose data is contained in an external file. Still, these streams can be created with GemBox.Pdf by using the PdfStream.Dictionary property and specifying the F and, optionally, FFilter and FDecodeParms entries.

Indirect Objects

Any PdfBasicObject, except PdfIndirectObject, can be contained in a PdfIndirectObject. This gives the object the ability to be contained in multiple places (for example, as an element of a PdfArray and as the value of a PdfDictionary entry) because its container (PdfIndirectObject) has the ability to be contained in multiple places.

If a PdfIndirectObject is read from a PDF file or is written to a PDF file, then it will have a unique object identifier that can be obtained with the PdfIndirectObject.Id property and that consists of an object number and a generation number.

Together, these two numbers uniquely identify an indirect object in a PDF file. If a PdfIndirectObject is not associated with any PDF file, then its Id will be PdfIndirectObjectIdentifier.Undefined.

Note

The PdfIndirectObject.Id property should be used only as a diagnostics and debugging aid because its value might change if the PDF file is closed or when saving a PDF document to a new PDF file.

Unlike PDF Specification ISO 32000-1:2008, which also defines an indirect reference with which an indirect object may be referred to from elsewhere in the file, GemBox.Pdf defines only PdfIndirectObject, which serves as both the definition of an indirect object and an indirect reference.

GemBox.Pdf maps each indirect object and indirect reference in a PDF file that share the same object identifier to the same PdfIndirectObject instance, and vice versa. If a PDF file contains an invalid indirect reference (one whose object identifier cannot be located in the PDF file), the reference is handled according to its location: in a PdfArray, it is mapped to the PdfBasicObject.Null instance; as the value of a PdfDictionary entry, the PdfDictionaryEntry is not added.

This GemBox.Pdf feature makes indirect objects and indirect references easier to work with, and it minimizes memory usage.

An indirect object can be created with any of the PdfIndirectObject.Create methods. Its value can be obtained or set with the PdfIndirectObject.Value property.
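The sketch below shows one indirect object contained in two places, assuming the GemBox.Pdf assembly is referenced. PdfIndirectObject.Create(PdfDictionary) and the Value property are used the same way in the example at the end of this topic; the class name, method name and key names are illustrative.

```csharp
using System;
using GemBox.Pdf.Objects;

static class IndirectObjectsSketch
{
    static void Run()
    {
        // A mutable container that should be reachable from more than one place.
        PdfDictionary shared = PdfDictionary.Create();
        shared[PdfName.Create("Kind")] = PdfName.Create("Example");

        // Wrap it in an indirect object so that it can be contained in multiple places.
        PdfIndirectObject indirect = PdfIndirectObject.Create(shared);

        PdfDictionary first = PdfDictionary.Create();
        PdfDictionary second = PdfDictionary.Create();
        PdfName sharedKey = PdfName.Create("Shared");
        first[sharedKey] = indirect;
        second[sharedKey] = indirect;

        // Unwrap the value when the contained object is needed.
        PdfDictionary value = (PdfDictionary)indirect.Value;
        Console.WriteLine(value.ObjectType);

        // The identifier is meant for diagnostics only; this object is not yet associated with a PDF file.
        Console.WriteLine(indirect.Id);
    }
}
```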

Caution

PdfIndirectObject.Value is obtained lazily if the PdfIndirectObject is associated with a PDF file. This means that the value is parsed from the PDF file only when it is requested for the first time. This feature enables GemBox.Pdf to read and update the PDF file quickly.

Set a PdfIndirectObject.Value only if you are sure that the PdfIndirectObject is not referenced from any other place.

GemBox.Pdf never sets the Value of an existing PdfIndirectObject because the PdfIndirectObject might be referenced from several places. Instead, a new PdfIndirectObject is created, and its Value is set.

Class Hierarchy

Basic object types are the leaf nodes of the GemBox.Pdf basic object type class hierarchy, which is shown in the following picture:

[Figure: GemBox.Pdf.Objects class hierarchy]

The following subsections describe the base classes from the above class hierarchy:

PdfBasicObject

PdfBasicObject is a base class for all basic PDF objects.

It provides the PdfBasicObject.ObjectType property, which tests whether a basic object is of a specified type faster than the as or is operators, and the PdfBasicObject.ToString method, which returns a String representation of the basic object and is intended primarily for debugging.

PdfBasicValue

PdfBasicValue is a base class for all immutable basic PDF objects.

A PdfBasicValue instance is immutable and can therefore be shared (contained in multiple PdfDictionary or PdfArray objects), which reduces memory usage.

A PdfBasicValue instance is thread-safe.

PdfBasicValue implements value equality by requiring all derived types to implement the Equals(Object) and GetHashCode methods.

PdfBasicContainer

PdfBasicContainer is a base class for all mutable basic PDF objects.

A PdfBasicContainer instance is mutable and therefore cannot be shared (contained in multiple PdfDictionary or PdfArray objects). The only exception is a PdfIndirectObject instance, which is also mutable but can be shared.

A PdfBasicContainer instance is not thread-safe.

PdfBasicContainer implements reference equality by making its implementations of the Equals(Object) and GetHashCode methods sealed.

Additionally, a PdfBasicContainer instance might be in a read-only state, meaning that neither it nor any of its descendant PdfBasicContainers can be changed anymore. This enables faster implementations of other features, such as maintaining integrity between a PdfStream’s data and the Filters required for decoding and encoding the data. Use the PdfBasicContainer.IsReadOnly property to test whether an object is in a read-only state.

If you want to change a PdfBasicContainer instance that is in a read-only state, or you want to use a similar instance in another location, use one of the PdfBasicContainer.Clone methods and perform the requested operations on the returned instance.

PdfBasicCollection

PdfBasicCollection is a base class for all basic PDF objects that are collections (PdfArray and PdfDictionary).

It implements the ICollection interface and is a data source for PDF Document Structure components.

Usage in GemBox.Pdf

To use basic PDF objects in GemBox.Pdf, you must first get the underlying PdfDictionary of a PdfDocument or any other PDF Document Structure component.

This is accomplished by importing the GemBox.Pdf.Objects namespace with the statement using GemBox.Pdf.Objects; for C# or Imports GemBox.Pdf.Objects for VB.NET. This namespace import exposes the extension methods defined in PdfObjectExtensions, which can be used on any PDF Document Structure component to get the underlying PdfDictionary or PdfArray.

Here is an example of how to use PdfObjectExtensions to set a conforming product’s private data in a page-piece dictionary associated with the document. The example also shows how to use various basic PDF objects, such as PdfName, PdfString, PdfDictionary and PdfIndirectObject. For more information about page-piece dictionaries in PDF, see PDF Specification ISO 32000-1:2008, section '14.5 Page-Piece Dictionaries'.

using System;
using System.Globalization;
using GemBox.Pdf.Objects;
using GemBox.Pdf.Text;

namespace GemBox.Pdf.Examples
{
    class ObjectsExample
    {
        void SetPrivateDataOnDocument(PdfDocument document)
        {
            // Get document's trailer dictionary.
            var trailer = document.GetDictionary();

            // Get document catalog dictionary from the trailer.
            var catalog = (PdfDictionary)((PdfIndirectObject)trailer[PdfName.Create("Root")]).Value;

            // Either retrieve 'PieceInfo' entry value from document catalog or create page-piece dictionary and set it to document catalog under 'PieceInfo' entry.
            PdfDictionary pieceInfo;
            var pieceInfoKey = PdfName.Create("PieceInfo");
            var pieceInfoValue = catalog[pieceInfoKey];
            switch (pieceInfoValue.ObjectType)
            {
                case PdfBasicObjectType.Dictionary:
                    pieceInfo = (PdfDictionary)pieceInfoValue;
                    break;
                case PdfBasicObjectType.IndirectObject:
                    pieceInfo = (PdfDictionary)((PdfIndirectObject)pieceInfoValue).Value;
                    break;
                case PdfBasicObjectType.Null:
                    pieceInfo = PdfDictionary.Create();
                    catalog[pieceInfoKey] = PdfIndirectObject.Create(pieceInfo);
                    break;
                default:
                    throw new InvalidOperationException("PieceInfo entry must be dictionary.");
            }

            // Create page-piece data dictionary for 'GemBox.Pdf' conforming product and set it to page-piece dictionary.
            var data = PdfDictionary.Create();
            pieceInfo[PdfName.Create("GemBox.Pdf")] = data;

            // Create private data dictionary that will hold private data that 'GemBox.Pdf' conforming product understands.
            var privateData = PdfDictionary.Create();
            data[PdfName.Create("Data")] = privateData;

            // Set 'Title' and 'Version' entries to private data.
            privateData[PdfName.Create("Title")] = PdfString.Create(ComponentInfo.Title);
            privateData[PdfName.Create("Version")] = PdfString.Create(ComponentInfo.Version);

            // Specify date of the last modification of 'GemBox.Pdf' private data (required by PDF specification).
            data[PdfName.Create("LastModified")] = PdfString.Create("D:" + DateTimeOffset.Now.ToString("yyyyMMddHHmmssK", CultureInfo.InvariantCulture).Replace(':', '\'') + "'", PdfEncoding.ASCII, PdfStringForm.Literal);
        }
    }
}