Here you can download the latest version of the DocxReader demo WPF application, together with its C# source code:
Download (last updated on 2013-09-10)

DocxReader demo WPF application


A DOCX file is actually a zipped group of files and folders, called a package. A package consists of package parts (files that contain any type of data, such as text, images or binary content) and relationship files. Each package part has a unique URI name, and the relationship XML files refer to the parts by these URIs.

When you open a DOCX file with a zipping application, you can see the document structure and the parts of its package.

DOCX document structure
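Because a DOCX file is an ordinary ZIP archive, its parts can also be listed with any ZIP API. The following is a minimal sketch and not part of the demo application: it uses System.IO.Compression.ZipArchive instead of System.IO.Packaging, and it builds a tiny in-memory archive whose entry names only mimic a DOCX layout. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

static class DocxZipListing
{
    // Builds a small in-memory ZIP whose entry names mimic a DOCX package.
    // The entry content is a placeholder, not valid WordprocessingML.
    public static byte[] BuildSampleArchive()
    {
        using (var buffer = new MemoryStream())
        {
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (var name in new[] { "[Content_Types].xml", "_rels/.rels", "word/document.xml" })
                {
                    using (var writer = new StreamWriter(zip.CreateEntry(name).Open()))
                        writer.Write("<placeholder/>");
                }
            }

            // The archive is complete only after the ZipArchive has been disposed.
            return buffer.ToArray();
        }
    }

    // Returns the entry names of a ZIP stream; a real DOCX stream works the same way.
    public static List<string> ListEntries(Stream zipStream)
    {
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read))
            return zip.Entries.Select(entry => entry.FullName).ToList();
    }

    public static void Main()
    {
        foreach (var name in ListEntries(new MemoryStream(BuildSampleArchive())))
            Console.WriteLine(name);
    }
}
```

To inspect a real document, pass a stream opened with File.OpenRead to ListEntries instead of the in-memory sample.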

The main content of a DOCX file is stored in the package part document.xml, which is often located in the word directory, but it does not have to be. To find the URI (location) of document.xml, we read the relationships XML file inside the _rels directory and look for the relationship whose type identifies the main document part.

DOCX document main part
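As a rough illustration of this lookup, the sketch below parses a relationships file and returns the target of the main document relationship. It makes several assumptions: the sample markup is a hypothetical /_rels/.rels file, the relationship type used is the officeDocument type defined by Office Open XML, and the lookup is done with LINQ to XML rather than with the System.IO.Packaging classes that the demo application relies on. The class and member names are invented for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class RelationshipLookup
{
    // Relationship type that marks the main document part of an Office Open XML package.
    public const string OfficeDocumentType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // Namespace of the package relationships markup.
    static readonly XNamespace Rel =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Hypothetical content of a /_rels/.rels file.
    public const string SampleRels =
        "<Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'>" +
        "<Relationship Id='rId2' Target='docProps/core.xml' " +
        "Type='http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties'/>" +
        "<Relationship Id='rId1' Target='word/document.xml' " +
        "Type='http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument'/>" +
        "</Relationships>";

    // Returns the Target of the main document relationship, or null if there is none.
    public static string FindMainDocumentTarget(string relsXml)
    {
        return XDocument.Parse(relsXml)
            .Descendants(Rel + "Relationship")
            .Where(r => (string)r.Attribute("Type") == OfficeDocumentType)
            .Select(r => (string)r.Attribute("Target"))
            .FirstOrDefault();
    }

    public static void Main()
    {
        Console.WriteLine(FindMainDocumentTarget(SampleRels)); // word/document.xml
    }
}
```

Matching on the relationship type instead of a hard-coded path is what allows document.xml to live outside the word directory.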

The document.xml file contains XML elements defined primarily in the WordprocessingML XML namespace of the Office Open XML specification. Its basic structure is a document (<document>) element that contains a body (<body>) element. The body consists of one or more block-level elements such as paragraph (<p>) elements. A paragraph contains one or more inline-level elements such as run (<r>) elements. A run contains one or more text content elements such as text (<t>), break (<br>, used for line and page breaks) and tab (<tab>) elements.
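To make this hierarchy concrete, here is a small sketch that walks a hand-written document.xml fragment and collects its plain text. It is an assumption-laden illustration, not the DocxReader implementation: the sample markup and all names are invented, and the sketch treats every <tab> as a tab character, so it would also react to the tab stop definitions that can appear inside paragraph properties in real documents.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class PlainTextSketch
{
    // WordprocessingML main namespace.
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical document.xml content with two paragraphs.
    public const string Sample =
        "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
        "<w:body>" +
        "<w:p><w:r><w:t>Hello</w:t><w:tab/><w:t>World</w:t></w:r></w:p>" +
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>" +
        "</w:body></w:document>";

    public static string ExtractText(string documentXml)
    {
        var text = new StringBuilder();
        bool insideText = false; // true while the reader is inside a <w:t> element

        using (var reader = XmlReader.Create(new StringReader(documentXml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.NamespaceURI != W) break;
                        if (reader.LocalName == "t") insideText = !reader.IsEmptyElement;
                        else if (reader.LocalName == "tab") text.Append('\t');
                        else if (reader.LocalName == "br") text.Append('\n');
                        break;

                    case XmlNodeType.Text:
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        if (insideText) text.Append(reader.Value);
                        break;

                    case XmlNodeType.EndElement:
                        if (reader.NamespaceURI != W) break;
                        if (reader.LocalName == "t") insideText = false;
                        else if (reader.LocalName == "p") text.Append('\n'); // end of paragraph
                        break;
                }
            }
        }

        return text.ToString();
    }

    public static void Main()
    {
        Console.Write(ExtractText(Sample));
    }
}
```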


In short, to retrieve and display the text content of a DOCX file, the application uses two classes: DocxReader and its subclass DocxToFlowDocumentConverter.

DocxReader unzips the file with the help of the System.IO.Packaging namespace, finds the document.xml file through its relationship and reads it with XmlReader.

DocxToFlowDocumentConverter converts the XML elements from XmlReader into the corresponding WPF FlowDocument elements.


The DocxReader constructor first opens (unzips) the package from the DOCX file stream and retrieves the mainDocumentPart (document.xml) with the help of its PackageRelationship.

After retrieving the document.xml PackagePart, we can read it with .NET’s XmlReader class, a fast forward-only XML reader that visits nodes in the same order as a depth-first traversal of a tree data structure.

DOCX document XmlReader path trajectory

The first path, 1 to 4, shows the simplest way of retrieving text from a paragraph element. The second path, 5 – …, shows more complex paragraph content. On this path, we also read paragraph properties (<pPr>) and run properties (<rPr>), which contain various formatting options.
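The visiting order can be observed directly. The sketch below is an illustration with invented names and a simplified, namespace-free fragment: it records each start tag in the order in which XmlReader reports it, indented by depth. The result is a depth-first, document-order walk in which each paragraph is fully read before the next one starts.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class TraversalOrder
{
    // Simplified stand-in for document.xml (namespaces omitted for brevity).
    public const string Sample =
        "<document><body>" +
        "<p><r><t>a</t></r></p>" +
        "<p><pPr/><r><rPr/><t>b</t></r></p>" +
        "</body></document>";

    // Records every start tag in the order XmlReader reports it, indented by depth.
    public static List<string> VisitOrder(string xml)
    {
        var order = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    order.Add(new string(' ', reader.Depth * 2) + reader.LocalName);
            }
        }
        return order;
    }

    public static void Main()
    {
        foreach (var line in VisitOrder(Sample))
            Console.WriteLine(line);
    }
}
```

For the sample above, the elements are reported as document, body, p, r, t, p, pPr, r, rPr, t.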

We create a series of reading methods for every element we wish to support in this path trajectory.

protected virtual void ReadDocument(XmlReader reader)
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.NamespaceURI ==
            WordprocessingMLNamespace && reader.LocalName == BodyElement)
        {
            ReadXmlSubtree(reader, this.ReadBody);
        }
    }
}

private void ReadBody(XmlReader reader) {...}
private void ReadBlockLevelElement(XmlReader reader) {...}
protected virtual void ReadParagraph(XmlReader reader) {...}
private void ReadInlineLevelElement(XmlReader reader) {...}
protected virtual void ReadRun(XmlReader reader) {...}
private void ReadRunContentElement(XmlReader reader) {...}
protected virtual void ReadText(XmlReader reader) {...}

A few things to notice in the DocxReader reading methods:

  • We use XmlNameTable to store XML namespace, element and attribute names. This gives us cleaner code and also better performance. XmlReader uses atomized strings from the XmlNameTable for its LocalName and NamespaceURI properties, so comparing them with our stored names can succeed on an object (reference) comparison instead of a more expensive string (value) comparison. This works because .NET uses string interning and implements string equality by checking reference equality first and value equality second.
  • We use the XmlReader.ReadSubtree method when passing the XmlReader into a specific DocxReader reading method, which creates a boundary around that XML element. The reading method then has access only to that specific XML element, rather than to the entire document.xml. This has some performance penalty, which we traded for safer and more intuitive code.
private static void ReadXmlSubtree(XmlReader reader, Action<XmlReader> action)
{
    using (var subtreeReader = reader.ReadSubtree())
    {
        // Position on the first node.
        subtreeReader.Read();
        if (action != null)
            action(subtreeReader);
    }
}
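The effect of atomized names can be checked with a few lines of code. This is a minimal sketch with invented names, assuming a reader created through XmlReaderSettings with an explicit NameTable: the string returned by NameTable.Add and the LocalName reported by the reader are expected to be the very same object, so a reference comparison is enough.

```csharp
using System;
using System.IO;
using System.Xml;

static class AtomizedNames
{
    public static bool BodyNameIsAtomized()
    {
        var nameTable = new NameTable();

        // Add returns the atomized (canonical) string instance for this name.
        string bodyElement = nameTable.Add("body");

        var settings = new XmlReaderSettings { NameTable = nameTable };
        using (var reader = XmlReader.Create(new StringReader("<document><body/></document>"), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "body")
                {
                    // Same object, not just equal characters.
                    return object.ReferenceEquals(reader.LocalName, bodyElement);
                }
            }
        }

        return false;
    }

    public static void Main()
    {
        Console.WriteLine(BodyNameIsAtomized());
    }
}
```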


DocxToFlowDocumentConverter inherits from DocxReader and overrides some of its reading methods to create the corresponding WPF FlowDocument elements.

So, for example, when reading the document element we create a new FlowDocument, when reading a paragraph element we create a new Paragraph, and when reading a run element we create a new Span.

protected override void ReadDocument(XmlReader reader)
{
    this.document = new FlowDocument();
    base.ReadDocument(reader);
}

protected override void ReadParagraph(XmlReader reader)
{
    using (this.SetCurrent(new Paragraph()))
        base.ReadParagraph(reader);
}

protected override void ReadRun(XmlReader reader)
{
    using (this.SetCurrent(new Span()))
        base.ReadRun(reader);
}

This class also sets some Paragraph and Span properties, which are read from the paragraph properties element <pPr> and the run properties element <rPr>. By the time XmlReader reaches these property elements, the new Paragraph or Span has already been created, so we only need to set its properties.
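As an illustration of what reading a property element involves, the sketch below parses a run properties fragment and reports which of two formatting toggles are switched on. It is a simplified example under stated assumptions, not the converter's code: the RunFormat enum and all method names are hypothetical, only bold (<b>) and italic (<i>) are handled, and the result is returned as flags instead of being applied to a WPF Span.

```csharp
using System;
using System.IO;
using System.Xml;

[Flags]
enum RunFormat { None = 0, Bold = 1, Italic = 2 }

static class RunPropertiesSketch
{
    // WordprocessingML main namespace.
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical <rPr> fragment: bold is on, italic is explicitly switched off.
    public const string Sample =
        "<w:rPr xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
        "<w:b/><w:i w:val='false'/></w:rPr>";

    public static RunFormat Parse(string rPrXml)
    {
        var format = RunFormat.None;
        using (var reader = XmlReader.Create(new StringReader(rPrXml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element || reader.NamespaceURI != W)
                    continue;

                if (reader.LocalName == "b" && IsOn(reader)) format |= RunFormat.Bold;
                else if (reader.LocalName == "i" && IsOn(reader)) format |= RunFormat.Italic;
            }
        }
        return format;
    }

    // A toggle property is on when the val attribute is absent or holds a "true" value.
    static bool IsOn(XmlReader reader)
    {
        string val = reader.GetAttribute("val", W);
        return val == null || val == "true" || val == "1" || val == "on";
    }

    public static void Main()
    {
        Console.WriteLine(Parse(Sample)); // Bold
    }
}
```

In a converter, the same checks would set properties such as FontWeight or FontStyle on the element that is current at that moment.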

Because we are moving from the parent element (Paragraph) to child elements (Spans) and back to a parent, we will have to track our current element in the FlowDocument with a variable of type TextElement (an abstract base class for Paragraph and Span).

This is accomplished with the help of CurrentHandle and the C# using statement, which is syntactic sugar for a try-finally construct. The SetCurrent method sets the current TextElement, and the Dispose method restores the previous TextElement as the current one.

private struct CurrentHandle : IDisposable
{
    private readonly DocxToFlowDocumentConverter converter;
    private readonly TextElement previous;

    public CurrentHandle(DocxToFlowDocumentConverter converter, TextElement current)
    {
        this.converter = converter;
        this.previous = this.converter.current;
        this.converter.current = current;
    }

    public void Dispose()
    {
        this.converter.current = this.previous;
    }
}

private IDisposable SetCurrent(TextElement current)
{
    return new CurrentHandle(this, current);
}
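The same restore-on-dispose idea can be tried outside WPF. The following sketch uses hypothetical names (ScopeTracker, Enter, RestoreHandle) and plain strings in place of TextElement instances, and it shows how nested using blocks unwind in reverse order, which is what returns the converter from a Span to its parent Paragraph.

```csharp
using System;

class ScopeTracker
{
    public string Current { get; private set; }

    // Makes 'next' the current value and returns a handle that restores the old one.
    public IDisposable Enter(string next)
    {
        var handle = new RestoreHandle(this, Current);
        Current = next;
        return handle;
    }

    private sealed class RestoreHandle : IDisposable
    {
        private readonly ScopeTracker owner;
        private readonly string previous;

        public RestoreHandle(ScopeTracker owner, string previous)
        {
            this.owner = owner;
            this.previous = previous;
        }

        // Called at the end of the using block, even if an exception was thrown.
        public void Dispose()
        {
            owner.Current = previous;
        }
    }

    public static void Main()
    {
        var tracker = new ScopeTracker();
        using (tracker.Enter("Paragraph"))
        {
            using (tracker.Enter("Span"))
                Console.WriteLine(tracker.Current); // Span

            Console.WriteLine(tracker.Current);     // Paragraph
        }
        Console.WriteLine(tracker.Current == null); // True
    }
}
```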

Using the Code

To get a FlowDocument, all we need to do is create a new DocxToFlowDocumentConverter instance from a DOCX file stream and call the Read method on that instance.

After that, we can display the flow document content in a WPF application using the FlowDocumentReader control.

using (var stream = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    var flowDocumentConverter = new DocxToFlowDocumentConverter(stream);
    flowDocumentConverter.Read();
    this.flowDocumentReader.Document = flowDocumentConverter.Document;
    this.Title = Path.GetFileName(path);
}


DOCX Reader is not a complete solution and is intended for simple scenarios (without tables, lists, pictures, headers/footers, styles, etc.). The application can be enhanced to read more DOCX features, but full DOCX support with all advanced features would require a lot more time and knowledge of the DOCX file format. Hopefully, this article and the accompanying application have given you some insight into the DOCX file format and can serve as a basis for more complex DOCX-related applications.
