Convert between Word files and HTML pages in C# and VB.NET

With GemBox.Document, you can quickly and efficiently convert between Word documents and HTML pages using simple C# or VB.NET code.

The following examples show how you can import and export HTML content to and from DOC, DOCX, RTF and XML formats.

Convert Word files to HTML or MHTML

GemBox.Document creates a well-formed HTML file from the Word document's rich content and images. By default, the images are extracted as separate files into the HtmlSaveOptions.FilesDirectoryPath folder and referenced from the HTML by paths relative to HtmlSaveOptions.FilesDirectorySrcPath.
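
A minimal sketch of exporting with external image files. It assumes an input file named "Input.docx" and an image folder named "Exported_files"; both names are hypothetical and chosen only for illustration. EmbedImages is set to false so that the images are written to disk instead of being embedded.

```csharp
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // "Input.docx" is a hypothetical input file name.
        var document = DocumentModel.Load("Input.docx");

        var saveOptions = new HtmlSaveOptions()
        {
            HtmlType = HtmlType.Html,
            // Keep images as separate files instead of Base64 data URLs.
            EmbedImages = false,
            // Folder on disk where the image files are written (hypothetical name).
            FilesDirectoryPath = "Exported_files",
            // Path written into the <img src="..."> attributes of the HTML.
            FilesDirectorySrcPath = "Exported_files"
        };

        // Writes Exported.html and the extracted image files.
        document.Save("Exported.html", saveOptions);
    }
}
```

Using the same value for both properties keeps the src paths valid as long as the HTML file and the image folder are deployed side by side.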

Alternatively, you can embed the images directly in the HTML file as Base64-encoded data URLs by setting the HtmlSaveOptions.EmbedImages property.

The following example shows how you can convert a Word file to HTML with embedded images and semantic elements.

Screenshot of input Word and converted output HTML
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Load Word file (DOC, DOCX, RTF, XML) into DocumentModel object.
        var document = DocumentModel.Load("Input.docx");

        var saveOptions = new HtmlSaveOptions()
        {
            HtmlType = HtmlType.Html,
            EmbedImages = true,
            UseSemanticElements = true
        };

        // Save DocumentModel object to HTML (or MHTML) file.
        document.Save("Exported.html", saveOptions);
    }
}

Imports GemBox.Document

Module Program

    Sub Main()

        ' If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY")

        ' Load Word file (DOC, DOCX, RTF, XML) into DocumentModel object.
        Dim document = DocumentModel.Load("Input.docx")

        Dim saveOptions As New HtmlSaveOptions() With
        {
            .HtmlType = HtmlType.Html,
            .EmbedImages = True,
            .UseSemanticElements = True
        }

        ' Save DocumentModel object to HTML (or MHTML) file.
        document.Save("Exported.html", saveOptions)

    End Sub
End Module

You can also convert your Word file to the web archive format (MHTML), which stores the page together with its resources, such as images, in a single file. This is useful for creating a self-contained web page or an email message.

By default, GemBox.Document references images within MHTML files with Content-Location headers. However, some MHTML viewers, such as Microsoft Outlook, fail to load such resources. In that case, you can switch to Content-ID (CID) references by using the HtmlSaveOptions.UseContentIdHeaders property.
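
A minimal sketch of an MHTML export that uses Content-ID references, for example for an Outlook-bound message. The file names "Input.docx" and "Exported.mhtml" are hypothetical; the options used are HtmlSaveOptions.HtmlType and HtmlSaveOptions.UseContentIdHeaders, as described above.

```csharp
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // "Input.docx" is a hypothetical input file name.
        var document = DocumentModel.Load("Input.docx");

        var saveOptions = new HtmlSaveOptions()
        {
            // MHTML packs the page and its images into one file.
            HtmlType = HtmlType.Mhtml,
            // Reference images with Content-ID (cid:) instead of Content-Location,
            // which some viewers such as Microsoft Outlook require.
            UseContentIdHeaders = true
        };

        document.Save("Exported.mhtml", saveOptions);
    }
}
```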

Convert HTML pages to Word files

GemBox.Document can read HTML input from a file path, URL, or stream by using one of the DocumentModel.Load methods, and it can read HTML text by using the ContentRange.LoadText or ContentPosition.LoadText methods.
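
A minimal sketch of loading HTML text into a new document with ContentRange.LoadText. It assumes the LoadText(string, LoadOptions) overload and that DocumentModel.Content returns the ContentRange of the whole document; the HTML snippet and the output file name are illustrative only. The snippet contains no images, so no base address is needed.

```csharp
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Start from an empty document.
        var document = new DocumentModel();

        // Illustrative HTML snippet (no images, so relative paths are not an issue).
        string html = "<h1>Report</h1><p>Hello from <b>HTML</b> text.</p>";

        // Parse the HTML text and load it as the document's content.
        document.Content.LoadText(html, new HtmlLoadOptions());

        // "FromHtmlText.docx" is a hypothetical output file name.
        document.Save("FromHtmlText.docx");
    }
}
```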

When loading HTML text or an HTML stream, you need to set HtmlLoadOptions.BaseAddress to import images that are referenced with relative paths.
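
A minimal sketch of loading HTML from a stream with a base address. It assumes that HtmlLoadOptions.BaseAddress accepts a folder path as a string; the folder "C:\Site\" and the image "images/logo.png" are hypothetical and must exist for the image to be imported.

```csharp
using System.IO;
using System.Text;
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // HTML with an image referenced by a relative path (hypothetical file).
        string html = "<p>Company logo:</p><img src=\"images/logo.png\">";

        var loadOptions = new HtmlLoadOptions()
        {
            // Location against which relative paths such as "images/logo.png"
            // are resolved (hypothetical folder).
            BaseAddress = @"C:\Site\"
        };

        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        {
            DocumentModel document = DocumentModel.Load(stream, loadOptions);
            document.Save("Output.docx");
        }
    }
}
```

Without the base address, a stream carries no information about where it came from, so a relative src such as "images/logo.png" has nothing to be resolved against.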

The following example shows how you can convert an HTML file to a Word document.

Screenshot of input HTML and converted output DOCX
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Load input HTML file.
        DocumentModel document = DocumentModel.Load("Input.html");

        // When HTML content is loaded, a single Section element is created,
        // which you can use to set the Word document's page options.
        // You can also achieve the same in the HTML document itself
        // by using CSS properties on the "@page" rule or the "<body>" element.
        Section section = document.Sections[0];
        PageSetup pageSetup = section.PageSetup;
        PageMargins pageMargins = pageSetup.PageMargins;
        pageMargins.Top = pageMargins.Bottom = pageMargins.Left = pageMargins.Right = 0;

        // Save output DOCX file.
        document.Save("Output.docx");
    }
}

Imports GemBox.Document

Module Program

    Sub Main()

        ' If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY")

        ' Load input HTML file.
        Dim document As DocumentModel = DocumentModel.Load("Input.html")

        ' When HTML content is loaded, a single Section element is created,
        ' which you can use to set the Word document's page options.
        ' You can also achieve the same in the HTML document itself
        ' by using CSS properties on the "@page" rule or the "<body>" element.
        Dim section As Section = document.Sections(0)
        Dim pageSetup As PageSetup = section.PageSetup
        Dim pageMargins As PageMargins = pageSetup.PageMargins
        With pageMargins
            .Left = 0
            .Right = 0
            .Top = 0
            .Bottom = 0
        End With

        ' Save output DOCX file.
        document.Save("Output.docx")

    End Sub
End Module

Next steps

GemBox.Document is a .NET component that enables you to read, write, edit, convert, and print document files from your .NET applications using one simple API. How about testing it today?