Convert Word and HTML to PDF in C# and VB.NET

With GemBox.Document you can convert files from one format into another using just C# or VB.NET code. You can load (read) any supported input file format and save (write) it as any supported output file format.

You can find the full list of formats on the Supported File Formats help page.

The following examples demonstrate some commonly required file format conversions.

Convert Word files to PDF

For both reading and writing, you can either provide the file's path or a stream by using one of the DocumentModel.Load and DocumentModel.Save methods.
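As a minimal sketch of the stream-based variant, the snippet below loads a DOCX file from one stream and writes the PDF to another. The file names "Input.docx" and "Output.pdf" are placeholders, not names from the original example. Because a stream has no file extension, the sketch passes the format explicitly through LoadOptions.DocxDefault and SaveOptions.PdfDefault.

```csharp
using System.IO;
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // "Input.docx" and "Output.pdf" are placeholder names (assumptions).
        using (var inputStream = File.OpenRead("Input.docx"))
        using (var outputStream = File.Create("Output.pdf"))
        {
            // A stream carries no extension, so the formats are stated explicitly.
            DocumentModel document = DocumentModel.Load(inputStream, LoadOptions.DocxDefault);
            document.Save(outputStream, SaveOptions.PdfDefault);
        }
    }
}
```

The same pattern should apply to any stream you already hold, for example an uploaded file in a web application.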

When converting a Word document to PDF, the machine that executes the C# or VB.NET code should have the fonts used in the document installed. If it does not, you can provide them as private fonts or as embedded fonts.
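One way to supply private fonts is sketched below. It assumes that the static FontSettings.FontsBaseDirectory property points GemBox.Document at a folder of font files. The "Fonts" folder and the file names are hypothetical.

```csharp
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // "Fonts" is a hypothetical folder, relative to the working directory,
        // that holds the font files used by the document.
        // Set it before loading and saving so the fonts are available for rendering.
        FontSettings.FontsBaseDirectory = "Fonts";

        // "Input.docx" and "Output.pdf" are placeholder names (assumptions).
        DocumentModel document = DocumentModel.Load("Input.docx");
        document.Save("Output.pdf");
    }
}
```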

The following example demonstrates how you can convert a Word file to PDF using the default LoadOptions and SaveOptions.

Screenshots of input Word and converted output PDF
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // In order to convert Word to PDF, we just need to:
        // 1. Load DOC or DOCX file into DocumentModel object.
        // 2. Save DocumentModel object to PDF file.
        DocumentModel document = DocumentModel.Load("%InputFileName%");
        document.Save("Output.%OutputFileType%");
    }
}
Imports GemBox.Document

Module Program

    Sub Main()

        ' If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY")

        ' In order to convert Word to PDF, we just need to:
        ' 1. Load DOC or DOCX file into DocumentModel object.
        ' 2. Save DocumentModel object to PDF file.
        Dim document As DocumentModel = DocumentModel.Load("%InputFileName%")
        document.Save("Output.%OutputFileType%")

    End Sub
End Module

Convert Word pages to PDF

Besides saving the whole Word document as a single PDF file, you can also convert each page of the document to a separate PDF file.

The following example demonstrates how to do that and how to use the PdfSaveOptions.ImageDpi property to reduce the size of the resulting PDF files.

Screenshots of Word pages converted to PDF files
using System;
using System.IO;
using System.IO.Compression;
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Load Word file.
        DocumentModel document = DocumentModel.Load("%InputFileName%");

        // Get Word pages.
        var pages = document.GetPaginator().Pages;

        // Create PDF save options.
        var pdfSaveOptions = new PdfSaveOptions() { ImageDpi = 220 };

        // Create ZIP file for storing PDF files (File.Create truncates any existing file).
        using (var archiveStream = File.Create("Output.zip"))
        using (var archive = new ZipArchive(archiveStream, ZipArchiveMode.Create))
            // Iterate through Word pages.
            for (int pageIndex = 0; pageIndex < pages.Count; pageIndex++)
            {
                DocumentModelPage page = pages[pageIndex];

                // Create ZIP entry for each document page.
                var entry = archive.CreateEntry($"Page {pageIndex + 1}.pdf");

                // Save each document page as PDF to ZIP entry.
                using (var pdfStream = new MemoryStream())
                using (var entryStream = entry.Open())
                {
                    page.Save(pdfStream, pdfSaveOptions);
                    // Rewind the in-memory PDF so CopyTo copies it from the beginning.
                    pdfStream.Position = 0;
                    pdfStream.CopyTo(entryStream);
                }
            }
    }
}
Imports System
Imports System.IO
Imports System.IO.Compression
Imports GemBox.Document

Module Program

    Sub Main()

        ' If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY")

        ' Load Word file.
        Dim document As DocumentModel = DocumentModel.Load("%InputFileName%")

        ' Get Word pages.
        Dim pages = document.GetPaginator().Pages

        ' Create PDF save options.
        Dim pdfSaveOptions As New PdfSaveOptions() With {.ImageDpi = 220}

        ' Create ZIP file for storing PDF files (File.Create truncates any existing file).
        Using archiveStream = File.Create("Output.zip")
            Using archive As New ZipArchive(archiveStream, ZipArchiveMode.Create)
                ' Iterate through Word pages.
                For pageIndex As Integer = 0 To pages.Count - 1

                    Dim page As DocumentModelPage = pages(pageIndex)

                    ' Create ZIP entry for each document page.
                    Dim entry = archive.CreateEntry($"Page {pageIndex + 1}.pdf")

                    ' Save each document page as PDF to ZIP entry.
                    Using pdfStream As New MemoryStream()
                        Using entryStream = entry.Open()
                            page.Save(pdfStream, pdfSaveOptions)
                            ' Rewind the in-memory PDF so CopyTo copies it from the beginning.
                            pdfStream.Position = 0
                            pdfStream.CopyTo(entryStream)
                        End Using
                    End Using
                Next
            End Using
        End Using

    End Sub
End Module

Convert HTML to PDF

For convenience, GemBox.Document supports reading input files from a URL, so you can load your HTML file from a local or remote path.
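A minimal sketch of loading from a remote address follows. It assumes, as the paragraph above states, that DocumentModel.Load accepts a URL in place of a file path. The address "https://www.example.com/page.html" and the output name are placeholders, and the format is passed explicitly through LoadOptions.HtmlDefault in case it cannot be inferred from the address.

```csharp
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Placeholder URL (assumption); replace it with the page you want to convert.
        string url = "https://www.example.com/page.html";

        // State the HTML format explicitly instead of relying on detection.
        DocumentModel document = DocumentModel.Load(url, LoadOptions.HtmlDefault);

        // "Output.pdf" is a placeholder name (assumption).
        document.Save("Output.pdf");
    }
}
```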

GemBox.Document supports inline styles as well as internal and external stylesheets. It uses a subset of CSS properties plus some Microsoft Word-specific properties (like 'mso-pagination', 'mso-rotate', etc.). It also applies the print media rule (e.g. @media print { ... }).

The following example demonstrates how you can convert an HTML file to PDF and specify some page options.

Screenshots of input HTML and converted output PDF
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Load input HTML file.
        DocumentModel document = DocumentModel.Load("%InputFileName%");

        // When reading any HTML content a single Section element is created.
        // We can use that Section element to specify various page options.
        Section section = document.Sections[0];
        PageSetup pageSetup = section.PageSetup;
        PageMargins pageMargins = pageSetup.PageMargins;
        pageMargins.Top = pageMargins.Bottom = pageMargins.Left = pageMargins.Right = 0;

        // Save output PDF file.
        document.Save("Output.%OutputFileType%");
    }
}
Imports GemBox.Document

Module Program

    Sub Main()

        ' If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY")

        ' Load input HTML file.
        Dim document As DocumentModel = DocumentModel.Load("%InputFileName%")

        ' When reading any HTML content a single Section element is created.
        ' We can use that Section element to specify various page options.
        Dim section As Section = document.Sections(0)
        Dim pageSetup As PageSetup = section.PageSetup
        Dim pageMargins As PageMargins = pageSetup.PageMargins
        With pageMargins
            .Left = 0
            .Right = 0
            .Top = 0
            .Bottom = 0
        End With

        ' Save output PDF file.
        document.Save("Output.%OutputFileType%")

    End Sub
End Module

To get the most accurate PDF conversion, you should provide printer-friendly HTML pages to GemBox.Document. In other words, your website's content and structure should ideally be optimized for print.

Styling often differs between the screen and print media types, which is why a common practice is to add a separate print stylesheet to the HTML after the standard stylesheet (e.g. <link rel="stylesheet" media="print" href="print.css" />). Alternatively, you can use the print media rule in your existing stylesheet.
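The print media rule can be sketched as follows. The HTML text is a hypothetical page whose default rule uses a large blue font and whose @media print rule overrides it. Given that GemBox.Document reads print rules, the converted PDF should pick up the values from the print rule. The output file name is a placeholder.

```csharp
using System.IO;
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Hypothetical page: the first rule targets all media,
        // the @media print rule overrides it for print output.
        var html = @"
<html>
<style>
  body { font-size: 16pt; color: #0070C0; }

  @media print {
    body { font-size: 10pt; color: #000000; }
  }
</style>
<body>
  <p>Sample paragraph.</p>
</body>
</html>";

        var htmlLoadOptions = new HtmlLoadOptions();
        using (var htmlStream = new MemoryStream(htmlLoadOptions.Encoding.GetBytes(html)))
        {
            // Load the HTML text as a stream, then save it as PDF.
            var document = DocumentModel.Load(htmlStream, htmlLoadOptions);
            document.Save("Output.pdf"); // placeholder name (assumption)
        }
    }
}
```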

Convert HTML to PDF with headers and footers

GemBox.Document supports reading various page options (like margins, size and orientation) and page styles (like borders and color) from the HTML content itself, through the "@page" directive or CSS properties on the "<body>" element.

GemBox.Document also supports creating HeaderFooter elements from HTML content. If "<header>" is the first element in the HTML file, its content is read as the document's default header, and if "<footer>" is the last element in the HTML file, its content is read as the document's default footer.

The following example demonstrates how you can create a PDF file from HTML text, with pages that have landscape orientation and repeated headers and footers.

Screenshot of HTML converted to PDF with headers, footers and landscape orientation
using System.IO;
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        var html = @"
<html>
<style>
  @page {
    size: A5 landscape;
    margin: 6cm 1cm 1cm;
    mso-header-margin: 1cm;
    mso-footer-margin: 1cm;
  }

  body {
    background: #EDEDED;
    border: 1pt solid black;
    padding: 20pt;
  }

  br {
    page-break-before: always;
  }

  p { margin: 0; }
  header { color: #FF0000; text-align: center; }
  main { color: #00B050; }
  footer { color: #0070C0; text-align: right; }
</style>

<body>
  <header>
    <p>Header text.</p>
  </header>
  <main>
    <p>First page.</p>
    <br>
    <p>Second page.</p>
    <br>
    <p>Third page.</p>
    <br>
    <p>Fourth page.</p>
  </main>
  <footer>
    <p>Footer text.</p>
    <p>Page <span style='mso-field-code:PAGE'>1</span> of <span style='mso-field-code:NUMPAGES'>1</span></p>
  </footer>
</body>
</html>";

        var htmlLoadOptions = new HtmlLoadOptions();
        using (var htmlStream = new MemoryStream(htmlLoadOptions.Encoding.GetBytes(html)))
        {
            // Load input HTML text as stream.
            var document = DocumentModel.Load(htmlStream, htmlLoadOptions);
            // Save output PDF file.
            document.Save("Output.%OutputFileType%");
        }
    }
}
Imports System.IO
Imports GemBox.Document

Module Program

    Sub Main()

        ' If using Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY")

        Dim html = "
<html>
<style>
  @page {
    size: A5 landscape;
    margin: 6cm 1cm 1cm;
    mso-header-margin: 1cm;
    mso-footer-margin: 1cm;
  }

  body {
    background: #EDEDED;
    border: 1pt solid black;
    padding: 20pt;
  }

  br {
    page-break-before: always;
  }

  p { margin: 0; }
  header { color: #FF0000; text-align: center; }
  main { color: #00B050; }
  footer { color: #0070C0; text-align: right; }
</style>

<body>
  <header>
    <p>Header text.</p>
  </header>
  <main>
    <p>First page.</p>
    <br>
    <p>Second page.</p>
    <br>
    <p>Third page.</p>
    <br>
    <p>Fourth page.</p>
  </main>
  <footer>
    <p>Footer text.</p>
    <p>Page <span style='mso-field-code:PAGE'>1</span> of <span style='mso-field-code:NUMPAGES'>1</span></p>
  </footer>
</body>
</html>"

        Dim htmlLoadOptions As New HtmlLoadOptions()
        Using htmlStream As New MemoryStream(htmlLoadOptions.Encoding.GetBytes(html))

            ' Load input HTML text as stream.
            Dim document = DocumentModel.Load(htmlStream, htmlLoadOptions)
            ' Save output PDF file.
            document.Save("Output.%OutputFileType%")

        End Using

    End Sub
End Module
