How to convert HTML to PDF in C#: A complete guide

If you have ever wondered how to convert a web page or an HTML file to PDF in C# or VB.NET, you have come to the right place. This article will demonstrate and explain various ways to convert HTML to PDF using the GemBox.Document library.

GemBox.Document is a .NET class library (component) for manipulating various document formats like DOCX, ODT, HTML, and RTF. In this article, we will focus only on HTML to PDF conversion, but if you are interested in other features or want to learn more, please visit the GemBox.Document product page.

Since GemBox.Document supports various HTML to PDF conversion options, we have organized this article into the sections that follow.

Install and configure the GemBox.Document library

For this article, we suggest that you create a new .NET project. If you are unfamiliar with Visual Studio or need a reminder, refer to the official tutorial. Also, although GemBox.Document supports a wide range of .NET versions (from .NET Framework 3.5), we recommend that you use the newest version.

Before you can start converting HTML to PDF, you need to install GemBox.Document. The best way to do that is via NuGet Package Manager.

  1. In the Solution Explorer window, right-click on the solution and select 'Manage NuGet Packages for Solution'.
  2. Search for GemBox.Document and click on 'Install'.

As an alternative, you can open the NuGet Package Manager Console (Tools -> NuGet Package Manager -> Package Manager Console) and run the following command:

Install-Package GemBox.Document

Now that you have installed the GemBox.Document library, all you have to do is make sure you call the ComponentInfo.SetLicense method before using any other member of the library. Since we are working with a console application, we suggest putting it at the beginning of the Main() method.

If you have set everything correctly, your code should look like this:

using GemBox.Document;
class Program
{
    static void Main()
    {
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // The code starts here

    }
}

In all parts of this tutorial except the 'How to convert a web page (URL) to PDF' section, we are going to work in the free mode. The free mode allows you to use the library without purchasing a license, but with some limitations. You can read more about working modes and limitations on the Evaluation and Licensing documentation page.

Convert simple HTML string to PDF

We will start this tutorial with something simple. This section will demonstrate how to create a document with one paragraph using HTML code.

  1. First, create a document and add a section to it:
    var document = new DocumentModel();
    var section = new Section(document);
    
    document.Sections.Add(section);
  2. Now you will add some content, but instead of adding the elements manually you will load an HTML string:
    section.Content.LoadText(
        "<p style=\"color:red;font-size:40px;text-align:center\">Hello World!</p>", 
        LoadOptions.HtmlDefault);
    

    As you might have guessed, the code above will add one paragraph with the text 'Hello World!' to the document. The text will be centered horizontally, in red color, and the size will be 30 points (40 pixels).

  3. To see that for yourself, save the document to PDF and open it.
    document.Save("Output.pdf");

And here is a screenshot of how it looks in Adobe Reader.

Screenshot of a PDF file generated from an HTML string
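
The three steps above can be collected into one small program. The following is a minimal sketch rather than an official sample: the helper name ConvertHtmlToPdf is made up for illustration, and everything else uses only the calls already shown in this section.

```csharp
using GemBox.Document;

class Program
{
    // Hypothetical helper (not part of GemBox.Document): wraps the three steps above.
    static void ConvertHtmlToPdf(string html, string outputPath)
    {
        // Step 1: an empty document with a single section.
        var document = new DocumentModel();
        var section = new Section(document);
        document.Sections.Add(section);

        // Step 2: fill the section from the HTML string.
        section.Content.LoadText(html, LoadOptions.HtmlDefault);

        // Step 3: the ".pdf" extension of the path selects the PDF format.
        document.Save(outputPath);
    }

    static void Main()
    {
        // Free mode key, as used earlier in this article.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        ConvertHtmlToPdf(
            "<p style=\"color:red;font-size:40px;text-align:center\">Hello World!</p>",
            "Output.pdf");
    }
}
```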

Note that GemBox.Document supports other file formats like DOCX, ODT, and RTF. To save to one of those formats, you just need to change the output file extension. For example, to save to DOCX, just change the previous code to this:

document.Save("Output.docx");
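
When there is no file name to take the extension from, for example when writing to a stream, the format can be passed explicitly instead. The sketch below assumes that scenario and uses the predefined SaveOptions.PdfDefault and SaveOptions.DocxDefault options; the in-memory streams are only an illustration of where the bytes might go.

```csharp
using System;
using System.IO;
using GemBox.Document;

class Program
{
    static void Main()
    {
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Same one-paragraph document as in the steps above.
        var document = new DocumentModel();
        var section = new Section(document);
        document.Sections.Add(section);
        section.Content.LoadText("<p>Hello World!</p>", LoadOptions.HtmlDefault);

        // A stream has no extension, so the output format is given by the save options.
        using (var pdfStream = new MemoryStream())
        using (var docxStream = new MemoryStream())
        {
            document.Save(pdfStream, SaveOptions.PdfDefault);
            document.Save(docxStream, SaveOptions.DocxDefault);

            // ToArray copies the written bytes, e.g. to return them from a web request.
            byte[] pdfBytes = pdfStream.ToArray();
            byte[] docxBytes = docxStream.ToArray();
            Console.WriteLine($"PDF: {pdfBytes.Length} bytes, DOCX: {docxBytes.Length} bytes");
        }
    }
}
```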

How to convert HTML file to PDF

This section will show how you can convert an HTML file to PDF. For this purpose, we created a rather complex HTML file to demonstrate that GemBox.Document can handle various elements like CSS styles, pictures, tables, etc. The file can be downloaded from here.

The process is straightforward. You need to load the HTML file to DocumentModel and then save the model to a PDF file.

var document = DocumentModel.Load("Input.html");
document.Save("Output.pdf");

The output looks like this:

Screenshot of a PDF file generated from an HTML file

You could write this code as a single line, but keeping the two statements separate lets you put a breakpoint on the second line and examine how GemBox.Document parsed the HTML content into its model.

You can also use this in-memory model to modify the document by adding, updating, or removing the elements (content).
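
As a rough illustration of examining and modifying the model between loading and saving, the sketch below lists the parsed paragraphs and centers the first one. The choice of modification is arbitrary and only meant as an example; Input.html is the file used above.

```csharp
using System;
using System.Linq;
using GemBox.Document;

class Program
{
    static void Main()
    {
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        var document = DocumentModel.Load("Input.html");

        // Examine: collect every paragraph that was parsed from the HTML.
        var paragraphs = document
            .GetChildElements(true, ElementType.Paragraph)
            .Cast<Paragraph>()
            .ToList();
        Console.WriteLine($"Parsed {paragraphs.Count} paragraphs.");

        // Modify: center the first paragraph (an arbitrary example change).
        if (paragraphs.Count > 0)
        {
            paragraphs[0].ParagraphFormat.Alignment = HorizontalAlignment.Center;
            Console.WriteLine($"Centered: {paragraphs[0].Content.ToString().Trim()}");
        }

        // The PDF reflects the modified model, not the original HTML.
        document.Save("Output.pdf");
    }
}
```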

How to convert a web page (URL) to PDF

Apart from converting HTML files and strings to PDF, you can also use GemBox.Document to convert live web pages to PDF. The process is the same as converting an HTML file, but instead of providing a file name, you need to provide a URL to the web page that you want to convert to PDF.

var document = DocumentModel.Load(
    "https://www.gemboxsoftware.com/document/docs/supported-file-formats.html",
    LoadOptions.HtmlDefault);
document.Save("Output.pdf");

Here is the screenshot of the resulting PDF file:

Screenshot of a PDF file generated from a web page

As you may notice, the PDF looks different from the web page. It is missing some elements, like the top and side menus and table formatting. This is because the default HtmlLoadOptions apply the print media rules (@media print { … }) when styling the content. Most pages define such rules for printing purposes. You can check this by opening a print preview in the browser and comparing it to the original page. For the page used in the code above, the print preview looks like this:

Screenshot of the print preview window for the web page

You can, of course, change that by setting the HtmlLoadOptions.StyleMediaType property to null or some other media type value.

var options = new HtmlLoadOptions() { StyleMediaType = null };
var document = DocumentModel.Load(
    "https://www.gemboxsoftware.com/document/docs/supported-file-formats.html",
    options);
document.Save("Output.pdf");

Running this code will create a different PDF, as you can see here:

Screenshot of a PDF file generated from a web page with StyleMediaType set to null

The PDF looks very different from the previous one, but it is also different from the web page itself. This is mostly because GemBox.Document doesn't support JavaScript rendering and some complex display options. Also, HTML structure is more flexible than PDF structure, so it is not always possible to convert HTML to PDF directly. Because of that, a common practice is to add a separate print style to the HTML.

<link rel="stylesheet" media="print" href="print.css" />

This is why, to get the most accurate HTML to PDF conversion with GemBox.Document, you should optimize the web page content for printing.

Note that the web page loaded in this example requires a professional version because the resulting file contains more than 20 paragraphs. You can read more about free version limitations here.
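
If you only want to evaluate the web page conversion without a license key, one option is to let the library continue in trial mode once the free limit is reached. The sketch below assumes that trial mode is acceptable for your test; it uses the ComponentInfo.FreeLimitReached event, and the trial mode has its own restrictions, which are described on the Evaluation and Licensing documentation page.

```csharp
using GemBox.Document;

class Program
{
    static void Main()
    {
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Assumption: this is an evaluation run, so switching to trial mode
        // when the free limit is exceeded is acceptable.
        ComponentInfo.FreeLimitReached += (sender, e) =>
            e.FreeLimitReachedAction = FreeLimitReachedAction.ContinueAsTrial;

        // Same page and options as in the example above.
        var options = new HtmlLoadOptions() { StyleMediaType = null };
        var document = DocumentModel.Load(
            "https://www.gemboxsoftware.com/document/docs/supported-file-formats.html",
            options);

        document.Save("Output.pdf");
    }
}
```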

Additional HTML to PDF conversion options

In this section, we will explore some additional options that GemBox.Document supports when converting HTML to PDF.

We will start with the following HTML code:

<html>
<body>
  <header>
    <p>Header text.</p>
  </header>

  <main>
    <p>First page.</p>
    <br>
    <p>Second page.</p>
    <br>
    <p>Third page.</p>
    <br>
    <p>Fourth page.</p>
  </main>

  <footer>
    <p>Footer text.</p>
  </footer>
</body>
</html>

The idea here is to put each <p> element on a separate page. For that to work, you need to set the break-before CSS property to 'always' for <br> elements. Since you will add more style options, use a <style> element:

<head>
  <style>
    br {
      break-before: always;
    }
  </style>
</head>

While we're at it, let's style the rest of the HTML elements:

body {
  background: #EDEDED;
  border: 1pt solid black;
  padding: 20pt;
}
header { color: #FF0000; text-align: center; }
main { color: #00B050; }
p { margin: 0; }
footer { color: #0070C0; text-align: right; }

One of the additional conversion options that GemBox.Document supports is reading page headers and footers as HeaderFooter elements. This works only if <header> is the first element in the HTML body, in which case it is converted to the document's default header. The same rule applies to the <footer> element, which must be the last element in the HTML body.
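
One way to check whether the <header> and <footer> elements were picked up is to list the HeaderFooter elements of the loaded document. The sketch below is an assumption-laden illustration: it loads a shortened version of the HTML from an in-memory stream and prints whatever headers and footers the model contains.

```csharp
using System;
using System.IO;
using System.Text;
using GemBox.Document;

class Program
{
    static void Main()
    {
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Shortened version of the HTML above: <header> first, <footer> last in <body>.
        string html =
            "<html><body>" +
            "<header><p>Header text.</p></header>" +
            "<main><p>First page.</p></main>" +
            "<footer><p>Footer text.</p></footer>" +
            "</body></html>";

        DocumentModel document;
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            document = DocumentModel.Load(stream, LoadOptions.HtmlDefault);

        // Print the type and text of every header and footer found in the model.
        foreach (Section section in document.Sections)
            foreach (HeaderFooter headerFooter in section.HeadersFooters)
                Console.WriteLine(
                    $"{headerFooter.HeaderFooterType}: {headerFooter.Content.ToString().Trim()}");
    }
}
```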

Next, you will specify page options. For this specific document, you will set the page size to A5 in landscape orientation and add some margins. GemBox.Document supports the standard CSS @page at-rule, so you can define everything in that rule.

@page {
  size: A5 landscape;
  margin: 6cm 1cm 1cm;
}

You will also specify header and footer margins, but since there are no standard properties for that, you need to set those with mso properties:

mso-header-margin: 1cm;
mso-footer-margin: 1cm;

Mso CSS properties are Microsoft Office specific properties that are not supported by common Internet browsers. They are used only when converting HTML to and from Microsoft Office applications. Since GemBox.Document also supports DOCX and DOC formats, it supports some of these mso properties as well, so you can use them to specify additional document options when converting HTML to PDF or any other format that GemBox.Document supports.
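
For comparison, roughly the same page settings can be applied through the object model after loading, which may be handy when you cannot change the HTML itself. This is a sketch under that assumption; it uses Input.html as a stand-in for any HTML file, and the local Cm helper is made up here to convert centimeters to points.

```csharp
using GemBox.Document;

class Program
{
    static void Main()
    {
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        var document = DocumentModel.Load("Input.html");

        // Hypothetical local helper: the object model measures lengths in points.
        double Cm(double value) =>
            LengthUnitConverter.Convert(value, LengthUnit.Centimeter, LengthUnit.Point);

        foreach (Section section in document.Sections)
        {
            var pageSetup = section.PageSetup;

            // Counterpart of "size: A5 landscape".
            pageSetup.PaperType = PaperType.A5;
            pageSetup.Orientation = Orientation.Landscape;

            // Counterpart of "margin: 6cm 1cm 1cm" (top, left and right, bottom).
            var margins = pageSetup.PageMargins;
            margins.Top = Cm(6);
            margins.Right = Cm(1);
            margins.Bottom = Cm(1);
            margins.Left = Cm(1);

            // Counterpart of mso-header-margin and mso-footer-margin.
            margins.Header = Cm(1);
            margins.Footer = Cm(1);
        }

        document.Save("Output.pdf");
    }
}
```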

Another mso property that you can use is mso-field-code. With it you can add fields to the output document. For this example, you will add the PAGE and NUMPAGES fields to the footer so that each page in the output PDF shows its number and the overall number of pages in the footer.

<footer>
  <p>Footer text.</p>
  <p>Page <span style='mso-field-code:PAGE'>1</span> of <span style='mso-field-code:NUMPAGES'>1</span></p>
</footer> 

Now, for the conversion part, you first need to save the HTML to a file. You can call the file MyHtml.html. All you have to do afterward is load that HTML file into GemBox.Document's model and save it to a PDF file.

var document = DocumentModel.Load("MyHtml.html");
document.Save("Output.pdf");

And here is the screenshot:

Screenshot of a PDF file generated from an HTML file
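
To try the whole example in one go, the fragments from this section can be assembled into a single HTML string, written to MyHtml.html, and converted. The sketch below is one possible assembly, not the article's original file: in particular, placing the mso-header-margin and mso-footer-margin properties inside the @page rule is an assumption based on how Microsoft Office writes them.

```csharp
using System.IO;
using GemBox.Document;

class Program
{
    static void Main()
    {
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Assembled from the fragments in this section.
        // Assumption: the mso margin properties belong inside the @page rule.
        string html = @"<html>
<head>
  <style>
    @page {
      size: A5 landscape;
      margin: 6cm 1cm 1cm;
      mso-header-margin: 1cm;
      mso-footer-margin: 1cm;
    }
    br { break-before: always; }
    body { background: #EDEDED; border: 1pt solid black; padding: 20pt; }
    header { color: #FF0000; text-align: center; }
    main { color: #00B050; }
    p { margin: 0; }
    footer { color: #0070C0; text-align: right; }
  </style>
</head>
<body>
  <header>
    <p>Header text.</p>
  </header>

  <main>
    <p>First page.</p>
    <br>
    <p>Second page.</p>
    <br>
    <p>Third page.</p>
    <br>
    <p>Fourth page.</p>
  </main>

  <footer>
    <p>Footer text.</p>
    <p>Page <span style='mso-field-code:PAGE'>1</span> of <span style='mso-field-code:NUMPAGES'>1</span></p>
  </footer>
</body>
</html>";

        // Save the HTML to a file, then convert that file to PDF.
        File.WriteAllText("MyHtml.html", html);

        var document = DocumentModel.Load("MyHtml.html");
        document.Save("Output.pdf");
    }
}
```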

GemBox.Pdf for advanced PDF editing

GemBox.Document is a great library for converting HTML to PDF, but if you are looking for advanced PDF editing, then you should take a look at the GemBox.Pdf library.

It is designed specifically for manipulating PDF files since PDF's fixed structure is quite different from flow document structure. With its simple but powerful API, you can easily read, write, edit, and print PDF files in C# or VB.NET. For more information, visit the GemBox.Pdf product page on our website.

Conclusion

In this article, you saw all the ways you can use GemBox.Document for converting HTML to PDF programmatically in .NET. We covered everything from loading a simple HTML string to converting a complex HTML file with additional page options to PDF.

For more information regarding the GemBox.Document API, check the documentation pages. We also recommend that you check our GemBox.Document examples where you can examine other features and even run the example code.

Happy HTML to PDF converting!

Next steps

GemBox.Document is a .NET component that enables you to read, write, edit, convert, and print document files from your .NET applications using one simple API. How about testing it today?
