Read PDF files and extract their text in C# and VB.NET with the GemBox.Document component.

GemBox.Document is a C# / VB.NET component that enables developers to read, write, convert, and print document files (DOCX, DOC, PDF, HTML, XPS, RTF, and TXT) from .NET applications in a simple and efficient way without the need for Microsoft Word on either the developer or client machines.
GemBox.Document Free is free of charge, while GemBox.Document Professional is a commercial version that is licensed per developer.
For more information, see GemBox.Document Features or try our examples.

The following example reads a PDF file, extracts a value from its text with a regular expression, and writes that value to the console.

C# code

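using System;
using System.Text.RegularExpressions;
using GemBox.Document;

// Set the license key before using the component; this key selects the free, limited mode.
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

// Load the PDF file.
var document = DocumentModel.Load("CustomInvoice.pdf");

// Regular expression that captures the amount (digits with two decimals) following the word "Total".
var totalRegex = new Regex(@"Total\s+(?<Total>\d+\.\d{2})");

// Match against the document's text; the value is an empty string if nothing matches.
var totalValue = totalRegex.Match(document.Content.ToString()).Groups["Total"].Value;

// Write the extracted value to the console.
Console.WriteLine("Total: {0}", totalValue);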

VB.NET code

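Imports System
Imports System.Text.RegularExpressions
Imports GemBox.Document

' Set the license key before using the component; this key selects the free, limited mode.
ComponentInfo.SetLicense("FREE-LIMITED-KEY")

' Load the PDF file.
Dim document = DocumentModel.Load("CustomInvoice.pdf")

' Regular expression that captures the amount (digits with two decimals) following the word "Total".
Dim totalRegex = New Regex("Total\s+(?<Total>\d+\.\d{2})")

' Match against the document's text; the value is an empty string if nothing matches.
Dim totalValue = totalRegex.Match(document.Content.ToString()).Groups("Total").Value

' Write the extracted value to the console.
Console.WriteLine("Total: {0}", totalValue)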
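
The regular expression step in the examples above can also be tried without a PDF file. The following is a minimal C# sketch, not part of GemBox.Document: the InvoiceTotalParser class, its TryExtractTotal method, and the sample text are hypothetical names and data introduced here for illustration. It uses the same pattern as above, returns null when no total is found instead of an empty string, and parses the captured text as a decimal.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; it is not part of GemBox.Document.
public static class InvoiceTotalParser
{
    // Same pattern as in the examples above: "Total", whitespace, then digits with two decimals.
    private static readonly Regex TotalRegex = new Regex(@"Total\s+(?<Total>\d+\.\d{2})");

    // Returns the parsed total, or null when the text contains no match.
    public static decimal? TryExtractTotal(string text)
    {
        Match match = TotalRegex.Match(text);
        if (!match.Success)
            return null;

        // InvariantCulture treats '.' as the decimal separator regardless of the machine's locale.
        return decimal.Parse(match.Groups["Total"].Value, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // Made-up sample text standing in for document.Content.ToString().
        string sample = "Invoice 1001\nItems 3\nTotal 108.50\n";

        decimal? total = TryExtractTotal(sample);
        Console.WriteLine(total.HasValue
            ? "Total: " + total.Value.ToString(CultureInfo.InvariantCulture)
            : "Total not found.");
    }
}
```

With a loaded document, the same method could be called as InvoiceTotalParser.TryExtractTotal(document.Content.ToString()). Note that the pattern is case-sensitive and matches only amounts written with a '.' separator and exactly two decimals, so invoices that use a different number format would need an adjusted pattern.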