GemBox.Pdf

    OcrReadOptions Class

    Namespace:
    GemBox.Pdf.Ocr
    Assembly:
    GemBox.Pdf.Ocr.dll

    Represents options for performing optical character recognition.

    • C#
    • VB.NET
    public sealed class OcrReadOptions
    Public NotInheritable Class OcrReadOptions
    Inheritance:
    System.Object
    OcrReadOptions

    Constructors

    OcrReadOptions()

    Initializes a new instance of the OcrReadOptions class.

    • C#
    • VB.NET
    public OcrReadOptions()
    Public Sub New

    Properties

    KeepContent

    Gets or sets a value indicating whether the content of the original document should be preserved.

    • C#
    • VB.NET
    public bool KeepContent { get; set; }
    Public Property KeepContent As Boolean
    Property Value
    System.Boolean

    true if the content of the original document should be preserved, false otherwise.

    Remarks

    This property controls the preservation of all elements except images. To also preserve images, use the KeepImage property.

    The default value is false.

    KeepImage

    Gets or sets a value indicating whether the text should be loaded on top of the image.

    • C#
    • VB.NET
    public bool KeepImage { get; set; }
    Public Property KeepImage As Boolean
    Property Value
    System.Boolean

    true if the text should be loaded on top of the image, false otherwise.

    Remarks

    If the value is true, the image will be visible in the resulting document and the recognized text will be written on top of it. The text is transparent unless TextFormatter overrides this.

    This is useful for creating documents from which the text can be copied while still visually preserving the non-text parts of the image and any text that couldn't be recognized.

    The default value is false.
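
    As an illustration of how KeepContent and KeepImage combine, the following is a minimal sketch, not a definitive implementation. It assumes the GemBox.Pdf.Ocr package is referenced, that OcrReader.Read has an overload taking a file path and an OcrReadOptions instance, and that "FREE-LIMITED-KEY" is the free-mode license key. The file names and the SearchablePdfExample class are hypothetical.

```csharp
using GemBox.Pdf;
using GemBox.Pdf.Ocr;

static class SearchablePdfExample
{
    // Builds options that keep both the original content and the page image.
    public static OcrReadOptions CreateOptions() => new OcrReadOptions
    {
        KeepContent = true, // preserve all original elements except images
        KeepImage = true    // keep the image visible beneath the recognized text
    };

    // "scan.pdf" and "searchable.pdf" are placeholder file names.
    public static void Run()
    {
        // Assumed free-mode key; replace it with your own serial key.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Assumption: OcrReader.Read(string, OcrReadOptions) returns a PdfDocument.
        using (PdfDocument document = OcrReader.Read("scan.pdf", CreateOptions()))
            document.Save("searchable.pdf");
    }
}
```

    Both properties default to false, so each must be set explicitly to produce a result that keeps the original appearance.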

    Languages

    Gets or sets the languages which should be used to recognize text.

    • C#
    • VB.NET
    public IList<string> Languages { get; set; }
    Public Property Languages As IList(Of String)
    Property Value
    System.Collections.Generic.IList<System.String>

    The languages which should be used to recognize text.

    Remarks

    The values of this property must match the data files in TesseractDataPath. If the list is empty, English is used as the default language.

    See Also
    https://www.gemboxsoftware.com/pdf/docs/ocr.html#language-data
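
    Because the language values must match data files, it can help to check the folder before running recognition. The sketch below uses only the .NET base class library. It assumes the data files follow Tesseract's usual "<code>.traineddata" naming; the LanguageDataCheck class and its method are hypothetical helpers, not part of GemBox.Pdf.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LanguageDataCheck
{
    // Returns the language codes that have no "<code>.traineddata" file in dataPath.
    // Assumption: data files use Tesseract's standard naming convention.
    public static List<string> FindMissingLanguages(string dataPath, IEnumerable<string> languages) =>
        languages
            .Where(code => !File.Exists(Path.Combine(dataPath, code + ".traineddata")))
            .ToList();
}
```

    Passing the TesseractDataPath and Languages values of an options instance to this helper yields the codes whose data still needs to be downloaded.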

    LibraryPath

    Gets or sets the path to the directory that contains the Tesseract and Leptonica libraries.

    • C#
    • VB.NET
    public static string LibraryPath { get; set; }
    Public Shared Property LibraryPath As String
    Property Value
    System.String

    The path to the directory that contains the Tesseract and Leptonica libraries.

    Remarks

    If this property is not set, these locations are searched:

    • Assembly.GetExecutingAssembly().Location
    • AppDomain.CurrentDomain.BaseDirectory
    • Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "bin")
    • Environment.CurrentDirectory

    The path should point to the directory that contains the x64 and x86 folders, not to the x64 or x86 folder itself.
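
    The search order and the folder layout described above can be approximated with a small probe. This is a sketch under stated assumptions, not the library's actual lookup code: it takes the directory of Assembly.Location (which is a file path), and it treats a directory as suitable when it contains an x64 or x86 subfolder. The LibraryPathProbe class is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

static class LibraryPathProbe
{
    // Candidate directories in the order the remarks list them.
    public static IEnumerable<string> DefaultCandidates()
    {
        // Assumption: the directory of the assembly file is what gets searched.
        string location = Assembly.GetExecutingAssembly().Location;
        if (!string.IsNullOrEmpty(location))
            yield return Path.GetDirectoryName(location);

        yield return AppDomain.CurrentDomain.BaseDirectory;
        yield return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "bin");
        yield return Environment.CurrentDirectory;
    }

    // Returns the first candidate that contains an x64 or x86 subfolder, or null if none does.
    public static string FindLibraryDirectory(IEnumerable<string> candidates)
    {
        foreach (string dir in candidates)
        {
            if (Directory.Exists(Path.Combine(dir, "x64")) ||
                Directory.Exists(Path.Combine(dir, "x86")))
                return dir;
        }
        return null;
    }
}
```

    A non-null result is the parent directory of the platform folders, which is the form the static LibraryPath property expects.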

    PageSize

    Gets or sets the size of the page to which the content will be loaded.

    If the value is null, the size of the page will be determined from the image.

    The default value is null.

    • C#
    • VB.NET
    public PdfSize? PageSize { get; set; }
    Public Property PageSize As PdfSize?
    Property Value
    System.Nullable<PdfSize>

    The size of the page to which the content will be loaded.

    Remarks

    This property is valid only when reading from an image.
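
    A fixed page size for image input might be configured as in the sketch below. It assumes that PdfSize exposes a (width, height) constructor measured in points and that the type lives in one of the two imported GemBox namespaces; the A4 dimensions are rounded, and the PageSizeExample class is hypothetical.

```csharp
using GemBox.Pdf;
using GemBox.Pdf.Content;
using GemBox.Pdf.Ocr;

static class PageSizeExample
{
    // Assumption: PdfSize(width, height) takes points (1/72 inch).
    // 595 x 842 points is A4, rounded to whole points.
    public static OcrReadOptions CreateOptions() => new OcrReadOptions
    {
        PageSize = new PdfSize(595, 842)
    };
}
```

    Leaving PageSize as null keeps the default behavior, in which the page size is determined from the image.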

    TesseractDataPath

    Gets or sets the path to the directory that contains Tesseract data.

    • C#
    • VB.NET
    public string TesseractDataPath { get; set; }
    Public Property TesseractDataPath As String
    Property Value
    System.String

    The path to the directory that contains Tesseract data.

    Remarks

    By default, this property points to a folder created by GemBox that contains data for the English language. You can download data for other languages at https://www.gemboxsoftware.com/pdf/docs/ocr.html#language-data.

    See Also
    https://www.gemboxsoftware.com/pdf/docs/ocr.html#language-data
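
    Pointing the reader at a custom data folder and selecting languages could look like the following sketch. The codes "eng" and "deu" assume Tesseract's three-letter language naming and matching data files in the given folder; the LanguageOptionsExample class and the folder passed to it are hypothetical.

```csharp
using System.Collections.Generic;
using GemBox.Pdf.Ocr;

static class LanguageOptionsExample
{
    // dataPath is a folder that holds the downloaded language data files.
    public static OcrReadOptions CreateOptions(string dataPath) => new OcrReadOptions
    {
        TesseractDataPath = dataPath,
        // Assumption: each code has a matching data file in dataPath.
        Languages = new List<string> { "eng", "deu" }
    };
}
```

    Once TesseractDataPath no longer points to the default GemBox folder, English data presumably has to be present in the custom folder as well if English is still needed.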

    TextFormatter

    Gets or sets the action which should be performed on the recognized text.

    • C#
    • VB.NET
    public Action<PdfFormattedText> TextFormatter { get; set; }
    Public Property TextFormatter As Action(Of PdfFormattedText)
    Property Value
    System.Action<PdfFormattedText>

    The action which should be performed on the recognized text.

    Remarks

    This property can be used to change the formatting (font family or color) of the recognized text.

    By default, the Calibri font is used and the font size is automatically determined from the recognized text.
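
    A formatter that changes the font family and color might be written as in the sketch below. It assumes that PdfFormattedText exposes settable FontFamily and Color properties, that PdfFontFamily can be constructed from a font name, and that PdfColor.FromRgb takes components in the range 0 to 1. "Arial" is only an illustrative choice, and the TextFormatterExample class is hypothetical.

```csharp
using GemBox.Pdf.Content;
using GemBox.Pdf.Ocr;

static class TextFormatterExample
{
    public static OcrReadOptions CreateOptions() => new OcrReadOptions
    {
        KeepImage = true,
        // Invoked with the recognized text so that its formatting can be adjusted.
        TextFormatter = text =>
        {
            // Assumption: these PdfFormattedText members are available and settable.
            text.FontFamily = new PdfFontFamily("Arial");
            text.Color = PdfColor.FromRgb(1, 0, 0); // red fill color
        }
    };
}
```

    The font size is left untouched here, so it should still be determined automatically from the recognized text.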


    © GemBox Ltd. — All rights reserved.