OcrReadOptions Class
Represents options for performing optical character recognition.
- Inheritance:
- System.Object
- OcrReadOptions
Constructors
OcrReadOptions()
Initializes a new instance of the OcrReadOptions class.
Properties
KeepContent
Gets or sets a value indicating whether the content of the original document should be preserved.
Property Value
- System.Boolean
true if the content of the original document should be preserved, false otherwise.
Remarks
This property controls the preservation of all elements except images. To also preserve images, use the KeepImage property.
The default value is false.
KeepImage
Gets or sets a value indicating whether the text should be loaded on top of the image.
Property Value
- System.Boolean
true if the text should be loaded on top of the image, false otherwise.
Remarks
If the value is true, the image will be visible in the resulting document, and the text will be written on top of it. The text is transparent unless overridden by TextFormatter.
This is useful for creating documents from which the text can be copied while still visually preserving the non-text parts of the image and any text that couldn't be recognized.
The default value is false.
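As a rough illustration of the behavior described above, the following sketch turns a scanned image into a PDF that shows the original image with recognized text layered on top. Only OcrReadOptions and KeepImage come from this page. The namespaces, the ComponentInfo.SetLicense call, the OcrReader.Read(path, options) overload, and the file names are assumptions about the surrounding GemBox.Pdf API and should be checked against the library's own reference.

```csharp
using GemBox.Pdf;      // assumed namespace of PdfDocument and ComponentInfo
using GemBox.Pdf.Ocr;  // assumed namespace of OcrReadOptions and OcrReader

class Program
{
    static void Main()
    {
        // Assumption: the usual GemBox licensing call with the free-mode key.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Keep the source image visible; recognized text is placed on top of it.
        var options = new OcrReadOptions
        {
            KeepImage = true
        };

        // Assumption: OcrReader.Read accepts a file path and OcrReadOptions.
        // "scan.png" and "scan-searchable.pdf" are placeholder file names.
        using (PdfDocument document = OcrReader.Read("scan.png", options))
        {
            document.Save("scan-searchable.pdf");
        }
    }
}
```

With KeepImage left at its default of false, the same call would presumably produce a document containing only the recognized text.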
Languages
Gets or sets the languages which should be used to recognize text.
Property Value
- System.Collections.Generic.IList<System.String>
The languages which should be used to recognize text.
Remarks
Each value of this property must match a data file in TesseractDataPath. If the list is empty, English is used as the default language.
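Because each language has to match a data file, it can help to check the data directory before starting recognition. The sketch below uses only standard .NET APIs. It assumes that the data files follow Tesseract's usual `<code>.traineddata` naming and sit directly in the data directory, and the LanguageDataCheck class and MissingLanguages method are hypothetical names, not part of the library.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LanguageDataCheck
{
    // Hypothetical helper: returns the language codes for which no
    // "<code>.traineddata" file exists directly inside dataPath.
    // Assumption: data files use Tesseract's standard naming scheme.
    public static IList<string> MissingLanguages(string dataPath, IEnumerable<string> languages)
    {
        return languages
            .Where(code => !File.Exists(Path.Combine(dataPath, code + ".traineddata")))
            .ToList();
    }
}
```

A caller could pass the values it intends to assign to TesseractDataPath and Languages, and report the returned codes before running OCR.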
LibraryPath
Gets or sets the path to the directory that contains the Tesseract and Leptonica libraries.
Property Value
- System.String
The path to the directory that contains the Tesseract and Leptonica libraries.
Remarks
If this property is not set, these locations are searched:
- Assembly.GetExecutingAssembly().Location
- AppDomain.CurrentDomain.BaseDirectory
- Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "bin")
- Environment.CurrentDirectory
The path should point to the directory that contains the x64 and x86 folders, not to the x64 or x86 folder itself.
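The default search order can be approximated in application code, for example to log where the native libraries would be found. This is a sketch using only standard .NET APIs, not the library's actual lookup logic. NativeLibraryProbe and its methods are hypothetical names, and Assembly.GetExecutingAssembly() here refers to the calling application's assembly, whereas inside the library it would refer to the library's own assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

static class NativeLibraryProbe
{
    // Hypothetical helper: yields the default locations listed above, in order.
    public static IEnumerable<string> CandidateDirectories()
    {
        // Assembly.Location is a file path (and may be empty for single-file
        // or in-memory assemblies), so use its containing directory.
        string assemblyLocation = Assembly.GetExecutingAssembly().Location;
        if (!string.IsNullOrEmpty(assemblyLocation))
        {
            string assemblyDirectory = Path.GetDirectoryName(assemblyLocation);
            if (!string.IsNullOrEmpty(assemblyDirectory))
                yield return assemblyDirectory;
        }

        yield return AppDomain.CurrentDomain.BaseDirectory;
        yield return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "bin");
        yield return Environment.CurrentDirectory;
    }

    // Returns the first candidate that contains an "x64" or "x86" subfolder,
    // or null if none of the candidates does.
    public static string FindLibraryRoot()
    {
        foreach (string directory in CandidateDirectories())
        {
            if (Directory.Exists(Path.Combine(directory, "x64")) ||
                Directory.Exists(Path.Combine(directory, "x86")))
            {
                return directory;
            }
        }

        return null;
    }
}
```

The returned directory is the parent of the x64 and x86 folders, which is the shape of path that LibraryPath expects.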
PageSize
Gets or sets the size of the page to which the content will be loaded.
If the value is null, the size of the page will be determined from the image.
The default value is null.
Property Value
- System.Nullable<PdfSize>
The size of the page to which the content will be loaded.
Remarks
This property is valid only when reading from an image.
TesseractDataPath
Gets or sets the path to the directory which contains tesseract data.
Property Value
- System.String
The path to the directory which contains tesseract data.
Remarks
By default, this property points to the folder created by GemBox, which contains data for the English language. Data for other languages can be downloaded from https://www.gemboxsoftware.com/pdf/docs/ocr.html#language-data
TextFormatter
Gets or sets the action which should be performed on the loaded text.
public Action<PdfFormattedText> TextFormatter { get; set; }
Public Property TextFormatter As Action(Of PdfFormattedText)
Property Value
- System.Action<PdfFormattedText>
The action which should be performed on the loaded text.
Remarks
This property can be used to change the formatting (font family or color) of the text.
By default, the Calibri font is used and the font size is automatically determined from the image.
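A formatter might look like the sketch below, which changes the font family and color of the loaded text. OcrReadOptions, KeepImage, TextFormatter, and PdfFormattedText are named on this page. The namespaces, the FontFamily and Color properties, the PdfFontFamily constructor, and PdfColor.FromRgb are assumptions about the GemBox.Pdf content API, and VisibleTextOptions is a hypothetical helper name.

```csharp
using GemBox.Pdf.Content;  // assumed namespace of PdfFormattedText, PdfFontFamily, PdfColor
using GemBox.Pdf.Ocr;      // assumed namespace of OcrReadOptions

static class VisibleTextOptions
{
    // Hypothetical factory: builds options whose formatter restyles the loaded text.
    public static OcrReadOptions Create()
    {
        return new OcrReadOptions
        {
            KeepImage = true,
            TextFormatter = text =>
            {
                // Assumption: PdfFormattedText exposes FontFamily and Color setters.
                text.FontFamily = new PdfFontFamily("Arial");
                text.Color = PdfColor.FromRgb(1, 0, 0); // red, components assumed to be in the 0..1 range
            }
        };
    }
}
```

The font size is left untouched here so that it is still determined from the image, as described above.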