OCR Module Description

The OCR module enables developers to:

The OCR module can be licensed either independently of our other PDF components or as an add-on to existing licenses.

The Amyuni OCR module is based on the Tesseract open-source project, with Amyuni PDF technology used to process and create the PDF documents. The Tesseract library provides high reliability at a low cost and spares developers the annoyances of licensing commercial OCR tools, which are often licensed on a per-page basis or at a prohibitively high cost to the developer.

Features

Supported Platforms

Distributable Files

PDFCreactiveX.dll

This is the main ActiveX control that hosts the Amyuni PDF Library and the interface to the OCR engine.

acPDFCreatorLib.Net.dll

This is the .NET class library that is equivalent to the PDFCreactiveX.dll ActiveX control. Developers can use either the ActiveX control or the .NET library, and do not need to include both.

Tesseract41.dll

This file contains the Tesseract OCR engine. This DLL and the Tessdata folder described below should be located in the same folder as PDFCreactiveX.dll.
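The folder layout described above can be checked before shipping an installer. The following is a minimal sketch in Python, not part of the Amyuni product: the function name missing_ocr_files is hypothetical, and the check assumes the ActiveX deployment, where Tesseract41.dll and the Tessdata folder sit next to PDFCreactiveX.dll.

```python
from pathlib import Path


def missing_ocr_files(install_dir):
    """Return the names of the expected OCR items that are absent from install_dir.

    Hypothetical helper: it only checks for the three items named in this
    documentation and does not validate their contents.
    """
    root = Path(install_dir)
    missing = []
    # The ActiveX control and the OCR engine must both be regular files.
    for dll_name in ("PDFCreactiveX.dll", "Tesseract41.dll"):
        if not (root / dll_name).is_file():
            missing.append(dll_name)
    # The dictionaries live in a sub-folder of the same directory.
    if not (root / "Tessdata").is_dir():
        missing.append("Tessdata")
    return missing
```

An empty list means all three items were found in the same folder; any returned name points to an item that still needs to be copied there.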

Tessdata Folder

This folder contains all the dictionaries used by the OCR engine. Each language is supported by 8 dictionary files prefixed with the language name, e.g. deu for German. If not all languages are needed, only the files for the required languages need to be distributed, e.g. only the eng- and fra-prefixed files for English and French support.
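Selecting a subset of dictionaries for distribution can be scripted. The sketch below is an illustration only, written in Python: the function name copy_language_files and the directory arguments are hypothetical, and it assumes nothing about the dictionary file names beyond the language prefix described above. Because it matches on the prefix alone, a short prefix would also pick up any other language whose files begin with the same letters.

```python
import shutil
from pathlib import Path


def copy_language_files(tessdata_dir, dest_dir, languages):
    """Copy only the dictionary files whose names start with one of the
    given language prefixes (e.g. "eng", "fra") and return their names."""
    src = Path(tessdata_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        # Keep a file only if it is prefixed with a requested language.
        if path.is_file() and any(path.name.startswith(lang) for lang in languages):
            shutil.copy2(path, dest / path.name)
            copied.append(path.name)
    return copied
```

For English and French only, the call would be copy_language_files(source_folder, target_folder, ["eng", "fra"]), where both folder paths are supplied by the build or packaging script.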

All the samples provided in this documentation assume that the developer is using the ActiveX version (PDFCreactiveX.dll). When using the .NET version (acPDFCreatorLib.Net.dll), the functions are very similar, although the code is slightly different. Rather than duplicating all the documentation and sample code, we have chosen to provide a complete .NET sample at the end of this documentation.

 
