Introducing a way to finally make your documents in image formats discoverable. Fig Leaf Software has partnered with ABBYY, a leading provider of document recognition, data capture, and linguistic software, to bring the ABBYY Recognition Server for the Google Search Appliance to our customers.
The ABBYY Recognition Server works as a background optical character recognition (OCR) service, enabling the GSA to index full text content from documents in image formats.
Organizations need a fast and reliable way to sort through the tremendous volumes of information they accumulate over time. Implementing the ABBYY Recognition Server for the Google Search Appliance enables you to sort through your organization's information faster and increase your efficiency, without spending a large amount of time on training.
How It Works
ABBYY Recognition Server - Architecture
ABBYY Recognition Server consists of several components, which can be installed on the same computer or on different computers in a LAN. The main components are:
- Server Manager - the central service component, which controls the document processing queue and orchestrates the work of Processing Stations and Verification Stations.
- Processing Station - a service that performs recognition and document conversion.
- Verification Station - a client station which provides an interface for proofreading the recognition results.
- Remote Administration Console - a client console used for configuring and monitoring Recognition Server.

The document conversion process in Recognition Server can be divided into four logical parts:
1. Uploading documents
The user (or a client software program) uploads the images to one of the following network resources:
- network folder (which is convenient in case of centralized processing of many image files);
- FTP folder (e.g. if images should be uploaded from remote locations);
- email folder (e.g. if users send their images for conversion by e-mail).
The Server Manager component of Recognition Server imports the images from the input source and arranges them in a queue for processing.
2. Processing
The processing of the images and PDF files is done on a Processing Station.
It is possible to connect several computers to the Server Manager as Processing Stations, and the Server Manager will balance the workload among these stations evenly. This will result in much faster processing of the documents.
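The source does not describe how the Server Manager distributes work, so the following is only a minimal sketch of one common balancing approach: each job goes to the station with the fewest pages queued so far. The function name `assign_jobs`, the station names, and the page counts are all hypothetical and are not part of Recognition Server.

```python
import heapq


def assign_jobs(page_counts, station_names):
    """Assign each job to the station with the fewest pages queued so far.

    Illustrative only: this is an assumed least-loaded strategy, not the
    documented behavior of the Server Manager.
    """
    # Heap of (pages_queued, station_name); the least-loaded station is on top.
    heap = [(0, name) for name in station_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in station_names}

    for job_id, pages in enumerate(page_counts):
        load, name = heapq.heappop(heap)
        assignment[name].append(job_id)
        heapq.heappush(heap, (load + pages, name))
    return assignment


# Five jobs with different page counts, shared between two stations.
jobs = [10, 4, 6, 2, 8]
plan = assign_jobs(jobs, ["station-a", "station-b"])
print(plan)
```

With more stations in the list, the same jobs are spread over more machines, which is the effect the paragraph above describes.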
There are a few essential steps in the document conversion process. Recognition Server does them all automatically without any user assistance.
The first step is image pre-processing, in which preliminary actions are performed on each page:
- skew correction;
- automatic detection of page orientation;
- splitting of facing pages in the case of book scans;
- noise and garbage removal.
Next comes the recognition part of the process. The OCR and barcode recognition technologies implemented in Recognition Server deliver high accuracy and support processing of various types of text and the most popular 1D and 2D barcodes. The OCR process is supported by extensive language databases, which include:
- 37 main languages with Latin and Cyrillic alphabets;
- 133 additional languages with Latin, Cyrillic, Greek and other alphabets;
- Old European languages;
- Chinese, Japanese and Korean languages;
- Hebrew;
- Thai;
- chemical formulas, artificial and programming languages.
For images scanned in a batch, Recognition Server offers several document separation options. For example, the batch can be split into individual documents using blank separator sheets, barcode sheets, or barcodes stuck or printed on the first page of each document. Recognition Server performs document separation based on the separation rule and the recognized data. Each document will then be exported to a separate output file.
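As an illustration of the blank-separator-sheet rule, here is a minimal sketch that splits a scanned batch into documents. It assumes each page is already available as recognized text and treats a page with no text as a separator sheet. The function `split_on_separators` and the sample batch are hypothetical, not Recognition Server code.

```python
def split_on_separators(pages, is_separator):
    """Group consecutive pages into documents, dropping separator pages."""
    documents = []
    current = []
    for page in pages:
        if is_separator(page):
            # A separator closes the current document (if it has any pages).
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


# Recognized text per page; empty strings stand in for blank separator sheets.
batch = ["invoice p1", "invoice p2", "", "letter p1", "", "", "report p1", "report p2"]
docs = split_on_separators(batch, lambda text: text.strip() == "")
print(docs)
```

A barcode-based rule could reuse the same function by passing a different `is_separator` predicate, for example one that checks whether a separator barcode value was recognized on the page.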
3. Quality Control
Sometimes there is a need to process important documents which have to be recognized with exceptional accuracy. Meanwhile, the quality of scanned images may not be perfect, suffering from low resolution and unwanted noise. In this case it is very important to have a reliable quality check mechanism. Recognition Server provides options for both automatic quality control and a visual verification.
- Automatic quality control allows the administrator to set a threshold for recognition accuracy. When this option is on, documents with poor-quality text will not be converted, but rather stored in a separate folder for special treatment;
- If the Verification option is enabled, the pages will be routed to available Verification Stations. Verification Stations allow operators to check the accuracy of the layout and the recognized text, make any necessary corrections, and run a spell check. Verification can be enabled either for all recognized pages or only for those pages which are recognized with an accuracy below a certain threshold.
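The two quality-control options can be pictured as a simple routing decision. The sketch below assumes each page carries a recognition confidence between 0 and 1. The `PageResult` type, the `route` function, and the threshold values are hypothetical and only illustrate the logic described above.

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    name: str
    confidence: float  # hypothetical recognition confidence in [0, 1]


def route(page, reject_below, verify_below):
    """Decide where a recognized page goes next.

    reject_below: pages under this confidence go to a separate folder.
    verify_below: remaining pages under this confidence go to verification.
    """
    if page.confidence < reject_below:
        return "rejected-folder"
    if page.confidence < verify_below:
        return "verification"
    return "export"


pages = [
    PageResult("fax-scan", 0.42),
    PageResult("old-contract", 0.81),
    PageResult("clean-letter", 0.99),
]
destinations = [route(p, reject_below=0.5, verify_below=0.9) for p in pages]
print(destinations)
```

In this sketch, setting `verify_below` above 1.0 sends every accepted page to verification, which corresponds to enabling verification for all recognized pages.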
4. Getting converted documents
Recognition Server saves the documents in the specified format and delivers them to one of the configured output destinations.
The program supports creating flexible rules for naming output files and routing them to specific output subfolders. For instance, the current time, date or barcode value can be used to name the output file or folder in the most convenient manner.
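A naming rule of this kind might look like the following sketch, which files each document under a dated subfolder and names it after its barcode value, falling back to the processing time when no barcode was read. The function `output_path`, the folder layout, and the sample values are assumptions made for illustration. Recognition Server's own rule syntax is not shown in the source.

```python
from datetime import datetime
from pathlib import PurePosixPath


def output_path(root, barcode, processed_at, extension="pdf"):
    """Build an output path from the date, and the barcode or time."""
    # One subfolder per processing date, e.g. 2024-03-05.
    subfolder = processed_at.strftime("%Y-%m-%d")
    # Prefer the barcode value; otherwise use the time of processing.
    stem = barcode if barcode else processed_at.strftime("%H%M%S")
    return PurePosixPath(root) / subfolder / f"{stem}.{extension}"


when = datetime(2024, 3, 5, 14, 30, 7)
with_barcode = output_path("/output", "INV-00123", when)
without_barcode = output_path("/output", None, when)
print(with_barcode)     # /output/2024-03-05/INV-00123.pdf
print(without_barcode)  # /output/2024-03-05/143007.pdf
```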
Recognition Server can convert images into various kinds of searchable or editable formats: PDF, PDF/A, RTF, TXT, DOC(X), XLS(X), XML, as well as into popular image formats: TIFF, multipage TIFF and JPEG.
Within its PDF creation functionality, Recognition Server offers an extended set of options:
- document security,
- file compression,
- web-optimization,
- optimization for handheld devices,
- adding headers, footers and Bates stamps into documents,
- creation of PDF files compliant with the PDF/A standard.
Administration
The administration of Recognition Server is performed via a convenient administration interface based on the Microsoft Management Console. It allows the administrator to configure the system and monitor its activity: to set processing parameters, to manage licenses, stations, and user permissions, to manage the processing queue and to view the log files.
The priority management and advanced scheduling features allow the administrator to control the order in which the documents are processed and use the stations’ hardware resources efficiently by scheduling OCR for night hours or weekends.
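To make the idea concrete, here is a minimal sketch of a priority queue that only releases jobs during an OCR window (weekday nights and weekends). The class `JobQueue`, the function `in_ocr_window`, and the 20:00 to 06:00 window are hypothetical choices for illustration. In the product itself, these settings are configured through the administration console rather than in code.

```python
import heapq
from datetime import datetime, time


def in_ocr_window(now, start=time(20, 0), end=time(6, 0)):
    """True on weekends, or on weekdays between `start` and `end` overnight."""
    if now.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return True
    return now.time() >= start or now.time() < end


class JobQueue:
    """Higher priority first; equal priorities keep their arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = 0

    def add(self, name, priority):
        # Negate priority because heapq pops the smallest item first.
        heapq.heappush(self._heap, (-priority, self._counter, name))
        self._counter += 1

    def next_job(self, now):
        """Return the next job name, or None outside the OCR window."""
        if not self._heap or not in_ocr_window(now):
            return None
        return heapq.heappop(self._heap)[2]


queue = JobQueue()
queue.add("archive-backlog", priority=1)
queue.add("contracts", priority=5)
queue.add("mailroom", priority=5)

tuesday_afternoon = datetime(2024, 3, 5, 14, 0)
tuesday_night = datetime(2024, 3, 5, 22, 0)
print(queue.next_job(tuesday_afternoon))  # None: outside the window
print(queue.next_job(tuesday_night))      # contracts
```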
Integration
ABBYY Recognition Server provides an application programming interface (API) for integration with other applications. The API can be used to pass image files and processing parameters to Recognition Server, get notifications about job completion and obtain converted files. See more information in the Development and integration section.
ABBYY Recognition Server 3.0
Recognition Server 3.0 is now available. Key features include:
- Scanning Station with TWAIN and ISIS support
- Indexing Station with point-and-click indexing capabilities
- Scripts for easy customization and integration
- Modules for enterprise search systems such as Google Search Appliance and MS Search
What's New in Recognition Server 3.0
- Scanning Station
- Indexing Station
- GSA and iFilter connectors
- Scripts:
  - Document Separation Scripts
  - Document Type Detection and Indexing Scripts
  - Export Scripts for handling output documents and failed jobs
- Improved CJK (Chinese/Japanese/Korean languages)
- New Barcodes: Data Matrix, QR Code, Aztec
- 11th-generation technologies, including ADRT and MRC
- SharePoint Server connector
Usage scenarios:
- Indexing and Archiving
  - PDF conversion; export to DOC, TXT, and image formats
  - Point-and-click indexing
  - Simple classification
- Enterprise Search Systems
  - Unlocks image-based documents and feeds back searchable text
  - GSA and iFilter modules
- E-Discovery
  - Converts images and email attachments into searchable formats
  - Highly scalable
  - Bates stamping
- Everyday conversion
  - 24/7 service
  - Easy install
  - No training necessary