Charles River Analytics is developing a document image enhancement system for the Army Research Laboratory. The Autonomous Intelligent Document Analysis (AIDA) system is designed to enhance the performance of optical character recognition (OCR).
Although OCR is critical to document management, its accuracy still falls well short of human reading accuracy. OCR performance degrades significantly when low-quality documents are imaged under adverse conditions. The AIDA system is designed to enhance OCR performance on degraded document images by automatically detecting and correcting artifacts in the image data before the images are passed to a commercial OCR engine. Improved OCR accuracy is especially valuable in anti-terrorist intelligence, where documents are often imaged under adverse conditions.
In earlier development, the feasibility of the AIDA system was demonstrated by detecting and correcting artifacts, such as uneven illumination and rotational skew, in both Arabic and English documents. In the current phase of development, Charles River Analytics is adding a number of capabilities to the AIDA system, including perspective warp rectification, character glyph restoration, resolution enhancement, and multi-OCR optimization.
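To make the idea of correcting uneven illumination concrete, the following is a minimal sketch of one common technique, local-mean normalization. It is not AIDA's actual algorithm, which is not described here. The function name `correct_illumination`, the `window` parameter, and the representation of the image as a list of rows of grayscale values are all assumptions made for illustration.

```python
def correct_illumination(image, window=15):
    """Flatten uneven illumination by local-mean normalization (illustrative only).

    image:  list of rows, each a list of grayscale values in 0..255.
    window: half-width of the square neighborhood used to estimate the
            local background brightness (hypothetical parameter; in practice
            it should be larger than the stroke width of the text).
    """
    height, width = len(image), len(image[0])
    # Global mean brightness, used as the target level for every region.
    target = sum(map(sum, image)) / (height * width)

    corrected = []
    for y in range(height):
        row = []
        for x in range(width):
            # Neighborhood around (x, y), clipped at the image borders.
            ys = range(max(0, y - window), min(height, y + window + 1))
            xs = range(max(0, x - window), min(width, x + window + 1))
            local = sum(image[j][i] for j in ys for i in xs) / (len(ys) * len(xs))
            # Rescale so the local background matches the global target.
            value = image[y][x] * target / local if local > 0 else image[y][x]
            row.append(min(255, max(0, round(value))))
        corrected.append(row)
    return corrected
```

This straightforward version recomputes each neighborhood from scratch, so it is slow on large images; a production system would more likely use an integral image or a library filter for the background estimate.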
AIDA’s novel approach exploits a knowledge base to optimize both the enhancement parameters and the sequencing of enhancement operations. In addition, AIDA relies on a rule-based system to control multiple OCR engines, assigning each image to the engine most likely to succeed on that image.
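The rule-based assignment of images to OCR engines might look roughly like the sketch below. It assumes an ordered list of rules in which the first matching rule wins. The feature fields (`script`, `dpi`, `noise_level`), the thresholds, and the engine names are hypothetical placeholders, not AIDA's actual rules or engines.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ImageFeatures:
    """Hypothetical features measured on a document image."""
    script: str         # e.g. "arabic" or "latin"
    dpi: int            # estimated scan resolution
    noise_level: float  # 0.0 (clean) to 1.0 (very noisy)


# A rule pairs a predicate over the features with an engine name.
Rule = Tuple[Callable[[ImageFeatures], bool], str]

# Ordered rules: earlier rules take priority. All names are placeholders.
RULES: List[Rule] = [
    (lambda f: f.script == "arabic", "engine_arabic"),
    (lambda f: f.noise_level > 0.5, "engine_robust"),
    (lambda f: f.dpi < 200, "engine_lowres"),
]
DEFAULT_ENGINE = "engine_general"


def select_engine(features: ImageFeatures,
                  rules: List[Rule] = RULES,
                  default: str = DEFAULT_ENGINE) -> str:
    """Return the engine named by the first rule whose predicate matches."""
    for predicate, engine in rules:
        if predicate(features):
            return engine
    return default
```

For example, `select_engine(ImageFeatures("latin", 150, 0.7))` matches the noise rule before the resolution rule, so the order of the rules encodes their priority.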