You might think there's not much more Optical Character Recognition (OCR) software can do. As long as it recognises documents accurately and reasonably fast, where's the scope for improvement? In fact, Nuance claims several notable improvements for OmniPage Professional 16, probably the best-known OCR application on the market.
First, says Nuance, the new version is between 16 and 27 per cent more accurate than before, while at the same time being up to 46 per cent faster. On top of this, it should be able to compensate for lens distortions in pictures of pages taken with a camera, automatically black out words in sensitive documents and handle electronic and paper forms. It can produce documents in Office 2007's XPS format and includes copies of both PaperPort 11 (Nuance's document management application) and PDF Converter 4 which, as you might guess, converts documents to PDF format.
The program is also claimed to make a better job of producing accurate representations of pages, without putting everything in separate text and graphics frames. This has long been a gripe, as it's one thing to have the page look right, but another to easily edit the text within that layout. Most OCR programs struggle with the ‘easy edit in layout’ part.
Once you've installed and activated OmniPage Professional 16, you have to set up a scanner to work with it. The Scanner Setup Wizard should run automatically, though in our case it didn't. The Wizard downloaded the latest scanner database from Nuance, which didn't include our HP OfficeJet 7210, a current and popular All-in-One. We had to run the program's diagnostics to get it recognised, which involved scanning text, greyscale and colour documents - about five minutes' work.
The main processing screen offers four task tabs at the top with three panes below: one for thumbnails, one for a graphic image of the page and one for the OCRed text. At the bottom is a full-width pane of document statistics, most of which OmniPage works out for itself.
The tabs are for workflow, load or scan type, page layout and export. Despite what Nuance seems to think, they are not that intuitive to use. As if in admission of this, a series of How-to Guides runs through many of the tasks which should be obvious, but aren't. Unexpectedly, the default 1-2-3 workflow, designed to handle the most common OCR tasks automatically, is set to load images from file - is that really where most customers want to get their input documents? You have to alter this behaviour before the program will look to a scanner instead.