I double-checked a few things:
After loading the TIFF file into the ImageCollection (sourceCollection), each image has a resolution of 300 dpi, so I simply skipped the SetRenderingSettings method mentioned above.
I run OCR directly on the images in the ImageCollection. No pre- or post-processing is performed. (I guess you are talking about things like the DocCleanUp.)
OCR is done in a specified region. Most of the time the region exactly fits the area where the words are located; sometimes the word on the document lies slightly outside this region. The result is either an exact recognition or weird results (but still results).
Now I saved my (for example) 10-page multipage TIFF as several new multipage TIFFs. As explained in my previous post, the split is decided by checking whether the OCR returned a result or not. (There are also more features, like checking for exact text or contained text.) If it did, this page becomes the first page of a new document, and the following pages are added to that document until a result is recognized in the given region again. The splitting is done by simply adding the image from sourceCollection to a new ImageCollection (targetCollection). targetCollection is then stored using the following code.
A small snippet showing how a source image becomes a target image:
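for (int page = 0; page < sourceFileImages.Count; page++)
{
    // Add the current source page to the target collection.
    targetFileImages.Add(sourceFileImages[page]);
    // Save when the next page is listed in pagesToSplit or this is the last page.
    if (pagesToSplit.Contains(page + 1) || page == sourceFileImages.Count - 1)
    {
        targetFileImages.SaveSync(targetFileName, true);
        // Empty the target collection (disposing its images) before collecting the next document.
        targetFileImages.ClearAndDisposeItems();
    }
}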
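To make the resulting page groups easier to check, here is a minimal sketch that applies the same loop condition to plain page indices instead of images. The class and method names (PageGrouping, Group) are made up for illustration and are not part of the VintaSoft SDK; only the condition itself is taken from the snippet above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the VintaSoft SDK.
public static class PageGrouping
{
    // Returns the 0-based page indices that end up in each output file,
    // using the same split condition as the loop in the snippet above.
    public static List<List<int>> Group(int pageCount, ISet<int> pagesToSplit)
    {
        var groups = new List<List<int>>();
        var current = new List<int>();

        for (int page = 0; page < pageCount; page++)
        {
            current.Add(page);

            // Close the current group when the next page is a split point
            // or when this is the last page of the source file.
            if (pagesToSplit.Contains(page + 1) || page == pageCount - 1)
            {
                groups.Add(current);
                current = new List<int>();
            }
        }

        return groups;
    }
}
```

For example, PageGrouping.Group(10, new HashSet<int> { 3, 6, 8 }) yields four groups: pages 0-2, 3-5, 6-7 and 8-9.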
Now there are (for example) 4 new multipage TIFF files.
I run the application again on these files, but now it recognizes nothing at all, and I am wondering why. I expected to get the same recognition results. Apart from there now being more files (or rather, fewer pages in each file), nothing else has changed.
BTW: normally the application has to handle any image, PDF or other file that can be added to a VintaSoft ImageCollection. Those files can have various dpi settings. For this reason SetRenderingSettings needs to be called so that every document has the same dpi; otherwise the predefined recognition regions would no longer fit (a 100 px long line is physically longer at 96 dpi than at 300 dpi).
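The dpi arithmetic behind this can be sketched as follows. This is only an illustration of the conversion, assuming regions are defined in pixels; DpiScaling and ScaleLength are hypothetical names and have nothing to do with the VintaSoft API.

```csharp
using System;

// Hypothetical helper, not part of the VintaSoft SDK.
public static class DpiScaling
{
    // Converts a length in pixels at sourceDpi into the number of pixels
    // that covers the same physical length at targetDpi.
    public static double ScaleLength(double pixels, double sourceDpi, double targetDpi)
    {
        if (sourceDpi <= 0) throw new ArgumentOutOfRangeException(nameof(sourceDpi));
        if (targetDpi <= 0) throw new ArgumentOutOfRangeException(nameof(targetDpi));

        // pixels / sourceDpi is the physical length in inches.
        return pixels / sourceDpi * targetDpi;
    }
}
```

A 100 px line at 96 dpi is about 1.04 inch long; DpiScaling.ScaleLength(100, 96, 300) gives 312.5, so the same line needs 312.5 px at 300 dpi. A region defined for one resolution and applied unchanged to another therefore covers a different part of the page.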