VintaSoft Imaging .NET SDK and Plug-ins Discussions
Questions, comments and suggestions concerning VintaSoft Imaging .NET SDK.
public static void RecognizeNonTextPdfPages(OcrEngine ocrEngine, string inPdfFilename, string outPdfFilename)
{
    OcrEngineManager engineManager = new OcrEngineManager(ocrEngine);
    OcrEngineSettings ocrSettings = new OcrEngineSettings(OcrLanguage.English);
    PdfRenderingSettings renderingSettings = new PdfRenderingSettings();
    renderingSettings.Resolution = new Resolution(300, 300);

    // open source PDF document
    using (PdfDocument document = new PdfDocument(inPdfFilename))
    {
        // create PDF document builder
        PdfDocumentBuilder builder = new PdfDocumentBuilder(document);
        builder.Font = PdfDocumentBuilder.CreateGlyphLessFont(document);
        builder.PageCreationMode = PdfPageCreationMode.ImageOverText;

        // for each page in source PDF document
        for (int i = 0; i < document.Pages.Count; i++)
        {
            // get PDF page
            PdfPage page = document.Pages[i];

            // if page does not have text
            if (page.TextRegion.IsEmpty)
            {
                // render image of PDF page
                using (VintasoftImage image = page.Render(renderingSettings, null, null))
                {
                    // recognize text in rendered image
                    OcrPage ocrPage = engineManager.Recognize(image, ocrSettings);

                    // if page has text
                    if (ocrPage != null && !string.IsNullOrEmpty(ocrPage.GetText()))
                    {
                        // set OCR page as a background for PDF page
                        builder.SetAsBackground(i, ocrPage);
                    }
                }
            }
        }

        // save PDF document to a new file
        document.Pack(outPdfFilename);
    }
}
Best regards, Alexander
Is it possible to remove the previous text layer from a PDF?
Yes, this is possible. Please use the PdfPage.RemoveText() method to remove text from a PDF page.
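A minimal sketch of that suggestion follows. It is not the snippet from the original reply, which is not preserved here. It assumes a parameterless PdfPage.RemoveText() overload, as named in the reply, and otherwise reuses only the PdfDocument, Pages and Pack calls from the example above. The class name PdfTextLayerHelper and method name RemoveTextFromAllPages are hypothetical. Because RemoveText removes text content, running it on an already searchable page would presumably also remove the visible text, not only a hidden OCR layer.

```csharp
using Vintasoft.Imaging.Pdf;
using Vintasoft.Imaging.Pdf.Tree;

// Hypothetical helper class; not part of the VintaSoft SDK.
public static class PdfTextLayerHelper
{
    // Removes text from every page of inPdfFilename and saves the result to outPdfFilename.
    // Assumption: PdfPage.RemoveText() takes no parameters, as named in the reply above.
    public static void RemoveTextFromAllPages(string inPdfFilename, string outPdfFilename)
    {
        // open source PDF document
        using (PdfDocument document = new PdfDocument(inPdfFilename))
        {
            // for each page in source PDF document
            for (int i = 0; i < document.Pages.Count; i++)
            {
                PdfPage page = document.Pages[i];

                // remove text content from the page (per the reply, images are kept)
                page.RemoveText();
            }

            // save PDF document to a new file
            document.Pack(outPdfFilename);
        }
    }
}
```

Usage would look like PdfTextLayerHelper.RemoveTextFromAllPages("in.pdf", "out.pdf"), where both filenames are placeholders.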
I used PdfPage.RemoveText() to remove text. The problem is that the image is also removed for some PDFs.
The PdfPage.RemoveText method removes only text and does not remove images.
We have two kinds of PDFs: ones that are already searchable, and ones that have been OCR'd (image + text layer).
The PdfPage.RemoveText method removes only text and does not remove images.
Yes, I think you are right.