VintaSoft PDF .NET Plug-in Discussions

Questions, comments and suggestions concerning VintaSoft PDF .NET Plug-in.



PDF Document TextRegion has strange structure

Post by SebastianB »

Hi,

I have a problem with a PDF document. The document contains some text elements that are formatted with a special font (e.g. Tahoma).
I am iterating over the pages and text lines, checking whether each symbol is formatted with this font and appending those characters to a StringBuilder. The idea is to extract information from the document for further processing.

Here is what I see in any PDF viewer (including the VintaSoft demo applications):
&Field1:608121 &Field64:01.07.2010 &Field3:12.286,75
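Since the goal is to turn this line into data for further processing, here is a minimal C# sketch of that last step, assuming the extracted string arrives intact. It uses only the .NET base class library, not the VintaSoft API; the `&Name:value` token layout, the `dd.MM.yyyy` date format and the German-style number separators are assumptions read off this one sample, and `ParseFields` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: splits "&Name:value" tokens into a dictionary. The token layout
// is inferred from the sample line above; it is an assumption, not a documented format.
static Dictionary<string, string> ParseFields(string line)
{
    var result = new Dictionary<string, string>();
    foreach (Match m in Regex.Matches(line, @"&(\w+):(\S+)"))
        result[m.Groups[1].Value] = m.Groups[2].Value;
    return result;
}

var fields = ParseFields("&Field1:608121 &Field64:01.07.2010 &Field3:12.286,75");

// Assumed German-style conventions: dd.MM.yyyy dates, '.' as thousands separator and
// ',' as decimal separator. A hand-made NumberFormatInfo avoids depending on the
// "de-DE" culture being available on the machine.
var date = DateTime.ParseExact(fields["Field64"], "dd.MM.yyyy", CultureInfo.InvariantCulture);
var amountFormat = new NumberFormatInfo { NumberDecimalSeparator = ",", NumberGroupSeparator = "." };
var amount = decimal.Parse(fields["Field3"], NumberStyles.Number, amountFormat);

var id = fields["Field1"];
Console.WriteLine($"{id} | {date:yyyy-MM-dd} | {amount.ToString(CultureInfo.InvariantCulture)}");
// prints 608121 | 2010-07-01 | 12286.75
```

Note that this parsing only works once the separators are back in place: with the broken output "12 286 75" the `\S+` capture for Field3 would stop at "12".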

I am using the following code to extract the required text:
using System.Text;
using Vintasoft.Imaging.Pdf;

var sb = new StringBuilder();
using (var pdf = new PdfDocument(file))
{
    for (int iPage = 0; iPage < pdf.Pages.Count; iPage++)
    {
        var page = pdf.Pages[iPage];
        foreach (var textRegionLine in page.TextRegion.Lines)
        {
            foreach (var symbol in textRegionLine.Symbols)
            {
                // Compare the symbol's font with the allowed ones and append the symbol to the StringBuilder
            }
        }
    }
    // Clear cached data while the document is still alive;
    // the using block disposes the document afterwards.
    pdf.ClearCache();
}
And here is what I get (only part of the output):
&Field1:608121 &Field64:01.07.2010 &Field3:12 286 75
.
,

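The "." and "," of the amount are missing from their place in the line and show up as separate lines instead. Any suggestions or ideas?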
Thanks,
Sebastian
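A workaround that may be worth trying for the symptom above, assuming each symbol's position can be read from the text region: collect the wanted symbols from all lines and rebuild the text by coordinates instead of relying on the line grouping. The sketch below is plain C# and does not call the VintaSoft API; the `(Ch, X, Y)` tuples, the `RebuildText` helper, the Y-down convention and the tolerance of 5 units are all hypothetical, and the tuples would have to be filled from whatever bounding-box data the symbol objects actually expose.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: rebuilds text from positioned symbols instead of trusting the
// extractor's line grouping. Each symbol is a (Ch, X, Y) tuple: the character, the X of
// its left edge and the Y of its baseline. Y is assumed to grow downward; negate Y first
// if your coordinates grow upward, as PDF user space does.
static string RebuildText(IEnumerable<(char Ch, double X, double Y)> symbols, double rowTolerance)
{
    var rows = new List<List<(char Ch, double X, double Y)>>();
    // Visit symbols from top to bottom so that each row is started by its topmost symbol.
    foreach (var sym in symbols.OrderBy(p => p.Y))
    {
        // Join the current row if the baseline is close enough to that row's first symbol.
        if (rows.Count > 0 && Math.Abs(sym.Y - rows[rows.Count - 1][0].Y) <= rowTolerance)
            rows[rows.Count - 1].Add(sym);
        else
            rows.Add(new List<(char Ch, double X, double Y)> { sym });
    }

    // Within each row, order symbols left to right; separate rows with '\n'.
    var sb = new StringBuilder();
    foreach (var row in rows)
    {
        if (sb.Length > 0)
            sb.Append('\n');
        foreach (var sym in row.OrderBy(p => p.X))
            sb.Append(sym.Ch);
    }
    return sb.ToString();
}

// Made-up coordinates imitating the reported output: the digits of the amount on one
// line, with '.' and ',' reported separately on slightly lower baselines.
var sample = new List<(char Ch, double X, double Y)>
{
    ('1', 0, 100), ('2', 10, 100), ('2', 25, 100), ('8', 35, 100),
    ('6', 45, 100), ('7', 60, 100), ('5', 70, 100),
    ('.', 19, 102),
    (',', 54, 103),
};

var text = RebuildText(sample, rowTolerance: 5);
Console.WriteLine(text); // prints 12.286,75
```

The tolerance is the main tuning knob: too small and the separators stay on rows of their own, too large and genuinely separate text lines merge. If the extractor also emits space symbols in the gaps (the output shows "12 286 75"), those would need to be filtered out first, which this sketch does not do.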


Re: PDF Document TextRegion has strange structure

Post by Alex »

Hello Sebastian,

Could you send us a demo project that demonstrates the issue? If so, please send it, together with a description of the problem, to support@vintasoft.com

Best regards, Alexander

