Some text objects fail to read correctly

The PDF Creator .NET Library enables you to create secure PDF documents on the fly and to view, print and process existing PDF documents. If you have any questions about PDF Creator .NET, please post them here.


Post by Wesdpl on Tue Nov 06 2012

I am using version and have adjusted the sample code to extract all the text elements from a PDF file, but some of the text that is read back is rubbish, e.g. "W\DQGÀQDQFLDODI".

Here is an extract of my code that reads all the text objects on page 1:

For Each obj As Amyuni.PDFCreator.IacObject In arList
    ' You can access all properties of each object
    Dim attr As IacAttribute = obj.Attribute("ObjectType")
    Dim oPage As Integer = obj.PageNumber
    ' i (page being processed) and Pass are defined earlier in the code (not shown)
    If oPage = i Then
        Dim oType As Integer = CInt(attr.Value)
        Dim oTypeMR As String = ""
        Select Case oType
            Case 5 ' text object
                oTypeMR = "Text"
                If Pass = 1 Then
                    Dim oTextText As String = CStr(obj.Attribute("Text").Value)
                    Dim oTextColor As String = CStr(obj.Attribute("TextColor").Value)
                    Dim oTextFont As String = CStr(obj.Attribute("TextFont").Value)
                    ' ... (rest of the loop body not shown)
                End If
        End Select
    End If
Next

All of the fonts in the PDF are font subsets, e.g. "AZVOLY+Arial Black,20.0000,400,0,0,0,0".

It appears that, depending on the subset, text in some fonts is read correctly and text in others is not.
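To check whether the misread text correlates with particular subsets, it can help to group the extracted objects by font. The sketch below is a hypothetical VB.NET helper (IsSubsetFont, FontName and BaseFontName are not Amyuni functions). It relies on the PDF convention that a subset font name starts with a tag of six uppercase letters followed by "+", and it assumes the font name is the part of the TextFont value before the first comma; the remaining fields are not interpreted.

```vbnet
Imports System.Text.RegularExpressions

Module SubsetFontCheck
    ' Hypothetical helper, not part of the Amyuni API.
    ' PDF subset fonts carry a six-uppercase-letter tag plus "+" before
    ' the font name, e.g. "AZVOLY+Arial Black".
    Private ReadOnly SubsetPattern As New Regex("^([A-Z]{6})\+(.+)$")

    ' Assumption: the font name is everything before the first comma
    ' in the TextFont value.
    Public Function FontName(textFont As String) As String
        Dim comma As Integer = textFont.IndexOf(","c)
        Return If(comma >= 0, textFont.Substring(0, comma), textFont)
    End Function

    ' True when the font name carries a subset tag.
    Public Function IsSubsetFont(textFont As String) As Boolean
        Return SubsetPattern.IsMatch(FontName(textFont))
    End Function

    ' Strips the subset tag: "AZVOLY+Arial Black" becomes "Arial Black".
    Public Function BaseFontName(textFont As String) As String
        Dim m As Match = SubsetPattern.Match(FontName(textFont))
        Return If(m.Success, m.Groups(2).Value, FontName(textFont))
    End Function

    Sub Main()
        Dim sample As String = "AZVOLY+Arial Black,20.0000,400,0,0,0,0"
        Console.WriteLine(IsSubsetFont(sample))
        Console.WriteLine(BaseFontName(sample))
    End Sub
End Module
```

For example, passing oTextFont from the loop above to FontName and collecting oTextText per result would show whether the garbled strings all come from the same subsets.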

Any help would be appreciated.


Re: Some text objects fail to read correctly

Post by Jose on Mon Jan 14 2013


Without seeing the PDF document, it is difficult to pin down the issue.

However, I suggest that you look at the DelimitedText method. DelimitedText() retrieves only the text within a PDF object, not the string formatting (bounding box) of the object.

The link below points to our online help, where DelimitedText is explained further.

Get PDF Suite, the expert .NET developer toolkit for PDF conversion, creation and editing
Amyuni Team

Re: Some text objects fail to read correctly

Post by ThomasUttendorfer on Wed Sep 12 2018

It may help to call OptimizeDocument(1) before you read the text attributes.

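This function seems to have the side effect of resolving Identity-H font encoding, which makes the text attributes readable.
Identity-H encoding seems to be used when (Unicode) fonts are partially embedded. With Identity-H, the codes in the page content are the font's internal character IDs rather than Unicode values, so they only turn into readable text if they can be mapped back to Unicode.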
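The garbled sample in the first post fits this explanation. Shifting each character up by 29 code points turns "W\DQG" into "tyand" and "QDQFLDO" into "nancial", which is what you would get if raw glyph indices were being read as characters from a subset that keeps the standard Macintosh glyph order (glyph 3 = space, glyph 36 = "A"); the missing spaces would presumably be glyph 3, a control character that got dropped. The VB.NET sketch below is a hypothetical diagnostic helper (DecodeGlyphOffset is not part of the Amyuni API) that assumes a fixed offset of 29. It only helps confirm the diagnosis and is not a general fix, because subset fonts often renumber their glyphs.

```vbnet
Imports System.Text

Module GlyphOffsetCheck
    ' Hypothetical diagnostic helper, not part of the Amyuni API.
    ' Assumes each character is really a glyph index in a font using the
    ' standard Macintosh glyph order, where glyph index + 29 = ASCII code.
    Public Function DecodeGlyphOffset(garbled As String, Optional offset As Integer = 29) As String
        Dim sb As New StringBuilder(garbled.Length)
        For Each ch As Char In garbled
            Dim code As Integer = AscW(ch)
            ' Shift only codes that land in printable ASCII (32..126)
            If code + offset >= 32 AndAlso code + offset <= 126 Then
                sb.Append(ChrW(code + offset))
            Else
                ' Leave anything else (e.g. ligature glyphs) unchanged
                sb.Append(ch)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(DecodeGlyphOffset("W\DQGÀQDQFLDODI"))
    End Sub
End Module
```

With the default offset, DecodeGlyphOffset("W\DQGÀQDQFLDODI") returns "tyandÀnancialaf". The "À" falls outside the shifted range and is left as-is; from context it is probably a ligature glyph such as "fi" ("financial").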

Kind regards
