Extract text content

The PDF Creator enables you to create secure PDF documents and to view, print and process existing PDF documents.
If you have any questions about the installation and usage of the PDF Creator, please post them here.
zaksoft
Posts: 66
Joined: Fri Feb 20 2004

Extract text content

Post by zaksoft »

Is there any way to get the full text content of a document or a page?

I need to identify documents from their content.

TIA.
Davide Zaccanti
Joan
Amyuni Team
Posts: 2799
Joined: Wed Sep 11 2002

Re: Extract text content

Post by Joan »

Hello David,

The Page and Document objects don't have a Text attribute; it is the labels on a page that have a Text attribute. You can loop over all the labels in a document or on a page, retrieve their text, and save it to a text file or display it.

Hope this helps.
Custom Brand the Amyuni PDF Printer Driver http://www.amyuni.com/en/developer/branding/index.html

Amyuni PDF Converter tested for true PDF performance. View results - http://www.amyuni.com/benchmark

Re: Extract text content

Post by zaksoft »

Thank you for this information. I'll try to implement it, but since many other kinds of objects can be involved (various fields), I'd suggest a page-level function that retrieves the full text. Other developers may be interested in it too.
Davide Zaccanti

Re: Extract text content

Post by Joan »

Hello,

Using a loop, you can retrieve the text of every object that contains text, on one or more pages of the document.

I will forward your suggestion to our developers so they can look into it. Please feel free to add it to our wish list forum as well; that forum is usually visited by our project managers.

Best Regards,

Re: Extract text content

Post by Joan »

Hello,

I checked this issue with our developers.

There is already a method that does this: GetRawPageText().

Here is sample code showing how to use it.

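// Open a PDF file from the current directory; the second argument is the document password, left empty here
axPDFCreactiveX1.Open(System.IO.Path.Combine(System.IO.Directory.GetCurrentDirectory(), "sampleBookmarks.pdf"), "");
axPDFCreactiveX1.Refresh();
// Get the raw text of page 1 and display it
String text = axPDFCreactiveX1.GetRawPageText(1);
MessageBox.Show(text);
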
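Since GetRawPageText takes a single page number, getting the text of a whole document means calling it once per page and joining the results. Below is a minimal sketch of that loop, assuming pages are numbered from 1 (as the GetRawPageText(1) call in the sample suggests). This thread does not name the control's page-count property, so the helper takes the page count as a parameter. PdfTextHelper and ExtractAllText are hypothetical names, and the per-page call is passed in as a delegate so the loop can be tried without the ActiveX control.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the PDF Creator API.
static class PdfTextHelper
{
    // Joins the text of pages 1..pageCount, one page per line block.
    // getPageText is meant to wrap the control's GetRawPageText(page).
    // Assumption: page numbers start at 1, as in the sample code.
    public static string ExtractAllText(int pageCount, Func<int, string> getPageText)
    {
        var sb = new StringBuilder();
        for (int page = 1; page <= pageCount; page++)
        {
            // AppendLine adds Environment.NewLine after each page's text
            sb.AppendLine(getPageText(page));
        }
        return sb.ToString();
    }
}
```

With the control from the sample, a call could look like PdfTextHelper.ExtractAllText(pageCount, p => axPDFCreactiveX1.GetRawPageText(p)), where pageCount is a page count your code has obtained. The combined string can then be saved to a text file or compared with the text of other documents.
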
Hope this helps.