Using the Amyuni PDF Creator libraries to redact a PDF document.
“To redact is to select or adapt (as by obscuring or removing sensitive information) from a document prior to publication or release.”
To most people, simply opening a PDF document in a PDF editor and drawing a solid black rectangular frame object over a text string is considered redaction. Although the text string is no longer visible, it is still possible to retrieve it.
To illustrate the difference between the text that is visible when the document is previewed in a PDF viewer and the text that is stored (compressed) in the PDF file itself, below is a screenshot of a sample registration form with a user’s social security number that needs to be redacted.
As mentioned above, a common redaction technique is to try to hide sensitive information in a PDF document by covering it with a frame object. Using the Amyuni PDF Creator product, the user can draw a frame object over the social security number (underlined in red in Figure 1).
This technique only hides the text beneath the object. This can be demonstrated by opening the PDF document in Adobe Reader and searching for the text string (e.g. 123-456-789). Adobe Reader still finds the text, as shown in Figure 3 below, which indicates that the information is still present in the document.
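The gap between what a viewer paints and what the file stores can be sketched with a toy model. The Python below is not the Amyuni API and does not read real PDF files; PageObject, covers, visible_text and search are hypothetical names invented for this illustration, and the page layout is made up. It only shows why a frame drawn on top changes what is displayed without changing what a search can find.

```python
from dataclasses import dataclass


@dataclass
class PageObject:
    """Hypothetical stand-in for an object on a page (not an Amyuni type)."""
    kind: str        # "text" or "frame"
    box: tuple       # (left, top, right, bottom) in page units
    text: str = ""   # stored content; frames carry none


def covers(frame, target):
    """True when the frame's box fully encloses the target's box."""
    fl, ft, fr, fb = frame.box
    tl, tt, tr, tb = target.box
    return fl <= tl and ft <= tt and fr >= tr and fb >= tb


def visible_text(objects):
    """Text a viewer would display: anything not hidden under a frame drawn later."""
    shown = []
    for i, obj in enumerate(objects):
        if obj.kind != "text":
            continue
        hidden = any(o.kind == "frame" and covers(o, obj) for o in objects[i + 1:])
        if not hidden:
            shown.append(obj.text)
    return shown


def search(objects, needle):
    """A text search reads the stored content and ignores what is painted on top."""
    return [o.text for o in objects if o.kind == "text" and needle in o.text]


page = [
    PageObject("text", (50, 100, 200, 115), "Name: Jane Doe"),
    PageObject("text", (50, 130, 200, 145), "123-456-789"),
    PageObject("frame", (45, 125, 205, 150)),  # black rectangle over the number
]

print(visible_text(page))           # ['Name: Jane Doe']
print(search(page, "123-456-789"))  # ['123-456-789']
```

In this model the number disappears from the displayed text but is still returned by the search, which mirrors what Adobe Reader reports for the covered document.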
Using methods exposed by the APIs of either the Amyuni PDF Creator .NET or the Amyuni PDF Creator ActiveX version, developers can safely “erase” sensitive information. The process is quite simple: get a reference to the object that holds the text and clear that object’s text attribute.
The Amyuni PDF Creator exposes a number of different methods that can be used to get to a specific object on a PDF page (e.g. GetObjectXY(), GetObjectsInRectangle(), GetObjectByName(), etc.). All of these methods are explained in detail in our online documentation, accessible from the link below.
The code snippet below, which uses the ActiveX version of the Amyuni PDF Creator, relies on another method for getting to a specific object on a page: ReachTextEx().
The ReachTextEx() method reaches a text object that has the specified text and font attributes. It searches the document for the first object containing the specified text and makes that object visible. ReachTextEx() returns the object reference if the text is found, and an empty string otherwise.
In the snippet, ReachTextEx() is used to search for a particular text string (in this case the social security number); the text and back color attributes of the object it returns are then changed.
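The find-then-clear logic described here can be sketched independently of any PDF library. The Python below is a minimal illustration under stated assumptions: TextObject, find_first and redact are hypothetical names, not Amyuni methods, and the sketch assumes the sensitive value sits in its own text object. It is not the ActiveX script itself; it only shows the idea of locating the first object that contains the string, clearing its text, and changing its back color.

```python
from dataclasses import dataclass


@dataclass
class TextObject:
    """Hypothetical text object with the two attributes the note changes."""
    text: str
    back_color: int = 0xFFFFFF  # white background by default


def find_first(objects, needle):
    """Return the first object whose text contains needle, or None if absent."""
    for obj in objects:
        if needle in obj.text:
            return obj
    return None


def redact(objects, needle, fill=0x000000):
    """Clear the text of every matching object and paint its background."""
    if not needle:
        raise ValueError("needle must be a non-empty string")
    count = 0
    while (obj := find_first(objects, needle)) is not None:
        obj.text = ""          # the sensitive string is no longer stored
        obj.back_color = fill  # a filled box marks where the text used to be
        count += 1
    return count


page = [TextObject("Name: Jane Doe"), TextObject("123-456-789")]
redacted = redact(page, "123-456-789")

print(redacted)                         # 1
print(find_first(page, "123-456-789"))  # None
```

Because the string is removed from the object rather than covered, a later search over the same objects finds nothing, which is the behaviour the note goes on to verify in Adobe Reader.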
After executing the VBS script, the resulting redacted PDF document no longer contains the text string. This can be verified with the same technique as before: searching for the text string (e.g. 123-456-789) in Adobe Reader.
Figure 5 illustrates that the text is no longer present.
If you wish to test the solution further, we suggest that you download evaluation versions of our Amyuni PDF Creator products.
This complete technical note is available from the link below.