Annotate This!

Jul 28, 2011 | Alfresco, Alfresco Development

Over the past 12 months (or maybe longer) I have worked on two separate projects integrating Alfresco with Daeja ViewONE Pro.

Daeja ViewONE Pro is a Java applet designed to let users apply annotations to any* document within a document management system. That is just the starting point. The applet has been built so that an implementer can interact with it via JavaScript to enhance the user experience. The implementer can also dynamically change which controls and features are available to a user based upon the user’s role and permissions within the CMS. I could go on but I would probably end up repeating Daeja’s website 🙂

Daeja also offers modules to extend the functionality of its offering. Let’s take a look at the two pieces of functionality that I needed and how Daeja delivered on these needs:

PDF Annotation

For this I used the PDF Module, which adds PDF support to Daeja ViewONE Pro. This lets users annotate PDF files in ways that go beyond the annotation features Adobe offers.

Permanent Redaction

The Permanent Redaction module came about because clients needed annotations which, for security purposes, had to be burned into the document itself. A redaction is a special annotation. When a redaction is placed onto a document, its intended purpose is to obscure the information below it. In other words, to make it unreadable and unrecognizable to anyone or anything. This reminds me of the film Good Morning Vietnam, when Adrian Cronauer is given news to read out and most of the pages contain black bars. This is redaction, and in this case permanent redaction. See the example image below.

Redaction Example

This is where Daeja’s Permanent Redaction module comes in. It can take the annotations** on the document, pass them through the Permanent Redaction module and burn them into the document, producing either a TIFF or a PDF as output. Please note that it is up to the client and implementer to decide how and when the burning takes place, who has access to the original content and who has access to the burned content.

OK, enough about Daeja.  Back to the projects.

Annotation Example: Project 1

The client was not concerned about maintaining an annotations file. They simply wanted all redactions to be burned into the document and for the document to be versioned (so as to maintain a history of redactions). This was to support redacting PII (Personally Identifiable Information) in support of FOIA (Freedom of Information Act) requests. In this case all documents being handled were PDFs. If you remember, one of the output options of Permanent Redaction is PDF. This made things simple, in that each new version was the result of burning, and it greatly simplified the model around security.

The burning process can be implemented in two ways, on demand or as background processing. The client wanted it to be on demand.

When implementing the burning process, we decided to run it in a separate Tomcat instance from Alfresco. Not knowing how frequently documents would be worked on, and knowing that some of these PDFs could be over 200 MB in size, offloading this to another instance seemed to be the correct way to go. Daeja provides example code on how to implement the burning process, but it is up to the implementer to customize this for the particular environment it is intended for.

This leads me nicely into Alfresco and its numerous APIs, which are one of its many strengths and help with the flexibility of integration. There are RESTful, Web Services, JavaScript, Java Foundation, JCR and more. One of the common questions asked is ‘which API should I use?’. For the Permanent Redaction integration, the implementer opted for a combo of Web Scripts and Web Services. Web Scripts are awesome. If the language you are using for integration can make an HTTP request then you can use Web Scripts. Web Scripts are simple to implement, follow an MVC approach, support multiple inputs (think overloading) and support multiple outputs (need HTML, JSON or something else returned?). If the JavaScript API does not provide enough, it can be extended at the service level and exposed to JavaScript.

Now, you may ask, if they are so awesome, why use Web Services in combination with Web Scripts? The issue for this client was the size of the PDF file, potentially 200 MB or more. This is a limitation of Web Scripts, in that uploading large files is a memory and performance hit. This is the main reason Web Services was included in the combo. As part of Web Services there is a highly optimized set of functions for streaming content into the repository. Thankfully Alfresco also included the ability to share the authentication token between the different APIs!

To summarize for this client, Daeja ViewONE Pro is integrated into Alfresco. When a document is opened for redaction, the applet is rendered to the browser through a custom webscript. When the user clicks on the ‘burn’ button, the noderef of the document is sent, along with the annotation data, to the Permanent Redaction instance running in the separate Tomcat instance. The custom code running under the Permanent Redaction instance fetches the document from the repository via webscripts, applies the annotations and burns them into a new PDF. This PDF is uploaded to the repository as a new version of the starting document.
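To make the fetch step a little more concrete, here is a minimal sketch in Python (the project itself ran Java code under Tomcat, so this is an illustration and not the code that was used). It assumes the document is fetched through Alfresco’s stock node content web script and that the shared token is passed as an alf_ticket parameter; the post does not say which web script or authentication style the project used. The function names content_url and copy_stream are made up for this example. The second helper copies a stream in fixed-size chunks, which is the general idea behind not holding a 200 MB PDF in memory.

```python
from urllib.parse import quote, urlencode


def content_url(base_url, node_ref, ticket=None):
    """Map a NodeRef like 'workspace://SpacesStore/<uuid>' to a content URL.

    Assumption: the stock '/service/api/node/content/...' web script is used.
    """
    store_type, sep, rest = node_ref.partition("://")
    store_id, slash, node_id = rest.partition("/")
    if not (sep and slash and store_type and store_id and node_id):
        raise ValueError(f"not a NodeRef: {node_ref!r}")
    path = "/".join(quote(p, safe="") for p in (store_type, store_id, node_id))
    url = f"{base_url.rstrip('/')}/service/api/node/content/{path}"
    if ticket:
        # Assumption: the shared authentication token travels as 'alf_ticket'.
        url += "?" + urlencode({"alf_ticket": ticket})
    return url


def copy_stream(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in chunks so the whole file is never held in memory."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

In a real integration src would be the HTTP response body and dst a temporary file handed to the burning code; both are ordinary file-like objects here.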

Annotation Example: Project 2

The second project followed more of the traditional annotation model. The client wanted the ability to annotate documents based upon content type, content format and user role and permissions. One of the requirements was to NOT version the document. To be honest, it is a requirement I like. The document itself is not changing, so why constantly version it? The way Daeja works is that it maintains a separate content file with the annotations. The initial integration of Daeja and Alfresco was by Dr. Qu (Alfresco) and was continued by Jared Ottley (Alfresco). This initial integration includes a basic annotation content model which associates the annotation file with the document as a child association. Simple and effective. The one thing that was nagging at me for this client was that it would be nice to version the annotations applied to the document, so that there was a visual history. Using the power of Web Scripts and FreeMarker, a template was created to list the versions of the annotation file associated with a document, if it has one.

Daeja also lets you specify a server-side script (CGI, etc.) to call when the annotation save button is pressed. Web Scripts are invoked via HTTP, which means the Daeja save annotation button can invoke a custom webscript. This webscript adds the versionable aspect to the annotations file (if it is not already attached) and then versions the annotation file. The webscript also attaches the FreeMarker template to the document to provide access to the versioned annotation files. Clicking on a previous annotation version opens the document with the specified annotations and displays it in Daeja ViewONE Pro. Oh, and this is in read-only mode so that previous versions cannot be rewritten!
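The save logic is easier to see as a toy model. The sketch below is plain Python with an in-memory stand-in for the annotation file node; it is not Alfresco code, and AnnotationNode, save_annotations and is_read_only are names invented for this example. The aspect name cm:versionable is Alfresco’s standard versionable aspect, which I am assuming is the one the webscript applied.

```python
from dataclasses import dataclass, field

VERSIONABLE = "cm:versionable"  # assumed aspect; stand-in for the real repository call


@dataclass
class AnnotationNode:
    """In-memory stand-in for the annotation file attached to a document."""
    aspects: set = field(default_factory=set)
    versions: list = field(default_factory=list)


def save_annotations(node, content):
    """Mimic the save webscript: ensure the aspect, then record a new version."""
    if VERSIONABLE not in node.aspects:
        node.aspects.add(VERSIONABLE)  # only added on the first save
    node.versions.append(content)
    return len(node.versions)  # 1-based number of the version just created


def is_read_only(node, version_number):
    """Older versions open read-only; only the latest one may be edited."""
    return version_number < len(node.versions)
```

Each press of the save button maps to one save_annotations call, and the version list the template shows would use is_read_only to decide how the applet is opened.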

The other, more interesting requirement was that each annotation added to the document also needed text below it to indicate who created it and when. Daeja provides this functionality as a tooltip, but for compliance reasons it needed to be visible on the document itself and not as a tooltip. Unfortunately Daeja does not provide this capability (though it has now been logged as a feature request and has gotten their boffins pondering on it). In the meantime a workaround was provided, again highlighting the flexibility of Daeja. The applet provides events which can have custom code attached to them. In this instance, when an annotation save takes place an event is triggered. It was then possible to have custom JavaScript invoked on this event. This is where the fun started and flashbacks to maths and geometry classes happened.

It should be noted that the user/date label being applied to the document was actually an annotation itself. Annotations must be on the document and not fall outside the page boundaries. This meant some calculations were needed to determine the location of the annotation and to decide whether the label would be applied below or above it. The next consideration was how close the annotation was to the right-hand margin. I would like to say some complex mathematical formula was implemented, but it was some simple maths that came to the rescue. Actually, this functionality is most likely cause for another blog post as there are other things to consider.
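For the curious, the kind of simple maths involved looks roughly like the sketch below. It is written in Python for readability, whereas the real workaround was custom JavaScript hooked to the applet’s save event, and it is a reconstruction of the idea rather than the project’s code. It assumes page coordinates with the origin at the top left and y increasing downwards; Rect, place_label and the gap parameter are names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: float  # left edge
    y: float  # top edge (assumption: y grows downwards)
    w: float
    h: float


def place_label(page_w, page_h, ann, label_w, label_h, gap=2):
    """Return a rectangle for the user/date label that stays on the page."""
    # Prefer the spot directly below the annotation.
    y = ann.y + ann.h + gap
    if y + label_h > page_h:
        # No room below, so put the label above the annotation instead.
        y = ann.y - gap - label_h
    # Start at the annotation's left edge, then pull the label back
    # if it would run past the right-hand margin.
    x = ann.x
    if x + label_w > page_w:
        x = page_w - label_w
    # Final clamp so the label never leaves the page at the top or left.
    return Rect(max(x, 0), max(y, 0), label_w, label_h)
```

An annotation in the middle of the page gets its label underneath; one on the bottom edge gets it on top, and one near the right margin has the label shifted left until it fits.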

This project was different to the first project, but both were great learning experiences in the areas of annotation and redaction and in what clients are looking for. They have provided great information and ideas on what else could be done to provide even more cool integration features.

*currently Daeja supports over 300 document types!

**TIFF and PDF support a subset of annotations which can be burned into the document.
