10 Tips for Working with xCP

I am currently finishing up an EMC xCP project, and I would like to leave behind some tips I had to discover the hard way. Hopefully this will save you some time during your own xCP project.

Tip #1 – Have Well-Defined Use Cases

I know it is usually a requirement for all projects, but for xCP you really need to know what the user experience is going to be before jumping into xCP configurations. Spend extra time on the planning, and you may keep yourself from having to undo/modify a lot of configurations.

Tip #2 – xCP is Not Webtop or D2!

This has been the biggest issue thus far with the current project. The experience the users desired was not process oriented; rather, they wanted a CMS front-end. For well-defined business processes, xCP works as expected. In fact, we have a separate xCP project that is going well precisely because it is process oriented. Trying to build a full-featured CMS front-end with xCP, however, will quickly expose what will have to be many future enhancements for xCP. If you are simply looking for a CMS front-end, and you have limited time to get the job done, consider EMC’s other alternatives like Webtop or D2.

As a vendor-neutral company, Armedia has also used Generis CARA to fit our clients’ needs. CARA is a third-party CMS front-end and business rules engine that integrates with Documentum, Oracle WebCenter, and Alfresco. So weigh your alternatives carefully based on your use cases (hint: see Tip #1) and your requirements.

Tip #3 – Make Certain your Object Model Definitions are Complete Before Configuration Begins

You will save yourself a lot of aggravation if your model is set in stone. xCP does a great job of helping you out at page creation time; however, any modification of the model after that point will require manual changes to, or recreation of, your pages. By the time you have created your custom pages, you will not want to change your model. (You can change the model, of course. You’ll simply wish you didn’t have to.)

Tip #4 – Work on the Import Fragments Before Any Other Pages or Fragments

The Import Fragment page is called by the ‘Default Import Document’ Action Flow. If you research the ‘Default Import Document’ Action Flow, you will see it looks for a fragment whose name ends in ‘_imp’. The Import Fragment for an object type will contain most, if not all, of the business logic for creating that document type. Once this page is complete, it can be duplicated for the ‘Default Import New Version’ Action Flow.

Tip #5 – There is a Difference Between, and Need for, ‘_imp’ and ‘_chk’ Pages

I thought it would be a good idea to change the ‘Default Import New Version’ Action Flow to use the ‘_imp’ pages instead of the ‘_chk’ pages. As soon as I did, I realized that ‘Import New Version’ really needed different business logic on its pages. When you perform an import, your model is empty, so you don’t have to deal with business rules until the user enters data. When you import a new version, your model and associated fields contain pre-populated data. For me, this meant rethinking some of the events that are triggered, and those events tended to conflict with the ones I already had in place. So I went back and used the ‘_chk’ pages as intended. They were still duplicates of the ‘_imp’ pages, but making minor changes there was far simpler than making complex rule changes to the ‘_imp’ page.

Tip #6 – Take Advantage of Custom UI Events

With xCP 2.1, you can create custom UI Events, and this addition has been helpful. In my case, I created a custom UI Event to keep track of the validation rules and present a message to the users. For every show/focus/change event, I published a custom UI Event called ‘Required Field Change’. The event contained a message string and a field name string. By having this event published, and triggered, I could populate a value display field with the event’s message.
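
xCP wires this up through its event configuration rather than code, but the underlying pattern is plain publish/subscribe. Below is a minimal Java sketch of the idea; the class and names are illustrative, not xCP’s API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Illustrative publish/subscribe sketch of a custom UI event.
// The event carries a field name and a message; a subscriber
// copies the message into a display field, as in Tip #6.
class RequiredFieldChange {
    private final List<BiConsumer<String, String>> subscribers = new ArrayList<>();

    // Register a handler that receives (fieldName, message).
    void subscribe(BiConsumer<String, String> handler) {
        subscribers.add(handler);
    }

    // Publish the event to every subscriber.
    void publish(String fieldName, String message) {
        for (BiConsumer<String, String> s : subscribers) {
            s.accept(fieldName, message);
        }
    }
}
```

In xCP itself, the subscriber would be a value display field bound to the event’s message attribute rather than a Java lambda.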

Tip #7 – Use the Process Debugger Before Pulling your Hair out

When you start working with processes, make certain your process runs properly before trying to launch it from a page. Here the developers of xCP provide a very nice debugging tool that allows you to test the process without having to deploy it and call it from a page. By using this tool, you ensure that issues you encounter along the way are not blamed on the process itself.

Tip #8 – Use a Hidden “Debug” Column Box to Track Complex Validation Rules

I was given this tip, and I will pass it along. Like most input screens, yours will have business rules that require some complex validations. To track these rules, I set up a “Debug” column box, which I keep hidden based on roles. Within the “Debug” box is a series of value display fields, each set as a Boolean. Each of these fields contains the rule by which I validate a particular input field. One overall value display field is then set to true once all of the others are true. These value display fields also helped me drive the ‘Required Field Change’ event in Tip #6: as long as a display value field was false, the warning message remained visible to let the user know what was wrong with the given input.
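
Outside xCP, the same pattern boils down to one Boolean per rule plus an overall flag that is the AND of them all. A small Java sketch (field names are illustrative, not actual xCP expressions):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the hidden "Debug" box pattern:
// each field rule records a Boolean result, and the overall
// flag is true only when every individual rule passes.
class DebugBox {
    private final Map<String, Boolean> ruleResults = new LinkedHashMap<>();

    // Record the outcome of one field's validation rule.
    void setRule(String fieldName, boolean valid) {
        ruleResults.put(fieldName, valid);
    }

    // The "overall" value display field: AND of all rule results.
    boolean allValid() {
        return ruleResults.values().stream().allMatch(v -> v);
    }
}
```

In xCP, each of these Booleans would be a value display field backed by an expression, and the overall flag would drive whether the warning message stays visible.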

Tip #9 – Don’t Start Deleting the Buttons xCP Gives You

When creating a new View/Edit/Import page or fragment, xCP will set up an array of buttons you may or may not want the users to have access to. Instead of deleting a button, however, simply set its ‘Hidden’ attribute to ‘true’. You may find the button a pain to rebuild later if you decide you need it after all. Once everything is working the way you want, and you want to improve performance/size, then you can consider removing the buttons. (However, a button and process definition on a page really isn’t doing anything to hinder performance and doesn’t take up much space. Check that the process isn’t triggered ‘On Load’ so it doesn’t execute unnecessarily, but that is all you need for performance.)

Tip #10 – Be Willing to use Plug-Ins

Remember, xCP is first a framework for building a custom Documentum web application, so the core functionality within xCP is basic. You will find yourself writing custom widgets to perform particular tasks, or you can look through the list of plug-ins to solve a particular problem for you. The plug-ins found on EMC’s support site already provide many of the features you may be looking for.

Ephesoft 3.1 – Thoughts on the Feature Demo

For those of you who may not have heard, Ephesoft 3.1 is now available and last week an in-depth demo of the new features was recorded.  You can view the Ephesoft feature demo here:

For those without 1 hour and 18 minutes to spare, below are some of the highlights and my thoughts on the demo. All of the Ephesoft features are covered in the product documentation found HERE

New Ephesoft Extraction Features

Fuzzy Key Field – This is a great new feature!  It allows key value fields to be recognized even if the key value you were looking for was marred in some manner and only partially OCR’d.
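
Ephesoft doesn’t spell out the algorithm behind this feature, but fuzzy key matching is generally built on edit distance: an OCR’d token still matches the key if only a few characters differ. A toy Java sketch of that idea (not Ephesoft’s implementation):

```java
// Toy illustration of fuzzy key matching via Levenshtein edit distance.
// An OCR'd token matches the expected key if the number of single-character
// edits between them is small.
class FuzzyKey {
    // Classic dynamic-programming Levenshtein distance.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Accept an OCR'd token if it is at most 2 edits from the key.
    static boolean fuzzyMatch(String key, String ocrToken) {
        return distance(key, ocrToken) <= 2;
    }
}
```

So a label OCR’d as “Inv0ice No,” would still be accepted as the key “Invoice No.”, while an unrelated token would not.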

Zone Extraction – It’s simplified zonal extraction, and it makes a nice filter to fine-tune your KV extraction.  Say, for instance, you have a health form with a section for the applicant and the applicant’s spouse, yet both sections have an SSN to be extracted.  Here you can set a zone in the key field extraction to filter one from the other.
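
Under the hood this is simple coordinate filtering: each extracted hit carries its position on the page, and the zone keeps only the hits that fall inside it. A toy Java sketch (illustrative only, not Ephesoft’s API):

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of zonal filtering: a zone is a rectangle on the page
// (coordinates as fractions of page size), and it keeps only the
// OCR hits whose positions fall inside it.
class Zone {
    final double top, left, bottom, right;

    Zone(double top, double left, double bottom, double right) {
        this.top = top; this.left = left; this.bottom = bottom; this.right = right;
    }

    // One extracted value together with its position on the page.
    static class Hit {
        final String value;
        final double x, y;

        Hit(String value, double x, double y) {
            this.value = value; this.x = x; this.y = y;
        }
    }

    // Keep only the hits inside this zone's rectangle.
    List<Hit> filter(List<Hit> hits) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (h.x >= left && h.x <= right && h.y >= top && h.y <= bottom) {
                kept.add(h);
            }
        }
        return kept;
    }
}
```

With a zone covering the top half of the form, only the applicant’s SSN survives the filter and the spouse’s SSN lower on the page is dropped.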

Regular Expression Builder/Wizard and Library – With this great new tool, Ephesoft continues to make the development process simpler by taking the pain out of regular expressions.  The tool is available everywhere a regular expression is required. I applaud the effort!

Classification Features

New Test Classification Tools – Ephesoft has added more tools to provide easier ways for developers to test and control the Lucene search engine. Without having to run batches or open up Luke, developers can see which files were used to determine classification and whether they need to adjust confidence scores.  This is very handy and will make classification setup more efficient.

Advanced DA Switch – This feature adds an extra layer to classification to help determine alternative documents that may have been missed for document assembly in prior versions.

Regex Classification Switch – A great feature for having keywords help classify documents.  If there is a particular word like a form name to help classify the document, then this feature could be used.

Predefined Document Type – If you have just one document type and want everything classified as that type, you can set it here.

Set Unknown documents to a set type – Finally! Get ready to strip out some script coding! Now you can set a default type for unknown documents without scripting.

Fixed Form Extraction Features

Multiple Page Fixed Form Extractions – Have you ever wanted to assign different RecoStar Form project files to different pages within a document?  Well now you can!

Table Extraction Features – Ephesoft, from the beginning, has worked hard to make table extraction easy for developers and end users.  In this version, table extraction has had a bit of a rework to make it less complex.  Testing of table extraction has also been added to help developers see results without having to run a batch. Confidence scores are now attached to each extraction rule to help with extraction. For users, a valid-table check box has been added to the User Index interface to show that a table was extracted.

Page Processing

Deskew Option – A deskew option has been added to the page process module.  YEA!!!

User Screens

New Home/Timeout Screen – This new “Home” page contains links to all of the different Ephesoft User Screens.  This screen will also be what users are returned to should the session time out.

New Warning for Timeouts – A new warning screen will appear before timeout to give users a chance to keep their session open.

Web Scanner Enhancements – New client-side PNG creation in the browser scanner.  No streaming of images to the server until the batch is released.

Administrator Screen

Column sorting – Column sorting is now available within the Admin screens for sorting batches.

Color theming – A color theme screen has been added for personalizing the color theme.

Trouble Shoot button – A new tool that fetches all information on a failed batch.  This is an excellent addition to the admin tool.

System Configuration Screen – This new screen shows all of the plugins in the Ephesoft library, sets application information, and fetches license details, without digging through XML files.  It is a base screen for now and is expected to be used more in future releases.

Email Import

Processing that previously used Open Office now uses Libre Office, which is packaged with Ephesoft.  This is very important to note if you are currently using Open Office integrations.

Email Batch Processing – Ephesoft has added the ability to batch e-mails together. Batches can be released after a set number of e-mails or by time.  A neat little feature is that a timer can be set so that, after a set amount of time without reaching the e-mail count, the batch is released anyway.  That way a small batch doesn’t get left waiting too long to fill.
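
The release rule described above can be sketched in a few lines of plain Java (illustrative only, not Ephesoft’s code): release when the batch hits its target size, or when the oldest queued e-mail has waited past the cutoff.

```java
// Toy sketch of the count-or-time batch release rule: a batch goes out
// when it reaches the target size, or when its oldest e-mail has been
// waiting longer than the cutoff, so small batches are not stranded.
class EmailBatcher {
    private final int batchSize;
    private final long maxWaitMillis;

    EmailBatcher(int batchSize, long maxWaitMillis) {
        this.batchSize = batchSize;
        this.maxWaitMillis = maxWaitMillis;
    }

    // Decide whether the queued e-mails should be released as a batch now.
    boolean shouldRelease(int queuedEmails, long oldestArrivalMillis, long nowMillis) {
        if (queuedEmails == 0) return false;
        return queuedEmails >= batchSize
            || (nowMillis - oldestArrivalMillis) >= maxWaitMillis;
    }
}
```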

RecoStar TIFF Conversion – PDF-to-TIFF conversion can now be switched from GhostScript to RecoStar.  RecoStar has been found to be more efficient than the GhostScript engine, but both remain available.  Changing to RecoStar should be considered!

Overall Improvements

RecoStar 7.0 upgrade

Application Level Scripting – A timer can be set to fire off a script after a specific task.  This could be used for a cleanup or monitoring task.  Developers, have it your way!

Tomcat memory improvements – Decreased likelihood of PermGen issues.

GWT 2.5 upgrade

Single Sign On – added support for SSO.

The Sunny Side of Tika

One of the great joys of development comes when you learn about a program that makes your development task very simple.  Apache Tika is one of those programs, and before I even begin to talk about Tika, I have to tip my hat to the developers.  Thank you very much for making my job simpler.

So, what is Tika?

Tika reads the content and metadata of almost any file so your programs can consume it.  Tika is commonly used for eDiscovery, taxonomy generation, content capture, and indexing for content management systems. Basically, any time you want your application to understand all there is to know about a file or URI, look to Tika to convert it for you.

Tika was initially designed as part of the Apache Lucene project, which is used to automatically index files for full-text searches.  Tika has been around for several years now, but it remains a very active project due to the task it was designed for and how well it was written.

The great thing about Tika is you don’t have to know what the file type is for Tika to parse it.  Tika determines the file type for you based on the file’s header or extension. It then uses its built-in parsers to read the file.
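
Tika’s real detection is far more thorough (a large MIME-magic database, extension hints, and container inspection), but the header-sniffing idea can be illustrated with a toy example in plain Java. To be clear, this is not Tika’s implementation, just the concept:

```java
// Toy illustration of "magic byte" header sniffing, the same idea Tika
// uses (much more thoroughly) to identify a file without trusting its
// extension. Only a few well-known signatures are checked here.
class MagicSniffer {
    static String detect(byte[] header) {
        if (startsWith(header, new byte[] {(byte) 0x89, 'P', 'N', 'G'})) return "image/png";
        if (startsWith(header, new byte[] {'%', 'P', 'D', 'F'}))         return "application/pdf";
        if (startsWith(header, new byte[] {'P', 'K', 3, 4}))             return "application/zip";
        return "application/octet-stream"; // unknown: fall back to generic binary
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }
}
```

Tika layers hundreds of such signatures, plus extension and container logic, behind its `detect` methods, which is why you can hand it an unlabeled file and still get a sensible parser.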

How easy is Tika?

Below is the section of my code that implements Tika version 1.4.

	public String parseToString(File givenFile) throws IOException, TikaException{
		Tika myTika = new Tika();
		return myTika.parseToString(givenFile);
	}

Yep, that is parsed content in two simple lines of code.  Now, mind you, I was only interested in getting the content of the file in a text format, but that simplicity is what I am grateful for.  Tika, in all of its facets, is simple to use while remaining very versatile. (Note: When testing this code, make certain the file actually has readable content.  A TIFF or an MP3 file generally does not contain content to be parsed, so you won’t see anything.)

Other ways of using Tika

Okay, so maybe you want a little more information from a file, like the extra metadata found in a file header.  Completing this task takes some extra steps, but they are well documented. Below is an example of fetching metadata from a file.  In this case, I’m generating a JSONObject of name-value pairs so the application can consume the metadata later.

	public JSONObject fetchMetaData(File givenFile) throws IOException {
		JSONObject jsonMetaData = new JSONObject();

		Metadata metadata = new Metadata();
		Tika myTika = new Tika();

		InputStream instream = new FileInputStream(givenFile);
		try {
			// Parsing populates the Metadata object as a side effect.
			myTika.parse(instream, metadata);

			// Copy each metadata name-value pair into the JSONObject.
			for (String metaKey : metadata.names()) {
				jsonMetaData.put(metaKey, metadata.get(metaKey));
			}
		} finally {
			instream.close();
		}

		return jsonMetaData;
	}

Now maybe you want to fetch both metadata and content. The code below uses the Tika classes to fetch both at once, so you make only one pass over the File or InputStream:

	public String getEverything(File givenFile) throws IOException, SAXException, TikaException {
		JSONObject jsonMetaData = new JSONObject();
		ContentHandler handler = new BodyContentHandler();

		Metadata metadata = new Metadata();
		Parser parser = new AutoDetectParser();
		ParseContext context = new ParseContext();

		InputStream instream = new FileInputStream(givenFile);
		try {
			// One pass fills both the content handler and the metadata.
			parser.parse(instream, handler, metadata, context);
		} finally {
			instream.close();
		}

		for (String metaKey : metadata.names()) {
			jsonMetaData.put(metaKey, metadata.get(metaKey));
		}

		// Metadata as JSON, then the extracted body text.
		return jsonMetaData.toJSONString() + "\n\n" + handler.toString();
	}


More Tika Documentation and Examples

https://tika.apache.org/index.html

https://tika.apache.org/1.4/gettingstarted.html

https://www.openlogic.com/wazi/bid/314389/Content-mining-with-Apache-Tika

https://mvnrepository.com/artifact/org.apache.tika/tika-parsers/1.4


Tips for Starting an Enterprise Document Capture Project

In today’s world, it is critical that businesses and government organizations be able to capture and fully utilize all of the information they have at their disposal, including information found on both paper and electronic documents. This is where intelligent document capture solutions come into play. By integrating an Intelligent Document Capture solution with your existing document management system, you gain the ability to digitize paper documents and use them in electronic format within your existing workflows.

However, before an Intelligent Document Capture solution can be successfully implemented, a few steps have to be taken to ensure that the solution is implemented correctly according to each business’s specific requirements and current processes.

Tip #1 – Know Your Business 

(more…)