Expanding Documentum’s Full Text Search Capability with a Thesaurus

Mar 19, 2014 | Content Server, Documentum, Enterprise Content Management, Enterprise Search, Solutions, User Experience

One way to improve customers’ satisfaction with Documentum’s built-in full text search is to provide a thesaurus of terms relevant to their industry, region, or business process. For example, suppose a user needs to find all the invoices in their repository for the soda products they ordered last year. In some parts of the country, ‘pop’ is an accepted alternative to ‘soda’. The search must therefore treat these two terms as equivalent and also expand them to include the names of actual products. A simple way to implement this is to build a thesaurus that contains an ontology of soda products.

In this example, suppose a user searches on the words ‘soda’ and ‘invoice’, expecting to see results for Pepsi, Coke-Cola, Dr. Pepper, and Mt. Dew. While preparing the query for execution, the search engine looks up ‘soda’ in the thesaurus and automatically adds ‘Pepsi’, ‘Coke-Cola’, ‘Dr. Pepper’, and ‘Mt. Dew’ as search terms. The user now gets the results they expected.
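
The expansion step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how xPlore implements it: the `THESAURUS` dictionary, the `expand_query` and `to_boolean_query` functions, and the boolean query syntax are all hypothetical and are not part of Documentum or xPlore.

```python
# Hypothetical in-memory thesaurus: search term -> expansion terms.
THESAURUS = {
    "soda": ["Pepsi", "Coke-Cola", "Dr. Pepper", "Mt. Dew"],
    "pop": ["Pepsi", "Coke-Cola", "Dr. Pepper", "Mt. Dew"],
}


def expand_query(terms, thesaurus):
    """Return one list of alternatives per search term.

    Each list starts with the original term, followed by any
    expansion terms found in the thesaurus.
    """
    groups = []
    for term in terms:
        groups.append([term] + thesaurus.get(term.lower(), []))
    return groups


def to_boolean_query(groups):
    """Render the groups in an illustrative (non-xPlore) boolean syntax."""
    parts = []
    for group in groups:
        quoted = ['"{}"'.format(term) for term in group]
        parts.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(parts)


if __name__ == "__main__":
    groups = expand_query(["soda", "invoice"], THESAURUS)
    print(to_boolean_query(groups))
```

A term with no thesaurus entry, such as ‘invoice’, passes through unchanged, so only the terms the thesaurus knows about broaden the search.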

Documentum’s full text search engine is EMC’s xPlore, which, among other things, implements thesauri using the Simple Knowledge Organization System (SKOS) representation. Once a thesaurus is created and installed in xPlore[1], it can be used to expand search terms with synonyms and so perform broader searches. SKOS can represent far more complex relationships than xPlore currently uses, but building the search engine on a representation like SKOS positions xPlore for more advanced types of searching in the future.

Building an xPlore SKOS thesaurus is as simple as writing an XML file containing SKOS elements. There are really only three SKOS tags you need to know:

  • Concept – the concept is the idea you want to expand by including additional search terms in your query.  In this example, the concept is ‘soda’.
  • prefLabel – this is the preferred form of the term to be added to the query, i.e., the synonym for the concept.
  • altLabel – this is an alternate form of the term that can be added to the query.  altLabels often include abbreviations or alternate spellings of the prefLabel values.

With these SKOS elements, a simple xPlore thesaurus for this example might look like this:

<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <!-- Expansion terms for the search term 'soda' -->
  <skos:Concept rdf:about="https://www.my.com/#soda">
    <skos:prefLabel>Coke-Cola</skos:prefLabel>
    <skos:prefLabel>Pepsi</skos:prefLabel>
    <skos:prefLabel>Dr. Pepper</skos:prefLabel>
    <skos:prefLabel>Mt. Dew</skos:prefLabel>
  </skos:Concept>
  <!-- 'pop' is a synonym of 'soda', so it repeats the same expansion terms -->
  <skos:Concept rdf:about="https://www.my.com/#pop">
    <skos:prefLabel>Coke-Cola</skos:prefLabel>
    <skos:prefLabel>Pepsi</skos:prefLabel>
    <skos:prefLabel>Dr. Pepper</skos:prefLabel>
    <skos:prefLabel>Mt. Dew</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>

Figure 1 – Sample xPlore Thesaurus for ‘soda’ and ‘pop’

As you can see in Figure 1, the thesaurus file contains two Concepts, ‘soda’ and ‘pop’.  The body of each Concept element contains the names of the soda products you want to include in your search whenever ‘soda’ or ‘pop’ is expanded.  These product names are represented as prefLabel elements in the XML.  In this case, the ‘soda’ and ‘pop’ Concepts contain the same set of expansion terms because ‘soda’ and ‘pop’ are synonyms and we want to ensure the same expansion is executed for both terms.
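
To check that two synonymous Concepts really do carry the same expansion terms, a thesaurus file can be read with Python’s standard `xml.etree.ElementTree` module. The sketch below is a minimal example under some assumptions: `SAMPLE` is a shortened copy of Figure 1 that uses the standard W3C namespace URIs, `load_expansions` is a hypothetical helper, and taking the concept name from the fragment of `rdf:about` is a convention borrowed from the figure, not a statement about how xPlore matches terms.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Shortened version of the Figure 1 thesaurus (two products per concept).
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="https://www.my.com/#soda">
    <skos:prefLabel>Coke-Cola</skos:prefLabel>
    <skos:prefLabel>Pepsi</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="https://www.my.com/#pop">
    <skos:prefLabel>Coke-Cola</skos:prefLabel>
    <skos:prefLabel>Pepsi</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>
"""


def load_expansions(xml_text):
    """Map each Concept's rdf:about fragment to its prefLabel values."""
    root = ET.fromstring(xml_text)
    expansions = {}
    for concept in root.findall("{%s}Concept" % SKOS_NS):
        about = concept.get("{%s}about" % RDF_NS, "")
        # Assumption: the text after '#' names the concept, e.g. 'soda'.
        name = about.rsplit("#", 1)[-1]
        expansions[name] = [
            label.text.strip()
            for label in concept.findall("{%s}prefLabel" % SKOS_NS)
            if label.text
        ]
    return expansions


if __name__ == "__main__":
    expansions = load_expansions(SAMPLE)
    print(expansions["soda"])
    # Synonymous concepts should expand to the same terms.
    print(expansions["soda"] == expansions["pop"])
```

Running a comparison like this after each edit is a cheap way to catch a product that was added under ‘soda’ but forgotten under ‘pop’.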

As an example of using the altLabel tag, consider that ‘coke’ is often used as an abbreviated form for Coke-Cola, a specific product, as well as for cola, a general class of soda products.  To ensure the term ‘coke’ is properly expanded to include ‘Coke-Cola’ as well as ‘cola’, the Concept depicted in Figure 2 can be added to the thesaurus.

<skos:Concept rdf:about="https://www.my.com/#coke">
  <skos:prefLabel>Coke-Cola</skos:prefLabel>
  <skos:altLabel>cola</skos:altLabel>
</skos:Concept>

Figure 2 – Concept Element for ‘coke’
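
If Concepts are added often, they can also be generated rather than typed by hand. The following is a rough sketch using Python’s standard `xml.etree.ElementTree` module; `add_concept` is a hypothetical helper, the namespace URIs are the standard W3C ones, and the result is ordinary SKOS XML that has not been tested against xPlore itself.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Keep the familiar rdf: and skos: prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("skos", SKOS_NS)


def add_concept(root, about, pref_labels, alt_labels=()):
    """Append a skos:Concept with the given labels to an rdf:RDF root."""
    concept = ET.SubElement(root, "{%s}Concept" % SKOS_NS)
    concept.set("{%s}about" % RDF_NS, about)
    for label in pref_labels:
        ET.SubElement(concept, "{%s}prefLabel" % SKOS_NS).text = label
    for label in alt_labels:
        ET.SubElement(concept, "{%s}altLabel" % SKOS_NS).text = label
    return concept


if __name__ == "__main__":
    # Start from an empty thesaurus and add the 'coke' Concept of Figure 2.
    root = ET.Element("{%s}RDF" % RDF_NS)
    add_concept(root, "https://www.my.com/#coke", ["Coke-Cola"], ["cola"])
    print(ET.tostring(root, encoding="unicode"))
```

In practice the root would come from parsing the existing thesaurus file, so that the new Concept is appended alongside ‘soda’ and ‘pop’.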

Using a thesaurus can be a simple yet powerful way to enhance users’ search experiences. Because thesauri in xPlore are XML, they can be quickly and easily modified to meet the changing needs of users, industries, or business processes. This means changes to search results can be evaluated immediately, without any reindexing of content or recompilation of search code. A thesaurus is therefore a good place to build industry-specific ontologies that can be easily adjusted and transferred to other systems that also understand SKOS.

References

[1] The xPlore v1.3 Administration and Development Guide, pages 213-217, discusses how to load your thesaurus and optionally configure debugging to watch search term expansion.
