More Thoughts on Data

by | Jan 3, 2011 | Design, ECM Industry, Enterprise Content Management, Search | 1 comment

In my previous blog post, Data, the Neglected Part of Content Management, I discussed the importance of structured data within a CMS. Getting a good structured data design to complement the design of the overall content management solution can be dirty work; not quite Mike Rowe “Dirty Jobs” dirty, but to some in our profession it might as well be.

Why? Because you have to get right down to the user level and find out what they need and the context in which they think about the information.

So, let’s think about structured data and how it affects the user’s ability to locate and use the information in the content management system; in other words, the search function.

The textbook approach is to define the search precision and recall ratios for the system. Search precision is the ratio of relevant (as judged by the user) items retrieved to the total number of items displayed in the search results list. Recall is the ratio of relevant items retrieved to the total number of relevant items in the area being searched.
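To make the two ratios concrete, here is a minimal Python sketch of the definitions above. The function name and the document IDs are made up for illustration, and "relevant" is simply whatever set of documents the user would judge relevant.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search.

    retrieved: IDs displayed in the search results list
    relevant:  IDs the user would judge relevant in the area being searched
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 10 results shown, 5 relevant documents exist,
# and 4 of those 5 appear in the results list.
results = [f"DOC-{n}" for n in range(1, 11)]
wanted = ["DOC-2", "DOC-4", "DOC-6", "DOC-8", "DOC-99"]

p, r = precision_recall(results, wanted)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case the user sifts through ten results to find four useful ones and never sees the fifth.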

I mention these terms and definitions because I know many of my traditionally IT-trained colleagues like to follow the textbook, hold their workshops and whiteboard sessions, and generate flow maps and UML diagrams. But in the projects I’ve been involved with, the terms search precision and recall are never mentioned. Instead, here is the typical search discussion I’ve witnessed:

  • Interviewer – Ok, how do you want to search for these documents?
  • Interviewee – Well I search all the time for data A, data B and data C and it sure would be nice to search on data D and data E.
  • Interviewer – Ok, great, we already have identified attributes for data A, C and E, we’ll make them searchable for you. And we’ll add searchable attributes for data B and D.
  • Interviewee – You know sometimes I don’t really know the data to search on but I am looking for some text inside the document, is there any way I can search on text?
  • Interviewer – Oh yes this system has the ability to do text searches on any documents added to the system.
  • Interviewee – Great!
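The two kinds of search promised in that conversation, searching on attributes and searching on text inside the document, can be sketched roughly as follows. This is an illustration only, not how any particular content management product implements search; the Document class, the attribute names and the sample documents are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    attributes: dict = field(default_factory=dict)  # structured data, e.g. {"A": "pump"}
    text: str = ""                                   # body text extracted from the file


def search(documents, attribute_filters=None, text_query=None):
    """Return IDs of documents that match every attribute filter and,
    if a text query is given, contain that text (case-insensitive)."""
    matches = []
    for doc in documents:
        if attribute_filters and any(
            doc.attributes.get(name) != value
            for name, value in attribute_filters.items()
        ):
            continue  # at least one attribute is missing or different
        if text_query and text_query.lower() not in doc.text.lower():
            continue  # the text does not appear in the document body
        matches.append(doc.doc_id)
    return matches


# Hypothetical documents and attribute values.
docs = [
    Document("DOC-1", {"A": "pump", "B": "2010"}, "Quarterly inspection of the feedwater pump."),
    Document("DOC-2", {"A": "valve", "B": "2010"}, "Relief valve test procedure."),
    Document("DOC-3", {"A": "pump"}, "Spare parts list, including impeller and seals."),
]

print(search(docs, {"A": "pump"}))             # attribute search
print(search(docs, text_query="impeller"))     # text search
```

Note that the attribute search can only find a document whose attribute was actually filled in, which is exactly why the quality of the structured data matters.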

One of the critical success factors of a content management system is user satisfaction with search and retrieval. Research consistently indicates that most users won’t make more than two or three attempts at finding the information they need. (Think about your own experiences using Google.) Search is typically driven by structured data. Fail to pay attention to the structured data driving the search functionality and you’ve already placed one of the critical success factors at risk.

I never use the terms search precision and recall anywhere in the design criteria. The terms are just too hard to use quantitatively, and what’s relevant to one user may not be relevant to another. But the concepts need to be firmly in mind when users are being interviewed about their search preferences and habits. Users are remarkably tolerant of search as long as they get the expected results. Unless they are searching on a unique ID field like document number, users expect to sift through a results list to find the document of interest. How large a results list are they willing to sift through to find the document they are looking for? 10, 50, 100 results? That’s the blend of precision and recall you’re looking for in the interviews.
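One rough way to turn that interview question into a number is to take a handful of sample searches where the document the user wanted is known, and check how often it shows up within the first 10, 50 or 100 results. The sketch below assumes such a sample exists; the function name and the data are invented for illustration.

```python
def success_rate(searches, cutoff):
    """Fraction of sample searches whose target document appears
    within the first `cutoff` results.

    searches: list of (ranked_results, target_doc_id) pairs
    """
    if not searches:
        return 0.0
    found = sum(1 for ranked, target in searches if target in ranked[:cutoff])
    return found / len(searches)


# Hypothetical sample: the same 150-item results list, with the wanted
# document sitting at position 3, 40 and 120 respectively.
ranked = [f"DOC-{i}" for i in range(1, 151)]
sample = [(ranked, "DOC-3"), (ranked, "DOC-40"), (ranked, "DOC-120")]

for cutoff in (10, 50, 100):
    print(cutoff, round(success_rate(sample, cutoff), 2))
```

If users say they will not read past 50 results, the rate at a cutoff of 50 is the one to watch.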

Much of my work has been in the area of engineering information in the utilities and oil and gas sectors. A favorite attribute that everyone wants to search is the equipment tag number. Drawings show equipment tags; procedures describe operation, maintenance and testing of equipment; specifications and calculations provide design information for equipment; vendor information tells how to maintain and order spare parts for equipment; and so on. Lots of great indexing options! Great, let’s add those attributes into the data model.

Stop! This is where the dirty work kicks in.

Does anyone index that information now? No? Then how will the information get into the new system? What’s the source of the equipment tags? Is it validated? Who will be responsible for the information? When will it be complete, and how will it be kept up to date?

Users will tolerate the search feature up to a point. But when an attribute is available for searching, it sets the expectation of high search precision, so it should be fully functional. If there is little indexing, or there are few controls, to support the search attribute, then two bad things can happen: user frustration starts to set in or, worse, a business decision is made without a possibly key piece of information. It is our responsibility as content management professionals to do the dirty work and let the client know the possible implications of adding this attribute.
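A quick audit can put numbers on those questions before the attribute is promised to users. The sketch below assumes two things that may not exist on a real project: an export of the equipment tags currently indexed on each document, and a master equipment list to validate them against. The tag formats and document numbers are made up.

```python
def audit_tag_index(doc_tags, master_tags):
    """Report how complete and how valid the equipment tag index is.

    doc_tags:    dict of doc_id -> list of equipment tags indexed on that document
    master_tags: set of valid tags from the (assumed) master equipment list
    Returns (coverage, invalid), where coverage is the fraction of documents
    with at least one tag and invalid maps doc_id -> tags not in the master list.
    """
    total = len(doc_tags)
    indexed = [doc_id for doc_id, tags in doc_tags.items() if tags]
    invalid = {}
    for doc_id, tags in doc_tags.items():
        unknown = sorted(set(tags) - master_tags)
        if unknown:
            invalid[doc_id] = unknown
    coverage = len(indexed) / total if total else 0.0
    return coverage, invalid


# Hypothetical data: one document has no tags, one has a mistyped tag
# (letter O instead of zero).
master = {"P-101", "P-102", "V-201"}
index = {
    "DWG-001": ["P-101", "V-201"],
    "PROC-014": [],
    "SPEC-007": ["P-1O1"],
    "CALC-003": ["P-102"],
}

coverage, invalid = audit_tag_index(index, master)
print(f"coverage={coverage:.0%}", invalid)
```

A document with no tag, or with a mistyped one, will never appear in a tag search, so every gap found here is a relevant document the user will miss.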

So let’s say the user community is OK with low search precision but high recall for the equipment tag search example (in other words, a results list of 50). How does a good structured data design support this result?



