
Armedia’s Caliente is a high-performance content migration tool designed to quickly and easily import content and metadata into a variety of leading content management systems (CMS). Caliente has been used in numerous projects delivered by Armedia and is currently in use as a key component of many more. Some of our clients run Caliente continually to import content from external “feeds” into managed content repositories. These customers process millions of content files per day using multiple instances of Caliente. Other implementations of Caliente were designed for one-time migrations of content from one CMS to another. These instances have also moved millions of files.

Inevitably, a prospective client’s first question is, “How fast can it load my [insert number here] files?” Our pat answer is, “How fast do you need them loaded?” I think that’s a fair answer to an inherently unfair question. Obviously the rate at which Caliente can import content files relies on several variables outside the control of Caliente, or Armedia, or sometimes even the client.

For example:

  • the speed of a client’s network infrastructure can add latency to the process;
  • the size of the content files being imported can also add to latency;
  • the speed of the hosting hardware running Caliente and the resources available to it can affect performance;
  • the capacity of the CMS to ingest the content can affect performance;
  • the complexity of any transformations required to the metadata or content before loading can add latency to the process;
  • dependence on external resources (e.g., looking up additional metadata from an external system) can add latency to the process;
  • drive and/or database contention among processes running on the hosting machine can affect performance.

These and other factors all affect how quickly Caliente can load a customer’s files. The good news, and the justification for the answer given, is that there are numerous configurations and techniques Armedia can employ to ensure Caliente can load a customer’s files in the time frame required.

All that said, I think what most customers are looking for when they ask that question is a benchmark: honest-to-goodness statistics, not anecdotal explanations about performance. So, the following is a quick benchmark I performed to provide hard evidence of Caliente’s performance characteristics.

The entire benchmark was run on a virtual machine hosted on my laptop. The virtual server ran Windows Server 2003 with 2 CPUs (2.2 GHz) and 4 GB of RAM, and it hosted Caliente, SQL Server 2005, and Documentum Content Server 6.7. For test content I used a variety of files downloaded from textfiles.com and Project Gutenberg, plus binaries from the virtual machine image itself.

Here are some statistics about the test corpus:

  • corpus file count: 25,890 files;
  • corpus size: 3.31 GB;
  • average file size: 133 KB;
  • minimum file size: 168 bytes;
  • maximum file size: 36.9 MB.

I ran Caliente in “hot folder” mode, which simply means it waited for me to drop files into a watched folder, and then it processed them. Each content file was accompanied by a metadata file that contained a set of five attributes to be set on the Documentum object (dm_document) when it was imported.
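To make the “hot folder” idea concrete, here is a minimal Python sketch of the general pattern. It is not Caliente’s implementation: the `.meta` naming convention, the `ready_pairs` and `process_once` helpers, and the `import_file` callback are all hypothetical names chosen for illustration. The sketch only shows the core of the technique, which is to pick up a content file once its accompanying metadata file is present, hand both to an import step, and move them out of the watched folder.

```python
from pathlib import Path

# Hypothetical convention: "report.pdf" is described by "report.pdf.meta".
METADATA_SUFFIX = ".meta"


def ready_pairs(hot_folder: Path) -> list[tuple[Path, Path]]:
    """Return (content, metadata) pairs for which both files are present."""
    pairs = []
    for content in sorted(hot_folder.iterdir()):
        # Skip subfolders and the metadata files themselves.
        if not content.is_file() or content.name.endswith(METADATA_SUFFIX):
            continue
        metadata = content.with_name(content.name + METADATA_SUFFIX)
        if metadata.is_file():
            pairs.append((content, metadata))
    return pairs


def process_once(hot_folder: Path, done_folder: Path, import_file) -> int:
    """Import every ready pair once, then move it out of the hot folder."""
    done_folder.mkdir(parents=True, exist_ok=True)
    count = 0
    for content, metadata in ready_pairs(hot_folder):
        import_file(content, metadata)  # caller-supplied import step
        content.rename(done_folder / content.name)
        metadata.rename(done_folder / metadata.name)
        count += 1
    return count
```

A watcher built on this would simply call `process_once` in a loop with a short sleep between passes. Content files whose metadata has not arrived yet stay in the folder and are picked up on a later pass.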

Here are the benchmark results:

  • total files processed: 25,890;
  • total files imported: 25,890;
  • total processing time: 00:52:31 (hr:min:sec);
  • rates:
    • 0.122 sec / file;
    • 8.22 files / sec;
    • 0.935 sec / MB;
    • 1.07 MB / sec.
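For readers who want to check the arithmetic, the rates above can be re-derived from the three totals reported in this post. The short Python sketch below does that; the function names are my own, and it assumes 1 GB = 1024 MB. Because the corpus size is rounded to 3.31 GB, the per-MB figures it produces land within about one percent of the ones listed rather than matching them digit for digit.

```python
# Totals taken from the benchmark results above.
FILES = 25_890
CORPUS_GB = 3.31
ELAPSED = "00:52:31"  # hr:min:sec


def to_seconds(hms: str) -> int:
    """Convert an hr:min:sec string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def rates(files: int, corpus_gb: float, elapsed: str) -> dict:
    """Derive per-file and per-MB rates, assuming 1 GB = 1024 MB."""
    secs = to_seconds(elapsed)
    corpus_mb = corpus_gb * 1024
    return {
        "sec_per_file": secs / files,
        "files_per_sec": files / secs,
        "sec_per_mb": secs / corpus_mb,
        "mb_per_sec": corpus_mb / secs,
    }


if __name__ == "__main__":
    for name, value in rates(FILES, CORPUS_GB, ELAPSED).items():
        print(f"{name}: {value:.3f}")
```

The same function can be pointed at your own totals from a trial run, which is a more reliable basis for planning than extrapolating from my laptop.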

A few notes about this benchmark:

  • Monitoring Caliente’s performance meant turning on detailed logging and running other hardware monitoring processes (e.g., Microsoft’s Performance MMC and Task Manager). Monitoring like this inherently introduces load and latency that would not otherwise exist, which affects the results of the benchmark. It is a computing analog of the observer effect in physics. The metrics listed above could therefore be improved by turning off all of this monitoring and debugging. Not surprisingly, my monitoring of the import process revealed that disk I/O was the greatest bottleneck in my environment.
  • If you were to run this benchmark on a different server or in a different environment, you would likely receive different results — even if you used the same test corpus and configuration of Caliente. That’s just the nature of benchmarks; they are only valid in very controlled situations. However, they are a good indicator of performance as long as you understand the conditions of the benchmark.

The point I want to make is this: Caliente is capable of processing and importing an impressive volume of content, and can be tuned and configured to meet your performance requirements in whatever environment you run it. With Armedia’s vast experience with content-related migration and migration tools like Caliente, we can assure you that we can meet your import/migration performance requirements, whatever they may be.
