When it comes to in-house IT services, every company has different practices. Unless a company’s core product is cutting-edge hardware or software, it tends to stick to the basics. Usually businesses follow the pack, and their system architects color within the lines.
For several years there has been discussion about SAN versus NAS storage. What is the best solution? The simple non-answer is “It depends.” We need to consider several things when designing the storage architecture to support a solution, or when working with the customer’s storage engineer to design it.
The decision points are common among all software solutions. However, the software products themselves may move you toward a particular hardware and storage solution. Documentum is no different from Alfresco, FileNet, and others regarding the questions to ask.
There may be different storage requirements for products within a vendor’s suite, and the solution “stack” may dictate the storage solution for all the products deployed in a system.
Regarding Documentum, is there a “best solution”? We will address concerns, give you questions to ask, and present a couple of recommendations.
As a friend of mine once said, “Indecision is the key to flexibility,” but eventually you have to order hardware.
NAS versus SAN
We will not go deeply into SAN and NAS storage definitions. However, here is a very simplified view of NAS and SAN configurations:
NAS provides both storage and a file server. The NAS unit has its own operating system. The application host, in our case Documentum, communicates with NAS over the network using file-based protocols such as NFS and CIFS to read and write the content located on the file system. To the host operating system, NAS appears as a file server that provides drive and share mapping. An example is EMC Celerra.
SAN storage is also networked: it connects to a SAN switch, which in turn connects to the client hosts. Blocks of storage appear to the operating system the same as local disks. Volume management utilities such as Veritas make the storage accessible. SAN protocols include Fibre Channel, iSCSI, and ATA over Ethernet (AoE). Examples are EMC Symmetrix and CLARiiON.
Storage protocols can significantly affect price and performance. For example, an iSCSI SAN may cost several times as much as an ATA-based one. So why pick iSCSI? Well, iSCSI is much faster and generally more reliable than ATA. Fibre Channel is faster still. However, iSCSI uses less expensive Ethernet switches and cables, where Fibre Channel requires more specialized components. What do you need to meet your requirements: operating distance, support staff skills, available budget?
In years past, the arguments about SAN versus NAS were more dichotomous than today. The cost, features, and performance are different, but with hybrid configurations we can get efficiency and performance. Differences still exist if you choose one versus the other.
Generalizations are dangerous because there are exceptions, and because protocols change rapidly, blur the lines, and turn today’s best recommendation into tomorrow’s dog. So, anticipating comments to the contrary, here ya go: Consider SAN to be faster, and consider NAS lower cost and easier to maintain.
What about CAS (Content Addressable Storage)? CAS is used mainly for archiving content, especially large amounts of content. The storage unit contains a CPU. The storage address for each content file is computed by a hashing algorithm from the actual contents of the file, so identical content always gets the same address. CAS offers security features and supports retention policies. We’ll discuss CAS at another time. However, there are NAS and even SAN solutions that can be used in concert with Documentum products for archiving and retention policy compliance.
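To make the idea of a content-derived address concrete, here is a minimal sketch in Python. It is a toy in-memory store, not how any particular CAS product works; the choice of SHA-256 and the names content_address and ContentStore are assumptions made for illustration.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive a storage address from the bytes of the content itself.

    SHA-256 is an assumption for this sketch; real CAS products
    use their own addressing algorithms.
    """
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy in-memory content-addressable store (illustrative only)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Identical content hashes to the same address, so it is stored once.
        address = content_address(data)
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

    def __len__(self) -> int:
        return len(self._blobs)
```

Storing the same file twice returns the same address and keeps a single copy, while changing the content changes the address. That property is part of what makes this style of storage a natural fit for archiving and retention.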
What Works Well in Documentum
Let’s think about what Documentum is and does, what your business requirements are, and why you might choose SAN over NAS and vice versa.
The rule of thumb is that file I/O is the critical path for any single processing thread. We still need to consider all kinds of performance and latency issues: remote users accessing centralized repositories, network bandwidth, file transfer protocols, peer-to-peer protocol layers, application design, host resources, and so on. Any single component out of whack will degrade performance. However, reading and writing data to and from disk, with the necessary transfer of packets across the wires, is the single most time-critical process. If all else is well, file I/O performance will impact speed directly.
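If you want a rough feel for file I/O on a candidate mount point, a small timing script can help. The following is only a sketch: the function name measure_file_io and its parameters are made up for this example, the read pass may be served from the operating system cache, and a real evaluation should use a proper benchmarking tool under realistic load.

```python
import os
import tempfile
import time


def measure_file_io(directory: str, size_mb: int = 8, block_kb: int = 64) -> dict:
    """Write and then read a temporary file in `directory`, timing both passes."""
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data to the storage device
        write_s = max(time.perf_counter() - start, 1e-9)

        start = time.perf_counter()
        total = 0
        with open(path, "rb") as f:
            while chunk := f.read(len(block)):
                total += len(chunk)
        read_s = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)

    mb = total / (1024 * 1024)
    return {"bytes": total, "write_mb_s": mb / write_s, "read_mb_s": mb / read_s}
```

Pointing the same call at a directory on the SAN volume and at one on the NAS share gives a crude side-by-side comparison from the host's point of view.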
What does Documentum do? For a moment, let’s forget the EMC IIG product suites and stacks. The basic, over-simplified answer is that Documentum manages content files in various formats and it collects and stores data about each one. That means it must transmit data over the wires to and from a relational database. Second, it must transmit files over the wires, to and from storage.
Generally speaking, Documentum can use either SAN or NAS storage for content files. NAS may be the best choice if the content must be shared by multiple content servers.
Full Text Index
One of the features of Documentum is full text indexing. Documentum specifically recommended against NAS with its old Verity and FAST full text indexing integrations. The reason is that the content server is already handling the basic file and data communications, and NAS put additional load on the entire process and reduced throughput as the indexer copied content and created the index. Even now, full text indexing runs on a separate dedicated host. With both FAST and xPlore, you can search on metadata in the full text index, as well as for specific text. The underlying data is in XML.
Even with great improvements in NAS performance, we recommend SAN storage for full text indexing.
Documentum xCP is a suite of products and a platform for designing and building process-driven applications. There is considerable file activity between the product components and the database. We recommend SAN.
Database performance drives Documentum performance. Query design and tuning in custom applications, as well as managing indexes, query plans, and statistics with the out-of-the-box product, are mandatory for good performance. SAN versus NAS is an important conversation here too. For what it is worth, one of our largest clients uses SAN with Oracle.
Oracle, EMC, and others continue improving storage designs. For example, Oracle now recommends its Direct NFS (dNFS) client with release 11g. Oracle has integrated an NFS client into the database, so Oracle communicates with NFS storage directly rather than through the host operating system.
What are some questions to ask, and things to consider?
The objective is to get maximum performance at minimum total cost. Here are a few things to consider besides unit cost:
- Existing contract with the storage vendor
- Licensing fees
- Internal versus vendor support
- Discounts on storage and software bundles (Hmm. EMC has storage solutions as well as Documentum.)
- External versus internal Cloud storage solutions
- Tiered storage solution based on retention policy – lower cost slower storage for archiving?
- Do you want to add CAS to the mix or apply the storage you know to your Documentum archiving solution?
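As an illustration of the tiered storage idea in the list above, here is a small Python sketch that maps a document's age to a storage tier. The tier names, the thresholds, and the function choose_tier are all hypothetical; in practice your retention policy and the storage you actually own define the rules.

```python
from datetime import date, timedelta

# Hypothetical tiers: (maximum age since last access, tier name).
TIERS = [
    (timedelta(days=90), "fast"),           # e.g. high-performance storage
    (timedelta(days=365 * 2), "standard"),  # e.g. lower-cost general storage
]
ARCHIVE_TIER = "archive"  # e.g. slow, low-cost, or CAS-style storage


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from how long ago the content was last accessed."""
    age = today - last_accessed
    for max_age, tier in TIERS:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER
```

With these assumed thresholds, content touched within the last 90 days stays on the fast tier, content up to two years old moves to standard storage, and anything older goes to the archive tier.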
Do you really need sub-second response times from your content management system? Performance is in the eye of the beholder, usually the end user. Consider design recommendations unrelated to hardware:
- Separate your CMS from content presentation.
- Separate, physically or by process, your content authoring from publishing channels.
- Pre-publish content that is to be consumed.
- Archive “old” content to reduce query and response times.
- Use Documentum custom types with different default storage locations.
- Use distributed stores to bring content physically closer to the consumer.
“It depends” is the operative phrase when deciding what kind of storage you want to purchase for your Documentum system, or any other application. Different Documentum products have unique storage considerations. When designing your system, consider costs other than the direct storage price and build efficiencies into the architecture from the ground up.
SAN versus NAS is still a valuable discussion to have in spite of rapid improvements in technology. The two continue to converge, and hybrid systems offer both performance and cost savings. Pay attention to Documentum product requirements, but also use Documentum features to take advantage of storage technology and savings.