E-Discovery Insights – Clearwell Systems, Inc.

Ruling the World of Information Management and Electronic Discovery

by Kurt Leafstrand on November 17th, 2010

If you’re anything like Dr. Evil, Tears for Fears, or Napoleon, ruling the world is at or near the top of your to-do list, and part of ruling the world is having as omniscient a knowledge as possible of what’s going on, in order to better control it. Ruling the world has also long been the dream of many software vendors, who want to own and understand all the information in an enterprise in order to, um, provide maximum value to their customers… oh, and also to lock them in to a single underlying platform that allows them to control as much of the organization’s information management decisions as possible.

In some cases, these dual interests are aligned. However, in e-discovery, it’s not so clear. Over the last couple of years, many vendors have pushed a notion of “index everything” or so-called “proactive” e-discovery, in which you have instant access to all the information in your enterprise, in real-time, from which to drive your e-discovery process. But is this feasible? Or even desirable?

The Myth of the Silver Bullet

It can be tempting for IT to turn to an enterprise search solution that can index all data sources – laptops, desktops, file servers, SharePoint servers, databases, email archives, content management systems – and enable e-discovery across the entire enterprise in an instant. The reality is that while such a solution may work for enterprise search in small and medium-sized companies with a finite scope of data, the level of complexity in scale and defensibility of operations makes this simply not an achievable approach for e-discovery at most large enterprises. As Anne Kershaw and Joe Howie of the Electronic Discovery Institute noted in their just-published Judges’ Guide to Cost-Effective E-Discovery:

“There is no single silver bullet that solves all problems associated with escalating discovery costs and delays. As noted above, the single most effective cost reduction method is the focused collection of records most likely to contain relevant information. Some argue that e‐discovery is best accomplished by taking large amounts of data from clients and then applying keyword or other searches or filters. While, in some rare cases, this method might be the only option, it is also apt to be the most expensive. In fact, keyword searching against large volumes of data to find relevant information is a challenging, costly, and imperfect process. A much better approach is to ask key client contacts to help you locate core relevant information and then, by reading that information, determine other sources of relevant information.”

What are the specific reasons why a targeted collection approach is superior? From our conversations with clients as we have been developing our solution to this problem over the last couple of years, three major drawbacks to the index-everything approach stand out.

1. Impact to Existing IT Environment

While the collect-and-preserve approach employed by Clearwell is widely accepted for e-discovery, index-everything and preserve-in-place solutions have recently emerged, originating from other enterprise applications such as knowledge management and enterprise search. These approaches from other domains have significant disadvantages when applied to e-discovery, including an impact on existing IT infrastructure and processes that results in increased cost and complexity. For instance, the scope of e-discovery can exceed the amount of information being indexed by knowledge management or enterprise search applications. According to Forrester, the majority of enterprise search implementations range in size from the hundreds of thousands to tens of millions of records, not the billions of documents that are potentially discoverable during litigation. Consequently, index-everything solutions must index a much larger volume of data across a broader range of applications and data stores than would typically be necessary for enterprise search.

Indexing such a large amount of data has implications for the entire IT environment. These solutions either crawl data repositories over the network or employ agents on local desktops and laptops to find new and modified files. IT organizations using these solutions report experiencing disruptions including:

• Requiring read access and permissions to numerous line-of-business applications and storage systems where data resides
• Significant increases to disk I/O for enterprise applications, network file shares, and client machines
• Increased network consumption as large amounts of data are read over the network
• Increased consumption of local hard drive space on employee desktops and laptops for search indexes and redundant copies of preserved files
• Scheduling resource-intensive indexing tasks during off-peak hours, impacting the ability of IT departments to complete backups during shrinking backup windows

Taken together, these issues add cost and complexity to the deployment of index-everything and preserve-in-place solutions. This often results in organizations not fully deploying the solution after purchasing licenses and spending months or years trying to integrate with their existing systems.

2. Risk of Missing Critical Data

Another key concern of organizations seeking to meet e-discovery requests is the ability to find all relevant files and documents for a case. Missing even a few important documents may result in multimillion-dollar fines and sanctions. UBS and Morgan Stanley paid $29.2 million and $12.5 million, respectively, for losing key files during litigation. It is therefore critically important that e-discovery solutions be able to index and search not only common file types, but also a range of less common but equally important files such as those within nested container files, encrypted files, and TIFF images containing text. Solutions that originate from applications outside the e-discovery domain often skip these files because 100% accuracy is not required for other applications such as enterprise search. Across organizations with billions of documents, there may be hundreds of thousands of potentially relevant files that remain in the dark, unknown to legal teams because they are not indexed.

Index corruption is another commonly reported issue with index-everything solutions that results in incomplete search results. Search indexes are susceptible to data corruption just like any other computer file, but the large size of indexes containing billions of records increases the probability of errors. In fact, this is a common problem with most archive solutions and other solutions that manage billions of records. A corrupt search index will produce incomplete results or, in the worst-case scenario, make searches impossible until the index is repaired. In some situations, data must be re-indexed to rebuild a corrupt search index, which is time-consuming given the slow indexing speed of some solutions.

The net result is that in-place solutions increase the likelihood of missing critical data, exposing the organization to considerable legal and financial risk.

3. Time Delays and Uncertainty in Searches

When embarking on a project to make all enterprise data searchable for e-discovery, an important consideration is indexing speed in relation to total outstanding data and projected data growth. Organizations deploying such a solution typically have a large amount of existing data that needs to be indexed, and this index must be continually updated as data is modified and new data is created. Many companies report that although vendors claim high processing rates, these rates erode over time as companies index more of their existing data and the search indexes grow. Beyond an application’s ability to index data, there are exogenous factors affecting indexing performance, including network speed, disk I/O, and latency. Along with index size and the number of search indexes, these factors can also affect search query performance, resulting in searches that take hours or days to return results.
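To make the relationship between indexing speed, outstanding data, and data growth concrete, here is a rough back-of-the-envelope sketch in Python. The function name and all of the figures are hypothetical, chosen only for illustration; they are not measurements from any product or customer, and the model assumes constant rates, which real deployments will not have.

```python
import math


def days_to_catch_up(backlog_docs, index_rate_per_day, growth_per_day):
    """Estimate the days needed to index an existing backlog.

    All three inputs are illustrative assumptions, in documents and
    documents per day. Returns None when new and modified documents
    arrive at least as fast as they can be indexed, in which case the
    backlog never shrinks.
    """
    net_rate = index_rate_per_day - growth_per_day
    if net_rate <= 0:
        return None
    return math.ceil(backlog_docs / net_rate)


# Hypothetical enterprise: one billion existing documents, growing by
# one million new or modified documents per day.
advertised = days_to_catch_up(1_000_000_000, 5_000_000, 1_000_000)
eroded = days_to_catch_up(1_000_000_000, 1_000_000, 1_000_000)

print(advertised)  # days needed at the advertised indexing rate
print(eroded)      # None: the indexer never catches up
```

Under these assumed numbers, even the advertised rate leaves the index incomplete for most of a year, and once the effective rate erodes to match data growth the backlog is never cleared.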

Another issue facing organizations deploying index-everything solutions is that end users may be creating and modifying documents faster than the solution can index them. As a result, there is a widening gap between the state of data in the wild and the solution’s picture of that data, leading to incomplete search results. Equally troubling, search results may include files that were moved after the search engine indexed them, and so they appear in the results but cannot be viewed, retrieved, or preserved. End users clicking on the link to an item may receive an error similar to the “404 Error: File Not Found” that everyone has experienced when browsing the web. This presents a significant defensibility problem in e-discovery, and IT teams often end up tracking down these missing files one by one to ensure they are preserved. The result is that organizations may be exposed to unnecessary legal risk while IT teams have the additional burden of manually tracking down hundreds of files for each legal matter.

A Better Approach to Collection and Preservation

Recognizing the challenges of collection and preservation, Clearwell has developed a targeted approach that enables organizations to defensibly collect and preserve data without increasing the work of IT or exposing the organization to risk. Targeted collection provides an easy way for IT or Legal teams to collect from all critical data sources and securely manage collected data in a preservation store for the duration of a case. Unlike index-everything and preserve-in-place approaches, Clearwell is up and running quickly, delivering value in hours or days without the cost and complexity of lengthy multi-month deployment timelines. In addition, Clearwell’s targeted collect-and-preserve approach has a number of benefits over in-place approaches:

• Minimal impact to IT infrastructure: Clearwell only collects potentially relevant data from custodians involved in a case or investigation, targeting resources at the most important data instead of wasting resources on indexing all data across the entire organization. As a result, targeted collection has less impact on existing applications and storage systems, does not cause significant increases to disk I/O or network consumption, and does not require agents to be installed on client machines or servers.
• Finds all critical data: Purpose-built to support the complex and difficult-to-read file types required by e-discovery, Clearwell can index and search all critical content such as nested container files, encrypted files, images containing text, and hidden content.
• Up-to-date collection: Clearwell collects all relevant data for e-discovery by targeting information that is related to custodians in the case. Because this approach is not limited by legacy indexing approaches, Clearwell is able to collect data that has been recently modified or moved.
• Maintains existing workflow: With Clearwell, end users are able to continue using their existing workflows and business processes without interruption. Using targeted collection, Clearwell can collect data in the background without altering data where it resides. When users create or modify files in the normal course of business, Clearwell incrementally collects new data automatically.
• Reduces risk: Targeted collection significantly reduces the risk of spoliation by retaining data in a secure preservation store, providing a defensible process that maintains chain of custody. As a result, data cannot be tampered with by end users or accidentally lost on laptops, desktops, or other data repositories not under the control of IT.

Collecting and preserving evidence are critical steps in the e-discovery process. Solutions that promote indexing everything as the optimal answer to your e-discovery problems might be conceptually promising, but in practice they create new challenges for IT and increase risk. As a result, organizations are seeking a solution that enables them to respond effectively to e-discovery without causing major disruptions or exposing the organization to additional risk. Clearwell’s targeted approach solves the challenges of collection and preservation by making it easy to collect data from all critical data sources and preserve data defensibly, without incurring greater risk or disrupting the organization’s business processes.

