Electronic Data Discovery Technology & Terminology: A Primer for In-House Counsel
July 26, 2007
Presented by the ACC Litigation Committee and Steptoe & Johnson LLP
Association of Corporate Counsel, www.acc.com

Today's Panel
- Stephanie Mendelsohn, Director of Corporate Records and Electronic Discovery, Genentech, Inc.
- José Ramón González-Magaz, Partner, Steptoe & Johnson LLP
- Mike Bergeron, Of Counsel, Steptoe & Johnson LLP
- Sonya Sigler, General Counsel, Cataphora, Inc.
- Bill Mooz, VP and General Counsel, Catalyst Repository Systems, Inc.

Agenda
I. Key Process Steps for Running an Electronic Discovery Project
   - Identification, Preservation & Collection: Stephanie Mendelsohn, Genentech, Inc.
   - Review of eDiscovery Data: José González-Magaz & Mike Bergeron, Steptoe & Johnson LLP
II. Technology & Terminology
   - Collection, Culling & Analysis: Sonya Sigler, Cataphora, Inc.
   - Review & Production: Bill Mooz, Catalyst Repository Systems, Inc.
III. Q&A with Panel

Identification, Preservation and Collection
Stephanie Mendelsohn, Director of Corporate Records and Electronic Discovery, Genentech, Inc.

Identify Data Sources
Before any legal hold, prepare for early attention to ESI:
- Describe infrastructure
- Identify most knowledgeable
- Identify data sources
- Document preservation and collection process
After a legal hold is initiated:
- Identify custodians
- Identify data sources

Preservation Process
- Initiate legal hold to suspend routine disposition of documents and ESI.
- Engage in custodian interviews.
- Provide repositories as needed.
- Document, document, document.

What must be preserved?

And Coming to a Phone Near You...

Collection Process
- Who performs the collection?
- How is the collection performed?
- What is collected?
- Identifying any sources of data that are not reasonably accessible.
- Prioritizing reasonably accessible data sources.

Review of Electronic Discovery Data
José Ramón González-Magaz, Partner, Steptoe & Johnson LLP
Mike Bergeron, Of Counsel, Steptoe & Johnson LLP

Reviewing the Data: Major Cost Factors
- Reviewing platform to be used.
- Native file review versus image-based review.
- Complexity of coding form.
- Degree of experience/specialization of the reviewers.
- Training of reviewers: preparation of project manual.
- Scale of supervision needed.
- Quality control.

Training the Reviewers
- Techniques for reviewing documents/files/data efficiently and reliably.
- Detection of privileged materials and preparation of privilege log coding.
- Coding of data reviewed.
- Identification and handling of close-call data.
- Substantive parameters of the review.
- Use of the technology/equipment.

Data Review Progress Reports
- Review rate assessment.
- Budget tracking.
- Substantive reports.
- Memorialization of project developments.

Review Rate Estimates
Columns: Review rate* (electronic); Review rate* (scanned with objective coding**); Review rate* (scanned without objective coding**)
Rows: High (basic coding); Average; Low (advanced coding)
Values: 800 600 400 300 200 100 500 400 300
*Number of documents per 8-hour reviewer day
**Objective coding = coding for Date, Document Type, Title, Author, Recipient, Copyee, etc.

Quality Control of Data Review
- Confirm supervision given during the review.
- Ensure all data were reviewed.
- Random check of coding performed.
- Second-level review of odd tags.
- Client confirmation that correct reviewing parameters were properly applied.
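
Review rates of this kind translate directly into a staffing estimate. The following is a rough sketch only: it assumes rates of 800, 600 and 400 documents per 8-hour reviewer day (which column of the slide's table each figure belongs to is an assumption here), and the collection size and team size are hypothetical.

```python
import math

# Illustrative rates in documents per 8-hour reviewer day. Which column of the
# slide's table each figure belongs to is an assumption made for this sketch.
RATES = {"high": 800, "average": 600, "low": 400}


def reviewer_days(num_documents: int, docs_per_day: int) -> int:
    """Whole reviewer days needed to review num_documents at a given rate."""
    if docs_per_day <= 0:
        raise ValueError("docs_per_day must be positive")
    return math.ceil(num_documents / docs_per_day)


def working_days(num_documents: int, docs_per_day: int, reviewers: int) -> int:
    """Working days needed when the set is split evenly across reviewers."""
    if reviewers <= 0:
        raise ValueError("reviewers must be positive")
    return math.ceil(num_documents / (docs_per_day * reviewers))


if __name__ == "__main__":
    documents = 120_000  # hypothetical collection size
    team_size = 10       # hypothetical number of reviewers
    for label, rate in RATES.items():
        print(label, reviewer_days(documents, rate),
              working_days(documents, rate, team_size))
```

Dividing the same collection by a lower rate shows how quickly advanced coding or inexperienced reviewers raise the cost of a review.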

Collection, Culling & Analysis
Sonya Sigler, General Counsel, Cataphora, Inc.

Collection
Collection Tools/Methods:
- Mirror Image of Hard Drives or Servers
- Self Selection
- Others
Data Mapping Appliances (ESI blueprint):
- Kazeon Deep Dive
Forensic Analysis:
- Deleted or Missing Data
- Not What You Expected

Collection Philosophy
Narrow Based Collection:
- By Custodian: John Doe
- By Date Range: January 1, 2002 - July 31, 2006
- Documents Pulled by Keywords: fraud, invoice
Broad Based Collection:
- Collect it ALL
- Cull After Collection

Culling Goals
Readily Accessible Data:
- Readily Accessible under FRCP 34
Not Readily Accessible:
- Database data
- Source Code, etc.
Reduce your Data Set
Make it Manageable
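
The three narrow-collection criteria (custodian, date range, keywords) can be applied either at collection time or as a cull after a broad collection. The sketch below shows one way to combine them; the `Document` record and `narrow_collect` function are hypothetical names for illustration, not any vendor's tool, and the keyword test is a simple case-insensitive substring match.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    """Hypothetical minimal record for a collected item."""
    custodian: str
    created: date
    text: str


def narrow_collect(docs, custodians, start, end, keywords):
    """Keep documents matching a custodian AND the date range AND any keyword."""
    wanted = {c.lower() for c in custodians}
    terms = [k.lower() for k in keywords]
    selected = []
    for doc in docs:
        if doc.custodian.lower() not in wanted:
            continue  # wrong custodian
        if not (start <= doc.created <= end):
            continue  # outside the date range
        body = doc.text.lower()
        if any(term in body for term in terms):
            selected.append(doc)  # at least one keyword is present
    return selected


if __name__ == "__main__":
    docs = [
        Document("John Doe", date(2004, 3, 15), "Please approve the invoice"),
        Document("John Doe", date(2007, 1, 5), "Invoice overdue"),
        Document("Jane Roe", date(2004, 3, 15), "Possible fraud"),
    ]
    hits = narrow_collect(docs, ["John Doe"], date(2002, 1, 1),
                          date(2006, 7, 31), ["fraud", "invoice"])
    print(len(hits))
```

Each added criterion shrinks the set, which is the point of a narrow collection, but anything outside the criteria is never seen by a reviewer.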

De-duplication Methods
MD5 hash values:
- Do I need to know what this is???
De-duplication of Data Sets:
- Within custodian sets
- Across custodian sets
- Across all data sets
- Near Duplicates
Know what is being done to your data:
- ALWAYS ask!
- Vendors need to explain this clearly.

Duplicate Range
[Figure: values 25%, 90%, 90%; labels: Broad Based Collection, Restoring Back-Up Tapes]
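
An MD5 hash value is a short fingerprint computed from a file's bytes: identical files produce identical fingerprints, so duplicates can be found without comparing files directly. The sketch below uses Python's standard `hashlib` module to contrast de-duplication within custodian sets and across custodian sets. The function names and the `(custodian, name, content)` input shape are assumptions for illustration; actual vendor processing may hash different parts of a document.

```python
import hashlib


def md5_of(content: bytes) -> str:
    """Return the MD5 fingerprint of a file's bytes as a hex string."""
    return hashlib.md5(content).hexdigest()


def dedupe_within_custodian(files):
    """Keep the first copy of each file per custodian.

    files: iterable of (custodian, name, content_bytes) tuples.
    """
    seen = set()
    kept = []
    for custodian, name, content in files:
        key = (custodian, md5_of(content))
        if key not in seen:
            seen.add(key)
            kept.append((custodian, name))
    return kept


def dedupe_across_custodians(files):
    """Keep the first copy of each file across the whole data set."""
    seen = set()
    kept = []
    for custodian, name, content in files:
        digest = md5_of(content)
        if digest not in seen:
            seen.add(digest)
            kept.append((custodian, name))
    return kept


if __name__ == "__main__":
    files = [
        ("doe", "a.doc", b"same text"),
        ("doe", "b.doc", b"same text"),
        ("roe", "c.doc", b"same text"),
        ("roe", "d.doc", b"same text."),  # near duplicate: one extra byte
    ]
    print(dedupe_within_custodian(files))
    print(dedupe_across_custodians(files))
```

Note that the near duplicate survives both passes: a one-byte difference yields a different hash, so near-duplicate detection needs a separate technique. De-duplicating across custodians also discards the record that a second custodian held the file, which is one reason to ask which method is being applied.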

Culling Methodologies
Linguistic Methods (Word Based):
- Keyword
- Ontologies
Statistical Methods (#s based):
- Topic Clustering: Statistical Similarity
- Counting #s of words, appearance together
- Latent Semantic Indexing

Keyword Culling
Pro:
- Word Stemming: Hous* - house, housemate, household.
- Easy to use/explain/agree
- Familiar
- Fast results
Con:
- Over-inclusive: Disambiguate
- Under-inclusive: Word must be present
- Hard to craft
- Ineffective with short messages, IMs
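
The `Hous*` example can be made concrete with a regular expression. This is a minimal sketch of wildcard matching using Python's standard `re` module; the helper name `wildcard_to_regex` is hypothetical, and real review platforms may implement stemming differently (for example, with linguistic stemmers rather than wildcards).

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """Turn a term such as 'hous*' into a case-insensitive whole-word pattern."""
    # Escape the term, then let each '*' stand for any run of word characters.
    escaped = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(rf"\b{escaped}\b", re.IGNORECASE)


def keyword_hits(term: str, text: str) -> list:
    """Return every word in text that the wildcard term matches."""
    return wildcard_to_regex(term).findall(text)


if __name__ == "__main__":
    text = "The household moved house to Houston; the warehouse stayed."
    print(keyword_hits("hous*", text))
```

The sample sentence shows both weaknesses listed on the slide. The term picks up "Houston", which is over-inclusive, and it would miss a message that says "home" or "residence", because the word itself must be present.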

Linguistic Methods
What are ontologies?
- Combines previous methods
- Built on continual improvement
- Review privileged information
Production by Ontology:
- Automated Review
- Technology Assisted Review

Statistical Methods
Topical Clustering:
- Statistical similarity: Royalty, Disney, high
- Supervised clustering: Choosing the Topics to Cluster
Latent Semantic Indexing:
- Searches By Concept: Find Me More Like
- Simplified Searches: Natural Language
- Entire Documents
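
A "find me more like this" search can be illustrated by counting words and comparing the counts. The sketch below uses plain word-count cosine similarity, which is a simpler stand-in and not Latent Semantic Indexing itself (LSI additionally reduces the word-count matrix mathematically so that related words are treated as similar). All function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase words in a piece of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Similarity from 0.0 (no shared words) to 1.0 (same word proportions)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like(seed, documents):
    """Rank documents by similarity to the seed text, most similar first."""
    seed_counts = word_counts(seed)
    scored = [(cosine_similarity(seed_counts, word_counts(d)), d)
              for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "Disney royalty rates are high this quarter",
        "Lunch menu for Friday",
        "Royalty payments to Disney",
    ]
    for score, doc in more_like("high royalty Disney", docs):
        print(round(score, 2), doc)
```

Unlike keyword culling, the seed here can be an entire document, and results are ranked rather than simply included or excluded.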

Analysis Methods

Analysis
Graphically Depicting Data and Connections in the Data:
- Closeness Analysis
- Map the Data Set
- Mindshare Analysis
- Tone Detection

Closeness Analysis
E-Mail Communications: Map The Entire Dataset Up Front

[Figure legend: Green: Administration; Red: Legal; Blue: Accounting]

Mindshare Analysis

Tone Detection

Conclusion
- Don't Be Afraid to Ask
- Educate Yourself: ACC Website, Vendors' Websites

Review & Production
Bill Mooz, Vice President and General Counsel, Catalyst Repository Systems, Inc.

Setting Up the Review: Tools
- Repository or Review Platform: System of hardware & software used to store and review discovery data.
- Enterprise Software: Software that runs on your hardware.
- Hosted Solution or Software as a Service: Review platform that runs on the provider's infrastructure and that you access remotely.
- Web-Based: Access is via a browser.
- Terminal Service: A software layer that enables you to access a system remotely; requires additional hardware & software.
- Plug-Ins: Software loaded on the user's computer to access the review platform; generally not required with web-based systems.

Setting Up the Review: Data
- Native Files: The form in which the document was originally generated. The default format for production under the new FRCP.
- Conversion: Converting native files into TIFF (Tagged Image File Format) or PDF (Portable Document Format) for review or production; may or may not be required.
- Metadata: Data about the document itself (e.g., date created, author, recipient, etc.). May be objective (residing in the document itself) or subjective (identified by humans).
- Processing: Extracting metadata from native files to enable the review process.
- FTP: File Transfer Protocol, a way to send electronic data via the internet. Not effective for files greater than 2 gigabytes in size.
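
To make "processing" concrete, the sketch below pulls objective metadata (author, recipient, date, subject) out of a raw e-mail message using Python's standard `email` package. The sample message, addresses and field names are invented for illustration, and commercial processing tools handle many more file types and fields than this.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical raw e-mail, as it might be stored in a native message file.
RAW_MESSAGE = """\
From: John Doe <john.doe@example.com>
To: Jane Roe <jane.roe@example.com>
Date: Mon, 02 Jan 2006 09:30:00 -0800
Subject: Q4 invoice

Please see the attached invoice.
"""


def extract_metadata(raw: str) -> dict:
    """Return objective metadata that resides in the message itself."""
    msg = message_from_string(raw)
    return {
        "author": [addr for _, addr in getaddresses(msg.get_all("From", []))],
        "recipient": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "date_sent": parsedate_to_datetime(msg["Date"]).isoformat(),
        "subject": msg["Subject"],
    }


if __name__ == "__main__":
    for field, value in extract_metadata(RAW_MESSAGE).items():
        print(field, value)
```

Subjective metadata, such as a reviewer's "privileged" or "responsive" designation, would be added later by people rather than extracted this way.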

Organizing the Review: Batching
- Labor Arbitrage: Moving tasks to lower-cost providers; typically involves using contract attorneys (on-shore or off-shore) to conduct first-pass review.
- Batching: Putting documents in logical groups for assignment to reviewers.
- Concept Clustering: Using mathematical equations to sort documents into related groups.
- Fielded Search: Searching document sets by metadata or a combination of metadata and text terms.
- Filters/Navigators: Built-in tools for organizing search results into subcategories like date ranges, author, recipient, etc.

Organizing the Review: Folders & Forms
- Folders: Files for organizing documents. May be dynamic (auto-populating based upon criteria) or static. Security/access rights are often administered at the folder level.
- Review Forms: What the reviewer sees on the screen when reviewing documents. Will include a variety of fields to be coded, often with check-the-box capability. Forms are customized by case and even by level of review (first-pass, second-pass, etc.).
- Fields: Document attributes that can be used to organize documents. Examples include date, Bates number, author, hot doc, privilege, responsive, etc.
- Private Fields: Fields that are restricted to specific users; an essential requirement for sharing a repository.
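
Fielded search and filters can be pictured as simple operations over document fields. The sketch below is a toy model only: the field names, the sample records, and the `fielded_search` and `filter_by_author` functions are all hypothetical and do not reflect any particular review platform.

```python
from datetime import date

# Hypothetical coded documents; each key is a field.
DOCUMENTS = [
    {"bates": "ABC0001", "author": "John Doe", "date": date(2006, 1, 2),
     "text": "Q4 invoice attached", "privilege": False},
    {"bates": "ABC0002", "author": "Jane Roe", "date": date(2006, 1, 3),
     "text": "Legal advice on the invoice dispute", "privilege": True},
    {"bates": "ABC0003", "author": "John Doe", "date": date(2006, 2, 1),
     "text": "Lunch on Friday?", "privilege": False},
]


def fielded_search(documents, text_term=None, **field_values):
    """Return documents whose fields equal the given values and whose
    text contains text_term (case-insensitive), if one is given."""
    results = []
    for doc in documents:
        if any(doc.get(field) != value for field, value in field_values.items()):
            continue
        if text_term is not None and text_term.lower() not in doc.get("text", "").lower():
            continue
        results.append(doc)
    return results


def filter_by_author(documents):
    """Group results into author subcategories, as a filter/navigator might."""
    groups = {}
    for doc in documents:
        groups.setdefault(doc["author"], []).append(doc["bates"])
    return groups


if __name__ == "__main__":
    hits = fielded_search(DOCUMENTS, text_term="invoice", author="John Doe")
    print([doc["bates"] for doc in hits])
    print(filter_by_author(DOCUMENTS))
```

A dynamic folder can be thought of as a saved search of this kind that re-runs as new documents are coded.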

Conducting the Review
- Linear Review: The process of reviewing documents one-by-one. Can include multiple passes.
- Bulk Tagging: Marking multiple documents all at once, e.g., designating an entire folder of documents irrelevant with a single action.
- Redaction Tools: Tools that enable you to redact sensitive electronic documents, preserving the original for control purposes.
- Audit Trails: System-generated reports that enable you to review the actions of review teams or individual reviewers.

Multi-Language Reviews
- ASCII: American Standard Code for Information Interchange, the standard system for encoding characters in the English language for use by computers.
- UTF-8: Unicode Transformation Format, the new global standard for encoding characters in all languages, including those with more than 26 character sets.
- CJK: Chinese, Japanese, Korean & Thai languages that do not use spaces between individual characters or words.
- Tokenization: The process of putting white space between character sets in CJK documents to make them searchable.
- Language Packs: Upgrades that allow software to work with foreign languages. It is essential that reviewers have them on their systems. Available at http://en.wikipedia.org/wiki/help:multilingual_support.
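
The difference between ASCII and UTF-8, and the idea behind tokenization, can be shown in a few lines. This is a deliberately naive sketch: it separates every CJK character with white space, whereas production tokenizers segment text into words. The function names and the character ranges checked are assumptions chosen for illustration.

```python
def is_ascii(text: str) -> bool:
    """True if every character can be represented in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def utf8_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def is_cjk(ch: str) -> bool:
    """Rough check for CJK ideographs, Japanese kana and Korean Hangul."""
    code = ord(ch)
    return (0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
            or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
            or 0xAC00 <= code <= 0xD7A3)  # Hangul syllables


def naive_tokenize(text: str) -> str:
    """Put white space around each CJK character; leave other text as is."""
    pieces = [f" {ch} " if is_cjk(ch) else ch for ch in text]
    # Collapse the extra spaces introduced above into single spaces.
    return " ".join("".join(pieces).split())


if __name__ == "__main__":
    for sample in ["contract", "José", "契約書"]:
        print(sample, is_ascii(sample), utf8_length(sample))
    print(naive_tokenize("契約書 draft"))
```

English text takes one byte per character in UTF-8, while accented and CJK characters take more, which is why a system that assumes ASCII can garble or fail to search foreign-language documents.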

Productions
- Export: Get data out of a review platform for use elsewhere. Can export in multiple different formats.
- Conversion: Transforming a native file into another format, usually PDF or TIFF.
- Blowback: Print a set of data back to paper.
- Subcollection: A subset of documents on a repository that is made available to someone with a limited need-to-know. Used increasingly to produce documents to opposing parties, especially regulatory agencies.
- Privilege Logs: Typically handled by exporting a limited set of metadata for the documents designated as privileged.

Q&A with Panel
- Stephanie Mendelsohn, Director of Corporate Records and Electronic Discovery, Genentech, Inc.
  mendelsohn.stephanie@gene.com
- José Ramón González-Magaz, Partner, Steptoe & Johnson LLP
  202-429-8110 / jrgonzalez@steptoe.com
- Mike Bergeron, Of Counsel, Steptoe & Johnson LLP
  301-610-2397 / mbergeron@steptoe.com
- Sonya Sigler, General Counsel, Cataphora, Inc.
  650-622-9840 x604 / sonya.sigler@cataphora.com
- Bill Mooz, VP and General Counsel, Catalyst Repository Systems, Inc.
  303-824-0842 / bmooz@catalystsecure.com

Thank you for your time!

Thank you for attending another presentation from ACC's Desktop Learning Webcasts.
Please be sure to complete the evaluation form for this program, as your comments and ideas are helpful in planning future programs. You may also contact Sherrese Williams at williams@acc.com
This and other ACC webcasts have been recorded and are available, for one year after the presentation date, as archived webcasts at www.webcasts.acc.com. You can also find transcripts of these programs in ACC's Virtual Library at www.acc.com/vl