Lies, Damned Lies and Statistics Using Data Mining Techniques to Find the True Facts.

By Scott A. Barnes, CPA, CFF, CGMA

The adversarial nature of the American legal system creates a natural conflict in how the litigating parties view the facts. Data that the plaintiff claims is completely reliable is resoundingly criticized by the defendant as misleading. Mark Twain wrote in his autobiography: "Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: 'There are three kinds of lies: lies, damned lies, and statistics.'"

In today's business world, the software systems and platforms that drive commerce rely heavily on vast databases summarizing historical transactions and events. Depending upon the case facts, the amount of data can be staggeringly large. The facts, the story and the evolution of events get lost in the details of the vast forest of data. The flow of transactions can also be difficult to trace, because a given database may contain only specific information or only a partial view of the entire transaction.

As a general rule, when large amounts of data are involved, nothing is simple or straightforward. Where transactions are processed electronically, information in databases is generally encrypted, making it challenging to identify the payment information of a specific individual. The resolution is often to cross-match information from separate databases through a common point of origin, pulling the complete transaction together for further analysis or summary. This data mining process is often not obvious, given the nature of the databases and the vast volume of data.
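As a rough illustration of cross-matching records from separate databases through a common identifier, the sketch below joins two small tables on a shared transaction ID using Python's built-in sqlite3 module. The table layout, field names and sample records are hypothetical and do not come from any actual case; in practice the common key might be an account number, invoice number or timestamp.

```python
import sqlite3


def cross_match(orders, payments):
    """Join order and payment records on a shared transaction ID.

    The table layout and field names are hypothetical illustrations.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (txn_id TEXT, customer TEXT, amount REAL)")
    conn.execute("CREATE TABLE payments (txn_id TEXT, method TEXT, paid REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", payments)
    # LEFT JOIN keeps orders that have no matching payment, so gaps stay visible.
    rows = conn.execute(
        """
        SELECT o.txn_id, o.customer, o.amount, p.method, p.paid
        FROM orders AS o
        LEFT JOIN payments AS p ON p.txn_id = o.txn_id
        ORDER BY o.txn_id
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample data: order T2 has no payment record at all.
orders = [("T1", "Acme", 100.0), ("T2", "Baker", 250.0), ("T3", "Cole", 75.0)]
payments = [("T1", "ACH", 100.0), ("T3", "card", 70.0)]
matched = cross_match(orders, payments)
```

A left join is used here so that transactions appearing in only one database are retained and can be investigated, instead of silently dropping out of the analysis.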

The use of data mining techniques can be extremely powerful in proving, disproving or bringing to light the true facts. In many litigation cases, there is no option other than to apply various levels of data mining to sift through the large volumes of information. We have applied data mining techniques to the following types of legal claims:

Types of Damage Claims Where Data Mining Can Be Applied

- Lost Profits
- Breach of Contract
- Patent Infringement
- Business Interruption
- Antitrust
- Fraud
- Compliance with Policies & Procedures
- Bankruptcy
- Lost Value

Certain industries are prone to high volumes of transactions that lend themselves to data mining techniques, including consumer banking, insurance, brokerage & securities, telecommunications, retail, healthcare, oil and gas, manufacturing, utilities and transportation.

The Data Mining Process & Analytical Flow

Data mining involves collecting, processing, storing and analyzing data in order to identify and extract new information from it. Commercially, many industries and companies use data mining internally to improve their knowledge of their customers and their approach to marketing and delivering services to end customers. In each of these instances, a company collects data to identify patterns and trends that can then be applied to improve profitability and target specific customers. Data mining models can be applied to specific scenarios, such as:

- Forecasting: Estimating sales and the seasonality of sales
- Risk and probability: Identifying the best customers to target for mailings, assigning probabilities to specific outcomes
- Recommendations: Identifying which products are likely to be sold together
- Finding sequences: Analyzing customer purchasing habits and trends, predicting other customer behavioral events
- Grouping: Separating customers or events into clusters of related items

Internal data mining models that a company uses to project and budget sales and operating levels may be highly useful for analyzing lost profits or a business interruption loss. One benefit of these internal models is that company management has used them to make business decisions and to project economic outcomes. This approach can be more reliable than creating damage models specifically for litigation, which may not incorporate historical data patterns and experience.

Data mining, like any analytical process, follows a common methodology:

1. Defining the Problem
2. Preparing Data
3. Exploring Data
4. Building Models
5. Exploring and Validating Models
6. Deploying and Updating Models

Defining the Problem

The first step in any analytical process is to identify the problem or questions to be answered. This step includes analyzing business requirements, defining the scope of the problem, defining the metrics by which the model will be evaluated, and defining specific objectives for the data-mining scope of work. This process may translate into questions such as the following:

- What data or information are you looking for?
- What types of relationships or correlations are to be identified?
- Does the problem you are trying to solve reflect the policies or operational performance of the business?
- Do you want to make predictions from the data, or just identify patterns and associations?
- Which outcome or attribute do you want to try to predict?
- What kind of data is available?
- Does the process of identifying data require any cleansing, aggregation, or multi-database processing to make the data usable?
- Is the data seasonal?
- Does the data accurately represent the processes of the business?

Addressing these questions may require making inquiries or pursuing specific document discovery about the type of data available.

Preparing Data

Data can be dispersed across a company and stored in different formats, or may contain inconsistencies such as incorrect or missing entries. Data may also be encrypted, thereby limiting the type of analysis or extraction that can be performed. Accordingly, a data cleansing process may be required as the first step in accumulating the data to be analyzed. Data cleansing is not just about removing bad data or inserting missing values, but also about finding hidden correlations in the data and identifying the most accurate sources of data. Incomplete data, wrong data, and inputs that appear separate but are in fact strongly correlated can all influence the results of the analytical model being developed.

In most data-mining projects, the databases are extremely large and may not allow every transaction to be examined for data quality. In this case, data profiling and automated data cleansing and filtering tools, such as Microsoft SQL Server Master Data Services or SQL Server Data Quality Services, may be required to explore the data and find the inconsistencies.

Exploring Data

Understanding the data, any cross relationships within the data and the type of data that can be extracted allows proper decisions to be made when creating the data-mining models.
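To make the idea of data profiling concrete, the following is a minimal sketch in plain Python that counts missing values, flags repeated transaction IDs and summarizes the distribution of amounts. It is a stand-in for illustration only and does not reproduce the behavior or interface of any commercial data quality tool; the record layout, the field names `txn_id` and `amount`, and the sample values are all hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def profile(records, key_field, amount_field):
    """Summarize basic data-quality signals for a list of dict records."""
    keys = [r.get(key_field) for r in records]
    # Keep only the amounts that are actually present.
    amounts = [r[amount_field] for r in records if r.get(amount_field) is not None]
    # A key that appears more than once may indicate a duplicated transaction.
    counts = Counter(k for k in keys if k is not None)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return {
        "rows": len(records),
        "missing_amounts": len(records) - len(amounts),
        "duplicate_keys": duplicates,
        "mean_amount": mean(amounts),
        "stdev_amount": stdev(amounts),  # sample standard deviation
    }


# Hypothetical sample: one missing amount and one repeated transaction ID.
sample = [
    {"txn_id": "T1", "amount": 100.0},
    {"txn_id": "T2", "amount": None},
    {"txn_id": "T3", "amount": 300.0},
    {"txn_id": "T3", "amount": 300.0},
    {"txn_id": "T4", "amount": 500.0},
]
report = profile(sample, "txn_id", "amount")
```

A summary like this does not fix anything by itself; it only points the analyst to the records and fields that deserve a closer look before any model is built.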
Certain exploration techniques are useful here, such as identifying whether the data sets contain the specific customers, transactions and information required to answer the questions and run the queries to be performed. For example, is the data representative of the customers or business processes to be analyzed? Statistical analyses, such as assessing the standard deviations of data samples and other distribution values, can provide useful information about the stability and accuracy of the results. Data that deviates strongly from a standard distribution might be skewed, or may not represent an accurate picture of the problem. Exploring the data in conjunction with an overall understanding of the business problem can identify whether the dataset contains flawed data and the proper strategy for fixing the problem.

Building Models

The data-mining model is linked to the source of data, but does not contain any data until it is processed and data extraction begins. Depending upon the complexity of the data extraction and data mining, the results can be analyzed, filtered and cross-checked against the source data to ensure that formulas and algorithms are pulling the correct data.

Exploring and Validating Models

Any data-mining model must be tested to determine how well it performs. The first step is to determine whether the results answer the questions originally developed when the scope of the project was established. Do the patterns and trends originally expected match the underlying data? If the data-mining model is not providing clarity, or is inconsistent with expectations, then the problem must be redefined or the data in the original datasets reinvestigated. The answer may also indicate that the legal claim being alleged may not be accurate.
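One simple way to cross-check extracted results against the source data is a reconciliation by key and by total. The sketch below is an assumption about how such a check might look, not a prescribed procedure; the field names, the tolerance value and the sample records are hypothetical, and it assumes each transaction ID appears only once in each data set.

```python
def reconcile(source_rows, extracted_rows, key="txn_id", amount="amount", tol=0.01):
    """Compare an extraction against its source, by key and by total.

    Assumes each key is unique within each list of records.
    """
    source = {r[key]: r[amount] for r in source_rows}
    extracted = {r[key]: r[amount] for r in extracted_rows}
    # Records in the source that the extraction failed to pull.
    missing = sorted(set(source) - set(extracted))
    # Records in the extraction with no counterpart in the source.
    unexpected = sorted(set(extracted) - set(source))
    # Records present in both whose amounts differ by more than the tolerance.
    mismatched = sorted(
        k for k in set(source) & set(extracted)
        if abs(source[k] - extracted[k]) > tol
    )
    return {
        "missing": missing,
        "unexpected": unexpected,
        "mismatched": mismatched,
        "source_total": sum(source.values()),
        "extracted_total": sum(extracted.values()),
    }


# Hypothetical sample: T2 was not extracted and T3 carries a different amount.
source_rows = [
    {"txn_id": "T1", "amount": 100.0},
    {"txn_id": "T2", "amount": 250.0},
    {"txn_id": "T3", "amount": 75.0},
]
extracted_rows = [
    {"txn_id": "T1", "amount": 100.0},
    {"txn_id": "T3", "amount": 70.0},
]
result = reconcile(source_rows, extracted_rows)
```

Any non-empty list in the result, or a gap between the two totals, is a prompt to revisit the extraction logic or the original datasets before relying on the model's output.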

Deploying and Updating Models

Once the data-mining models are deployed and extracting the desired underlying data, they may require constant updating as additional data becomes available.

Final Considerations in the Possible Application of Data Mining in Litigation Matters

Understanding several characteristics of data mining can be helpful in identifying where it can best be used in a litigation matter involving economic damages and/or determining whether a company has complied with policies and procedures. Data mining is most useful in situations where the Four Vs exist:

- The sheer volume of data that is available, produced and aggregated, and that requires analysis, is large and covers an extended period of time.
- The velocity at which data is created and made available to analyze is high (i.e., where data is available in real time).
- The variety of data is broad (e.g., where data is available on customer purchase histories, unique impressions, locations bought and bundled purchase metrics).
- The veracity of data can be tested by matching (e.g., a retailer can match and observe what customers really buy compared to what they report buying in a consumer survey).

* * * * * * *

Scott A. Barnes, CPA, CFF, CGMA is a Chief Executive Officer who, during the past 25 years, has assisted corporations, investors and attorneys in assessing the financial issues surrounding complex and sophisticated accounting issues, merger and acquisition transactions, business valuation and industry and competitor analyses. He has extensive experience providing sophisticated commercial damage analyses and expert testimony for clients in a diverse range of industries, especially with respect to forensic analyses, intellectual property infringement, antitrust, lost profits, business interruption, contract disputes, business valuation, fraud, accounting, accounting malpractice and other complex commercial damage issues.

He has served on two editorial boards of national publications specializing in economic damages, business valuation and forensic accounting issues, and has published numerous articles on economic damages, business valuation and professional standards in the performance of litigation services. He can be reached at sbarnes@barnesco.com.