INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN...

Similar documents
Oracle9i Data Mining. Data Sheet August 2002

Oracle9i Data Mining. An Oracle White Paper December 2001

Optimizing Your Analytics Life Cycle with SAS & Teradata. Rick Lower

Gain Greater Productivity in Enterprise Data Mining

Now, Data Mining Is Within Your Reach

Data mining overview. Data Mining. Data mining overview. Data mining overview. Data mining overview. Data mining overview 3/24/2014

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

Question Bank. 4) It is the source of information later delivered to data marts.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

What is Data Mining? Data Mining. Data Mining Architecture. Illustrative Applications. Pharmaceutical Industry. Pharmaceutical Industry

What is Data Mining? Data Mining. Data Mining Architecture. Illustrative Applications. Pharmaceutical Industry. Pharmaceutical Industry

Data Mining at a Major Bank: Lessons from a Large Marketing Application

Data Set. What is Data Mining? Data Mining (Big Data Analytics) Illustrative Applications. What is Knowledge Discovery?

Data Mining and Warehousing

data-based banking customer analytics

Types of Data Mining

Introduction to Oracle

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

Technical report. Clementine. Solution Publisher

Gain Insight and Improve Performance with Data Mining

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

Introducing SAS Model Manager 15.1 for SAS Viya

Deploying, Managing and Reusing R Models in an Enterprise Environment

Applications and Trends in Data Mining

Intelligence Platform

A Comparison of Leading Data Mining Tools

In-Memory Analytics with EXASOL and KNIME //

Accessibility Features in the SAS Intelligence Platform Products

What s New in Spotfire DXP 1.1. Spotfire Product Management January 2007

DATA MINING AND WAREHOUSING

Oracle Big Data Connectors

COMP 465 Special Topics: Data Mining

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Introduction to Data Mining and Data Analytics

Data Mining Overview. CHAPTER 1 Introduction to SAS Enterprise Miner Software

VERSION EIGHT PRODUCT PROFILE. Be a better auditor. You have the knowledge. We have the tools.

Evaluation Report on PolyAnalyst 4.6

MetaSuite : Advanced Data Integration And Extraction Software

The strategic advantage of OLAP and multidimensional analysis

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX

Mike Schulte Data Scientist at the University of Pittsburgh Professor of Economics and Philosophy at Western Michigan University

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E

Overview and Practical Application of Machine Learning in Pricing

Introducing Oracle R Enterprise 1.4 -

Chapter 1, Introduction

TECHNOLOGY BRIEF: CA ERWIN DATA PROFILER. Combining Data Profiling and Data Modeling for Better Data Quality

Introduction to Data Mining. Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd

Lies, Damned Lies and Statistics Using Data Mining Techniques to Find the True Facts.

Database code in PL-SQL PL-SQL was used for the database code. It is ready to use on any Oracle platform, running under Linux, Windows or Solaris.

IMPLICATIONS AND OPPORTUNITIES OF THE REIT MODERNIZATION ACT

Knowledge Discovery in Data Bases

Data Mining: Approach Towards The Accuracy Using Teradata!

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA

Taking Your Application Design to the Next Level with Data Mining

DECISIONSCRIPT TM. What Makes DecisionScript Unique? Vanguard. Features At-a-Glance

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems

PART I A Technical Guide to Oracle Endeca Information Discovery

Customer Clustering using RFM analysis

Data Mining. Vera Goebel. Department of Informatics, University of Oslo

9. Conclusions. 9.1 Definition KDD

An Introduction to Data Mining in Institutional Research. Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa

SAS E-MINER: AN OVERVIEW

Informatica Enterprise Information Catalog

QuickSpecs. ISG Navigator for Universal Data Access M ODELS OVERVIEW. Retired. ISG Navigator for Universal Data Access

System Requirements. SAS Profitability Management 2.1. Server Requirements. Server Hardware Requirements

Oracle9i for e-business: Business Intelligence. An Oracle Technical White Paper June 2001

Sales Presentation Case 2018 Dell EMC

Hot Tech Tips for using SPSS Statistics Part 1. Laura Watts Mitya Moitra Wednesday October 11 th 2017, 11AM

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

ORACLE SERVICES FOR APPLICATION MIGRATIONS TO ORACLE HARDWARE INFRASTRUCTURES

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing

Netezza The Analytics Appliance

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.

To enable us to input the real world data warehouse requirements into cost comparison model in a way that would not favor any single vendor, we needed

DATA WAREHOUSING AND MINING UNIT-V TWO MARK QUESTIONS WITH ANSWERS

1 (eagle_eye) and Naeem Latif

Differentiate Your Business with Oracle PartnerNetwork. Specialized. Recognized by Oracle. Preferred by Customers.

COURSE 20466D: IMPLEMENTING DATA MODELS AND REPORTS WITH MICROSOFT SQL SERVER

Spectrum Miner. Version 8.0. Release Notes

Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial)

IBM InfoSphere Information Analyzer

Big Data Analytics The Data Mining process. Roger Bohn March. 2016

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS

Data warehouse and Data Mining

IBM QMF for Windows for DB2 Workstation Databases, V7.2 Business Intelligence Starts Here!

SAS Enterprise Miner : What does the future hold?

TIM 50 - Business Information Systems

Fusion Registry 9 SDMX Data and Metadata Management System

White Paper(Draft) Continuous Integration/Delivery/Deployment in Next Generation Data Integration

Data warehouse and Data Mining

Introduction to Data Mining S L I D E S B Y : S H R E E J A S W A L

Data Warehousing and OLAP Technologies for Decision-Making Process

Web Application Performance Testing with MERCURY LOADRUNNER

Data Mining in the Application of E-Commerce Website

The recent agreement signed with IBM means that WhereScape will be looking to integrate its offering with a wider range of IBM products.

Fast Innovation requires Fast IT

DATA WIZARD. Technical Highlights

Transcription:

INTRODUCTION... 2 WHAT IS DATA MINING?... 2 HOW TO ACHIEVE DATA MINING... 2 THE ROLE OF DARWIN... 3 FEATURES OF DARWIN... 4 USER FRIENDLY... 4 SCALABILITY... 6 VISUALIZATION... 8 FUNCTIONALITY... 10 Data Mining for Telecommunications... 11 Data Mining for Financial Services... 11 Data Mining for Government... 12 Data Mining for Database Marketing... 12 Data Mining for Healthcare... 12 SPECIAL FEATURES OF DARWIN... 13 LATEST FEATURES OF DARWIN... 14 STRENGTHS & LIMITATIONS OF DARWIN... 16 STRENTGHS... 16 LIMITATIONS... 17 SUMMARY... 17 Roopa & Jinguang Liu Page 1 of 17

INTRODUCTION WHAT IS DATA MINING? First a brief introduction to what is Data Mining. Data Mining constitutes Knowledge Discovery and Knowledge Deployment. Knowledge Discovery is the process of finding the hidden patterns that transform data into insight. Knowledge Deployment is the application of discovered knowledge for business advantage. The figure below is an illustration of Data Mining! " # $! % &'( HOW TO ACHIEVE DATA MINING Roopa & Jinguang Liu Page 2 of 17

THE ROLE OF DARWIN In today s competitive marketplace, it is critical that companies manage their most valuable assets--their customers and the information they know about their customers. Darwin helps you to build a close relationship with and understand your customers, which helps you to: Better retain customers and avoid churn Profile customers and understand behavior Maintain and improve profit margins Reduce customer acquisition costs Target profitable customers with the right offer Darwin's data mining strategy is BUILD, EVOLVE, SELECT, EVALUATE, and APPLY. Darwin can be used to build predictive models to reveal patterns hidden in data and reveal valuable new information information that can provide early warning systems for churn and attrition; provide hidden insights about customer behavior, customer demand, and market segments; identify cross-sell and up-sell opportunities; and help combat fraud. The information thus extracted from data can provide business intelligence that can improve customer relationships and hence generate profits and costs savings that go straight to the bottom line. Darwin provides various functionality in a user-friendly environment that a business analyst can use, and yet they can always tap into the horsepower of some very powerful UNIX, perhaps SMP, multi-processor environment so they can mine larger amounts of data and extract more information. Roopa & Jinguang Liu Page 3 of 17

FEATURES OF DARWIN USER FRIENDLY MS Excel integration Intuitive GUI Wizards to guide and automate Model wizard Evaluation wizard Darwin provides an intuitive, easy-to-use user interface. It also provides wizards to simplify and automate the data mining steps. For example, the Key Fields wizard automatically finds the variables that are most influential in addressing a particular question. The Model Seeker wizard automatically builds many data mining models, displays interactive graphs and tables of results, and recommends the best model(s). Darwin workspace or tree-directory keeps track of data, models and results. Like all windows desktop applications, Darwin supports all point and click, drag and drop functionalities. Workspace or tree directory Roopa & Jinguang Liu Page 4 of 17

Darwin provides a variety of wizards to guide the user through the data mining process. Modeling wizard, along with the evaluation wizard, guides the user through the basic data mining. Modeling wizard allows the user to select the data to be mined, specify the target field, specify the desired model type, and give the model a name. After these, the user can optionally specify model parameters, or start the model building process. Roopa & Jinguang Liu Page 5 of 17

The evaluation wizard applies the model created to another set of data to refine the models and display the results. It also allows the selection of the desired output type. SCALABILITY Darwin uses powerful algorithms and parallelized computing to achieve power and scalability during Data Mining. It uses a powerful, scalable, parallel UNIX server. Darwin runs with the following software : Client: Windows Servers: Sun Solaris HP-UX Windows NT (Release 4.0) Darwin data mining software runs in a Windows client and UNIX server architecture. Darwin emphasizes ease of use and yet have the power and depth of functionality to solve large problems. Roopa & Jinguang Liu Page 6 of 17

One of Darwin s key strengths is its ability to handle large sets of data by using parallel processing environments. When you have parallel processors, say in a Sun SMP environment, you can divide the task up across many processors. Darwin is shown to run in linear scalability. In other words, if there are 64 processors, it can mine the large amount of data 64 times faster than on a single-processor machine. That means it can mine more data and build more accurate models. It can include more variables for analysis as compared to traditional statistical packages (<= 30). With data mining 1,000+ variables can be taken as input variables and mined to identify key relationships and to build predictive models. And because modeling is an iterative process, many different types of models need to be built with different inputs and different assumptions. Number of Processors Darwin Module 1 2 4 8 Tree 1 1.93 3.80 6.30 Net 1 1.92 3.36 7.37 Match (train) 1 2.00 4.04 7.97 Performance Speed-Up 8 7 6 5 4 3 2 1 Linear Scalability Tree Net Match (train) 1 2 3 4 5 6 7 8 9 Number of CPUs running in parallel Small Data Size Large Low Model Accuracy High With Darwin, you can run a large number of models very quickly and very easily to extract more business intelligence more rapidly. Roopa & Jinguang Liu Page 7 of 17

VISUALIZATION Darwin uses Excel to display the results in a nice, straightforward, user-friendly fashion. It also uses Microsoft Excel for graphing, say, in this case, for comparison between three different models, a Match model, a Net model, and a Tree model. It also supplies Margin and ROI graphs to show the financial patterns for decision making. Darwin s outputs are geared towards ease of interpretation by business analysts. Roopa & Jinguang Liu Page 8 of 17

24% Roopa & Jinguang Liu Page 9 of 17

FUNCTIONALITY To solve business questions that SQL programming, statistical packages, and Query/OLAP tools cannot adequately address Types of data Darwin can access Data warehouses Relational databases (ODBC) Support for SQL queries SAS files Flat files Darwin can access data from a variety of network data sources. We can extract information from the database using ODBC drivers. We use DataDirect s (from MERANT, formerly Micro Focus and INTERSOLV) gateways for database connections and can extract from other databases using their ODBC drivers. We also support SQL queries and we also can import and export SAS files. We can import flat files or ASCII text files. Once you bring the data into Darwin, you often need to prepare the data for data mining, and Darwin supplies a set of transformation functions such as sampling, randomization, generating new computed fields, handling missing values, merging, replacing values, splitting the dataset into train, testing, and evaluation datasets, and so on. In the model-building phase, Darwin supplies multiple algorithms--c&rt, classification and regression trees, also known as decision trees, neural networks, k-nearest-neighbors technique of memory-based reasoning, and clustering. All of these algorithms are full-featured, tested, accurate, very powerful as they have all been implemented to take advantage of parallel Roopa & Jinguang Liu Page 10 of 17

computing techniques. For speed and the ability to handle large amounts of data- -gigabytes and terabytes--they are implemented as parallelized algorithms. Data Mining for Telecommunications Which customers are the most likely to churn, or leave for a competitor? What are the profiles of most and least profitable customers? Which products and services should be offered to which potential customers? What is the customer profile that characterizes frequent card users? Which patterns of usage indicate fraud? Which bundles of products and services will sell best to which current customers? Data Mining for Financial Services Which customers am I most likely to lose? Who is most likely to transfer an outstanding balance? What are the profiles of most and least profitable customers? What is the risk associated with this loan applicant? What are the profiles of high-risk and low-risk customers? Who is most likely to default on a loan? What is the debt pattern indicating impending bankruptcy? What purchase patterns indicate credit card fraud? Which product bundles will sell best to whom? Roopa & Jinguang Liu Page 11 of 17

Data Mining for Government Which patterns are associated with tax, welfare, illicit drugs, or other types of fraudulent behavior? What profile characteristics of people, equipment, or modes of transportation are indicative of future illegal behavior? Which military personnel make the best leaders under battlefield conditions? Which patterns help to police stock market activities? What are the indicators of investor trading violations? Which patterns indicate cases of money laundering? Data Mining for Database Marketing What customer profiles are most promising for this campaign? Who is more likely to respond to a given offer? What offers are more likely to enlist a positive response? How should prospects be segmented for maximum profitability? What customer characteristics are associated with the highest lifetime earnings? How often should customers be approached, and when? Data Mining for Healthcare What patterns indicate fraudulent claims from patients and/or doctors? In which cases do which drugs or treatments work best? Which customers are most likely to respond to a promotion for this new drug? Roopa & Jinguang Liu Page 12 of 17

SPECIAL FEATURES OF DARWIN Darwin also has a unique, special feature called the workflow object. It automatically documents the data mining analysis steps and records them in a graphical function. You can also query those objects to see what the object contains, when it was created, and the information about that object. Darwin also has a scripting capability that allows you to record, edit, and rerun macros of your data mining process, which is the programmatic equivalent of the workflow graphical object. And, with the scripting capabilities of Darwin, you can record data mining scripts of tasks that you want to repeat--say, rerun the January data, the February data and so on. You can also save and/or edit and use these scripts to integrate Darwin into other customer applications so you can develop an entire closed-loop marketing automation solution. Roopa & Jinguang Liu Page 13 of 17

Another unique feature of Darwin is the ability to export the models. Darwin can export C, C++, and Java models which then can be integrated into other applications. LATEST FEATURES OF DARWIN! " A missing value treatment wizard that handles the problem of missing data. Darwin supports both field-wise and record-wise treatments. A Model Seeker wizard automatically builds many different data mining models, presents the results in interactive graphs and tables, and recommends the best model(s). Roopa & Jinguang Liu Page 14 of 17

A new computed fields transform allows users to create new derived fields using over 100 statistical, mathematical, comparison, and boolean logic expressions. An interactive Tree display feature allows users to view and query decision trees and the rules that Darwin generates. The Key Fields wizard automatically sifts through the data using a series of C&RT dives into the data to identify the fields that are most influential in addressing a particular problem. The output of the Key Fields wizard is a useful subset of data fields for input into OLAP. Multi-model comparison to easily compare mining models. Lift charts for Trees that show the lift by tree segment for more refined control in your market segmentation. An editable workflow object to document and communicate your mining activities. Clustering wizard k-means. Clustering, also known as Unsupervised Learning, finds groups of records that are similar in some ways. ODBC write-back to the database Roopa & Jinguang Liu Page 15 of 17

Interactive cluster reports allow users to view and query the clusters created by Darwin so they can make better decisions. Significant enhancement in the Darwin server and ability to run on Windows NT servers. Darwin provides native access to the Oracle database, which besides faster access to the data, also facilitates scoring--or placing the prediction results back into the database. Darwin provides faster algorithms by taking advantage of threads for better parallelism. Additionally, includes support for Naïve Bayes and enhances support for clustering by adding self-organizing maps (SOM). STRENGTHS & LIMITATIONS OF DARWIN STRENTGHS Darwin is more powerful in classification and prediction than IBM IM. For example, Darwin's CART allows user to choose the best among different pruned subtrees, by testing and evaluation. Darwin's neural network allows user to specify its number of layers, number of nodes at each layer, and learning algorithms. It has also mechanisms to avoid over-training. Darwin's "build, test, evaluate, and apply" strategy avoids idiosyncratic mining results, and offers a better estimate of the accuracy likely to be obtained when the model is run against new data. Darwin has a genetic algorithm. Even it takes more time to train than neural networks, the genetic algorithm is always able to find a global optimization. Roopa & Jinguang Liu Page 16 of 17

LIMITATIONS Darwin is dedicated to classification and prediction, and does not support association and time sequence analysis. Darwin has little visualization functions. SUMMARY Darwin is enterprise-wide, parallel, scalable data mining software that helps you rapidly extract business intelligence hidden within large amounts of customer data. Darwin s strength is being able to handle large amounts of data, which is what data mining is all about, not only large amounts of data in terms of volume but also breadth in terms of the number of variables. Darwin also supports a comprehensive multi-algorithmic approach so as to mine more data for better understanding of customer data behavior. Darwin is easy to use, with the standard Windows client that provides access to the full horsepower of Darwin on the server. Darwin can also integrate the results into other existing systems to distribute the business intelligence throughout the enterprise. Roopa & Jinguang Liu Page 17 of 17