Gain Insight and Improve Performance with Data Mining


Clementine 11.0 Specifications

Data mining provides organizations with a clearer view of current conditions and deeper insight into future events. With the Clementine data mining workbench from SPSS Inc., your organization can conduct data mining that incorporates many types of data, resulting in outstanding insight into your customers and every aspect of your operations.

Clementine enables your organization to strengthen performance in a number of areas. For example, you could improve customer acquisition and retention, increase customer lifetime value, detect and minimize risk and fraud, reduce cycle time while maintaining quality in product development, or support scientific research. Use the predictive insight gained through Clementine to guide customer interactions in real time, and share that insight throughout your organization. Solve business problems faster, thanks to Clementine's powerful new data preparation and predictive modeling capabilities. And leverage high-performance hardware to get answers more quickly.

Clementine is the leading data mining workbench, popular worldwide with data miners and business users alike. By using Clementine, you can:

- Easily access, prepare, and integrate structured data as well as text, Web, and survey data
- Rapidly build and validate models, using the most advanced statistical and machine-learning techniques available
- Efficiently deploy insight and predictive models on a scheduled basis or in real time to the people who make decisions and recommendations, and the systems that support them
- Quickly achieve better ROI and time-to-solution by leveraging advanced performance and scalability features
- Encrypt sensitive data for data mining applications where security is critical

Extend the benefits of data mining throughout your entire organization by using Clementine with SPSS Predictive Enterprise Services, which enables you to centralize the storage and management of data mining models and all associated processes. You can control the versioning of your predictive models, audit who uses and modifies them, provide full user authentication, automate the process of updating your models, and schedule model execution. As a result, your predictive models become real business assets and your organization gains the highest possible return on your data mining investment.

Clementine offers a number of unique capabilities that make it an ideal choice for today's data-rich organizations.

Streamline the data mining process

Clementine's intuitive graphical interface enables analysts to visualize every step of the data mining process as part of a stream. By interacting with streams, analysts and business users can collaborate in adding business knowledge to the data mining process. Because data miners can focus on knowledge discovery rather than on technical tasks like writing code, they can pursue train-of-thought analysis, explore the data more deeply, and uncover additional hidden relationships.

From this visual interface, you can easily access and integrate data from textual sources, data from Web logs, and data from SPSS Dimensions survey research products. No other data mining solution offers this versatility.

Choose from an unparalleled breadth of techniques

Clementine offers a broad range of data mining techniques designed to meet the needs of every data mining application. You can choose from a number of algorithms for clustering, classification, association, and prediction, as well as new algorithms for automated multiple modeling, time-series forecasting, and interactive rule building.

Optimize your current information technologies

Clementine is an open, standards-based solution.
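A stream, as described above, is essentially a dataflow pipeline: data flows from a source node through a chain of transformation nodes. The sketch below is only a conceptual stdlib-Python illustration of that dataflow idea, with hypothetical node names; it is not Clementine's internal design.

```python
# Conceptual sketch of a "stream": an ordered pipeline of nodes, where each
# node is a function from a record set to a record set. Node names here are
# invented for illustration; Clementine's stream engine is proprietary.

def source():
    return [{"age": 25, "spend": 120.0}, {"age": 41, "spend": 80.0},
            {"age": 33, "spend": None}]

def drop_missing(rows):
    # A cleaning node: discard records with a missing "spend" value.
    return [r for r in rows if r["spend"] is not None]

def derive_band(rows):
    # A derive node: add a categorical field computed from "age".
    return [{**r, "band": "young" if r["age"] < 35 else "older"} for r in rows]

def run_stream(source, *nodes):
    data = source()
    for node in nodes:  # each node transforms the whole record set in turn
        data = node(data)
    return data

result = run_stream(source, drop_missing, derive_band)
print([r["band"] for r in result])  # ['young', 'older']
```

The point of the stream metaphor is that analysts rearrange and extend such chains visually instead of writing code like this by hand.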
It integrates with your organization's existing information systems, both when accessing data and when deploying results. You don't need to move data into and out of a proprietary format. This helps you conserve staff and network resources, deliver results faster, and reduce infrastructure costs. With Clementine, you can access data in virtually any type of database, spreadsheet, or flat file. In addition, Clementine automatically generates SQL for any ODBC-compliant database and performs in-database modeling in major databases offered by IBM, Oracle, and Microsoft, improving the speed and scalability of these operations. Clementine can deploy results to any of SPSS's predictive applications, as well as to solutions provided by other companies. This interoperability helps your IT department meet internal customer needs while ensuring that your organization gains even greater value from your information technology investments.

Leverage all your data for improved models

Only with Clementine can you directly and easily access text, Web, and survey data, and integrate these additional types of data in your predictive models. SPSS customers have found that using additional types of data increases the lift or accuracy of predictive models, leading to more useful recommendations and improved outcomes. With the fully integrated Text Mining for Clementine module, you can extract concepts and opinions from any type of text, such as internal reports, call center notes, customer e-mails, media or journal articles, blogs, and more*. And with Web Mining for Clementine, you can discover patterns in the behavior of visitors to your Web site.

* Text mining capabilities will be available in Clementine Desktop 11.1.
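The SQL generation mentioned above follows the common "pushback" pattern: rather than pulling raw rows to the client, aggregation steps are rewritten as SQL so the database does the heavy lifting and only a small summary crosses the wire. A minimal sketch of that idea, using Python's built-in sqlite3 in place of a real warehouse (not Clementine code):

```python
import sqlite3

# In-memory database standing in for a warehouse table of transactions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO txns VALUES (?, ?)",
    [("a", 10.0), ("a", 30.0), ("b", 5.0), ("b", 15.0), ("b", 25.0)],
)

# Pushback: the per-customer aggregation runs inside the database; only the
# two summary rows are transferred to the client, not the five raw rows.
summary = conn.execute(
    "SELECT customer, COUNT(*), AVG(amount) FROM txns"
    " GROUP BY customer ORDER BY customer"
).fetchall()
print(summary)  # [('a', 2, 20.0), ('b', 3, 15.0)]
```

The benefit grows with scale: a GROUP BY over hundreds of millions of rows returns only one row per group to the client.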

Clementine provides data mining scalability through a three-tiered architecture. The Clementine Client tier passes stream description language (SDL) to Clementine Server. Clementine Server then analyzes particular tasks to determine which it can execute in the database. After the database runs the tasks that it can process, it passes only the relevant aggregated tables back to Clementine Server.

Direct access to survey data in Dimensions products enables you to include demographic, attitudinal, and behavioral information in your models, rounding out your understanding of the people or organizations you serve.

Deploy data mining results efficiently

Clementine is scalable, capable of cost-effectively analyzing as many as hundreds of millions of records. It uses a three-tiered architecture to manage modeling and scoring with maximum efficiency. Clementine's deployment options enhance its scalability by enabling your organization to deliver results to suit your particular requirements. For example, through Clementine's batch mode feature or the Clementine Solution Publisher Runtime component, you can deploy data mining streams to your database to run in the background of other operational processes. Scores or recommendations are created without interrupting daily business tasks. You can also export information about the steps in the process, such as data access, modeling, and post-processing. This option gives you greater flexibility in how you process models and integrate predictive insight with other systems. Or you may choose to export Clementine models to other applications in industry-standard Predictive Model Markup Language (PMML). Through its integration with all of SPSS's predictive applications and its openness to other systems, Clementine supports the delivery of recommendations to managers, customer-contact staff, and the systems supporting your operations.
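PMML, mentioned above, serializes a trained model as XML so a different application can score it. The fragment below is a deliberately simplified, PMML-like illustration only (real PMML 3.1 documents carry a header, a DataDictionary, and namespaces), scored with Python's standard-library XML parser:

```python
import xml.etree.ElementTree as ET

# A simplified, PMML-like linear model: prediction = intercept + sum of
# coefficient * field value. This is an illustrative fragment, not PMML 3.1.
MODEL = """
<RegressionModel functionName="regression">
  <RegressionTable intercept="2.0">
    <NumericPredictor name="age" coefficient="0.5"/>
    <NumericPredictor name="income" coefficient="0.001"/>
  </RegressionTable>
</RegressionModel>
"""

def score(xml_text, record):
    """Apply the serialized linear model to one input record."""
    table = ET.fromstring(xml_text).find("RegressionTable")
    value = float(table.get("intercept"))
    for p in table.findall("NumericPredictor"):
        value += float(p.get("coefficient")) * record[p.get("name")]
    return value

print(score(MODEL, {"age": 40, "income": 50000}))  # 2.0 + 20.0 + 50.0 = 72.0
```

Because the consumer only needs to parse the XML, the model can be built in one tool and deployed in another, which is exactly the interoperability the standard exists for.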

Follow a proven, repeatable process

During every phase of the data mining process, Clementine supports the de facto industry standard, the CRoss-Industry Standard Process for Data Mining (CRISP-DM). This means your company can focus on solving business problems through data mining, rather than on reinventing the process for every project. Individual Clementine projects can be efficiently organized using the CRISP-DM project manager. Or, with SPSS Predictive Enterprise Services, you can support the CRISP-DM process enterprise-wide.

[Diagram: the CRISP-DM cycle of Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment, which enables data miners to implement efficient data mining projects that yield measurable business results.]

What's new in Clementine 11.0

With this release, SPSS continues its commitment to delivering a data mining solution that offers the greatest possible efficiency and flexibility in the development and deployment of predictive models. Clementine 11.0 features extensive enhancements in four key areas to help your organization improve performance, productivity, and return on your data mining investment.

Improved reporting features

Clementine 11.0 includes enhanced reporting features that enable your organization's analysts to illustrate results graphically and communicate them quickly and clearly to executive management.

- Generate report-quality graphics via a new graphics engine
- Control the appearance of graphs using new graph editing tools
- Leverage SPSS output from within Clementine, including extensive reports and high-quality graphics

Enhanced data preparation tools

Extensive enhancements to Clementine 11.0's data preparation capabilities help your organization ensure better results by providing more efficient approaches to cleaning and transforming data for statistical analysis.

- Correct data quality issues such as outliers and missing values more easily via a unified view of data
- Streamline the transformation process by visualizing and selecting from multiple transformations
- Access SPSS scripting capabilities and syntax for greater flexibility in data management
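Clementine performs cleaning steps like these through its visual nodes; as a rough standard-library Python illustration of two of them, mean imputation of missing values and equal-width binning (my own minimal versions, not Clementine's implementations):

```python
# Two common data-preparation steps in miniature: replace missing values
# with the column mean, then cut a numeric field into equal-width bins.

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value a bin index in [0, n_bins) by equal-width cut."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [22, None, 30, 38, None]
filled = impute_mean(ages)          # [22, 30.0, 30, 38, 30.0]
print(filled)
print(equal_width_bins(filled, 2))  # [0, 1, 1, 1, 1]
```

Real tools add refinements (per-group imputation, supervised binning), but the underlying transformations are this simple.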

New data mining algorithms

More than a dozen new algorithms enable your analysts to perform processes such as modeling, regression, and forecasting faster and with greater precision, permitting more sophisticated statistical analysis and more accurate results.

- Easily automate the building and evaluation of multiple models, eliminating repetitive tasks
- Create rule-based models interactively that include your business knowledge
- Employ many new in-database algorithms for Microsoft SQL Server and Oracle to leverage your investments in these database systems

High-performance architecture

Gain significant improvements in security, scalability, integration, and performance as a result of major enhancements to the Clementine 11.0 architecture.

- Take advantage of secure sockets layer (SSL) encryption for applications involving sensitive data
- Enjoy greater flexibility and interoperability through improved integration with other systems and architectures, and extended support for industry standards such as PMML 3.1
- Use parallel processing to leverage high-performance hardware and achieve faster time-to-solution and higher ROI
- Employ more of SPSS's powerful functionality via tighter integration with SPSS for Windows

Features

Clementine's main features are described below in terms of the CRISP-DM process.

Business understanding

Clementine's visual interface makes it easy for your organization to apply business knowledge to data mining projects. In addition, optional business-specific Clementine Application Templates (CATs) are available to help you get results faster. CATs ship with sample data that can be installed as flat files or as tables in a relational database schema.
- CRM CAT**
- Telco CAT**
- Fraud CAT**
- Microarray CAT**
- Web Mining CAT** (requires the purchase of Web Mining for Clementine)

Data understanding

- Obtain a comprehensive first look at your data using Clementine's data audit node
- View data quickly through graphs, summary statistics, or an assessment of data quality
- Create basic graph types, such as histograms, distributions, line plots, and point plots
- Edit your graphs to communicate results more clearly
- Use association detection when analyzing Web data
- Interact with data by selecting a region of a graph and seeing the selected information in a table, or use the information in a later phase of your analysis
- Access SPSS statistics and reporting tools from Clementine, including reports and graphics

Data preparation

Access data

Structured (tabular) data
- Access ODBC-compliant data sources with the included SPSS Data Access Pack. Drivers in this middleware pack support IBM DB2, Oracle, Microsoft SQL Server, Informix, and Sybase databases.
- Import delimited and fixed-width text files, any SPSS file, and SAS 6, 7, 8, and 9 files
- Specify worksheets and data ranges when accessing data in Excel

Unstructured (textual) data
- Automatically extract concepts from any type of text by using Text Mining for Clementine*

Features subject to change based on final product release.
* Text mining capabilities will be available in Clementine Desktop 11.1.
** Separately priced modules

Features (continued)

Web site data
- Automatically extract Web site events from Web logs using Web Mining for Clementine**

Survey data
- Directly access data stored in the Dimensions Data Model or in the data files of Dimensions** products

Data export
- Work with delimited and fixed-width text files, ODBC, Microsoft Excel, SPSS, and SAS 6, 7, 8, and 9 files
- Export in XLS format through the Excel Output Node

Choose from various data-cleaning options
- Remove or replace invalid data
- Use predictive modeling to automatically impute missing values
- Automatically generate operations for the detection and treatment of outliers and extremes

Manipulate data
- Work with complete record and field operations, including:
  - Field filtering, naming, derivation, binning, re-categorization, value replacement, and field reordering
  - Record selection, sampling, merging (through inner joins, full outer joins, partial outer joins, and anti-joins), and concatenation; sorting, aggregation, and balancing
  - Data restructuring, including transposition
  - Splitting numerical records into sub-ranges optimized for prediction
  - Extensive string functions: string creation, substitution, search and matching, whitespace removal, and truncation
- Prepare data for time-series analysis with the Time Plot node
- Partition data into training, test, and validation datasets
- Transform data automatically for multiple variables
- Visualize standard transformations
- Access data management and transformations performed in SPSS directly from Clementine

Modeling

Mine data in the database where it resides with in-database modeling. Support includes:
- IBM DB2 Enterprise Edition 8.2: decision tree, association, sequential association, regression, and demographic clustering techniques
- Oracle 10g: decision tree, O-Cluster, k-means, NMF, Apriori, and MDL, in addition to Naïve Bayes, Adaptive Bayes, and Support Vector Machines (SVM)
- Microsoft SQL Server Analysis Services: decision tree, clustering, association rules, Naïve Bayes, linear regression, neural network, and logistic regression

Use predictive and classification techniques
- Neural networks (multi-layer perceptrons using error back-propagation, and radial basis function networks)
- Browse the importance of the predictors
- Decision trees and rule induction techniques, including CHAID, exhaustive CHAID, QUEST, C&RT, and interactive tree building
- New decision list algorithm for interactive rule building
- Browse and create splits in decision trees interactively
- Rule induction techniques in C5.0
- Linear regression, logistic regression, and multinomial logistic regression
- View model equations and advanced statistical output
- Use binomial logistic regression models and diagnostics for optimal modeling
- Identify variables separating categories using discriminant analysis
- Anomaly Detection node for directly detecting unusual cases
- Feature Selection node for identifying fields that are most and least important to an analysis
- Additional statistics available at the Means and Matrix nodes
- Analyze the impact of variables and build models around them via generalized linear model (GLM) techniques

Use clustering and segmentation techniques
- Kohonen networks, k-means, and TwoStep
- View cluster characteristics with a graphical viewer

Choose from several association detection algorithms
- GRI, Apriori, sequence, and CARMA algorithms
- Score data using models generated by association detection algorithms
- Filter, sort, and create subsets of association models using the association model viewer

Employ data reduction techniques
- Factor analysis and principal components analysis
- View model equations and advanced statistical output

Combine models through meta-modeling
- Multiple models can be combined, or one model can be used to analyze a second model
- Automate the creation and evaluation of multiple models using the binary classifier algorithm

Extend and automate modeling
- Import PMML-generated models created in other tools such as AnswerTree and SPSS for Windows
- Use Clementine External Module Interfaces (CEMI) for custom algorithms
- Purchase add-on tools from the Clementine Partner Plus Program
- Generate and automatically select time-series forecasting models

Features subject to change based on final product release.
** Separately priced modules
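Clementine's implementations of the techniques above are proprietary, but the algorithm families themselves are standard. To make the clustering family concrete, here is a toy k-means on one-dimensional data in standard-library Python; it is my own minimal illustration, not Clementine code:

```python
# Toy k-means, one of the clustering techniques listed above: alternately
# assign points to their nearest center, then move each center to the mean
# of its assigned points.

def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (keeping a center in place if its cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(kmeans_1d(data, centers=[0.0, 5.0]))  # [1.0, 9.0]
```

Production implementations add smarter initialization, convergence tests, and multi-dimensional distance, but the assign-and-update loop is the whole algorithm.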

Evaluation

- Easily evaluate models using lift, gains, profit, and response graphs
- Use a one-step process that shortens project time when evaluating multiple models
- Define hit conditions and scoring expressions to interpret model performance
- Analyze overall model accuracy with coincidence matrices and other automatic evaluation tools

Deployment

Clementine offers a choice of deployment capabilities to meet your organization's needs.

Clementine Solution Publisher Runtime (optional**)
- Automate the export of all operations, including data access, data manipulation, text mining, model scoring (including combinations of models), and post-processing
- Use a runtime environment for executing image files on target platforms
- Automatically export Clementine streams to SPSS predictive analytics applications
- Combine exported Clementine streams with predictive models, business rules, and exclusions to optimize customer interactions

Cleo (optional**)
- Implement a Web-based solution for rapid model deployment
- Enable multiple users to simultaneously access and immediately score single records, multiple records, or an entire database through a customizable browser-based interface

Clementine Batch
- Automate production tasks while working outside the user interface
- Automate Clementine processes from other applications or scheduling systems
- Generate encoded passwords
- Call Clementine processes via the command line

Scripting
- Use command-line scripts or scripts associated with Clementine streams to automate repetitive tasks
- Export generated models as PMML 3.1
- Perform in-database scoring, which eliminates the need for and costs associated with transferring data to client machines or performing calculations there
- Deploy Clementine PMML models to IBM DB2 Intelligent Miner Visualization and Intelligent Miner Scoring
- Use the bulk-loading capabilities of your database to improve performance

Secure client-server communication
- Transmit sensitive data securely between Clementine Client and Clementine Server through SSL encryption

SPSS Predictive Enterprise Services (optional**)
- Centralize data mining projects to leverage organizational knowledge
- Save streams, models, and other objects in a central, searchable repository
- Save entire Clementine projects, including all process steps
- Group streams in folders, and secure folders and streams by user or user groups
- Provide permission-based access to protect the privacy of sensitive information
- Reuse the most effective streams and models to improve processes and increase the accuracy of results
- Search on input variables, target variables, model types, notes, keywords, authors, and other types of metadata
- Automate Clementine model refreshing from within SPSS Predictive Enterprise Services
- Ensure reliable results by controlling versions of predictive models
- Automatically assign versions to streams and other objects
- Protect streams from being overwritten through automatic versioning

Scalability
- Employ in-database mining to leverage high-performance database implementations
- Use in-database modeling to build models in the database using leading database technologies; additional vendor-supplied algorithms are included for Microsoft SQL Server 2005, IBM DB2 DWE, and Oracle Data Mining
- Support multi-threaded processing during sorting and model building in environments with multiple CPUs or multi-core CPUs
- Minimize data transfer: Clementine pulls data only as needed from your data warehouse and passes only relevant results to the client
- Leverage high-performance hardware, experience quicker time-to-solution, and achieve greater ROI through parallel execution of streams and multiple models
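The evaluation tools listed above rest on standard measures. As a standard-library Python reminder of what two of them compute, a coincidence matrix and lift at a depth (my own minimal versions, not Clementine's):

```python
from collections import Counter

def coincidence_matrix(actual, predicted):
    """Count (actual, predicted) label pairs: a confusion matrix as a Counter."""
    return Counter(zip(actual, predicted))

def lift_at(scores, actual, fraction):
    """Hit rate in the top-scored fraction, divided by the overall hit rate."""
    ranked = [a for _, a in sorted(zip(scores, actual), reverse=True)]
    k = int(len(ranked) * fraction)
    return (sum(ranked[:k]) / k) / (sum(actual) / len(actual))

actual    = [1, 0, 1, 1, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 1, 0]
print(coincidence_matrix(actual, predicted)[(1, 1)])  # 2 true positives

# Model scores that happen to rank all three hits first.
scores = [0.9, 0.2, 0.8, 0.7, 0.4, 0.3, 0.1, 0.05]
print(lift_at(scores, actual, 0.25))  # 1.0 / (3/8) ≈ 2.67: top quarter is all hits
```

A lift of 2.67 at the top quarter means contacting only the highest-scored 25% of records finds hits at 2.67 times the base rate, which is exactly what a gains or lift chart plots across all depths.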

For more than 25 years, U.S.-based National Instruments has revolutionized the way engineers and scientists in industry, government, and academia approach measurement and automation. The company sells virtual instrumentation software and hardware in a wide range of industries, from automotive to consumer electronics. As National Instruments began to rapidly expand from its mid-sized roots, it recognized a need for more sophisticated data mining software to keep up with accompanying growth in sales leads and customers. Specifically, the company sought to identify the best prospects for its salespeople to target, and to segment customers for more strategic marketing programs.

"Our customer base was so broad and diverse across geography, industry, and even from customer to customer, that we didn't know if it could be segmented. By using Clementine, we now have a robust profile of each of our market segments, which allows us to use direct marketing to customize messages delivered to each one."

Amanda Jensen
Analytics Manager, Business Intelligence Department
National Instruments Corporation

The cornerstone of the Predictive Enterprise

Clementine makes data predictive and facilitates the delivery of predictive insight to the people in your organization who make decisions and the systems that support daily customer interactions. If your organization has any data stored in text form or in Web logs, as many do, you can leverage it by using Text Mining for Clementine or Web Mining for Clementine*. And you can understand the attitudes and beliefs that lie behind behavior (why customers make the choices they do) by incorporating survey data from any of SPSS's Dimensions survey research products. Thanks to its integration with SPSS predictive applications and other information systems, Clementine enables you to guide daily decisions and recommendations, as well as long-range planning, with insight into current and future conditions.
You can accomplish this securely and efficiently, across your entire enterprise, with SPSS Predictive Enterprise Services. Clementine's extensive capabilities are supported by the most advanced statistical and machine learning techniques available. For greater value, it is open to operation and integration with your current information systems.

* Text mining capabilities will be available in Clementine Desktop 11.1.

To learn more, please visit www.spss.com. For SPSS office locations and telephone numbers, go to www.spss.com/worldwide.

SPSS is a registered trademark and the other SPSS products named are trademarks of SPSS Inc. All other names are trademarks of their respective owners. © 2007 SPSS Inc. All rights reserved. CLM11SPCA4-0107