ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA
1 ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA
2 AGENDA Intro Analytics using SAS Enterprise Guide Ellen Lokollo Advanced Analytics using SAS Enterprise Miner Rens Feenstra Lunch Advanced programming: to get better performance from your SAS code Alfredo Iglesias Rey ABN AMRO presents Cees Harlaar Project INSPIRE Arthur Usov Dynamic Linear Modelling From Data to Insights Pim Veeger SAS on Linux Leon Ellermeijer SAS Improvements Project Wrap up
3 ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA
4 AGENDA Advanced Analytics / Datamining / Machine learning; SEMMA; Rapid Predictive Modeler; Enterprise Miner; High Performance Analytics; R-integration; Text Miner; Analytic Lifecycle
5 ADVANCED ANALYTICS HOW ADVANCED IS ADVANCED?
6
7 Machine Learning
8 ADVANCED ANALYTICS WHAT IS MACHINE LEARNING? Machine learning is a branch of artificial intelligence that automates the building of systems that learn iteratively from data, identify patterns, and predict future results with minimal human intervention. It shares many approaches with other related fields, but it focuses on predictive accuracy rather than interpretability of the model
9 ADVANCED ANALYTICS MACHINE LEARNING IS NOT A NEW DISCIPLINE Statistics Pattern Recognition Computational Neuroscience Data Science Data Mining AI Databases Machine Learning KDD Graphic from the SAS Data Mining Primer course in 1998
10 ADVANCED ANALYTICS MACHINE LEARNING INCLUDES A COMPREHENSIVE SET OF METHODS Local search optimization k-means clustering Bayesian networks Gradient boosting Deep Learning Random forests Latest techniques Complex Can be more accurate Decisions Trees Regression Neural Networks Principal components Model Ensembles Traditional Easy-to-explain Often good enough Support vector machines SAS covers the full range from Regression to Deep Learning
11 ADVANCED ANALYTICS WHY IS MACHINE LEARNING SO IMPORTANT NOW? Data Computing Power Algorithms
12 ANALYTICS TEXT ANALYTICS Finding treasures in unstructured data like social media or survey tools that could uncover insights about key business challenges FORECASTING Leveraging historical data to drive better insight into proactive decision-making Data Management (Integration, Quality & Governance) OPTIMIZATION DATA MINING / MACHINE LEARNING Analyze massive amounts of data in order to accurately identify areas likely to produce the most profitable results Mine transaction databases to create models of likely outcomes STATISTICS Copyright, SAS Institute Inc. All rights reserved.
14 ADVANCED ANALYTICS WHERE TO START?
15 ENTERPRISE MINER SEMMA IN ACTION REPEATABLE PROCESS
16 SAMPLE REORGANIZE YOUR DATA Use Weight or Stratified sampling to balance the dataset. Partition data into train, validate and test sets. [Figure: error rate versus model complexity for the training set and the validation set, with the optimum marked]
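The partitioning step on this slide can be illustrated outside Enterprise Miner with a minimal Python sketch. It is not how the Data Partition node is implemented; the 60/20/20 fractions, the `response` column and the function name `stratified_partition` are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def stratified_partition(rows, target, fractions=(0.6, 0.2, 0.2), seed=42):
    """Split rows into train/validate/test while keeping the target mix similar.

    The fractions and the seed are hypothetical defaults for this sketch.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    train, validate, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)  # shuffle within each target class
        n_train = round(fractions[0] * len(members))
        n_valid = round(fractions[1] * len(members))
        train.extend(members[:n_train])
        validate.extend(members[n_train:n_train + n_valid])
        test.extend(members[n_train + n_valid:])  # remainder goes to test
    return train, validate, test

# Hypothetical data set: 100 responders among 1000 customers.
rows = [{"id": i, "response": 1 if i < 100 else 0} for i in range(1000)]
train, validate, test = stratified_partition(rows, "response")
```

Because the split is done per target class, each partition keeps roughly the same share of responders as the full data set.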
17 EXPLORE CHECK DATA TO UNDERSTAND VARIABLE VALUES
18 MODIFY TRANSFORM VARIABLES TO OPTIMIZE RESULTS Transform variables using a math function (e.g. lognormal). Standardize numeric values to z-scores (how far from average). Bin numeric variables (dates into tenures, age into buckets). Remove outliers (or are they what you are looking for?). Group categorical variables into classes. Impute missing values.
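Three of the modifications listed on this slide (imputing missing values, standardizing to z-scores, and binning age into buckets) can be sketched in plain Python. This is only an illustration under stated assumptions: mean imputation, the population standard deviation, and the bucket edges are choices made here, not settings taken from Enterprise Miner.

```python
import statistics

def impute_mean(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def z_scores(values):
    """Standardize values: distance from the average in standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mean) / sd for v in values]

def bin_age(age, edges=(18, 35, 50, 65)):
    """Map an age to a bucket label; the edges are hypothetical."""
    labels = ["<18", "18-34", "35-49", "50-64", "65+"]
    for label, edge in zip(labels, edges):
        if age < edge:
            return label
    return labels[-1]

ages = [23, 41, None, 67, 35]       # hypothetical input with one missing value
ages_filled = impute_mean(ages)     # the missing age becomes 41.5
ages_standardized = z_scores(ages_filled)
age_buckets = [bin_age(a) for a in ages_filled]
```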
19 MODEL LIST OF MAIN ALGORITHMS Neural networks; Deep Learning; Decision trees; Random forests; Associations and sequence discovery; Gradient boosting and bagging; Support vector machines; Nearest-neighbor mapping; k-means clustering; Self-organizing maps; Local search optimization techniques such as genetic algorithms; Regression; Expectation maximization; Kernel density estimation; Multivariate adaptive regression splines; Bayesian networks; Principal components analysis; Singular value decomposition; Gaussian mixture models; Sequential covering rule building; Model Ensembles
20 ASSESS EVALUATE MODEL RESULTS AND SCORE
21 SAS ENTERPRISE MINER COMPLETE LIST OF NODES
SAMPLE: Append, Data Partition, File Import, Filter, Merge, Sample, Input Data
EXPLORE: Association, Cluster, Graph Explore, Variable Clustering, DMDB, MultiPlot, Market Basket, StatExplore, Link Analysis, Path Analysis, Variable Selection, SOM/Kohonen
MODIFY: Drop, Impute, Interactive Binning, Principal Components, Replacement, Rules Builder, Transform Variables
MODEL: Decision Tree, AutoNeural, Regression, Neural Network, Partial Least Squares, Dmine Regression, DM Neural, Ensemble, Rule Induction, Gradient Boosting, LARS, MBR, Two Stage, Model Import
Incremental Response, Survival Analysis, Credit Scoring*
TS Correlation, TS Data Prep, TS Dimension Reduction, TS Decomp., TS Similarity, TS Exponential Smoothing
HP Explore, HP Impute, HP Regression, HP Transform, HP Variable Selection, HP Neural, HP Forest, HP Decision Tree, HP Data Partition, HP GLM, HP Cluster, HP Prin Comp, HP SVM, HP BNET
ASSESS: Cutoff, Decisions, Model Comparison, Score, Segment Profile
UTILITY: Control Point, End Groups, Start Groups, Open Source Integration, Reporter, Score Code Export, Metadata, SAS Code, Ext Demo, Save Data, Register Metadata
*Requires Credit Scoring for SAS Enterprise Miner Add-on License.
22 MACHINE LEARNING WHY IS IT SO IMPORTANT NOW? Data Computing Power Algorithms
23 SAS HIGH-PERFORMANCE SAS PROCESSING DIRECTLY ATTACHED TO YOUR DATA DATA MINING Database/DW SAS HP Data Mining SAS ANALYTICS Client Hadoop Cluster
25 SAS EM 14.1 HP BAYESIAN NETWORK NODE Enables the creation of Bayesian networks. A Bayesian network is a probabilistic graphical model that represents the data and the conditional dependencies via a directed acyclic graph (DAG). Supports the following network structures: Naïve, Tree-Augmented Naïve (TAN), Bayesian network Augmented Naïve (BAN), Parent Child (PC) and Markov Blanket. Enables automatic network model selection. Requires a categorical target variable and categorical or interval (binned) input variables.
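The simplest structure named on this slide, the naïve network, makes the target the only parent of every input. A minimal Python sketch of that structure, with a categorical target and categorical inputs, might look as follows. It does not reproduce the HP Bayesian Network node; the add-one (Laplace) smoothing, the sample rows and all function names are assumptions of this sketch.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, target):
    """Estimate P(target) and P(input | target) from categorical rows by counting."""
    class_counts = Counter(row[target] for row in rows)
    cond_counts = defaultdict(Counter)  # (input name, class) -> value counts
    levels = defaultdict(set)           # input name -> distinct values seen
    for row in rows:
        for name, value in row.items():
            if name != target:
                cond_counts[(name, row[target])][value] += 1
                levels[name].add(value)
    return class_counts, cond_counts, levels

def predict_proba(model, inputs):
    """Posterior class probabilities under the naive (target-is-only-parent) DAG."""
    class_counts, cond_counts, levels = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total
        for name, value in inputs.items():
            # Add-one smoothing avoids zero probabilities (assumption of this sketch).
            p *= (cond_counts[(name, cls)][value] + 1) / (n_cls + len(levels[name]))
        scores[cls] = p
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical marketing-response data.
rows = [
    {"channel": "email", "segment": "young", "response": "yes"},
    {"channel": "email", "segment": "old", "response": "yes"},
    {"channel": "mail", "segment": "young", "response": "no"},
    {"channel": "mail", "segment": "old", "response": "no"},
    {"channel": "email", "segment": "young", "response": "no"},
    {"channel": "mail", "segment": "old", "response": "no"},
]
model = fit_naive_bayes(rows, "response")
probs = predict_proba(model, {"channel": "email", "segment": "young"})
```

The other structures on the slide (TAN, BAN, PC, Markov Blanket) add edges between inputs and are not covered by this sketch.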
26 SAS EM 13.1 HP SUPPORT VECTOR MACHINE NODE Enables the creation of linear and nonlinear support vector machine models. A support vector machine is a supervised machine-learning method that is used to perform classification and regression analysis. Constructs separating hyperplanes that maximize the margin between two classes. Enables the use of a variety of kernels: linear, polynomial, radial basis function, and sigmoid function. The node also provides interior point and active set optimization methods.
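The four kernel families named on this slide can be written down as small Python functions. These are the common textbook forms, shown as a sketch only; the parameter names (`degree`, `coef0`, `gamma`) and their default values are assumptions, and the exact parameterization used by the HP SVM node may differ.

```python
import math

def dot(x, y):
    """Inner product of two equal-length numeric vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # Radial basis function: depends only on the squared distance between points.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

x, y = [1.0, 2.0], [3.0, 4.0]
k_linear = linear_kernel(x, y)
k_poly = polynomial_kernel(x, y)
k_rbf_same = rbf_kernel(x, x)
k_sigmoid = sigmoid_kernel(x, y)
```

A kernel replaces the plain inner product when the separating hyperplane is constructed, which is what lets the same method fit nonlinear boundaries.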
27 SAS EM 13.2 HP FOREST NODE A forest consists of several decision trees that differ from each other in two ways. First, the training data for a tree is a sample without replacement from all available observations. Second, the input variables that are considered for splitting a node are randomly selected from all available inputs. In other respects, trees in a forest are trained like standard trees. Adds support for partitioned validation data. HP Forest now performs variable selection using the data partitioned for validation, instead of out-of-bag (OOB) data. The HP Forest iteration history plot and table also use partitioned validation data.
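The two sources of randomness described on this slide can be sketched in a few lines of Python. This only illustrates the sampling, not tree training and not the HP Forest implementation; the 60% sample fraction, the input names and the number of candidate inputs per split are hypothetical.

```python
import random

def draw_tree_sample(n_obs, fraction, rng):
    """Row indices used to train one tree: a sample WITHOUT replacement."""
    return rng.sample(range(n_obs), k=round(fraction * n_obs))

def draw_split_candidates(inputs, n_candidates, rng):
    """Input variables considered at one split: a random subset of all inputs."""
    return rng.sample(inputs, k=n_candidates)

rng = random.Random(7)  # fixed seed so the sketch is repeatable
inputs = ["age", "income", "tenure", "region", "channel"]  # hypothetical inputs

rows_for_tree = draw_tree_sample(1000, fraction=0.6, rng=rng)
out_of_bag = set(range(1000)) - set(rows_for_tree)  # rows this tree never saw
candidates = draw_split_candidates(inputs, 2, rng=rng)
```

The rows left out of a tree's sample are that tree's out-of-bag (OOB) observations, which is the data the slide says was used for variable selection before partitioned validation data was supported.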
28 SAS ENTERPRISE MINER HIGH-PERFORMANCE NODES AND PROCEDURES Not only nodes available via the interface, but also procedures available via any coding interface.
29 EXAMPLE CASE PREDICT CUSTOMER RESPONSE TO RETAIL MARKETING Current Process High-Performance Process Neural Network Method (1 iteration) DATA EXPLORATION MODEL DEVELOPMENT MODEL DEPLOYMENT Neural Network Method (100 iterations) 5 hours to process model 6 minutes to process model Limited to 1 or 2 modeling methods Model lift of 1.6% Model lift of 3.2% 84 SECONDS Experiment with multiple modeling methods
30 SAS ENTERPRISE MINER OPEN SOURCE INTEGRATION NODE (R SUPPORT) Allows users to integrate R code (supervised and unsupervised models) inside a SAS Enterprise Miner process flow diagram. Provides flexibility to include R code within a data mining flow, using EM for data prep, R for modeling, and then EM for deployment. Includes R models in model assessment alongside models generated by SAS Enterprise Miner and, in some R-generated PMML cases, produces corresponding SAS DATA step scoring code.
31 LIBRARY/THE-OPEN-SOURCE-INTEGRATION-NODE-INSTALLATION-CHEAT-SHEET/TA-P/223470
32 ADVANCED ANALYTICS TEXT MINER ADD-ON TO ENTERPRISE MINER Discovering and using knowledge which exists in the document collection as a whole. Uncovering patterns within the document collection. Establishing connections between documents and the terms in the collection as a whole. Combining free-form text and quantitative variables to derive information and to make better predictions.
33 SAS TEXT MINER TEXT MINING PROCESS Typical SAS Enterprise Miner text mining process flow Change Text Topic Node Values for Basic Sentiment Text Mining Raw Data Predictive Modeling
34 SAS TEXT MINER TEXT MINING NODES Users control the Text Miner nodes by modifying their default properties. (Part of the Text Parsing node properties.) Different Parts of Speech; Find Entities; Multi-word Terms; Synonyms; Stop or Start List; Minimum Number of Documents; SVD Resolution; Max SVD Dimensions; Number of Terms to Display; and more!
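Two of the properties listed on this slide, the stop list and the minimum number of documents, can be illustrated with a small Python sketch that builds a term table from raw text. It is a simplified stand-in, not the Text Miner nodes: the whitespace tokenizer, the sample documents, the stop list and the function name `build_term_table` are all assumptions.

```python
from collections import Counter

def build_term_table(documents, stop_list, min_docs=2):
    """Count how many documents contain each term, then filter the terms.

    Terms on the stop list are dropped, and so are terms that appear in
    fewer than `min_docs` documents.
    """
    doc_freq = Counter()
    for text in documents:
        # Naive tokenizer: split on whitespace, strip punctuation, lowercase.
        terms = {t.strip(".,!?").lower() for t in text.split()}
        doc_freq.update(t for t in terms if t and t not in stop_list)
    return {term: n for term, n in doc_freq.items() if n >= min_docs}

# Hypothetical survey comments.
docs = [
    "The delivery was late.",
    "Late delivery and rude staff!",
    "Great staff, fast delivery.",
]
stop = {"the", "was", "and"}
table = build_term_table(docs, stop, min_docs=2)
```

Raising `min_docs` shrinks the term table, which in turn reduces the size of the term-by-document matrix that later steps work on.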
35 TEXT CLUSTERS
36 TEXT TOPICS
37 TEXT PROFILE
38 SAS ANALYTICS IN ACTION Discovery Deployment Data
39 THANK-YOU
More informationRobot Learning. There are generally three types of robot learning: Learning from data. Learning by demonstration. Reinforcement learning
Robot Learning 1 General Pipeline 1. Data acquisition (e.g., from 3D sensors) 2. Feature extraction and representation construction 3. Robot learning: e.g., classification (recognition) or clustering (knowledge
More informationSAS Enterprise Miner TM 6.1
Getting Started with SAS Enterprise Miner TM 6.1 SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2009. Getting Started with SAS Enterprise Miner TM
More informationR (2) Data analysis case study using R for readily available data set using any one machine learning algorithm.
Assignment No. 4 Title: SD Module- Data Science with R Program R (2) C (4) V (2) T (2) Total (10) Dated Sign Data analysis case study using R for readily available data set using any one machine learning
More informationCommunity edition(open-source) Enterprise edition
Suseela Bhaskaruni Rapid Miner is an environment for machine learning and data mining experiments. Widely used for both research and real-world data mining tasks. Software versions: Community edition(open-source)
More informationIntroducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone
Introducing Microsoft SQL Server 2016 R Services Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone SQL Server 2016: Everything built-in built-in built-in built-in built-in built-in $2,230
More informationMULTIVARIATE ANALYSES WITH fmri DATA
MULTIVARIATE ANALYSES WITH fmri DATA Sudhir Shankar Raman Translational Neuromodeling Unit (TNU) Institute for Biomedical Engineering University of Zurich & ETH Zurich Motivation Modelling Concepts Learning
More informationMulti-label classification using rule-based classifier systems
Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar
More informationGetting Started with SAS Enterprise Miner 14.2
Getting Started with SAS Enterprise Miner 14.2 SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. Getting Started with SAS Enterprise Miner 14.2.
More informationPre-Requisites: CS2510. NU Core Designations: AD
DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification
More informationBayesian Network & Anomaly Detection
Study Unit 6 Bayesian Network & Anomaly Detection ANL 309 Business Analytics Applications Introduction Supervised and unsupervised methods in fraud detection Methods of Bayesian Network and Anomaly Detection
More informationGain Insight and Improve Performance with Data Mining
Clementine 11.0 Specifications Gain Insight and Improve Performance with Data Mining Data mining provides organizations with a clearer view of current conditions and deeper insight into future events.
More informationSAS Enterprise Miner : Tutorials and Examples
SAS Enterprise Miner : Tutorials and Examples SAS Documentation February 13, 2018 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2017. SAS Enterprise Miner : Tutorials
More informationUnsupervised Learning
Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover
More informationPredicting Computing Prices Dynamically Using Machine Learning
Technical Disclosure Commons Defensive Publications Series December 07, 2017 Predicting Computing Prices Dynamically Using Machine Learning Thomas Price Follow this and additional works at: http://www.tdcommons.org/dpubs_series
More informationMachine Learning Part 1
Data Science Weekend Machine Learning Part 1 KMK Online Analytic Team Fajri Koto Data Scientist fajri.koto@kmklabs.com Machine Learning Part 1 Outline 1. Machine Learning at glance 2. Vector Representation
More informationOutlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013
Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier
More informationSAS Visual Data Mining and Machine Learning 8.2: Advanced Topics
SAS Visual Data Mining and Machine Learning 8.2: Advanced Topics SAS Documentation January 25, 2018 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2017. SAS Visual
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationINTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN...
INTRODUCTION... 2 WHAT IS DATA MINING?... 2 HOW TO ACHIEVE DATA MINING... 2 THE ROLE OF DARWIN... 3 FEATURES OF DARWIN... 4 USER FRIENDLY... 4 SCALABILITY... 6 VISUALIZATION... 8 FUNCTIONALITY... 10 Data
More information