Practical Data Science with R. Matthew Renze CFT Training for AIG
|
|
- Shon Dixon
- 6 years ago
- Views:
Transcription
1 Practical Data Science with R Matthew Renze CFT Training for AIG
2 Motivation The ability to take data to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it that s going to be a hugely important skill in the next decades, because now we really do have essentially free and ubiquitous data. So the complimentary scarce factor is the ability to understand that data and extract value from it. Hal Varian, Google s Chief Economist The McKinsey Quarterly, Jan 2009
3 Source: Dice 2014 Tech Salary Survey Results
4 A Flood of Data is Coming... Source: Source: Wikipedia Sink or Swim
5 Overview 1. Introduction 2. Transforming Data 3. Descriptive Statistics 4. Data Visualization 5. Statistical Modeling 6. Handling Big Data 7. Machine Learning 8. R in Practice
6 How Does This Apply to Me? As a software developer, I often: Perform log file analysis Analyze software performance Analyze code metrics for code quality Detect anomalies in source data Transform or clean data files to make them usable Help decision makers make decisions based on data
7 How Does This Apply to Me? Deterministic Programming Probabilistic Programming If x and y then z If P(x) > y then z
8 About Me Independent software consultant Education B.S. in Computer Science B.A. in Philosophy Community Pluralsight Author Public Speaker Microsoft MVP ASPInsider Open-Source Software
9 Agenda 1. Introduction 2. Transforming Data 3. Descriptive Statistics 4. Data Visualization 5. Statistical Modeling 6. Handling Big Data 7. Machine Learning 8. R in Practice 1. Transforming Data Project 2. Descriptive Statistics Project 3. Data Visualization Project 4. Statistical Modeling Project 5. Machine Learning Project
10 Schedule Lectures (15 min) Demos (10 min) Labs (20 min) Breaks (10 min)
11 Logistics Pairing for labs is optional Ask questions if needed Come and go as needed Feedback forms at the end
12 Lab Options Type all code Copy and paste Run demo scripts
13 Workshop URL
14 Introduction to Data Science
15 What is Data Science Interdisciplinary field Derive insight from data Transform data into knowledge Source:
16
17 What is a Data Scientist? Performs data science Proper accreditations More than a scientist More than an analyst More than a developer Source:
18
19 The Data Science Toolkit Programming Data transformation Descriptive statistics Data visualization Statistical modeling Big Data Machine learning
20 Source: O Reilly 2015 Data Science Salary Survey
21
22
23 Why is Data Science Important? Lots of data Fast computers Powerful analysis tools Prediction vs. explanation
24 The Data Science Process 1. Ask a question 2. Explore the data 3. Prepare the data 4. Create a model 5. Evaluate the model 6. Deploy the model / results
25 The Data Science Process Iterative process Non-sequential Early termination Established process (CRISP-DM)
26 Introduction to R
27 What is R? Open source Language and environment Numerical and graphical analysis Cross platform
28 What is R? Active development Large user community Modular and extensible extensions
29 FREE!
30 Source:
31
32 Source:
33
34 Code Demo
35 Lab 1 R Programming Basics
36 Transforming Data
37 Data Munging Transforming data Raw data to usable data Data must be cleaned first
38 Data Munging Tasks Renaming variables Data type conversion Encoding, decoding or recoding data Merging data sets Transforming data Handling missing data Handling anomalous values
39 Loading Data in R File-based data Web-based data Databases Statistical data And many more CSV XML
40 Cleaning Data This step is often the Most difficult Most time consuming TIP: Record all steps
41 Data Munging Tools in R plyr dplyr reshape2 tidyr stringr lubridate
42 dplyr Single Data Frame Verbs Select Filter Mutate Summarize Arrange Multiple Data Frame Verbs Inner join Left join Semi join Anti join Union Intersect Set difference
43 Open Movies Database Title Year Rating Runtime (minutes) Movies Genre Critic Score Box Office Awards International The Whole Nine Yards 2000 R 98 Comedy 45% $57.3M No No Cirque du Soleil 2000 G 39 Family 45% $13.4M Yes No Gladiator 2000 R 155 Action 76% $187.3M Yes Yes Dinosaur 2000 PG 82 Family 65% $135.6M Yes No Big Momma's House 2000 PG Comedy 30% $0.5M Yes Yes
44
45
46 1. Column with wrong name 2. Rows with missing values 3. Runtime column has units 4. Revenue in multiple scales 5. Wrong file format
47 Code Demo
48 Lab 2 Transforming Data
49
50 Descriptive Statistics
51 Descriptive Statistics Movie Runtime Statistic Value (minutes) Describe data Provides a summary aka: Summary statistics Minimum 38 1 st Quartile 93 Median 101 Mean rd Quartile 113 Maximum 219
52 Statistical Terms ID Date Customer Product Quantity Observations Variables Qualitative variable Quantitative variable John Pizza John Soda Jill Salad Jill Milk Miko Pizza Miko Soda Sam Pizza Sam Milk 1
53 Number of Variables Types of Analysis Number of variables 1 - Univariate 2 - Bivariate n - Multivariate Type of variables Categorical - Qualitative Numeric - Quantitative One Categorical Variable Two Categorical Variables Categorical & Numeric Variable Many Variables One Numeric Variable Two Numeric Variables Type of Variable(s)
54 Analyzing One Categorical Variable Movies by Genre Genre Frequency Percentage Action 612 9% Qualitative univariate analysis Frequency of observations Adventure 496 7% Animation 168 2% Comedy % Drama % Horror 269 4%
55 Analyzing One Numeric Variable Quantitative univariate analysis Central tendency Dispersion Shape
56 Visualizing Two Categorical Variables Movies by Genre and Rating Genre G PG PG-13 R Total Qualitative bivariate analysis Joint frequency Marginal frequency Relative frequency Contingency table Action Adventure Animation Comedy Drama Family Total
57 Visualizing Two Categorical Variables Movies by Genre and Rating Genre G PG PG-13 R Total Qualitative bivariate analysis Joint frequency Marginal frequency Relative frequency Contingency table Action Adventure Animation Comedy Drama Family Total
58 Analyzing Two Numeric Variables Quantitative bivariate analysis Predictor vs. outcome Covariance Correlation
59 Analyzing a Numeric Variable Grouped By a Categorical Variable One categorical variable One numeric variable Aggregate measures
60 Number of Variables Analyzing Many Variables One Categorical Variable One Numeric Variable Multivariate analysis Specific meaning in statistics Two Categorical Variables Categorical & Numeric Variable Two Numeric Variables Many Variables Type of Variable(s)
61
62 Cowboys & Space Invaders: The Musical Extended Edition
63 Code Demo
64 Lab 3 Descriptive Statistics
65 Cowboys & Space Invaders: The Musical Extended Edition
66 Data Visualization
67 Data Visualization Visual data representation For human pattern recognition Map dimensions to visual
68 Data Visualization ID Date Customer Product Quantity John Pizza John Soda Jill Salad Jill Milk Miko Pizza Miko Soda Sam Pizza Sam Milk 1
69 Number of Variables Types of Data Visualizations One Categorical Variable One Numeric Variable Number of variables Type of variable(s) Two Categorical Variables Categorical & Numeric Variable Two Numeric Variables Many Variables Type of Variable(s)
70 Visualizing One Categorical Variable Frequency Proportion
71 Visualizing One Numeric Variable Location Spread Shape
72 Visualizing Two Categorical Variables Joint frequency Marginal frequency Relative frequency
73 Visualizing Two Numeric Variables Relationship Correlation Distribution
74 Visualizing Two Numeric Variables Relationship Correlation Distribution
75 Visualizing Time Series Data Values over time Rate of change
76 Visualizing a Numeric Variable Grouped by a Categorical Variable Aggregate Grouped Comparison
77 Visualizing Many Variables
78 Source: Nathan Yau (
79 Plotting Systems in R Base Lattice ggplot2
80 Cowboys & Space Invaders: The Musical Extended Edition
81 Code Demo
82 Lab 4 Data Visualization
83 The Warlord of the Rings : An Unexpected Adventure Feature Length PG
84 Statistical Modeling
85 Statistical Models Representation of reality Mathematical function Parametric models Non-parametric models
86 Types of Statistical Models Probability distribution ANOVA Linear regression Non-linear regression Bayesian network
87 Gaussian Distribution Probability distribution Parametric model Mean Standard deviation Generative model
88 Simple Linear Regression Linear model Relationship between variables Continuous explanatory variable Single scalar dependent variable
89 Simple Linear Regression Linear predictor function y = m x + b Parameters estimated from data Makes certain assumptions
90 Source:
91
92 Photo by Danielle Langlois
93 Code Demo
94 Lab 5 Statistical Models
95 Photos by Radomił Binek, Danielle Langlois, and Frank Mayfield
96 Handling Big Data
97 What is Big Data?
98 Big Data is a Moving Target What is Big Data today Will not be Big Data tomorrow
99 Do I Have Big Data? Can it fit in memory? Can it fit on your hard drive? Can you process it in a day? Does it fit into tables?
100 Big Data Decision Table Source: Michael E Driscoll Note: Last updated in 2010
101 Decision Table for Big Data in R Class Rows Manage with Small Less than 1 million R on desktop computer Medium 1 million to 1 billion R with 3 rd party support Big More than 1 billion R with big data support Source: Jan Wijffels at user!-conference (2013)
102 If you don t have a big data problem, but you think you have a big data problem, then you ve just created a big data problem!
103 How to Handle Big Data in R Sampling Microsoft R Open 3 rd -party medium data packages 3 rd -party big data packages Microsoft R Server
104 Sampling Randomly selected Subset of original data Low-cost solution
105 Microsoft R Open Enhance 64-bit R distribution Multithreaded performance Only helps with CPU constraint Package repository snapshots Source: Microsoft R Open
106 Microsoft R Open Performance Source:
107 3 rd -Party Extension Packages Big Memory ff
108 3 rd -Party Extension Packages pbdr Rhipe Rhive Rhbase Rhdfs, Rmr Source: Wikipedia
109 Microsoft R Server Microsoft R Server Windows SQL Server 2016 SUSE Linux Redhat Linux Teradata Hadoop HD Insight Source: Microsoft R Server
110 No Demo or Lab 6
111 Machine Learning
112 What is Machine Learning? Subtype of artificial intelligence Learning without being programmed Similar to statistical modeling Similar to data mining Source: Futurama
113 How does Machine Learning Work? Uses statistical models Model s parameters are trained Trained with algorithm and data Model is used to predict output Prediction vs. explanation Source: Futurama
114 What Can Machine Learning Do? Classification Regression Clustering Anomaly detection Source: Futurama
115 Types of Machine Learning Supervised Unsupervised Reinforcement Source: Futurama
116 Types of ML Algorithms Decision Trees Naïve Bayes Classifier Linear Regression Logistic Regression Support Vector Machines Neural Networks K-Means Clustering Ensemble Learning Source: Futurama
117 k-means Clustering Unsupervised learning Specify k (# of clusters) Algorithm finds centers Random restarts Source: Wikipedia
118 Decision Tree Classifier Supervised learning Tree of decisions Easy to understand Transparent Source: Wikipedia
119 Naïve Bayes Classifier Supervised learning Simple Bayesian classifier Independence assumption Relatively easy to understand Transparent
120 Neural Network Classifier Supervised learning Model of neurons in brain Layers of neurons Backpropagation Complex Not transparent Source: Wikipedia
121 Overfitting and Regularization Overfit too specialized Underfit too generalized Regularization techniques Early stopping Pruning (trees) Adding noise Parameter tuning Source: Wikipedia
122 The Machine Learning Process 1. Find a question 2. Prepare the data 3. Train the model Monitor the model Find a question Prepare the data 4. Evaluate the model 5. Deploy the model 6. Monitor the model Deploy the model Evaluate the model Train the model
123 Photos by Radomił Binek, Danielle Langlois, and Frank Mayfield
124 Code Demo
125 Lab 7 Machine Learning
126
127 R in Practice
128 How to Use R in Practice? 1. Deploying R to production 2. Best practices 3. Creating reproducible research
129 How to Deploy to Production Export charts (Rstudio) Create documents (Markdown) Create interactive reports (Shiny) Deploy to Server (R Server) Deploy to Cloud (Azure ML)
130 Code Demo
131
132 Start with a Question How many movies were released in each rating category from 2000 to 2015?
133 Use the Right Tool for the Job
134 Know Your Audience
135 Movies Create Clean Data Analyses 1500 Movies by Rating Category G PG PG-13 0 G PG Rating PG-13 R R
136 Avoid Biases and Information Distortions
137 Create Reproducible Research Replication is hallmark of science Big issue in science right now Allows other to verify findings Creates transparency Allows other to build upon work Source:
138 How Do We Create Reproducible Research? Provide raw data and code Script all analysis steps Use source control State all assumptions Use markdown Source:
139 Where to Go Next Coursera: Revolutions: Flowing Data: R-Blogger: R-Seek:
140 My Website Articles Presentations Source Code Videos Workshops
141 Conclusion
142 Conclusion 1. Introduction 2. Transforming Data 3. Descriptive Statistics 4. Data Visualization 5. Statistical Modeling 6. Handling Big Data 7. Machine Learning 8. R in Practice
143 Feedback Feedback is very important to me! One thing you liked? One thing I could improve?
144 Contact Info Matthew Renze Data Science Consultant Renze Consulting Website: Thank You! : )
Hal Varian, Google s Chief Economist The McKinsey Quarterly, Jan 2009
The ability to take data to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it that s going to be a hugely important skill in the next decades, because
More informationPractical Data Science with #DevoxxMA
Practical Data Science with R @matthewrenze #DevoxxMA The ability to take data to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it that s going
More informationExploratory Data Analysis with R. Matthew Renze Iowa Code Camp Fall 2013
Exploratory Data Analysis with R Matthew Renze Iowa Code Camp Fall 2013 Motivation The ability to take data to be able to understand it, to process it, to extract value from it, to visualize it, to communicate
More informationIntroduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)
Introduction to Data Science What is Analytics and Data Science? Overview of Data Science and Analytics Why Analytics is is becoming popular now? Application of Analytics in business Analytics Vs Data
More informationDATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:
DATA SCIENCE About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst/Analytics Manager/Actuarial Scientist/Business
More informationThink & Work like a Data Scientist with SQL 2016 & R DR. SUBRAMANI PARAMASIVAM (MANI)
Think & Work like a Data Scientist with SQL 2016 & R DR. SUBRAMANI PARAMASIVAM (MANI) About the Speaker Dr. SubraMANI Paramasivam PhD., MCT, MCSE, MCITP, MCP, MCTS, MCSA CEO, Principal Consultant & Trainer
More informationIntroduction to Data Mining
Introduction to JULY 2011 Afsaneh Yazdani What motivated? Wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge What motivated? Data
More informationData Science. Data Analyst. Data Scientist. Data Architect
Data Science Data Analyst Data Analysis in Excel Programming in R Introduction to Python/SQL/Tableau Data Visualization in R / Tableau Exploratory Data Analysis Data Scientist Inferential Statistics &
More informationIvy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V)
Ivy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V) Based on Industry Cases, Live Exercises, & Industry Executed Projects Module (I) Analytics Essentials 81 hrs 1. Statistics
More informationDemystifying Machine Learning
Demystifying Machine Learning Dmitry Figol, WW Enterprise Sales Systems Engineer - Programmability @dmfigol CTHRST-1002 Agenda Machine Learning examples What is Machine Learning Types of Machine Learning
More informationBIG DATA SCIENTIST Certification. Big Data Scientist
BIG DATA SCIENTIST Certification Big Data Scientist Big Data Science Professional (BDSCP) certifications are formal accreditations that prove proficiency in specific areas of Big Data. To obtain a certification,
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationDr. SubraMANI Paramasivam. Think & Work like a Data Scientist with SQL 2016 & R
Dr. SubraMANI Paramasivam Think & Work like a Data Scientist with SQL 2016 & R About the Speaker Group Leader Dr. SubraMANI Paramasivam PhD., MVP, MCT, MCSE (x2), MCITP (x2), MCP, MCTS (x3), MCSA CEO,
More information10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors
Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple
More informationPython With Data Science
Course Overview This course covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases. Who Should Attend Data Scientists, Software Developers,
More informationEvent: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect
Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect BEOP.CTO.TP4 Owner: OCTO Revision: 0001 Approved by: JAT Effective: 08/30/2018 Buchanan & Edwards Proprietary: Printed copies of
More informationIntroduction to Data Mining and Data Analytics
1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns
More informationJMP Book Descriptions
JMP Book Descriptions The collection of JMP documentation is available in the JMP Help > Books menu. This document describes each title to help you decide which book to explore. Each book title is linked
More informationData Science Bootcamp Curriculum. NYC Data Science Academy
Data Science Bootcamp Curriculum NYC Data Science Academy 100+ hours free, self-paced online course. Access to part-time in-person courses hosted at NYC campus Machine Learning with R and Python Foundations
More informationSTAT 1291: Data Science
STAT 1291: Data Science Lecture 20 - Summary Sungkyu Jung Semester recap data visualization data wrangling professional ethics statistical foundation Statistical modeling: Regression Cause and effect:
More informationSpecialist ICT Learning
Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.
More informationOverview. Data Mining for Business Intelligence. Shmueli, Patel & Bruce
Overview Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Core Ideas in Data Mining Classification Prediction Association Rules Data Reduction Data Exploration
More informationData Analyst Nanodegree Syllabus
Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working
More informationCPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017
CPSC 340: Machine Learning and Data Mining Probabilistic Classification Fall 2017 Admin Assignment 0 is due tonight: you should be almost done. 1 late day to hand it in Monday, 2 late days for Wednesday.
More informationMachine Learning with Python
DEVNET-2163 Machine Learning with Python Dmitry Figol, SE WW Enterprise Sales @dmfigol Cisco Spark How Questions? Use Cisco Spark to communicate with the speaker after the session 1. Find this session
More informationLearning Objectives for Data Concept and Visualization
Learning Objectives for Data Concept and Visualization Assignment 1: Data Quality Concept and Impact of Data Quality Summarize concepts of data quality. Understand and describe the impact of data on actuarial
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationLecture #11: The Perceptron
Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be
More informationContents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation
Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Learning 4 Supervised Learning 4 Unsupervised Learning 4
More informationLarge Scale Data Analysis Using Deep Learning
Large Scale Data Analysis Using Deep Learning Machine Learning Basics - 1 U Kang Seoul National University U Kang 1 In This Lecture Overview of Machine Learning Capacity, overfitting, and underfitting
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationData Science Training
Data Science Training R, Predictive Modeling, Machine Learning, Python, Bigdata & Spark 9886760678 Introduction: This is a comprehensive course which builds on the knowledge and experience a business analyst
More informationAnalyzing Big Data with Microsoft R
Analyzing Big Data with Microsoft R 20773; 3 days, Instructor-led Course Description The main purpose of the course is to give students the ability to use Microsoft R Server to create and run an analysis
More informationDSC 201: Data Analysis & Visualization
DSC 201: Data Analysis & Visualization Exploratory Data Analysis Dr. David Koop What is Exploratory Data Analysis? "Detective work" to summarize and explore datasets Includes: - Data acquisition and input
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More informationPredictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA
Predictive Analytics: Demystifying Current and Emerging Methodologies Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,
More informationData Analyst Nanodegree Syllabus
Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working
More informationContents. I Introduction to Data Science 1. Preface. List of Tables. List of Figures
Contents Preface List of Tables List of Figures xv xix xxi I Introduction to Data Science 1 1 Prologue: Why data science? 3 1.1 What is data science?.............................. 4 1.2 Case study: The
More informationA Brief Introduction to Data Mining
A Brief Introduction to Data Mining L. Torgo ltorgo@dcc.fc.up.pt Departamento de Ciência de Computadores Faculdade de Ciências / Universidade do Porto Feb, 2017 What is Data Mining? Introduction A possible
More informationClustering algorithms and autoencoders for anomaly detection
Clustering algorithms and autoencoders for anomaly detection Alessia Saggio Lunch Seminars and Journal Clubs Université catholique de Louvain, Belgium 3rd March 2017 a Outline Introduction Clustering algorithms
More informationWeighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract
Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD Abstract There are two common main approaches to ML recommender systems, feedback-based systems and content-based systems.
More informationOverview. Audience profile. At course completion. Course Outline. : 20773A: Analyzing Big Data with Microsoft R. Course Outline :: 20773A::
Module Title Duration : 20773A: Analyzing Big Data with Microsoft R : 3 days Overview The main purpose of the course is to give students the ability to use Microsoft R Server to create and run an analysis
More informationADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA
INSIGHTS@SAS: ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA AGENDA 09.00 09.15 Intro 09.15 10.30 Analytics using SAS Enterprise Guide Ellen Lokollo 10.45 12.00 Advanced Analytics using SAS
More informationTable Of Contents: xix Foreword to Second Edition
Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data
More informationIntroducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone
Introducing Microsoft SQL Server 2016 R Services Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone SQL Server 2016: Everything built-in built-in built-in built-in built-in built-in $2,230
More informationPractical Machine Learning Agenda
Practical Machine Learning Agenda Starting From Log Management Moving To Machine Learning PunchPlatform team Thales Challenges Thanks 1 Starting From Log Management 2 Starting From Log Management Data
More informationAssignment 5: Collaborative Filtering
Assignment 5: Collaborative Filtering Arash Vahdat Fall 2015 Readings You are highly recommended to check the following readings before/while doing this assignment: Slope One Algorithm: https://en.wikipedia.org/wiki/slope_one.
More informationPreface to the Second Edition. Preface to the First Edition. 1 Introduction 1
Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationMulti-label classification using rule-based classifier systems
Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar
More informationSAS (Statistical Analysis Software/System)
SAS (Statistical Analysis Software/System) SAS Adv. Analytics or Predictive Modelling:- Class Room: Training Fee & Duration : 30K & 3 Months Online Training Fee & Duration : 33K & 3 Months Learning SAS:
More informationData Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha
Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationContents. Foreword to Second Edition. Acknowledgments About the Authors
Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1
More informationData Analytics Training Program
Data Analytics Training Program In exclusive association with 1200+ Trainings 20,000+ Participants 10,000+ Brands 45+ Countries [Since 2009] Training partner for Who Is This Course For? Programers Willing
More informationSQL Server Machine Learning Marek Chmel & Vladimir Muzny
SQL Server Machine Learning Marek Chmel & Vladimir Muzny @VladimirMuzny & @MarekChmel MCTs, MVPs, MCSEs Data Enthusiasts! vladimir@datascienceteam.cz marek@datascienceteam.cz Session Agenda Machine learning
More informationData Mining. Vera Goebel. Department of Informatics, University of Oslo
Data Mining Vera Goebel Department of Informatics, University of Oslo 2012 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD
More informationClustering Documents. Document Retrieval. Case Study 2: Document Retrieval
Case Study 2: Document Retrieval Clustering Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April, 2017 Sham Kakade 2017 1 Document Retrieval n Goal: Retrieve
More informationCase-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.
CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance
More informationCS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley 1 1 Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance
More informationData Mining: Models and Methods
Data Mining: Models and Methods Author, Kirill Goltsman A White Paper July 2017 --------------------------------------------------- www.datascience.foundation Copyright 2016-2017 What is Data Mining? Data
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationAntrix Academy of Data Science TM
TM Preparing for MOST Promising Career Opportunities in Data Analytics... Excel Tableau SAS Excel & SQL IBM SPSS Business Analytics COURSES # Duration* 1 Excel Proficiency 5 Hrs 2 Data Analytics with SAS
More informationSlice Intelligence!
Intern @ Slice Intelligence! Wei1an(Wu( September(8,(2014( Outline!! Details about the job!! Skills required and learned!! My thoughts regarding the internship! About the company!! Slice, which we call
More informationData analysis case study using R for readily available data set using any one machine learning Algorithm
Assignment-4 Data analysis case study using R for readily available data set using any one machine learning Algorithm Broadly, there are 3 types of Machine Learning Algorithms.. 1. Supervised Learning
More informationAnalytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.
Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied
More informationClustering Documents. Case Study 2: Document Retrieval
Case Study 2: Document Retrieval Clustering Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 21 th, 2015 Sham Kakade 2016 1 Document Retrieval Goal: Retrieve
More informationOutrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS
Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Topics AGENDA Challenges with Big Data Analytics How SAS can help you to minimize time to value with
More informationLecture 9: Support Vector Machines
Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and
More informationSAS High-Performance Analytics Products
Fact Sheet What do SAS High-Performance Analytics products do? With high-performance analytics products from SAS, you can develop and process models that use huge amounts of diverse data. These products
More informationSOFTWARE DEVELOPMENT: DATA SCIENCE
PROFESSIONAL CAREER TRAINING INSTITUTE SOFTWARE DEVELOPMENT: DATA SCIENCE www.pcti.edu/data-science applicant@pcti.edu 832-484-9100 PROGRAM OVERVIEW Prepare for a life changing career as a data scientist
More informationBayesian model ensembling using meta-trained recurrent neural networks
Bayesian model ensembling using meta-trained recurrent neural networks Luca Ambrogioni l.ambrogioni@donders.ru.nl Umut Güçlü u.guclu@donders.ru.nl Yağmur Güçlütürk y.gucluturk@donders.ru.nl Julia Berezutskaya
More informationIntroducing Categorical Data/Variables (pp )
Notation: Means pencil-and-paper QUIZ Means coding QUIZ Definition: Feature Engineering (FE) = the process of transforming the data to an optimal representation for a given application. Scaling (see Chs.
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationScalable Machine Learning in R. with H2O
Scalable Machine Learning in R with H2O Erin LeDell @ledell DSC July 2016 Introduction Statistician & Machine Learning Scientist at H2O.ai in Mountain View, California, USA Ph.D. in Biostatistics with
More informationThe Definitive Guide to Preparing Your Data for Tableau
The Definitive Guide to Preparing Your Data for Tableau Speed Your Time to Visualization If you re like most data analysts today, creating rich visualizations of your data is a critical step in the analytic
More informationNow, Data Mining Is Within Your Reach
Clementine Desktop Specifications Now, Data Mining Is Within Your Reach Data mining delivers significant, measurable value. By uncovering previously unknown patterns and connections in data, data mining
More informationData Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures
More informationOverview and Practical Application of Machine Learning in Pricing
Overview and Practical Application of Machine Learning in Pricing 2017 CAS Spring Meeting May 23, 2017 Duncan Anderson and Claudine Modlin (Willis Towers Watson) Mark Richards (Allstate Insurance Company)
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More informationClosing Thoughts on Machine Learning (ML in Practice)
Closing Thoughts on (ML in Practice) 1 Closing Thoughts on (ML in Practice) 1 When someone asks What is? Learning is any process by which a system improves performance from experience. - Herbert Simon
More informationPre-Requisites: CS2510. NU Core Designations: AD
DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification
More informationHow App Ratings and Reviews Impact Rank on Google Play and the App Store
APP STORE OPTIMIZATION MASTERCLASS How App Ratings and Reviews Impact Rank on Google Play and the App Store BIG APPS GET BIG RATINGS 13,927 AVERAGE NUMBER OF RATINGS FOR TOP-RATED IOS APPS 196,833 AVERAGE
More information90 Hours for online Live Training
Online Live Course Course Name Data Science Course Objective 1. To make the learner identify potential zones of uses of Data Science. 2. Providing experience of working with real time applications of Data
More informationBig Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1
Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationDivide & Recombine with Tessera: Analyzing Larger and More Complex Data. tessera.io
1 Divide & Recombine with Tessera: Analyzing Larger and More Complex Data tessera.io The D&R Framework Computationally, this is a very simple. 2 Division a division method specified by the analyst divides
More informationGUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV
GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand
More informationMachine Learning in Python. Rohith Mohan GradQuant Spring 2018
Machine Learning in Python Rohith Mohan GradQuant Spring 2018 What is Machine Learning? https://twitter.com/myusuf3/status/995425049170489344 Traditional Programming Data Computer Program Output Getting
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationIntroducing Oracle R Enterprise 1.4 -
Hello, and welcome to this online, self-paced lesson entitled Introducing Oracle R Enterprise. This session is part of an eight-lesson tutorial series on Oracle R Enterprise. My name is Brian Pottle. I
More informationLearner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display
CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &
More informationMachine Learning Reliability Techniques for Composite Materials in Structural Applications.
Machine Learning Reliability Techniques for Composite Materials in Structural Applications. Roberto d Ippolito, Keiichi Ito, Silvia Poles, Arnaud Froidmont Noesis Solutions Optimus by Noesis Solutions
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationClean Architecture Patterns, Practices, and #sddconf
Clean Architecture Patterns, Practices, and Principles @matthewrenze #sddconf About Me Independent consultant Education B.S. in Computer Science (ISU) B.A. in Philosophy (ISU) Community Public Speaker
More informationData Science Tutorial
Eliezer Kanal Technical Manager, CERT Daniel DeCapria Data Scientist, ETC Software Engineering Institute Carnegie Mellon University Pittsburgh, PA 15213 2017 SEI SEI Data Science in in Cybersecurity Symposium
More informationCPSC 340: Machine Learning and Data Mining. Outlier Detection Fall 2018
CPSC 340: Machine Learning and Data Mining Outlier Detection Fall 2018 Admin Assignment 2 is due Friday. Assignment 1 grades available? Midterm rooms are now booked. October 18 th at 6:30pm (BUCH A102
More informationCSE 573: Artificial Intelligence Autumn 2010
CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine
More informationEntry Name: "INRIA-Perin-MC1" VAST 2013 Challenge Mini-Challenge 1: Box Office VAST
Entry Name: "INRIA-Perin-MC1" VAST 2013 Challenge Mini-Challenge 1: Box Office VAST Team Members: Charles Perin, INRIA, Univ. Paris-Sud, CNRS-LIMSI, charles.perin@inria.fr PRIMARY Student Team: YES Analytic
More information10601 Machine Learning. Model and feature selection
10601 Machine Learning Model and feature selection Model selection issues We have seen some of this before Selecting features (or basis functions) Logistic regression SVMs Selecting parameter value Prior
More information