Tomaž Kaštrun. Data Science for Beginners

1 Tomaž Kaštrun Data Science for Beginners

2 To all sponsors, thank you!

3 Thanks to all organizers! GetLatestVersion.it

4 About (2.0.1): BI developer and data analyst; SQL Server, SAS, R, Python, C#, SAP, SPSS; 15 years of experience; MSSQL, DEV, BI, DM; working at GEN-I; frequent community speaker; avid coffee drinker & bicycle junkie

5 This talk was born out of frustration! Why?

6 faking it!

7 Can you answer?
1. Explain what regularization is and why it is useful.
2. Explain what precision and recall are. How do they relate to the ROC curve?
3. What is root cause analysis?
4. Are you familiar with pricing optimization, price elasticity, inventory management, competitive intelligence? Give examples.
5. What is statistical power?
6. Explain what resampling methods are and why they are useful. Also explain their limitations.
7. Is it better to have too many false positives, or too many false negatives? Explain.
8. What is selection bias, why is it important and how can you avoid it?
9. Give an example of how you would use experimental design to answer a question about user behavior.
10. How would you screen for outliers and what should you do if you find one?
11. How would you use either the extreme value theory, Monte Carlo simulations or mathematical statistics (or anything else) to correctly estimate the chance of a very rare event?
12. What is a recommendation engine? How does it work?
13. Explain what a false positive and a false negative are. Why is it important to differentiate these from each other?
Source:

8 Can you answer? Source:

9 Developers and database people Statisticians Data science People from business

10 Developers and database people Statisticians People from business

11 Developers and database people Statisticians a.k.a. Data scientist People from business

12 What is data science? It's a buzzword!

13 Terms over time: statistics, data mining, data science? (timeline figure; the final period is labeled 2010 ~) Examples: - Regression (Stats 101, SSAS, R+MSSQL) - Decision trees (Stats 101, SSAS, R+MSSQL) - Complexity reduction (Stats 101, SSAS, R+MSSQL) - Clustering (Stats 101, SSAS, R+MSSQL)

14 Interfaces over time: statistics, data mining, data science (timeline figure)

15 Who is a data scientist? MacBook, Statistician, San Francisco

16 So who is a data scientist? NOOooo!!!! It's a statistician!!! Source: Internet

17 Because what's next???? The internet killing doctors? It's like searching the internet for your symptoms instead of visiting a doctor!

18 Same goes for data science? Is the data science buzz killing statistics? And statisticians?

19 Why was the data scientist born? Because it was too damn hard to pronounce /ˌstætɪˈstɪʃ(ə)n/ STATISTICIAN

20 and because it's a sexy job. According to a lot of articles published in 2015 and 2016 (thanks to the decline of research journalism and to copy/paste journalists), this was the top-paid job and a highly appreciated and wanted position. Well, good morning! We have had these positions since the '60s. They were called... f@%#! that word... stasti... statsi... something... well, straccatella

21 and think about the movies.

22 at the end everyone would like to talk about. even though they have never seen.

23 at the very end data science is like teenage sex: Everybody talks about it. Nobody really knows how to do it. Everybody thinks everyone else is doing it. So everyone claims they are doing it. In reality this looks like

24 I don't want to conclude, but... Having a copy of your business logic and data jammed into 1,048,576 rows by 16,384 columns. Data Science = point-and-click adventure game in Azure Machine Learning. Copy-paste-from-web advanced skills. Huh... Totally forgot about that?!

25 Let's do a quick example to support this problem: Euclidean distances between friends

26 Let's do a quick example to support this problem: Did he just say Eucl???? distance?

27 Let's do a quick example to support this problem: dist(samplerestaurants, method = "euclidean")

28 Let's do a quick example to support this problem: Simple distance from the center can give us the sensitivity of the ratings

29 Let's do a quick example to support this problem: And the so-called sensitivity?

30 Think about... Data scientists use: dist(samplerestaurants, method = "euclidean") Statisticians (and developers) use: scale(samplerestaurants, center = TRUE, scale = TRUE)
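
The dist() and scale() calls on these slides are R functions: dist() computes pairwise distances between rows, and scale() centers and rescales columns. The following is a minimal Python sketch of the same two ideas, not the speaker's code. The ratings matrix is made-up illustration data, and the helper names (euclidean, pairwise_distances, standardize) are hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical ratings (rows = friends, columns = restaurants); illustration data only.
ratings = [
    [5.0, 3.0, 4.0],
    [4.0, 1.0, 5.0],
    [1.0, 5.0, 2.0],
]

def euclidean(a, b):
    """Euclidean distance between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_distances(rows):
    """Symmetric matrix of distances between every pair of rows."""
    return [[euclidean(r, s) for s in rows] for r in rows]

def standardize(rows):
    """Center each column on 0 and scale it to sample standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]  # assumes no column is constant
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

distances = pairwise_distances(ratings)
scaled = standardize(ratings)
```

Distances answer "how far apart are two friends' tastes", while standardized values show how far each rating sits from the column average. Both are basic statistics, whichever job title runs them.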

31 Let's do a quick example to support this problem: And doing some more goofing around.

32 Let's do a quick example to support this problem: but not sure whether an algorithm that prevents goofy pictures exists

33 How to stop faking it? Start learning it. Understand. Test, test, test.

34 Why you're not a data scientist?!
- Some time series business intelligence stack doesn't make you a data scientist.
- Programming experience with Hadoop, R, Python, Octave, Matlab and Mathematica is experience with data science tools. Tool skills alone don't give you data science cred.
- The 8-week course you took on Coursera or the data science boot camp you attended does not make you a data scientist.
- Evangelizing big data does not make you a data scientist.
- Having a degree in mathematics and statistics without field and applied knowledge does not make you a data scientist.
Source:

35 Agenda for today 1) What we do in Data Science 2) Materials, tools and programs 3) Data science in the business world

36 1) What we do in Data Science (part 1) Querying relational data Analyzing and visualizing the data Tasks for: Developers and Database people

37 1) What we do in Data Science (part 2) Understanding statistics Exploring data with R / Python / Julia Understanding core data science concepts Understanding Machine Learning Programming with R / Python to manipulate and model data Applying the solution Tasks for: statisticians Tasks for: Business people

38 5 Core data science concepts? 1) is this weird? 2) is A better than B, respectively? 3) how much / many of this is needed? 4) does this belong to group A? 5) what is next?

39 Think of... an algorithm as a cooking recipe

40 Think of... your dataset as the ingredients

41 Think of... your pans and pots as a computer

42 Think of... the statistician (data scientist) as a chef

43 Think of... the results as a finished dish

44 Think of... an API as prepared food

45 Is this weird? Spot the intruder. Anomaly detection
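
One simple way to answer "is this weird?" is a z-score screen. The sketch below is an illustration in Python, not a method taken from the talk. The data, the function name find_outliers and the cut-off of 2 standard deviations are all assumptions. A single extreme value also inflates the standard deviation, so robust alternatives may suit real data better.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the (assumed) threshold."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sd > threshold]

# Hypothetical daily order counts with one intruder.
orders = [10, 11, 9, 10, 12, 10, 11, 50]
intruders = find_outliers(orders)
```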

46 Is it Blue or is it Gold? Classification

47 What will be the temperature / stock? Regression

48 Belongs to which group? Clustering

49 What is next? Reinforcement learning algorithms

50 2) Is your data ready for data science? 1) Is it relevant 2) Is data correlated 3) Is data distributed and accurate 4) Do I have enough data (variables, columns) 5) Unwanted correlations (multicollinearity, hyper )
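
Checks 2) and 5) on this slide can be approximated with pairwise Pearson correlations. The following is a minimal Python sketch under stated assumptions: the columns are made-up data, the 0.9 cut-off is arbitrary, and the names pearson and highly_correlated are hypothetical. None of it comes from the talk.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def highly_correlated(columns, threshold=0.9):
    """List column pairs whose absolute correlation exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) > threshold
    ]

# Hypothetical dataset: the second column is just the first one doubled.
columns = {
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "price_with_tax": [2.0, 4.0, 6.0, 8.0, 10.0],
    "visits": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = highly_correlated(columns)
```

A flagged pair suggests that one of the two columns carries little extra information and may be a candidate for removal before modeling.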

51 2) Is your data ready for data science? Is it relevant

52 2) Is your data ready for data science? Is it related (or non-empty)?

53 2) Is your data ready for data science? Is it accurate? Source:

54 2) Is your data ready for data science? Do we have enough? Sampling Number of observations vs. number of variables Type of algorithm

55 3) Ask the right question? 1) Ask SMART 2) Ask in a way that includes the target (predicted) data 3) Formulate the question based on data and algorithm

56 Model data and apply solution 1) Model data 2) Predict data 3) Apply solution
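
The three steps on this slide can be sketched with the simplest possible model, a one-variable least-squares line. This is an illustrative Python sketch, not the speaker's workflow: the weekly sales figures and the names fit_line and predict are hypothetical.

```python
def fit_line(xs, ys):
    """1) Model data: ordinary least squares for y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept

def predict(model, x):
    """2) Predict data: evaluate the fitted line at a new x."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical history: units sold in weeks 1 to 4.
weeks = [1, 2, 3, 4]
units = [3, 5, 7, 9]

model = fit_line(weeks, units)
# 3) Apply solution: use the forecast for week 5, e.g. to plan stock.
forecast_week_5 = predict(model, 5)
```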

57 2) Materials and tools (Part 1) R Consortium Books on statistics, statistical learning and machine learning Microsoft Books Online Microsoft Virtual Academy Udemy, Packt,

58 2) Materials and tools (Part 2) R / Python / Julia / Excel Use Microsoft Azure ML Amazon Web Services EC2 SQL Server BI stack Many vendors: SAS, IBM, Tibco, SAP, Tableau, Pentaho, Qlik, MicroStrategy, Alteryx, etc.

59 3) Data Science in the business world Loyalty program churn analysis Fraud detection Out-of-stock prediction Customer classification Recommendation stuff Customer behaviour

60 Sources: Microsoft MVA Microsoft Data Science program ( Stats

61 R and SQL Server (SQL Server 2017 CTP 2.x)

62 The architecture behind

63 Ecosystem RevoScaleR Package

64 R and T-SQL for predictive analytics

-- Train a naive Bayes classifier in R and return the serialized model.
EXECUTE sp_execute_external_script
    @language = N'R',
    @script = N'
        library(e1071);
        irismodel <- naiveBayes(iris_data[,1:4], iris_data[,5]);
        trained_model <- data.frame(payload = as.raw(serialize(irismodel, connection = NULL)));',
    @input_data_1 = N'select "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" from iris_data',
    @input_data_1_name = N'iris_data',
    @output_data_1_name = N'trained_model'
WITH RESULT SETS ((model VARBINARY(MAX)));

-- Train a linear model with RevoScaleR and return the serialized model.
EXECUTE sp_execute_external_script
    @language = N'R',
    @script = N'
        require("RevoScaleR");
        irislinmod <- rxLinMod(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data = iris_rx_data);
        trained_model <- data.frame(payload = as.raw(serialize(irislinmod, connection = NULL)));',
    @input_data_1 = N'select "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" from iris_rx_data',
    @input_data_1_name = N'iris_rx_data',
    @output_data_1_name = N'trained_model'
WITH RESULT SETS ((model VARBINARY(MAX)));

65 Thanks! and learn Statistics!!

66 #sqlsat675 THANKS! Q&A


More information

The OLX data theory of everything

The OLX data theory of everything The OLX data theory of everything Caspar Schönau Head of Global BI Jakub Orłowski Data engineering manager The biggest internet company that you have never heard of Founded 1915 South-Africa Market cap:

More information

Overview and Practical Application of Machine Learning in Pricing

Overview and Practical Application of Machine Learning in Pricing Overview and Practical Application of Machine Learning in Pricing 2017 CAS Spring Meeting May 23, 2017 Duncan Anderson and Claudine Modlin (Willis Towers Watson) Mark Richards (Allstate Insurance Company)

More information

Data Analytics Training Program

Data Analytics Training Program Data Analytics Training Program In exclusive association with 1200+ Trainings 20,000+ Participants 10,000+ Brands 45+ Countries [Since 2009] Training partner for Who Is This Course For? Programers Willing

More information

Data Warehouse Tutorial For Beginners Sql Server 2008 Book

Data Warehouse Tutorial For Beginners Sql Server 2008 Book Data Warehouse Tutorial For Beginners Sql Server 2008 Book You've read some of the content of well-known Data Warehousing books now what? How do. Implementing a Data Warehouse with Microsoft SQL Server.

More information

What's New in MATLAB for Engineering Data Analytics?

What's New in MATLAB for Engineering Data Analytics? What's New in MATLAB for Engineering Data Analytics? Will Wilson Application Engineer MathWorks, Inc. 2017 The MathWorks, Inc. 1 Agenda Data Types Tall Arrays for Big Data Machine Learning (for Everyone)

More information

Clickbank Domination Presents. A case study by Devin Zander. A look into how absolutely easy internet marketing is. Money Mindset Page 1

Clickbank Domination Presents. A case study by Devin Zander. A look into how absolutely easy internet marketing is. Money Mindset Page 1 Presents A case study by Devin Zander A look into how absolutely easy internet marketing is. Money Mindset Page 1 Hey guys! Quick into I m Devin Zander and today I ve got something everybody loves! Me

More information

CSE 446 Bias-Variance & Naïve Bayes

CSE 446 Bias-Variance & Naïve Bayes CSE 446 Bias-Variance & Naïve Bayes Administrative Homework 1 due next week on Friday Good to finish early Homework 2 is out on Monday Check the course calendar Start early (midterm is right before Homework

More information

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course: DATA SCIENCE About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst/Analytics Manager/Actuarial Scientist/Business

More information

Instructor: Dr. Mehmet Aktaş. Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

Instructor: Dr. Mehmet Aktaş. Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Instructor: Dr. Mehmet Aktaş Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

More information

Play with Python: An intro to Data Science

Play with Python: An intro to Data Science Play with Python: An intro to Data Science Ignacio Larrú Instituto de Empresa Who am I? Passionate about Technology From Iphone apps to algorithmic programming I love innovative technology Former Entrepreneur:

More information

Intro to Stata for Political Scientists

Intro to Stata for Political Scientists Intro to Stata for Political Scientists Andrew S. Rosenberg Junior PRISM Fellow Department of Political Science Workshop Description This is an Introduction to Stata I will assume little/no prior knowledge

More information

BDD and Testing. User requirements and testing are tightly coupled

BDD and Testing. User requirements and testing are tightly coupled BDD and Testing User requirements and testing are tightly coupled 1 New Concept: Acceptance Tests Customer criteria for accepting a milestone Get paid if pass! Black-box tests specified with the customer

More information

Getting Started with Advanced Analytics in Finance, Marketing, and Operations

Getting Started with Advanced Analytics in Finance, Marketing, and Operations Getting Started with Advanced Analytics in Finance, Marketing, and Operations Southwest Regional Oracle Applications User Group Dan Vlamis February 24, 2017 @VlamisSoftware Vlamis Software Solutions Vlamis

More information

Machine Learning in Action

Machine Learning in Action Machine Learning in Action PETER HARRINGTON Ill MANNING Shelter Island brief contents PART l (~tj\ssification...,... 1 1 Machine learning basics 3 2 Classifying with k-nearest Neighbors 18 3 Splitting

More information

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone Introducing Microsoft SQL Server 2016 R Services Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone SQL Server 2016: Everything built-in built-in built-in built-in built-in built-in $2,230

More information

Creating publication-ready Word tables in R

Creating publication-ready Word tables in R Creating publication-ready Word tables in R Sara Weston and Debbie Yee 12/09/2016 Has this happened to you? You re working on a draft of a manuscript with your adviser, and one of her edits is something

More information

Data Mining - Data. Dr. Jean-Michel RICHER Dr. Jean-Michel RICHER Data Mining - Data 1 / 47

Data Mining - Data. Dr. Jean-Michel RICHER Dr. Jean-Michel RICHER Data Mining - Data 1 / 47 Data Mining - Data Dr. Jean-Michel RICHER 2018 jean-michel.richer@univ-angers.fr Dr. Jean-Michel RICHER Data Mining - Data 1 / 47 Outline 1. Introduction 2. Data preprocessing 3. CPA with R 4. Exercise

More information

Introduction to Data Management. Lecture #1 (Course Trailer )

Introduction to Data Management. Lecture #1 (Course Trailer ) Introduction to Data Management Lecture #1 (Course Trailer ) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Topics v Welcome to one

More information

Data and AI LATAM 2018

Data and AI LATAM 2018 Data and AI LATAM 2018 La parte de imagen con el identificador de relación rid5 no se encontró en el archivo. La parte de imagen con el identificador de relación rid5 no se encontró en el archivo. La parte

More information

Overview of Big Data

Overview of Big Data Overview of Big Data Tools and Techniques, Discoveries and Pitfalls Spring 2018 What Does Big Data Mean? (1) Collecting large amounts of data Via computers, sensors, people, events (2) Doing something

More information

Prototyping Data Intensive Apps: TrendingTopics.org

Prototyping Data Intensive Apps: TrendingTopics.org Prototyping Data Intensive Apps: TrendingTopics.org Pete Skomoroch Research Scientist at LinkedIn Consultant at Data Wrangling @peteskomoroch 09/29/09 1 Talk Outline TrendingTopics Overview Wikipedia Page

More information

Data Mining Concepts & Tasks

Data Mining Concepts & Tasks Data Mining Concepts & Tasks Duen Horng (Polo) Chau Georgia Tech CSE6242 / CX4242 Jan 16, 2014 Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos Last Time

More information

Welcome! Power BI User Group (PUG) Copenhagen

Welcome! Power BI User Group (PUG) Copenhagen Welcome! Power BI User Group (PUG) Copenhagen Connect to Data in Power BI Desktop Just Thorning Blindbæk Consultant, Trainer and Speaker Connect to Data in Power BI Desktop Basic introduction to data connectivity

More information

Approaching the Petabyte Analytic Database: What I learned

Approaching the Petabyte Analytic Database: What I learned Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may

More information

The Mathematics Behind Neural Networks

The Mathematics Behind Neural Networks The Mathematics Behind Neural Networks Pattern Recognition and Machine Learning by Christopher M. Bishop Student: Shivam Agrawal Mentor: Nathaniel Monson Courtesy of xkcd.com The Black Box Training the

More information

Modern Data Warehouse The New Approach to Azure BI

Modern Data Warehouse The New Approach to Azure BI Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics

More information