Hal Varian, Google s Chief Economist The McKinsey Quarterly, Jan 2009
|
|
- Dwight Gibson
- 5 years ago
- Views:
Transcription
1
2
3
4
5 The ability to take data to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it that s going to be a hugely important skill in the next decades, because now we really do have essentially free and ubiquitous data. So the complimentary scarce factor is the ability to understand that data and extract value from it. Hal Varian, Google s Chief Economist The McKinsey Quarterly, Jan 2009
6
7 Job Postings for Data Scientists
8 Top-paying Tech Skills Skill 2016 Change Skill 2016 Change Source: Dice Salary Survey 2017
9 SQL Excel Python R MySQL Python tools ggplot SQL Server Tableau JavaScript Matplotlib Java PostgreSQL Oracle D3 Homegrown Hive Spark Cloudera Visual Basic MongoDB Hadoop SAS C++ PowerPivot Scala SQLite C Pig RedShift Weka Hbase (EMR) Perl SPSS Teradata Share of Respondents 70% 60% Data Science Tools 50% 40% 30% 20% 10% 0% Tool: language, platform, analytics Source: O Reilly 2015 Data Science Salary Survey
10
11
12
13 Overview Introduction to R Working with Data Descriptive Statistics Data Visualization Beyond R and EDA
14 Introduction to R
15 What is R? Open source Language and environment Numerical and graphical analysis Cross platform
16 What is R? Active development Large user community Modular and extensible extensions and best of all
17 FREE
18 FREE
19 Source:
20
21
22
23 Code Demo
24 Working with Data
25 Working with Data
26 Working with Data
27 Working with Data
28 Working with Data
29 Working with Data
30 Working with Data Data munging Data wrangling Data cleaning Data cleansing
31 Loading Data in R
32 Loading Data in R CSV
33 Loading Data in R CSV XML
34 Loading Data in R CSV XML
35 Loading Data in R CSV XML
36 Cleaning Data
37 Cleaning Data Reshape data
38 Cleaning Data Reshape data Rename columns
39 Cleaning Data Reshape data Rename columns Convert data types
40 Cleaning Data Reshape data Rename columns Convert data types Ensure proper encoding
41 Cleaning Data Reshape data Rename columns Convert data types Ensure proper encoding Ensure internal consistency
42 Cleaning Data Reshape data Rename columns Convert data types Ensure proper encoding Ensure internal consistency Handle errors and outliers
43 Cleaning Data Reshape data Rename columns Convert data types Ensure proper encoding Ensure internal consistency Handle errors and outliers Handle missing values
44 Transforming Data
45 Transforming Data Select columns
46 Transforming Data Select columns Select rows
47 Transforming Data Select columns Select rows Group rows
48 Transforming Data Select columns Select rows Group rows Order rows
49 Transforming Data Select columns Select rows Group rows Order rows Merging data sets
50 Exporting Data File-based data Web-based data Databases Statistical data CSV XML
51 Advice for Working with Data Often difficult Time consuming TIP: Record all steps
52 Open Movies Database Title Year Rating Movies Runtime (minutes) Genre Critic Score Box Office The Whole Nine Yards 2000 R 98 Comedy 45% $57.3M Cirque du Soleil 2000 G 39 Family 45% $13.4M Gladiator 2000 R 155 Action 76% $187.3M Dinosaur 2000 PG 82 Family 65% $135.6M Big Momma's House 2000 PG Comedy 30% $0.5M
53
54
55 1. Column with wrong name 2. Rows with missing values 3. Runtime column has units 4. Revenue in multiple scales 5. Wrong file format
56 Code Demo
57
58 Descriptive Statistics
59 Descriptive Statistics Movie Runtime Statistic Value (minutes) Describe data Provides a summary aka: Summary statistics Minimum 38 1 st Quartile 93 Median 101 Mean rd Quartile 113 Maximum 219
60 Statistical Terms ID Date Customer Product Quantity John Pizza John Soda Jill Salad Jill Milk Miko Pizza Miko Soda Sam Pizza Sam Milk 1
61 Statistical Terms Observations ID Date Customer Product Quantity John Pizza John Soda Jill Salad Jill Milk Miko Pizza Miko Soda Sam Pizza Sam Milk 1
62 Statistical Terms ID Date Customer Product Quantity Observations Variables John Pizza John Soda Jill Salad Jill Milk Miko Pizza Miko Soda Sam Pizza Sam Milk 1
63 Statistical Terms ID Date Customer Product Quantity Observations Variables Categorical variables John Pizza John Soda Jill Salad Jill Milk Miko Pizza Miko Soda Sam Pizza Sam Milk 1
64 Statistical Terms ID Date Customer Product Quantity Observations Variables Categorical variables Numeric variables John Pizza John Soda Jill Salad Jill Milk Miko Pizza Miko Soda Sam Pizza Sam Milk 1
65 Number of Variables Types of Analysis One Categorical Variable One Numeric Variable Two Categorical Variables Categorical & Numeric Variable Two Numeric Variables Many Variables Type of Variable(s)
66 Number of Variables Analyzing One Categorical Variable One Categorical Variable One Numeric Variable Two Categorical Variables Categorical & Numeric Variable Two Numeric Variables Many Variables Type of Variable(s)
67 Analyzing One Categorical Variable Frequency Movies by Genre Genre Frequency Percentage Action 612 9% Adventure 496 7% Animation 168 2% Comedy % Drama % Horror 269 4%
68 Analyzing One Categorical Variable Movies by Genre Genre Frequency Percentage Action 612 9% Frequency Proportion Adventure 496 7% Animation 168 2% Comedy % Drama % Horror 269 4%
69 Number of Variables Analyzing One Numeric Variable One Categorical Variable One Numeric Variable Two Categorical Variables Categorical & Numeric Variable Two Numeric Variables Many Variables Type of Variable(s)
70 Analyzing One Numeric Variable Central tendency Dispersion Shape
71 Number of Variables Analyzing Two Categorical Variables One Categorical Variable One Numeric Variable Two Categorical Variables Categorical & Numeric Variable Two Numeric Variables Many Variables Type of Variable(s)
72 Analyzing Two Categorical Variables Joint frequency Movies by Genre and Rating Genre G PG PG-13 R Total Action Adventure Animation Comedy Drama Family Total
73 Analyzing Two Categorical Variables Movies by Genre and Rating Genre G PG PG-13 R Total Joint frequency Contingency table Action Adventure Animation Comedy Drama Family Total
74 Analyzing Two Categorical Variables Movies by Genre and Rating Genre G PG PG-13 R Total Joint frequency Contingency table Marginal frequency Action Adventure Animation Comedy Drama Family Total
75 Analyzing Two Categorical Variables Movies by Genre and Rating Genre G PG PG-13 R Total Joint frequency Contingency table Marginal frequency Relative frequency Action Adventure Animation Comedy Drama Family Total
76 Number of Variables Analyzing Two Numeric Variables One Categorical Variable One Numeric Variable Two Categorical Variables Categorical & Numeric Variable Two Numeric Variables Many Variables Type of Variable(s)
77 Analyzing Two Numeric Variables Explanatory vs. outcome Covariance Correlation
78 Number of Variables Analyzing a Numeric Variable Grouped by a Categorical Variable One Categorical Variable One Numeric Variable Two Categorical Variables Categorical & Numeric Variable Two Numeric Variables Many Variables Type of Variable(s)
79 Analyzing a Numeric Variable Grouped by a Categorical Variable One categorical variable One numeric variable Aggregate measures
80 Number of Variables Analyzing Many Variables One Categorical Variable One Numeric Variable Two Categorical Variables Categorical & Numeric Variable Two Numeric Variables Many Variables Type of Variable(s)
81
82 Cowboys & Space Invaders: The Musical Extended Edition
83 Code Demo
84 Cowboys & Space Invaders: The Musical Extended Edition
85 Data Visualization
86 Data Visualization Visual data representation
87 Data Visualization Visual data representation Human pattern recognition
88 Data Visualization Visual data representation Human pattern recognition Map dimensions to visual
89 Data Visualization ID Date Customer Product Quantity John Pizza John Soda Jill Salad Jill Milk Miko Pizza Miko Soda Sam Pizza Sam Milk 1
90 Data Visualization ID Date Customer Product Quantity John Pizza John Soda Jill Salad Jill Milk Miko Pizza Miko Soda Sam Pizza Sam Milk 1
91 Number of Variables Types of Analysis One Categorical Variable One Numeric Variable Two Categorical Variables Categorical & Numeric Variable Two Numeric Variables Many Variables Type of Variable(s)
92 Cowboys & Space Invaders: The Musical Extended Edition
93 Code Demo
94 The Warlord of the Rings : An Unexpected Adventure Feature Length PG
95 Beyond R and EDA
96 This is just the tip of the iceberg! This is just the tip of the iceberg!
97 Advanced Data Analysis with R Cluster Analysis Statistical Modeling Dimensionality Reduction Analysis of Variance (ANOVA)
98 Source: Nathan Yau (
99 Machine Learning with R
100
101 Photos by Radomił Binek, Danielle Langlois, and Frank Mayfield
102 Code Demo
103
104 Where to Go Next R website: RStudio: Revolutions: Flowing Data: R-Blogger: R-Seek:
105 Data Science with R Exploratory Data Analysis with R Data Visualization with R (3-part) Data Science: The Big Picture
106
107 Feedback Very important to me! One thing you liked? One thing I could improve?
108 Conclusion
109 Conclusion Introduction to R Working with Data Descriptive statistics Data visualization Beyond R & EDA
110 Thank You! Matthew Renze Data Science Consultant Renze Consulting Website:
Practical Data Science with #DevoxxMA
Practical Data Science with R @matthewrenze #DevoxxMA The ability to take data to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it that s going
More informationPractical Data Science with R. Matthew Renze CFT Training for AIG
Practical Data Science with R Matthew Renze CFT Training for AIG Motivation The ability to take data to be able to understand it, to process it, to extract value from it, to visualize it, to communicate
More informationExploratory Data Analysis with R. Matthew Renze Iowa Code Camp Fall 2013
Exploratory Data Analysis with R Matthew Renze Iowa Code Camp Fall 2013 Motivation The ability to take data to be able to understand it, to process it, to extract value from it, to visualize it, to communicate
More informationACHIEVEMENTS FROM TRAINING
LEARN WELL TECHNOCRAFT DATA SCIENCE/ MACHINE LEARNING SYLLABUS 8TH YEAR OF ACCOMPLISHMENTS AUTHORIZED GLOBAL CERTIFICATION CENTER FOR MICROSOFT, ORACLE, IBM, AWS AND MANY MORE. 8411002339/7709292162 WWW.DW-LEARNWELL.COM
More informationDo we really understand SQL?
Do we really understand SQL? Leonid Libkin University of Edinburgh Joint work with Paolo Guagliardo, also from Edinburgh Basic questions We are taught that the core of SQL is essentially syntax for relational
More informationDATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:
DATA SCIENCE About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst/Analytics Manager/Actuarial Scientist/Business
More informationDSC 201: Data Analysis & Visualization
DSC 201: Data Analysis & Visualization Exploratory Data Analysis Dr. David Koop What is Exploratory Data Analysis? "Detective work" to summarize and explore datasets Includes: - Data acquisition and input
More informationData Science Training
Data Science Training R, Predictive Modeling, Machine Learning, Python, Bigdata & Spark 9886760678 Introduction: This is a comprehensive course which builds on the knowledge and experience a business analyst
More informationIan Choy. Technology Solutions Professional
Ian Choy Technology Solutions Professional XML KPIs SQL Server 2000 Management Studio Mirroring SQL Server 2005 Compression Policy-Based Mgmt Programmability SQL Server 2008 PowerPivot SharePoint Integration
More informationCERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)
CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) The Certificate in Software Development Life Cycle in BIGDATA, Business Intelligence and Tableau program
More informationDealing with Data Especially Big Data
Dealing with Data Especially Big Data INFO-GB-2346.01 Fall 2017 Professor Norman White nwhite@stern.nyu.edu normwhite@twitter Teaching Assistant: Frenil Sanghavi fps241@stern.nyu.edu Administrative Assistant:
More informationData Analyst Nanodegree Syllabus
Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working
More informationBig Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours
Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals
More informationPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.org Pete Skomoroch Research Scientist at LinkedIn Consultant at Data Wrangling @peteskomoroch 09/29/09 1 Talk Outline TrendingTopics Overview Wikipedia Page
More informationData Analyst Nanodegree Syllabus
Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working
More informationEnd-to-End data mining feature integration, transformation and selection with Datameer Datameer, Inc. All rights reserved.
End-to-End data mining feature integration, transformation and selection with Datameer Fastest time to Insights Rapid Data Integration Zero coding data integration Wizard-led data integration & No ETL
More informationIntroducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone
Introducing Microsoft SQL Server 2016 R Services Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone SQL Server 2016: Everything built-in built-in built-in built-in built-in built-in $2,230
More informationIntroduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)
Introduction to Data Science What is Analytics and Data Science? Overview of Data Science and Analytics Why Analytics is is becoming popular now? Application of Analytics in business Analytics Vs Data
More informationR Language for the SQL Server DBA
R Language for the SQL Server DBA Beginning with R Ing. Eduardo Castro, PhD, Principal Data Analyst Architect, LP Consulting Moderated By: Jose Rolando Guay Paz Thank You microsoft.com idera.com attunity.com
More informationKNIME for the life sciences Cambridge Meetup
KNIME for the life sciences Cambridge Meetup Greg Landrum, Ph.D. KNIME.com AG 12 July 2016 What is KNIME? A bit of motivation: tool blending, data blending, documentation, automation, reproducibility More
More informationHadoop course content
course content COURSE DETAILS 1. In-detail explanation on the concepts of HDFS & MapReduce frameworks 2. What is 2.X Architecture & How to set up Cluster 3. How to write complex MapReduce Programs 4. In-detail
More informationData Analytics Job Guarantee Program
Data Analytics Job Guarantee Program 1. INSTALLATION OF VMWARE 2. MYSQL DATABASE 3. CORE JAVA 1.1 Types of Variable 1.2 Types of Datatype 1.3 Types of Modifiers 1.4 Types of constructors 1.5 Introduction
More informationSalary Guide. Technology Guide For further information please contact Ida Renaud, Manager ,
Salary Guide Technology Guide 2017 For further information please contact Ida Renaud, Manager + 353 1 6146048, ida.renaud@cpl.ie Technology Software Development & Mobile For 2017 there will be a big push
More informationThe Definitive Guide to Preparing Your Data for Tableau
The Definitive Guide to Preparing Your Data for Tableau Speed Your Time to Visualization If you re like most data analysts today, creating rich visualizations of your data is a critical step in the analytic
More informationThink & Work like a Data Scientist with SQL 2016 & R DR. SUBRAMANI PARAMASIVAM (MANI)
Think & Work like a Data Scientist with SQL 2016 & R DR. SUBRAMANI PARAMASIVAM (MANI) About the Speaker Dr. SubraMANI Paramasivam PhD., MCT, MCSE, MCITP, MCP, MCTS, MCSA CEO, Principal Consultant & Trainer
More informationPython With Data Science
Course Overview This course covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases. Who Should Attend Data Scientists, Software Developers,
More informationSpotfire Advanced Data Services. Lunch & Learn Tuesday, 21 November 2017
Spotfire Advanced Data Services Lunch & Learn Tuesday, 21 November 2017 CONFIDENTIALITY The following information is confidential information of TIBCO Software Inc. Use, duplication, transmission, or republication
More informationBig Data Analytics using Apache Hadoop and Spark with Scala
Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important
More informationBIG DATA COURSE CONTENT
BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data
More informationOracle Big Data Discovery
Oracle Big Data Discovery Turning Data into Business Value Harald Erb Oracle Business Analytics & Big Data 1 Safe Harbor Statement The following is intended to outline our general product direction. It
More informationScaling Up HBase. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. CSE6242 / CX4242: Data & Visual Analytics
http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Scaling Up HBase Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech Partly based on materials
More informationCloud Computing & Visualization
Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationCitizen Data Scientist is the new Data Analyst
Welcome # T C 1 8 Citizen Data Scientist is the new Data Analyst Mehmet Vanli Sales Consultant Tableau Australia Citizen data scientist: A person who creates models that use advanced diagnostic analytics
More informationData Science with PostgreSQL
Balázs Bárány Data Scientist pgconf.de 2015 Contents Introduction What is Data Science? Process model Tools and methods of Data Scientists Business & data understanding Preprocessing Modeling Evaluation
More informationData Science and Open Source Software. Iraklis Varlamis Assistant Professor Harokopio University of Athens
Data Science and Open Source Software Iraklis Varlamis Assistant Professor Harokopio University of Athens varlamis@hua.gr What is data science? 2 Why data science is important? More data (volume, variety,...)
More informationSpecialist ICT Learning
Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.
More informationDepartment of Information Technology, St. Joseph s College (Autonomous), Trichy, TamilNadu, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 5 ISSN : 2456-3307 A Survey on Big Data and Hadoop Ecosystem Components
More informationThe Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou
The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component
More informationData Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures
More informationSTAT 1291: Data Science
STAT 1291: Data Science Lecture 20 - Summary Sungkyu Jung Semester recap data visualization data wrangling professional ethics statistical foundation Statistical modeling: Regression Cause and effect:
More informationAnalyzing Big Data with Microsoft R
Analyzing Big Data with Microsoft R 20773; 3 days, Instructor-led Course Description The main purpose of the course is to give students the ability to use Microsoft R Server to create and run an analysis
More informationApache Kylin. OLAP on Hadoop
Apache Kylin OLAP on Hadoop Agenda What s Apache Kylin? Tech Highlights Performance Roadmap Q & A http://kylin.io What s Kylin kylin / ˈkiːˈlɪn / 麒麟 --n. (in Chinese art) a mythical animal of composite
More informationCOPYRIGHT DATASHEET
Your Path to Enterprise AI To succeed in the world s rapidly evolving ecosystem, companies (no matter what their industry or size) must use data to continuously develop more innovative operations, processes,
More informationSQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism
Big Data and Hadoop with Azure HDInsight Andrew Brust Senior Director, Technical Product Marketing and Evangelism Datameer Level: Intermediate Meet Andrew Senior Director, Technical Product Marketing and
More informationData Lake Based Systems that Work
Data Lake Based Systems that Work There are many article and blogs about what works and what does not work when trying to build out a data lake and reporting system. At DesignMind, we have developed a
More informationHadoop. Introduction / Overview
Hadoop Introduction / Overview Preface We will use these PowerPoint slides to guide us through our topic. Expect 15 minute segments of lecture Expect 1-4 hour lab segments Expect minimal pretty pictures
More informationData Foundations. Topic Objectives. and list subcategories of each. its properties. before producing a visualization. subsetting
CS 725/825 Information Visualization Fall 2013 Data Foundations Dr. Michele C. Weigle http://www.cs.odu.edu/~mweigle/cs725-f13/ Topic Objectives! Distinguish between ordinal and nominal values and list
More informationData Visualization Tools & Techniques
Data Visualization Tools & Techniques John Brosz Research Data & Visualization Coordinator Renée Reaume Digital Media & Technical Services Director http://bit.ly/tfdlvis Plan for Today 9:00 Intro 9:30
More informationOverview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::
Module Title Duration : Cloudera Data Analyst Training : 4 days Overview Take your knowledge to the next level Cloudera University s four-day data analyst training course will teach you to apply traditional
More informationDr. SubraMANI Paramasivam. Think & Work like a Data Scientist with SQL 2016 & R
Dr. SubraMANI Paramasivam Think & Work like a Data Scientist with SQL 2016 & R About the Speaker Group Leader Dr. SubraMANI Paramasivam PhD., MVP, MCT, MCSE (x2), MCITP (x2), MCP, MCTS (x3), MCSA CEO,
More informationOverview. Audience profile. At course completion. Course Outline. : 20773A: Analyzing Big Data with Microsoft R. Course Outline :: 20773A::
Module Title Duration : 20773A: Analyzing Big Data with Microsoft R : 3 days Overview The main purpose of the course is to give students the ability to use Microsoft R Server to create and run an analysis
More informationPython & Spark PTT18/19
Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes Härtel Msc. Marcel Heinz The Big Picture [Aggarwal15] Plenty of Building Blocks are involved in this Big Picture Back to the Big Picture [Aggarwal15]
More informationProbability and Statistics. Copyright Cengage Learning. All rights reserved.
Probability and Statistics Copyright Cengage Learning. All rights reserved. 14.5 Descriptive Statistics (Numerical) Copyright Cengage Learning. All rights reserved. Objectives Measures of Central Tendency:
More informationBig Data Architect.
Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional
More informationData analysis using Microsoft Excel
Introduction to Statistics Statistics may be defined as the science of collection, organization presentation analysis and interpretation of numerical data from the logical analysis. 1.Collection of Data
More informationThe TIBCO Insight Platform 1. Data on Fire 2. Data to Action. Michael O Connell Catalina Herrera Peter Shaw September 7, 2016
The TIBCO Insight Platform 1. Data on Fire 2. Data to Action Michael O Connell Catalina Herrera Peter Shaw September 7, 2016 Analytics Journey with TIBCO Source: Gartner (May 2015) The TIBCO Insight Platform:
More informationBig Data Hadoop Stack
Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware
More informationData Science. Data Analyst. Data Scientist. Data Architect
Data Science Data Analyst Data Analysis in Excel Programming in R Introduction to Python/SQL/Tableau Data Visualization in R / Tableau Exploratory Data Analysis Data Scientist Inferential Statistics &
More informationHadoop: The Definitive Guide PDF
Hadoop: The Definitive Guide PDF Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, youâ ll learn how to build and maintain reliable, scalable, distributed
More informationAntrix Academy of Data Science TM
TM Preparing for MOST Promising Career Opportunities in Data Analytics... Excel Tableau SAS Excel & SQL IBM SPSS Business Analytics COURSES # Duration* 1 Excel Proficiency 5 Hrs 2 Data Analytics with SAS
More informationBlended Learning Outline: Cloudera Data Analyst Training (171219a)
Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills
More informationHDInsight > Hadoop. October 12, 2017
HDInsight > Hadoop October 12, 2017 2 Introduction Mark Hudson >20 years mixing technology with data >10 years with CapTech Microsoft Certified IT Professional Business Intelligence Member of the Richmond
More informationWebinar Series TMIP VISION
Webinar Series TMIP VISION TMIP provides technical support and promotes knowledge and information exchange in the transportation planning and modeling community. Today s Goals To Consider: Parallel Processing
More informationIvy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V)
Ivy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V) Based on Industry Cases, Live Exercises, & Industry Executed Projects Module (I) Analytics Essentials 81 hrs 1. Statistics
More informationCloudExpo November 2017 Tomer Levi
CloudExpo November 2017 Tomer Levi About me Full Stack Engineer @ Intel s Advanced Analytics group. Artificial Intelligence unit at Intel. Responsible for (1) Radical improvement of critical processes
More informationNew Approaches to Big Data Processing and Analytics
New Approaches to Big Data Processing and Analytics Contributing authors: David Floyer, David Vellante Original publication date: February 12, 2013 There are number of approaches to processing and analyzing
More informationRead & Download (PDF Kindle) Pro Apache Hadoop
Read & Download (PDF Kindle) Pro Apache Hadoop Pro Apache Hadoop, Second Edition brings you up to speed on Hadoop â the framework of big data. Revised to cover Hadoop 2.0, the book covers the very latest
More informationData Architectures in Azure for Analytics & Big Data
Data Architectures in for Analytics & Big Data October 20, 2018 Melissa Coates Solution Architect, BlueGranite Microsoft Data Platform MVP Blog: www.sqlchick.com Twitter: @sqlchick Data Architecture A
More informationDATA 301 Introduction to Data Analytics Visualization. Dr. Ramon Lawrence University of British Columbia Okanagan
DATA 301 Introduction to Data Analytics Visualization Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca DATA 301: Data Analytics (2) Why learn Visualization? Visualization
More informationSpotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data
Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA BIG DATA: A REVOLUTION IN ACCESS Large-scale data sets are nothing
More informationAn Introduction to Big Data Formats
Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION
More informationExtract API: Build sophisticated data models with the Extract API
Welcome # T C 1 8 Extract API: Build sophisticated data models with the Extract API Justin Craycraft Senior Sales Consultant Tableau / Customer Consulting My Office Photo Used with permission Agenda 1)
More informationAsanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks
Asanka Padmakumara ETL 2.0: Data Engineering with Azure Databricks Who am I? Asanka Padmakumara Business Intelligence Consultant, More than 8 years in BI and Data Warehousing A regular speaker in data
More informationThis is a brief tutorial that explains how to make use of Sqoop in Hadoop ecosystem.
About the Tutorial Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL, Oracle to Hadoop HDFS, and
More informationHow App Ratings and Reviews Impact Rank on Google Play and the App Store
APP STORE OPTIMIZATION MASTERCLASS How App Ratings and Reviews Impact Rank on Google Play and the App Store BIG APPS GET BIG RATINGS 13,927 AVERAGE NUMBER OF RATINGS FOR TOP-RATED IOS APPS 196,833 AVERAGE
More informationImporting and Exporting Data Between Hadoop and MySQL
Importing and Exporting Data Between Hadoop and MySQL + 1 About me Sarah Sproehnle Former MySQL instructor Joined Cloudera in March 2010 sarah@cloudera.com 2 What is Hadoop? An open-source framework for
More informationSlice Intelligence!
Intern @ Slice Intelligence! Wei1an(Wu( September(8,(2014( Outline!! Details about the job!! Skills required and learned!! My thoughts regarding the internship! About the company!! Slice, which we call
More informationInterview Questions And Answers For Experienced Candidates In Php Mysql
Interview Questions And Answers For Experienced Candidates In Php Mysql We have selected PHP Technology Questions and Answers, PHP Interview Questions and their Solution and PHP Tutorial for all levels
More informationShen PingCAP 2017
Shen Li @ PingCAP About me Shen Li ( 申砾 ) Tech Lead of TiDB, VP of Engineering Netease / 360 / PingCAP Infrastructure software engineer WHY DO WE NEED A NEW DATABASE? Brief History Standalone RDBMS NoSQL
More informationA REVIEW PAPER ON BIG DATA ANALYTICS
A REVIEW PAPER ON BIG DATA ANALYTICS Kirti Bhatia 1, Lalit 2 1 HOD, Department of Computer Science, SKITM Bahadurgarh Haryana, India bhatia.kirti.it@gmail.com 2 M Tech 4th sem SKITM Bahadurgarh, Haryana,
More informationData in the Cloud and Analytics in the Lake
Data in the Cloud and Analytics in the Lake Introduction Working in Analytics for over 5 years Part the digital team at BNZ for 3 years Based in the Auckland office Preferred Languages SQL Python (PySpark)
More informationDelving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture
Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Hadoop 1.0 Architecture Introduction to Hadoop & Big Data Hadoop Evolution Hadoop Architecture Networking Concepts Use cases
More informationTalent Trends Research Division
Talent Trends Research Division North Carolina IT Jobs Research & Analysis Requestors: Dara West, Veterans Program Manager Michael Veysey, Director Corporate Affairs William Blackstorm, Talent Trends Manager
More informationVocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.
5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table
More informationUsing Data Science to deliver Workforce & Labour Market Insights. Gary Gan Co-Founder, JobKred
Using Data Science to deliver Workforce & Labour Market Insights Gary Gan Co-Founder, JobKred Collection of Data Online Sources Skills, Education, Experience AI-powered Career Development Platform Cloud-based
More informationData Science Bootcamp Curriculum. NYC Data Science Academy
Data Science Bootcamp Curriculum NYC Data Science Academy 100+ hours free, self-paced online course. Access to part-time in-person courses hosted at NYC campus Machine Learning with R and Python Foundations
More informationAzure Data Factory VS. SSIS. Reza Rad, Consultant, RADACAD
Azure Data Factory VS. SSIS Reza Rad, Consultant, RADACAD 2 Please silence cell phones Explore Everything PASS Has to Offer FREE ONLINE WEBINAR EVENTS FREE 1-DAY LOCAL TRAINING EVENTS VOLUNTEERING OPPORTUNITIES
More informationAnalytics & Sport Data
Analytics & Sport Data Could Neymar s Injury be Prevented? SAS Data Preparation https://www.sas.com/en_gb/events/2017/ebooster-sas-partners.html AGENDA Player Performance Monitoring SAS Data Preparation
More informationEntry Name: "INRIA-Perin-MC1" VAST 2013 Challenge Mini-Challenge 1: Box Office VAST
Entry Name: "INRIA-Perin-MC1" VAST 2013 Challenge Mini-Challenge 1: Box Office VAST Team Members: Charles Perin, INRIA, Univ. Paris-Sud, CNRS-LIMSI, charles.perin@inria.fr PRIMARY Student Team: YES Analytic
More informationData Wrangling in the Tidyverse
Data Wrangling in the Tidyverse 21 st Century R DS Portugal Meetup, at Farfetch, Porto, Portugal April 19, 2017 Jim Porzak Data Science for Customer Insights 4/27/2017 1 Outline 1. A very quick introduction
More informationThings Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam
Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Who am I? Zohar Elkayam, CTO at Brillix Programmer, DBA, team leader, database trainer,
More informationPrincipal Software Engineer Red Hat Emerging Technology June 24, 2015
USING APACHE SPARK FOR ANALYTICS IN THE CLOUD William C. Benton Principal Software Engineer Red Hat Emerging Technology June 24, 2015 ABOUT ME Distributed systems and data science in Red Hat's Emerging
More informationmicrosoft
70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series
More informationOracle Big Data Connectors
Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process
More informationOverview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::
Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized
More informationGETTING STARTED WITH DATA MINING
GETTING STARTED WITH DATA MINING Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIR Forum 2017 Washington, D.C. 1 Using Data
More informationIndex. bfs() function, 225 Big data characteristics, 2 variety, 3 velocity, 3 veracity, 3 volume, 2 Breadth-first search algorithm, 220, 225
Index A Anonymous function, 66 Apache Hadoop, 1 Apache HBase, 42 44 Apache Hive, 6 7, 230 Apache Kafka, 8, 178 Apache License, 7 Apache Mahout, 5 Apache Mesos, 38 42 Apache Pig, 7 Apache Spark, 9 Apache
More informationData Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha
Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking
More informationJoe Hummel, PhD. Visiting Researcher: U. of California, Irvine Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago
Joe Hummel, PhD Visiting Researcher: U. of California, Irvine Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago Materials: http://www.joehummel.net/downloads.html Email: joe@joehummel.net
More information