Practical Data Science Cookbook


Practical Data Science Cookbook
89 hands-on recipes to help you complete real-world data science projects in R and Python
Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, Abhijit Dasgupta
PUBLISHING open source community experience distilled
BIRMINGHAM - MUMBAI

Preface

Chapter 1: Preparing Your Data Science Environment
  Introduction
  Understanding the data science pipeline
  Installing R on Windows, Mac OS X, and Linux
  Installing libraries in R and RStudio
  Installing Python on Linux and Mac OS X
  Installing Python on Windows
  Installing the Python data stack on Mac OS X and Linux
  Installing extra Python packages
  Installing and using virtualenv

Chapter 2: Driving Visual Analysis with Automobile Data (R)
  Introduction
  Acquiring automobile fuel efficiency data
  Preparing R for your first project
  Importing automobile fuel efficiency data into R
  Exploring and describing fuel efficiency data
  Analyzing automobile fuel efficiency over time
  Investigating the makes and models of automobiles

Chapter 3: Simulating American Football Data (R)
  Introduction
  Acquiring and cleaning football data
  Analyzing and understanding football data
  Constructing indexes to measure offensive and defensive strength
  Simulating a single game with outcomes decided by calculations
  Simulating multiple games with outcomes decided by calculations

Chapter 4: Modeling Stock Market Data (R)
  Introduction
  Acquiring stock market data
  Summarizing the data
  Cleaning and exploring the data
  Generating relative valuations
  Screening stocks and analyzing historical prices

Chapter 5: Visually Exploring Employment Data (R)
  Introduction
  Preparing for analysis
  Importing employment data into R
  Exploring the employment data
  Obtaining and merging additional data
  Adding geographical information
  Extracting state- and county-level wage and employment information
  Visualizing geographical distributions of pay
  Exploring where the jobs are, by industry
  Animating maps for a geospatial time series
  Benchmarking performance for some common tasks

Chapter 6: Creating Application-oriented Analyses Using Tax Data (Python)
  Introduction
  Preparing for the analysis of top incomes
  Importing and exploring the world's top incomes dataset
  Analyzing and visualizing the top income data of the US
  Furthering the analysis of the top income groups of the US
  Reporting with Jinja2

Chapter 7: Driving Visual Analyses with Automobile Data (Python)
  Introduction
  Getting started with IPython
  Exploring IPython Notebook
  Preparing to analyze automobile fuel efficiencies
  Exploring and describing fuel efficiency data with Python
  Analyzing automobile fuel efficiency over time with Python
  Investigating the makes and models of automobiles with Python

Chapter 8: Working with Social Graphs (Python)
  Introduction
  Preparing to work with social networks in Python
  Importing networks
  Exploring subgraphs within a heroic network
  Finding strong ties
  Finding key players
  Exploring the characteristics of entire networks
  Clustering and community detection in social networks
  Visualizing graphs

Chapter 9: Recommending Movies at Scale (Python)
  Introduction
  Modeling preference expressions
  Understanding the data
  Ingesting the movie review data
  Finding the highest-scoring movies
  Improving the movie-rating system
  Measuring the distance between users in the preference space
  Computing the correlation between users
  Finding the best critic for a user
  Predicting movie ratings for users
  Collaboratively filtering item by item
  Building a nonnegative matrix factorization model
  Loading the entire dataset into the memory
  Dumping the SVD-based model to the disk
  Training the SVD-based model
  Testing the SVD-based model

Chapter 10: Harvesting and Geolocating Twitter Data (Python)
  Introduction
  Creating a Twitter application
  Understanding the Twitter API v1.1
  Determining your Twitter followers and friends
  Pulling Twitter user profiles
  Making requests without running afoul of Twitter's rate limits
  Storing JSON data to the disk
  Setting up MongoDB for storing Twitter data
  Storing user profiles in MongoDB using PyMongo
  Exploring the geographic information available in profiles
  Plotting geospatial data in Python

Chapter 11: Optimizing Numerical Code with NumPy and SciPy (Python)
  Introduction
  Understanding the optimization process
  Identifying common performance bottlenecks in code
  Reading through the code
  Profiling Python code with the Unix time function
  Profiling Python code using built-in Python functions
  Profiling Python code using IPython's %timeit function
  Profiling Python code using line_profiler
  Plucking the low-hanging (optimization) fruit
  Testing the performance benefits of NumPy
  Rewriting simple functions with NumPy
  Optimizing the innermost loop with NumPy

Index



More information

Home of Redis. Redis for Fast Data Ingest

Home of Redis. Redis for Fast Data Ingest Home of Redis Redis for Fast Data Ingest Agenda Fast Data Ingest and its challenges Redis for Fast Data Ingest Pub/Sub List Sorted Sets as a Time Series Database The Demo Scaling with Redis e Flash 2 Fast

More information

CS 224W Final Report Group 37

CS 224W Final Report Group 37 1 Introduction CS 224W Final Report Group 37 Aaron B. Adcock Milinda Lakkam Justin Meyer Much of the current research is being done on social networks, where the cost of an edge is almost nothing; the

More information

Esri and MarkLogic: Location Analytics, Multi-Model Data

Esri and MarkLogic: Location Analytics, Multi-Model Data Esri and MarkLogic: Location Analytics, Multi-Model Data Ben Conklin, Industry Manager, Defense, Intel and National Security, Esri Anthony Roach, Product Manager, MarkLogic James Kerr, Technical Director,

More information

BIG DATA COURSE CONTENT

BIG DATA COURSE CONTENT BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data

More information

Excel Assignment 4: Correlation and Linear Regression (Office 2016 Version)

Excel Assignment 4: Correlation and Linear Regression (Office 2016 Version) Economics 225, Spring 2018, Yang Zhou Excel Assignment 4: Correlation and Linear Regression (Office 2016 Version) 30 Points Total, Submit via ecampus by 8:00 AM on Tuesday, May 1, 2018 Please read all

More information

Cisco Tetration Analytics

Cisco Tetration Analytics Cisco Tetration Analytics Enhanced security and operations with real time analytics John Joo Tetration Business Unit Cisco Systems Security Challenges in Modern Data Centers Securing applications has become

More information

CPSC 67 Lab #5: Clustering Due Thursday, March 19 (8:00 a.m.)

CPSC 67 Lab #5: Clustering Due Thursday, March 19 (8:00 a.m.) CPSC 67 Lab #5: Clustering Due Thursday, March 19 (8:00 a.m.) The goal of this lab is to use hierarchical clustering to group artists together. Once the artists have been clustered, you will calculate

More information

Knowledge Discovery and Data Mining 1 (KU)

Knowledge Discovery and Data Mining 1 (KU) Knowledge Discovery and Data Mining 1 (KU) Simon Walk IICM, TU Graz October 22, 2015 Simon Walk (IICM) KDDM1 October 22, 2015 1 / 11 KDDM 1 (KU) - Introduction Introduction Institute for Information Systems

More information

Building a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch

Building a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch Nick Pentreath Nov / 14 / 16 Building a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch About @MLnick Principal Engineer, IBM Apache Spark PMC Focused on machine learning

More information

Data Science Bootcamp Curriculum. NYC Data Science Academy

Data Science Bootcamp Curriculum. NYC Data Science Academy Data Science Bootcamp Curriculum NYC Data Science Academy 100+ hours free, self-paced online course. Access to part-time in-person courses hosted at NYC campus Machine Learning with R and Python Foundations

More information

Big Data, Right Tools: Computational Resources for Empirical Research 2014

Big Data, Right Tools: Computational Resources for Empirical Research 2014 Big Data, Right Tools: Computational Resources for Empirical Research 2014 Dokyun Lee, PhD Candidate, OPIM Dept. July 30, 2014 The aim of this course is to familiarize beginning Wharton PhD studentswithbothpubliclyavailable

More information

Machine Learning and SystemML. Nikolay Manchev Data Scientist Europe E-

Machine Learning and SystemML. Nikolay Manchev Data Scientist Europe E- Machine Learning and SystemML Nikolay Manchev Data Scientist Europe E- mail: nmanchev@uk.ibm.com @nikolaymanchev A Simple Problem In this activity, you will analyze the relationship between educational

More information

Pre-Requisites: CS2510. NU Core Designations: AD

Pre-Requisites: CS2510. NU Core Designations: AD DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification

More information

abstar Documentation Release Bryan Briney

abstar Documentation Release Bryan Briney abstar Documentation Release 0.3.1 Bryan Briney Apr 26, 2018 Contents 1 Getting Started 3 2 Usage 7 3 About 13 4 Related Projects 15 5 Index 17 i ii AbStar is a core component of the Ab[x] Toolkit for

More information

Multi-Factor Authentication (MFA)

Multi-Factor Authentication (MFA) 10.10.18 1 Multi-Factor Authentication (MFA) What is it? Why should I use it? CYBERSECURITY Tech Fair 2018 10.10.18 2 Recent Password Hacks PlayStation Network (2011) 77 Million accounts hacked Adobe (2013)

More information

An Introduction to Big Data Formats

An Introduction to Big Data Formats Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION

More information

SAS and Python: The Perfect Partners in Crime

SAS and Python: The Perfect Partners in Crime Paper 2597-2018 SAS and Python: The Perfect Partners in Crime Carrie Foreman, Amadeus Software Limited ABSTRACT Python is often one of the first languages that any programmer will study. In 2017, Python

More information

Machine learning algorithms for datasets popularity prediction

Machine learning algorithms for datasets popularity prediction Machine learning algorithms for datasets popularity prediction Kipras Kančys and Valentin Kuznetsov Abstract. This report represents continued study where ML algorithms were used to predict databases popularity.

More information