MATH36032 Problem Solving by Computer. Data Science

[Figure: number of jobs listed on Jobsite (http://www.jobsite.co.uk/) from Jan 2016 to Jan 2017, for the search terms MATLAB, Data and Data Science; vertical axis: number of jobs, 0 to 10000.]

What is Data Science? (from Wikipedia)
An interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics, ...

What is Data Science?
Math & Stat knowledge: calculus, statistics/probability, linear algebra
Hacking skills: number and text manipulation, (vectorized) operations, algorithmic thinking, ...
Substantive expertise/domain knowledge: knowledge of the specific application area

Data Science/Big Data: why now?
We are generating more data than ever before.
Technologies for data collection and storage are improving.

Applications: Social media and search engines
Personalised webpages
How does Amazon know which items to recommend?

Applications: Retailers
Personalised promotional offers
Who should get what kind of offer?

Applications: Credit Card Fraud Detection
Do these transactions look normal?
This is my credit card statement. The transactions were made within two hours of my losing the card in Montreal in the summer of 2015.

More applications
Insurance/Actuarial Science: how much should you charge your customers?
Weather/climate forecasting: long-term prediction
Finance: better prediction of stock prices

Big data leaked and generated
Is the 2.6 terabytes of the Panama Papers big data?
How about the 1 billion accounts leaked from Yahoo's database?
Data generated daily:
More than 10 terabytes for most national meteorological centres
More than 500 terabytes of data processed by Facebook
More than 20 petabytes (2.0 × 10^16 bytes) handled by Google

Three V's of Big Data
Volume: large quantity of data, big datasets
Variety: many different types and forms of data, e.g. transactional data from ATMs, social media sites, emails, demographic data, tracking data from cell phones, etc.
Velocity: data that is coming in at a very fast pace
These require new software and technologies: support for big data sets has been extended across the major technologies in MATLAB (mapreduce, datastore and other toolboxes).

Big Data Landscape

The first data science application?
Kepler's three laws of planetary motion

Make a TV show using data?

TV made by Amazon and Netflix using big data
Alpha House was not as successful as expected.
Netflix also ran open competitions for the best algorithms to predict user ratings for films (now discontinued, for privacy and other reasons).

Google Flu Trends
Google made a big splash in the news in 2008, and five years later.

The dark side of data science
Data, if used in the right way, can greatly facilitate our lives (as in the concept of smart cities), but...
Privacy: how are the data collected from you being used?
Biased data: polls before Brexit or the US presidential election in 2016
Biased interpretation: data show that people who shop at Waitrose have a longer life span than those who shop at Aldi and Asda.

Data Science Workflow (in business)

Data Science using MATLAB
MATLAB is not the best tool for data science. Certain tasks, like text processing, are better done with Python or R.
More tools and data types have been introduced in MATLAB over the past few years, mainly to cope with the increased demand from data science.

MATLAB Data Types
We have already seen these (type whos in the command window):
Numerical types: double (double-precision floating-point number), uint8 (images), int32, int64, ...
Symbolic: defined by syms
Logical: true, false
Characters and strings: A = 'string'
Function handles: @, as in integral(@(x) sin(x), ...)
Further data types, several of them introduced more recently: structures (struct), cell arrays (cell), time series (timeseries, from R2006b), tables (table, from R2013b), categorical arrays (categorical, from R2013b), dates and times (datetime, from R2014b), ...
Reference: http://uk.mathworks.com/help/matlab/data-types_data-types.html
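The data types listed above can be tried out directly in the command window. The following is a minimal sketch, not taken from the lecture: the variable names and values (Alice, Bob, the scores and the date) are made up for illustration, and it assumes a MATLAB release recent enough to provide table, categorical and datetime.

```matlab
% Hypothetical sample data; all names and values are invented for illustration.
s = struct('name', 'Alice', 'score', 72);   % structure with named fields
c = {42, 'text', [1 2 3]};                  % cell array holding mixed types

% Table: one row per observation, one named variable per column.
Name  = {'Alice'; 'Bob'; 'Carol'};
Score = [72; 65; 88];
Grade = categorical({'B'; 'C'; 'A'});       % categorical array
T = table(Name, Score, Grade);              % column names come from the variables

% Date and time values support arithmetic with durations.
d = datetime(2017, 1, 16);
nextWeek = d + days(7);

% Inspect the class of each variable (whos gives a similar listing).
disp(class(s));         % struct
disp(class(c));         % cell
disp(class(T));         % table
disp(class(T.Grade));   % categorical
disp(class(nextWeek));  % datetime
disp(mean(T.Score));    % 75
```

Columns of a table are accessed with dot notation (T.Score), which makes a table behave like a structure whose fields all have the same number of rows.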

The plan for the rest of the semester
In Week 8 (Friday):
Review (and introduce) a few new data structures (mainly character strings)
Read (and write) different data formats (CSV, Excel, image, ...)
Specific topics:
Random simulation (week 9)
Regression and classification (week 10)
Dimension reduction/low-rank approximation (week 10)
Google PageRank (week 11)
Other related topics (if time permits, week 11)