BIG DATA: Data Analytics with MATLAB Christophe POUILLOT Senior Consultant MathWorks

Transcription:

BIG DATA: Data Analytics with MATLAB
Christophe POUILLOT, Senior Consultant, MathWorks
christophe.pouillot@mathworks.fr
2014 The MathWorks, Inc.

Definition of Big Data
"Data so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." (from Wikipedia)

Volume, Velocity, Variety

  Name        Symbol   Value (bytes)
  gigabyte    GB       10^9
  terabyte    TB       10^12
  petabyte    PB       10^15
  exabyte     EB       10^18
  zettabyte   ZB       10^21
  yottabyte   YB       10^24

  Social network: xx TB/day
  Worldwide: 2.8 ZB/year (2012)

Data, structured or not, from different sources: Web/Text/Image mining

Data containers
  Collection of files
  Databases
  Huge single file

Big Data Analytics with MATLAB

Memory and Data Access
  64-bit processors
  Memory Mapped Variables
  Disk Variables
  Databases
  Datastores

Analysis
  Machine learning
  Analysis domain
  Statistics

Programming Constructs
  Streaming
  Block Processing
  Parallel-for loops
  GPU Arrays
  SPMD and Distributed Arrays
  MapReduce

Platforms
  Desktop (Multicore, GPU)
  Clusters
  Cloud Computing (MDCS on EC2)
  Hadoop

Agenda
  Machine Learning
  Datastore/MapReduce
  Integration with Hadoop
  Databases
  Huge file
  Deployment
  Key takeaways

Overview: Machine Learning
(type of learning, then categories of algorithms)

  Unsupervised Learning: group and interpret data based only on input data
    Clustering

  Supervised Learning: develop a predictive model based on both input and output data
    Classification
    Regression
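As a minimal sketch of the three algorithm categories above (not the demo shown in the talk), the following assumes Statistics and Machine Learning Toolbox is installed and uses its bundled fisheriris sample data. The choice of kmeans, fitcknn and fitlm, and all variable names, are illustrative assumptions.

```matlab
% Assumes Statistics and Machine Learning Toolbox (provides fisheriris,
% kmeans, fitcknn, fitlm). All variable names are illustrative.
load fisheriris            % meas: 150x4 measurements, species: 150x1 labels
rng(0);                    % make the clustering reproducible

% Unsupervised learning (clustering): uses only the input data.
clusterIdx = kmeans(meas, 3, 'Replicates', 5);

% Supervised learning (classification): uses inputs and known labels.
classifier = fitcknn(meas, species, 'NumNeighbors', 5);
predicted = predict(classifier, meas);
trainAccuracy = mean(strcmp(predicted, species));

% Supervised learning (regression): predict petal width from petal length.
model = fitlm(meas(:,3), meas(:,4));
slope = model.Coefficients.Estimate(2);
```

Note that trainAccuracy is measured on the training data itself, so it is optimistic; a held-out set or cross-validation would give a fairer estimate.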

Demo: Machine Learning

Datastore
datastore: import text files and collections of text files that don't fit into memory

  ds = datastore('file1.mat');
  ds = datastore('*.csv');
  ds = datastore('/shared/data_repository/');
  ds = datastore('hdfs://myserver:7867/data/file1.txt');
  ds = datastore({'/shared01/','/shared02/'});

  while hasdata(ds)
      T = read(ds);   % read the next chunk of data
  end
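The read loop above can be tried on a small file. This is a minimal sketch under stated assumptions: the CSV file, its column names (Id, Value) and the ReadSize of 4 rows are made up for illustration, and the file is written to a temporary folder so the example is self-contained.

```matlab
% Create a small CSV file in a temporary folder (illustrative data only).
folder = tempname;
mkdir(folder);
T = table((1:10)', (10:10:100)', 'VariableNames', {'Id', 'Value'});
writetable(T, fullfile(folder, 'part1.csv'));

% Point a datastore at every CSV file in the folder and read 4 rows at a time.
ds = datastore(fullfile(folder, '*.csv'));
ds.ReadSize = 4;

% Accumulate a running sum chunk by chunk, never holding the whole file.
total = 0;
nChunks = 0;
while hasdata(ds)
    chunk = read(ds);                    % table with at most 4 rows
    total = total + sum(chunk.Value);
    nChunks = nChunks + 1;
end
```

Only one chunk is in memory at any time, which is what lets the same loop scale to files larger than RAM.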

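Demo mapreduce
Data Store -> Map -> Reduce

Data store (flight records; the slide gives no column headings):

  1503  UA  LAX   -5  -10  2356
   540  PS  BUR   13    5   186
  1920  DL  BOS   10   32  1876
  1840  DL  SFO    0   13   568
   272  US  BWI    4   -2   359
   784  PS  SEA    7    3   176
   796  PS  LAX   -2    2   237
  1525  UA  SFO    3   -5  1867
   632  US  SJC    2   -4   245
  1610  UA  MIA   60   34  1365
  2032  DL  EWR   10   16   789
  2134  DL  DFW   -2    6   914

Map: each block of records is reduced to (carrier, value) pairs taken from the second and last columns:

  Block 1: UA 2356, PS 186, DL 1876, DL 568, US 359
  Block 2: PS 176, PS 237, UA 1867
  Block 3: US 245, UA 1365, DL 789, DL 914

Reduce: the pairs are grouped by carrier:

  UA: 2356, 1867, 1365
  PS: 186, 176, 237
  DL: 1876, 568, 789, 914
  US: 359, 245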
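A minimal sketch of this flow with the mapreduce function, using the carrier and last-column values from the slide. Several things here are assumptions, not taken from the talk: the column names (Flight, Carrier, Airport, Distance), the function names maxDistMapper and maxDistReducer, and the choice to keep the maximum value per carrier in the reduce step. Local functions at the end of a script require R2016b or later; in older releases the two functions would go in their own files.

```matlab
% Flight records from the slide; column names are hypothetical.
rows = {1503,'UA','LAX',2356; 540,'PS','BUR',186; 1920,'DL','BOS',1876; ...
        1840,'DL','SFO',568;  272,'US','BWI',359; 784,'PS','SEA',176; ...
        796,'PS','LAX',237;   1525,'UA','SFO',1867; 632,'US','SJC',245; ...
        1610,'UA','MIA',1365; 2032,'DL','EWR',789;  2134,'DL','DFW',914};
T = cell2table(rows, 'VariableNames', {'Flight','Carrier','Airport','Distance'});
csvFile = [tempname '.csv'];
writetable(T, csvFile);

% Read only the two columns the mapper needs, 5 rows per chunk.
ds = datastore(csvFile, 'SelectedVariableNames', {'Carrier','Distance'});
ds.ReadSize = 5;

mapreducer(0);   % run serially in the local MATLAB session
result = mapreduce(ds, @maxDistMapper, @maxDistReducer, ...
                   'OutputFolder', tempname, 'Display', 'off');
out = readall(result);   % table with Key and Value columns

function maxDistMapper(data, ~, intermKVStore)
    % Emit one (carrier, distance) pair per record in this chunk.
    addmulti(intermKVStore, data.Carrier, num2cell(data.Distance));
end

function maxDistReducer(~, valueIter, outKVStore)
    % Called once per carrier; nothing is emitted here because the key is
    % needed, so the real work is in maxDistReducerWithKey below.
end
```

The reducer stub above is replaced in practice by a keyed version; a compact form that can be passed to mapreduce instead is:

```matlab
function maxDistReducerWithKey(carrier, valueIter, outKVStore)
    % Keep the largest value seen for this carrier (assumed reduce rule).
    best = -Inf;
    while hasnext(valueIter)
        best = max(best, getnext(valueIter));
    end
    add(outKVStore, carrier, best);
end
```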

Demo: datastore/mapreduce

Integration with Hadoop
  Datastore on HDFS, accessed from MATLAB
  Each Hadoop node holds its share of the data and runs Map and Reduce on it
  MATLAB files: map.m, reduce.m, main.m

Databases

Relational database (ODBC/JDBC-compliant)
  DatabaseDatastore (Database Toolbox)

    conn = database.ODBCConnection('mysql','username','pwd');
    dbds = datastore(conn, 'select * from producttable');

NoSQL database
  MATLAB calls external functions:
    C/C++ shared libraries
    Java libraries
    .NET libraries
    COM objects (ActiveX)
    Python libraries
    WSDL web services
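As a small illustration of the "MATLAB calls external functions" route, the sketch below calls a Java standard-library class directly from MATLAB. It is only a stand-in: java.util.HashMap is not a NoSQL client, and no specific database driver is implied. It assumes MATLAB is running with its JVM, which is the default.

```matlab
% Create a Java object from MATLAB and call its methods.
% java.util.HashMap stands in for any Java client library on the class path.
counts = java.util.HashMap;
counts.put('UA', 3);
counts.put('DL', 4);

nUA = counts.get('UA');     % Java Double is converted to a MATLAB double
nKeys = counts.size();      % number of stored keys
```

A third-party Java client would be used the same way after adding its JAR file to the Java class path, for example with javaaddpath.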

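Huge flat files: Mapping memory within MATLAB
Read/write variables in files, without loading them into memory

matfile: MAT-files

  m = matfile('myfile.mat', 'Writable', true);   % existing files open read-only by default
  z = m.x(85:94,85:94);                 % read from disk
  m.x(81:100,81:100) = magic(20);       % write to disk

memmapfile: any file

  % Assumes a 100-by-100 int32 block follows the 9000-byte offset; the
  % block size and the field name x are illustrative.
  m = memmapfile('records.dat', 'Offset', 9000, ...
                 'Format', {'int32', [100 100], 'x'}, 'Writable', true);
  z = m.Data.x(85:94,85:94);                  % read from disk
  m.Data.x(81:100,81:100) = int32(magic(20)); % write to disk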
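A self-contained sketch of both approaches, assuming small temporary files created on the spot. The file names, the 100-by-100 size and the field name x are illustrative assumptions, not taken from the talk.

```matlab
% --- matfile: partial read/write of a variable stored in a MAT-file ---
matPath = [tempname '.mat'];
x = zeros(100);
save(matPath, 'x', '-v7.3');           % v7.3 format supports partial loading

m = matfile(matPath, 'Writable', true);
m.x(81:100, 81:100) = magic(20);       % write a block to disk
z = m.x(85:94, 85:94);                 % read a block back from disk

% --- memmapfile: map a raw binary file as a 100-by-100 int32 array ---
datPath = [tempname '.dat'];
fid = fopen(datPath, 'w');
fwrite(fid, zeros(100, 100, 'int32'), 'int32');
fclose(fid);

mm = memmapfile(datPath, 'Format', {'int32', [100 100], 'x'}, ...
                'Writable', true);
mm.Data.x(81:100, 81:100) = int32(magic(20));   % write through the mapping
w = mm.Data.x(85:94, 85:94);                    % read through the mapping
clear mm                                        % release the mapping
```

In both cases only the indexed block is transferred, so the same pattern applies when the file is far larger than available memory.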

Deployment: Compiling MATLAB to go everywhere
  Targets: .lib, .dll, .exe, Java
  Diagram labels: MATLAB, Java, Analytic

Key takeaways
  MATLAB is the framework for BIG DATA analytics
  MathWorks services can help you:
    Training services
    Consulting services
  http://www.mathworks.fr/discovery/big-data-matlab.html?s_tid=gn_loc_drop

Questions?