Tackling Big Data Using MATLAB
|
|
- Donna Gabriella Bell
- 5 years ago
- Views:
Transcription
1 Tackling Big Data Using MATLAB Alka Nair Application Engineer 2015 The MathWorks, Inc. 1
2 Building Machine Learning Models with Big Data Access Preprocess, Exploration & Model Development Scale up & Integrate with Production Systems 2
3 Case study: Predict Air Quality Factors Affecting Air Quality My Weather Page Temperature Pressure Relative Humidity Dew Point Wind speed Wind direction Ozone CO NO2 SO2 3
4 4
5 Building Machine Learning Models with Big Data Access Preprocess, Exploration & Model Development Scale up & Integrate with Production Systems 5
6 Challenges in Modeling and Deploying Big Data Applications Access Preprocess, Exploration & Model Development Scale up & Integrate with Production Systems Distributed Data Storage Different Data Sources & Types Preprocessing and Visualizing Big Data Parallelizing Jobs and Scaling up Computations to Cluster Enterprise level deployment Managing Different APIs for Data Sources and Data Formats Rewriting Algorithms to Use Big Data Platforms Parallelizing Code to Scale up to Use Cluster and Cloud Compute Overhead in Moving the Algorithm to Production 6
7 Wouldn t it be nice if you could: Easily access data however it is stored Prototype algorithms quickly using small data sets Scale up to big data sets running on large clusters Using the same intuitive MATLAB syntax you are used to 7
8 Building machine learning models with big data Access Preprocess, Exploration & Model Development Scale up & Integrate with Production Systems 8
9 Access and Manage Big Data Different Data Types Different Data Sources Different Applications Text Images Spreadsheet Custom File Formats Hadoop Distributed File System (HDFS) Amazon S3 Windows Azure Blob Storage Relational Database HDFS on Hortonworks or Cloudera MapReduce Image Segmentation Image Classification Denoising Images Predictive Maintenance Datastores 9
10 Datastore Single Machine Memory Single Machine Memory Process Cluster of Machines Memory Cluster of Machines Memory One or more files 10
11 Air Quality Data on Local Folder 11
12 Accessing and Processing different types of data TabularTextDatastore Text files containing column-oriented data, including CSV files Image Collection MDF Files ImageDatastore SpreadsheetDatastore MDFDatastore Custom Datastore Image files, including formats that are supported by imread such as JPEG and PNG Spreadsheet files with a supported Excel format such as.xlsx Datastore for collection of MDF files Datastore for custom or proprietary format 12
13 You have 1 TB of data you ve never seen before. How do you access this data? 13
14 Historical files are on HDFS and real time data are available through an API Temperature Pressure Relative Humidity Dew Point Wind Speed Wind Direction Ozone CO NO2 SO2 14
15 Access air quality data using datastore 15
16 Preview the data and adjust properties to best represent the data of interest 16
17 Access data from anywhere with minimal changes Local disk 17
18 Datastores enable big data workflows Deep Learning 18
19 Datastores enable big data workflows Predictive Maintenance 19
20 Datastores enable big data workflows Fleet Analytics 20
21 Datastores: Access Big Data with Minimal Changes Different Data Types Different Data Sources Different Applications Text Images Spreadsheet Custom File Formats Hadoop Distributed File System (HDFS) Amazon S3 Windows Azure Blob Storage Relational Database HDFS on Hortonworks or Cloudera MapReduce Image Segmentation Image Classification Denoising Images Predictive Maintenance 21
22 Building machine learning models with big data Access Preprocess, Exploration & Model Development Scale up & Integrate with Production Systems 22
23 You have 1TB of data you ve never seen before. How do you visualize and process the data? 23
24 Use tall arrays to work with the data like any MATLAB array 24
25 Introduction to Tall Arrays Tall Arrays for Big Data Visualization and Preprocessing Machine Learning for Big Data Using Tall Arrays 25
26 Tall arrays Single Machine Memory Data is in one or more files Files stacked vertically Typically tabular data Challenge Data doesn t fit into memory (even cluster memory) Takes a lot of time for even simple operations on data Cluster of Machines Memory 26
27 Tall arrays (new R2016b) Single Machine Memory tall array Process Single Machine Memory Create tall table from datastore ds = datastore('*.csv') tt = tall(ds) Datastore Operate on whole tall table just like ordinary table Cluster of Machines Memory summary(tt) max(tt.endtime tt.starttime) 27
28 tall arrays Single Machine Memory tall array Process Single Machine Memory With Parallel Computing Toolbox, process several chunks at once Process Single Machine Memory Can scale up to clusters with MATLAB Distributed Computing Server Cluster of Machines Memory Process Single Machine Memory Process Single Machine Memory 28
29 Use a Spark-enabled Hadoop cluster and MATLAB Support for many other platforms through reference architectures 29
30 It s easy to run MATLAB code on Spark + Hadoop Spark Connection Cluster Config for Spark Hadoop Access 30
31 MATLAB Documentation for 31
32 Summary for tall arrays Local disk, Shared folders, Databases Run on Compute Clusters or Spark + Hadoop (HDFS), for large scale analysis Process out-of-memory data on your Desktop to explore, analyze, gain insights and to develop analytics Use Parallel Computing Toolbox for increased performance MATLAB Distributed Computing Server, Spark+Hadoop Develop your code locally using Tall Arrays or MapReduce only once Use the same code to scale up to cluster 32
33 Create a tall array for each datastore ozone 33
34 Execution model makes operations more efficient on big data tt : tall array Deferred evaluation Commands are not executed right away Operations are added to a queue Execution triggers include: gather function summary function Machine learning models Plotting 34
35 Execution model makes operations more efficient on big data Unnecessary results are not computed 35
36 Introduction to Tall Arrays Tall Arrays for Big Data Visualization and Preprocessing Machine Learning for Big Data Using Tall Arrays 36
37 Explore Big Data with Tall Visualizations plot scatter binscatter histogram histogram2 ksdensity 37
38 Explore Big Data with Tall Visualizations 38
39 Get a summary of the data tt tall table 39
40 Use data types to best represent the data 40
41 Managing Big and Messy Time-stamped Data 41
42 Use the results of explorations to help make decisions - Synchronize to daily data - By location 42
43 Synchronize all data to daily times 43
44 Clean messy data using common preprocessing functions 44
45 Use familiar MATLAB functions on tall arrays Functions Supported with Tall Arrays 45
46 You don t need to leave MATLAB to monitor large jobs 46
47 Save preprocessed data 47
48 Introduction to Tall Arrays Tall Arrays for Big Data Visualization and Preprocessing Machine Learning for Big Data Using Tall Arrays 48
49 Predict air quality Air Quality Index Air Quality Label Regression Classification 49
50 How do you know which model to use? Try them all 50
51 Use apps for model exploration on a subset of data Air Quality Index Air Quality Label Regression Learner Classification Learner 51
52 Validate and Compare Machine Learning Models 52
53 Validate and Compare Machine Learning Models 53
54 Validate and Compare Machine Learning Models 54
55 Validate and Compare Machine Learning Models 55
56 Scale up with tall machine learning models Linear Regression (fitlm) Logistic & Generalized Linear Regression (fitglm) Discriminant Analysis Classification (fitcdiscr) K-means Clustering (kmeans) Principal Component Analysis (pca) Partition for Cross Validation (cvpartition) Linear Support Vector Machine (SVM) Classification (fitclinear) Naïve Bayes Classification (fitcnb) Random Forest Ensemble Classification (TreeBagger) Lasso Linear Regression (lasso) Linear Support Vector Machine (SVM) Regression (fitrlinear) Single Classification Decision Tree (fitctree) Linear SVM Classification with Random Kernel Expansion (fitckernel) Gaussian Kernel Regression (fitrkernel) 56
57 Training Machine Learning Model against Spark for Air Quality Classification 57
58 Train and validate with tall data for Air Quality Index Prediction 58
59 Select the most important features 59
60 Introduction to Tall Arrays Tall Arrays for Big Data Visualization and Preprocessing Machine Learning for Big Data Using Tall Arrays 61
61 Building machine learning models with big data Access Preprocess, Exploration & Model Development Scale up & Integrate with Production Systems 62
62 63
63 Predict air quality for given location Current Weather My Weather Page Your Weather Conditions Get weather conditions for your area. Location: MATLAB Runtime Temperature: 32F Humidity: 76% Wind: SSW 13 mph Use MATLAB model running on Spark in Python web framework 64
64 Integrate analytics with systems Embedded Hardware C, C++ HDL PLC GPU Enterprise Systems Standalone Application Excel Add-in Hadoop/ Spark C/C++ Java ++ Python.NET MATLAB Production Server MATLAB Runtime 65
65 Package and test MATLAB code 66
66 67
67 Package and test MATLAB code 68
68 Call MATLAB in production environment AirQual.ctf 69
69 MATLAB Production Server Server software Manages packaged MATLAB programs and worker pool MATLAB Runtime libraries Single server can use runtimes from different releases RESTful JSON interface Enterprise Application MPS Client Library Applications/ Database Servers RESTful JSON MATLAB Production Server Request Broker & Program Manager Lightweight client libraries C/C++,.NET, Python, and Java MATLAB Runtime 70
70 MATLAB for Modeling and Deploying Big Data Applications Access Preprocess, Exploration & Model Development Scale up & Integrate with Production Systems Distributed Data Storage Different Data Sources & Types Preprocessing and Visualizing Big Data Parallelizing Jobs and Scaling up Computations to Cluster Enterprise level deployment Easily Access Data however/wherever it is stored using Datastore Prototype and easily scale up algorithms to Big Data platforms using the familiar MATLAB Syntax with Tall Arrays Seamless integration with Enterprise level systems using MATLAB Production Server 71
71 How do you get started? Try Tall Array Based Processing on Your Own Set of Big Data Refer to the example mentioned below to get started: Other Resources mathworks.com/big-data mathworks.com/machine-learning ebook 72
72 MathWorks Training Offerings 73
73 Speaker Details LinkedIn: a/ Contact MathWorks India Products/Training Enquiry Booth Call: Share your experience with MATLAB & Simulink on Social Media Use #MATLABEXPO Share your session feedback: Please fill in your feedback for this session in the feedback form 74
Integrating Advanced Analytics with Big Data
Integrating Advanced Analytics with Big Data Ian McKenna, Ph.D. Senior Financial Engineer 2017 The MathWorks, Inc. 1 The Goal SCALE! 2 The Solution tall 3 Agenda Introduction to tall data Case Study: Predicting
More informationWhat's New in MATLAB for Engineering Data Analytics?
What's New in MATLAB for Engineering Data Analytics? Will Wilson Application Engineer MathWorks, Inc. 2017 The MathWorks, Inc. 1 Agenda Data Types Tall Arrays for Big Data Machine Learning (for Everyone)
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Dr. Roland Michaely 2015 The MathWorks, Inc. 1 Data Analytics Workflow Access and Explore Data Preprocess Data Develop Predictive Models Integrate Analytics
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Aurélie Urbain MathWorks Consulting Services 2015 The MathWorks, Inc. 1 Data Analytics Workflow Data Acquisition Data Analytics Analytics Integration Business
More informationNavigating Big Data with MATLAB
Navigating Big Data with MATLAB Isaac Noh Application Engineer 2015 The MathWorks, Inc. 1 How big is big? What does Big Data even mean? Big data is a term for data sets that are so large or complex that
More informationBig Data con MATLAB. Lucas García The MathWorks, Inc. 1
Big Data con MATLAB Lucas García 2015 The MathWorks, Inc. 1 Agenda Introduction Remote Arrays in MATLAB Tall Arrays for Big Data Scaling up Summary 2 Architecture of an analytics system Data from instruments
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Lyamine Hedjazi 2015 The MathWorks, Inc. 1 Data Analytics Workflow Preprocessing Data Business Systems Build Algorithms Smart Connected Systems Take Decisions
More informationMATLAB. Senior Application Engineer The MathWorks Korea The MathWorks, Inc. 2
1 Senior Application Engineer The MathWorks Korea 2017 The MathWorks, Inc. 2 Data Analytics Workflow Business Systems Smart Connected Systems Data Acquisition Engineering, Scientific, and Field Business
More informationScaling MATLAB. for Your Organisation and Beyond. Rory Adams The MathWorks, Inc. 1
Scaling MATLAB for Your Organisation and Beyond Rory Adams 2015 The MathWorks, Inc. 1 MATLAB at Scale Front-end scaling Scale with increasing access requests Back-end scaling Scale with increasing computational
More informationSharing and Deploying MATLAB Programs Sundar Umamaheshwaran Amit Doshi Application Engineer-Technical Computing
Sharing and Deploying Programs Sundar Umamaheshwaran Amit Doshi Application Engineer-Technical Computing 2016 The MathWorks, Inc. 1 Summary: Data Analytics Workflow Business Systems Smart Connected Systems
More informationAnalyzing Fleet Data with MATLAB and Spark
Analyzing Fleet Data with MATLAB and Spark Christoph Stockhammer 2018 The MathWorks, Inc. 1 What does Fleet mean? A Fleet is any group of things that can generate data and that you would like to look at
More informationWhat s New in MATLAB May 16, 2017
What s New in MATLAB May 16, 2017 2017 The MathWorks, Inc. 1 Agenda MATLAB Foundation Working with Data Building & Sharing MATLAB Applications Application Specific Enhancements Summary and Wrap-up 2 Agenda
More informationData Analytics with MATLAB
Data Analytics with MATLAB Tackling the Challenges of Big Data Adrienne James, PhD MathWorks 7 th October 2014 2014 The MathWorks, Inc. 1 Big Data in Industry ENERGY Asset Optimization FINANCE Market Risk,
More informationSharing and Deploying MATLAB Applications
Sharing and Deploying Applications Dr. Roland Michaely Applications Engineer 2015 The MathWorks, Inc. 1 ICICI Securities Develops Online Financial Planning and Advisory Platform Challenge Launch a scalable
More informationData Analytics with MATLAB. Tackling the Challenges of Big Data
Data Analytics with MATLAB Tackling the Challenges of Big Data How big is big? What characterises big data? Any collection of data sets so large and complex that it becomes difficult to process using traditional
More information2015 The MathWorks, Inc. 1
2015 The MathWorks, Inc. 1 What s New in Release 2015a and 2014b Young Joon Lee Principal Application Engineer 2015 The MathWorks, Inc. 2 Agenda New Features Graphics and Data Design Performance Design
More informationAdvanced Software Development with MATLAB
Advanced Software Development with MATLAB From research and prototype to production 2017 The MathWorks, Inc. 1 What Are Your Software Development Concerns? Accuracy Compatibility Cost Developer Expertise
More informationParallel and Distributed Computing with MATLAB The MathWorks, Inc. 1
Parallel and Distributed Computing with MATLAB 2018 The MathWorks, Inc. 1 Practical Application of Parallel Computing Why parallel computing? Need faster insight on more complex problems with larger datasets
More informationIntroduction to MATLAB application deployment
Introduction to application deployment Antti Löytynoja, Application Engineer 2015 The MathWorks, Inc. 1 Technical Computing with Products Access Explore & Create Share Options: Files Data Software Data
More informationMATLAB 에서작업한응용프로그램의공유 : App 에서부터웹서비스까지
MATLAB 에서작업한응용프로그램의공유 : App 에서부터웹서비스까지 Application Engineer 엄준상 2013 The MathWorks, Inc. 1 Application Deployment with MATLAB Suppliers MATLAB Author Clients Organization Group Members Collaborators 2
More informationParallel and Distributed Computing with MATLAB Gerardo Hernández Manager, Application Engineer
Parallel and Distributed Computing with MATLAB Gerardo Hernández Manager, Application Engineer 2018 The MathWorks, Inc. 1 Practical Application of Parallel Computing Why parallel computing? Need faster
More informationIntegrating MATLAB Analytics into Business-Critical Applications Marta Wilczkowiak Senior Applications Engineer MathWorks
Integrating MATLAB Analytics into Business-Critical Applications Marta Wilczkowiak Senior Applications Engineer MathWorks 2015 The MathWorks, Inc. 1 Problem statement Democratization: Is it possible to
More informationApplication Development and Deployment With MATLAB
Application Development and Deployment With Jean-Philippe Villaréal Application Engineer Applications Engineering Group MathWorks Benelux June 11, 2015 2015 The MathWorks, Inc. 1 Typical Industry Challenges
More informationWhat s New for MATLAB David Willingham
What s New for MATLAB David Willingham 2015 The MathWorks, Inc. 1 MATLAB Execution Engine Redesigned execution engine runs MATLAB code faster All MATLAB code is now JIT compiled A platform for future improvements
More informationBIG DATA: Data Analytics with MATLAB Christophe POUILLOT Senior Consultant MathWorks
BIG DATA: Data Analytics with MATLAB Christophe POUILLOT Senior Consultant MathWorks christophe.pouillot@mathworks.fr 2014 The MathWorks, Inc. 1 Definition of Big Data Data so large and complex that it
More informationMATLAB is a multi-paradigm numerical computing environment fourth-generation programming language. A proprietary programming language developed by
1 MATLAB is a multi-paradigm numerical computing environment fourth-generation programming language. A proprietary programming language developed by MathWorks In 2004, MATLAB had around one million users
More informationScaling up MATLAB Analytics Marta Wilczkowiak, PhD Senior Applications Engineer MathWorks
Scaling up MATLAB Analytics Marta Wilczkowiak, PhD Senior Applications Engineer MathWorks 2013 The MathWorks, Inc. 1 Agenda Giving access to your analytics to more users Handling larger problems 2 When
More informationMATLAB as a Financial Engineering Development Platform Delivering Financial / Quantitative Models to the Enterprise Eugene McGoldrick
as a Financial Engineering Development Platform Delivering Financial / Quantitative Models to the Enterprise Eugene McGoldrick 2016 The MathWorks, Inc. 1 Development Environment for Financial Services
More informationDeploying Deep Learning Networks to Embedded GPUs and CPUs
Deploying Deep Learning Networks to Embedded GPUs and CPUs Rishu Gupta, PhD Senior Application Engineer, Computer Vision 2015 The MathWorks, Inc. 1 MATLAB Deep Learning Framework Access Data Design + Train
More informationDeveloping Optimization Algorithms for Real-World Applications
Developing Optimization Algorithms for Real-World Applications Gautam Ponnappa PC Training Engineer Viju Ravichandran, PhD Education Technical Evangelist 2015 The MathWorks, Inc. 1 2 For a given system,
More informationSimplifier la mise en production d applications MATLAB. Marc Wolff Application Engineer MathWorks 1
Simplifier la mise en production d applications MATLAB Marc Wolff Application Engineer MathWorks marc.wolff@mathworks.fr 1 What if you could turn a MATLAB application into an interactive standalone application?
More informationApache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context
1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes
More informationAn Introduction to Apache Spark
An Introduction to Apache Spark 1 History Developed in 2009 at UC Berkeley AMPLab. Open sourced in 2010. Spark becomes one of the largest big-data projects with more 400 contributors in 50+ organizations
More informationOracle Big Data Connectors
Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process
More informationKNIME for the life sciences Cambridge Meetup
KNIME for the life sciences Cambridge Meetup Greg Landrum, Ph.D. KNIME.com AG 12 July 2016 What is KNIME? A bit of motivation: tool blending, data blending, documentation, automation, reproducibility More
More informationTutorial on Machine Learning Tools
Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow
More informationIBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics
IBM Data Science Experience White paper R Transforming R into a tool for big data analytics 2 R Executive summary This white paper introduces R, a package for the R statistical programming language that
More informationDATA SCIENCE USING SPARK: AN INTRODUCTION
DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data
More informationHadoop. Introduction / Overview
Hadoop Introduction / Overview Preface We will use these PowerPoint slides to guide us through our topic. Expect 15 minute segments of lecture Expect 1-4 hour lab segments Expect minimal pretty pictures
More informationSparkling Water. August 2015: First Edition
Sparkling Water Michal Malohlava Alex Tellez Jessica Lanford http://h2o.gitbooks.io/sparkling-water-and-h2o/ August 2015: First Edition Sparkling Water by Michal Malohlava, Alex Tellez & Jessica Lanford
More informationSpecialist ICT Learning
Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.
More informationData Science Bootcamp Curriculum. NYC Data Science Academy
Data Science Bootcamp Curriculum NYC Data Science Academy 100+ hours free, self-paced online course. Access to part-time in-person courses hosted at NYC campus Machine Learning with R and Python Foundations
More informationFrom Apps to Web Services: Deploying Your MATLAB Algorithms and Applications Marta Wilczkowiak
From Apps to Web Services: Deploying Your Algorithms and Applications Marta Wilczkowiak 1 2013 The MathWorks, Inc. Why deploy your algorithms? Raise awareness of your work Reduce duplication of efforts
More informationWhat s New in MATLAB and Simulink
What s New in MATLAB Simulink Fabrizio Sara 2015 The MathWorks, Inc. 1 Engineers scientists 2 Engineers scientists Develop algorithms Analyze data write MATLAB code. 3 Engineers scientists deploy algorithms
More informationAsanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks
Asanka Padmakumara ETL 2.0: Data Engineering with Azure Databricks Who am I? Asanka Padmakumara Business Intelligence Consultant, More than 8 years in BI and Data Warehousing A regular speaker in data
More informationThe Evolution of Big Data Platforms and Data Science
IBM Analytics The Evolution of Big Data Platforms and Data Science ECC Conference 2016 Brandon MacKenzie June 13, 2016 2016 IBM Corporation Hello, I m Brandon MacKenzie. I work at IBM. Data Science - Offering
More informationWhat s New in MATLAB and Simulink The MathWorks, Inc. 1
What s New in MATLAB Simulink 2015 The MathWorks, Inc. 1 Engineers scientists 2 Engineers scientists Develop algorithms Analyze data write MATLAB code. 3 Engineers scientists deploy algorithms applications
More informationModern Data Warehouse The New Approach to Azure BI
Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics
More informationDesign Challenges for Sensor Data Analytics in Internet of Things (IoT)
Design Challenges for Sensor Data Analytics in Internet of Things (IoT) Corey Mathis 2015 The MathWorks, Inc. 1 Agenda IoT Overview Design Challenges for Sensor Data Analytics Example Solutions
More informationMachine Learning with MATLAB --classification
Machine Learning with MATLAB --classification Stanley Liang, PhD York University Classification the definition In machine learning and statistics, classification is the problem of identifying to which
More informationWhat s New in MATLAB and Simulink Young Joon Lee Principal Application Engineer
What s New in MATLAB Simulink Young Joon Lee Principal Application Engineer 2016 The MathWorks, Inc. 1 Engineers scientists 2 Engineers scientists Develop algorithms Analyze data write MATLAB code. 3 Engineers
More information2/26/2017. Originally developed at the University of California - Berkeley's AMPLab
Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second
More informationmicrosoft
70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series
More informationUsing Existing Numerical Libraries on Spark
Using Existing Numerical Libraries on Spark Brian Spector Chicago Spark Users Meetup June 24 th, 2015 Experts in numerical algorithms and HPC services How to use existing libraries on Spark Call algorithm
More informationApache Hadoop 3. Balazs Gaspar Sales Engineer CEE & CIS Cloudera, Inc. All rights reserved.
Apache Hadoop 3 Balazs Gaspar Sales Engineer CEE & CIS balazs@cloudera.com 1 We believe data can make what is impossible today, possible tomorrow 2 We empower people to transform complex data into clear
More informationHDInsight > Hadoop. October 12, 2017
HDInsight > Hadoop October 12, 2017 2 Introduction Mark Hudson >20 years mixing technology with data >10 years with CapTech Microsoft Certified IT Professional Business Intelligence Member of the Richmond
More informationAnalyzing Big Data with Microsoft R
Analyzing Big Data with Microsoft R 20773; 3 days, Instructor-led Course Description The main purpose of the course is to give students the ability to use Microsoft R Server to create and run an analysis
More informationScalable Machine Learning in R. with H2O
Scalable Machine Learning in R with H2O Erin LeDell @ledell DSC July 2016 Introduction Statistician & Machine Learning Scientist at H2O.ai in Mountain View, California, USA Ph.D. in Biostatistics with
More informationWorking with Large Sets of Images in MATLAB Just Got Easier Avi Nehemiah
Working with Large Sets of Images in MATLAB Just Got Easier Avi Nehemiah 2015 The MathWorks, Inc. 1 Challenges Posed by Large Sets of Images 1. How do I import several thousand images into MATLAB? 2. Can
More informationCloud Computing & Visualization
Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationData Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros
Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on
More informationWhat s New MATLAB and Simulink
What s New MATLAB and Simulink Ascension Vizinho-Coutry Application Engineer Manager MathWorks Ascension.Vizinho-Coutry@mathworks.fr Daniel Martins Application Engineer MathWorks Daniel.Martins@mathworks.fr
More informationMachine Learning and SystemML. Nikolay Manchev Data Scientist Europe E-
Machine Learning and SystemML Nikolay Manchev Data Scientist Europe E- mail: nmanchev@uk.ibm.com @nikolaymanchev A Simple Problem In this activity, you will analyze the relationship between educational
More informationOverview. Audience profile. At course completion. Course Outline. : 20773A: Analyzing Big Data with Microsoft R. Course Outline :: 20773A::
Module Title Duration : 20773A: Analyzing Big Data with Microsoft R : 3 days Overview The main purpose of the course is to give students the ability to use Microsoft R Server to create and run an analysis
More informationUsing Numerical Libraries on Spark
Using Numerical Libraries on Spark Brian Spector London Spark Users Meetup August 18 th, 2015 Experts in numerical algorithms and HPC services How to use existing libraries on Spark Call algorithm with
More informationOliver Engels & Tillmann Eitelberg. Big Data! Big Quality?
Oliver Engels & Tillmann Eitelberg Big Data! Big Quality? Like to visit Germany? PASS Camp 2017 Main Camp 5.12 7.12.2017 (4.12 Kick Off Evening) Lufthansa Training & Conference Center, Seeheim SQL Konferenz
More informationBig Data Infrastructures & Technologies
Big Data Infrastructures & Technologies Spark and MLLIB OVERVIEW OF SPARK What is Spark? Fast and expressive cluster computing system interoperable with Apache Hadoop Improves efficiency through: In-memory
More informationActivator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.
Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without
More informationVerteego VDS Documentation
Verteego VDS Documentation Release 1.0 Verteego May 31, 2017 Installation 1 Getting started 3 2 Ansible 5 2.1 1. Install Ansible............................................. 5 2.2 2. Clone installation
More informationDelving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture
Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Hadoop 1.0 Architecture Introduction to Hadoop & Big Data Hadoop Evolution Hadoop Architecture Networking Concepts Use cases
More informationProcessing of big data with Apache Spark
Processing of big data with Apache Spark JavaSkop 18 Aleksandar Donevski AGENDA What is Apache Spark? Spark vs Hadoop MapReduce Application Requirements Example Architecture Application Challenges 2 WHAT
More informationCOPYRIGHT DATASHEET
Your Path to Enterprise AI To succeed in the world s rapidly evolving ecosystem, companies (no matter what their industry or size) must use data to continuously develop more innovative operations, processes,
More informationA Cloud System for Machine Learning Exploiting a Parallel Array DBMS
2017 28th International Workshop on Database and Expert Systems Applications A Cloud System for Machine Learning Exploiting a Parallel Array DBMS Yiqun Zhang, Carlos Ordonez, Lennart Johnsson Department
More informationBlended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)
Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance
More informationBig Data and FrameWorks; Perspectives to Applied Machine Learning
Big Data and FrameWorks; Perspectives to Applied Machine Learning Mehdi Habibzadeh PhD in Computer Science Outlines (Oct 2016) : Big Data and Challenges Review and Trends Math and Probability Concepts
More informationCSE 444: Database Internals. Lecture 23 Spark
CSE 444: Database Internals Lecture 23 Spark References Spark is an open source system from Berkeley Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Matei
More informationHardware and Software Co-Design for Motor Control Applications
Hardware and Software Co-Design for Motor Control Applications Gaurav Dubey Durvesh Kulkarni 2015 The MathWorks, Inc. 1 Key trend: Increasing demands from motor drives Advanced algorithms require faster
More informationMachine Learning with Python
DEVNET-2163 Machine Learning with Python Dmitry Figol, SE WW Enterprise Sales @dmfigol Cisco Spark How Questions? Use Cisco Spark to communicate with the speaker after the session 1. Find this session
More informationAzure Data Factory. Data Integration in the Cloud
Azure Data Factory Data Integration in the Cloud 2018 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and views expressed in this document, including URL and
More informationThe Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou
The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component
More informationOverview of Data Services and Streaming Data Solution with Azure
Overview of Data Services and Streaming Data Solution with Azure Tara Mason Senior Consultant tmason@impactmakers.com Platform as a Service Offerings SQL Server On Premises vs. Azure SQL Server SQL Server
More informationDatabricks, an Introduction
Databricks, an Introduction Chuck Connell, Insight Digital Innovation Insight Presentation Speaker Bio Senior Data Architect at Insight Digital Innovation Focus on Azure big data services HDInsight/Hadoop,
More informationScaled Machine Learning at Matroid
Scaled Machine Learning at Matroid Reza Zadeh @Reza_Zadeh http://reza-zadeh.com Machine Learning Pipeline Learning Algorithm Replicate model Data Trained Model Serve Model Repeat entire pipeline Scaling
More informationRAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE
RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE Luigi Grimaudo (luigi.grimaudo@polito.it) DataBase And Data Mining Research Group (DBDMG) Summary RapidMiner project Strengths
More informationSummary. RapidMiner Project 12/13/2011 RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE
RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE Luigi Grimaudo (luigi.grimaudo@polito.it) DataBase And Data Mining Research Group (DBDMG) Summary RapidMiner project Strengths
More informationBIG DATA COURSE CONTENT
BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data
More informationBig Data Hadoop Stack
Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware
More informationThis document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and
AI and Visual Analytics: Machine Learning in Business Operations Steven Hillion Senior Director, Data Science Anshuman Mishra Principal Data Scientist DISCLAIMER During the course of this presentation,
More informationExam Questions
Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) https://www.2passeasy.com/dumps/70-775/ NEW QUESTION 1 You are implementing a batch processing solution by using Azure
More informationAbout Intellipaat. About the Course. Why Take This Course?
About Intellipaat Intellipaat is a fast growing professional training provider that is offering training in over 150 most sought-after tools and technologies. We have a learner base of 700,000 in over
More informationBring Context To Your Machine Data With Hadoop, RDBMS & Splunk
Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may
More informationHarp-DAAL for High Performance Big Data Computing
Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big
More informationSpotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data
Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA BIG DATA: A REVOLUTION IN ACCESS Large-scale data sets are nothing
More informationOliver Engels & Tillmann Eitelberg. Big Data! Big Quality?
Oliver Engels & Tillmann Eitelberg Big Data! Big Quality? Sponsors help us to run this event! THX! You Rock! Sponsor Gold Sponsor Silver Sponsor Bronze Sponsor You Rock! Sponsor Session 13:45 Track 1 Das
More informationMap Reduce Group Meeting
Map Reduce Group Meeting Yasmine Badr 10/07/2014 A lot of material in this presenta0on has been adopted from the original MapReduce paper in OSDI 2004 What is Map Reduce? Programming paradigm/model for
More informationParallel Computing with MATLAB
Parallel Computing with MATLAB CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University
More informationSCALABLE DISTRIBUTED DEEP LEARNING
SEOUL Oct.7, 2016 SCALABLE DISTRIBUTED DEEP LEARNING Han Hee Song, PhD Soft On Net 10/7/2016 BATCH PROCESSING FRAMEWORKS FOR DL Data parallelism provides efficient big data processing: data collecting,
More informationTom Brenneman. Good morning and welcome, introductions and thank you for being here.
Welcome Tom Brenneman Good morning and welcome, introductions and thank you for being here. This is a best practices seminar. We're going to be sharing with you what we found to be best practices that
More informationSQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism
Big Data and Hadoop with Azure HDInsight Andrew Brust Senior Director, Technical Product Marketing and Evangelism Datameer Level: Intermediate Meet Andrew Senior Director, Technical Product Marketing and
More information