Senior Application Engineer, MathWorks Korea
© 2017 The MathWorks, Inc.
Data Analytics Workflow
- Data acquisition: engineering, scientific, and field data; business and transactional data
- Data analytics (MATLAB: a single platform): data pre-processing, feature extraction, building algorithms and math models
- Analytics integration: integrate algorithms with IT business systems for making business decisions; run analytics on embedded targets in smart connected systems
Key Takeaways
1. Distribute MATLAB applications to non-MATLAB users royalty-free.
2. Integrate MATLAB functions into existing workflows and development platforms.
3. Deploy MATLAB analytics for big data on Spark-enabled Hadoop clusters.
4. Deploy MATLAB applications that serve simultaneous user requests enterprise-wide via web or cloud frameworks.
Challenges
- Multiple internal and external consumers of algorithms
- Re-coding algorithms for integration into IT frameworks is challenging and time-consuming
- Development resources are scarce and time-to-market is short
- Company priority to deploy solutions to enterprise-scale web or cloud frameworks
- Applications must scale to serve large numbers of simultaneous requests
MATLAB Programs Can Be Shared with Anyone
- Share with other MATLAB users
- Share with people who do not have MATLAB
Write Your MATLAB Programs Once, Then Share to Different Targets
- With MATLAB users: apps, files, custom toolboxes
- With people who do not have MATLAB:
  - MATLAB Compiler: standalone application, Excel add-in, Hadoop
  - MATLAB Compiler SDK: C/C++, Java, .NET, Python, MATLAB Production Server
  - MATLAB Coder: C/C++ source code
Share with People Who Do Not Have MATLAB
- MATLAB Compiler: share applications with no additional programming (standalone application, Excel add-in, Hadoop)
- MATLAB Compiler SDK: integrate MATLAB-based components with your own software (C/C++, Java, .NET, Python, MATLAB Production Server)
- Royalty-free sharing
- IP protection via encryption
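Share Applications Built Completely in MATLAB
1. The application author builds the application in MATLAB with toolboxes.
2. MATLAB Compiler packages it as a standalone application, an Excel add-in, or a Hadoop application.
3. The end user runs it with the MATLAB Runtime.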
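Integrate MATLAB-based Components with Your Own Software
1. The application author develops functions in MATLAB with toolboxes.
2. MATLAB Compiler SDK packages them as components.
3. A software developer integrates the C/C++, .NET, Java, or Python components, or deploys them to MATLAB Production Server.
4. The integrated application runs with the MATLAB Runtime.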
Using MATLAB Compiler SDK to Create Python Packages
MATLAB and MATLAB Production Server together are the easiest and most productive environment for taking your enterprise analytics or IoT solution from idea to production.
Why MATLAB Production Server Matters to You
- Domain expert: MATLAB Production Server allows you to keep working in the environment you love, with no need to learn another programming language.
- Solution architect: MATLAB Production Server integrates with enterprise IT infrastructure and brings MATLAB code into the enterprise IT fabric you are comfortable with, with no need to re-code into another programming language and a web- and cloud-friendly architecture.
Scale Up with MATLAB Production Server
- Directly deploy MATLAB programs into production
- Centrally manage multiple programs and MATLAB Runtime versions
- Automatically deploy updates without server restarts
- Most efficient path for creating enterprise applications
MATLAB Production Server(s):
- Scalable and reliable
- Service large numbers of concurrent requests
- Add capacity or redundancy with additional servers
Web server(s) (HTML, XML, JavaScript):
- Use with web, database, and application servers
- Lightweight client library isolates MATLAB processing
- Access MATLAB programs using native data types
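Adding servers for capacity or redundancy means something has to spread requests across them. The Python sketch below shows one simple option: client-side round-robin with failover. It is an illustrative assumption, not a feature of MATLAB Production Server (production setups often put a load balancer in front of the servers instead), and the server URLs and the fake_send transport are hypothetical.

```python
import itertools
from typing import Callable, List, Sequence


class RoundRobinDispatcher:
    """Hypothetical helper: rotate across server endpoints, skipping ones that fail."""

    def __init__(self, endpoints: Sequence[str], send: Callable[[str, dict], dict]) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints: List[str] = list(endpoints)
        self._next_index = itertools.cycle(range(len(self._endpoints)))
        self._send = send

    def call(self, payload: dict) -> dict:
        # Try each endpoint at most once, starting from the next one in the rotation.
        errors = []
        for _ in range(len(self._endpoints)):
            url = self._endpoints[next(self._next_index)]
            try:
                return self._send(url, payload)
            except ConnectionError as exc:
                # Redundancy: record the failure and fall through to the next server.
                errors.append(f"{url}: {exc}")
        raise ConnectionError("all endpoints failed: " + "; ".join(errors))


def fake_send(url: str, payload: dict) -> dict:
    """Hypothetical transport: server-b is 'down', the others echo the payload."""
    if "server-b" in url:
        raise ConnectionError("server-b is down")
    return {"served_by": url, "echo": payload}


if __name__ == "__main__":
    dispatcher = RoundRobinDispatcher(
        ["http://server-a:9910", "http://server-b:9910", "http://server-c:9910"], fake_send
    )
    print([dispatcher.call({"n": i})["served_by"] for i in range(4)])
```

Because server-b always fails here, the four calls alternate between server-a and server-c. Adding a server to the list adds capacity without any change to the calling code.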
Customer Example: Financial Customer Advisory Service
Global financial institution with European HQ
- Saved 2 million annually for an external system
- Quicker implementation of adjustments in MATLAB source code by the quantitative analysts
- MATLAB knowledge + MATLAB Production Server = build your own systems
Architecture: algorithm developers package their MATLAB code with MATLAB Compiler SDK and deploy it to MATLAB Production Server, which serves calls through its request broker.
Industrial IoT Analytics on AWS
Global industrial equipment manufacturer
- Industrial equipment with embedded sensors and networked communication
- Data reduction before analysis
- Algorithm developers deploy with MATLAB Compiler SDK to MATLAB Production Server, whose request broker serves business systems and users
Building Automation IoT Analytics on Azure
Global heavy-duty electrical equipment manufacturer
- Building/HVAC automation control system with a variety of sensors and controls and networked communication
- Data reduction, with data flowing through Azure Event Hubs and Azure Blob storage
- Algorithm developers deploy with MATLAB Compiler SDK to MATLAB Production Server (request broker), which works with Azure SQL and serves business systems and users
MATLAB Production Server: Enterprise-Class Framework for Running Packaged MATLAB Programs
- Server software: manages packaged MATLAB programs and a worker pool
- MATLAB Runtime libraries: a single server can use runtimes from different releases
- RESTful JSON interface and lightweight client library (C/C++, .NET, Python, and Java)
Enterprise applications connect either through the MPS client library or directly via RESTful JSON to the request broker and program manager, which run the programs on the MATLAB Runtime.
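Because the interface is RESTful JSON, any HTTP client can call a deployed function. The Python sketch below builds such a request with only the standard library. It assumes the MATLAB Production Server convention of POSTing to <server>/<archive>/<function> with a JSON body in which "rhs" holds the input arguments and "nargout" the number of outputs. The archive name loadforecast, the function predictLoad, and port 9910 are illustrative assumptions, so check the RESTful API documentation for your release before relying on the exact field names.

```python
import json
import urllib.request
from typing import Any, List


def build_mps_request(base_url: str, archive: str, function: str,
                      args: List[Any], nargout: int = 1) -> urllib.request.Request:
    """Build, but do not send, a POST request for one deployed function call.

    Assumes the RESTful JSON convention POST <base_url>/<archive>/<function>
    with "rhs" holding the inputs and "nargout" the number of outputs wanted.
    """
    url = f"{base_url.rstrip('/')}/{archive}/{function}"
    body = json.dumps({"nargout": nargout, "rhs": args}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


if __name__ == "__main__":
    # Hypothetical archive and function names; port 9910 is an assumption.
    req = build_mps_request("http://localhost:9910", "loadforecast", "predictLoad",
                            [[21.5, 23.0], 24])
    print(req.get_method(), req.full_url)
    # To send it against a running server: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to test the payload without a server and to swap in another HTTP library later.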
Calling MATLAB Functions
An enterprise application creates an MWHttpClient object and sends calls over HTTP(S) to MATLAB Production Server. The request broker and program manager dispatch each call to a calculation process in the worker pool.
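One way to picture the request broker and worker pool is as a thread pool whose internal queue holds calls until a worker is free. The Python sketch below is only an analogy: in MATLAB Production Server the workers are MATLAB Runtime processes, not threads. The function calculation is a hypothetical stand-in for a deployed function.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def calculation(x: int) -> Tuple[int, str]:
    """Hypothetical deployed function: squares its input and reports which worker ran it."""
    time.sleep(0.01)  # simulate a short-running computation
    return x * x, threading.current_thread().name


def serve(requests: List[int], pool_size: int = 2) -> List[Tuple[int, str]]:
    # The executor's internal queue plays the broker role: requests beyond
    # pool_size wait until one of the fixed workers becomes free.
    with ThreadPoolExecutor(max_workers=pool_size, thread_name_prefix="worker") as pool:
        futures = [pool.submit(calculation, r) for r in requests]
        # Results come back in submission order, whichever worker handled each call.
        return [f.result() for f in futures]


if __name__ == "__main__":
    for value, worker in serve([1, 2, 3, 4, 5]):
        print(value, worker)
```

Five requests share two workers. The caller never manages workers directly, which mirrors how a client only talks to the broker.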
Technology Stack
- Data analytics: MATLAB, MATLAB Distributed Computing Server, visualization
- Storage: cloud storage (Azure Blob), databases
- Integration: MATLAB Production Server with web request broker, business systems, IoT, custom apps
- Infrastructure: public cloud, platform, private cloud
Example: Integrating with IT Systems
- Deployed analytics: portfolio optimization, pricing, risk analytics
- Deployment: MATLAB Compiler SDK to MATLAB Production Server; Excel add-in for desktop use
- Consumers: web applications (via a web server), desktop applications (via an application server), and a database server
Production Deployment Workflow
Development:
1. The MATLAB developer writes and debugs the algorithm in MATLAB.
2. Using MATLAB Compiler SDK, the developer runs an initial test application to verify data handling and initial behavior.
3. The enterprise application developer's web application makes a function call into the deployable archive hosted on MATLAB Production Server.
Production:
4. One or more MATLAB Production Server instances serve function calls from web applications to many deployable archives.
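When the web application's function call returns, the client has to unpack the outputs. The minimal Python sketch below assumes the RESTful JSON convention in which a successful response carries the outputs in an "lhs" list, and a MATLAB error comes back as an "error" object with "id" and "message" fields. The exception class DeployedFunctionError is a hypothetical name, and the exact error layout should be checked against the server documentation.

```python
import json
from typing import Any, List


class DeployedFunctionError(RuntimeError):
    """Hypothetical exception for errors raised inside the deployed function."""


def parse_mps_response(raw: bytes) -> List[Any]:
    """Return the list of outputs from a JSON response body.

    Assumes success looks like {"lhs": [...]} and a MATLAB error looks like
    {"error": {"id": ..., "message": ...}}; verify against your server's docs.
    """
    payload = json.loads(raw)
    if "error" in payload:
        err = payload["error"]
        raise DeployedFunctionError(f"{err.get('id', 'unknown id')}: {err.get('message', '')}")
    return payload["lhs"]


if __name__ == "__main__":
    print(parse_mps_response(b'{"lhs": [42, [1, 2]]}'))  # [42, [1, 2]]
```

Turning server-side MATLAB errors into a client exception keeps the web application's error handling the same whether it talks to a test server or a production one.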
Develop and Test with MATLAB Compiler SDK
- HTTP test environment for MATLAB Production Server: a test application calls your code over HTTP while you test and debug in the MATLAB desktop
- Details on request transactions
- MATLAB debugging and profiling with end-to-end testing
Web Management Dashboard (new in R2017a)
Load Forecasting Demo
Energy load forecasting demo: a browser front end (HTML, XML, JavaScript) served by web server(s), backed by MATLAB Production Server(s).
MATLAB at Scale
- MATLAB Production Server: application server for front-end scalability; manages large numbers of requests to run short-running deployed MATLAB programs
- MATLAB Distributed Computing Server: cluster framework for MATLAB/Simulink; back-end scalability; speeds up computationally intensive programs on computer clusters, clouds, and grids
Distinct Offerings: Scale Application Access and Computation
- Scale application access (MATLAB Production Server): MATLAB Compiler SDK packages deployed applications, and a request broker distributes requests among many deployed application instances.
- Scale computation (Parallel Computing Toolbox and MATLAB Distributed Computing Server): MATLAB code on the desktop client uses batch, parfor, or other parallel constructs to run on GPUs and multi-core CPUs, or on parallel workers on remote hardware through MATLAB Distributed Computing Server.
Online Resources
- Documentation: Create and Share Toolboxes
- Website: Desktop and Web Deployment
- Free white paper: Building a Website with Analytics
- Website: Using MATLAB with Other Programming Languages
Supplemental Slides
Use the following slides for more detailed discussions of various implementations using MATLAB Production Server.
Challenges of Big Data
"Any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications." (Wikipedia)
- Visualize: rapid data exploration
- Develop: scalable algorithms
- Deploy: integrate big data applications
Hadoop: The Big Data Platform
HDFS distributes the data across nodes, and each node runs Map and Reduce steps on its local data. A datastore provides access to the data in HDFS.
MATLAB Integration with Hadoop Clusters
MATLAB map and reduce functions (map.m, reduce.m) run on the Hadoop nodes, reading the data in HDFS through a datastore.
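MATLAB's mapreduce function drives user-written map and reduce functions. It calls the mapper as mapper(data, info, intermKVStore) and the reducer as reducer(intermKey, intermValIter, outKVStore), and each one adds key-value pairs to a store. The Python sketch below mirrors that shape on a single machine to show the map, shuffle, and reduce phases. The KeyValueStore class here is a simplified stand-in, not the MATLAB class, and the taxi-trip fields day and fare are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


class KeyValueStore:
    """Simplified stand-in for an intermediate/output key-value store."""

    def __init__(self) -> None:
        self.pairs: List[Tuple[Any, Any]] = []

    def add(self, key: Any, value: Any) -> None:
        self.pairs.append((key, value))


def mapper(chunk: List[dict], interm: KeyValueStore) -> None:
    # Map: emit one (day, fare) pair per record in this block of data.
    for row in chunk:
        interm.add(row["day"], row["fare"])


def reducer(key: Any, values: Iterable[float], out: KeyValueStore) -> None:
    # Reduce: combine every fare recorded for one day into a total.
    out.add(key, sum(values))


def run_mapreduce(chunks: Iterable[List[dict]]) -> Dict[Any, float]:
    interm = KeyValueStore()
    for chunk in chunks:
        # On a Hadoop cluster, each block would be mapped on the node that stores it.
        mapper(chunk, interm)

    # Shuffle: group intermediate values by key before reducing.
    grouped: Dict[Any, List[float]] = defaultdict(list)
    for key, value in interm.pairs:
        grouped[key].append(value)

    out = KeyValueStore()
    for key, values in grouped.items():
        reducer(key, values, out)
    return dict(out.pairs)


if __name__ == "__main__":
    blocks = [
        [{"day": "Mon", "fare": 10.0}, {"day": "Tue", "fare": 5.0}],
        [{"day": "Mon", "fare": 2.5}],
    ]
    print(run_mapreduce(blocks))  # {'Mon': 12.5, 'Tue': 5.0}
```

The mapper and reducer never see the whole dataset at once, which is what allows the same pair of functions to run unchanged on a desktop or across cluster nodes.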
Deploy MATLAB Applications with Hadoop
Compile the MATLAB map and reduce code. The compiled code runs on each Hadoop node with the MATLAB Runtime, reading data from HDFS through a datastore.
Use MATLAB with Spark on Gigabytes and Terabytes of Data
Work with the data as tall arrays or tall tables.
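Tall arrays let the same MATLAB code run over data that does not fit in memory. Operations on a tall array are deferred, and the data is streamed in blocks only when the result is requested with gather. The Python generator pipeline below is a rough analogy for that behavior, not the tall array API, and the chunked source and scaling step are hypothetical.

```python
from typing import Iterable, Iterator, List, Sequence


def read_chunks(values: Sequence[float], chunk_size: int) -> Iterator[List[float]]:
    """Hypothetical data source: yields one block at a time, like a datastore read."""
    for start in range(0, len(values), chunk_size):
        yield list(values[start:start + chunk_size])


def scale(chunks: Iterable[List[float]], factor: float) -> Iterator[List[float]]:
    """Deferred element-wise operation: nothing runs until the pipeline is consumed."""
    for chunk in chunks:
        yield [x * factor for x in chunk]


def gather_mean(chunks: Iterable[List[float]]) -> float:
    """Trigger evaluation: stream every chunk once, keeping only a running sum and count."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data to average")
    return total / count


if __name__ == "__main__":
    pipeline = scale(read_chunks([1.0, 2.0, 3.0, 4.0, 5.0], chunk_size=2), factor=2.0)
    # No data has been read yet; gather_mean drives the whole pipeline in one pass.
    print(gather_mean(pipeline))  # 6.0
```

Only one block and two running totals are held in memory at a time. This is why a single-pass reduction such as a mean scales to data far larger than RAM.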
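Run MATLAB Scripts on Spark & Hadoop
MATLAB Distributed Computing Server (MDCS) workers run on the worker nodes in the cluster, working from the MATLAB client. Jobs are submitted using the Java RDD API through a spark-submit script to the YARN resource manager.
Cluster components: edge node (Hadoop & Spark libraries), master (name node, resource manager), worker nodes (data nodes, HDFS).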
Example: Running MATLAB on Spark-enabled Hadoop
Desktop code (Parallel Computing Toolbox, datastore, tall):
```matlab
%% Define the execution environment: local parallel pool on the desktop
mr = mapreducer(gcp);

%% Access the data
ds = datastore('C:/datasets/taxiData/*.csv');
tt = tall(ds);
```
Spark + Hadoop code (Spark connection, cluster configuration for Spark, Hadoop data access):
```matlab
%% Define the execution environment: Hadoop/Spark cluster
setenv('HADOOP_HOME', '/dev_env/cluster/hadoop');
setenv('SPARK_HOME', '/dev_env/cluster/spark');
numWorkers = 16;
cluster = parallel.cluster.Hadoop;
cluster.SparkProperties('spark.executor.instances') = num2str(numWorkers);
mr = mapreducer(cluster);

%% Access the data in HDFS
ds = datastore('hdfs://hadoop01:54310/datasets/taxidata/*.csv');
tt = tall(ds);
```
Only the execution environment and the data location change; the tall array analysis code stays the same.
Example: Running MATLAB on Spark and Hadoop
Run Deployed MATLAB Applications on Spark & Hadoop
Compiled MATLAB code runs with the MATLAB Runtime on the worker nodes in the cluster, scheduled through YARN.
Cluster components: edge node (Hadoop & Spark libraries), master (name node, resource manager), worker nodes (data nodes, HDFS).
Deploying Spark Applications