CONTAINERIZED SPARK ON KUBERNETES. William Benton Red Hat,
|
|
- Corey Sims
- 5 years ago
- Views:
Transcription
1 CONTAINERIZED SPARK ON KUBERNETES William Benton Red Hat,
2 BACKGROUND
3 BACKGROUND
4 BACKGROUND
5 BACKGROUND
6 BACKGROUND
7 BACKGROUND
8 BACKGROUND
9 BACKGROUND
10 WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014
11 WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Spark executor Spark executor Spark executor Spark executor Spark executor Spark executor
12 WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Spark executor Spark executor Networked POSIX FS Spark executor Spark executor Spark executor Spark executor
13 WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Mesos Spark executor Spark executor Networked POSIX FS Spark executor Spark executor Spark executor Spark executor
14 WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Mesos 1 2 Spark executor Spark executor Spark executor Networked POSIX FS 3 Spark executor Spark executor 4 Spark executor
15 WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Mesos 1 1 Spark executor Networked 1 Spark executor POSIX FS 2 2 Spark executor 3 3 Spark executor 3 Spark executor 4 4 Spark executor
16 WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Mesos 1 1 Spark executor Networked 1 Spark executor POSIX FS 2 2 Spark executor 3 3 Spark executor 3 Spark executor 4 4 Spark executor
17 Analytics is no longer a separate workload.
18 Analytics is an essential component of modern datadriven applications.
19 OUR GOALS
20 OUR GOALS
21 OUR GOALS
22 OUR GOALS
23 OUR GOALS git
24 OUR GOALS git
25 OUR GOALS git
26 FORECAST Motivating containerized microservices Architectures for analytics and applications Spark clusters in containers: practicalities and pitfalls Play along at home Future work
27 MOTIVATING MICROSERVICES
28 A microservice architecture employs lightweight, modular, and typically stateless components with well-defined interfaces and contracts.
29 BENEFITS OF MICROSERVICE ARCHITECTURES
30 BENEFITS OF MICROSERVICE ARCHITECTURES
31 BENEFITS OF MICROSERVICE ARCHITECTURES
32 BENEFITS OF MICROSERVICE ARCHITECTURES
33 BENEFITS OF MICROSERVICE ARCHITECTURES
34 BENEFITS OF MICROSERVICE ARCHITECTURES
35 BENEFITS OF MICROSERVICE ARCHITECTURES
36 BENEFITS OF MICROSERVICE ARCHITECTURES 2 + 2
37 BENEFITS OF MICROSERVICE ARCHITECTURES
38 BENEFITS OF MICROSERVICE ARCHITECTURES
39 BENEFITS OF MICROSERVICE ARCHITECTURES
40 BENEFITS OF MICROSERVICE ARCHITECTURES
41 BENEFITS OF MICROSERVICE ARCHITECTURES?
42 BENEFITS OF MICROSERVICE ARCHITECTURES?
43 BENEFITS OF MICROSERVICE ARCHITECTURES?
44 MICROSERVICES AND SPARK executor executor executor executor master
45 MICROSERVICES AND SPARK λ x: x * 2 executor executor executor executor master
46 MICROSERVICES AND SPARK λ x: x * 2 executor executor executor executor master λ x: x * 2 λ x: x * 2 λ x: x * 2 λ x: x * 2
47 MICROSERVICES AND SPARK λ x: x * 2 executor executor executor executor master λ x: x * 2 λ x: x * 2 λ x: x * 2 λ x: x * 2
48 MICROSERVICES AND SPARK λ x: x * 2 executor executor executor executor master λ x: x * 2 λ x: x * 2 λ x: x * 2 λ x: x * 2
49 ARCHITECTURES FOR ANALYTICS AND APPLICATIONS
50 APPLICATION RESPONSIBILITIES events transform databases transform file, object storage transform
51 APPLICATION RESPONSIBILITIES events transform databases transform aggregate file, object storage transform
52 APPLICATION RESPONSIBILITIES events transform databases transform aggregate file, object storage transform models train
53 APPLICATION RESPONSIBILITIES events transform databases transform aggregate file, object storage transform models train
54 APPLICATION RESPONSIBILITIES events transform databases transform aggregate archive file, object storage transform models train
55 APPLICATION RESPONSIBILITIES events transform databases transform aggregate archive file, object storage transform models train
56 APPLICATION RESPONSIBILITIES events transform developer UI web and mobile databases transform aggregate archive file, object storage transform reporting models train
57 APPLICATION RESPONSIBILITIES events transform developer UI web and mobile databases transform aggregate archive file, object storage transform reporting models train management
58 LEGACY ARCHITECTURES
59 CONVENTIONAL DATA WAREHOUSE events
60 CONVENTIONAL DATA WAREHOUSE events transform
61 CONVENTIONAL DATA WAREHOUSE events transform UI
62 CONVENTIONAL DATA WAREHOUSE events transform UI business logic
63 CONVENTIONAL DATA WAREHOUSE events transform UI business logic RDBMS
64 CONVENTIONAL DATA WAREHOUSE events transform UI business logic transaction RDBMS processing
65 CONVENTIONAL DATA WAREHOUSE events transform UI business logic transaction RDBMS processing
66 CONVENTIONAL DATA WAREHOUSE events transform UI business logic transaction processing RDBMS RDBMS
67 CONVENTIONAL DATA WAREHOUSE events transform UI business logic analysis transaction processing RDBMS RDBMS
68 CONVENTIONAL DATA WAREHOUSE events transform reporting UI business logic analysis transaction processing RDBMS RDBMS
69 CONVENTIONAL DATA WAREHOUSE events transform reporting UI business logic analysis transaction processing RDBMS RDBMS
70 CONVENTIONAL DATA WAREHOUSE events transform reporting interactive query business UI analysis logic transaction RDBMS RDBMS processing
71 CONVENTIONAL DATA WAREHOUSE events transform reporting interactive query business UI analysis logic transaction analytic RDBMS RDBMS processing processing
72 HADOOP-STYLE DATA LAKE HDFS HDFS HDFS
73 HADOOP-STYLE DATA LAKE HDFS HDFS HDFS HDFS HDFS
74 HADOOP-STYLE DATA LAKE events HDFS HDFS HDFS HDFS HDFS
75 HADOOP-STYLE DATA LAKE events HDFS HDFS HDFS HDFS HDFS
76 HADOOP-STYLE DATA LAKE events HDFS HDFS HDFS HDFS HDFS compute compute compute compute compute
77 HADOOP-STYLE DATA LAKE events HDFS HDFS compute compute
78 MODERN ARCHITECTURES
79 THE LAMBDA ARCHITECTURE transform (imprecise) analysis events
80 THE LAMBDA ARCHITECTURE transform (imprecise) analysis events DFS
81 THE LAMBDA ARCHITECTURE transform (imprecise) analysis speed layer events DFS
82 THE LAMBDA ARCHITECTURE transform (imprecise) analysis speed layer events transform (precise) analysis DFS
83 THE LAMBDA ARCHITECTURE transform (imprecise) analysis speed layer events transform (precise) analysis DFS batch layer
84 THE LAMBDA ARCHITECTURE transform (imprecise) analysis federate speed layer events transform (precise) analysis DFS batch layer
85 THE LAMBDA ARCHITECTURE transform (imprecise) analysis federate UI speed layer events transform (precise) analysis DFS batch layer
86 THE LAMBDA ARCHITECTURE transform (imprecise) analysis federate UI speed layer serving layer events transform (precise) analysis DFS batch layer
87 THE KAPPA ARCHITECTURE events
88 THE KAPPA ARCHITECTURE events message bus
89 THE KAPPA ARCHITECTURE events message bus queue for raw data topic
90 THE KAPPA ARCHITECTURE events message bus queue for raw data topic transform
91 THE KAPPA ARCHITECTURE events message bus queue for raw data topic queue for preprocessed data topic transform
92 THE KAPPA ARCHITECTURE events message bus queue for raw data topic queue for preprocessed data topic transform analysis
93 THE KAPPA ARCHITECTURE events message bus queue for raw data topic queue for preprocessed data topic queue for analysis results topic transform analysis
94 THE KAPPA ARCHITECTURE events message bus queue for raw data topic queue for preprocessed data topic queue for analysis results topic transform analysis reporting
95 THE KAPPA ARCHITECTURE events message bus queue for raw data topic queue for preprocessed data topic queue for analysis results topic transform analysis reporting end-user UI
96 DATA FEDERATION IN THE COMPUTE LAYER events transform developer UI web and mobile databases transform aggregate archive file, object storage transform reporting models train management
97 DATA FEDERATION IN THE COMPUTE LAYER events transform developer UI web and mobile databases transform aggregate archive file, object storage transform reporting models train management
98 DATA FEDERATION IN THE COMPUTE LAYER events transform developer UI web and mobile databases transform aggregate archive file, object storage transform reporting models train management
99 SIDEBAR: THE MONOLITHIC SPARK ANTIPATTERN Resource manager Cluster scheduler app 1 app 2 Spark executor Shared FS Spark executor Spark executor app 3 app 4 Spark executor Spark executor Spark executor
100 SIDEBAR: THE MONOLITHIC SPARK ANTIPATTERN Resource manager Cluster scheduler app 1 app 2 Spark executor Shared FS Spark executor Spark executor app 3 app 4 Spark executor Spark executor Spark executor
101 ONE CLUSTER PER APPLICATION Resource manager app 1 app 2 app 3 Object stores app 4 app 5 app 6 Databases
102 ONE CLUSTER PER APPLICATION Resource manager app 1 app 2 app 3 Object stores app 4 app 5 app 6 Databases
103 PRACTICALITIES AND POTENTIAL PITFALLS
104 SCHEDULING Kubernetes app 1 app 2 app 3 Object stores app 4 app 5 app 6 Databases
105 SCHEDULING Kubernetes app 1 app 2 app 1 app 3 app 4 app 5 app 6
106 SCHEDULING Kubernetes app 1 app 2 app 1 app 3 app 4 app 5 app 6
107 SCHEDULING Kubernetes app 1 app 2 app 1 app 3 app 4 app 5 app 6
108 SCHEDULING Kubernetes app 1 app 2 app 1 app 3 app 4 app 5 app 6
109 SCHEDULING Kubernetes app 1 app 2 app 1 app 3 app 4 app 5 app 6
110 STORAGE Kubernetes app 1 app 2 app 3 app 4 app 5 app 6
111 STORAGE Kubernetes app 1 app 2 app 3 POSIX filesystem app 4 app 5 app 6
112 STORAGE Kubernetes app 1 app 2 app 3 POSIX filesystem familiar interface app 4 app 5 app 6 interoperability with other programs
113 STORAGE Kubernetes app 1 app 2 app 3 POSIX filesystem familiar interface app 4 app 5 app 6 interoperability with other programs unnecessary semantic guarantees difficult to manage
114 STORAGE Kubernetes HDFS HDFS HDFS app 1 app 2 app 3 HDFS HDFS HDFS app 4 app 5 app 6
115 STORAGE Kubernetes HDFS HDFS HDFS app 1 app 2 app 3 HDFS HDFS HDFS support for legacy app 4 app 5 app 6 Hadoop installations
116 STORAGE Kubernetes HDFS HDFS HDFS app 1 app 2 app 3 HDFS HDFS HDFS support for legacy app 4 app 5 app 6 Hadoop installations inelastic stateful can t collocate compute and data
117 STORAGE Kubernetes app 1 app 2 app 3 object store app 4 app 5 app 6
118 STORAGE Kubernetes app 1 app 2 app 3 object store interoperability app 4 app 5 app 6 fine-grained AC many implementations
119 STORAGE Kubernetes app 1 app 2 app 3 object store interoperability app 4 app 5 app 6 fine-grained AC many implementations consistency model performance (?)
120 STORAGE Kubernetes app 1 app 2 app 3 object store interoperability app 4 app 5 app 6 fine-grained AC many implementations consistency model performance
121 in a cloud native architecture, the benefit of HDFS is actually very small and that is why many cloud-first organizations no longer run HDFS, or only run it as a caching layer for S3. Reynold Xin on Quora (
122 NETWORKING
123 NETWORKING
124 NETWORKING
125 NETWORKING
126 NETWORKING can t access worker web UI (but wait for Spark 2.1!)
127 NETWORKING can t access worker web UI (but wait for Spark 2.1!)
128 NETWORKING can t access worker web UI (but wait for Spark 2.1!)
129 SECURITY
130 SECURITY
131 SECURITY
132 SECURITY $SPARK_HOME/bin/spark-class \ org.apache.spark.deploy.worker.worker \ master:7077 pid root net
133 SECURITY $SPARK_HOME/bin/spark-class \ org.apache.spark.deploy.worker.worker \ master:7077 pid root net
134 SECURITY $SPARK_HOME/bin/spark-class \ org.apache.spark.deploy.worker.worker \ master:7077 pid root /tmp/foo net
135 SECURITY $SPARK_HOME/bin/spark-class \ org.apache.spark.deploy.worker.worker \ master:7077 pid root /tmp/foo net
136 SECURITY
137 SECURITY k8s namespace
138 SECURITY k8s namespace *
139 NEXT STEPS: FUTURE WORK & PLAYING ALONG AT HOME
140 NEXT STEPS Further performance evaluation Better developer experience Improved scheduling of Spark tasks on Kubernetes
141 TRY IT OUT YOURSELF Kubernetes standalone Spark example: Enabling Spark on OpenShift: Native Spark on Kubernetes proposal:
142
Some things you learn running Apache Spark in production for three years. William Benton Red Hat, Inc.
Some things you learn running Apache Spark in production for three years William Benton (@willb) Red Hat, Inc. Some things you learn running Apache Spark in production for three years William Benton (@willb)
More informationDeploying Applications on DC/OS
Mesosphere Datacenter Operating System Deploying Applications on DC/OS Keith McClellan - Technical Lead, Federal Programs keith.mcclellan@mesosphere.com V6 THE FUTURE IS ALREADY HERE IT S JUST NOT EVENLY
More informationCOURSE 20487B: DEVELOPING WINDOWS AZURE AND WEB SERVICES
ABOUT THIS COURSE In this course, students will learn how to design and develop services that access local and remote data from various data sources. Students will also learn how to develop and deploy
More information[MS20487]: Developing Windows Azure and Web Services
[MS20487]: Developing Windows Azure and Web Services Length : 5 Days Audience(s) : Developers Level : 300 Technology : Cross-Platform Development Delivery Method : Instructor-led (Classroom) Course Overview
More information@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS
@unterstein @dcos @bedcon #bedcon Operating microservices with Apache Mesos and DC/OS 1 Johannes Unterstein Software Engineer @Mesosphere @unterstein @unterstein.mesosphere 2017 Mesosphere, Inc. All Rights
More informationProcessing of big data with Apache Spark
Processing of big data with Apache Spark JavaSkop 18 Aleksandar Donevski AGENDA What is Apache Spark? Spark vs Hadoop MapReduce Application Requirements Example Architecture Application Challenges 2 WHAT
More informationOracle Big Data Fundamentals Ed 2
Oracle University Contact Us: 1.800.529.0165 Oracle Big Data Fundamentals Ed 2 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, you learn about big data, the technologies
More informationCloud Computing & Visualization
Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International
More informationDisclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme
CNA1612BU Deploying real-world workloads on Kubernetes and Pivotal Cloud Foundry VMworld 2017 Fred Melo, Director of Technology, Pivotal Merlin Glynn, Sr. Technical Product Manager, VMware Content: Not
More informationPrincipal Software Engineer Red Hat Emerging Technology June 24, 2015
USING APACHE SPARK FOR ANALYTICS IN THE CLOUD William C. Benton Principal Software Engineer Red Hat Emerging Technology June 24, 2015 ABOUT ME Distributed systems and data science in Red Hat's Emerging
More informationLOG AGGREGATION. To better manage your Red Hat footprint. Miguel Pérez Colino Strategic Design Team - ISBU
LOG AGGREGATION To better manage your Red Hat footprint Miguel Pérez Colino Strategic Design Team - ISBU 2017-05-03 @mmmmmmpc Agenda Managing your Red Hat footprint with Log Aggregation The Situation The
More informationMesosphere and Percona Server for MongoDB. Jeff Sandstrom, Product Manager (Percona) Ravi Yadav, Tech. Partnerships Lead (Mesosphere)
Mesosphere and Percona Server for MongoDB Jeff Sandstrom, Product Manager (Percona) Ravi Yadav, Tech. Partnerships Lead (Mesosphere) Mesosphere DC/OS MICROSERVICES, CONTAINERS, & DEV TOOLS DATA SERVICES,
More informationMesosphere and Percona Server for MongoDB. Peter Schwaller, Senior Director Server Eng. (Percona) Taco Scargo, Senior Solution Engineer (Mesosphere)
Mesosphere and Percona Server for MongoDB Peter Schwaller, Senior Director Server Eng. (Percona) Taco Scargo, Senior Solution Engineer (Mesosphere) Mesosphere DC/OS MICROSERVICES, CONTAINERS, & DEV TOOLS
More informationDeveloping Windows Azure and Web Services
Developing Windows Azure and Web Services Course 20487B; 5 days, Instructor-led Course Description In this course, students will learn how to design and develop services that access local and remote data
More informationAchieving Horizontal Scalability. Alain Houf Sales Engineer
Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches
More informationMESOS A State-Of-The-Art Container Orchestrator Mesosphere, Inc. All Rights Reserved. 1
MESOS A State-Of-The-Art Container Orchestrator 2016 Mesosphere, Inc. All Rights Reserved. 1 About me Jie Yu (@jie_yu) Tech Lead at Mesosphere Mesos PMC member and committer Formerly worked at Twitter
More informationYARN: A Resource Manager for Analytic Platform Tsuyoshi Ozawa
YARN: A Resource Manager for Analytic Platform Tsuyoshi Ozawa ozawa.tsuyoshi@lab.ntt.co.jp ozawa@apache.org About me Tsuyoshi Ozawa Research Engineer @ NTT Twitter: @oza_x86_64 Over 150 reviews in 2015
More informationBuilding a Data-Friendly Platform for a Data- Driven Future
Building a Data-Friendly Platform for a Data- Driven Future Benjamin Hindman - @benh 2016 Mesosphere, Inc. All Rights Reserved. INTRO $ whoami BENJAMIN HINDMAN Co-founder and Chief Architect of Mesosphere,
More informationImpala. A Modern, Open Source SQL Engine for Hadoop. Yogesh Chockalingam
Impala A Modern, Open Source SQL Engine for Hadoop Yogesh Chockalingam Agenda Introduction Architecture Front End Back End Evaluation Comparison with Spark SQL Introduction Why not use Hive or HBase?
More informationIOTA ARCHITECTURE: DATA VIRTUALIZATION AND PROCESSING MEDIUM DR. KONSTANTIN BOUDNIK DR. ALEXANDRE BOUDNIK
IOTA ARCHITECTURE: DATA VIRTUALIZATION AND PROCESSING MEDIUM DR. KONSTANTIN BOUDNIK DR. ALEXANDRE BOUDNIK DR. KONSTANTIN BOUDNIK DR.KONSTANTIN BOUDNIK EPAM SYSTEMS CHIEF TECHNOLOGIST BIGDATA, OPEN SOURCE
More informationDeveloping Microsoft Azure and Web Services. Course Code: 20487C; Duration: 5 days; Instructor-led
Developing Microsoft Azure and Web Services Course Code: 20487C; Duration: 5 days; Instructor-led WHAT YOU WILL LEARN In this course, students will learn how to design and develop services that access
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationIoT with Apache ActiveMQ, Camel and Spark
IoT with Apache ActiveMQ, Camel and Spark Burr Sutter - Red Hat Agenda Business & IT Architecture IoT Architecture IETF IoT Use Case Ingestion: Apache ActiveMQ, Apache Camel Analytics: Apache Spark Demos
More informationDistributed File Systems II
Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation
More informationModern Data Warehouse The New Approach to Azure BI
Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics
More informationContainers, Serverless and Functions in a nutshell. Eugene Fedorenko
Containers, Serverless and Functions in a nutshell Eugene Fedorenko About me Eugene Fedorenko Senior Architect Flexagon adfpractice-fedor.blogspot.com @fisbudo Agenda Containers Microservices Docker Kubernetes
More informationWhat s New in K8s 1.3
What s New in K8s 1.3 Carter Morgan Background: 3 Hurdles How do I write scalable apps? The App How do I package and distribute? What runtimes am I locked into? Can I scale? The Infra Is it automatic?
More informationNetworking & Security for Mesos
Sponsored by Networking & Security for Mesos AN IP FOR EVERY CONTAINER AND MORE! Christopher Liljenstolpe February 24, 2016 The #1 Challenge for Cloud? Recent data breaches due to hacking or poor security
More informationEmploying HPC DEEP-EST for HEP Data Analysis. Viktor Khristenko (CERN, DEEP-EST), Maria Girone (CERN)
Employing HPC DEEP-EST for HEP Data Analysis Viktor Khristenko (CERN, DEEP-EST), Maria Girone (CERN) 1 Outline The DEEP-EST Project Goals and Motivation HEP Data Analysis on HPC with Apache Spark on HPC
More informationHadoop. Introduction / Overview
Hadoop Introduction / Overview Preface We will use these PowerPoint slides to guide us through our topic. Expect 15 minute segments of lecture Expect 1-4 hour lab segments Expect minimal pretty pictures
More informationContainer Orchestration on Amazon Web Services. Arun
Container Orchestration on Amazon Web Services Arun Gupta, @arungupta Docker Workflow Development using Docker Docker Community Edition Docker for Mac/Windows/Linux Monthly edge and quarterly stable
More informationBig Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018
Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/
More informationAn Introduction to Apache Spark Big Data Madison: 29 July William Red Hat, Inc.
An Introduction to Apache Spark Big Data Madison: 29 July 2014 William Benton @willb Red Hat, Inc. About me At Red Hat for almost 6 years, working on distributed computing Currently contributing to Spark,
More informationVOLTDB + HP VERTICA. page
VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics
More informationMicrosoft Developing Windows Azure and Web Services
1800 ULEARN (853 276) www.ddls.com.au Microsoft 20487 - Developing Windows Azure and Web Services Length 5 days Price $4510.00 (inc GST) Version B Overview In this course, students will learn how to design
More informationDisclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme
CNA2080BU Deep Dive: How to Deploy and Operationalize Kubernetes Cornelia Davis, Pivotal Nathan Ness Technical Product Manager, CNABU @nvpnathan #VMworld #CNA2080BU Disclaimer This presentation may contain
More informationSCALING LIKE TWITTER WITH APACHE MESOS
Philip Norman & Sunil Shah SCALING LIKE TWITTER WITH APACHE MESOS 1 MODERN INFRASTRUCTURE Dan the Datacenter Operator Alice the Application Developer Doesn t sleep very well Loves automation Wants to control
More informationHow to Put Your AF Server into a Container
How to Put Your AF Server into a Container Eugene Lee Technology Enablement Engineer 1 Technology Challenges 2 Cloud Native bring different expectations 3 We are becoming more impatient Deploy Code Release
More informationHadoop, Yarn and Beyond
Hadoop, Yarn and Beyond 1 B. R A M A M U R T H Y Overview We learned about Hadoop1.x or the core. Just like Java evolved, Java core, Java 1.X, Java 2.. So on, software and systems evolve, naturally.. Lets
More informationContainer 2.0. Container: check! But what about persistent data, big data or fast data?!
@unterstein @joerg_schad @dcos @jaxdevops Container 2.0 Container: check! But what about persistent data, big data or fast data?! 1 Jörg Schad Distributed Systems Engineer @joerg_schad Johannes Unterstein
More informationAnalytic Cloud with. Shelly Garion. IBM Research -- Haifa IBM Corporation
Analytic Cloud with Shelly Garion IBM Research -- Haifa 2014 IBM Corporation Why Spark? Apache Spark is a fast and general open-source cluster computing engine for big data processing Speed: Spark is capable
More informationModernizing Business Intelligence and Analytics
Modernizing Business Intelligence and Analytics Justin Erickson Senior Director, Product Management 1 Agenda What benefits can I achieve from modernizing my analytic DB? When and how do I migrate from
More informationElastify Cloud-Native Spark Application with PMEM. Junping Du --- Chief Architect, Tencent Cloud Big Data Department Yue Li --- Cofounder, MemVerge
Elastify Cloud-Native Spark Application with PMEM Junping Du --- Chief Architect, Tencent Cloud Big Data Department Yue Li --- Cofounder, MemVerge Table of Contents Sparkling: The Tencent Cloud Data Warehouse
More informationFrom Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019
From Single Purpose to Multi Purpose Data Lakes Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 Agenda Data Lakes Multiple Purpose Data Lakes Customer Example Demo Takeaways
More informationBlended Learning Outline: Cloudera Data Analyst Training (171219a)
Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills
More informationLessons Learned: Deploying Microservices Software Product in Customer Environments Mark Galpin, Solution Architect, JFrog, Inc.
Lessons Learned: Deploying Microservices Software Product in Customer Environments Mark Galpin, Solution Architect, JFrog, Inc. Who s speaking? Mark Galpin Solution Architect @jfrog magalpin Microservices
More informationStorm. Distributed and fault-tolerant realtime computation. Nathan Marz Twitter
Storm Distributed and fault-tolerant realtime computation Nathan Marz Twitter Storm at Twitter Twitter Web Analytics Before Storm Queues Workers Example (simplified) Example Workers schemify tweets and
More informationPOWERING THE INTERNET WITH APACHE MESOS
Neil Conway, Niklas Nielsen, Greg Mann & Sunil Shah POWERING THE INTERNET WITH APACHE MESOS 1 MESOS: ORIGINS 2 THE BIRTH OF MESOS TWITTER TECH TALK APACHE INCUBATION The grad students working on Mesos
More informationBlended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)
Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance
More informationService and Cloud Computing Lecture 10: DFS2 Prof. George Baciu PQ838
COMP4442 Service and Cloud Computing Lecture 10: DFS2 www.comp.polyu.edu.hk/~csgeorge/comp4442 Prof. George Baciu PQ838 csgeorge@comp.polyu.edu.hk 1 Preamble 2 Recall the Cloud Stack Model A B Application
More informationData science for the datacenter: processing logs with Apache Spark. William Benton Red Hat Emerging Technology
Data science for the datacenter: processing logs with Apache Spark William Benton Red Hat Emerging Technology BACKGROUND Challenges of log data Challenges of log data SELECT hostname, DATEPART(HH, timestamp)
More informationRed Hat Roadmap for Containers and DevOps
Red Hat Roadmap for Containers and DevOps Brian Gracely, Director of Strategy Diogenes Rettori, Principal Product Manager Red Hat September, 2016 Digital Transformation Requires an evolution in... 2 APPLICATIONS
More informationMicroservices with Red Hat. JBoss Fuse
Microservices with Red Hat Ruud Zwakenberg - ruud@redhat.com Senior Solutions Architect June 2017 JBoss Fuse and 3scale API Management Disclaimer The content set forth herein is Red Hat confidential information
More informationKontejneri u Azureu uz pomoć Kubernetesa što i kako? Tomislav Tipurić Partner Technology Strategist Microsoft
Kontejneri u Azureu uz pomoć Kubernetesa što i kako? Tomislav Tipurić Partner Technology Strategist Microsoft Source: Softpedia Credits: James Niccolai A decade ago no one could have seen this coming.
More informationMaking Data Integration Easy For Multiplatform Data Architectures With Diyotta 4.0. WEBINAR MAY 15 th, PM EST 10AM PST
Making Data Integration Easy For Multiplatform Data Architectures With Diyotta 4.0 WEBINAR MAY 15 th, 2018 1PM EST 10AM PST Welcome and Logistics If you have problems with the sound on your computer, switch
More informationTurning Relational Database Tables into Spark Data Sources
Turning Relational Database Tables into Spark Data Sources Kuassi Mensah Jean de Lavarene Director Product Mgmt Director Development Server Technologies October 04, 2017 3 Safe Harbor Statement The following
More informationDATA SCIENCE USING SPARK: AN INTRODUCTION
DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data
More informationOverview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::
Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized
More informationAgenda. Spark Platform Spark Core Spark Extensions Using Apache Spark
Agenda Spark Platform Spark Core Spark Extensions Using Apache Spark About me Vitalii Bondarenko Data Platform Competency Manager Eleks www.eleks.com 20 years in software development 9+ years of developing
More information利用 Mesos 打造高延展性 Container 環境. Frank, Microsoft MTC
利用 Mesos 打造高延展性 Container 環境 Frank, Microsoft MTC About Me Developer @ Yahoo! DevOps @ HTC Technical Architect @ MSFT Agenda About Docker Manage containers Apache Mesos Mesosphere DC/OS application = application
More informationSpark Overview. Professor Sasu Tarkoma.
Spark Overview 2015 Professor Sasu Tarkoma www.cs.helsinki.fi Apache Spark Spark is a general-purpose computing framework for iterative tasks API is provided for Java, Scala and Python The model is based
More informationCOSC 6339 Big Data Analytics. Introduction to Spark. Edgar Gabriel Fall What is SPARK?
COSC 6339 Big Data Analytics Introduction to Spark Edgar Gabriel Fall 2018 What is SPARK? In-Memory Cluster Computing for Big Data Applications Fixes the weaknesses of MapReduce Iterative applications
More informationBuilding a Microservices Platform with Kubernetes. Matthew Mark
Building a Microservices Platform with Kubernetes Matthew Mark Miller @DataMiller Cloud Native: Microservices running inside Containers on top of Platforms on any infrastructure Microservice A software
More informationIndex. Raul Estrada and Isaac Ruiz 2016 R. Estrada and I. Ruiz, Big Data SMACK, DOI /
Index A ACID, 251 Actor model Akka installation, 44 Akka logos, 41 OOP vs. actors, 42 43 thread-based concurrency, 42 Agents server, 140, 251 Aggregation techniques materialized views, 216 probabilistic
More informationPrzyspiesz tworzenie aplikacji przy pomocy Openshift Container Platform. Jarosław Stakuń Senior Solution Architect/Red Hat CEE
Przyspiesz tworzenie aplikacji przy pomocy Openshift Container Platform Jarosław Stakuń Senior Solution Architect/Red Hat CEE jstakun@redhat.com Monetize innovation http://www.forbes.com/innovative-companies/list/
More informationUNIFY DATA AT MEMORY SPEED. Haoyuan (HY) Li, Alluxio Inc. VAULT Conference 2017
UNIFY DATA AT MEMORY SPEED Haoyuan (HY) Li, CEO @ Alluxio Inc. VAULT Conference 2017 March 2017 HISTORY Started at UC Berkeley AMPLab In Summer 2012 Originally named as Tachyon Rebranded to Alluxio in
More informationDistributed CI: Scaling Jenkins on Mesos and Marathon. Roger Ignazio Puppet Labs, Inc. MesosCon 2015 Seattle, WA
Distributed CI: Scaling Jenkins on Mesos and Marathon Roger Ignazio Puppet Labs, Inc. MesosCon 2015 Seattle, WA About Me Roger Ignazio QE Automation Engineer Puppet Labs, Inc. @rogerignazio Mesos In Action
More informationBuilding/Running Distributed Systems with Apache Mesos
Building/Running Distributed Systems with Apache Mesos Philly ETE April 8, 2015 Benjamin Hindman @benh $ whoami 2007-2012 2009-2010 - 2014 my other computer is a datacenter my other computer is a datacenter
More informationCisco Container Platform
Cisco Container Platform Pradnesh Patil Suhail Syed Cisco Spark How Questions? Use Cisco Spark to communicate with the speaker after the session 1. Find this session in the Cisco Live Mobile App 2. Click
More informationPage 1. Goals for Today" Background of Cloud Computing" Sources Driving Big Data" CS162 Operating Systems and Systems Programming Lecture 24
Goals for Today" CS162 Operating Systems and Systems Programming Lecture 24 Capstone: Cloud Computing" Distributed systems Cloud Computing programming paradigms Cloud Computing OS December 2, 2013 Anthony
More informationDisclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme
VIRT1351BE New Architectures for Virtualizing Spark and Big Data Workloads on vsphere Justin Murray Mohan Potheri VMworld 2017 Content: Not for publication #VMworld #VIRT1351BE Disclaimer This presentation
More informationFranck Mercier. Technical Solution Professional Data + AI Azure Databricks
Franck Mercier Technical Solution Professional Data + AI http://aka.ms/franck @FranmerMS Azure Databricks Thanks to our sponsors Global Gold Silver Bronze Microsoft JetBrains Rubrik Delphix Solution OMD
More informationSAMPLE CHAPTER IN ACTION. Roger Ignazio. FOREWORD BY Florian Leibert MANNING
SAMPLE CHAPTER IN ACTION Roger Ignazio FOREWORD BY Florian Leibert MANNING Mesos in Action by Roger Ignazio Chapter 1 Copyright 2016 Manning Publications brief contents PART 1 HELLO, MESOS...1 1 Introducing
More informationTensorFlowOnSpark Scalable TensorFlow Learning on Spark Clusters Lee Yang, Andrew Feng Yahoo Big Data ML Platform Team
TensorFlowOnSpark Scalable TensorFlow Learning on Spark Clusters Lee Yang, Andrew Feng Yahoo Big Data ML Platform Team What is TensorFlowOnSpark Why TensorFlowOnSpark at Yahoo? Major contributor to open-source
More informationCLOUD-NATIVE APPLICATION DEVELOPMENT/ARCHITECTURE
JAN WILLIES Global Kubernetes Lead at Accenture Technology jan.willies@accenture.com CLOUD-NATIVE APPLICATION DEVELOPMENT/ARCHITECTURE SVEN MENTL Cloud-native at Accenture Technology ASG sven.mentl@accenture.com
More informationMATLAB. Senior Application Engineer The MathWorks Korea The MathWorks, Inc. 2
1 Senior Application Engineer The MathWorks Korea 2017 The MathWorks, Inc. 2 Data Analytics Workflow Business Systems Smart Connected Systems Data Acquisition Engineering, Scientific, and Field Business
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Dr. Roland Michaely 2015 The MathWorks, Inc. 1 Data Analytics Workflow Access and Explore Data Preprocess Data Develop Predictive Models Integrate Analytics
More informationBig Data Security. Facing the challenge
Big Data Security Facing the challenge Experience the presentation xlic.es/v/e98605 About me Father of a 5 year old child Technical leader in Architecture and Security team at Stratio Sailing skipper 3
More informationYCSB++ benchmarking tool Performance debugging advanced features of scalable table stores
YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores Swapnil Patil M. Polte, W. Tantisiriroj, K. Ren, L.Xiao, J. Lopez, G.Gibson, A. Fuchs *, B. Rinaldi * Carnegie
More informationEarthdata Cloud Analytics Project
Earthdata Cloud Analytics Project Chris Lynnes* and Rahul Ramachandran* NASA *U.S. Civil Servant 2 Earth Observing System and Information System (EOSDIS) EOSDIS distribute Research Applications data downlink
More informationFIVE REASONS YOU SHOULD RUN CONTAINERS ON BARE METAL, NOT VMS
WHITE PAPER FIVE REASONS YOU SHOULD RUN CONTAINERS ON BARE METAL, NOT VMS Over the past 15 years, server virtualization has become the preferred method of application deployment in the enterprise datacenter.
More informationOverview of Container Management
Overview of Container Management Wyn Van Devanter @wynv Vic Kumar Agenda Why Container Management? What is Container Management? Clusters, Cloud Architecture & Containers Container Orchestration Tool Overview
More informationExploring Cloud Security, Operational Visibility & Elastic Datacenters. Kiran Mohandas Consulting Engineer
Exploring Cloud Security, Operational Visibility & Elastic Datacenters Kiran Mohandas Consulting Engineer The Ideal Goal of Network Access Policies People (Developers, Net Ops, CISO, ) V I S I O N Provide
More informationShark: SQL and Rich Analytics at Scale. Reynold Xin UC Berkeley
Shark: SQL and Rich Analytics at Scale Reynold Xin UC Berkeley Challenges in Modern Data Analysis Data volumes expanding. Faults and stragglers complicate parallel database design. Complexity of analysis:
More informationcstore_fdw Columnar store for analytic workloads Hadi Moshayedi & Ben Redman
cstore_fdw Columnar store for analytic workloads Hadi Moshayedi & Ben Redman What is CitusDB? CitusDB is a scalable analytics database that extends PostgreSQL Citus shards your data and automa/cally parallelizes
More informationIBM Db2 Event Store Simplifying and Accelerating Storage and Analysis of Fast Data. IBM Db2 Event Store
IBM Db2 Event Store Simplifying and Accelerating Storage and Analysis of Fast Data IBM Db2 Event Store Disclaimer The information contained in this presentation is provided for informational purposes only.
More informationCloud-Native Applications. Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0
Cloud-Native Applications Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0 Cloud-Native Characteristics Lean Form a hypothesis, build just enough to validate or disprove it. Learn
More informationCloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018
Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster
More informationBest Practices for Deploying Hadoop Workloads on HCI Powered by vsan
Best Practices for Deploying Hadoop Workloads on HCI Powered by vsan Chen Wei, ware, Inc. Paudie ORiordan, ware, Inc. #vmworld HCI2038BU #HCI2038BU Disclaimer This presentation may contain product features
More informationAlexander Klein. #SQLSatDenmark. ETL meets Azure
Alexander Klein ETL meets Azure BIG Thanks to SQLSat Denmark sponsors Save the date for exiting upcoming events PASS Camp 2017 Main Camp 05.12. 07.12.2017 (04.12. Kick-Off abends) Lufthansa Training &
More informationData Movement & Tiering with DMF 7
Data Movement & Tiering with DMF 7 Kirill Malkin Director of Engineering April 2019 Why Move or Tier Data? We wish we could keep everything in DRAM, but It s volatile It s expensive Data in Memory 2 Why
More informationCertified Big Data Hadoop and Spark Scala Course Curriculum
Certified Big Data Hadoop and Spark Scala Course Curriculum The Certified Big Data Hadoop and Spark Scala course by DataFlair is a perfect blend of indepth theoretical knowledge and strong practical skills
More informationWhat s New in K8s 1.3
What s New in K8s 1.3 Carter Morgan Background: 3 Hurdles How do I write scalable apps? The App How do I package and distribute? What runtimes am I locked into? Can I scale? The Infra Is it automatic?
More informationSyncsort Incorporated, 2016
Syncsort Incorporated, 2016 All rights reserved. This document contains proprietary and confidential material, and is only for use by licensees of DMExpress. This publication may not be reproduced in whole
More informationSOLUTION TRACK Finding the Needle in a Big Data Innovator & Problem Solver Cloudera
SOLUTION TRACK Finding the Needle in a Big Data Haystack @EvaAndreasson, Innovator & Problem Solver Cloudera Agenda Problem (Solving) Apache Solr + Apache Hadoop et al Real-world examples Q&A Problem Solving
More informationData Architectures in Azure for Analytics & Big Data
Data Architectures in for Analytics & Big Data October 20, 2018 Melissa Coates Solution Architect, BlueGranite Microsoft Data Platform MVP Blog: www.sqlchick.com Twitter: @sqlchick Data Architecture A
More informationLambda Architecture for Batch and Stream Processing. October 2018
Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.
More informationCapture Business Opportunities from Systems of Record and Systems of Innovation
Capture Business Opportunities from Systems of Record and Systems of Innovation Amit Satoor, SAP March Hartz, SAP PUBLIC Big Data transformation powers digital innovation system Relevant nuggets of information
More informationBasic Concepts of the Energy Lab 2.0 Co-Simulation Platform
Basic Concepts of the Energy Lab 2.0 Co-Simulation Platform Jianlei Liu KIT Institute for Applied Computer Science (Prof. Dr. Veit Hagenmeyer) KIT University of the State of Baden-Wuerttemberg and National
More informationovirt and Docker Integration
ovirt and Docker Integration October 2014 Federico Simoncelli Principal Software Engineer Red Hat 1 Agenda Deploying an Application (Old-Fashion and Docker) Ecosystem: Kubernetes and Project Atomic Current
More information