Challenges in Data Stream Processing
|
|
- Suzanna Cook
- 5 years ago
- Views:
Transcription
1 Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini
2 Challenge 1: Optimize the DSP application Apply some transformation to streaming graph At design time or run-time Operator reordering To avoid unnecessary data transfers A B B A Redundancy elimination B C C A A B B D D 1
3 Challenge 1: Optimize the DSP application Operator separation A A1 A2 Fusion A B AB 2
4 Challenge 2: Place the operators Operator placement decision: a complex problem Trade communication cost against resource utilization When Initial (static) operator placement Can be more expensive and comprehensive Can also be at run-time Move only relocatable operators Require operator migration We will focus on this issue in a next lesson 3
5 Challenge 3: Manage load variations Typical stream processing workloads are: with high volume and high rates bursty and with workload spikes not known in advance Twitter in 2013: rate of tweets per second = 5700 but significant peak of 144,000 tweets per second 4
6 Challenge 3: Manage load variations Possible approaches: Admission control Static reservation Reserve specific resources in advance Cons: over-provisioning and cost increase Apply dynamic techniques such as load shedding Selectively drop tuples at strategic points (e.g., when CPU usage exceeds a specific limit) Cons: sacrifice accuracy and completeness A Shedder A 5
7 Challenge 3: Manage load variations Possible approaches (continued): Use adaptive rate allocation E.g., backpressure : the upstream operator that precedes the bottleneck stores data in an internal buffer to reduce the pressure; backpressure recursively propagates up to the source operators Redistribute load, e.g., determine new operator placement and relocate operators on computing nodes Cons: available resources could be insufficient What else? 6
8 Exploit data parallelism Alternative solution: Detect bottleneck Use data-parallelism (aka operator fission) Apply SIMD paradigm: concurrent execution of multiple replicas of the same operator on different data portions By hand: possible, but cumbersome A A Split A Merge A 7
9 Elastic stream processing Exploit elasticity: acquire and release resources when needed Where? At application layer (i.e., data parallelism) Scale out (or scale in) operators by adding (removing) operator replicas Activate (or deactivate) already replicated operators At infrastructure layer Scale out (or scale in) computing nodes 8
10 Elastic stream processing When and how to scale? Open issues that deserve investigation Some simple example: When: threshold-based How: add/remove one replica at time, but where to place it? Be careful: elasticity overhead is not zero! In most streaming systems: run a new placement decision to take the new replicas into account Dynamic scaling impacts stateful operators 9
11 Challenge 4: Self-adapt at run-time To cope with highly dynamic operative environment Unpredictable workload Computational characteristics of operators not known a-priori Need to sustained load for long provisioning times Node availability, network congestion, Exploit run-time adaptation capabilities of DSP systems What adaption actions? Scale the number of operator instances, relocate the operators, 10
12 Self-adaptive deployment MAPE (Monitor, Analyze, Plan and Execute) Plan phase: how to reconfigure the application deployment 11
13 Distributed Storm We developed an extension of Storm Goals: to provide distributed monitoring distributed placement and adaptation capabilities Where: large-scale environment Code available on GitHub matnar.github.io/uniroma2-storm/ V. Cardellini, V. Grassi, F. Lo Presti, M. Nardelli, Distributed QoS-aware scheduling in Storm, ACM DEBS
14 Distributed Storm architecture 13
15 Distributed Storm: monitoring QoSMonitor (for each worker node) Estimate network latencies Use a network coordinate system Vivaldi s algorithm: decentralized and gossip-based Monitor QoS attributes Node utilization and availability Worker Monitor (for each worker process) Monitor exchanged data rate among the operators 14
16 Distributed Storm: performance Load spike on a subset of nodes ~50% 15
17 Reconfiguration challenges Reconfiguring the deployment has a non negligible cost! Can affect negatively application performance in the short term Application freezing times caused by operator migration and scaling, especially for stateful operators Perform reconfiguration only when needed Take into account the overhead for migrating and scaling the operators 16
18 Challenge 5: stateful operators State complicates things 1. Dynamic scaling 2. Operator re-placement 3. Recovery from failure impact state Loss of state! 17
19 Approaches for stateful migration Most of streaming systems do not support stateful processing and migration (e.g., Storm) Developers manage state Typically combine with external system to store state Design complexity Requirements for stateful operatior migration Safety (i.e., to preserve the consistency of the operations) Application transparency Minimal footprint 18
20 Stateful operator migration Two approaches: Pause-and-resume Parallel-track Pause-and-resume approach Terminate migrating task and start it on new node Stop migrating task Save state Restore state Resume stream processing 19
21 Stateful operator migration Pause-and-resume drawback Peak in the application latency during the migration Parallel-track approach Old and new operator instances run concurrently until the state of both is synchronized and the new instance can safely take over Drawback: requires enhanced mechanisms No clear winner 20
22 Issues for stateful migration How to identify the portion of state to migrate? Expose an API to let the user manually manage the state Support only partitioned stateful operators Partitioned stateful operators store independent state for each sub-stream identified by a partitioning key Automatically determine, on the basis of a partitioning key, the optimal number of state partitions to be used and migrate 21
23 Issues for stateful migration How to balance the load among multiple stateful replicas? Can use consistent hashing Can use partial key grouping Uses two hash functions where a key can be sent to two different replicas instead of one Only available in research prototypes 22
24 Elastic stateful migration in Storm We developed mechanisms for elastic stateful migration in Storm worker process worker process worker slot worker process worker slot worker slot worker process worker process worker process worker process worker process worker slot DDS DDS DDS DDS Supervisor Supervisor Supervisor Supervisor Network Nimbus ElasticityManager scheduler MigrationNotifier ZooKeeper V. Cardellini, M. Nardelli, D. Luzi, "Elastic stateful stream processing in Storm", HPCS
25 Elastic stateful migration in Storm Scaling decisions at the framework level Adapt the number of parallel instances for each application operator Simple threshold-based scaling policy Relocate the operator internal state on a different node and enable Storm to change the application deployment at run-time DDS first synchronization barrier DDS second synchronization barrier MIGRATION NOTIFIED MIGRATION MODE SAVE STATE new task MIGRATION MODE RESTORE STATE (if any) OPERATIONAL MODE time the migrating task can be terminated streams are resumed 24
26 Performance results DSP app: frequent pattern detection 1600 Elastic scaling and stateful migration improves the application latency 120 tweets/s 350 tweets/s 900 tweets/s 250 tweets/s 120 tweets/s tweets/s 350 tweets/s 900 tweets/s 250 tweets/s 120 tweets/s Application Latency (ms) Data rate Scaling Scheduling with E+SM w/o E+SM Number of Executors Data rate Scaling Scheduling with E+SM Time (s) Time (s) 25
27 Challenge 6: guarantee fault tolerance DSP applications run for long time intervals Possible solutions: Active replication Check-pointing Replay logs Hybrid solutions failures are unavoidable Having different trade-offs between runtime cost in absence of failures and recovery cost Large-scale complicates things Network partitions and CAP theorem 26
28 References M. Hirzel, R. Soulé, S. Schneider, B. Gedik, R. Grimm, A catalog of stream processing optimizations, ACM Comput. Surv., T. Heinze T, L. Aniello, L. Querzoni, Z. Jerzak, Cloud-based data stream processing, Proc. ACM DEBS V. Cardellini, V. Grassi, F. Lo Presti, M. Nardelli, Distributed QoS-aware scheduling in Storm, Proc. ACM DEBS V. Cardellini, M. Nardelli, D. Luzi, Elastic stateful stream processing in Storm, Proc. HPCS B. Gedik, S. Schneider, M. Hirzel, and K.-L. Wu, Elastic scaling for data stream processing, IEEE Trans. Parallel Distrib. Syst. 25, 6,
NewSQL Databases. The reference Big Data stack
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica NewSQL Databases Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference
More informationApache Storm: Hands-on Session A.A. 2016/17
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Apache Storm: Hands-on Session A.A. 2016/17 Matteo Nardelli Laurea Magistrale in Ingegneria Informatica
More informationData Acquisition. The reference Big Data stack
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference
More informationKafka Streams: Hands-on Session A.A. 2017/18
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Kafka Streams: Hands-on Session A.A. 2017/18 Matteo Nardelli Laurea Magistrale in Ingegneria Informatica
More informationFog Computing. The scenario
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Fog Computing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The scenario
More informationSearch Engines and Time Series Databases
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Search Engines and Time Series Databases Corso di Sistemi e Architetture per Big Data A.A. 2017/18
More informationData Acquisition. The reference Big Data stack
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference
More informationSearch and Time Series Databases
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Search and Time Series Databases Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria
More informationViper: Communication-Layer Determinism and Scaling in Low-Latency Stream Processing
Viper: Communication-Layer Determinism and Scaling in Low-Latency Stream Processing Ivan Walulya, Yiannis Nikolakopoulos, Vincenzo Gulisano Marina Papatriantafilou and Philippas Tsigas Auto-DaSP 2017 Chalmers
More informationMulti-Level Elasticity for Wide-Area Data Streaming Systems: A Reinforcement Learning Approach
Article Multi-Level Elasticity for Wide-Area Data Streaming Systems: A Reinforcement Learning Approach Gabriele Russo Russo ID, Matteo Nardelli ID, Valeria Cardellini * ID and Francesco Lo Presti ID Department
More informationIntroduction to Big Data
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Introduction to Big Data Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini
More informationFROM PEER TO PEER...
FROM PEER TO PEER... Dipartimento di Informatica, Università degli Studi di Pisa HPC LAB, ISTI CNR Pisa in collaboration with: Alessandro Lulli, Emanuele Carlini, Massimo Coppola, Patrizio Dazzi 2 nd HPC
More informationResearch Faculty Summit Systems Fueling future disruptions
Research Faculty Summit 2018 Systems Fueling future disruptions Elevating the Edge to be a Peer of the Cloud Kishore Ramachandran Embedded Pervasive Lab, Georgia Tech August 2, 2018 Acknowledgements Enrique
More informationSTORM AND LOW-LATENCY PROCESSING.
STORM AND LOW-LATENCY PROCESSING Low latency processing Similar to data stream processing, but with a twist Data is streaming into the system (from a database, or a netk stream, or an HDFS file, or ) We
More informationCloud Computing Architecture
Cloud Computing Architecture 1 Contents Workload distribution architecture Dynamic scalability architecture Cloud bursting architecture Elastic disk provisioning architecture Redundant storage architecture
More informationLiquid Stream Processing across Web browsers and Web servers
Liquid Stream Processing across Web browsers and Web servers Masiar Babazadeh, Andrea Gallidabino, and Cesare Pautasso Faculty of Informatics, University of Lugano (USI), Switzerland {name.surname}@usi.ch
More informationHuge market -- essentially all high performance databases work this way
11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch
More informationAn Empirical Study of High Availability in Stream Processing Systems
An Empirical Study of High Availability in Stream Processing Systems Yu Gu, Zhe Zhang, Fan Ye, Hao Yang, Minkyong Kim, Hui Lei, Zhen Liu Stream Processing Model software operators (PEs) Ω Unexpected machine
More informationData Stream Processing in the Cloud
Department of Computing Data Stream Processing in the Cloud Evangelia Kalyvianaki ekalyv@imperial.ac.uk joint work with Raul Castro Fernandez, Marco Fiscato, Matteo Migliavacca and Peter Pietzuch Peter
More informationIntroduction to Data Intensive Computing
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Introduction to Data Intensive Computing Corso di Sistemi Distribuiti e Cloud Computing A.A. 2017/18
More informationScalable Streaming Analytics
Scalable Streaming Analytics KARTHIK RAMASAMY @karthikz TALK OUTLINE BEGIN I! II ( III b Overview Storm Overview Storm Internals IV Z V K Heron Operational Experiences END WHAT IS ANALYTICS? according
More informationTwitter Heron: Stream Processing at Scale
Twitter Heron: Stream Processing at Scale Saiyam Kohli December 8th, 2016 CIS 611 Research Paper Presentation -Sun Sunnie Chung TWITTER IS A REAL TIME ABSTRACT We process billions of events on Twitter
More informationToward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store. Wei Xie TTU CS Department Seminar, 3/7/2017
Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store Wei Xie TTU CS Department Seminar, 3/7/2017 1 Outline General introduction Study 1: Elastic Consistent Hashing based Store
More informationFLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568
FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected
More informationMap-Reduce. Marco Mura 2010 March, 31th
Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of
More informationAn Introduction to Virtualization and Cloud Technologies to Support Grid Computing
New Paradigms: Clouds, Virtualization and Co. EGEE08, Istanbul, September 25, 2008 An Introduction to Virtualization and Cloud Technologies to Support Grid Computing Distributed Systems Architecture Research
More informationECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Piccolo: Building Fast, Distributed Programs
More informationP2P Applications. Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza Canale A-L Prof.ssa Chiara Petrioli
P2P Applications Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza Canale A-L Prof.ssa Chiara Petrioli Server-based Network Peer-to-peer networks A type of network
More informationMapReduce, Hadoop and Spark. Bompotas Agorakis
MapReduce, Hadoop and Spark Bompotas Agorakis Big Data Processing Most of the computations are conceptually straightforward on a single machine but the volume of data is HUGE Need to use many (1.000s)
More informationE-Storm: Replication-based State Management in Distributed Stream Processing Systems
E-Storm: -based State Management in Distributed Stream Processing Systems Xunyun Liu, Aaron Harwood, Shanika Karunasekera, Benjamin Rubinstein and Rajkumar Buyya Cloud Computing and Distributed Systems
More informationMark Sandstrom ThroughPuter, Inc.
Hardware Implemented Scheduler, Placer, Inter-Task Communications and IO System Functions for Many Processors Dynamically Shared among Multiple Applications Mark Sandstrom ThroughPuter, Inc mark@throughputercom
More informationLecture 16: Data Center Network Architectures
MIT 6.829: Computer Networks Fall 2017 Lecture 16: Data Center Network Architectures Scribe: Alex Lombardi, Danielle Olson, Nicholas Selby 1 Background on Data Centers Computing, storage, and networking
More informationThe Google File System
October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single
More informationStreaming & Apache Storm
Streaming & Apache Storm Recommended Text: Storm Applied Sean T. Allen, Matthew Jankowski, Peter Pathirana Manning 2010 VMware Inc. All rights reserved Big Data! Volume! Velocity Data flowing into the
More informationCloneCloud: Elastic Execution between Mobile Device and Cloud, Chun et al.
CloneCloud: Elastic Execution between Mobile Device and Cloud, Chun et al. Noah Apthorpe Department of Computer Science Princeton University October 14th, 2015 Noah Apthorpe CloneCloud 1/16 Motivation
More informationCS 655 Advanced Topics in Distributed Systems
Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system
More informationA two-level distributed architecture for Web content adaptation and delivery
A two-level distributed architecture for Web content adaptation and delivery Claudia Canali University of Parma Valeria Cardellini University of Rome Tor vergata Michele Colajanni University of Modena
More informationTutorial: Apache Storm
Indian Institute of Science Bangalore, India भ रत य वज ञ न स स थ न ब गल र, भ रत Department of Computational and Data Sciences DS256:Jan17 (3:1) Tutorial: Apache Storm Anshu Shukla 16 Feb, 2017 Yogesh Simmhan
More informationPutting it together. Data-Parallel Computation. Ex: Word count using partial aggregation. Big Data Processing. COS 418: Distributed Systems Lecture 21
Big Processing -Parallel Computation COS 418: Distributed Systems Lecture 21 Michael Freedman 2 Ex: Word count using partial aggregation Putting it together 1. Compute word counts from individual files
More informationBig Data Hadoop Course Content
Big Data Hadoop Course Content Topics covered in the training Introduction to Linux and Big Data Virtual Machine ( VM) Introduction/ Installation of VirtualBox and the Big Data VM Introduction to Linux
More informationTuning Browser-to-Browser Offloading for Heterogeneous Stream Processing Web Applications
Tuning Browser-to-Browser Offloading for Heterogeneous Stream Processing Web Applications Masiar Babazadeh Faculty of Informatics, University of Lugano (USI), Switzerland {name.surname@usi.ch} Abstract.
More informationQuobyte The Data Center File System QUOBYTE INC.
Quobyte The Data Center File System QUOBYTE INC. The Quobyte Data Center File System All Workloads Consolidate all application silos into a unified highperformance file, block, and object storage (POSIX
More information02 - Distributed Systems
02 - Distributed Systems Definition Coulouris 1 (Dis)advantages Coulouris 2 Challenges Saltzer_84.pdf Models Physical Architectural Fundamental 2/58 Definition Distributed Systems Distributed System is
More information02 - Distributed Systems
02 - Distributed Systems Definition Coulouris 1 (Dis)advantages Coulouris 2 Challenges Saltzer_84.pdf Models Physical Architectural Fundamental 2/60 Definition Distributed Systems Distributed System is
More informationContainer-based virtualization: Docker
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Container-based virtualization: Docker Corso di Sistemi Distribuiti e Cloud Computing A.A. 2018/19
More informationChina Big Data and HPC Initiatives Overview. Xuanhua Shi
China Big Data and HPC Initiatives Overview Xuanhua Shi Services Computing Technology and System Laboratory Big Data Technology and System Laboratory Cluster and Grid Computing Laboratory Huazhong University
More informationGFS: The Google File System. Dr. Yingwu Zhu
GFS: The Google File System Dr. Yingwu Zhu Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one big CPU More storage, CPU required than one PC can
More informationAmazon Web Services. Amazon Web Services
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Amazon Web Services Corso di Sistemi Distribuiti e Cloud Computing A.A. 2013/14 Valeria Cardellini
More informationDYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun
DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE Presented by Byungjin Jun 1 What is Dynamo for? Highly available key-value storages system Simple primary-key only interface Scalable and Reliable Tradeoff:
More informationSizing Guidelines and Performance Tuning for Intelligent Streaming
Sizing Guidelines and Performance Tuning for Intelligent Streaming Copyright Informatica LLC 2017. Informatica and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the
More informationBig Data Infrastructures & Technologies
Big Data Infrastructures & Technologies Data streams and low latency processing DATA STREAM BASICS What is a data stream? Large data volume, likely structured, arriving at a very high rate Potentially
More informationADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT
ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision
More informationIN5050: Programming heterogeneous multi-core processors Thinking Parallel
IN5050: Programming heterogeneous multi-core processors Thinking Parallel 28/8-2018 Designing and Building Parallel Programs Ian Foster s framework proposal develop intuition as to what constitutes a good
More informationIncreasing Cloud Power Efficiency through Consolidation Techniques
Increasing Cloud Power Efficiency through Consolidation Techniques Antonio Corradi, Mario Fanelli, Luca Foschini Dipartimento di Elettronica, Informatica e Sistemistica (DEIS) University of Bologna, Italy
More informationSelf Regulating Stream Processing in Heron
Self Regulating Stream Processing in Heron Huijun Wu 2017.12 Huijun Wu Twitter, Inc. Infrastructure, Data Platform, Real-Time Compute Heron Overview Recent Improvements Self Regulating Challenges Dhalion
More informationHadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017
Hadoop File System 1 S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y Moving Computation is Cheaper than Moving Data Motivation: Big Data! What is BigData? - Google
More informationCloud e Datacenter Networking
Cloud e Datacenter Networking Università degli Studi di Napoli Federico II Dipartimento di Ingegneria Elettrica e delle Tecnologie dell Informazione DIETI Laurea Magistrale in Ingegneria Informatica Prof.
More informationCMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB s C. Faloutsos A. Pavlo Lecture#23: Distributed Database Systems (R&G ch. 22) Administrivia Final Exam Who: You What: R&G Chapters 15-22
More informationDATABASES AND THE CLOUD. Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland
DATABASES AND THE CLOUD Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland AVALOQ Conference Zürich June 2011 Systems Group www.systems.ethz.ch Enterprise Computing Center
More informationCLUSTERING HIVEMQ. Building highly available, horizontally scalable MQTT Broker Clusters
CLUSTERING HIVEMQ Building highly available, horizontally scalable MQTT Broker Clusters 12/2016 About this document MQTT is based on a publish/subscribe architecture that decouples MQTT clients and uses
More informationDistributed Systems (5DV147)
Distributed Systems (5DV147) Replication and consistency Fall 2013 1 Replication 2 What is replication? Introduction Make different copies of data ensuring that all copies are identical Immutable data
More informationFlying Faster with Heron
Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron TALK OUTLINE BEGIN I! II ( III b OVERVIEW MOTIVATION HERON IV Z OPERATIONAL EXPERIENCES V K HERON PERFORMANCE END [! OVERVIEW TWITTER IS
More informationData Analytics with HPC. Data Streaming
Data Analytics with HPC Data Streaming Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationThe Fusion Distributed File System
Slide 1 / 44 The Fusion Distributed File System Dongfang Zhao February 2015 Slide 2 / 44 Outline Introduction FusionFS System Architecture Metadata Management Data Movement Implementation Details Unique
More informationAGREEMENT PROTOCOLS. Paxos -a family of protocols for solving consensus
AGREEMENT PROTOCOLS Paxos -a family of protocols for solving consensus OUTLINE History of the Paxos algorithm Paxos Algorithm Family Implementation in existing systems References HISTORY OF THE PAXOS ALGORITHM
More informationPrinciples of Parallel Algorithm Design: Concurrency and Mapping
Principles of Parallel Algorithm Design: Concurrency and Mapping John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 3 17 January 2017 Last Thursday
More informationB.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2
Introduction :- Today single CPU based architecture is not capable enough for the modern database that are required to handle more demanding and complex requirements of the users, for example, high performance,
More informationTyphoon: An SDN Enhanced Real-Time Big Data Streaming Framework
Typhoon: An SDN Enhanced Real-Time Big Data Streaming Framework Junguk Cho, Hyunseok Chang, Sarit Mukherjee, T.V. Lakshman, and Jacobus Van der Merwe 1 Big Data Era Big data analysis is increasingly common
More informationNo compromises: distributed transactions with consistency, availability, and performance
No compromises: distributed transactions with consistency, availability, and performance Aleksandar Dragojevi c, Dushyanth Narayanan, Edmund B. Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam,
More informationRocksteady: Fast Migration for Low-Latency In-memory Storage. Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman
Rocksteady: Fast Migration for Low-Latency In-memory Storage Chinmay Kulkarni, niraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman 1 Introduction Distributed low-latency in-memory key-value stores are
More informationMapReduce and Hadoop
Università degli Studi di Roma Tor Vergata MapReduce and Hadoop Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference Big Data stack High-level Interfaces Data Processing
More informationUMP Alert Engine. Status. Requirements
UMP Alert Engine Status Requirements Goal Terms Proposed Design High Level Diagram Alert Engine Topology Stream Receiver Stream Router Policy Evaluator Alert Publisher Alert Topology Detail Diagram Alert
More informationA Scalable and Highly Available Brokering Service for SLA-Based Composite Services
A Scalable and Highly Available Brokering Service for SLA-Based Composite Services Alessandro Bellucci, Valeria Cardellini, Valerio Di Valerio, and Stefano Iannucci Università di Roma Tor Vergata, Viale
More informationCA ERwin Data Modeler s Role in the Relational Cloud. Nuccio Piscopo.
CA ERwin Data Modeler s Role in the Relational Cloud Nuccio Piscopo Table of Contents Abstract.....3 Introduction........3 Daas requirements through CA ERwin Data Modeler..3 CA ERwin in the Relational
More informationDatacenter replication solution with quasardb
Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION
More informationEverything You Need to Know About MySQL Group Replication
Everything You Need to Know About MySQL Group Replication Luís Soares (luis.soares@oracle.com) Principal Software Engineer, MySQL Replication Lead Copyright 2017, Oracle and/or its affiliates. All rights
More informationApache Flink. Alessandro Margara
Apache Flink Alessandro Margara alessandro.margara@polimi.it http://home.deib.polimi.it/margara Recap: scenario Big Data Volume and velocity Process large volumes of data possibly produced at high rate
More information10. Replication. Motivation
10. Replication Page 1 10. Replication Motivation Reliable and high-performance computation on a single instance of a data object is prone to failure. Replicate data to overcome single points of failure
More informationIntroducing the Cray XMT. Petr Konecny May 4 th 2007
Introducing the Cray XMT Petr Konecny May 4 th 2007 Agenda Origins of the Cray XMT Cray XMT system architecture Cray XT infrastructure Cray Threadstorm processor Shared memory programming model Benefits/drawbacks/solutions
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationSystematic Cooperation in P2P Grids
29th October 2008 Cyril Briquet Doctoral Dissertation in Computing Science Department of EE & CS (Montefiore Institute) University of Liège, Belgium Application class: Bags of Tasks Bag of Task = set of
More informationChapter 3. Design of Grid Scheduler. 3.1 Introduction
Chapter 3 Design of Grid Scheduler The scheduler component of the grid is responsible to prepare the job ques for grid resources. The research in design of grid schedulers has given various topologies
More informationAdaptive Cluster Computing using JavaSpaces
Adaptive Cluster Computing using JavaSpaces Jyoti Batheja and Manish Parashar The Applied Software Systems Lab. ECE Department, Rutgers University Outline Background Introduction Related Work Summary of
More informationGoogle File System. Arun Sundaram Operating Systems
Arun Sundaram Operating Systems 1 Assumptions GFS built with commodity hardware GFS stores a modest number of large files A few million files, each typically 100MB or larger (Multi-GB files are common)
More informationTIBCO StreamBase 10 Distributed Computing and High Availability. November 2017
TIBCO StreamBase 10 Distributed Computing and High Availability November 2017 Distributed Computing Distributed Computing location transparent objects and method invocation allowing transparent horizontal
More informationOracle Database 11g: Real Application Testing & Manageability Overview
Oracle Database 11g: Real Application Testing & Manageability Overview Top 3 DBA Activities Performance Management Challenge: Sustain Optimal Performance Change Management Challenge: Preserve Order amid
More informationAdaptive Resync in vsan 6.7 First Published On: Last Updated On:
First Published On: 04-26-2018 Last Updated On: 05-02-2018 1 Table of Contents 1. Overview 1.1.Executive Summary 1.2.vSAN's Approach to Data Placement and Management 1.3.Adaptive Resync 1.4.Results 1.5.Conclusion
More informationMapReduce. U of Toronto, 2014
MapReduce U of Toronto, 2014 http://www.google.org/flutrends/ca/ (2012) Average Searches Per Day: 5,134,000,000 2 Motivation Process lots of data Google processed about 24 petabytes of data per day in
More informationAuto Management for Apache Kafka and Distributed Stateful System in General
Auto Management for Apache Kafka and Distributed Stateful System in General Jiangjie (Becket) Qin Data Infrastructure @LinkedIn GIAC 2017, 12/23/17@Shanghai Agenda Kafka introduction and terminologies
More informationREAL-TIME ANALYTICS WITH APACHE STORM
REAL-TIME ANALYTICS WITH APACHE STORM Mevlut Demir PhD Student IN TODAY S TALK 1- Problem Formulation 2- A Real-Time Framework and Its Components with an existing applications 3- Proposed Framework 4-
More informationCS Amazon Dynamo
CS 5450 Amazon Dynamo Amazon s Architecture Dynamo The platform for Amazon's e-commerce services: shopping chart, best seller list, produce catalog, promotional items etc. A highly available, distributed
More informationVendor: EMC. Exam Code: E Exam Name: Cloud Infrastructure and Services Exam. Version: Demo
Vendor: EMC Exam Code: E20-002 Exam Name: Cloud Infrastructure and Services Exam Version: Demo QUESTION NO: 1 In which Cloud deployment model would an organization see operational expenditures grow in
More informationRequirements, Partitioning, paging, and segmentation
Requirements, Partitioning, paging, and segmentation Main Memory: The Big Picture kernel memory proc struct kernel stack/u area Stack kernel stack/u area Stack kernel stack/u area Stack Data Text (shared)
More informationRiak. Distributed, replicated, highly available
INTRO TO RIAK Riak Overview Riak Distributed Riak Distributed, replicated, highly available Riak Distributed, highly available, eventually consistent Riak Distributed, highly available, eventually consistent,
More informationDeep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services
Deep Dive Amazon Kinesis Ian Meyers, Principal Solution Architect - Amazon Web Services Analytics Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
More information416 Distributed Systems. Distributed File Systems 4 Jan 23, 2017
416 Distributed Systems Distributed File Systems 4 Jan 23, 2017 1 Today's Lecture Wrap up NFS/AFS This lecture: other types of DFS Coda disconnected operation 2 Key Lessons Distributed filesystems almost
More informationTrade- Offs in Cloud Storage Architecture. Stefan Tai
Trade- Offs in Cloud Storage Architecture Stefan Tai Cloud computing is about providing and consuming resources as services There are five essential characteristics of cloud services [NIST] [NIST]: http://csrc.nist.gov/groups/sns/cloud-
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL
More informationDynamic Graph Query Support for SDN Management. Ramya Raghavendra IBM TJ Watson Research Center
Dynamic Graph Query Support for SDN Management Ramya Raghavendra IBM TJ Watson Research Center Roadmap SDN scenario 1: Cloud provisioning Management/Analytics primitives Current Cloud Offerings Limited
More information