Assessing the Impact of Network Bandwidth and Operator Placement on Data Stream Processing for Edge Computing Environments


Alexandre Veith, Marcos Dias de Assunção, Laurent Lefèvre
Inria Avalon, LIP, ENS Lyon, University of Lyon
46 allée d'Italie, Lyon - France
alexandre.veith@ens-lyon.fr

Abstract

A substantial part of the big data generated today is received in near real time and must be promptly processed. Cloud-based architectures for data stream processing comprise multiple software modules or frameworks for data collection, message queueing, and stream processing itself. This modular approach allows each component to grow independently of the others and to accommodate changes, but it may increase the end-to-end latency when data events are processed in the cloud. Recent solutions intend to explore the edges of the Internet (i.e. edge computing) to perform certain data processing tasks and hence better utilise network resources. This work evaluates the impact of network bandwidth when employing frameworks that are commonly used to build cloud and edge-based stream processing solutions.

Keywords: data stream processing, metadata, performance analysis, distributed systems, cloud computing

1. Introduction

Today's instruments and services are producing ever-increasing amounts of data that require processing and analysis in order to provide insights or assist in decision making. This data deluge, often called big data, poses challenges to existing infrastructure regarding data transfer, storage, and processing. Although application models such as MapReduce have been very popular for batch processing, much of the data generated today is received in near real time and requires quick analysis. In the Internet of Things (IoT) [5, 12], for instance, continuous data streams produced by multiple sources must be handled under very short delays.

The interest in processing data events as they arrive (i.e. online) has led to the emergence of several Distributed Stream Processing Engines (DSPEs) such as Apache Storm, Spark, Flink and S4. Under many frameworks, a stream processing application is a Directed Acyclic Graph (DAG) whose vertices are operators that execute a function over the incoming data and whose edges define how data flows between them.

This work was performed within the framework of the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program Investissements d'Avenir (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR).

Clouds are often the target infrastructure for deploying such engines due to their scalability, pay-as-you-go business model and resource elasticity. DSPEs are generally part of a larger architecture that comprises multiple tiers of data collection and processing interconnected by message brokers and queuing systems such as Apache ActiveMQ [1] and RabbitMQ [3], and publish-subscribe solutions including Apache Kafka [2]. This modular design enables tiers to grow at different paces and accommodate changes, but can increase the end-to-end latency and communication cost when treating data events.

More recent solutions intend to exploit the edges of the Internet (i.e. edge computing) for performing certain data processing tasks and hence: reduce the end-to-end latency and communication costs, enable services to react to events locally, or offload processing from the cloud [7]. However, the architectures and software frameworks for these environments, which comprise clouds and micro data centres located at the Internet edges, are still evolving.

In this work, we evaluate the impact of, and opportunities for, deploying stream processing applications using cloud and edge computing. Our study considers DAGs that span both cloud and edge infrastructure. We are interested in demonstrating the benefits of splitting a stream processing graph and spreading operators across available resources.

2. Proposed Architecture

This work considers multiple geographically distributed infrastructures for stream processing in big data environments, as depicted in Figure 1. A data stream flow usually has a source (i.e. a resource that creates or collects the data), operators that perform transformations on the data (e.g. filtering, mapping and aggregation) and a sink (i.e. the destination of the data). In a traditional deployment, all the operators of a data streaming application are placed in the cloud to benefit from virtually unlimited resources.
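The source, operator and sink structure described above can be sketched as a small graph with a hosting site assigned to each vertex. This is a minimal illustration under stated assumptions, not the authors' implementation: the `StreamGraph` class, the `crossing_edges` helper and the site labels "edge" and "cloud" are hypothetical names introduced here, and placing the source at the edge is an assumption. The operator names follow the examples given in the text (filtering, mapping, aggregation).

```python
from dataclasses import dataclass, field


@dataclass
class StreamGraph:
    """A data stream flow as a DAG: vertices are the source, operators and sink."""
    edges: list = field(default_factory=list)  # (upstream, downstream) pairs

    def connect(self, upstream, downstream):
        """Add a directed edge and return self so calls can be chained."""
        self.edges.append((upstream, downstream))
        return self

    def vertices(self):
        """Return every vertex once, in order of first appearance."""
        seen = []
        for upstream, downstream in self.edges:
            for vertex in (upstream, downstream):
                if vertex not in seen:
                    seen.append(vertex)
        return seen


def crossing_edges(graph, placement):
    """Return the edges whose two endpoints are hosted at different sites."""
    return [(u, d) for u, d in graph.edges if placement[u] != placement[d]]


# Hypothetical linear flow: source -> filter -> map -> aggregate -> sink.
graph = (StreamGraph()
         .connect("source", "filter")
         .connect("filter", "map")
         .connect("map", "aggregate")
         .connect("aggregate", "sink"))

# Traditional deployment: every operator and the sink run in the cloud.
# Assumption: the source itself sits at the edge, where data is produced.
placement = {vertex: "cloud" for vertex in graph.vertices()}
placement["source"] = "edge"
```

With this traditional placement, `crossing_edges(graph, placement)` contains only the source-to-filter edge, so every raw event crosses the edge-to-cloud link before any operator has processed it.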
Initially designed to minimise the latency of content delivered to users of mobile devices, edge computing has become an attractive solution for performing certain stream processing operations. By leveraging idle edge resources (i.e. sensors and gateways), computations can be performed closer to where the data is generated. Our effort relates to improving resource utilisation and data movement, which in stream processing can be achieved by deploying data stream operators along the physical path between source and sink. Instead of hosting all the data stream operators required by a stream processing solution in the cloud itself, we consider a more decentralised approach where the application is decoupled and hosted at multiple geographical locations. However, in this kind of environment, several variables (e.g. computational power, network bandwidth, network latency, application constraints, network topology) have a strong impact on the placement decisions and make the problem more complex [11].

To optimise such an environment and deal with stringent stream processing requirements (i.e., events often being handled in the order of seconds or milliseconds), we present an architecture (Figure 2) to optimise the placement of a DAG's elements. In our model, a set of computational resources forms a group based on resource proximity, which can translate to the number of hops and the available bandwidth between the hosts. The resource roles are described as follows:

Orchestrator node: Performs global decisions, and stores information about the tasks (i.e., DAG topology) and available containers (i.e. environments prepared to receive and process tasks). The global decisions involve task and container placement, as well as system monitoring. The placement considers information about the tasks, the network and the resources.

Master node: Makes local decisions (i.e., decisions within the group) and monitors the group. The local decisions refer to small adjustments to the local task placement.

Worker node: Hosts the operators and executes functions over the data streams, and/or stores data after processing.

FIGURE 1: Physical infrastructure representing the flow between data sources and sinks.

This work focuses on the communication between worker nodes. Data flows between operators through queue consumption: a worker gets its data from another worker's queue or from a source queue. The Orchestrator or the Master node determines which queue each worker consumes from.

By distributing data stream operators, we expect to minimise the latency, minimise the amount of data transferred over the wire (e.g. fewer headers, less serialisation, etc.), and reduce the impact of the external network environment (i.e. the Internet). We aim to evaluate multiple combinations of operators and their impact on resource utilisation and end-to-end latency in decoupled scenarios. A decoupled scenario (i.e., one using both cloud and edge resources) is considered because it reduces the amount of data transferred at different phases, avoiding network restrictions that might exist at certain points of a path from the edge to the cloud. Moreover, we are interested in evaluating the use of components traditionally employed for cluster/cloud-based stream processing solutions in more decentralised environments such as edge computing, where individual components may be hosted at a micro data centre or on constrained resources geographically close to data sources, whereas other services can run in the cloud.

3. Experimental Setup and Results

This section describes the environment setup and the results of a preliminary evaluation that demonstrates the impact of deploying operators on cloud and edge resources.

FIGURE 2: Proposed architecture to improve resource utilisation and throughput.

Experimental Setup

The experiments comprise an empirical evaluation performed on a cluster with four Dell R410 servers (Intel® Xeon® Processor E5506, 4M Cache, 2.13 GHz, 4.80 GT/s Intel® QPI). The clocks of all hosts are synchronised using the Network Time Protocol (NTP). The physical infrastructure shown in Figure 4 introduces the following roles:

Data Sources: generate data and send it to the message queues. The data is drawn from an internal dataset with 1 GB of tweets that is processed by a sentiment analysis application described later. The tweets are sent by an application built in-house that receives several parameters (i.e., inter-arrival time between tweets and number of processes) to deploy and stress the infrastructure. For this task we use one Dell R410 server.

Gateway: receives the tweets and either treats them or forwards them to a message broker (i.e., bypass). In other words, when one or more operators are deployed in the edge, the messages are stored in a lightweight queue (i.e., Mosquitto) and processed by a lightweight DSPE (i.e., Apache Edgent 1.0.0). Otherwise, the tweets are forwarded to the Cloud without any edge processing. In the physical infrastructure, the Gateway is one Dell R410 server.

Cloud: receives the messages in two distinct queues (i.e., Apache Kafka): (i) sink, for messages already processed by the Gateway; or (ii) source, for tweets that still need some treatment. The DSPE in the Cloud (i.e., Apache Flink 1.2.0) treats the messages from the source queue. The DSPE was set up with two workers on two Dell R410 servers (i.e., one hosting the Flink Manager and a Flink Worker, and another hosting a Flink Worker).

A sentiment analysis application that evaluates the polarity of tweets was used for performance evaluation (Figure 3).
It uses a simple Natural Language Processing (NLP) technique to indicate the polarity of a sentence (i.e., counting positive and negative words and computing the difference). The tweets are JSON dictionaries, each tweet corresponding to an event. Each event is parsed to extract the relevant fields (e.g., tweet ID, language and the message itself). Then, the events are filtered by language, keeping only those that are in English. Next, the Stemmer removes stop words, which do not carry sentiment or are irrelevant for the following steps. After that, an operator counts the number of negative and positive words, producing a positive and a negative score. Finally, the application determines whether the tweet is positive or negative.
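The operator chain described above can be sketched in plain Python as follows. This is only an illustration of the logic under stated assumptions, not the authors' code, which ran on Apache Edgent and Apache Flink. The word lists are tiny hypothetical stand-ins for a real lexicon, the JSON field names `id`, `lang` and `text` are assumed, punctuation stripping is a simplification, and counting a tie as positive is an arbitrary choice made here.

```python
import json

# Hypothetical word lists; the paper does not publish its lexicon.
STOP_WORDS = {"a", "an", "the", "is", "this", "and", "to", "of"}
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "sad"}


def parse(raw):
    """Parser: extract the relevant fields from one JSON-encoded tweet."""
    tweet = json.loads(raw)
    # Assumed field names for tweet ID, language and message.
    return {"id": tweet["id"], "lang": tweet["lang"], "text": tweet["text"]}


def is_english(event):
    """Filter: keep only events whose language is English."""
    return event["lang"] == "en"


def remove_stop_words(event):
    """Stemmer step as described in the text: drop words carrying no sentiment."""
    words = [w.strip(".,!?").lower() for w in event["text"].split()]
    event["words"] = [w for w in words if w and w not in STOP_WORDS]
    return event


def score(event):
    """Count positive and negative words to build the two scores."""
    event["positive"] = sum(w in POSITIVE_WORDS for w in event["words"])
    event["negative"] = sum(w in NEGATIVE_WORDS for w in event["words"])
    return event


def polarity(event):
    """Decide the polarity from the score difference (tie -> positive, assumed)."""
    difference = event["positive"] - event["negative"]
    event["polarity"] = "positive" if difference >= 0 else "negative"
    return event


def run_pipeline(raw_events):
    """Apply the operators in order, one event at a time."""
    for raw in raw_events:
        event = parse(raw)
        if not is_english(event):
            continue
        yield polarity(score(remove_stop_words(event)))
```

Each function corresponds to one operator, so a deployment could cut the chain between any two of them. Note that the events yielded after filtering are fewer than the raw tweets, which is why running the early operators near the source can reduce the data sent over a constrained link.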

FIGURE 3: Operators of the sentiment analysis application.

FIGURE 4: Physical infrastructure and physical plan for deployment scenarios.

The configuration of the different stream processing flows is presented in Figure 4. As shown in the physical plan, the stream operators are deployed on the physical network. Scenario 0 represents all operators deployed in the cloud: the Data Source forwards the messages directly to the Cloud, and the Gateway acts as a bypass. Conversely, under Scenario 6, all operators are deployed in the Gateway, and the Cloud acts as the sink. The remaining scenarios have mixed configurations in which operators are deployed partly in the Cloud and partly at the Edge.

To evaluate the proposed environments and assess the impact of the network bandwidth, the following tools were used: Linux Traffic Control to customise the network bandwidth, and Python psutil to measure resource consumption (CPU utilisation, memory usage and network I/O). Each scenario runs for 7 minutes, given the testbed specification. We disregard the first and last minutes of each experiment to eliminate warm-up and cool-down effects. Each experiment corresponds to a deployment of the operators with a given network bandwidth capacity (i.e., 10, 100, 1000 and [...] Kbps) on the edge-to-cloud network.

Experimental Results

We noticed that variations in network bandwidth and operator placement have a direct impact on the number of treated events. The problem is evident when the edge-to-cloud network capacity is insufficient (10, 100 and 1000 Kbps) to transfer the amount of data, as presented in Figures 5a and 5b. To overcome this problem, we deploy the operators along the path, more specifically at the Gateway (source) and the cloud (sink). As depicted in Figure 1, by placing operators in the edge we achieve the best solution. These results are obtained without much load on the Gateway, as it used only 28% of the CPU on average (Figure 6b) under Scenario 5 with the edge-to-cloud network limited to 1000 Kbps. This usage also respects the edge constraints, as edge devices have less computational power than the cloud.

FIGURE 5: Number of tweets processed and amount of data transferred through the edge-to-cloud network. (a) Number of tweets processed. (b) Amount of data transferred through the edge-to-cloud network.

FIGURE 6: CPU usage comparison. (a) CPU usage of the Cloud. (b) CPU usage of the Gateway.

In contrast, the results change substantially when the network bandwidth is not restrictive. As depicted in Figure 5a, when the bandwidth capacity is greater than 1000 Kbps, the edge becomes an execution bottleneck, because the application exploits only one gateway running a lightweight stream processing framework that lacks the full-fledged features and parallelism provided by Flink, which is used in the cloud.

Although we performed experiments with only one gateway, we argue that this is not the most practical scenario. In actual deployments, we expect scenarios that contain multiple edge resources and several gateways (as in Figure 1) that can be used to offload some processing tasks from the cloud, or multiple paths that have several gateways between the sources and the sink.

The results allowed us to better understand the impact of the placement of data stream operators and the benefits that certain placement configurations can bring. The benefits concern the number of events processed, the resource consumption, and the monetary cost reduction obtained by transferring less data to the cloud. Moreover, the lower utilisation of cloud resources depicted in Figure 6a could be exploited to release unused capacity via auto-scaling operations and hence lower costs further.

4. Related Work

Over the years several frameworks for distributed data stream processing have been proposed, such as Apache Storm, Apache Spark [14] and Apache Flink [4]. In many of these frameworks, applications are structured as directed graphs of operators that execute either predefined functions such as filtering, joins and splitting, or user-defined functions. Most solutions are designed to run in homogeneous clusters, but have also been deployed in cloud environments.

More recently, services are increasingly being deployed in environments that span multiple data centres or on the edges of the Internet (i.e., edge and fog computing). Existing work proposes architectures that place certain stream processing elements on micro data centres closer to where the data is generated [6] or that employ mobile devices for stream processing [10]. Even though these efforts are important, the present work focuses on evaluating the data-source-to-cloud latency when software frameworks that are traditionally deployed in cloud environments are used in more decoupled environments.

Most of the existing work [8, 13, 15, 9] focuses on environments that do not consider variations in network bandwidth. The present work, in contrast, takes into consideration both infrastructure and application constraints. We intend to focus on highly dynamic environments, where adapting an execution plan (the execution graph) is necessary to optimise the use of cloud and edge resources and the application's end-to-end latency.

5. Conclusions

In this work we evaluated the impact, in terms of the number of tweets handled by a stream processing application, of varying the network bandwidth and the deployment of operators across cloud and edge computing resources. We observe that, for the considered application, the partial deployment of operators between the infrastructures brings important benefits. As a data stream flows through the operators, its data size generally becomes smaller, and depending on the network bandwidth, the edge deployment increases the number of processed events.

References

1. Apache ActiveMQ.
2. Apache Kafka.
3. RabbitMQ.
4. Alexandrov (A.), Bergmann (R.), Ewen (S.), Freytag (J.), Hueske (F.), Heise (A.), Kao (O.), Leich (M.), Leser (U.), Markl (V.), Naumann (F.), Peters (M.), Rheinländer (A.), Sax (M. J.), Schelter (S.), Höger (M.), Tzoumas (K.) and Warneke (D.). The Stratosphere platform for big data analytics. VLDB Journal, vol. 23, no. 6, 2014.
5. Atzori (L.), Iera (A.) and Morabito (G.). The internet of things: A survey. Computer Networks, vol. 54, no. 15, 2010.
6. Cardellini (V.), Grassi (V.), Presti (F. L.) and Nardelli (M.). Distributed QoS-aware scheduling in Storm. In 9th ACM International Conference on Distributed Event-Based Systems, DEBS '15, New York, USA, ACM.
7. Chan (S.). Apache quarks, watson, and streaming analytics: Saving the world, one smart sprinkler at a time. Bluemix Blog, June.
8. Cheng (B.), Papageorgiou (A.) and Bauer (M.). Geelytics: Enabling on-demand edge analytics over scoped data sources. In IEEE International Congress on Big Data (BigData Congress), June.
9. Hochreiner (C.), Vogler (M.), Waibel (P.) and Dustdar (S.). VISP: An ecosystem for elastic data stream processing for the internet of things. In 20th IEEE International Enterprise Distributed Object Computing Conference (EDOC 2016), pp. 1-11, Sept.
10. Morales (J.), Rosas (E.) and Hidalgo (N.). Symbiosis: Sharing Mobile Resources for Stream Processing. In IEEE Symposium on Computers and Communications (ISCC 2014) Workshops, pp. 1-6, June.
11. Tziritas (N.), Loukopoulos (T.), Khan (S. U.), Xu (C. Z.) and Zomaya (A. Y.). On Improving Constrained Single and Group Operator Placement Using Evictions in Big Data Environments. IEEE Transactions on Services Computing, vol. 9, no. 5, September 2016.
12. Uckelmann (D.), Harrison (M.) and Michahelles (F.). An architectural approach towards the future internet of things. In: Architecting the internet of things. Springer.
13. Wu (Y.) and Tan (K. L.). ChronoStream: Elastic stateful stream computation in the cloud. In 2015 IEEE 31st International Conference on Data Engineering, April.
14. Zaharia (M.), Chowdhury (M.), Das (T.), Dave (A.), Ma (J.), McCauley (M.), Franklin (M. J.), Shenker (S.) and Stoica (I.). Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI '12, pp. 2-2, Berkeley, CA, USA, USENIX Association.
15. Zeng (D.), Gu (L.) and Guo (S.). A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo-Distributed Data Centers. Cham, Springer International Publishing.


More information

COMPARATIVE EVALUATION OF BIG DATA FRAMEWORKS ON BATCH PROCESSING

COMPARATIVE EVALUATION OF BIG DATA FRAMEWORKS ON BATCH PROCESSING Volume 119 No. 16 2018, 937-948 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ http://www.acadpubl.eu/hub/ COMPARATIVE EVALUATION OF BIG DATA FRAMEWORKS ON BATCH PROCESSING K.Anusha

More information

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context 1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes

More information

Intelligent Edge Computing and ML-based Traffic Classifier. Kwihoon Kim, Minsuk Kim (ETRI) April 25.

Intelligent Edge Computing and ML-based Traffic Classifier. Kwihoon Kim, Minsuk Kim (ETRI)  April 25. Intelligent Edge Computing and ML-based Traffic Classifier Kwihoon Kim, Minsuk Kim (ETRI) (kwihooi@etri.re.kr, mskim16@etri.re.kr) April 25. 2018 ITU Workshop on Impact of AI on ICT Infrastructures Cian,

More information

Hybrid Data Platform

Hybrid Data Platform UniConnect-Powered Data Aggregation Across Enterprise Data Warehouses and Big Data Storage Platforms A Percipient Technology White Paper Author: Ai Meun Lim Chief Product Officer Updated Aug 2017 2017,

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Shark: Hive on Spark

Shark: Hive on Spark Optional Reading (additional material) Shark: Hive on Spark Prajakta Kalmegh Duke University 1 What is Shark? Port of Apache Hive to run on Spark Compatible with existing Hive data, metastores, and queries

More information

Dell EMC Isilon All-Flash

Dell EMC Isilon All-Flash Enterprise Strategy Group Getting to the bigger truth. ESG Lab Validation Dell EMC Isilon All-Flash Scale-out All-flash Storage for Demanding Unstructured Data Workloads By Tony Palmer, Senior Lab Analyst

More information

Typhoon: An SDN Enhanced Real-Time Big Data Streaming Framework

Typhoon: An SDN Enhanced Real-Time Big Data Streaming Framework Typhoon: An SDN Enhanced Real-Time Big Data Streaming Framework Junguk Cho, Hyunseok Chang, Sarit Mukherjee, T.V. Lakshman, and Jacobus Van der Merwe 1 Big Data Era Big data analysis is increasingly common

More information

DriveScale-DellEMC Reference Architecture

DriveScale-DellEMC Reference Architecture DriveScale-DellEMC Reference Architecture DellEMC/DRIVESCALE Introduction DriveScale has pioneered the concept of Software Composable Infrastructure that is designed to radically change the way data center

More information

Optimizing Apache Spark with Memory1. July Page 1 of 14

Optimizing Apache Spark with Memory1. July Page 1 of 14 Optimizing Apache Spark with Memory1 July 2016 Page 1 of 14 Abstract The prevalence of Big Data is driving increasing demand for real -time analysis and insight. Big data processing platforms, like Apache

More information

Architecting the High Performance Storage Network

Architecting the High Performance Storage Network Architecting the High Performance Storage Network Jim Metzler Ashton, Metzler & Associates Table of Contents 1.0 Executive Summary...3 3.0 SAN Architectural Principals...5 4.0 The Current Best Practices

More information

ICN for Cloud Networking. Lotfi Benmohamed Advanced Network Technologies Division NIST Information Technology Laboratory

ICN for Cloud Networking. Lotfi Benmohamed Advanced Network Technologies Division NIST Information Technology Laboratory ICN for Cloud Networking Lotfi Benmohamed Advanced Network Technologies Division NIST Information Technology Laboratory Information-Access Dominates Today s Internet is focused on point-to-point communication

More information

arxiv: v1 [cs.dc] 29 Jun 2015

arxiv: v1 [cs.dc] 29 Jun 2015 Lightweight Asynchronous Snapshots for Distributed Dataflows Paris Carbone 1 Gyula Fóra 2 Stephan Ewen 3 Seif Haridi 1,2 Kostas Tzoumas 3 1 KTH Royal Institute of Technology - {parisc,haridi}@kth.se 2

More information

Accelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet

Accelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet WHITE PAPER Accelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet Contents Background... 2 The MapR Distribution... 2 Mellanox Ethernet Solution... 3 Test

More information

ALCATEL-LUCENT ENTERPRISE DATA CENTER SWITCHING SOLUTION Automation for the next-generation data center

ALCATEL-LUCENT ENTERPRISE DATA CENTER SWITCHING SOLUTION Automation for the next-generation data center ALCATEL-LUCENT ENTERPRISE DATA CENTER SWITCHING SOLUTION Automation for the next-generation data center For more info contact Sol Distribution Ltd. A NEW NETWORK PARADIGM What do the following trends have

More information

COMMVAULT. Enabling high-speed WAN backups with PORTrockIT

COMMVAULT. Enabling high-speed WAN backups with PORTrockIT COMMVAULT Enabling high-speed WAN backups with PORTrockIT EXECUTIVE SUMMARY Commvault offers one of the most advanced and full-featured data protection solutions on the market, with built-in functionalities

More information

2/4/2019 Week 3- A Sangmi Lee Pallickara

2/4/2019 Week 3- A Sangmi Lee Pallickara Week 3-A-0 2/4/2019 Colorado State University, Spring 2019 Week 3-A-1 CS535 BIG DATA FAQs PART A. BIG DATA TECHNOLOGY 3. DISTRIBUTED COMPUTING MODELS FOR SCALABLE BATCH COMPUTING SECTION 1: MAPREDUCE PA1

More information

8/24/2017 Week 1-B Instructor: Sangmi Lee Pallickara

8/24/2017 Week 1-B Instructor: Sangmi Lee Pallickara Week 1-B-0 Week 1-B-1 CS535 BIG DATA FAQs Slides are available on the course web Wait list Term project topics PART 0. INTRODUCTION 2. DATA PROCESSING PARADIGMS FOR BIG DATA Sangmi Lee Pallickara Computer

More information

Optimistic Recovery for Iterative Dataflows in Action

Optimistic Recovery for Iterative Dataflows in Action Optimistic Recovery for Iterative Dataflows in Action Sergey Dudoladov 1 Asterios Katsifodimos 1 Chen Xu 1 Stephan Ewen 2 Volker Markl 1 Sebastian Schelter 1 Kostas Tzoumas 2 1 Technische Universität Berlin

More information

Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b

Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2015) Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b 1

More information

SEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES

SEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES SEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, and Christina Delimitrou Cornell

More information

Importance of Interoperability in High Speed Seamless Redundancy (HSR) Communication Networks

Importance of Interoperability in High Speed Seamless Redundancy (HSR) Communication Networks Importance of Interoperability in High Speed Seamless Redundancy (HSR) Communication Networks Richard Harada Product Manager RuggedCom Inc. Introduction Reliable and fault tolerant high speed communication

More information

StreamBox: Modern Stream Processing on a Multicore Machine

StreamBox: Modern Stream Processing on a Multicore Machine StreamBox: Modern Stream Processing on a Multicore Machine Hongyu Miao and Heejin Park, Purdue ECE; Myeongjae Jeon and Gennady Pekhimenko, Microsoft Research; Kathryn S. McKinley, Google; Felix Xiaozhu

More information

Streaming & Apache Storm

Streaming & Apache Storm Streaming & Apache Storm Recommended Text: Storm Applied Sean T. Allen, Matthew Jankowski, Peter Pathirana Manning 2010 VMware Inc. All rights reserved Big Data! Volume! Velocity Data flowing into the

More information

Stream Processing on IoT Devices using Calvin Framework

Stream Processing on IoT Devices using Calvin Framework Stream Processing on IoT Devices using Calvin Framework by Ameya Nayak A Project Report Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science Supervised

More information

MASERGY S MANAGED SD-WAN

MASERGY S MANAGED SD-WAN MASERGY S MANAGED New Performance Options for Hybrid Networks Business Challenges WAN Ecosystem Features and Benefits Use Cases INTRODUCTION Organizations are leveraging technology to transform the way

More information

Introduction to Big-Data

Introduction to Big-Data Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,

More information

CSMA based Medium Access Control for Wireless Sensor Network

CSMA based Medium Access Control for Wireless Sensor Network CSMA based Medium Access Control for Wireless Sensor Network H. Hoang, Halmstad University Abstract Wireless sensor networks bring many challenges on implementation of Medium Access Control protocols because

More information

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017. Dublin Apache Kafka Meetup, 30 August 2017 The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Joseph @pleia2 * ASF projects 1 Elizabeth K. Joseph, Developer Advocate Developer Advocate

More information

Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach

Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach Tiziano De Matteis, Gabriele Mencagli University of Pisa Italy INTRODUCTION The recent years have

More information

Qunar Performs Real-Time Data Analytics up to 300x Faster with Alluxio

Qunar Performs Real-Time Data Analytics up to 300x Faster with Alluxio CASE STUDY Qunar Performs Real-Time Data Analytics up to 300x Faster with Alluxio Xueyan Li, Lei Xu, and Xiaoxu Lv Software Engineers at Qunar At Qunar, we have been running Alluxio in production for over

More information

An Implementation of Fog Computing Attributes in an IoT Environment

An Implementation of Fog Computing Attributes in an IoT Environment An Implementation of Fog Computing Attributes in an IoT Environment Ranjit Deshpande CTO K2 Inc. Introduction Ranjit Deshpande CTO K2 Inc. K2 Inc. s end-to-end IoT platform Transforms Sensor Data into

More information

VMware Cloud Application Platform

VMware Cloud Application Platform VMware Cloud Application Platform Jerry Chen Vice President of Cloud and Application Services Director, Cloud and Application Services VMware s Three Strategic Focus Areas Re-think End-User Computing Modernize

More information

Performance and Scalability with Griddable.io

Performance and Scalability with Griddable.io Performance and Scalability with Griddable.io Executive summary Griddable.io is an industry-leading timeline-consistent synchronized data integration grid across a range of source and target data systems.

More information

Fast, Interactive, Language-Integrated Cluster Computing

Fast, Interactive, Language-Integrated Cluster Computing Spark Fast, Interactive, Language-Integrated Cluster Computing Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica www.spark-project.org

More information

Survey Paper on Traditional Hadoop and Pipelined Map Reduce

Survey Paper on Traditional Hadoop and Pipelined Map Reduce International Journal of Computational Engineering Research Vol, 03 Issue, 12 Survey Paper on Traditional Hadoop and Pipelined Map Reduce Dhole Poonam B 1, Gunjal Baisa L 2 1 M.E.ComputerAVCOE, Sangamner,

More information

Computing in the Continuum: Harnessing Pervasive Data Ecosystems

Computing in the Continuum: Harnessing Pervasive Data Ecosystems Computing in the Continuum: Harnessing Pervasive Data Ecosystems Manish Parashar, Ph.D. Director, Rutgers Discovery Informatics Institute RDI 2 Distinguished Professor, Department of Computer Science Moustafa

More information

I ++ Mapreduce: Incremental Mapreduce for Mining the Big Data

I ++ Mapreduce: Incremental Mapreduce for Mining the Big Data IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129 www.iosrjournals.org I ++ Mapreduce: Incremental Mapreduce for

More information

QoS-Aware IPTV Routing Algorithms

QoS-Aware IPTV Routing Algorithms QoS-Aware IPTV Routing Algorithms Patrick McDonagh, Philip Perry, Liam Murphy. School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4. {patrick.mcdonagh, philip.perry,

More information

Similarities and Differences Between Parallel Systems and Distributed Systems

Similarities and Differences Between Parallel Systems and Distributed Systems Similarities and Differences Between Parallel Systems and Distributed Systems Pulasthi Wickramasinghe, Geoffrey Fox School of Informatics and Computing,Indiana University, Bloomington, IN 47408, USA In

More information

Distributed Pub/Sub Model in CoAP-based Internet-of-Things Networks

Distributed Pub/Sub Model in CoAP-based Internet-of-Things Networks Distributed Pub/Sub Model in CoAP-based Internet-of-Things Networks Joong-Hwa Jung School of Computer Science and Engineering, Kyungpook National University Daegu, Korea godopu16@gmail.com Dong-Kyu Choi

More information

Designing Hybrid Data Processing Systems for Heterogeneous Servers

Designing Hybrid Data Processing Systems for Heterogeneous Servers Designing Hybrid Data Processing Systems for Heterogeneous Servers Peter Pietzuch Large-Scale Distributed Systems (LSDS) Group Imperial College London http://lsds.doc.ic.ac.uk University

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

vsan 6.6 Performance Improvements First Published On: Last Updated On:

vsan 6.6 Performance Improvements First Published On: Last Updated On: vsan 6.6 Performance Improvements First Published On: 07-24-2017 Last Updated On: 07-28-2017 1 Table of Contents 1. Overview 1.1.Executive Summary 1.2.Introduction 2. vsan Testing Configuration and Conditions

More information

Key aspects of cloud computing. Towards fuller utilization. Two main sources of resource demand. Cluster Scheduling

Key aspects of cloud computing. Towards fuller utilization. Two main sources of resource demand. Cluster Scheduling Key aspects of cloud computing Cluster Scheduling 1. Illusion of infinite computing resources available on demand, eliminating need for up-front provisioning. The elimination of an up-front commitment

More information

OpenStack internal messaging at the edge: In-depth evaluation. Ken Giusti Javier Rojas Balderrama Matthieu Simonin

OpenStack internal messaging at the edge: In-depth evaluation. Ken Giusti Javier Rojas Balderrama Matthieu Simonin OpenStack internal messaging at the edge: In-depth evaluation Ken Giusti Javier Rojas Balderrama Matthieu Simonin Who s here? Ken Giusti Javier Rojas Balderrama Matthieu Simonin Fog Edge and Massively

More information

HPC Considerations for Scalable Multidiscipline CAE Applications on Conventional Linux Platforms. Author: Correspondence: ABSTRACT:

HPC Considerations for Scalable Multidiscipline CAE Applications on Conventional Linux Platforms. Author: Correspondence: ABSTRACT: HPC Considerations for Scalable Multidiscipline CAE Applications on Conventional Linux Platforms Author: Stan Posey Panasas, Inc. Correspondence: Stan Posey Panasas, Inc. Phone +510 608 4383 Email sposey@panasas.com

More information

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous

More information

Chelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING

Chelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING Meeting Today s Datacenter Challenges Produced by Tabor Custom Publishing in conjunction with: 1 Introduction In this era of Big Data, today s HPC systems are faced with unprecedented growth in the complexity

More information

MODERNISE WITH ALL-FLASH. Intel Inside. Powerful Data Centre Outside.

MODERNISE WITH ALL-FLASH. Intel Inside. Powerful Data Centre Outside. MODERNISE WITH ALL-FLASH Intel Inside. Powerful Data Centre Outside. MODERNISE WITHOUT COMPROMISE In today s lightning-fast digital world, it s critical for businesses to make their move to the Modern

More information

Hybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud

Hybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud Hybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud Abid Nisar, Waheed Iqbal, Fawaz S. Bokhari, and Faisal Bukhari Punjab University College of Information and Technology,Lahore

More information

Oracle Exadata: Strategy and Roadmap

Oracle Exadata: Strategy and Roadmap Oracle Exadata: Strategy and Roadmap - New Technologies, Cloud, and On-Premises Juan Loaiza Senior Vice President, Database Systems Technologies, Oracle Safe Harbor Statement The following is intended

More information

Towards adaptive execution strategies for large-scale and real-time data analytics

Towards adaptive execution strategies for large-scale and real-time data analytics Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'15 447 Towards adaptive execution strategies for large-scale and real-time data analytics Martin Köhler 1, Yuriy Kaniovskyi 2, and Siegfried Benkner

More information

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,

More information

NewSQL Without Compromise

NewSQL Without Compromise NewSQL Without Compromise Everyday businesses face serious challenges coping with application performance, maintaining business continuity, and gaining operational intelligence in real- time. There are

More information

Mellanox Virtual Modular Switch

Mellanox Virtual Modular Switch WHITE PAPER July 2015 Mellanox Virtual Modular Switch Introduction...1 Considerations for Data Center Aggregation Switching...1 Virtual Modular Switch Architecture - Dual-Tier 40/56/100GbE Aggregation...2

More information

Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances

Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances Technology Brief Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances The world

More information

Sparrow. Distributed Low-Latency Spark Scheduling. Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica

Sparrow. Distributed Low-Latency Spark Scheduling. Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica Sparrow Distributed Low-Latency Spark Scheduling Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica Outline The Spark scheduling bottleneck Sparrow s fully distributed, fault-tolerant technique

More information

Practical Big Data Processing An Overview of Apache Flink

Practical Big Data Processing An Overview of Apache Flink Practical Big Data Processing An Overview of Apache Flink Tilmann Rabl Berlin Big Data Center www.dima.tu-berlin.de bbdc.berlin rabl@tu-berlin.de With slides from Volker Markl and data artisans 1 2013

More information

FIVE REASONS YOU SHOULD RUN CONTAINERS ON BARE METAL, NOT VMS

FIVE REASONS YOU SHOULD RUN CONTAINERS ON BARE METAL, NOT VMS WHITE PAPER FIVE REASONS YOU SHOULD RUN CONTAINERS ON BARE METAL, NOT VMS Over the past 15 years, server virtualization has become the preferred method of application deployment in the enterprise datacenter.

More information

ELASTIC DATA PLATFORM

ELASTIC DATA PLATFORM SERVICE OVERVIEW ELASTIC DATA PLATFORM A scalable and efficient approach to provisioning analytics sandboxes with a data lake ESSENTIALS Powerful: provide read-only data to anyone in the enterprise while

More information