Assessing the Impact of Network Bandwidth and Operator Placement on Data Stream Processing for Edge Computing Environments
Alexandre Veith, Marcos Dias de Assunção, Laurent Lefèvre
Inria Avalon, LIP, ENS Lyon, University of Lyon
46 allée d'Italie, Lyon, France
alexandre.veith@ens-lyon.fr

Abstract

A substantial part of the big data generated today is received in near real time and must be promptly processed. Cloud-based architectures for data stream processing comprise multiple software modules or frameworks for data collection, message queueing, and stream processing itself. This modular approach allows each component to grow independently of the others and to accommodate changes, but it may increase the end-to-end latency when data events are processed in the cloud. Recent solutions aim to exploit the edges of the Internet (i.e. edge computing) to perform certain data processing tasks and hence make better use of network resources. This work evaluates the impact of network bandwidth when employing frameworks that are commonly used to build cloud and edge-based stream processing solutions.

Keywords: data stream processing, metadata, performance analysis, distributed systems, cloud computing

1. Introduction

Today's instruments and services are producing ever-increasing amounts of data that require processing and analysis in order to provide insights or assist in decision making. This data deluge, often called big data, poses challenges to existing infrastructure regarding data transfer, storage, and processing. Although application models such as MapReduce have been very popular for batch processing, much of the data generated today is received in near real time and requires quick analysis. In the Internet of Things (IoT) [5, 12], for instance, continuous data streams produced by multiple sources must be handled under very short delays.

The interest in processing data events as they arrive (i.e. online) has led to the emergence of several Distributed Stream Processing Engines (DSPEs) such as Apache Storm, Spark, Flink and S4. Under many frameworks, a stream processing application is a Directed Acyclic Graph (DAG) whose vertices are operators that execute a function over the incoming data and whose edges define how data flows between them.

This work was performed within the framework of the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program Investissements d'Avenir (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR).
Clouds are often the target infrastructure for deploying such engines due to their scalability, pay-as-you-go business model and resource elasticity. DSPEs are generally part of a larger architecture that comprises multiple tiers of data collection and processing interconnected by message brokers and queuing systems such as Apache ActiveMQ [1] and RabbitMQ [3], and publish-subscribe solutions including Apache Kafka [2]. This modular design enables tiers to grow at different paces and accommodate changes, but it can increase the end-to-end latency and communication cost when handling data events.

More recent solutions aim to exploit the edges of the Internet (i.e. edge computing) to perform certain data processing tasks and hence reduce the end-to-end latency and communication costs, enable services to react to events locally, or offload processing from the cloud [7]. However, the architectures and software frameworks for environments comprising clouds and micro data centres located at the Internet edges are still evolving.

In this work, we evaluate the impact of, and opportunities for, deploying stream processing applications using cloud and edge computing. Our study considers DAGs that span both cloud and edge infrastructure. We are interested in demonstrating the benefits of splitting a stream processing graph and spreading its operators across the available resources.

2. Proposed Architecture

This work considers multiple geographically distributed infrastructures for stream processing in big data environments, as depicted in Figure 1. A data stream flow usually has a source (i.e. a resource that creates or collects the data), operators that perform transformations on the data (e.g. filtering, mapping and aggregation) and a sink (i.e. the destination of the data). In a traditional deployment, all the operators of a data streaming application are placed in the cloud to benefit from virtually unlimited resources.
Initially designed to minimise the latency of content delivered to users of mobile devices, edge computing has become an attractive solution for performing certain stream processing operations. By leveraging idle edge resources (e.g. sensors and gateways), computations can be performed closer to where the data is generated. Our effort relates to improving resource utilisation and data movement, which in stream processing can be achieved by deploying data stream operators along the physical path between source and sink. Instead of hosting all the operators required by a stream processing solution in the cloud itself, we consider a more decentralised approach where the application is decoupled and hosted at multiple geographical locations. In this kind of environment, however, several variables (e.g. computational power, network bandwidth, network latency, application constraints, network topology) strongly influence the placement decisions and add complexity to the problem [11].

Figure 1: Physical infrastructure which represents the flow between data sources and sinks.

To deal with this environment and with stringent stream processing requirements (i.e., events often being handled in the order of seconds or milliseconds), we present an architecture (Figure 2) to optimise the placement of a DAG's elements. In our model, a set of computational resources forms a group based on resource proximity, which can be expressed as the number of hops and the available bandwidth between hosts. The resource roles are as follows:

Orchestrator node: Makes global decisions, and stores information about the tasks (i.e., the DAG topology) and the available containers (i.e. environments prepared to receive and process tasks). The global decisions involve task and container placement, as well as system monitoring. The placement considers information about the tasks, the network and the resources.

Master node: Makes local decisions (i.e. decisions within the group) and monitors the group. Local decisions are small adjustments to the local task placement.

Worker node: Hosts the operators and executes functions over the data streams, and/or stores data after processing.

This work focuses on the communication between worker nodes. Data flows between operators through queue consumption: a worker reads data from another worker's queue or from a source queue, as directed by the Orchestrator or the Master node.

By distributing data stream operators, we expect to minimise the latency, minimise the amount of data transferred over the wire (e.g. fewer headers, less serialisation, etc.), and reduce the impact of the external network environment (i.e. the Internet). We aim to evaluate multiple operator placements and their impact on resource utilisation and end-to-end latency in decoupled scenarios. A decoupled scenario (i.e., one using both cloud and edge resources) is considered because it reduces the amount of data transferred at different phases, avoiding network restrictions that might exist at certain points of the path from the edge to the cloud. Moreover, we are interested in evaluating components traditionally employed for cluster or cloud-based stream processing in more decentralised environments such as edge computing, where individual components may be hosted at a micro data centre or on constrained resources geographically close to the data sources, whereas other services run in the cloud.

3. Experimental Setup and Results

This section describes the environment setup and the results of a preliminary evaluation that demonstrates the impact of deploying operators on cloud and edge resources.
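The expected effect of moving operators towards the edge can be illustrated with simple arithmetic: if each operator shrinks the data it receives, the more operators run before the edge-to-cloud link, the more source events that link can carry per second. The sketch below is only an illustration under stated assumptions. The `Operator` class, the `output_ratio` values, the event size and the helper names are hypothetical and are not taken from the measurements in this work.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    # Hypothetical average bytes emitted per byte received.
    # A filter that drops events is modelled as a ratio below 1.
    output_ratio: float


def bytes_on_link(event_bytes: float, operators: list[Operator], split: int) -> float:
    """Average bytes per source event crossing the edge-to-cloud link.

    `split` is the number of leading operators placed at the edge;
    the remaining operators run in the cloud.
    """
    if not 0 <= split <= len(operators):
        raise ValueError("split must be between 0 and len(operators)")
    size = event_bytes
    for op in operators[:split]:
        size *= op.output_ratio
    return size


def max_events_per_second(link_kbps: float, event_bytes: float,
                          operators: list[Operator], split: int) -> float:
    """Upper bound on source events per second imposed by the link alone."""
    link_bytes_per_second = link_kbps * 1000 / 8  # 1 Kbps = 1000 bits/s
    return link_bytes_per_second / bytes_on_link(event_bytes, operators, split)


if __name__ == "__main__":
    # Illustrative pipeline: values are assumptions, not measured ratios.
    pipeline = [Operator("parse", 0.1), Operator("filter", 0.5)]
    for split in range(len(pipeline) + 1):
        rate = max_events_per_second(1000, 2000, pipeline, split)
        print(f"{split} operator(s) at the edge: up to {rate:.1f} events/s")
```

This bound considers the link only. It ignores the processing capacity of the edge device, which can itself become the limiting factor when bandwidth is plentiful.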
Figure 2: Proposed architecture to improve resource utilisation and throughput.

Experimental Setup

The experiments comprise an empirical evaluation performed on a cluster of four Dell R410 servers (Intel Xeon E5506 processor, 4 MB cache, 2.13 GHz, 4.80 GT/s Intel QPI). The clocks of all hosts are synchronised using the Network Time Protocol (NTP). The physical infrastructure shown in Figure 4 introduces the following roles:

Data Sources: generate data and send it to the message queues. The data is drawn from an internal dataset with 1 GB of tweets that is processed by a sentiment analysis application described later. The tweets are sent by an application built in-house that takes several parameters (i.e., inter-arrival time between tweets and number of processes) to deploy and stress the infrastructure. This role uses one Dell R410 server.

Gateway: receives the tweets and either processes them or forwards them to a message broker (i.e., acts as a bypass). When one or more operators are deployed at the edge, the messages are stored in a lightweight queue (i.e., Mosquitto) and processed by a lightweight DSPE (i.e., Apache Edgent 1.0.0). Otherwise, the tweets are forwarded to the Cloud without any edge processing. In the physical infrastructure, this role is one Dell R410 server.

Cloud: receives the messages in two distinct queues (i.e., Apache Kafka): (i) sink, for messages already processed by the Gateway; and (ii) source, for tweets that still need processing. The DSPE in the Cloud (i.e., Apache Flink 1.2.0) processes the messages from the source queue. The DSPE was set up with two workers on two Dell R410 servers (one hosting the Flink Manager and a Flink Worker, the other hosting a Flink Worker).

A sentiment analysis application that evaluates the polarity of tweets was used for performance evaluation (Figure 3).
It uses a simple Natural Language Processing (NLP) technique to indicate the polarity of a sentence (i.e., counting positive and negative words and computing the difference). The tweets are JSON dictionaries, each tweet corresponding to an event. Each event is parsed to extract the relevant fields (e.g., tweet ID, language and the message itself). The events are then filtered by language, keeping only those in English. Next, the Stemmer removes stop words, which do not carry sentiment or are irrelevant for the following steps. After that, an operator counts the number of negative and positive words, producing a positive and a negative score. Finally, the application determines whether the tweet is positive or negative.
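The operator chain described above can be sketched in a few lines of Python. This is a minimal illustration, not the application used in the experiments: the word lists are tiny placeholders, the JSON field names (`id`, `lang`, `text`) are assumptions, and reporting a tie as `neutral` is a choice made here because the text does not say how ties are handled.

```python
import json
import re

# Placeholder lexicons and stop words (hypothetical, for illustration only).
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}
STOP_WORDS = {"a", "an", "and", "i", "is", "the", "this", "to"}


def parse(raw):
    """Parse one JSON tweet and keep only the relevant fields."""
    tweet = json.loads(raw)
    return {"id": tweet["id"], "lang": tweet["lang"], "text": tweet["text"]}


def is_english(event):
    """Language filter: keep English tweets only."""
    return event["lang"] == "en"


def remove_stop_words(event):
    """Tokenise the message and drop words that carry no sentiment."""
    words = re.findall(r"[a-z']+", event["text"].lower())
    return {**event, "words": [w for w in words if w not in STOP_WORDS]}


def score(event):
    """Count positive and negative words."""
    positive = sum(w in POSITIVE for w in event["words"])
    negative = sum(w in NEGATIVE for w in event["words"])
    return {**event, "positive": positive, "negative": negative}


def polarity(event):
    """Label the tweet from the difference between the two scores."""
    difference = event["positive"] - event["negative"]
    if difference > 0:
        label = "positive"
    elif difference < 0:
        label = "negative"
    else:
        label = "neutral"  # assumption: tie handling is not specified
    return {"id": event["id"], "polarity": label}


def analyse(raw_tweets):
    """Run every tweet through the operators in order."""
    for raw in raw_tweets:
        event = parse(raw)
        if is_english(event):
            yield polarity(score(remove_stop_words(event)))
```

Because each operator is a plain function over one event, any prefix of the chain could in principle run on one host and the remainder on another, with a queue in between.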
Figure 3: Operators of the sentiment analysis application.

Figure 4: Physical infrastructure and physical plan for deployment scenarios.

The configuration of the different stream-processing flows is presented in Figure 4. As shown in the physical plan, the stream operators are deployed on the physical network. Scenario 0 has all operators deployed in the cloud: the Data Source forwards the messages directly to the Cloud, and the Gateway acts as a bypass. Conversely, under Scenario 6, all operators are deployed in the Gateway, and the Cloud acts as the sink. The remaining scenarios have mixed configurations where operators are deployed partly in the Cloud and partly at the Edge.

To evaluate the proposed environments and assess the impact of the network bandwidth, the following tools were used: Linux Traffic Control to customise the network bandwidth; and Python psutil to measure resource consumption (CPU utilisation, memory usage and network I/O). Each scenario runs for 7 minutes on the testbed. We disregard the first and last minutes of each experiment to eliminate warm-up and cool-down effects. Each experiment corresponds to a deployment of the operators with a given network bandwidth capacity (i.e., 10, 100, 1000 and Kbps) on the edge-to-cloud network.

Experimental Results

We noticed that variations in network bandwidth and in operator placement have a direct impact on the number of events processed. The problem is evident when the edge-to-cloud network capacity is insufficient (10, 100 and 1000 Kbps) to transfer the amount of data, as presented in Figures 5a and 5b. To overcome this problem, we deploy the operators along the path, more specifically at the Gateway (source) and the cloud (sink). As depicted in Figure 1, placing operators at the edge yields the best results. These results are obtained without much load on the Gateway, which used only 28% of the CPU on average (Figure 6b) under Scenario 5 with the edge-to-cloud network limited to 1000 Kbps. This level of usage also respects the edge constraints, as edge devices have less computational power than the cloud.

Figure 5: Number of tweets processed and amount of data transferred through the edge-to-cloud network. (a) Number of tweets processed. (b) Amount of data transferred through the edge-to-cloud network.

Figure 6: CPU usage comparison. (a) CPU usage of the Cloud. (b) CPU usage of the Gateway.

In contrast, the results change substantially when the network bandwidth is not restrictive. As depicted in Figure 5a, when the bandwidth capacity is greater than 1000 Kbps, the edge becomes an execution bottleneck because the application uses only one gateway, which runs a lightweight stream processing framework that lacks the full-fledged features and parallelism provided by Flink, used in the cloud.

Although we performed experiments with only one gateway, we argue that this is not the most practical scenario. In actual deployments, we expect scenarios that contain multiple edge resources and several gateways (as in Figure 1) that can be used to offload some processing tasks from the cloud, or multiple paths with several gateways between the sources and the sink.

The results allowed us to better understand the impact of the placement of data stream operators and the benefits that certain placement configurations can bring. The benefits concern the number of events processed, the resource consumption and the monetary cost, which is reduced by transferring less data to the cloud. Moreover, the lower utilisation of cloud resources depicted in Figure 6a could be exploited to release unused capacity via autoscaling operations and hence lower costs further.

4. Related Work

Over the years several frameworks for distributed data stream processing have been proposed, such as Apache Storm, Apache Spark [14] and Apache Flink [4]. In many of these frameworks, applications are structured as directed graphs of operators that execute either pre-defined functions, such as filtering, joins and splitting, or user-defined functions. Most solutions are designed to run in homogeneous clusters, but have also been deployed in cloud environments.

More recently, services are increasingly being deployed in environments that span multiple data centres or on the edges of the Internet (i.e., edge and fog computing). Existing work proposes architectures that place certain stream processing elements on micro data centres closer to where the data is generated [6] or that employ mobile devices for stream processing [10]. Even though these efforts are important, the present work focuses on evaluating the data-source-to-cloud latency when software frameworks traditionally deployed in cloud environments are used in more decoupled environments.

Most existing work [8, 13, 15, 9] focuses on environments that do not consider variations in network bandwidth. The present work, in contrast, takes both infrastructure and application constraints into consideration. We intend to focus on highly dynamic environments, where adapting an execution plan (the execution graph) is necessary to optimise the use of cloud and edge resources and the application end-to-end latency.

5. Conclusions

In this work we evaluated the impact, in terms of the number of tweets handled by a stream processing application, of varying the network bandwidth and the deployment of operators across cloud and edge computing resources. We observe that, for the considered application, the partial deployment of operators between the infrastructures brings some important benefits. As a data stream flows through the operators, its data size generally becomes smaller and, depending on the network bandwidth, the edge deployment improves the number of processed events.

References

1. Apache ActiveMQ.
2. Apache Kafka.
3. RabbitMQ.
4. Alexandrov (A.), Bergmann (R.), Ewen (S.), Freytag (J.), Hueske (F.), Heise (A.), Kao (O.), Leich (M.), Leser (U.), Markl (V.), Naumann (F.), Peters (M.), Rheinländer (A.), Sax (M. J.), Schelter (S.), Höger (M.), Tzoumas (K.) and Warneke (D.). The Stratosphere platform for big data analytics. VLDB Journal, vol. 23, no. 6, 2014.
5. Atzori (L.), Iera (A.) and Morabito (G.). The internet of things: A survey. Computer Networks, vol. 54, no. 15, 2010.
6. Cardellini (V.), Grassi (V.), Presti (F. L.) and Nardelli (M.). Distributed QoS-aware scheduling in Storm. In 9th ACM International Conference on Distributed Event-Based Systems, DEBS '15, New York, USA, ACM.
7. Chan (S.). Apache quarks, watson, and streaming analytics: Saving the world, one smart sprinkler at a time. Bluemix Blog, June.
8. Cheng (B.), Papageorgiou (A.) and Bauer (M.). Geelytics: Enabling on-demand edge analytics over scoped data sources. In IEEE International Congress on Big Data (BigData Congress), June.
9. Hochreiner (C.), Vogler (M.), Waibel (P.) and Dustdar (S.). VISP: An ecosystem for elastic data stream processing for the internet of things. In 20th IEEE International Enterprise Distributed Object Computing Conference (EDOC 2016), pp. 1-11, Sept.
10. Morales (J.), Rosas (E.) and Hidalgo (N.). Symbiosis: Sharing Mobile Resources for Stream Processing. In IEEE Symposium on Computers and Communications (ISCC 2014) Workshops, pp. 1-6, June.
11. Tziritas (N.), Loukopoulos (T.), Khan (S. U.), Xu (C. Z.) and Zomaya (A. Y.). On Improving Constrained Single and Group Operator Placement Using Evictions in Big Data Environments. IEEE Transactions on Services Computing, vol. 9, no. 5, September 2016.
12. Uckelmann (D.), Harrison (M.) and Michahelles (F.). An architectural approach towards the future internet of things. In: Architecting the internet of things. Springer.
13. Wu (Y.) and Tan (K. L.). ChronoStream: Elastic stateful stream computation in the cloud. In 2015 IEEE 31st International Conference on Data Engineering, April.
14. Zaharia (M.), Chowdhury (M.), Das (T.), Dave (A.), Ma (J.), McCauley (M.), Franklin (M. J.), Shenker (S.) and Stoica (I.). Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI '12, pp. 2-2, Berkeley, CA, USA, USENIX Association.
15. Zeng (D.), Gu (L.) and Guo (S.). A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo-Distributed Data Centers. Cham, Springer International Publishing.
More informationICN for Cloud Networking. Lotfi Benmohamed Advanced Network Technologies Division NIST Information Technology Laboratory
ICN for Cloud Networking Lotfi Benmohamed Advanced Network Technologies Division NIST Information Technology Laboratory Information-Access Dominates Today s Internet is focused on point-to-point communication
More informationarxiv: v1 [cs.dc] 29 Jun 2015
Lightweight Asynchronous Snapshots for Distributed Dataflows Paris Carbone 1 Gyula Fóra 2 Stephan Ewen 3 Seif Haridi 1,2 Kostas Tzoumas 3 1 KTH Royal Institute of Technology - {parisc,haridi}@kth.se 2
More informationAccelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet
WHITE PAPER Accelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet Contents Background... 2 The MapR Distribution... 2 Mellanox Ethernet Solution... 3 Test
More informationALCATEL-LUCENT ENTERPRISE DATA CENTER SWITCHING SOLUTION Automation for the next-generation data center
ALCATEL-LUCENT ENTERPRISE DATA CENTER SWITCHING SOLUTION Automation for the next-generation data center For more info contact Sol Distribution Ltd. A NEW NETWORK PARADIGM What do the following trends have
More informationCOMMVAULT. Enabling high-speed WAN backups with PORTrockIT
COMMVAULT Enabling high-speed WAN backups with PORTrockIT EXECUTIVE SUMMARY Commvault offers one of the most advanced and full-featured data protection solutions on the market, with built-in functionalities
More information2/4/2019 Week 3- A Sangmi Lee Pallickara
Week 3-A-0 2/4/2019 Colorado State University, Spring 2019 Week 3-A-1 CS535 BIG DATA FAQs PART A. BIG DATA TECHNOLOGY 3. DISTRIBUTED COMPUTING MODELS FOR SCALABLE BATCH COMPUTING SECTION 1: MAPREDUCE PA1
More information8/24/2017 Week 1-B Instructor: Sangmi Lee Pallickara
Week 1-B-0 Week 1-B-1 CS535 BIG DATA FAQs Slides are available on the course web Wait list Term project topics PART 0. INTRODUCTION 2. DATA PROCESSING PARADIGMS FOR BIG DATA Sangmi Lee Pallickara Computer
More informationOptimistic Recovery for Iterative Dataflows in Action
Optimistic Recovery for Iterative Dataflows in Action Sergey Dudoladov 1 Asterios Katsifodimos 1 Chen Xu 1 Stephan Ewen 2 Volker Markl 1 Sebastian Schelter 1 Kostas Tzoumas 2 1 Technische Universität Berlin
More informationReal-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b
4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2015) Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b 1
More informationSEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES
SEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, and Christina Delimitrou Cornell
More informationImportance of Interoperability in High Speed Seamless Redundancy (HSR) Communication Networks
Importance of Interoperability in High Speed Seamless Redundancy (HSR) Communication Networks Richard Harada Product Manager RuggedCom Inc. Introduction Reliable and fault tolerant high speed communication
More informationStreamBox: Modern Stream Processing on a Multicore Machine
StreamBox: Modern Stream Processing on a Multicore Machine Hongyu Miao and Heejin Park, Purdue ECE; Myeongjae Jeon and Gennady Pekhimenko, Microsoft Research; Kathryn S. McKinley, Google; Felix Xiaozhu
More informationStreaming & Apache Storm
Streaming & Apache Storm Recommended Text: Storm Applied Sean T. Allen, Matthew Jankowski, Peter Pathirana Manning 2010 VMware Inc. All rights reserved Big Data! Volume! Velocity Data flowing into the
More informationStream Processing on IoT Devices using Calvin Framework
Stream Processing on IoT Devices using Calvin Framework by Ameya Nayak A Project Report Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science Supervised
More informationMASERGY S MANAGED SD-WAN
MASERGY S MANAGED New Performance Options for Hybrid Networks Business Challenges WAN Ecosystem Features and Benefits Use Cases INTRODUCTION Organizations are leveraging technology to transform the way
More informationIntroduction to Big-Data
Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,
More informationCSMA based Medium Access Control for Wireless Sensor Network
CSMA based Medium Access Control for Wireless Sensor Network H. Hoang, Halmstad University Abstract Wireless sensor networks bring many challenges on implementation of Medium Access Control protocols because
More informationThe SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.
Dublin Apache Kafka Meetup, 30 August 2017 The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Joseph @pleia2 * ASF projects 1 Elizabeth K. Joseph, Developer Advocate Developer Advocate
More informationParallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach
Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach Tiziano De Matteis, Gabriele Mencagli University of Pisa Italy INTRODUCTION The recent years have
More informationQunar Performs Real-Time Data Analytics up to 300x Faster with Alluxio
CASE STUDY Qunar Performs Real-Time Data Analytics up to 300x Faster with Alluxio Xueyan Li, Lei Xu, and Xiaoxu Lv Software Engineers at Qunar At Qunar, we have been running Alluxio in production for over
More informationAn Implementation of Fog Computing Attributes in an IoT Environment
An Implementation of Fog Computing Attributes in an IoT Environment Ranjit Deshpande CTO K2 Inc. Introduction Ranjit Deshpande CTO K2 Inc. K2 Inc. s end-to-end IoT platform Transforms Sensor Data into
More informationVMware Cloud Application Platform
VMware Cloud Application Platform Jerry Chen Vice President of Cloud and Application Services Director, Cloud and Application Services VMware s Three Strategic Focus Areas Re-think End-User Computing Modernize
More informationPerformance and Scalability with Griddable.io
Performance and Scalability with Griddable.io Executive summary Griddable.io is an industry-leading timeline-consistent synchronized data integration grid across a range of source and target data systems.
More informationFast, Interactive, Language-Integrated Cluster Computing
Spark Fast, Interactive, Language-Integrated Cluster Computing Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica www.spark-project.org
More informationSurvey Paper on Traditional Hadoop and Pipelined Map Reduce
International Journal of Computational Engineering Research Vol, 03 Issue, 12 Survey Paper on Traditional Hadoop and Pipelined Map Reduce Dhole Poonam B 1, Gunjal Baisa L 2 1 M.E.ComputerAVCOE, Sangamner,
More informationComputing in the Continuum: Harnessing Pervasive Data Ecosystems
Computing in the Continuum: Harnessing Pervasive Data Ecosystems Manish Parashar, Ph.D. Director, Rutgers Discovery Informatics Institute RDI 2 Distinguished Professor, Department of Computer Science Moustafa
More informationI ++ Mapreduce: Incremental Mapreduce for Mining the Big Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129 www.iosrjournals.org I ++ Mapreduce: Incremental Mapreduce for
More informationQoS-Aware IPTV Routing Algorithms
QoS-Aware IPTV Routing Algorithms Patrick McDonagh, Philip Perry, Liam Murphy. School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4. {patrick.mcdonagh, philip.perry,
More informationSimilarities and Differences Between Parallel Systems and Distributed Systems
Similarities and Differences Between Parallel Systems and Distributed Systems Pulasthi Wickramasinghe, Geoffrey Fox School of Informatics and Computing,Indiana University, Bloomington, IN 47408, USA In
More informationDistributed Pub/Sub Model in CoAP-based Internet-of-Things Networks
Distributed Pub/Sub Model in CoAP-based Internet-of-Things Networks Joong-Hwa Jung School of Computer Science and Engineering, Kyungpook National University Daegu, Korea godopu16@gmail.com Dong-Kyu Choi
More informationDesigning Hybrid Data Processing Systems for Heterogeneous Servers
Designing Hybrid Data Processing Systems for Heterogeneous Servers Peter Pietzuch Large-Scale Distributed Systems (LSDS) Group Imperial College London http://lsds.doc.ic.ac.uk University
More informationOracle Big Data Connectors
Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process
More informationvsan 6.6 Performance Improvements First Published On: Last Updated On:
vsan 6.6 Performance Improvements First Published On: 07-24-2017 Last Updated On: 07-28-2017 1 Table of Contents 1. Overview 1.1.Executive Summary 1.2.Introduction 2. vsan Testing Configuration and Conditions
More informationKey aspects of cloud computing. Towards fuller utilization. Two main sources of resource demand. Cluster Scheduling
Key aspects of cloud computing Cluster Scheduling 1. Illusion of infinite computing resources available on demand, eliminating need for up-front provisioning. The elimination of an up-front commitment
More informationOpenStack internal messaging at the edge: In-depth evaluation. Ken Giusti Javier Rojas Balderrama Matthieu Simonin
OpenStack internal messaging at the edge: In-depth evaluation Ken Giusti Javier Rojas Balderrama Matthieu Simonin Who s here? Ken Giusti Javier Rojas Balderrama Matthieu Simonin Fog Edge and Massively
More informationHPC Considerations for Scalable Multidiscipline CAE Applications on Conventional Linux Platforms. Author: Correspondence: ABSTRACT:
HPC Considerations for Scalable Multidiscipline CAE Applications on Conventional Linux Platforms Author: Stan Posey Panasas, Inc. Correspondence: Stan Posey Panasas, Inc. Phone +510 608 4383 Email sposey@panasas.com
More informationOracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data
Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous
More informationChelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING
Meeting Today s Datacenter Challenges Produced by Tabor Custom Publishing in conjunction with: 1 Introduction In this era of Big Data, today s HPC systems are faced with unprecedented growth in the complexity
More informationMODERNISE WITH ALL-FLASH. Intel Inside. Powerful Data Centre Outside.
MODERNISE WITH ALL-FLASH Intel Inside. Powerful Data Centre Outside. MODERNISE WITHOUT COMPROMISE In today s lightning-fast digital world, it s critical for businesses to make their move to the Modern
More informationHybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud
Hybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud Abid Nisar, Waheed Iqbal, Fawaz S. Bokhari, and Faisal Bukhari Punjab University College of Information and Technology,Lahore
More informationOracle Exadata: Strategy and Roadmap
Oracle Exadata: Strategy and Roadmap - New Technologies, Cloud, and On-Premises Juan Loaiza Senior Vice President, Database Systems Technologies, Oracle Safe Harbor Statement The following is intended
More informationTowards adaptive execution strategies for large-scale and real-time data analytics
Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'15 447 Towards adaptive execution strategies for large-scale and real-time data analytics Martin Köhler 1, Yuriy Kaniovskyi 2, and Siegfried Benkner
More information4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)
4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,
More informationNewSQL Without Compromise
NewSQL Without Compromise Everyday businesses face serious challenges coping with application performance, maintaining business continuity, and gaining operational intelligence in real- time. There are
More informationMellanox Virtual Modular Switch
WHITE PAPER July 2015 Mellanox Virtual Modular Switch Introduction...1 Considerations for Data Center Aggregation Switching...1 Virtual Modular Switch Architecture - Dual-Tier 40/56/100GbE Aggregation...2
More informationIntel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances
Technology Brief Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances The world
More informationSparrow. Distributed Low-Latency Spark Scheduling. Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica
Sparrow Distributed Low-Latency Spark Scheduling Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica Outline The Spark scheduling bottleneck Sparrow s fully distributed, fault-tolerant technique
More informationPractical Big Data Processing An Overview of Apache Flink
Practical Big Data Processing An Overview of Apache Flink Tilmann Rabl Berlin Big Data Center www.dima.tu-berlin.de bbdc.berlin rabl@tu-berlin.de With slides from Volker Markl and data artisans 1 2013
More informationFIVE REASONS YOU SHOULD RUN CONTAINERS ON BARE METAL, NOT VMS
WHITE PAPER FIVE REASONS YOU SHOULD RUN CONTAINERS ON BARE METAL, NOT VMS Over the past 15 years, server virtualization has become the preferred method of application deployment in the enterprise datacenter.
More informationELASTIC DATA PLATFORM
SERVICE OVERVIEW ELASTIC DATA PLATFORM A scalable and efficient approach to provisioning analytics sandboxes with a data lake ESSENTIALS Powerful: provide read-only data to anyone in the enterprise while
More information