Elastic Complex Event Processing

Size: px
Start display at page:

Download "Elastic Complex Event Processing"

Transcription

1 Elastic Complex Event Processing Thomas Heinze SAP Research Dresden Dresden, Germany ABSTRACT Complex Event Processing (CEP) systems are designed to process large amount of information by simultaneously evaluating multiple queries over event streams. Two main requirementsimposedbytheusersofthecepsystemsare: (1) ability to process high throughput event data and (2) ability to answer queries with very low latency. In order to meet above requirements CEP systems are becoming increasingly distributed. Distribution of queries as well as event streams across multiple nodes facilitates increasing the throughput of CEP systems while simultaneously maintaining low response times. The widespread adoption of cloud computing and the accompanying pay-as-you-go model has added new dimensions to the problem of complex event processing in a distributed system. Nowadays, it is not only important to be able to scale the processing out to a large number of nodes, it is also equally important to be able to scale the processing down, as soon as the load or user requirements decrease. The ability to scale processing up and down along with the load and user requirements is called elasticity. The goal of the thesis described in this paper is to develop a component allowing for elastic scaling of distributed CEP systems in response to variations in the load and contractual obligations regarding the quality of service. To this end, the thesis described in this paper will address following three major topics: (1) multi query optimization, (2) operator placement in distributed environments, and (3) cost efficiency. This paper outlines the state of art for the three aforementioned topics and presents the overall draft of the solution for the problem of the elastic complex event processing. 1. INTRODUCTION Complex Event Processing (CEP) systems [21] are used in various scenarios, including (but not limited to): stock trading [20], click stream monitoring [11], and information integration workflows [7]. Moreover, Complex Event Pro- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MDS 2011, December 12th, 2011, Lisbon, Portugal. Copyright 2011 ACM /11/12...$ cessing is becoming a central component of Business Intelligence (BI) systems, as it facilitates solving BI task in (near) realtime by shortcutting the pipeline between raw data and the dashboard layer [7]. In contrast to classical database systems, CEP systems avoid persisting of incoming events, processing the data directly in memory. This allows CEP systems to process event streams with several thousands of events per second with sub millisecond latency [34]. With the increasing amount of data available for analysis [14] and increasing complexity and availability of the Business Intelligence analysis tools, single machine-based solutions are quickly becoming overloaded. Based on reports from SAP customers one can currently expect that CEP systems will have to cope with as much as simultaneous users with estimated 5 queries being asked by every single user. The Options Price Reporting Authority (OPRA), which aggregates all quotes and trades from options exchanges, estimated peak rates of 456,000 events/sec (July 2007), with rates roughly doubling every year [33]. Since load shedding is not an applicable strategy when the correctness of results is required, distribution of the complex event processing emerges as the only viable option. First attempts at creating distributed CEP systems have been already undertaken and have resulted in systems, like: RAPI- DE [22], Borealis [1], and DCEP [28]. However, in order to be able to competitively run largescale distributed systems, cloud computing[8] resources have to be used. The key feature provided by cloud computing is the notion of elasticity. Elasticity has been defined [24] as: [...] capabilities [which] can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time. Elasticity involves not only transparent scale up of the system, but also transparent scaling down if the current setup is too big with respect to the current load or requested quality of service. In contrast, a classical distributed CEP system [22, 1, 28] is designed to always support for the peak load. Therefore, it is periodically scaled up in case the input rate increases. However, in practice, the load imposed on a distributed system is not increasing linearly, but shows an unpredictable variability see Figure 1. Using a static resource provisioning either not enough resources are available (underprovisioning) and requests are dropped, or too many resources are used (overprovisioning), creating unnecessary costs. In contrast, using elastic resource provisioning, the system can alter its resource utilization dynamically, which

2 Figure 2: Example dataflow on StreamMine Figure 1: Comparison of static and elastic resource provisioning source: [35] avoids over- as well as underprovisioning. Recently a number of prototypes for cloud-based CEP engines has been proposed, including: SEEP [25], an extension for System S [27], Flood [4], StreamCloud [12] and Kleiminger et al. [17]. In the following table (Table 1) we characterize aforementioned systems based on the level at which the elasticity is realized. Elastic Action Taken System Engine Query Operator New engine instance is started and new queries are employed on this engine New query instance is started and the input data is split New operator instance is created and the input data is split [17] [12, 25] [4, 27] Table 1: Different levels of elasticity and corresponding actions A CEP engine can be made elastic by parallelizing either at the engine, query or operator level. Each of these approaches has benefits and shortcomings. The engine level approach requires the smallest set of changes to the overall system, however it also introduces the largest overhead to initialize a new instance and exposes the smallest flexibility and granularity. The query level approach imposes less overhead, however it requires the system to be able to optimize for a possibly very large amount of queries see section 4 for more details on this approach. The query level approach is also less flexible than the operator level approach. The operatorlevel approachhelps to scale the system in the most fine grained fashion. However, it also requires fully elastic operators, which are difficult to implement, especially if a system permits user defined operators. Moreover, too high degree of distribution might result in communication overhead becoming a limiting factor with respect to scalability [12]. In order to exploit elasticity a complex event processing system has to be designed in an elastic way, allowing for operation with a variable number of nodes. To be able to realize such systems, a solution for the following three issues has to be created [35]: 1. Which components or layers in my application architecture can become elastic? 2. What will it take to make them elastic? 3. What will be the impact of realizing elasticity onto my overall system architecture? The current focus of most CEP systems is on implementing answers to questions (1) and (2). None of the works has done an intensive study on the extensions needed for making the elasticity transparent to the user. The thesis described in this paper tries to fill this gap by researching question (3). The system needs to decide automatically, when to scale up and also when to scale down again to reduce the monetary cost. In addition, the system has to detect operators becoming bottlenecks and react by e.g. moving them to another node. The remainder of this paper is structured as follows: Section 2 introduces the system model and general architecture ofthe CEP system whichwillbe usedas a foundation forthe work carried out within the described thesis. In Section 3 main components of the proposed system are introduced, including multi query optimization, operator placement, and cost efficiency. Section 7 describes the planned evaluation strategy and plan for the whole system. Section 8 concludes this paper. 2. SYSTEM MODEL ThethesisdescribedinthispaperispartoftheSRT-15 1 research project. The SRT-15 research project is funded by the European Commission within the Seventh Framework Programme (FP7) under the Grant Agreement number in the area of Internet of Services, Software & Virtualisation (ICT ). A software foundation for both the SRT-15 research project and the described thesis the StreamMine [23] system. This StreamMine system is a framework for processes of events using a stage-based approach, where each stage can be programmed to express an independent logic unit and store independent state. Within a single stage multiple instances of the stage logic realize the same functionality for different data. In order to achieve this, StreamMine offers functionality to split the input data using a key-based approach see Figure 2 for the architectural overview. 1 Project homepage:

3 Figure 3: Thesis prototype StreamMine can be executed in a distributed environment. For each execution all involved stages have to be assigned to physical underlying system nodes. Subsequently, stages have to be explicitly connected using point to point connections. This allows to scale the different stages based on expected event load by: (1) adding new stages to the already existing system or (2) adding additional nodes to already running stages. StreamMine can therefore be used to realize elasticity on an operator level. Within the thesis, described in this paper, we define typical CEP operators (Selection, Join, Aggregation and Sequence) on top of the StreamMine model. We assume that input for the CEP system, running on top of StreamMine, consists of a set of infinite event streams and a set of continuous queries. The number of queries can change during runtime, as new queries can be added and removed dynamically. We also assume that queries submitted to the CEP system model the data flow as an acyclic graph. Figure 2 illustrates such data flow consisting of two Selection operators and one Join operator. 3. SYSTEM ARCHITECTURE The main challenge (and simultaneously advantage) when designing a cloud-based CEP engine is that it must handle a rapidly changing set of events and queries, where both the number of queries and the event rate is very hard to predict. As outlined in Section 1 peak load rates can be as high as 100,000 concurrent queries and 400,000 events per second. However, the normal load is expected to significantly differ from the above values, depending mostly on the application type. This in turn, requires the system to be able to automatically scale up and down along with the varying load, no user interaction is involved in the elastic scaling process. In order to allow for such functionality we propose a system architecture presented in Figure 3. The key concept of the thesis prototype with respect to its architecture, is that it will be decoupled from the underlying elastic CEP system. All communication between the thesis prototype and the underlying CEP system will be executed via generic interfaces. This allows to use of the thesis prototype in combination with any elastic CEP engine, as long as the underlying CEP engine implements the specified interfaces and the required functionality including a distributed execution, dynamic addition of queries, operators and operator instances as well as dynamic movement of operators instances to other nodes. The prototype consists of three major components: Multiple query optimization Given a large amount of concurrently running queries it is likely that results produced by parts of or even whole queries are identical. In order to lower the load on the system such overlaps should be exploited. A solution can be based on multiple query optimization [29] a technique which allows creation of global query plans reusing common results. Operator placement Based on a global query plan and a set of available nodes an assignment of operators or queries onto the physical nodes has to be performed. In order to accommodate for the varying load operator placement has to be aware of the current load levels as well as the current utilization of system nodes. Moreover, considering a deployment in the cloud, the operator placement component must be able to request and release resources as needed in order to cope with the varying workload and quality of service requirements like a maximal end to end latency or an expected minimal throughput. Cost efficiency In most scenarios users provide not only event sources and queries to run against them but also a monetary budget and quality of service requirements which need to be met. Therefore, both aforementioned components must be able to operate in a cost efficient way. This means that the query plan and its execution must be tuned in such a way as to minimize the execution cost, while simultaneously meeting the specified quality of service requirements. In addition, newly added queries should be rejected by an admission control in case they can not be executed with the given budget. In the following sections we will describe each of the aforementioned components in more detail. Specifically, we will motivate the design decisions regarding the components design and we will present related work and the progress beyond the state of the art. 4. MULTIPLE QUERY OPTIMIZATION FOR ELASTIC CEP The goal of the Multiple Query Optimization [29] (MQO) is to reduce the number of concurrently running queries in a CEP and/or database system. A MQO approach involves identification and reuse of identical results produced by operators within a global query plan. MQO allows avoiding repetitive computation of identical results and is specifically well suited for CEP systems, which are working with long running queries [7]. Figure 4 illustrates the multi query optimization approach. The solution to the MQO is a global query plan incorporating multiple queries. The goal of the MQO when creating the global query plan is to minimize a cost metric, such as the sum of all operators cost. The task is known to be NP-complete [30], because the global optimal plan cannot be constructed by simply combining optimal per query plans. Instead, all possible query plans for all queries have to be examined, which leads to an exponential growth of the

4 Figure 4: Example for an incremental multi query optimization search space. To avoid this problem most approaches use heuristics [26] to find a near optimal query plan. The MQO approach used in context of this thesis should be incremental, allowing adding new queries and removing queries during runtime. The global query plan should only contain the current active queries to allow an efficient resource usage. Many classical approaches used in database systems [6, 26] are not incremental algorithms, which means that each time a new query arrives the computation to find the optimalquery plan has to be restarted. This can become very expensive, especially when the query arrival frequency is high. In addition, when adding new queries the already running ones should not be interrupted, i.e., changes to the currently executed global query plan should be avoided. If the position of operators within a running global query plan is changed this might involve changes in the semantics of the current operator state (e.g. the previously seen and still valid events inside a join operator), which either results in a complex reinitialization or incorrect results. In the context of data streaming the usage of multi query optimization was first addressed in[13, 16]. Both approaches show an efficient way to optimize a large amount of continuous queries. However, both approaches do not allow for addition nor removal of queries during runtime. A system for incremental query optimization has been first proposed in [15] it uses internal index structures to easily identify possible merge points for new queries. Additionally, the authors use query subsumption [3] for filter expressions, which results in a higher results reuse. However, the set of operators supported in the above system is very limited and the operators are only executed on the same input stream. The thesis described in this paper will extend the above approach to: (1) accommodate a full set of CEP operators as well as (2) multiple input streams and (3) query removal from the system. In addition, instead of comparing the semantics of the operators, a stream annotation approach will be used. In our approach every stream is annotated with a set of predicates based on the operator semantics defining tuples contained within this stream. The goal is to avoid query rewriting and make use of efficient index structures to allow for fast integration of new queries. 5. OPERATOR PLACEMENT FOR ELAS- TIC CEP Given a query plan and a set of available nodes, it has to be decided which operators are to be mapped onto which nodes and how many nodes should execute a given operator. This problem can be solved by operator placement algorithms [19] whichtryto place a setofoperatorson a fixedset of distributed nodes. A number of approaches [2, 16, 32, 36] which try to optimize operator placement based on a specified metric have been proposed. Authors in [19] outline different design dimensions for operator placement algorithms, in the following we mention those which are relevant to the thesis described in this paper: Architecture Existing approaches can be categorized being an independent module [16, 32] or as distributed logic [2, 36]. With respect to the architecture the thesis prototype described in this paper is implemented as a centralized and independent module. Metric The operator placement can be optimized based on various metrics including load, latency, network bandwidth or operator importance. Within the thesis described in this paper different metrics including monetary cost (see Section 6) will be evaluated. Adjustment The operator placement can either be static, with new placement calculated only when a new query arrives or departs, or dynamic with the placement being adjusted, subject to varying load and node utilization. Within the thesis described in this paper a dynamic operator placement based on the system load and utilization will be used. With the thesis described in this paper two additional problems will be investigated: (1) it has to be decided, how many nodes per operator are needed and (2) how to perform an a priori (instead of an a posteriori) node provisioning. The first problem can be solved in a two phased approach. First, an initial estimate based on the number of available nodes and the complexity of the operator is performed. Subsequently, after deployment has taken place, runtime information is used in order to further refine the number of nodes used to handle the operator load. The second problem stems from the fact that the number of available nodes is not fixed. Nodes are purchased on demand from the cloud provider. Ideally, one should try to avoid situations where nodes are becoming bottlenecks (underprovisioning) as well as overprovisioning of the overall system. The major challenge is to be able to detect such situations early enough to migrate the operator to another machine before violating given QoS constraints. This is motivated by the fact that both allocation of a new node and moving of an operator to a newly allocated node are not instantaneous. Moreover, a frequent allocation and deallocation of nodes (called thrashing) as a result of a bursty load should be avoided. The overall idea for the operator placement is to provide a fast and simple initial placement strategy and perform subsequent adjustments according to the varying load. The motivation is based on following facts: (1) in CEP systems, unlike in classical data bases, it is not possible to predict the load and (2) initial operator placement is only the starting point for the search of an (sub) optimal solution.

5 Figure 5: Cost efficient execution of queries 6. COST EFFICIENT ELASTIC CEP Most cloud providers use a pay-as-you-go model, which means, that a user is charged for every CPU cycle he uses. The Elastic CEP prototype will use this property to reduce the information processing cost for the user as much as possible. Users of the Elastic CEP prototype will be able to specify a maximum cost for information processing along with QoS constraints, such as: a lower bound for the throughput and upper bound for the latency. The Elastic CEP prototype will try to optimize the data flow in such a way so as to stay within these constraints. From the large number of possible global query plans ones which fulfill both expected QoS constraints as well as the cost limits will be selected. Figure 5 illustrates a set of possible query plans with expected throughput and associated cost along with an expected minimal throughput and a maximum budget. The chosen query plan should meets both criteria which is true for the query plan in the dark gray area. A number of authors [9, 18, 31] have studied the problem of cost efficient computations, especially in context of MapReduce[10] and cloud computing. For batch-based tasks it has been shown [18] how to integrate monetary costs into the schedule computation using an additional cost dimension during optimization. An approach proposed by authors in [9] shows how to combine a pricing model with a provisioning infrastructure and how to incorporate existing SLAs defined between customers and the service provider. The authors also demonstrate how service providers have to pay compensation fees in case of SLA violations. Authors in [31] present a heuristic method of optimization which speeds up a given task within a predefined budget by optimizing the number of allocated machines. However, the complexity of cost efficient planning for applications involving streaming data is much higher, because the concrete rate for the event sources cannot be predicted. If the number of machines has to be increased (due to an increased amount of incoming events) the remaining budget has to be taken into consideration. As a result, the system has to decide (using an admission control) whether to reject a new set of queries, due to violation of the available budget. 7. EVALUATION In order to be able to evaluate the Elastic CEP prototype a benchmark, containing both synthetic as well as realworld data, is needed. The goal of the evaluation is to show strengths and limitations of the Elastic CEP prototype with respect to handling varying load (event rates and number of queries) and the responsiveness of the system. The evaluation will examine following metrics: Throughput The elastic CEP prototype must be able to demonstrate handling of query loads exceeding capacityofasingle node. The ElasticCEP prototype should demonstrate the ability to execute a distributed global query plan. Query load This metric will evaluate the multiple query optimization component both the performance of an optimized query plan as well as the performance of the optimizer will be measured. Responsiveness The responsiveness of the Elastic CEP prototype describes the time needed to adapt to changing query loads as well as the ability to detect overloaded operators. Cost efficiency To test the cost efficiency the Elastic CEP prototype will be faced with situations in which queries have to be rejected or the available budget is limited. The system should show its ability to meet the budget constraints. The evaluation will be based on real-world and synthetic test data. Therefore, existing benchmarks will be considered as starting point for the evaluation. The most common benchmark for CEP systems is Linear Road [5]. The Linear Roads benchmark simulates a toll system on a large number of highways. The system uses a fixed set of queries with well defined constraints regarding correctness and response times. The Linear Road benchmark comes with an event generator, which can change the amount of events sent per second. The main goal of Linear Road benchmark is to test the maximum event rate a system can sustain. However, the both event rates and the query load show a linear behavior, which does not allow to test for the elasticity of the overall system. Elasticity has to be tested by additional test cases therefore, one of the tasks of the thesis described within this paper is to develop a benchmark suite which can be used for comparing the performance of the elastic CEP systems. 8. CONCLUSION The thesis described in this paper introduces an extension to existing CEP systems which reflects the high dynamism of a cloud environment. This involves studying the problem of: (1) how to handle a large amount of concurrent queries, (2) how to make use of allocated resources efficiently and (3) how to realize a cost efficient computation. Therefore, the Elastic CEP prototype developed within the thesis will accept following input: (1) a set of queries, (2) a set of event sources and a (3) set of available nodes and their utilization. As an output the component will provide an assignment of operators to nodes for an elastic CEP system taking into account the runtime statistics. This research is based on existing approaches for distributed complex event processing, however progress beyond the state of the art is made in that the dynamism of the cloud environment and load are considered. The Elastic CEP prototype will be evaluated using a new benchmark system allowing the comparison to other systems with respect to the support for elasticity.

6 9. REFERENCES [1] D. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherniack, J. Hwang, W. Lindner, A. Maskey, A. Rasin, E. Ryvkina, et al. The design of the Borealis stream processing engine. In Second Biennial Conference on Innovative Data Systems Research (CIDR 2005), Asilomar, CA, pages Citeseer, [2] Y. Ahmad and U. Çetintemel. Network-aware query processing for stream-based applications. In Proceedings of the Thirtieth international conference on Very large data bases- Volume 30, pages VLDB Endowment, [3] M. Al-Qasem and S. Deen. Query subsumption. Flexible Query Answering Systems, pages 29 42, [4] D. Alves, P. Bizarro, and P. Marques. Flood: Elastic streaming MapReduce. In DEBS 10, pages , [5] A. Arasu, M. Cherniack, E. Galvez, D. Maier, A. Maskey, E. Ryvkina, M. Stonebraker, and R. Tibbetts. Linear road: A stream data management benchmark. In Proceedings of the Thirtieth international conference on Very large data bases-volume 30, pages VLDB Endowment, [6] S. Chaudhuri. An overview of query optimization in relational systems. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 34 43, Seattle, Washington, June ACM Press. [7] S. Chaudhuri, U. Dayal, and V. Narasayya. An overview of business intelligence technology. Communications of the ACM, 54(8):88 98, August [8] M. Creeger. Cloud computing: An overview. Queue, 7(5):3 4, June [9] I. Cunha, J. Almeida, V. Almeida, and M. Santos. Selfadaptive capacity management for multi-tier virtualized environments. In Integrated Network Management, IM th IFIP/IEEE International Symposium on, pages IEEE, [10] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51(1): , [11] M. Gualtieri and J. R. Rymer. The Forrester Wave: Complex Event Processing (CEP) Platforms, Q Online, August [12] V. Gulisano, R. Jimenez-Peris, M. Patino-Martinez, and P. Valduriez. StreamCloud: A large scale data streaming system. In 2010 International Conference on Distributed Computing Systems, pages IEEE, [13] M. Hong, M. Riedewald, C. Koch, J. Gehrke, and A. Demers. Rule-based multi-query optimization. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pages ACM, [14] A. Jacobs. The pathologies of big data. Queue, 7(6):10 19, [15] C. Jin and J. Carbonell. Predicate indexing for incremental multi-query optimization. Foundations of Intelligent Systems, pages , [16] E.Kalyvianaki, W.Wiesemann, Q.Vu, D.Kuhn, andp.pietzuch. SQPR: Stream query planning with reuse. Imperial College London, Tech. Rep, [17] W. Kleiminger, E. Kalyvianaki, and P. Pietzuch. Balancing load in stream processing with the cloud. In Proceedings of the 6th International Workshop on Self Managing Database Systems (SMDB 2011), Hannover, Germany. IEEE, [18] H. Kllapi, E. Sitaridi, M. Tsangaris, and Y. Ioannidis. Schedule optimization for data processing flows on the cloud. In Proceedings of the 2011 international conference on Management of data, pages ACM, [19] G. Lakshmanan, Y. Li, and R. Strom. Placement strategies for internet-scale data stream systems. Internet Computing, IEEE, 12(6):50 60, [20] N. Leavitt. Complex-event processing poised for growth. Computer, 42(4):17 20, [21] D. Luckham. The power of events: An introduction to complex event processing in distributed enterprise systems. Addison-Wesley Longman Publishing Co., Inc., [22] D. Luckham and B. Frasca. Complex event processing in distributed systems. Computer Systems Laboratory Technical Report CSL-TR Stanford University, Stanford, [23] A. Martin, T. Knauth, S. Creutz, D. B. de Brum, S. Weigert, A. Brito, and C. Fetzer. Low-overhead fault tolerance for high-throughput data processing systems. In ICDCS 11: Proceedings of the st IEEE International Conference on Distributed Computing Systems, pages , Los Alamitos, CA, USA, June IEEE Computer Society. [24] P. Mell and T. Grance. The NIST definition of cloud computing. National Institute of Standards and Technology, 53(6), [25] M. Migliavacca, D. Eyers, J. Bacon, Y. Papagiannis, B. Shand, and P. Pietzuch. SEEP: Scalable and elastic event processing. In Middleware 10 Posters and Demos Track, page 4. ACM, [26] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. Efficient and extensible algorithms for multi query optimization. In ACM SIGMOD Record, volume 29, pages ACM, [27] S. Schneider, H. Andrade, B. Gedik, A. Biem, and K. Wu. Elastic scaling of data parallel operators in stream processing. In Parallel & Distributed Processing, IPDPS IEEE International Symposium on, pages IEEE, [28] N. Schultz-Moller, M. Migliavacca, and P. Pietzuch. Distributed complex event processing with query rewriting. In Proceedings of the Third ACM International Conference on Distributed Event-Based Systems, pages ACM, operator placement. [29] T. Sellis. Multiple-query optimization. ACM Transactions on Database Systems (TODS), 13(1):23 52, [30] T. Sellis and S. Ghosh. On the multiple-query optimization problem. Knowledge and Data Engineering, IEEE Transactions on, 2(2): , [31] J. Silva, L. Veiga, and P. Ferreira. Heuristic for resources allocation on utility computing infrastructures. In Proceedings of the 6th international workshop on Middleware for grid computing, page 9. ACM, [32] U. Srivastava, K. Munagala, and J. Widom. Operator placement for in-network stream query processing. In Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages ACM, [33] StreamBase. Real-time data processing with a stream processing engine. Technical report, [34] Sybase. Evaluating Sybase CEP performance. Technical report, October [35] J. Varia. Architecting for the cloud: Best practices. Technical report, Amazon, Inc., [36] Y. Xing, S. Zdonik, and J. Hwang. Dynamic load distribution in the Borealis stream processor. In Data Engineering, ICDE Proceedings. 21st International Conference on, pages IEEE, 2005.

FUGU: Elastic Data Stream Processing with Latency Constraints

FUGU: Elastic Data Stream Processing with Latency Constraints FUGU: Elastic Data Stream Processing with Latency Constraints Thomas Heinze 1, Yuanzhen Ji 1, Lars Roediger 1, Valerio Pappalardo 1, Andreas Meister 2, Zbigniew Jerzak 1, Christof Fetzer 3 1 SAP SE 2 University

More information

Synergy: Quality of Service Support for Distributed Stream Processing Systems

Synergy: Quality of Service Support for Distributed Stream Processing Systems Synergy: Quality of Service Support for Distributed Stream Processing Systems Thomas Repantis Vana Kalogeraki Department of Computer Science & Engineering University of California, Riverside {trep,vana}@cs.ucr.edu

More information

BiCEP Benchmarking Complex Event Processing Systems

BiCEP Benchmarking Complex Event Processing Systems BiCEP Benchmarking Complex Event Processing Systems Pedro Bizarro University of Coimbra, DEI-CISUC 3030-290 Coimbra, Portugal bizarro@dei.uc.pt Abstract. BiCEP is a new project being started at the University

More information

Challenges in Data Stream Processing

Challenges in Data Stream Processing Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria

More information

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision

More information

Semantic Event Correlation Using Ontologies

Semantic Event Correlation Using Ontologies Semantic Event Correlation Using Ontologies Thomas Moser 1, Heinz Roth 2, Szabolcs Rozsnyai 3, Richard Mordinyi 1, and Stefan Biffl 1 1 Complex Systems Design & Engineering Lab, Vienna University of Technology

More information

Overload Management in Data Stream Processing Systems with Latency Guarantees

Overload Management in Data Stream Processing Systems with Latency Guarantees Overload Management in Data Stream Processing Systems with Latency Guarantees Evangelia Kalyvianaki, hemistoklis Charalambous, Marco Fiscato, and Peter Pietzuch Department of Computing, Imperial College

More information

StreamMine3G OneClick Deploy & Monitor ESP Applications With A Single Click

StreamMine3G OneClick Deploy & Monitor ESP Applications With A Single Click OneClick Deploy & Monitor ESP Applications With A Single Click Andrey Brito, André Martin, Christof Fetzer, Isabelly Rocha, Telles Nóbrega Technische Universität Dresden - Dresden, Germany - Email: {firstname.lastname}@se.inf.tu-dresden.de

More information

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Rada Chirkova Department of Computer Science, North Carolina State University Raleigh, NC 27695-7535 chirkova@csc.ncsu.edu Foto Afrati

More information

DATABASES AND THE CLOUD. Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland

DATABASES AND THE CLOUD. Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland DATABASES AND THE CLOUD Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland AVALOQ Conference Zürich June 2011 Systems Group www.systems.ethz.ch Enterprise Computing Center

More information

Dealing with Overload in Distributed Stream Processing Systems

Dealing with Overload in Distributed Stream Processing Systems Dealing with Overload in Distributed Stream Processing Systems Nesime Tatbul Stan Zdonik Brown University, Department of Computer Science, Providence, RI USA E-mail: {tatbul, sbz}@cs.brown.edu Abstract

More information

Dynamically Provisioning Distributed Systems to Meet Target Levels of Performance, Availability, and Data Quality

Dynamically Provisioning Distributed Systems to Meet Target Levels of Performance, Availability, and Data Quality Dynamically Provisioning Distributed Systems to Meet Target Levels of Performance, Availability, and Data Quality Amin Vahdat Department of Computer Science Duke University 1 Introduction Increasingly,

More information

Scalable and Elastic Realtime Click Stream Analysis Using StreamMine3G

Scalable and Elastic Realtime Click Stream Analysis Using StreamMine3G Scalable and Elastic Realtime Click Stream Analysis Using StreamMine3G André Martin *, Andrey Brito # and Christof Fetzer * * Technische Universität Dresden - Dresden, Germany # Universidade Federal de

More information

Multi-Model Based Optimization for Stream Query Processing

Multi-Model Based Optimization for Stream Query Processing Multi-Model Based Optimization for Stream Query Processing Ying Liu and Beth Plale Computer Science Department Indiana University {yingliu, plale}@cs.indiana.edu Abstract With recent explosive growth of

More information

Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures*

Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures* Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures* Tharso Ferreira 1, Antonio Espinosa 1, Juan Carlos Moure 2 and Porfidio Hernández 2 Computer Architecture and Operating

More information

Scaling-Out with Oracle Grid Computing on Dell Hardware

Scaling-Out with Oracle Grid Computing on Dell Hardware Scaling-Out with Oracle Grid Computing on Dell Hardware A Dell White Paper J. Craig Lowery, Ph.D. Enterprise Solutions Engineering Dell Inc. August 2003 Increasing computing power by adding inexpensive

More information

Virtual Machine Placement in Cloud Computing

Virtual Machine Placement in Cloud Computing Indian Journal of Science and Technology, Vol 9(29), DOI: 10.17485/ijst/2016/v9i29/79768, August 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Virtual Machine Placement in Cloud Computing Arunkumar

More information

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc Fuxi Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc {jiamang.wang, yongjun.wyj, hua.caihua, zhipeng.tzp, zhiqiang.lv,

More information

2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media,

2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising

More information

Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing

Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing Nesime Tatbul Uğur Çetintemel Stan Zdonik Talk Outline Problem Introduction Approach Overview Advance Planning with an

More information

Interactive Responsiveness and Concurrent Workflow

Interactive Responsiveness and Concurrent Workflow Middleware-Enhanced Concurrency of Transactions Interactive Responsiveness and Concurrent Workflow Transactional Cascade Technology Paper Ivan Klianev, Managing Director & CTO Published in November 2005

More information

EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD

EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD S.THIRUNAVUKKARASU 1, DR.K.P.KALIYAMURTHIE 2 Assistant Professor, Dept of IT, Bharath University, Chennai-73 1 Professor& Head, Dept of IT, Bharath

More information

Accommodating Bursts in Distributed Stream Processing Systems

Accommodating Bursts in Distributed Stream Processing Systems Accommodating Bursts in Distributed Stream Processing Systems Distributed Real-time Systems Lab University of California, Riverside {drougas,vana}@cs.ucr.edu http://www.cs.ucr.edu/~{drougas,vana} Stream

More information

An Intelligent Service Oriented Infrastructure supporting Real-time Applications

An Intelligent Service Oriented Infrastructure supporting Real-time Applications An Intelligent Service Oriented Infrastructure supporting Real-time Applications Future Network Technologies Workshop 10-11 -ETSI, Sophia Antipolis,France Karsten Oberle, Alcatel-Lucent Bell Labs Karsten.Oberle@alcatel-lucent.com

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Rethinking the Design of Distributed Stream Processing Systems

Rethinking the Design of Distributed Stream Processing Systems Rethinking the Design of Distributed Stream Processing Systems Yongluan Zhou Karl Aberer Ali Salehi Kian-Lee Tan EPFL, Switzerland National University of Singapore Abstract In this paper, we present a

More information

Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data

Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data 46 Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data

More information

DATA STREAMS AND DATABASES. CS121: Introduction to Relational Database Systems Fall 2016 Lecture 26

DATA STREAMS AND DATABASES. CS121: Introduction to Relational Database Systems Fall 2016 Lecture 26 DATA STREAMS AND DATABASES CS121: Introduction to Relational Database Systems Fall 2016 Lecture 26 Static and Dynamic Data Sets 2 So far, have discussed relatively static databases Data may change slowly

More information

Crescando: Predictable Performance for Unpredictable Workloads

Crescando: Predictable Performance for Unpredictable Workloads Crescando: Predictable Performance for Unpredictable Workloads G. Alonso, D. Fauser, G. Giannikis, D. Kossmann, J. Meyer, P. Unterbrunner Amadeus S.A. ETH Zurich, Systems Group (Funded by Enterprise Computing

More information

ENERGY EFFICIENT VIRTUAL MACHINE INTEGRATION IN CLOUD COMPUTING

ENERGY EFFICIENT VIRTUAL MACHINE INTEGRATION IN CLOUD COMPUTING ENERGY EFFICIENT VIRTUAL MACHINE INTEGRATION IN CLOUD COMPUTING Mrs. Shweta Agarwal Assistant Professor, Dept. of MCA St. Aloysius Institute of Technology, Jabalpur(India) ABSTRACT In the present study,

More information

White Paper. Major Performance Tuning Considerations for Weblogic Server

White Paper. Major Performance Tuning Considerations for Weblogic Server White Paper Major Performance Tuning Considerations for Weblogic Server Table of Contents Introduction and Background Information... 2 Understanding the Performance Objectives... 3 Measuring your Performance

More information

RESTORE: REUSING RESULTS OF MAPREDUCE JOBS. Presented by: Ahmed Elbagoury

RESTORE: REUSING RESULTS OF MAPREDUCE JOBS. Presented by: Ahmed Elbagoury RESTORE: REUSING RESULTS OF MAPREDUCE JOBS Presented by: Ahmed Elbagoury Outline Background & Motivation What is Restore? Types of Result Reuse System Architecture Experiments Conclusion Discussion Background

More information

Performance Evaluation of Yahoo! S4: A First Look

Performance Evaluation of Yahoo! S4: A First Look Performance Evaluation of Yahoo! S4: A First Look Jagmohan Chauhan, Shaiful Alam Chowdhury and Dwight Makaroff Department of Computer Science University of Saskatchewan Saskatoon, SK, CANADA S7N 3C9 (jac735,

More information

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad

More information

Hybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud

Hybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud Hybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud Abid Nisar, Waheed Iqbal, Fawaz S. Bokhari, and Faisal Bukhari Punjab University College of Information and Technology,Lahore

More information

NC Education Cloud Feasibility Report

NC Education Cloud Feasibility Report 1 NC Education Cloud Feasibility Report 1. Problem Definition and rationale North Carolina districts are generally ill-equipped to manage production server infrastructure. Server infrastructure is most

More information

Exploiting Predicate-window Semantics over Data Streams

Exploiting Predicate-window Semantics over Data Streams Exploiting Predicate-window Semantics over Data Streams Thanaa M. Ghanem Walid G. Aref Ahmed K. Elmagarmid Department of Computer Sciences, Purdue University, West Lafayette, IN 47907-1398 {ghanemtm,aref,ake}@cs.purdue.edu

More information

A Predictive Load Balancing Service for Cloud-Replicated Databases

A Predictive Load Balancing Service for Cloud-Replicated Databases paper:174094 A Predictive Load Balancing for Cloud-Replicated Databases Carlos S. S. Marinho 1,2, Emanuel F. Coutinho 1, José S. Costa Filho 2, Leonardo O. Moreira 1,2, Flávio R. C. Sousa 2, Javam C. Machado

More information

StreamGlobe Adaptive Query Processing and Optimization in Streaming P2P Environments

StreamGlobe Adaptive Query Processing and Optimization in Streaming P2P Environments StreamGlobe Adaptive Query Processing and Optimization in Streaming P2P Environments A. Kemper, R. Kuntschke, and B. Stegmaier TU München Fakultät für Informatik Lehrstuhl III: Datenbanksysteme http://www-db.in.tum.de/research/projects/streamglobe

More information

Mark Sandstrom ThroughPuter, Inc.

Mark Sandstrom ThroughPuter, Inc. Hardware Implemented Scheduler, Placer, Inter-Task Communications and IO System Functions for Many Processors Dynamically Shared among Multiple Applications Mark Sandstrom ThroughPuter, Inc mark@throughputercom

More information

Adaptive Aggregation Scheduling Using. Aggregation-degree Control in Sensor Network

Adaptive Aggregation Scheduling Using. Aggregation-degree Control in Sensor Network Contemporary Engineering Sciences, Vol. 7, 2014, no. 14, 725-730 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.4681 Adaptive Aggregation Scheduling Using Aggregation-degree Control in

More information

QoS-aware resource allocation and load-balancing in enterprise Grids using online simulation

QoS-aware resource allocation and load-balancing in enterprise Grids using online simulation QoS-aware resource allocation and load-balancing in enterprise Grids using online simulation * Universität Karlsruhe (TH) Technical University of Catalonia (UPC) Barcelona Supercomputing Center (BSC) Samuel

More information

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Donald S. Miller Department of Computer Science and Engineering Arizona State University Tempe, AZ, USA Alan C.

More information

Amazon Elastic File System

Amazon Elastic File System Amazon Elastic File System Choosing Between the Different Throughput & Performance Modes July 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided

More information

Large Scale Computing Infrastructures

Large Scale Computing Infrastructures GC3: Grid Computing Competence Center Large Scale Computing Infrastructures Lecture 2: Cloud technologies Sergio Maffioletti GC3: Grid Computing Competence Center, University

More information

Best Practices. Deploying Optim Performance Manager in large scale environments. IBM Optim Performance Manager Extended Edition V4.1.0.

Best Practices. Deploying Optim Performance Manager in large scale environments. IBM Optim Performance Manager Extended Edition V4.1.0. IBM Optim Performance Manager Extended Edition V4.1.0.1 Best Practices Deploying Optim Performance Manager in large scale environments Ute Baumbach (bmb@de.ibm.com) Optim Performance Manager Development

More information

COMPTIA CLO-001 EXAM QUESTIONS & ANSWERS

COMPTIA CLO-001 EXAM QUESTIONS & ANSWERS COMPTIA CLO-001 EXAM QUESTIONS & ANSWERS Number: CLO-001 Passing Score: 800 Time Limit: 120 min File Version: 39.7 http://www.gratisexam.com/ COMPTIA CLO-001 EXAM QUESTIONS & ANSWERS Exam Name: CompTIA

More information

A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS

A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS Raj Kumar, Vanish Talwar, Sujoy Basu Hewlett-Packard Labs 1501 Page Mill Road, MS 1181 Palo Alto, CA 94304 USA { raj.kumar,vanish.talwar,sujoy.basu}@hp.com

More information

Complex Event Processing in a High Transaction Enterprise POS System

Complex Event Processing in a High Transaction Enterprise POS System Complex Event Processing in a High Transaction Enterprise POS System Samuel Collins* Roy George** *InComm, Atlanta, GA 30303 **Department of Computer and Information Science, Clark Atlanta University,

More information

YOUR APPLICATION S JOURNEY TO THE CLOUD. What s the best way to get cloud native capabilities for your existing applications?

YOUR APPLICATION S JOURNEY TO THE CLOUD. What s the best way to get cloud native capabilities for your existing applications? YOUR APPLICATION S JOURNEY TO THE CLOUD What s the best way to get cloud native capabilities for your existing applications? Introduction Moving applications to cloud is a priority for many IT organizations.

More information

Lambda Architecture for Batch and Stream Processing. October 2018

Lambda Architecture for Batch and Stream Processing. October 2018 Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.

More information

Adaptive In-Network Query Deployment for Shared Stream Processing Environments

Adaptive In-Network Query Deployment for Shared Stream Processing Environments Adaptive In-Network Query Deployment for Shared Stream Processing Environments Olga Papaemmanouil Brown University olga@cs.brown.edu Sujoy Basu HP Labs basus@hpl.hp.com Sujata Banerjee HP Labs sujata@hpl.hp.com

More information

Architectural challenges for building a low latency, scalable multi-tenant data warehouse

Architectural challenges for building a low latency, scalable multi-tenant data warehouse Architectural challenges for building a low latency, scalable multi-tenant data warehouse Mataprasad Agrawal Solutions Architect, Services CTO 2017 Persistent Systems Ltd. All rights reserved. Our analytics

More information

Performance Assurance in Virtualized Data Centers

Performance Assurance in Virtualized Data Centers Autonomic Provisioning with Self-Adaptive Neural Fuzzy Control for End-to-end Delay Guarantee Palden Lama Xiaobo Zhou Department of Computer Science University of Colorado at Colorado Springs Performance

More information

Transactum Business Process Manager with High-Performance Elastic Scaling. November 2011 Ivan Klianev

Transactum Business Process Manager with High-Performance Elastic Scaling. November 2011 Ivan Klianev Transactum Business Process Manager with High-Performance Elastic Scaling November 2011 Ivan Klianev Transactum BPM serves three primary objectives: To make it possible for developers unfamiliar with distributed

More information

An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language

An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language Martin C. Rinard (martin@cs.ucsb.edu) Department of Computer Science University

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung December 2003 ACM symposium on Operating systems principles Publisher: ACM Nov. 26, 2008 OUTLINE INTRODUCTION DESIGN OVERVIEW

More information

DSMS Benchmarking. Morten Lindeberg University of Oslo

DSMS Benchmarking. Morten Lindeberg University of Oslo DSMS Benchmarking Morten Lindeberg University of Oslo Agenda Introduction DSMS Recap General Requirements Metrics Example: Linear Road Example: StreamBench 30. Sep. 2009 INF5100 - Morten Lindeberg 2 Introduction

More information

A QoS Control Method Cooperating with a Dynamic Load Balancing Mechanism

A QoS Control Method Cooperating with a Dynamic Load Balancing Mechanism A QoS Control Method Cooperating with a Dynamic Load Balancing Mechanism Akiko Okamura, Koji Nakamichi, Hitoshi Yamada and Akira Chugo Fujitsu Laboratories Ltd. 4-1-1, Kamikodanaka, Nakahara, Kawasaki,

More information

Network Awareness in Internet-Scale Stream Processing

Network Awareness in Internet-Scale Stream Processing Network Awareness in Internet-Scale Stream Processing Yanif Ahmad, Uğur Çetintemel John Jannotti, Alexander Zgolinski, Stan Zdonik {yna,ugur,jj,amz,sbz}@cs.brown.edu Dept. of Computer Science, Brown University,

More information

MOHA: Many-Task Computing Framework on Hadoop

MOHA: Many-Task Computing Framework on Hadoop Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction

More information

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation report prepared under contract with Dot Hill August 2015 Executive Summary Solid state

More information

TDDD82 Secure Mobile Systems Lecture 6: Quality of Service

TDDD82 Secure Mobile Systems Lecture 6: Quality of Service TDDD82 Secure Mobile Systems Lecture 6: Quality of Service Mikael Asplund Real-time Systems Laboratory Department of Computer and Information Science Linköping University Based on slides by Simin Nadjm-Tehrani

More information

Hybrid Approach for the Maintenance of Materialized Webviews

Hybrid Approach for the Maintenance of Materialized Webviews Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2010 Proceedings Americas Conference on Information Systems (AMCIS) 8-2010 Hybrid Approach for the Maintenance of Materialized Webviews

More information

SoftNAS Cloud Performance Evaluation on AWS

SoftNAS Cloud Performance Evaluation on AWS SoftNAS Cloud Performance Evaluation on AWS October 25, 2016 Contents SoftNAS Cloud Overview... 3 Introduction... 3 Executive Summary... 4 Key Findings for AWS:... 5 Test Methodology... 6 Performance Summary

More information

Efficient, Scalable, and Provenance-Aware Management of Linked Data

Efficient, Scalable, and Provenance-Aware Management of Linked Data Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management

More information

A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING

A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING Journal homepage: www.mjret.in ISSN:2348-6953 A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING Bhavsar Nikhil, Bhavsar Riddhikesh,Patil Balu,Tad Mukesh Department of Computer Engineering JSPM s

More information

Mitigating Data Skew Using Map Reduce Application

Mitigating Data Skew Using Map Reduce Application Ms. Archana P.M Mitigating Data Skew Using Map Reduce Application Mr. Malathesh S.H 4 th sem, M.Tech (C.S.E) Associate Professor C.S.E Dept. M.S.E.C, V.T.U Bangalore, India archanaanil062@gmail.com M.S.E.C,

More information

WORKFLOW ENGINE FOR CLOUDS

WORKFLOW ENGINE FOR CLOUDS WORKFLOW ENGINE FOR CLOUDS By SURAJ PANDEY, DILEBAN KARUNAMOORTHY, and RAJKUMAR BUYYA Prepared by: Dr. Faramarz Safi Islamic Azad University, Najafabad Branch, Esfahan, Iran. Task Computing Task computing

More information

Star: Sla-Aware Autonomic Management of Cloud Resources

Star: Sla-Aware Autonomic Management of Cloud Resources Star: Sla-Aware Autonomic Management of Cloud Resources Sakshi Patil 1, Meghana N Rathod 2, S. A Madival 3, Vivekanand M Bonal 4 1, 2 Fourth Sem M. Tech Appa Institute of Engineering and Technology Karnataka,

More information

Grand Challenge: Scalable Stateful Stream Processing for Smart Grids

Grand Challenge: Scalable Stateful Stream Processing for Smart Grids Grand Challenge: Scalable Stateful Stream Processing for Smart Grids Raul Castro Fernandez, Matthias Weidlich, Peter Pietzuch Imperial College London {rc311, m.weidlich, prp}@imperial.ac.uk Avigdor Gal

More information

GETTING GREAT PERFORMANCE IN THE CLOUD

GETTING GREAT PERFORMANCE IN THE CLOUD WHITE PAPER GETTING GREAT PERFORMANCE IN THE CLOUD An overview of storage performance challenges in the cloud, and how to deploy VPSA Storage Arrays for better performance, privacy, flexibility and affordability.

More information

Veritas InfoScale Enterprise for Oracle Real Application Clusters (RAC)

Veritas InfoScale Enterprise for Oracle Real Application Clusters (RAC) Veritas InfoScale Enterprise for Oracle Real Application Clusters (RAC) Manageability and availability for Oracle RAC databases Overview Veritas InfoScale Enterprise for Oracle Real Application Clusters

More information

2 The BEinGRID Project

2 The BEinGRID Project 2 The BEinGRID Project Theo Dimitrakos 2.1 Introduction Most of the results presented in this book were created within the BEinGRID project. BEinGRID, Business Experiments in GRID, is the European Commission

More information

CHAPTER 7 CONCLUSION AND FUTURE SCOPE

CHAPTER 7 CONCLUSION AND FUTURE SCOPE 121 CHAPTER 7 CONCLUSION AND FUTURE SCOPE This research has addressed the issues of grid scheduling, load balancing and fault tolerance for large scale computational grids. To investigate the solution

More information

QoS support for Intelligent Storage Devices

QoS support for Intelligent Storage Devices QoS support for Intelligent Storage Devices Joel Wu Scott Brandt Department of Computer Science University of California Santa Cruz ISW 04 UC Santa Cruz Mixed-Workload Requirement General purpose systems

More information

New research on Key Technologies of unstructured data cloud storage

New research on Key Technologies of unstructured data cloud storage 2017 International Conference on Computing, Communications and Automation(I3CA 2017) New research on Key Technologies of unstructured data cloud storage Songqi Peng, Rengkui Liua, *, Futian Wang State

More information

D DAVID PUBLISHING. Big Data; Definition and Challenges. 1. Introduction. Shirin Abbasi

D DAVID PUBLISHING. Big Data; Definition and Challenges. 1. Introduction. Shirin Abbasi Journal of Energy and Power Engineering 10 (2016) 405-410 doi: 10.17265/1934-8975/2016.07.004 D DAVID PUBLISHING Shirin Abbasi Computer Department, Islamic Azad University-Tehran Center Branch, Tehran

More information

SOLUTION BRIEF NETWORK OPERATIONS AND ANALYTICS. How Can I Predict Network Behavior to Provide for an Exceptional Customer Experience?

SOLUTION BRIEF NETWORK OPERATIONS AND ANALYTICS. How Can I Predict Network Behavior to Provide for an Exceptional Customer Experience? SOLUTION BRIEF NETWORK OPERATIONS AND ANALYTICS How Can I Predict Network Behavior to Provide for an Exceptional Customer Experience? SOLUTION BRIEF CA DATABASE MANAGEMENT FOR DB2 FOR z/os DRAFT When used

More information

Grand Challenge: Scalable Stateful Stream Processing for Smart Grids

Grand Challenge: Scalable Stateful Stream Processing for Smart Grids Grand Challenge: Scalable Stateful Stream Processing for Smart Grids Raul Castro Fernandez, Matthias Weidlich, Peter Pietzuch Imperial College London {rc311, m.weidlich, prp}@imperial.ac.uk Avigdor Gal

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2012/13 Data Stream Processing Topics Model Issues System Issues Distributed Processing Web-Scale Streaming 3 System Issues Architecture

More information

Complexity Measures for Map-Reduce, and Comparison to Parallel Computing

Complexity Measures for Map-Reduce, and Comparison to Parallel Computing Complexity Measures for Map-Reduce, and Comparison to Parallel Computing Ashish Goel Stanford University and Twitter Kamesh Munagala Duke University and Twitter November 11, 2012 The programming paradigm

More information

Pocket: Elastic Ephemeral Storage for Serverless Analytics

Pocket: Elastic Ephemeral Storage for Serverless Analytics Pocket: Elastic Ephemeral Storage for Serverless Analytics Ana Klimovic*, Yawen Wang*, Patrick Stuedi +, Animesh Trivedi +, Jonas Pfefferle +, Christos Kozyrakis* *Stanford University, + IBM Research 1

More information

Statistics Driven Workload Modeling for the Cloud

Statistics Driven Workload Modeling for the Cloud UC Berkeley Statistics Driven Workload Modeling for the Cloud Archana Ganapathi, Yanpei Chen Armando Fox, Randy Katz, David Patterson SMDB 2010 Data analytics are moving to the cloud Cloud computing economy

More information

Reservoir Resource and Service Virtualisation w/out Borders

Reservoir Resource and Service Virtualisation w/out Borders NETWORKED EUROPEAN SOFTWARE & SERVICES INITIATIVE Reservoir Resource and Service Virtualisation w/out Borders Dr. Yaron Wolfsthal IBM Haifa Research Lab NESSI General Assembly 2007 Brussels 11th & 12th

More information

How Real Time Are Your Analytics?

How Real Time Are Your Analytics? How Real Time Are Your Analytics? Min Xiao Solutions Architect, VoltDB Table of Contents Your Big Data Analytics.... 1 Turning Analytics into Real Time Decisions....2 Bridging the Gap...3 How VoltDB Helps....4

More information

Was ist dran an einer spezialisierten Data Warehousing platform?

Was ist dran an einer spezialisierten Data Warehousing platform? Was ist dran an einer spezialisierten Data Warehousing platform? Hermann Bär Oracle USA Redwood Shores, CA Schlüsselworte Data warehousing, Exadata, specialized hardware proprietary hardware Introduction

More information

The Top Five Reasons to Deploy Software-Defined Networks and Network Functions Virtualization

The Top Five Reasons to Deploy Software-Defined Networks and Network Functions Virtualization The Top Five Reasons to Deploy Software-Defined Networks and Network Functions Virtualization May 2014 Prepared by: Zeus Kerravala The Top Five Reasons to Deploy Software-Defined Networks and Network Functions

More information

Design Issues for Second Generation Stream Processing Engines

Design Issues for Second Generation Stream Processing Engines Design Issues for Second Generation Stream Processing Engines Daniel J. Abadi, Yanif Ahmad, Magdalena Balazinska, Uğur Çetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag S. Maskey,

More information

Héctor Fernández and G. Pierre Vrije Universiteit Amsterdam

Héctor Fernández and G. Pierre Vrije Universiteit Amsterdam Héctor Fernández and G. Pierre Vrije Universiteit Amsterdam Cloud Computing Day, November 20th 2012 contrail is co-funded by the EC 7th Framework Programme under Grant Agreement nr. 257438 1 Typical Cloud

More information

W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b l e d N e x t - G e n e r a t i o n V N X

W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b l e d N e x t - G e n e r a t i o n V N X Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b

More information

QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER

QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER Hardware Sizing Using Amazon EC2 A QlikView Scalability Center Technical White Paper June 2013 qlikview.com Table of Contents Executive Summary 3 A Challenge

More information

Migration Based Page Caching Algorithm for a Hybrid Main Memory of DRAM and PRAM

Migration Based Page Caching Algorithm for a Hybrid Main Memory of DRAM and PRAM Migration Based Page Caching Algorithm for a Hybrid Main Memory of DRAM and PRAM Hyunchul Seok Daejeon, Korea hcseok@core.kaist.ac.kr Youngwoo Park Daejeon, Korea ywpark@core.kaist.ac.kr Kyu Ho Park Deajeon,

More information

Towards Energy Efficient Change Management in a Cloud Computing Environment

Towards Energy Efficient Change Management in a Cloud Computing Environment Towards Energy Efficient Change Management in a Cloud Computing Environment Hady AbdelSalam 1,KurtMaly 1,RaviMukkamala 1, Mohammad Zubair 1, and David Kaminsky 2 1 Computer Science Department, Old Dominion

More information

Data Streams. Building a Data Stream Management System. DBMS versus DSMS. The (Simplified) Big Picture. (Simplified) Network Monitoring

Data Streams. Building a Data Stream Management System. DBMS versus DSMS. The (Simplified) Big Picture. (Simplified) Network Monitoring Building a Data Stream Management System Prof. Jennifer Widom Joint project with Prof. Rajeev Motwani and a team of graduate students http://www-db.stanford.edu/stream stanfordstreamdatamanager Data Streams

More information

Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce

Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce Huayu Wu Institute for Infocomm Research, A*STAR, Singapore huwu@i2r.a-star.edu.sg Abstract. Processing XML queries over

More information

Contextion: A Framework for Developing Context-Aware Mobile Applications

Contextion: A Framework for Developing Context-Aware Mobile Applications Contextion: A Framework for Developing Context-Aware Mobile Applications Elizabeth Williams, Jeff Gray Department of Computer Science, University of Alabama eawilliams2@crimson.ua.edu, gray@cs.ua.edu Abstract

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Thesis: An Extensible, Self-Tuning, Overlay-Based Infrastructure for Large-Scale Stream Processing and Dissemination Advisor: Ugur Cetintemel

Thesis: An Extensible, Self-Tuning, Overlay-Based Infrastructure for Large-Scale Stream Processing and Dissemination Advisor: Ugur Cetintemel Olga Papaemmanouil Phone: +1 (401) 588-0230 Department of Computer Science Fax: +1 (401) 863-7657 Box 1910, 115 Waterman St, Floor 4 Email: olga@cs.brown.edu Providence, RI, 02912, USA Web: http://www.cs.brown.edu/

More information

vsan 6.6 Performance Improvements First Published On: Last Updated On:

vsan 6.6 Performance Improvements First Published On: Last Updated On: vsan 6.6 Performance Improvements First Published On: 07-24-2017 Last Updated On: 07-28-2017 1 Table of Contents 1. Overview 1.1.Executive Summary 1.2.Introduction 2. vsan Testing Configuration and Conditions

More information