High-Performance Event Processing: Bridging the Gap between Low Latency and High Throughput. Bernhard Seeger, University of Marburg
1 Title slide. Joint work with Nikolaus Glombiewski, Michael Körber, Marc Seidemann
2 1. Motivation: reactive monitoring of time-critical business processes; predictions about the near future and recommendations for action
3 Situations of Interest [Figure: timeline around an event E (E-2 ... E+10) annotated with root cause, impact, benefit, opportunity, reaction costs, and options]
4 Many application domains: algorithmic trading, logistics, traffic management, Internet of Things, system monitoring & security
5 Monitoring IT infrastructures
6 Event-based security within a VM
7 Agenda: Review of CEP; Architecture; Event Store; Pattern Matching; Conclusions
8 2. A Critical Review of CEP. The history of CEP: Charles Forgy, inventor of the Rete algorithm (1981); David Luckham, Rapide project, The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems (published 2002); Jennifer Widom, STREAM project (2002)
9 Basic Ideas: event sources continuously produce events; the events flow through a network of event processing agents (EPAs), which process them continuously; the results are delivered to event sinks.
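The flow of events through a network of EPAs can be sketched with chained Python generators: each EPA pulls events from its upstream and pushes results downstream one at a time. This is a minimal single-process sketch; the dict-based event representation and the names `source`, `filter_epa`, `project_epa` and `sink` are hypothetical and not taken from any of the systems discussed here.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]  # hypothetical event representation: attribute name -> value


def source(events: Iterable[Event]) -> Iterator[Event]:
    """Event source: produces events one at a time."""
    for e in events:
        yield e


def filter_epa(pred: Callable[[Event], bool], upstream: Iterator[Event]) -> Iterator[Event]:
    """Stateless EPA: forwards only the events that satisfy pred."""
    for e in upstream:
        if pred(e):
            yield e


def project_epa(attrs: List[str], upstream: Iterator[Event]) -> Iterator[Event]:
    """Stateless EPA: keeps only the listed attributes of each event."""
    for e in upstream:
        yield {a: e[a] for a in attrs}


def sink(upstream: Iterator[Event]) -> List[Event]:
    """Event sink: collects whatever reaches the end of the network."""
    return list(upstream)
```

Wiring the network is plain function composition, e.g. `sink(project_epa(["app"], filter_epa(lambda e: e["status"] == "error", source(events))))`. Because generators are lazy, each event traverses the whole network before the next one is read, which mirrors continuous processing.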
10 Functionality of EPAs. Basic set of operators: Filter (e.g., select applications that throw an error message); Sliding-window aggregation (e.g., compute the number of all running applications within the last minute); Window-based correlator (e.g., correlate applications with the servers they are running on within the last 10 seconds); Window-based pattern matching (e.g., detect faulty and abnormal application state transitions); User-defined operators
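Two of these operators, the filter and the sliding-window aggregation, can be sketched as follows. This assumes events are `(timestamp, application, status)` tuples and interprets "running" as "the application emitted a `running` status event inside the window"; both the event shape and that interpretation are hypothetical. The window is time-based and covers `(ts - window, ts]`.

```python
from collections import Counter, deque
from typing import Iterable, Iterator, Tuple

Event = Tuple[float, str, str]  # hypothetical: (timestamp in s, application, status)


def error_filter(events: Iterable[Event]) -> Iterator[Tuple[float, str]]:
    """Filter: select applications that report an error."""
    for ts, app, status in events:
        if status == "error":
            yield ts, app


def running_apps(events: Iterable[Event], window: float = 60.0) -> Iterator[Tuple[float, int]]:
    """Sliding-window aggregation: distinct running applications in (ts - window, ts]."""
    buf = deque()       # (ts, app) of 'running' events still inside the window
    counts = Counter()  # app -> number of its events currently in buf
    for ts, app, status in events:
        if status == "running":
            buf.append((ts, app))
            counts[app] += 1
        while buf and buf[0][0] <= ts - window:  # evict events that left the window
            _, old = buf.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        yield ts, len(counts)  # one result per incoming event
```

The deque keeps eviction at amortized O(1) per event, and the counter makes the distinct count available without rescanning the window.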
11 Many CEP systems are available, but there is no agreed semantics: SQL-based systems (MS StreamInsight, EsperTech, Siddhi, ...); systems with a special-purpose language (TIBCO, Apama, ...); plain distributed stream systems (Twitter Storm, Spark Streaming, Flink, ...)
12 Problems and Issues: Performance (high throughput vs. low latency; scalability); Event store (persistent management of events; information extraction from historical events); Functionality (support for application time; powerful pattern matching)
13 The Performance Issue of CEP [Figure: latency vs. throughput; Esper: low latency but low throughput; Spark Streaming: high throughput but high latency]
14 The Persistence Issue of CEP: CEP systems are designed for in-memory processing only (volatile data and persistent queries). Applications, however, require persistent management of events, with extremely high input rates (millions of events/s) and time-based queries on massive databases. Are standard DBMSs or NoSQL systems the right tools?
15 The Functionality Issue: pattern matching is the core operator for event processing, e.g., detect a sharp increase in temperature together with a sufficiently large amount of smoke within a short period of time. Despite its importance, pattern matching requires domain knowledge: user-defined implementation vs. a general-purpose operator offered by the system.
16 Summary of problems and issues in current systems: they are optimized either for low latency or for high throughput; persistence is still a big issue and is often delegated to Apache Kafka; support for pattern matching is very weak or missing.
17 3. Our Architecture. Basic ideas: a combination of an event store and a CEP engine, similar to the Lambda architecture. Both components run under a unified interface (JEPC), which allows exchanging specific technologies (your preferred CEP engine, your preferred store). JEPC acts as a federation platform: a continuous query can run in parallel on multiple target platforms.
18 Our Architecture [Figure: clients (C++, Groovy, Java, SQL-like query language, WebSockets, realtime reports e.g. Grafana) connect to JEPC; bridges link JEPC to ChronicleDB, to H2 and PostgreSQL via JDBC, and to native CEP systems such as Esper and Flink; the back ends are organized into a throughput layer and a low-latency layer]
19 Important Concepts: EPAs (aka continuous queries) come with latency constraints, e.g., visualization dashboard: 1 min; security: 1 s; alarm: as fast as possible. Each query is assigned to the low-latency layer or the high-throughput layer based on its latency constraint.
20 Our Architecture: CEP only [Figure: the same clients and JEPC with throughput layer and low-latency layer, in a CEP-only configuration]
21 Our Architecture: DBS only [Figure: the same diagram in a DBS-only configuration]
22 Our Architecture: ideal [Figure: the same diagram in the ideal configuration]
23 Necessary Requirements for the Throughput Layer: time to update the database < latency constraint of the query; time to process the query on the database < latency constraint of the query
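The assignment rule implied by these two requirements can be sketched as a small routing function: a query goes to the throughput layer only if both the database update time and its query time stay below its latency constraint, otherwise to the low-latency layer. The timing figures are assumed to come from some external estimate; `ContinuousQuery`, `assign_layers` and the encoding of "as fast as possible" as 0.0 are hypothetical and not part of JEPC.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ContinuousQuery:
    name: str
    latency_constraint_s: float  # hypothetical encoding: 0.0 means "as fast as possible"


def assign_layers(queries: List[ContinuousQuery],
                  db_update_time_s: float,
                  db_query_time_s: Dict[str, float]) -> Dict[str, str]:
    """Map each query name to 'throughput' or 'low-latency'."""
    assignment = {}
    for q in queries:
        # Both necessary requirements for the throughput layer must hold.
        fits_db = (db_update_time_s < q.latency_constraint_s
                   and db_query_time_s[q.name] < q.latency_constraint_s)
        assignment[q.name] = "throughput" if fits_db else "low-latency"
    return assignment
```

With an assumed update time of 2 s, a dashboard query (60 s) lands in the throughput layer, while a security query (1 s) and an alarm (as fast as possible) fall back to the low-latency layer.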
24 4. ChronicleDB: our database system for the management of historical events, designed to achieve high throughput. Properties: optimized for fast writes, exploiting the sequential write performance of magnetic disks; compression. Queries: efficient support of temporal predicates (e.g., analyze events within a range of four hours); temporal aggregates (e.g., number of ssh logins last Tuesday). Fast garbage collection of outdated events.
25 Architecture of ChronicleDB. Interfaces: SQL-like query language, command line interface, REST API, Java, TCP-based protocol. Components: compression, secondary indexes (LSM, COLA, ...), PAX layout, aggregate temporal B-tree
26 The Gist of ChronicleDB [Figure: external event streams, event queues, and CEP in main memory; append-only B-tree and optional secondary indexes in external memory; time axis with t1, t2]
27 Append-Only B-tree: the entire tree is written sequentially as a single stream and kept on your favorite storage technology (UNIX file system, HDFS, Ceph). A record in a leaf consists of a timestamp and a list of attribute values. An index entry in an internal node consists of temporal routing information and aggregates of the non-indexed attributes (min, max, top-k, sum, ...).
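The following is a minimal in-memory sketch of an append-only B-tree in this spirit, assuming a single numeric attribute and events that arrive in timestamp order. Full nodes are "written" once to a simulated sequential log and never modified. Each index entry stores temporal routing information (first and last timestamp) plus count, sum, min and max, so a temporal aggregate can use whole subtrees without descending into them. Class and function names, the fixed node capacity, and the omission of top-k and of queries on the still-open right flank are simplifications; this is not ChronicleDB's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Record = Tuple[float, float]  # (timestamp, value) of a single-attribute event


@dataclass
class IndexEntry:
    t_min: float  # temporal routing information: first timestamp in the subtree
    t_max: float  # last timestamp in the subtree
    count: int    # aggregates of the non-indexed value attribute
    total: float
    v_min: float
    v_max: float
    node: list    # child node: list of Records (leaf) or list of IndexEntry (internal)


def _summarize(node: list) -> IndexEntry:
    """Build the parent's index entry for a sealed node."""
    if isinstance(node[0], tuple):  # leaf: list of (timestamp, value) records
        vals = [v for _, v in node]
        return IndexEntry(node[0][0], node[-1][0], len(vals), sum(vals), min(vals), max(vals), node)
    return IndexEntry(node[0].t_min, node[-1].t_max,
                      sum(e.count for e in node), sum(e.total for e in node),
                      min(e.v_min for e in node), max(e.v_max for e in node), node)


class AppendOnlyTree:
    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self.open: List[list] = [[]]  # right-most, still-growing node per level (0 = leaf)
        self.log: List[list] = []     # simulated sequential file: sealed nodes in write order
        self.last_ts = float("-inf")

    def insert(self, ts: float, value: float) -> None:
        if ts < self.last_ts:
            raise ValueError("events must arrive in timestamp order")
        self.last_ts = ts
        self.open[0].append((ts, value))
        level = 0
        while len(self.open[level]) == self.capacity:  # full node: write it, then go up
            self._seal(level)
            level += 1

    def _seal(self, level: int) -> None:
        node = self.open[level]
        self.log.append(node)  # appended once, never rewritten
        self.open[level] = []
        if level + 1 == len(self.open):
            self.open.append([])
        self.open[level + 1].append(_summarize(node))

    def close(self) -> IndexEntry:
        """End of stream: seal the partial nodes bottom-up and return the root entry."""
        if self.last_ts == float("-inf"):
            raise ValueError("empty tree")
        level = 0
        while True:
            top = level == len(self.open) - 1
            if top and level > 0 and len(self.open[level]) == 1:
                return self.open[level][0]
            if self.open[level]:
                self._seal(level)
            level += 1


def range_count_sum(entry: IndexEntry, t1: float, t2: float) -> Tuple[int, float]:
    """Temporal aggregate over [t1, t2] that reuses stored aggregates where possible."""
    if entry.t_max < t1 or entry.t_min > t2:
        return 0, 0.0                    # disjoint subtree: pruned
    if t1 <= entry.t_min and entry.t_max <= t2:
        return entry.count, entry.total  # fully covered: no descent needed
    count, total = 0, 0.0
    for item in entry.node:
        if isinstance(item, tuple):      # leaf record
            if t1 <= item[0] <= t2:
                count, total = count + 1, total + item[1]
        else:
            c, s = range_count_sum(item, t1, t2)
            count, total = count + c, total + s
    return count, total
```

Because nodes are only ever appended, all writes are sequential; the price is that the right-most path stays in memory until its nodes fill up.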
28 Compression: column layout (PAX) within a page. A multidimensional time series is split into multiple one-dimensional time series within each page, and each one-dimensional time series is compressed with a standard algorithm, LZ4, which is very fast at decompression.
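A page in PAX layout can be sketched by transposing a page of multidimensional events into one column per attribute and compressing each column separately. LZ4 is not in Python's standard library, so this sketch uses `zlib` as a stand-in; the float64 encoding and the function names are likewise assumptions for illustration.

```python
import struct
import zlib
from typing import List, Tuple


def to_pax_page(rows: List[Tuple[float, ...]]) -> List[bytes]:
    """Split a page of multidimensional events into one compressed block per attribute."""
    blocks = []
    for column in zip(*rows):  # row layout -> one-dimensional time series per attribute
        raw = struct.pack(f"<{len(column)}d", *column)  # little-endian float64 values
        blocks.append(zlib.compress(raw))  # stand-in for LZ4
    return blocks


def from_pax_page(blocks: List[bytes], n_rows: int) -> List[Tuple[float, ...]]:
    """Decompress every column and transpose back into rows."""
    columns = [struct.unpack(f"<{n_rows}d", zlib.decompress(b)) for b in blocks]
    return list(zip(*columns))
```

Compressing per column rather than per row groups similar values (e.g., consecutive timestamps or slowly changing sensor readings) together, which is what makes a generic compressor effective here.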
29 Experimental Results: limited to a central system; maximum disk speed 187 MB/s. Measure: events per second. Event streams
30 Comparison (write performance). Our results for Elasticsearch: events/s to load DEBS
31 What is possible? Gross data rates
32 Comparison (read performance)
33 Performance of temporal aggregation vs. temporal scans
34 Recovery time of ChronicleDB
35 Summary: ChronicleDB provides large performance improvements over other systems in insertion rate, recovery time, and search performance. ChronicleDB runs on a (parallel) file system such as HDFS. Scalability of ChronicleDB using one of the popular distributed frameworks
36 5. Pattern Matching. Pattern: a sequence of conditions, A B+ C. Matching: search for the pattern in an event stream e1 e2 e3 e4 e5 e6 e7 e8 e9.
37 Pattern Matching (cont.): the same stream e1 ... e9, now containing a match of A B+ C. Match!
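Under two simplifying assumptions, matching A B+ C reduces to a regular-expression search: each event is pre-labelled with the single condition it satisfies ("X" for none), and a match must consist of contiguous events. Real CEP engines offer other event-selection policies (e.g., skipping irrelevant events), so this is only an illustration; `match_pattern` is a hypothetical helper.

```python
import re
from typing import List


def match_pattern(labels: List[str]) -> List[List[int]]:
    """Return the 1-based event positions (as in e1..e9) of each match of A B+ C."""
    text = "".join(labels)  # one character per event
    return [list(range(m.start() + 1, m.end() + 1)) for m in re.finditer(r"AB+C", text)]
```

For the labels `X A B B B C X A C` of e1..e9, the only match covers e2 to e6; the final `A C` does not match because B+ requires at least one B.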
38 Example Query
FROM Sensors s
DEFINE A AS s.temperature > 60 DO prev = s.temperature
       B AS s.temperature > prev DO prev = s.temperature
       C AS s.smoke = true
PATTERN A B+ C WITHIN 60 seconds
RETURN ALERT
39 Event Processing Implementation: e.g., via an NFA (nondeterministic finite automaton) [Figure: NFA with states S0, S1, S2, S3]
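The fire-alarm example query can be evaluated in this NFA style by keeping a set of concurrently active runs, one per event that satisfied A. Each run is in S1 (A matched) or S2 (A B+ matched), reaching C moves it to the accepting state S3, and runs older than 60 seconds are discarded. This is a sketch under assumed semantics: events that advance a run neither via B nor via C are skipped, C is tested before B, and `within` is inclusive. The event tuple layout and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Event = Tuple[float, float, bool]  # hypothetical: (timestamp in s, temperature, smoke)


@dataclass(frozen=True)
class Run:
    state: str    # "S1": A matched; "S2": A B+ matched
    start: float  # timestamp of the event that matched A
    prev: float   # value bound by DO prev = s.temperature


def detect_fire(events: Iterable[Event], within: float = 60.0) -> List[Tuple[float, float]]:
    """Return (start_ts, alert_ts) for every run that reaches the accepting state S3."""
    runs: List[Run] = []
    alerts: List[Tuple[float, float]] = []
    for ts, temp, smoke in events:
        runs = [r for r in runs if ts - r.start <= within]  # WITHIN 60 seconds
        next_runs: List[Run] = []
        for r in runs:
            if r.state == "S2" and smoke:
                alerts.append((r.start, ts))                # C: S2 -> S3, RETURN ALERT
            elif temp > r.prev:
                next_runs.append(Run("S2", r.start, temp))  # B: S1/S2 -> S2, prev = temp
            else:
                next_runs.append(r)                         # irrelevant event: skipped
        if temp > 60:
            next_runs.append(Run("S1", ts, temp))           # A: S0 -> S1 opens a new run
        runs = next_runs
    return alerts
```

Keeping one run per A-event is what makes the automaton nondeterministic in practice: overlapping candidate matches are tracked side by side, so a single smoke event can complete several runs.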
40 Event Store Implementation: index the incoming data on the attributes involved in conditions (e.g., for A: s.temperature); determine the most selective sub-pattern; leverage the index to restrict the search space.
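On stored events, the same query can be answered by assuming that condition A (temperature > 60) is the most selective sub-pattern. A sorted secondary index on temperature, searched with `bisect`, yields the candidate start events, and only the 60-second window after each candidate is scanned to verify B+ C. This in-memory sketch stands in for an on-disk index; the matching semantics (skip irrelevant events, test C before B) and all names are hypothetical.

```python
import bisect
from typing import List, Tuple

Event = Tuple[float, float, bool]  # hypothetical: (timestamp in s, temperature, smoke), time-ordered


def build_index(events: List[Event]) -> List[Tuple[float, int]]:
    """Secondary index on s.temperature: sorted (temperature, position) pairs."""
    return sorted((temp, pos) for pos, (_, temp, _) in enumerate(events))


def starts_satisfying_a(index: List[Tuple[float, int]], threshold: float = 60.0) -> List[int]:
    """Positions of events with temperature > threshold, found by binary search."""
    lo = bisect.bisect_right(index, (threshold, float("inf")))
    return sorted(pos for _, pos in index[lo:])


def completes_pattern(events: List[Event], p: int, within: float = 60.0) -> bool:
    """Verify B+ C after the A-event at position p, scanning only inside the window."""
    t0, prev, _ = events[p]
    seen_b = False
    for q in range(p + 1, len(events)):
        ts, temp, smoke = events[q]
        if ts - t0 > within:
            break                       # rest of the stream is outside the window
        if seen_b and smoke:
            return True                 # C after A B+
        if temp > prev:
            prev, seen_b = temp, True   # B: strictly increasing temperature
    return False


def indexed_search(events: List[Event], threshold: float = 60.0, within: float = 60.0) -> List[float]:
    """Start timestamps of all matches, visiting only index-selected candidates."""
    index = build_index(events)
    return [events[p][0] for p in starts_satisfying_a(index, threshold)
            if completes_pattern(events, p, within)]
```

The benefit grows with selectivity: if few events exceed 60 degrees, most of the stored history is never touched.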
41 6. Conclusions: a Lambda-like architecture for event processing delivers both low latency and high throughput, thanks to the smart indexing capabilities of ChronicleDB. The performance of ChronicleDB is superior to competing systems. Pattern matching is fully supported in both the event processing engine and ChronicleDB. Available as open source.
42 Thanks: JEPC is joint work with Bastian Hoßbach and Marc Seidemann. Thanks to Dieter Gawlick for our great discussions; to our team: Nikolaus Glombiewski, Michael Körber, Andreas Morgen, Franz Ritter; and to the BMBF for funding ACCEPT.
Data Transformation and Migration in Polystores Adam Dziedzic, Aaron Elmore & Michael Stonebraker September 15th, 2016 Agenda Data Migration for Polystores: What & Why? How? Acceleration of physical data
More informationIndex. Raul Estrada and Isaac Ruiz 2016 R. Estrada and I. Ruiz, Big Data SMACK, DOI /
Index A ACID, 251 Actor model Akka installation, 44 Akka logos, 41 OOP vs. actors, 42 43 thread-based concurrency, 42 Agents server, 140, 251 Aggregation techniques materialized views, 216 probabilistic
More informationIntro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect
Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Igor Roiter Big Data Cloud Solution Architect Working as a Data Specialist for the last 11 years 9 of them as a Consultant specializing
More informationGoogle Dremel. Interactive Analysis of Web-Scale Datasets
Google Dremel Interactive Analysis of Web-Scale Datasets Summary Introduction Data Model Nested Columnar Storage Query Execution Experiments Implementations: Google BigQuery(Demo) and Apache Drill Conclusions
More informationOracle NoSQL Database Enterprise Edition, Version 18.1
Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across
More informationWHITEPAPER. MemSQL Enterprise Feature List
WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure
More informationOracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data
Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous
More informationHeckaton. SQL Server's Memory Optimized OLTP Engine
Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability
More information