(incubating) Introduction. Swapnil Bawaskar.
|
|
- Randolph McDonald
- 5 years ago
- Views:
Transcription
1 (incubating) Introduction William Swapnil
2 Agenda Introduction What? Who? Why? How? DEBS Roadmap Q&A 2
3 3 Introduction
4 Introduction A distributed, memory-based data management platform for data oriented apps that need: high performance, scalability, resiliency and continuous availability fast access to critical data set location aware distributed data processing event driven data architecture 4
5 Incubating but rock solid systems in production (real customers) Cutting edge use cases Massive increase in data volumes Falling margins per transaction Increasing cost of IT maintenance Need for elasticity in systems Real Time response needs Time to market constraints Need for flexible data models across enterprise Distributed development Persistence + In-memory Global data visibility needs Fast Ingest needs for data Need to allow devices to hook into enterprise data Always on Financial Services Providers (every major Wall Street bank) Department of Defense Largest travel Portal Airlines Trade clearing Online gambling Largest Telcos Large mfrers Largest Payroll processor Auto insurance giants Largest rail systems on earth 5
6 Incubating but rock solid 17 billion records in memory GE Power & Water's Remote Monitoring & Diagnostics Center 3 TB operational data in-memory, 400 TB archived China Railways 4.6 Million transactions a day / 40K transactions a second China Railways 120,000 Concurrent Users Indian Railways 6
7 China Railway Corporation Indian Railways Population: 1,401,586,609 1,251,695,616 World: ~7,349,000,000 ~36% of the world population
8 Incubating but rock solid operations per second Cassandra Geode 0 A Reads A Updates B Reads B Updates C Reads D Inserts D Reads F Reads F Updates YCSB Workloads 8
9 Horizontal scaling for reads, consistent latency and CPU Speedup speedup latency (ms) CPU % Server Hosts Scaled from 256 clients and 2 servers to 1280 clients and 10 servers Partitioned region with redundancy and 1K data size 0
10 What makes it go fast? Minimize copying Minimize contention points Run user code in-process Partitioning and parallelism Avoid disk seeks Automated benchmarks
11 Hands-on: Build & run Clone & Build git clone cd incubator-geode./gradlew build -Dskip.tests=true Start a server cd gemfire-assembly/build/install/apache-geode./bin/gfsh gfsh> start locator --name=locator gfsh> start server --name=server gfsh> create region --name=myregion --type=replicate Docker $ docker run -it apachegeode/geode:1.0.0-incubating.m1 gfsh 11
12 Hands on
13 Concepts Cache Region Member Client Cache Functions Listeners High Availability Serialization 13
14 Concepts Cache In-memory storage and management for your data Configurable through XML, Java API or CLI Region Region Region Cache JVM Collection of Region 14
15 Concepts Region Distributed java.util.map on steroids (Key/Value) Consistent API regardless of where or how data is stored Observable (reactive) Highly available, redundant on cache Member (s). Region java.util.map Cache JVM Key K01 K02 Value May Tim 15
16 Concepts Region Key K01 Value May Region Key K01 Value May Region K02 java.util.map Cache Tim K02 java.util.map Cache Tim JVM JVM Local, Replicated or Partitioned In-memory or persistent Redundant LRU Overflow LOCAL LOCAL_HEAP_LRU LOCAL_OVERFLOW LOCAL_PERSISTENT LOCAL_PERSISTENT_OVERFLOW PARTITION PARTITION_HEAP_LRU PARTITION_OVERFLOW PARTITION_PERSISTENT PARTITION_PERSISTENT_OVERFLOW PARTITION_PROXY PARTITION_PROXY_REDUNDANT PARTITION_REDUNDANT PARTITION_REDUNDANT_HEAP_LRU PARTITION_REDUNDANT_OVERFLOW PARTITION_REDUNDANT_PERSISTENT PARTITION_REDUNDANT_PERSISTENT_OVERFLOW REPLICATE REPLICATE_HEAP_LRU REPLICATE_OVERFLOW REPLICATE_PERSISTENT REPLICATE_PERSISTENT_OVERFLOW REPLICATE_PROXY 16
17 Concepts Persistent Regions Put k4->v7 Member 1 Durability Oplog3.crf Modify k4->v7 WAL for efficient writing Oplog2.crf Modify k1->v5 Create k6->v6 Create k2->v2 Create k4->v4 Consistent recovery Compaction Region Key K01 Value May Region Key K01 Value May K02 Tim K02 Tim java.util.map Cache java.util.map Cache JVM JVM Server 1 Server N 17
18 Persistence - Shared Nothing Server 1 Server 2 Server 3 18
19 Persistence - Shared Nothing Server 1 Server 2 Server 3 B1 B1 Primary B2 B2 Secondary B3 B3 19
20 Persistence - Shared Nothing Server 1 Server 2 Server 3 B1 B1 Primary B2 B2 Secondary B3 B3 20
21 Persistence - Shared Nothing Server 1 Server 2 Server 3 B1 B2 B1 B2 Primary Secondary B3 B3 21
22 Persistence - Shared Nothing Server 1 Server 2 Server 3 B1 B2 B1 B2 Primary Secondary B3 B3 22
23 Persistence - Shared Nothing Server 1 waits for others when it starts Server 1 Server 2 Server 3 B1 B1 Primary B2 B2 B2 Secondary B3 B3 B3 23
24 Persistence - Shared Nothing Fetches missed operations on restart Server 1 Server 2 Server 3 B1 B1 Primary B2 B2 Secondary B3 B3 24
25 Persistence - Operational Logs Put k6->v6 Member 1 Append to operation log Oplog2.crf Modify k1->v5 Create k6->v6 Oplog1.crf Create k1->v1 Create k2->v2 Modify k1->v3 Create k4->v4 25
26 Persistence - Operational Logs - Compaction Put k6->v6 Member 1 Append to operation log Oplog2.crf Modify k1->v5 Create k6->v6 Create k2->v2 Create k4->v4 Oplog1.crf Create k1->v1 Modify k1->v3 Copy live data forward 26
27 Concepts Member A process that has a connection to the system A process that has created a cache Embeddable within your application Client Locator Server 27
28 Concepts Client cache A process connected to the Geode server(s) Can have a local copy of the data Run OQL queries on local data Can be notified about events on the servers GemFire Server Application Client Cache Region Region Region 28
29 Concepts Client Notifications Register Interest Individual Keys OR RegEx for Keys Updates Local Copy Examples: region.registerinterest( key-1 ); region1.registerinterestregex( [a-z]+ ); Continuous Query Receive Notification when Query condition met on server Example: SELECT * FROM /tradeorder t WHERE t.price > Can be DURABLE 29
30 Concepts Functions Used for distributed concurrent processing (Map/Reduce, stored procedure) Highly available Data oriented Member oriented Execute Functions f 1, f 2, f n Submit (f1) 30
31 Concepts Functions Server Partitioned Region Data Store - X Server Distributed System Server Partitioned Region Data Store - Y Server Partitioned Region Data Store - Z Server Partitioned Region Data Accessor 4 execute execute result result Server Partitioned Region Data Accessor 1 Client Region filter = Keys X, Y FunctionService.onRegion.withFilter.execute 2 execute 6 ResultCollector.getResult 5 result 31
32 Concepts Listeners CacheWriter / CacheListener AsyncEventListener (queue / batch) Parallel or Serial Conflation 32
33 33 Concepts - HA
34 Fixed or flexible schema? id name age pet_id or { } id : 1, name : Fred, age : 42, pet : { name : Barney, type : dino }
35 C#, C++, Java, JSON Portable Data exchange header data pdx length dsid typeid fields offsets No IDL, no schemas, no hand-coding Schema evolution (Forward and Backward Compatible) * domain object classes not required
36 Efficient for queries SELECT p.name FROM /Person p WHERE p.pet.type = dino { } id : 1, name : Fred, age : 42, pet : { name : Barney, type : dino } single field deserialization
37 But how to serialize data? Benchmark:
38 Schema evolution Application #1 Member A Member B Application #2 v2 objects preserve data from missing fields v1 objects use default values to fill in new fields PDX provides forwards and backwards compatibility, no code required v1 v2 Distributed Type Definitions
39 39 Adapters
40 memcached write-through as opposed to cache-aside Stale Cache Inconsistent Cache Thundering Heards 40
41 Redis Scalable Data-Structures Use All Cores WAN Replication 41
42 TAXI TRIP ANALYSIS (INCUBATING) (DEBS GRAND-CHALLENGE) WITH APACHE GEODE TAXI TRIP ANALYSIS (DEBS GRAND-CHALLENGE) WITH APACHE GEODE Swapnil Bawaskar William Markito Oliveira
43 INTRODUCTION DEBS Distributed Event-Based Systems Grand challenges (2013, 2014, 2015, 2016 ) Analyze NY Taxi Trip information 2013* 12 GB in size and ~173 million events. Most profitable areas Most frequent routes * FOIL (The Freedom of Information Law)
44 INTRODUCTION DEBS
45
46 IMPLEMENTATION
47 IMPLEMENTATION HOW AsyncEvent Listener Parallel or Serial public class FrequentRouterListener implements AsyncEventListener, Declarable { public boolean processevents(list<asyncevent> list) { // PDX object deserializing single field pickupdatetime = (Date) taxitrip.getfield("pickup_datetime"); // some processing with events } } - Memory - Threads - Persistence - Batch size - Batch interval
48 IMPLEMENTATION HOW SELECT e.key,e.value FROM /FrequentRoute.entries e ORDER BY e.value.numtrips DESC LIMIT 10 1' 2' CLIENT n F_ROUTES Area Area 1.1 x.y 2.1 x'.y' { 3 { TRIPS Taxi Area 1 x.y 2 x.y' CACHING_PROXY F_ROUTES N x.y Area Area 1.1 x.y 2.1 x'.y' Update routes SELECT AVG(getFarePlusTip()) as avgtotal, pickup_cell.tostring() FROM /TaxiTrip t GROUP BY pickup_cell.tostring() ORDER BY avgtotal DESC LIMIT 10" NOT SQL!*
49 IMPLEMENTATION HOW F_ROUTES Area Area 1.1 x.y 2.1 x'.y' Evict entries older than 15 seconds Replicated Listener attached Taxi TRIPS Area 1 x.y 2 x.y' N x.y Historical with memory eviction to disk Partitioned across nodes Async listener with queue
50 Demo
51 Roadmap Off-heap memory storage Cloud Foundry service Lucene Search* HDFS Persistence* Spark Integration* * -Experimental and waiting community feedback 51
52 How to Contribute Code New features Bug fixes Writing tests Documentation Wiki Web site Community Join the mailing list Ask or answer Join our HipChat Become a speaker Finding bugs Testing an RC/Beta User guide 52
53 Links Website JIRA Wiki cwiki.apache.org/confluence/display/geode GitHub Mailing lists mail-archives.apache.org/mod_mbox/incubator-geode-dev/ 53
54 Thank you
IoT Platform using Geode and ActiveMQ
IoT Platform using Geode and ActiveMQ Scalable IoT Platform Swapnil Bawaskar @sbawaskar sbawaskar@apache.org Agenda Introduction IoT MQTT Apache ActiveMQ Artemis Apache Geode Real world use case Q&A 2
More informationA NEW PLATFORM FOR A NEW ERA
A NEW PLATFORM FOR A NEW ERA 2 Evolution of Pivotal Gemfire Which way might the "Apache Way take It? Roman Shaposhnik rvs@apache.org Director of Open Source, Pivotal Inc. @rhatr Milind Bhandarkar milind@ampool.io
More informationBUILT FOR THE SPEED OF BUSINESS
BUILT FOR THE SPEED OF BUSINESS Eliminate disk access in the real time path We Challenge the traditional RDBMS design NOT SQL Buffers primarily tuned for IO First write to Log Second write to Data Files
More informationText Search With Lucene
Text Search With Lucene Please refer to Geode 1.2.0 documentation with final implementation is here. Requirements Related Documents Terminology API User Input Key points Java API Examples Gfsh API XML
More informationUsing the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver
Using the SDACK Architecture to Build a Big Data Product Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Outline A Threat Analytic Big Data product The SDACK Architecture Akka Streams and data
More informationVOLTDB + HP VERTICA. page
VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics
More informationVoltDB vs. Redis Benchmark
Volt vs. Redis Benchmark Motivation and Goals of this Evaluation Compare the performance of several distributed databases that can be used for state storage in some of our applications Low latency is expected
More informationIntroduction to Oracle NoSQL Database
Introduction to Oracle NoSQL Database Anand Chandak Ashutosh Naik Agenda NoSQL Background Oracle NoSQL Database Overview Technical Features & Performance Use Cases 2 Why NoSQL? 1. The four V s of Big Data
More informationAchieving Horizontal Scalability. Alain Houf Sales Engineer
Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches
More informationCIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )
Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL
More informationNew Oracle NoSQL Database APIs that Speed Insertion and Retrieval
New Oracle NoSQL Database APIs that Speed Insertion and Retrieval O R A C L E W H I T E P A P E R F E B R U A R Y 2 0 1 6 1 NEW ORACLE NoSQL DATABASE APIs that SPEED INSERTION AND RETRIEVAL Introduction
More information70-532: Developing Microsoft Azure Solutions
70-532: Developing Microsoft Azure Solutions Exam Design Target Audience Candidates of this exam are experienced in designing, programming, implementing, automating, and monitoring Microsoft Azure solutions.
More informationGridGain and Apache Ignite In-Memory Performance with Durability of Disk
GridGain and Apache Ignite In-Memory Performance with Durability of Disk Dmitriy Setrakyan Apache Ignite PMC GridGain Founder & CPO http://ignite.apache.org #apacheignite Agenda What is GridGain and Ignite
More informationAccelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite. Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017
Accelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017 About the Presentation Problems Existing Solutions Denis Magda
More informationBig Data. Big Data Analyst. Big Data Engineer. Big Data Architect
Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION
More information70-532: Developing Microsoft Azure Solutions
70-532: Developing Microsoft Azure Solutions Objective Domain Note: This document shows tracked changes that are effective as of January 18, 2018. Create and Manage Azure Resource Manager Virtual Machines
More informationDeveloping Microsoft Azure Solutions (70-532) Syllabus
Developing Microsoft Azure Solutions (70-532) Syllabus Cloud Computing Introduction What is Cloud Computing Cloud Characteristics Cloud Computing Service Models Deployment Models in Cloud Computing Advantages
More informationAbstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight
ESG Lab Review InterSystems Data Platform: A Unified, Efficient Data Platform for Fast Business Insight Date: April 218 Author: Kerry Dolan, Senior IT Validation Analyst Abstract Enterprise Strategy Group
More information2017 GridGain Systems, Inc. In-Memory Performance Durability of Disk
In-Memory Performance Durability of Disk Ignite the Fire in your SQL App Akmal B. Chaudhri Technology Evangelist GridGain Systems Agenda SQL Capabilities Connectivity Data Definition Language Data Manipulation
More informationNoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems
CompSci 516 Data Intensive Computing Systems Lecture 21 (optional) NoSQL systems Instructor: Sudeepa Roy Duke CS, Spring 2016 CompSci 516: Data Intensive Computing Systems 1 Key- Value Stores Duke CS,
More informationGetting Started with Apache Ignite as a Distributed Database
Getting Started with Apache Ignite as a Distributed Database VALENTIN KULICHENKO Lead Architect GridGain Systems, Inc. 2018 GridGain Systems, Inc. Agenda Apache Ignite as a Distributed Database Connectivity
More informationAgenda. Apache Ignite Project Apache Ignite Data Fabric: Data Grid HPC & Compute Streaming & CEP Hadoop & Spark Integration Use Cases Demo Q & A
Introduction 2015 The Apache Software Foundation. Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are trademarks of The Apache Software Foundation. Agenda Apache Ignite Project Apache
More informationWHITEPAPER. MemSQL Enterprise Feature List
WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure
More informationMapR Enterprise Hadoop
2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS
More informationBig and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant
Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model
More informationApache HBase Andrew Purtell Committer, Apache HBase, Apache Software Foundation Big Data US Research And Development, Intel
Apache HBase 0.98 Andrew Purtell Committer, Apache HBase, Apache Software Foundation Big Data US Research And Development, Intel Who am I? Committer on the Apache HBase project Member of the Big Data Research
More informationRealtime visitor analysis with Couchbase and Elasticsearch
Realtime visitor analysis with Couchbase and Elasticsearch Jeroen Reijn @jreijn #nosql13 About me Jeroen Reijn Software engineer Hippo @jreijn http://blog.jeroenreijn.com About Hippo Visitor Analysis OneHippo
More informationLow Latency Data Grids in Finance
Low Latency Data Grids in Finance Jags Ramnarayan Chief Architect GemStone Systems jags.ramnarayan@gemstone.com Copyright 2006, GemStone Systems Inc. All Rights Reserved. Background on GemStone Systems
More informationSCYLLA: NoSQL at Ludicrous Speed. 主讲人 :ScyllaDB 软件工程师贺俊
SCYLLA: NoSQL at Ludicrous Speed 主讲人 :ScyllaDB 软件工程师贺俊 Today we will cover: + Intro: Who we are, what we do, who uses it + Why we started ScyllaDB + Why should you care + How we made design decisions to
More informationApache Flink. Alessandro Margara
Apache Flink Alessandro Margara alessandro.margara@polimi.it http://home.deib.polimi.it/margara Recap: scenario Big Data Volume and velocity Process large volumes of data possibly produced at high rate
More informationSEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME. Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013
SEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013 1 WHO AM I? Ryan Tabora Think Big Analytics - Senior Data Engineer Lover of dachshunds,
More informationNoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu
NoSQL Databases MongoDB vs Cassandra Kenny Huynh, Andre Chik, Kevin Vu Introduction - Relational database model - Concept developed in 1970 - Inefficient - NoSQL - Concept introduced in 1980 - Related
More informationYCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores
YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores Swapnil Patil Milo Polte, Wittawat Tantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs *, Billie
More informationAccelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016
Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016 Nikita Ivanov CTO and Co-Founder GridGain Systems Peter Zaitsev CEO and Co-Founder Percona About the Presentation
More informationOverview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::
Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized
More informationEsper EQC. Horizontal Scale-Out for Complex Event Processing
Esper EQC Horizontal Scale-Out for Complex Event Processing Esper EQC - Introduction Esper query container (EQC) is the horizontal scale-out architecture for Complex Event Processing with Esper and EsperHA
More informationNoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE. Nicolas Buchschacher - University of Geneva - ADASS 2018
NoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE DACE https://dace.unige.ch Data and Analysis Center for Exoplanets. Facility to store, exchange and analyse data
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationSpark Overview. Professor Sasu Tarkoma.
Spark Overview 2015 Professor Sasu Tarkoma www.cs.helsinki.fi Apache Spark Spark is a general-purpose computing framework for iterative tasks API is provided for Java, Scala and Python The model is based
More informationProcessing of big data with Apache Spark
Processing of big data with Apache Spark JavaSkop 18 Aleksandar Donevski AGENDA What is Apache Spark? Spark vs Hadoop MapReduce Application Requirements Example Architecture Application Challenges 2 WHAT
More informationECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 1: Distributed File Systems GFS (The Google File System) 1 Filesystems
More informationCaching patterns and extending mobile applications with elastic caching (With Demonstration)
Ready For Mobile Caching patterns and extending mobile applications with elastic caching (With Demonstration) The world is changing and each of these technology shifts has potential to make a significant
More informationFusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic
WHITE PAPER Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive
More informationDistributed File Systems II
Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation
More informationGFS-python: A Simplified GFS Implementation in Python
GFS-python: A Simplified GFS Implementation in Python Andy Strohman ABSTRACT GFS-python is distributed network filesystem written entirely in python. There are no dependencies other than Python s standard
More information@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS
@unterstein @dcos @bedcon #bedcon Operating microservices with Apache Mesos and DC/OS 1 Johannes Unterstein Software Engineer @Mesosphere @unterstein @unterstein.mesosphere 2017 Mesosphere, Inc. All Rights
More informationCloudera Kudu Introduction
Cloudera Kudu Introduction Zbigniew Baranowski Based on: http://slideshare.net/cloudera/kudu-new-hadoop-storage-for-fast-analytics-onfast-data What is KUDU? New storage engine for structured data (tables)
More informationFast Big Data Analytics with Spark on Tachyon
1 Fast Big Data Analytics with Spark on Tachyon Shaoshan Liu http://www.meetup.com/tachyon/ 2 Fun Facts Tachyon A tachyon is a particle that always moves faster than light. The word comes from the Greek:
More information<Insert Picture Here> Value of TimesTen Oracle TimesTen Product Overview
Value of TimesTen Oracle TimesTen Product Overview Shig Hiura Sales Consultant, Oracle Embedded Global Business Unit When You Think Database SQL RDBMS Results RDBMS + client/server
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationVoltDB for Financial Services Technical Overview
VoltDB for Financial Services Technical Overview Financial services organizations have multiple masters: regulators, investors, customers, and internal business users. All create, monitor, and require
More information10 Million Smart Meter Data with Apache HBase
10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on
More informationTuning Enterprise Information Catalog Performance
Tuning Enterprise Information Catalog Performance Copyright Informatica LLC 2015, 2018. Informatica and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the United States
More informationA memcached implementation in Java. Bela Ban JBoss 2340
A memcached implementation in Java Bela Ban JBoss 2340 AGENDA 2 > Introduction > memcached > memcached in Java > Improving memcached > Infinispan > Demo Introduction 3 > We want to store all of our data
More informationIntro Cassandra. Adelaide Big Data Meetup.
Intro Cassandra Adelaide Big Data Meetup instaclustr.com @Instaclustr Who am I and what do I do? Alex Lourie Worked at Red Hat, Datastax and now Instaclustr We currently manage x10s nodes for various customers,
More informationMassive Online Analysis - Storm,Spark
Massive Online Analysis - Storm,Spark presentation by R. Kishore Kumar Research Scholar Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Kharagpur-721302, India (R
More informationDatabricks, an Introduction
Databricks, an Introduction Chuck Connell, Insight Digital Innovation Insight Presentation Speaker Bio Senior Data Architect at Insight Digital Innovation Focus on Azure big data services HDInsight/Hadoop,
More informationWHY AND HOW TO LEVERAGE THE POWER AND SIMPLICITY OF SQL ON APACHE FLINK - FABIAN HUESKE, SOFTWARE ENGINEER
WHY AND HOW TO LEVERAGE THE POWER AND SIMPLICITY OF SQL ON APACHE FLINK - FABIAN HUESKE, SOFTWARE ENGINEER ABOUT ME Apache Flink PMC member & ASF member Contributing since day 1 at TU Berlin Focusing on
More informationShark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker
Shark Hive on Spark Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker Agenda Intro to Spark Apache Hive Shark Shark s Improvements over Hive Demo Alpha
More informationDeveloping Microsoft Azure Solutions: Course Agenda
Developing Microsoft Azure Solutions: 70-532 Course Agenda Module 1: Overview of the Microsoft Azure Platform Microsoft Azure provides a collection of services that you can use as building blocks for your
More informationDeveloping Microsoft Azure Solutions (70-532) Syllabus
Developing Microsoft Azure Solutions (70-532) Syllabus Cloud Computing Introduction What is Cloud Computing Cloud Characteristics Cloud Computing Service Models Deployment Models in Cloud Computing Advantages
More informationAdvanced Database Systems
Lecture II Storage Layer Kyumars Sheykh Esmaili Course s Syllabus Core Topics Storage Layer Query Processing and Optimization Transaction Management and Recovery Advanced Topics Cloud Computing and Web
More informationKafka Connect the Dots
Kafka Connect the Dots Building Oracle Change Data Capture Pipelines With Kafka Mike Donovan CTO Dbvisit Software Mike Donovan Chief Technology Officer, Dbvisit Software Multi-platform DBA, (Oracle, MSSQL..)
More informationCouchbase Architecture Couchbase Inc. 1
Couchbase Architecture 2015 Couchbase Inc. 1 $whoami Laurent Doguin Couchbase Developer Advocate @ldoguin laurent.doguin@couchbase.com 2015 Couchbase Inc. 2 2 Big Data = Operational + Analytic (NoSQL +
More informationMySQL High Availability
MySQL High Availability InnoDB Cluster and NDB Cluster Ted Wennmark ted.wennmark@oracle.com Copyright 2016, Oracle and/or its its affiliates. All All rights reserved. Safe Harbor Statement The following
More informationHYBRID TRANSACTION/ANALYTICAL PROCESSING COLIN MACNAUGHTON
HYBRID TRANSACTION/ANALYTICAL PROCESSING COLIN MACNAUGHTON WHO IS NEEVE RESEARCH? Headquartered in Silicon Valley Creators of the X Platform - Memory Oriented Application Platform Passionate about high
More informationData pipelines with PostgreSQL & Kafka
Data pipelines with PostgreSQL & Kafka Oskari Saarenmaa PostgresConf US 2018 - Jersey City Agenda 1. Introduction 2. Data pipelines, old and new 3. Apache Kafka 4. Sample data pipeline with Kafka & PostgreSQL
More informationAccelerate Big Data Insights
Accelerate Big Data Insights Executive Summary An abundance of information isn t always helpful when time is of the essence. In the world of big data, the ability to accelerate time-to-insight can not
More informationThe Lion of storage systems
The Lion of storage systems Rakuten. Inc, Yosuke Hara Mar 21, 2013 1 The Lion of storage systems http://www.leofs.org LeoFS v0.14.0 was released! 2 Table of Contents 1. Motivation 2. Overview & Inside
More informationArchitecture of a Real-Time Operational DBMS
Architecture of a Real-Time Operational DBMS Srini V. Srinivasan Founder, Chief Development Officer Aerospike CMG India Keynote Thane December 3, 2016 [ CMGI Keynote, Thane, India. 2016 Aerospike Inc.
More informationThe SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.
Dublin Apache Kafka Meetup, 30 August 2017 The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Joseph @pleia2 * ASF projects 1 Elizabeth K. Joseph, Developer Advocate Developer Advocate
More informationPostgres Plus and JBoss
Postgres Plus and JBoss A New Division of Labor for New Enterprise Applications An EnterpriseDB White Paper for DBAs, Application Developers, and Enterprise Architects October 2008 Postgres Plus and JBoss:
More informationApache HAWQ (incubating)
HADOOP NATIVE SQL What is HAWQ? Apache HAWQ (incubating) Is an elastic parallel processing SQL engine that runs native in Apache Hadoop to directly access data for advanced analytics. Why HAWQ? Hadoop
More informationNOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe
NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS h_da Prof. Dr. Uta Störl Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe 2017 163 Performance / Benchmarks Traditional database benchmarks
More informationAzure-persistence MARTIN MUDRA
Azure-persistence MARTIN MUDRA Storage service access Blobs Queues Tables Storage service Horizontally scalable Zone Redundancy Accounts Based on Uri Pricing Calculator Azure table storage Storage Account
More informationSQL Server on Linux and Containers
http://aka.ms/bobwardms https://github.com/microsoft/sqllinuxlabs SQL Server on Linux and Containers A Brave New World Speaker Name Principal Architect Microsoft bobward@microsoft.com @bobwardms linkedin.com/in/bobwardms
More informationChina Big Data and HPC Initiatives Overview. Xuanhua Shi
China Big Data and HPC Initiatives Overview Xuanhua Shi Services Computing Technology and System Laboratory Big Data Technology and System Laboratory Cluster and Grid Computing Laboratory Huazhong University
More informationMySQL & NoSQL: The Best of Both Worlds
MySQL & NoSQL: The Best of Both Worlds Mario Beck Principal Sales Consultant MySQL mario.beck@oracle.com 1 Copyright 2012, Oracle and/or its affiliates. All rights Safe Harbour Statement The following
More informationLoosely coupled: asynchronous processing, decoupling of tiers/components Fan-out the application tiers to support the workload Use cache for data and content Reduce number of requests if possible Batch
More informationIn-Memory Performance Durability of Disk GridGain Systems, Inc.
In-Memory Performance Durability of Disk Apache Ignite In-Memory Hammer for Your Data Science Toolkit Denis Magda Ignite PMC Chair GridGain Director of Product Management Agenda Apache Ignite Overview
More informationQunar Performs Real-Time Data Analytics up to 300x Faster with Alluxio
CASE STUDY Qunar Performs Real-Time Data Analytics up to 300x Faster with Alluxio Xueyan Li, Lei Xu, and Xiaoxu Lv Software Engineers at Qunar At Qunar, we have been running Alluxio in production for over
More informationScaling DreamFactory
Scaling DreamFactory This white paper is designed to provide information to enterprise customers about how to scale a DreamFactory Instance. The sections below talk about horizontal, vertical, and cloud
More informationTransform to Your Cloud
Transform to Your Cloud Presented by VMware 2012 VMware Inc. All rights reserved Agenda Corporate Overview Cloud Infrastructure & Management Cloud Application Platform End User Computing The Journey to
More informationTuning Intelligent Data Lake Performance
Tuning Intelligent Data Lake Performance 2016 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without
More informationCloud Programming on Java EE Platforms. mgr inż. Piotr Nowak
Cloud Programming on Java EE Platforms mgr inż. Piotr Nowak Distributed data caching environment Hadoop Apache Ignite "2 Cache what is cache? how it is used? "3 Cache - hardware buffer temporary storage
More informationApache Spark 2.0. Matei
Apache Spark 2.0 Matei Zaharia @matei_zaharia What is Apache Spark? Open source data processing engine for clusters Generalizes MapReduce model Rich set of APIs and libraries In Scala, Java, Python and
More informationBoost Your Hibernate and Application Performance
Boost Your Hibernate and Application Performance Presented by: Greg Luck, Founder and CTO Ehcache March 3, 2010 Agenda Intro to Ehcache and Terracotta Code: Scaling Spring Pet Clinic With Hibernate With
More informationManaging IoT and Time Series Data with Amazon ElastiCache for Redis
Managing IoT and Time Series Data with ElastiCache for Redis Darin Briskman, ElastiCache Developer Outreach Michael Labib, Specialist Solutions Architect 2016, Web Services, Inc. or its Affiliates. All
More informationOracle TimesTen In-Memory Database 18.1
Oracle TimesTen In-Memory Database 18.1 Scaleout Functionality, Architecture and Performance Chris Jenkins Senior Director, In-Memory Technology TimesTen Product Management Best In-Memory Databases: For
More informationMySQL Cluster Web Scalability, % Availability. Andrew
MySQL Cluster Web Scalability, 99.999% Availability Andrew Morgan @andrewmorgan www.clusterdb.com Safe Harbour Statement The following is intended to outline our general product direction. It is intended
More informationTRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS
TRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS Martin Maas* Tim Harris KrsteAsanovic* John Kubiatowicz* *University of California, Berkeley Oracle Labs, Cambridge Why you should care
More informationNewSQL. Database Landscape From: the 451 group. OLTP Focus. NewSQL: Flying on ACID. Cloud DB, Winter 2014, Lecture 14
NewSQL: Flying on ACID David Maier NewSQL Keep SQL (some of it) and ACID But be speedy and scalable Thanks to H-Store folks, Mike Stonebraker, Fred Holahan 3/5/14 David Maier, Portland State University
More informationStreaming Log Analytics with Kafka
Streaming Log Analytics with Kafka Kresten Krab Thorup, Humio CTO Log Everything, Answer Anything, In Real-Time. Why this talk? Humio is a Log Analytics system Designed to run on-prem High volume, real
More informationCloud Computing with FPGA-based NVMe SSDs
Cloud Computing with FPGA-based NVMe SSDs Bharadwaj Pudipeddi, CTO NVXL Santa Clara, CA 1 Choice of NVMe Controllers ASIC NVMe: Fully off-loaded, consistent performance, M.2 or U.2 form factor ASIC OpenChannel:
More informationNewSQL: Flying on ACID
NewSQL: Flying on ACID David Maier Thanks to H-Store folks, Mike Stonebraker, Fred Holahan NewSQL Keep SQL (some of it) and ACID But be speedy and scalable 3/5/14 David Maier, Portland State University
More informationApache BookKeeper. A High Performance and Low Latency Storage Service
Apache BookKeeper A High Performance and Low Latency Storage Service Hello! I am Sijie Guo - PMC Chair of Apache BookKeeper Co-creator of Apache DistributedLog Twitter Messaging/Pub-Sub Team Yahoo! R&D
More informationInnoDB: Status, Architecture, and Latest Enhancements
InnoDB: Status, Architecture, and Latest Enhancements O'Reilly MySQL Conference, April 14, 2011 Inaam Rana, Oracle John Russell, Oracle Bios Inaam Rana (InnoDB / MySQL / Oracle) Crash recovery speedup
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationOracle Exadata: Strategy and Roadmap
Oracle Exadata: Strategy and Roadmap - New Technologies, Cloud, and On-Premises Juan Loaiza Senior Vice President, Database Systems Technologies, Oracle Safe Harbor Statement The following is intended
More informationThe new Ehcache and Hibernate Caching SPI Provider
The new Ehcache 2.0.0 and Hibernate Caching SPI Provider Presented by: Chris Dennis, Software Engineer, Terracotta Inc. May 12, 2010 Agenda Intro to Ehcache and Terracotta Ehcache and Ehcache + Terracotta
More information