Buffering to Redis for Efficient Real-Time Processing. Percona Live, April 24, 2018

1 Buffering to Redis for Efficient Real-Time Processing Percona Live, April 24, 2018

2 Presenting Today Jon Hyman CTO & Co-Founder Braze (Formerly

3 Mobile is at the vanguard of a new wave of borderless engagement.
"Digital is the main reason just over half of Fortune 500 companies have disappeared since the year 2000." PIERRE NANTERME, CEO, ACCENTURE
"[...] the roller coaster will be accelerating faster than ever, only this time it'll be about actual experiences, with much less emphasis on the way those experiences get made." WALT MOSSBERG, AMERICAN JOURNALIST & FORMER RECODE EDITOR AT LARGE
SOURCE: DIGITAL DISRUPTION HAS ONLY JUST BEGUN (DAVOS WORLD ECONOMIC FORUM), THE DISAPPEARING COMPUTER (RECODE)

4 Braze empowers you to humanize your brand-customer relationships at scale.
More than 1 Billion MAU
Tens of Billions of Messages Sent Monthly
Global Customer Presence ON SIX CONTINENTS

5 Today
Quick Intro to Redis
Coordinating Customer Journeys with Redis
Buffering Analytics to Redis

6 Quick Intro to Redis

7 What is Redis?
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
Braze uses all the data types from Redis.
In today's talk we'll look at sorted sets, sets, hashes, and strings.

8 Redis data types
Strings: key-value storage.
Redis has atomic operations to set a key if it doesn't exist and to set an expiry.
You can use this to create a basic locking mechanism:
SET key value NX EX 10
This sets key to value if it does not exist, and expires the key in 10 seconds.
Redis returns whether or not the set succeeded.
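The locking idea on this slide can be sketched in Ruby. This is a minimal illustration, not Braze's code: `FakeRedis` is a hypothetical in-memory stand-in so the sketch runs without a server, `with_lock` and the `lock:` key prefix are invented names, and the `set(key, value, nx: true, ex: 10)` call shape is assumed to match the redis-rb gem.

```ruby
# Hypothetical in-memory stand-in for the single Redis call used here.
# The clock is injected so the expiry can be demonstrated without sleeping.
class FakeRedis
  def initialize(clock:)
    @clock = clock
    @data = {} # key => [value, expires_at or nil]
  end

  # Mimics SET key value [NX] [EX seconds]; returns true if the key was written.
  def set(key, value, nx: false, ex: nil)
    now = @clock.call
    _, expires_at = @data[key]
    live = @data.key?(key) && (expires_at.nil? || expires_at > now)
    return false if nx && live

    @data[key] = [value, ex ? now + ex : nil]
    true
  end
end

# Runs the block only if this caller wins the lock; returns whether it ran.
def with_lock(redis, name, ttl: 10)
  return false unless redis.set("lock:#{name}", "1", nx: true, ex: ttl)

  yield
  true
end

now = 0.0
redis = FakeRedis.new(clock: -> { now })

first  = with_lock(redis, "branch:42") { :send_message } # wins the lock
second = with_lock(redis, "branch:42") { :send_message } # lock is still held
now += 11                                                # the 10-second expiry passes
third  = with_lock(redis, "branch:42") { :send_message } # lock can be taken again
```

The expiry doubles as a safety net: if the lock holder crashes, the key disappears on its own after the TTL instead of blocking other workers forever.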

9 Redis data types
Sets: Collections of string values that do not contain any duplicates. Sets do not have an ordering.
SADD key a
SADD key b
SADD key a
SMEMBERS key
[ "a", "b" ]

10 Redis data types
Hashes: A data structure that stores string fields and string values.
HSET key foo bar
HSET key bar bang
HGETALL key
{ "foo": "bar", "bar": "bang" }
Hash fields can also be incremented:
HINCRBY key baz 1
HINCRBY key baz 3
HGET key baz
4

11 Redis data types
Sorted Sets: Like sets, but each element also has a numerical score associated with it. Sorted Sets are ordered by that score.
ZADD scores 100 alice
ZADD scores 80 bob
ZADD scores 110 carol
ZRANGEBYSCORE scores -inf +inf WITHSCORES
[ [bob, 80], [alice, 100], [carol, 110] ]
ZREVRANGEBYSCORE scores +inf -inf WITHSCORES
[ [carol, 110], [alice, 100], [bob, 80] ]

12 Coordinating Customer Journeys with Redis

13 Canvas Allows customers to create multi-step, multi-message, multi-day customer journeys

14 Canvas
Canvas is distributed and event driven.
When messages are sent, we fire a "received campaign" event.
Processes listen for the "received campaign" event and determine whether it should schedule a new message.
If a new message should be scheduled, we enqueue a new job to send the message.

15 Using Redis as a Job Queue
Jobs are added to a Redis sorted set with the Unix timestamp as the score and the job data as the value.
One new job is added per message.
Worker processes on servers poll the scheduled set with ZRANGEBYSCORE <set> -inf <now> LIMIT 0 1, then one worker process ZREMs the job.
ZRANGEBYSCORE <set> -inf <now> LIMIT 0 1 has O(1) runtime due to Redis's implementation of sorted sets.
ZREM has O(log N) runtime.
For Canvas, we enqueue one job per branch. When the job runs, the process determines whether the branch path is valid and grabs a lock to prevent other branches from processing.
The lock takes the form of a SET NX EX operation.
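The polling scheme on this slide can be sketched as follows. It is an illustration under assumptions, not the production scheduler: `FakeSortedSet` is a hypothetical in-memory stand-in for one sorted set, the `scheduled` key and `claim_due_job` are invented names, and the method shapes (`zadd`, `zrangebyscore` with `limit:`, `zrem` returning a boolean) are assumed to follow the redis-rb gem.

```ruby
# Hypothetical in-memory stand-in for a single Redis sorted set.
class FakeSortedSet
  def initialize
    @scores = {} # member => score
  end

  def zadd(_key, score, member)
    added = !@scores.key?(member)
    @scores[member] = score.to_f
    added
  end

  # Members with min <= score <= max, ordered by score, then by member.
  def zrangebyscore(_key, min, max, limit: nil)
    lo = min == "-inf" ? -Float::INFINITY : min.to_f
    hi = max == "+inf" ? Float::INFINITY : max.to_f
    hits = @scores.select { |_, s| s >= lo && s <= hi }
                  .sort_by { |member, score| [score, member] }
                  .map(&:first)
    limit ? hits[limit[0], limit[1]] || [] : hits
  end

  # True only for the caller that actually removed the member.
  def zrem(_key, member)
    !@scores.delete(member).nil?
  end
end

# Peek at the oldest due job, then try to claim it with ZREM.
def claim_due_job(redis, now)
  job = redis.zrangebyscore("scheduled", "-inf", now, limit: [0, 1]).first
  return nil if job.nil?

  redis.zrem("scheduled", job) ? job : nil
end

redis = FakeSortedSet.new
redis.zadd("scheduled", 100, "job:1")
redis.zadd("scheduled", 100, "job:2")
redis.zadd("scheduled", 200, "job:3")

# Two workers poll at the same moment and see the same job...
seen_by_a = redis.zrangebyscore("scheduled", "-inf", 100, limit: [0, 1]).first
seen_by_b = redis.zrangebyscore("scheduled", "-inf", 100, limit: [0, 1]).first
# ...but only one ZREM succeeds; the other worker's call is wasted work.
a_won = redis.zrem("scheduled", seen_by_a)
b_won = redis.zrem("scheduled", seen_by_b)

next_job = claim_due_job(redis, 100) # the remaining job due at t=100
nothing  = claim_due_job(redis, 100) # job:3 is not due until t=200
```

The wasted ZREM in the two-worker case is harmless at small scale, but every polling worker pays it whenever many workers see the same head-of-set job.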

16 Canvas
This architecture worked great in staging, in beta, and for the first few months of the general release.
Processing runtime depends on the number of branches a canvas has and the number of users entering the canvas.
In January 2017, one customer created a canvas with 11 branches targeting more than 10 million users to run at 10am the next day.
The Canvas architecture meant we had to process 110 million jobs right at 10am.

17 What happened?

18 Thundering Herd: Enqueuing Jobs
This particular canvas created 110 million jobs, all scheduled to run at the same timestamp: 10am the next morning.
These jobs are stored in a sorted set, which workers poll to move jobs from the sorted set to queues.
ZRANGEBYSCORE <set> -inf <now> LIMIT 0 1 has O(1) runtime due to Redis's implementation of sorted sets.
ZREM has O(log N) runtime.
Every worker server's ZRANGEBYSCORE would return something, but only one process would successfully ZREM the job.
Excessive ZREM operations slowed down Redis.
It took more than 40 minutes just to enqueue the jobs, meaning that at 10:35am we hadn't finished enqueuing the 10am jobs yet.
This was now a customer-facing incident.

19 One-user-per-job inefficiencies
Each job was one {user, branch} pair.
Determining if the user should go down that path involves querying database state and taking Redis locks.
That meant 110 million roundtrips to each database to determine if processing should continue.
It took more than 90 minutes to process the next steps.

20 What did we do?

21 Fixing Canvas architectural issues Initial code design was inefficient: one job per {user, branch} pair. Each job needs access to database state, so we made a lot of extra database calls. Because messages tend to go to multiple users around the same time, we figured we could buffer them and have a single job process multiple users at once.

22 Use Redis sets as a buffer

23 Fixing one-user-per-job inefficiencies
When a "received campaign" event is fired, instead of enqueueing a new job to send a message, create a new set with the key buffer:step_id:timestamp and add the user to this set. This lets users buffer up for the same timestamp.
Periodically flush this set in batches of 100 users:
When doing an SADD, also do a SET NX EX on a key to determine whether we should enqueue a job, to run in 3 seconds, that will flush the set.
The job does an SPOP 100 to get 100 elements, and re-enqueues further jobs to continue flushing the set if it is non-empty.
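A rough Ruby sketch of this buffer-and-flush pattern follows. Everything beyond the slide is an assumption made for illustration: `FakeRedis` is a hypothetical in-memory stand-in that ignores the expiry, the job queue is a plain array, and the `flush_scheduled:` flag key, `buffer_user` and `flush_job` are invented names. The method shapes are assumed to follow the redis-rb gem.

```ruby
require "set"

# Hypothetical in-memory stand-in; the EX expiry is accepted but ignored.
class FakeRedis
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
    @flags = {}
  end

  def sadd(key, member)
    @sets[key].add(member)
  end

  def scard(key)
    @sets[key].size
  end

  # Removes and returns up to `count` members.
  def spop(key, count)
    popped = @sets[key].first(count)
    @sets[key].subtract(popped)
    popped
  end

  def set(key, value, nx: false, ex: nil)
    return false if nx && @flags.key?(key)

    @flags[key] = value
    true
  end
end

BATCH_SIZE = 100
FLUSH_DELAY = 3 # seconds

# Add the user to the buffer; only the first caller schedules the flush job.
def buffer_user(redis, queue, step_id, timestamp, user_id)
  key = "buffer:#{step_id}:#{timestamp}"
  redis.sadd(key, user_id)
  if redis.set("flush_scheduled:#{key}", "1", nx: true, ex: FLUSH_DELAY)
    queue << { key: key, run_in: FLUSH_DELAY }
  end
end

# Take one batch, and re-enqueue another flush if users remain.
def flush_job(redis, queue, key)
  users = redis.spop(key, BATCH_SIZE)
  queue << { key: key, run_in: 0 } if redis.scard(key) > 0
  users
end

redis = FakeRedis.new
queue = []
250.times { |i| buffer_user(redis, queue, "step1", 1_483_264_800, "user#{i}") }
jobs_after_buffering = queue.size # one flush job for 250 buffered users

batches = []
until queue.empty?
  job = queue.shift
  batches << flush_job(redis, queue, job[:key])
end
batch_sizes = batches.map(&:size)
```

In this run, 250 buffered users turn into three jobs of at most 100 users each, instead of 250 single-user jobs.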

24 Fixing the thundering herd
Added random microsecond jitter to all jobs in the sorted set to split up one second into a million pieces.
Existing code used ZRANGEBYSCORE <set> -inf <now> LIMIT 0 1 to consume from the left side of the sorted set.
Consume from the right side with ZREVRANGEBYSCORE.
Consume from the middle:
Keep track of how far backlogged we are in the set.
Randomly add jitter or whole seconds to move along the set to start consuming the middle.
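The jitter idea can be sketched in a few lines of Ruby. The slide does not show how the scores or the middle starting point were computed, so both helpers below (`jittered_score`, `middle_start_score`) are hypothetical: one plausible reading, written to show the shape of the technique.

```ruby
MICROSECONDS = 1_000_000

# Spread jobs scheduled for the same second across up to a million distinct scores.
def jittered_score(run_at, rng)
  run_at + rng.rand(MICROSECONDS) / MICROSECONDS.to_f
end

# Assumed reading of "consume from the middle": pick a random score between the
# oldest backlogged job and now, and start the range query from there.
def middle_start_score(oldest_score, now, rng)
  oldest_score + rng.rand * (now - oldest_score)
end

rng = Random.new(42)
run_at = 1_483_264_800 # all jobs target the same second
scores = Array.new(1_000) { jittered_score(run_at, rng) }

# A worker that is 60 seconds behind picks a random point in the backlog.
start = middle_start_score(scores.min, run_at + 60, rng)
```

With distinct scores, workers that start reading from the left, the right, or a random middle point tend to see different jobs, so fewer of them race to ZREM the same member.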

25 Results of architectural changes
Saved more than 50 gigabytes of RAM for the original canvas.
Instead of 110 million jobs, we enqueued only about 1.4 million jobs.
Instead of taking 40 minutes to enqueue from the sorted set, all jobs were enqueued in a few seconds.
The next steps of the canvas were processed in about 14 minutes, down from 90 minutes.

26 We adapted buffering in other places, such as our REST API

28 REST API Buffering
Braze has REST APIs to ingest user attribute data, event data and purchases.
Application servers query user state when processing, so it is more efficient to make batch roundtrips to databases.
We encourage customers to batch data, but some integrations make 1 API call per data point.
Less efficient, 2 round trips to query state:
POST /users/track { "attributes": [{ "user_id": 123, "first_name": "Alice" }] }
POST /users/track { "attributes": [{ "user_id": 456, "first_name": "Bob" }] }
More efficient, 1 round trip to query state:
POST /users/track { "attributes": [{ "user_id": 123, "first_name": "Alice" }, { "user_id": 456, "first_name": "Bob" }] }
We use the same pattern: SADD data to a Redis set and flush it every second.
This lets us buffer multiple API calls and process them together.
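As a small illustration of what the flush side might do, the Ruby sketch below merges buffered single-user payloads into one batch. It assumes each API call was stored in the set as a JSON string; `flush_track_buffer` is a hypothetical helper, and the array stands in for the members returned when the set is flushed.

```ruby
require "json"

# Merge buffered request bodies (JSON strings) into one batched payload.
def flush_track_buffer(buffered_members)
  merged = { "attributes" => [] }
  buffered_members.each do |member|
    merged["attributes"].concat(JSON.parse(member).fetch("attributes", []))
  end
  merged
end

# Stand-in for the set members popped during a flush.
buffered = [
  { "attributes" => [{ "user_id" => 123, "first_name" => "Alice" }] }.to_json,
  { "attributes" => [{ "user_id" => 456, "first_name" => "Bob" }] }.to_json
]

batch = flush_track_buffer(buffered)
user_ids = batch["attributes"].map { |attribute| attribute["user_id"] }
```

The merged batch has the same shape as the "more efficient" request on the slide, so user state for both users can be queried in a single round trip.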

29 Improving Writes for Time Series Analytics

30 We collect a lot of time series analytics

31 Time series analytics are stored in MongoDB
Non-hashed MongoDB sharding divides data into ranges and puts them on different nodes.

32 Time series data is easy to pre-aggregate { app_id: date: , name: website_visits, 6: 120, 7: 541, 8: 1200, 9: 800, }

33 Shard on {app_id:1, name:1, date:1} { app_id: date: , name: website_visits, 6: 120, 7: 541, 8: 1200, 9: 800, }

34 {app_id: 1, name: 1, date: 1} One document per app, per event name, per day

35 {app_id: 1, name: 1, date: 1} What happens when more events come in at once than one shard can handle?

37 Treat Redis hashes as if they were MongoDB sub-documents

38 MongoDB: { app_id: date: , name: website_visits, 6: 120, 7: 541, 8: 1200, 9: 800, }
Redis: Use a hash based on the shard key, where keys are hours and values are the amounts to increment by.

39 HINCRBY website_visits 8 1
SADD "buffered" website_visits

40 Periodically flush from Redis to MongoDB just like we do with Canvas sets
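The write side of this buffer can be sketched in Ruby. The key format (`app_id:name:date`) and the `record_event` helper are assumptions for illustration, and `FakeRedis` is a hypothetical in-memory stand-in whose method shapes are assumed to follow the redis-rb gem. Like Redis, it returns hash values as strings.

```ruby
require "set"

# Hypothetical in-memory stand-in for the Redis calls used when buffering.
class FakeRedis
  def initialize
    @hashes = Hash.new { |hash, key| hash[key] = Hash.new(0) }
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def hincrby(key, field, by)
    @hashes[key][field.to_s] += by
  end

  def hgetall(key)
    @hashes[key].transform_values(&:to_s)
  end

  def sadd(key, member)
    @sets[key].add(member)
  end

  def smembers(key)
    @sets[key].to_a
  end
end

# Increment the hour bucket and remember that this hash needs flushing.
def record_event(redis, app_id, name, time)
  key = "#{app_id}:#{name}:#{time.strftime('%Y-%m-%d')}"
  redis.hincrby(key, time.hour, 1)
  redis.sadd("buffered", key)
  key
end

redis = FakeRedis.new
record_event(redis, "app1", "website_visits", Time.utc(2018, 4, 24, 8, 15))
record_event(redis, "app1", "website_visits", Time.utc(2018, 4, 24, 8, 40))
key = record_event(redis, "app1", "website_visits", Time.utc(2018, 4, 24, 9, 5))

increments = redis.hgetall(key)      # per-hour counts waiting to be flushed
pending    = redis.smembers("buffered")
```

Three events produce one hash with two hour buckets, so a flush would issue a single MongoDB update for the day's document instead of three.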

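41 Flush buffer from Redis to MongoDB
    require "redis"
    require "mongo"

    # Assumes `redis` is a Redis client, `db` is a Mongo::Database, and
    # `deserialize` turns a buffered key back into its shard-key parts.
    keys = redis.smembers("buffered")
    unless keys.empty?
      # Read every hash and clear the buffer in one MULTI/EXEC transaction.
      results = redis.multi do |tx|
        keys.each { |key| tx.hgetall(key) }
        tx.srem("buffered", keys)
        keys.each { |key| tx.del(key) }
      end
      increment_hashes = results.first(keys.size)

      keys.each_with_index do |key, i|
        app_id, name, date = deserialize(key)
        # HGETALL returns string values; $inc needs numbers.
        increments = increment_hashes[i].transform_values(&:to_i)
        db[:my_timeseries].find({ app_id: app_id, name: name, date: date })
                          .update_one({ "$inc" => increments })
      end
    end
* This example algorithm is vulnerable to data loss (the buffered counts are deleted from Redis before they are written to MongoDB); do not use directly.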

42 We do this with 12 Redis servers to shard out writes to a single MongoDB document.
We can buffer the same hash key on each Redis server and flush each one independently.
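Independent flushing works because increments are additive: each server's partial count can be applied to the same document separately and the totals still come out right. The Ruby sketch below illustrates that property with plain hashes standing in for the 12 Redis servers and for the MongoDB document. Choosing a server at random per write is an assumption, since the slide does not say how writes are distributed.

```ruby
SERVER_COUNT = 12

# Each stand-in "server" maps a hash key to its per-hour counters.
servers = Array.new(SERVER_COUNT) do
  Hash.new { |hash, key| hash[key] = Hash.new(0) }
end

# Buffer one increment on a randomly chosen server (assumed distribution).
def buffer_increment(servers, rng, key, hour)
  servers[rng.rand(servers.size)][key][hour] += 1
end

rng = Random.new(1)
key = "app1:website_visits:2018-04-24"
1_200.times { buffer_increment(servers, rng, key, 8) }

# Flush every server on its own; each flush contributes its own increment.
mongo_doc = Hash.new(0)
servers.each do |server|
  server.each_value do |increments|
    increments.each { |hour, count| mongo_doc[hour] += count }
  end
end
```

No coordination between servers is needed at write time; the only shared step is the additive update against the one document.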

43 Scale
We're doing over 1 million ops per second to Redis.
That's 1 million writes to Mongo deferred per second.
Mongo flush rate is approximately 7k writes per second.
Redis is handling 142x more writes per second than Mongo for analytics.

44 Summary
When processing a flurry of events, holding and batching them can improve throughput.
Redis's multiple data types can be used for buffering.
Braze uses sets to buffer streams of data to process in bulk:
Add with SADD, remove with SPOP.
This reduces database roundtrips and storage costs.
Braze uses hashes to buffer time series analytics using HINCRBY.

45 Thank you! We are hiring! braze.com/careers


Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent Tanton Jeppson CS 401R Lab 3 Cassandra, MongoDB, and HBase Introduction For my report I have chosen to take a deeper look at 3 NoSQL database systems: Cassandra, MongoDB, and HBase. I have chosen these

More information

Classifying malware using network traffic analysis. Or how to learn Redis, git, tshark and Python in 4 hours.

Classifying malware using network traffic analysis. Or how to learn Redis, git, tshark and Python in 4 hours. Classifying malware using network traffic analysis. Or how to learn Redis, git, tshark and Python in 4 hours. Alexandre Dulaunoy January 9, 2015 Problem Statement We have more 5000 pcap files generated

More information

Distributed computing: index building and use

Distributed computing: index building and use Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput

More information

Agenda. Introduction You Me JDriven The case : Westy Tracking Details Implementation Deployment

Agenda. Introduction You Me JDriven The case : Westy Tracking Details Implementation Deployment y t s e W g n i k c a r T e p o r u mmit E u S y r d un o F 6 d 1 u h t Clo 8 2 r e b m Septe Agenda Introduction You Me JDriven The case : Westy Tracking Details Implementation Deployment About you About

More information

Open Source Database Ecosystem in Peter Zaitsev 3 October 2016

Open Source Database Ecosystem in Peter Zaitsev 3 October 2016 Open Source Database Ecosystem in 2016 Peter Zaitsev 3 October 2016 Great things are happening with Open Source Databases It is great Industry and Community to be a part of 2 Why? 3 Data Continues Exponential

More information

VOLTDB + HP VERTICA. page

VOLTDB + HP VERTICA. page VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics

More information

Making Session Stores More Intelligent KYLE J. DAVIS TECHNICAL MARKETING MANAGER REDIS LABS

Making Session Stores More Intelligent KYLE J. DAVIS TECHNICAL MARKETING MANAGER REDIS LABS Making Session Stores More Intelligent KYLE J. DAVIS TECHNICAL MARKETING MANAGER REDIS LABS What is a session store? A session store is An chunk of data that is connected to one user of a service user

More information

4 Myths about in-memory databases busted

4 Myths about in-memory databases busted 4 Myths about in-memory databases busted Yiftach Shoolman Co-Founder & CTO @ Redis Labs @yiftachsh, @redislabsinc Background - Redis Created by Salvatore Sanfilippo (@antirez) OSS, in-memory NoSQL k/v

More information

Percona Live Updated Sharding Guidelines in MongoDB 3.x with Storage Engine Considerations. Kimberly Wilkins

Percona Live Updated Sharding Guidelines in MongoDB 3.x with Storage Engine Considerations. Kimberly Wilkins Percona Live 2016 Updated Sharding Guidelines in MongoDB 3.x with Storage Engine Considerations Kimberly Wilkins Principal Engineer - Databases, Rackspace/ ObjectRocket www.linkedin.com/in/wilkinskimberly,

More information

Fluentd + MongoDB + Spark = Awesome Sauce

Fluentd + MongoDB + Spark = Awesome Sauce Fluentd + MongoDB + Spark = Awesome Sauce Nishant Sahay, Sr. Architect, Wipro Limited Bhavani Ananth, Tech Manager, Wipro Limited Your company logo here Wipro Open Source Practice: Vision & Mission Vision

More information

Algorithms for MapReduce. Combiners Partition and Sort Pairs vs Stripes

Algorithms for MapReduce. Combiners Partition and Sort Pairs vs Stripes Algorithms for MapReduce 1 Assignment 1 released Due 16:00 on 20 October Correctness is not enough! Most marks are for efficiency. 2 Combining, Sorting, and Partitioning... and algorithms exploiting these

More information

Hive Metadata Caching Proposal

Hive Metadata Caching Proposal Hive Metadata Caching Proposal Why Metastore Cache During Hive 2 benchmark, we find Hive metastore operation take a lot of time and thus slow down Hive compilation. In some extreme case, it takes much

More information

Scalable Time Series in PCP. Lukas Berk

Scalable Time Series in PCP. Lukas Berk Scalable Time Series in PCP Lukas Berk Summary Problem Statement Proposed Solution Redis Basic Types Summary Current Work Future Work Items Problem Statement Scaling PCP s metrics querying to hundreds/thousands

More information

Sharding Introduction

Sharding Introduction search MongoDB Home Admin Zone Sharding Sharding Introduction Sharding Introduction MongoDB supports an automated sharding architecture, enabling horizontal scaling across multiple nodes. For applications

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 10: Mutable State (1/2) March 14, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

Relational databases

Relational databases COSC 6397 Big Data Analytics NoSQL databases Edgar Gabriel Spring 2017 Relational databases Long lasting industry standard to store data persistently Key points concurrency control, transactions, standard

More information

What is Apache Kafka?

What is Apache Kafka? What is Apache Kafka? How it s similar to the databases you know and love, and how it s not. Kenny Gorman Founder and CEO www.eventador.io www.kennygorman.com @kennygorman I am a database nerd I have done

More information

Scaling Instagram. AirBnB Tech Talk 2012 Mike Krieger Instagram

Scaling Instagram. AirBnB Tech Talk 2012 Mike Krieger Instagram Scaling Instagram AirBnB Tech Talk 2012 Mike Krieger Instagram me - Co-founder, Instagram - Previously: UX & Front-end @ Meebo - Stanford HCI BS/MS - @mikeyk on everything communicating and sharing

More information

Goals. Facebook s Scaling Problem. Scaling Strategy. Facebook Three Layer Architecture. Workload. Memcache as a Service.

Goals. Facebook s Scaling Problem. Scaling Strategy. Facebook Three Layer Architecture. Workload. Memcache as a Service. Goals Memcache as a Service Tom Anderson Rapid application development - Speed of adding new features is paramount Scale Billions of users Every user on FB all the time Performance Low latency for every

More information

Marathon Documentation

Marathon Documentation Marathon Documentation Release 3.0.0 Top Free Games Feb 07, 2018 Contents 1 Overview 3 1.1 Features.................................................. 3 1.2 Architecture...............................................

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

INFO-H-415 Advanced Databases Key-value stores and Redis. Fatemeh Shafiee Raisa Uku

INFO-H-415 Advanced Databases Key-value stores and Redis. Fatemeh Shafiee Raisa Uku INFO-H-415 Advanced Databases Key-value stores and Redis Fatemeh Shafiee 000454718 Raisa Uku 000456485 December 2017 Contents 1 Introduction 5 2 NoSQL Databases 5 2.1 Introduction to NoSQL Databases...............................

More information

Extreme Computing. NoSQL.

Extreme Computing. NoSQL. Extreme Computing NoSQL PREVIOUSLY: BATCH Query most/all data Results Eventually NOW: ON DEMAND Single Data Points Latency Matters One problem, three ideas We want to keep track of mutable state in a scalable

More information

I Want To Go Faster! A Beginner s Guide to Indexing

I Want To Go Faster! A Beginner s Guide to Indexing I Want To Go Faster! A Beginner s Guide to Indexing Bert Wagner Slides available here! @bertwagner bertwagner.com youtube.com/c/bertwagner bert@bertwagner.com Why Indexes? Biggest bang for the buck Can

More information

CS222P Fall 2017, Final Exam

CS222P Fall 2017, Final Exam STUDENT NAME: STUDENT ID: CS222P Fall 2017, Final Exam Principles of Data Management Department of Computer Science, UC Irvine Prof. Chen Li (Max. Points: 100 + 15) Instructions: This exam has seven (7)

More information

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL Building High Performance Apps using NoSQL Swami Sivasubramanian General Manager, AWS NoSQL Building high performance apps There is a lot to building high performance apps Scalability Performance at high

More information

Redis for Real-Time Personalization A PSEUDO CODE APPROACH TO IMPLEMENTING PERSONALIZATION WITH REDIS

Redis for Real-Time Personalization A PSEUDO CODE APPROACH TO IMPLEMENTING PERSONALIZATION WITH REDIS Redis for Real-Time Personalization A PSEUDO CODE APPROACH TO IMPLEMENTING PERSONALIZATION WITH REDIS Contents Redis and the Enterprise 3 The Personalization App Implemented in Redis 3 Capabilities That

More information

How To Rock with MyRocks. Vadim Tkachenko CTO, Percona Webinar, Jan

How To Rock with MyRocks. Vadim Tkachenko CTO, Percona Webinar, Jan How To Rock with MyRocks Vadim Tkachenko CTO, Percona Webinar, Jan-16 2019 Agenda MyRocks intro and internals MyRocks limitations Benchmarks: When to choose MyRocks over InnoDB Tuning for the best results

More information

Bipul Sinha, Amit Ganesh, Lilian Hobbs, Oracle Corp. Dingbo Zhou, Basavaraj Hubli, Manohar Malayanur, Fannie Mae

Bipul Sinha, Amit Ganesh, Lilian Hobbs, Oracle Corp. Dingbo Zhou, Basavaraj Hubli, Manohar Malayanur, Fannie Mae ONE MILLION FINANCIAL TRANSACTIONS PER HOUR USING ORACLE DATABASE 10G AND XA Bipul Sinha, Amit Ganesh, Lilian Hobbs, Oracle Corp. Dingbo Zhou, Basavaraj Hubli, Manohar Malayanur, Fannie Mae INTRODUCTION

More information

Prototyping Data Intensive Apps: TrendingTopics.org

Prototyping Data Intensive Apps: TrendingTopics.org Prototyping Data Intensive Apps: TrendingTopics.org Pete Skomoroch Research Scientist at LinkedIn Consultant at Data Wrangling @peteskomoroch 09/29/09 1 Talk Outline TrendingTopics Overview Wikipedia Page

More information

Dr. Chuck Cartledge. 19 Nov. 2015

Dr. Chuck Cartledge. 19 Nov. 2015 CS-695 NoSQL Database Redis (part 1 of 2) Dr. Chuck Cartledge 19 Nov. 2015 1/21 Table of contents I 1 Miscellanea 2 DB comparisons 3 Assgn. #7 4 Historical origins 5 Data model 6 CRUDy stuff 7 Other operations

More information

MySQL Performance Improvements

MySQL Performance Improvements Taking Advantage of MySQL Performance Improvements Baron Schwartz, Percona Inc. Introduction About Me (Baron Schwartz) Author of High Performance MySQL 2 nd Edition Creator of Maatkit, innotop, and so

More information

Event Sourcing. Intro & Challenges

Event Sourcing. Intro & Challenges Event Sourcing Intro & Challenges Michael Plöd innoq Principal Consultant @bitboss Most current systems only store the current state Classical Architecture IncidentRestController IncidentBusinessService

More information

MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM

MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM About us Adamo Tonete MongoDB Support Engineer Agustín Gallego MySQL Support Engineer Agenda What are MongoDB and MySQL; NoSQL

More information