Consistency Without Transactions Global Family Tree
- Austen Casey
1 Consistency Without Transactions: Global Family Tree
NoSQL Matters Cologne Spring
by Intellectual Reserve, Inc. All rights reserved.
2 Contents
- Introduction to FamilySearch Family Tree
- Motivation for reimplementation
- Outline of Cassandra reimplementation
- Journal-based Consistency Model
- Experience with Cassandra
3 What is FamilySearch?
- Familysearch.org website
- Largest single pedigree (Family Tree)
- Largest collection of free genealogical records
- Largest genealogical library
- Family History Department of Church of Jesus Christ of Latter-day Saints (known as Mormons)
4 Why does FamilySearch exist?
- Visit
5 Record Preservation
- Neglect
- Time
- Disasters (e.g. WWII)
6 Record Preservation (continued)
- 222 cameras in 48 countries
- Each camera captures images per hour
7 Record Preservation (continued)
- 150 million images published online
8 Indexing
- 1 billion indexed records
- about 1M per day!
- Turns this into this!
9 Memories
10 Community
11 Family Tree & Records
- Community
- Records
- Indexing
- Family Tree
- Memories
12 Family Tree Data
- Family Tree:
  - 900M+ person records, open-edit
  - 500M+ relationships, open-edit
  - 8.4B change log entries, 100M+ per quarter
- Dynamic OLTP system
- Data-dependent performance issues
13 Family Tree: Example 9 Gen Pedigree
- up to 511 person slots
- Dynamic content!
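The 511 figure matches a full binary pedigree: generation g holds 2^(g-1) ancestors, so nine generations give 2^9 - 1 slots. A minimal sketch of that arithmetic follows; the function name is hypothetical and not from the talk.

```python
def pedigree_slots(generations: int) -> int:
    """Person slots in a full binary pedigree of the given depth.

    Generation g (1-based, starting at the root person) holds 2**(g-1)
    people, so the total over n generations is 2**n - 1.
    """
    return 2 ** generations - 1


print(pedigree_slots(9))  # 511, the "up to 511 person slots" on the slide
print(pedigree_slots(5))  # 31, the size of a five-generation block
```

Every slot can be a different, independently edited person record, which is why a single pedigree view fans out into many lookups.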
14 Family Tree: Example Pedigree App
- 31+ persons per section
- Dynamic content!
15 Family Tree: Example Ancestor Page
- 10+ persons in families
- changes
- Dynamic content!
16 Family Tree: Example Change History
- changes
- Dynamic content!
18 Motivations
- Performance (worse as data has grown)
- Scale (wider audience)
- Cost (license & hardware)
- Cloud (infrastructure moving to cloud)
19 Performance & Scale
- Slow page views
  - pedigree ( ms for 3 generations)
  - change history (2000+ ms for first page of changes)
  - large family view
- Query problems
  - relationships connect persons, range scan by person id
  - every person => person traversal is M btree scan (global index)
  - change history queries traverse 8+B btree scan (global index)
20 Performance & Scale
- Query performance problems
- [Diagram: wide range scan for Pedigree (Person, Relationship, Person); wide range scan for Change History]
21 Performance & Scale
- Performance degradation over time
  - more data means wider queries, even for simple records
  - no ability to scale data tier further without prohibitive cost
  - data has resisted sharding, M connected graph
- Caching wasn't enough
  - cache invalidation was almost as expensive as request
  - immediate read-after-write response required
22 Scale & Cost
- Database bottlenecks
  - even after tuning queries
  - even after getting relationship queries down to ~5-10 ms
  - database CPU headroom ramping traffic
- Huge DB box already
  - need to be able to scale data tier directly
  - without huge additional hardware or DB license cost
23 Cloud Transition
- Switching app & database to cloud
  - wide load variation in a week
  - ramp app on-demand
  - db needs to be in the cloud too
  - tolerance for zone / region outages
25 Cassandra Reimplementation
- selected Cassandra after extensive testing
- full data scale proof-of-concept & tests
- required: new data model (performance)
- required: new consistency model (critical!)
26 Cassandra Reimplementation
- event-sourced data model
  - journal / views
- new data model
  - no indexes
- new consistency model
  - satisfies consistency
- [Diagram: P1 with journal entry JE #8 and P1 Views A, B; P2 with journal entry JE #6 and P2 Views A, B]
27 Cassandra Reimplementation
- denormalized relationships
- [Diagram: persons P1 and P2 with relationships R1, R4, R2, R3, R5]
28 Cassandra Reimplementation
- denormalized relationships
- [Diagram: same graph, with R2 and R3 shown twice]
29 Cassandra Reimplementation
- denormalized relationships
- [Diagram: same graph, with R2 and R3 shown twice]
30 Cassandra Reimplementation
- denormalized relationships
- exact duplication allows bidirectional traversal
- [Diagram: P1 and P2 each shown with copies of R2 and R3; labels: Person, Relationship, Wide query, Person/Rels]
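A minimal in-memory sketch of the duplication idea, assuming a relationship is written in full under each person it touches so that either side can be read with a single partition lookup. The class, method names, and the person id "P0" are hypothetical; this is not the FamilySearch schema.

```python
from collections import defaultdict


class RelationshipStore:
    """Toy stand-in for a table partitioned by person id."""

    def __init__(self):
        # person_id -> {relationship_id: relationship record}
        self._by_person = defaultdict(dict)

    def put(self, rel_id, person_a, person_b, kind):
        rel = {"id": rel_id, "a": person_a, "b": person_b, "kind": kind}
        # Exact duplication: the same record is stored under both partitions.
        self._by_person[person_a][rel_id] = dict(rel)
        self._by_person[person_b][rel_id] = dict(rel)

    def relationships_of(self, person_id):
        # One partition lookup, no index and no range scan over other persons.
        rels = self._by_person.get(person_id, {})
        return sorted(rels.values(), key=lambda r: r["id"])


store = RelationshipStore()
store.put("R1", "P1", "P0", "couple")        # "P0" is an invented endpoint
store.put("R2", "P1", "P2", "parent-child")
print([r["id"] for r in store.relationships_of("P1")])  # ['R1', 'R2']
print([r["id"] for r in store.relationships_of("P2")])  # ['R2']
```

The cost is that every relationship write touches two partitions, which is the kind of cross-entity write the journal model later in the talk is meant to keep consistent.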
31 Cassandra Reimplementation
- change history is a core feature
- denormalized change history optimizes for displaying recent changes
- [Diagram: P1 journal entry JE #8 and the P1 Change History View]
- Last changes (local to a single Cassandra cell)
- 1000s of changes (spread over multiple Cassandra cells)
33 Journal-based Consistency Model: Rough Data Flow
- [Diagram: Command, Journal, View, View, View]
- Command: captures edits safely
- Journal: stores edits canonically
- View: view-optimized summations
34 Journal-based Consistency Model: Command
- write-once with LOCAL_QUORUM
- application to journal requires 3 tables: pending / completed / aborted
- idempotent application to journal
35 Journal-based Consistency Model: Command Schema
- key: command v1 uuid (as text)
- value: blob (binary json)
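The command stage can be sketched in memory as follows, under stated assumptions: three dicts stand in for the pending / completed / aborted tables, UTF-8 JSON stands in for the "binary json" blob, and the list of affected entities is passed in by the caller. All names are hypothetical, and no Cassandra driver calls are shown.

```python
import json
import uuid


class CommandStore:
    """In-memory stand-in for the pending / completed / aborted tables."""

    def __init__(self):
        self.pending = {}
        self.completed = {}
        self.aborted = {}

    def capture(self, payload: dict) -> str:
        # Key is a version 1 (time-based) UUID rendered as text.
        key = str(uuid.uuid1())
        # Write-once: the blob is never modified after capture.
        self.pending[key] = json.dumps(payload, sort_keys=True).encode("utf-8")
        return key


def apply_command(store: CommandStore, journal: dict, key: str, affected: list) -> None:
    """Copy a captured command into the journal of each affected entity.

    Safe to re-run after an interruption: each write uses the same key and
    the same bytes, so repeating it changes nothing.
    """
    if key in store.completed or key in store.aborted:
        return
    blob = store.pending[key]
    for entity in affected:
        journal.setdefault(entity, {})[key] = blob
    store.completed[key] = store.pending.pop(key)


store, journal = CommandStore(), {}
key = store.capture({"attribution": {}})
apply_command(store, journal, key, ["KWZ3-P71", "KCDT-J59"])
apply_command(store, journal, key, ["KWZ3-P71", "KCDT-J59"])  # no effect
print(sorted(journal))  # ['KCDT-J59', 'KWZ3-P71']
```

Because the command stays in pending until every journal write has happened, a crashed handler can be detected and the same command applied again.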
36 Journal-based Consistency Model: Journal
- write-once with LOCAL_QUORUM & C* batch
- denormalized byte-exact across affected entities
- each entry stored in separate cell (compaction required for fast journal reads)
37 Journal-based Consistency Model: Journal
- CmRDT (commutative replicated data type)
- partitions converge without conflict because of unique uuid
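Why unique keys make the journal converge can be shown with a small sketch, assuming each entry is keyed by (partition key, command uuid) and is never rewritten. Merging two replicas is then a plain union, and the order of merging does not matter. The keys and blobs below are placeholders, not real journal data.

```python
def merge(journal_a: dict, journal_b: dict) -> dict:
    """Union of two journal replicas keyed by (partition_key, command_uuid)."""
    merged = dict(journal_a)
    for key, blob in journal_b.items():
        # Entries are write-once, so a key present on both sides carries
        # the same bytes and there is nothing to reconcile.
        merged.setdefault(key, blob)
    return merged


replica_1 = {("KWZ3-P71", "cmd-1"): b"{}", ("KWZ3-P71", "cmd-2"): b"{}"}
replica_2 = {("KWZ3-P71", "cmd-2"): b"{}", ("KCDT-J59", "cmd-3"): b"{}"}

print(merge(replica_1, replica_2) == merge(replica_2, replica_1))  # True
```

A view does not have this property, which is why the talk treats views separately.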
38 Journal-based Consistency Model: Journal (example rows)
Partition Key | Command UUID  | Content (blob)
KWZ3-P71      | eda6f         | { "attribution": {}, } (binary json)
KWZ3-P71      | af8d90c-8f3a  | { "attribution": {}, } (binary json)
KCDT-J59      | fd35ac61-7def | { "attribution": {}, } (binary json)
KCDT-J59      | b2db2fa5-da5f | { "attribution": {}, } (binary json)
39 Journal-based Consistency Model: View
- multiple views for multiple uses (person, person card, change history)
- populated by applying journal entries
- incrementally updated in steady state
- not canonical data, can be recalculated
40 Journal-based Consistency Model: View
- CvRDT (convergent replicated data type)
- partitions converge with conflict; resolved by full view refresh from canonical journal
- steady state: one view of a given type per entity
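A minimal sketch of the two update paths for a view, assuming a view is a fold over journal entries and that entries carry an ordering field. The `seq` field and the last-write-wins fold are illustrative assumptions; the talk does not say how entries are ordered or combined.

```python
def apply_entry(view: dict, entry: dict) -> dict:
    """Incremental path: fold one journal entry into an existing view."""
    updated = dict(view)
    updated[entry["field"]] = entry["value"]
    return updated


def full_refresh(journal: list) -> dict:
    """Repair path: discard the view and rebuild it from the canonical journal."""
    view = {}
    for entry in sorted(journal, key=lambda e: e["seq"]):
        view = apply_entry(view, entry)
    return view


journal = [
    {"seq": 1, "field": "name", "value": "Ann"},
    {"seq": 2, "field": "name", "value": "Anne"},
    {"seq": 3, "field": "birth", "value": "1801"},
]

# A view that received entries 2 and 1 in the wrong order has diverged.
stale = apply_entry(apply_entry({}, journal[1]), journal[0])
print(stale)                  # {'name': 'Ann'}
print(full_refresh(journal))  # {'name': 'Anne', 'birth': '1801'}
```

Since the view is not canonical, throwing it away costs only the time to replay the journal.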
41 Journal-based Consistency Model
- [Diagram: P1 journal; P1 Views A and B]
42 Journal-based Consistency Model
- [Diagram: P1 journal with JE #8; P1 Views A and B each shown with JE #8]
43 Journal-based Consistency Model
- [Diagram: as on the previous slide, plus views A (new) and B (new)]
44 Journal-based Consistency Model: View
- same schema as journal
- enables journal entries to be written to view for incremental refresh
- core of the consistency model
45 Journal-based Consistency Model: Constraints
- needs to meet scale & performance requirements
- needs to enable strong consistency
- needs to support change history natively
- needs to be easily implementable on key-value stores
- needs to be tolerant of outages (node, zone, region)
46 Journal-based Consistency Model: Performance & Scale
- lookup by partition key only, no indexes
  - any cross-entity change happens in duplicate on all
- stored current-state views
  - cheapest possible read
- custom views
  - tunable to different use cases
- disposable views
  - able to tweak view over time
47 Journal-based Consistency Model: Strong consistency
- command store
  - atomic capture of a single command
- command handling
  - idempotent writes to journal, picked up later even if interrupted
- no global lock needed for optimistic concurrency
48 Journal-based Consistency Model: Business Rule Enforcement
- pre-command checks with LOCAL_QUORUM reads prevent invalid changes
- write with LOCAL_QUORUM ensures consistent write
- post-command checks with LOCAL_QUORUM reads prevent business-rules conflicts
- administrative revert marks command as not applicable and thereby causes full refresh which ignores changes
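The check / write / check / revert sequence can be sketched in memory as below. Everything here is an assumption made for illustration: the rule (birth year not after death year), the entry fields, and the function names are hypothetical, and plain list reads stand in for LOCAL_QUORUM reads.

```python
def birth_before_death(view: dict) -> bool:
    """Hypothetical business rule: a birth year may not follow a death year."""
    return "birth" not in view or "death" not in view or view["birth"] <= view["death"]


def refresh(journal: list, not_applicable: set) -> dict:
    """Full refresh that ignores commands marked as not applicable."""
    view = {}
    for entry in journal:
        if entry["command"] in not_applicable:
            continue
        view[entry["field"]] = entry["value"]
    return view


def revert(not_applicable: set, command_id: str) -> None:
    """Administrative revert: the entry stays in the journal but is skipped."""
    not_applicable.add(command_id)


def submit(journal: list, not_applicable: set, entry: dict, rule) -> bool:
    # Pre-command check against the current state plus the proposed change.
    proposed = {**refresh(journal, not_applicable), entry["field"]: entry["value"]}
    if not rule(proposed):
        return False
    journal.append(entry)  # write-once journal entry
    # Post-command check: a concurrent writer may have changed the state.
    if not rule(refresh(journal, not_applicable)):
        revert(not_applicable, entry["command"])
        return False
    return True


journal, not_applicable = [], set()
print(submit(journal, not_applicable,
             {"command": "c1", "field": "birth", "value": 1800}, birth_before_death))  # True
print(submit(journal, not_applicable,
             {"command": "c2", "field": "death", "value": 1700}, birth_before_death))  # False
```

Note that a revert never deletes from the journal; it only changes which entries the next full refresh takes into account.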
49 Journal-based Consistency Model: Journal / View Concerns
- native support for change history
- no journal tombstones in steady state
- write-once
- simple blob schema
- able to be implemented on any db engine that supports two-level keys (partition, composite)
- view tombstones occur on every write, but not likely to live long for any single person, because of write patterns
51 Experience with Cassandra
- tested Community 1.2 and 2.0
- fantastic performance
- easy cloud setup
- great developer response
- easy to bulk load through CQL3
- harder to get running inside AWS VPC
52 Experience with Cassandra: Bulk import experience
- 8.4B change log records => 5.8B journal entries (2.5TB lzo)
- 5 node hi1.4xlarge cluster (2TB SSDs)
- 11 hours to import through CQL (5h on 30-node cluster)
- 140k writes / sec, fed from 128 writer threads
- 20 records / unlogged batch write, 1-2k record size
- minimal post-import compaction (size-tiered)
- ended up with 3.5-4TB on C* disk after import
- OpsCenter great visibility for tuning
- Community harder to automate repairs, etc.
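The import figures can be cross-checked with a little arithmetic. The sketch below uses only numbers from the slide and assumes the 11-hour run covered all 5.8B journal entries; the variable names are made up.

```python
journal_entries = 5.8e9     # journal entries imported
import_hours = 11           # wall-clock time for the import through CQL
records_per_batch = 20      # records per unlogged batch write
writer_threads = 128        # client writer threads

entries_per_sec = journal_entries / (import_hours * 3600)
batches_per_sec = entries_per_sec / records_per_batch
entries_per_thread = entries_per_sec / writer_threads

print(round(entries_per_sec))     # about 146 thousand entries per second
print(round(batches_per_sec))     # about 7.3 thousand batches per second
print(round(entries_per_thread))  # about 1.1 thousand entries per second per thread
```

The average of roughly 146k entries per second is in line with the quoted 140k writes / sec.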
53 Experience with Cassandra: Full-scale load test experience
- got to 25x our production peak load on node cluster
- production peak load included significant write load
- working-set size was about 2M persons in a month
- enabled row cache, ran almost entirely without disk access
- bottlenecked on interconnect socket round robin client
- still net bandwidth to spare, trying token-aware next
- OpsCenter great visibility for tuning
- Large SSD cluster able to handle repair during scale tests
54 Experience with Cassandra
- [Chart: current system vs. cassandra impl (1x, 10x, 20x)]
55 Experience with Cassandra
- [Chart, log scale: current system vs. cassandra impl (1x, 10x, 20x)]
56 Current Status
- still working on implementation & rollout
- migration, reconciliation, integration
- consistency model code
57 Questions?
58 Contact Info
John Sumsion, Sr. Software Engineer
Thanks to team at FamilySearch:
- RandyB, James, Arn, Jason, Michael, RandallJ, JonM, Louis, JohnK, Tyler, Atsuko, Spencer, Dan
Thanks to the awesome presenters & organizers here!
More informationMySQL In the Cloud. Migration, Best Practices, High Availability, Scaling. Peter Zaitsev CEO Los Angeles MySQL Meetup June 12 th, 2017.
MySQL In the Cloud Migration, Best Practices, High Availability, Scaling Peter Zaitsev CEO Los Angeles MySQL Meetup June 12 th, 2017 1 Let me start. With some Questions! 2 Question One How Many of you
More informationOutline. Spanner Mo/va/on. Tom Anderson
Spanner Mo/va/on Tom Anderson Outline Last week: Chubby: coordina/on service BigTable: scalable storage of structured data GFS: large- scale storage for bulk data Today/Friday: Lessons from GFS/BigTable
More informationOracle Database In-Memory What s New and What s Coming
Oracle Database In-Memory What s New and What s Coming Andy Rivenes Product Manager for Database In-Memory Oracle Database Systems DOAG - May 10, 2016 #DBIM12c Safe Harbor Statement The following is intended
More informationCrescando: Predictable Performance for Unpredictable Workloads
Crescando: Predictable Performance for Unpredictable Workloads G. Alonso, D. Fauser, G. Giannikis, D. Kossmann, J. Meyer, P. Unterbrunner Amadeus S.A. ETH Zurich, Systems Group (Funded by Enterprise Computing
More informationWindows Servers In Microsoft Azure
$6/Month Windows Servers In Microsoft Azure What I m Going Over 1. How inexpensive servers in Microsoft Azure are 2. How I get Windows servers for $6/month 3. Why Azure hosted servers are way better 4.
More informationLecture 8: Internet and Online Services. CS 598: Advanced Internetworking Matthew Caesar March 3, 2011
Lecture 8: Internet and Online Services CS 598: Advanced Internetworking Matthew Caesar March 3, 2011 Demands of modern networked services Old approach: run applications on local PC Now: major innovation
More informationEfficiency at Scale. Sanjeev Kumar Director of Engineering, Facebook
Efficiency at Scale Sanjeev Kumar Director of Engineering, Facebook International Workshop on Rack-scale Computing, April 2014 Agenda 1 Overview 2 Datacenter Architecture 3 Case Study: Optimizing BLOB
More informationNoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre
NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Leveraging the NoSQL boom 2 Why NoSQL? In the last fourty years relational databases have been the default choice for serious
More informationBig Data Analytics. Rasoul Karimi
Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline
More informationAccelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016
Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016 Nikita Ivanov CTO and Co-Founder GridGain Systems Peter Zaitsev CEO and Co-Founder Percona About the Presentation
More informationDistributed File Systems II
Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation
More informationCloud Analytics and Business Intelligence on AWS
Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse
More informationThe NoSQL movement. CouchDB as an example
The NoSQL movement CouchDB as an example About me sleepnova - I'm a freelancer Interests: emerging technology, digital art web, embedded system, javascript, programming language Some of my works: Chrome
More informationPutting together the platform: Riak, Redis, Solr and Spark. Bryan Hunt
Putting together the platform: Riak, Redis, Solr and Spark Bryan Hunt 1 $ whoami Bryan Hunt Client Services Engineer @binarytemple 2 Minimum viable product - the ideologically correct doctrine 1. Start
More information10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein Sameh Elnikety. Copyright 2012 Philip A. Bernstein
10. Replication CSEP 545 Transaction Processing Philip A. Bernstein Sameh Elnikety Copyright 2012 Philip A. Bernstein 1 Outline 1. Introduction 2. Primary-Copy Replication 3. Multi-Master Replication 4.
More information