S2Graph : A large-scale graph database
|
|
- Junior Atkins
- 6 years ago
- Views:
Transcription
1 daumkakao S2Graph : A large-scale graph database with Hbase Doyoung Yoon x Taejin Chin
2 DaumKakao A Mobile Lifestyle Platform 1. KakaoTalk a. Mobile Messenger replacing SMS b. KaTalkHe is being used as a verb in Korea like Googling c. 96% of Korean smartphone users are using KakaoTalk d. 170M users worldwide e. 3B messages / day 2
3 DaumKakao A Mobile Lifestyle Platform Social Platform Contents Platform Commerce Platform Marketing Platform Local Platform Personal Platform KakaoTalk Media Daum KakaoPick Yellow ID Daum Map KakaoHome KakaoStory KakaoGame Gift Shop Plus Friend KakaoPlace Sol calendar KakaoGroup Digital Item Store KakaoStyle Story Plus Sol Mail Daum Cafe KakaoTopic Daum Cluod 96% of Zap KakaoPage Korean smartphone Biggest mobile users are using KakaoTalk SNS in Korea Sol Group KakaoMusic messenger, 170 million users worldwide) Daum tvpot Daum Webtoon 3
4 Our Social Graph Advertise Listen count : Message Coupon price : affinity Emoticon affinity affinity: affinity affinity Like count : 7 Pick withfriend : Friend Write length : Play level: 6 View count : affinity affinity affinity affinity Comment Eat rating : affinity Read Search keyword : Group size : 6 Present price : 3 4
5 Our Social Graph Message ID : 201 Ad ID : 603 Advertise ctr : 0.32 Listen count : 6 Music ID Message length : 9 affinity 4 affinity 3 affinity: 9 affinity 6 affinity 9 Item ID : 13 Pick withfriend : 3 Friend affinity 1 Write length : 3 Post ID : 97 Play level: 6 Search keyword : HBase" affinity 3 affinity 2 affinity 2 affinity 3 C l Game ID :
6 Technical Challenges 1. Large social graph constantly changing a. Scale more than, social network: 10 billion edges, 200 million vertices, 50 million update on existing edges. user activities: 400 million new edges per day 6
7 Technical Challenges (cont) 2. Low latency for breadth first search traversal on connected data. a. performance requirement peak graph-traversing query per second: response time: 100ms 7
8 Technical Challenges (cont) 3. Update should be applied to graph in real time for viral effect Person A Fast Person B Fast Person C Fast Person D Post Comment Sharing Mention 8
9 Technical Challenges (cont) 4. Support for Dynamic Ranking logic a. push strategy: hard to change data ranking logic dynamically. b. pull strategy: can try various data ranking logic 9
10 Before Messaging App SNS App Blog App Friend relationship SNS feeds Blog user activities Messaging Each app server should know each DB s sharding logic. Highly inter-connected architecture 10
11 After Messaging App SNS App Blog App S2Graph DB stateless app servers 11
12 S2Graph : Distributed Online GraphDB 1.Low-latency 2.Graph-traversable 3.Scalable 4.Eventually consistent 5.Asynchronous, non-blocking 12
13 Why We Choose HBase? 1.High Availability 2.Scalability 3.Low latency 4.High concurrency 5.Fault tolerant 6.Integration with HDFS 7.Distributed operation 13
14 The Data Model vertex 1 out edges vertex 2 in edges 1. Columns 2. Labels 3. Directions 4. Index Properties 5. Non-index Properties edge 2 label edge 2 source vertex edge 2 target vertex 1 2 comment 3 5 date = know 4 created edge 5 properties name = josh age = 32 vertex 4 id vertex 4 properties 14
15 How to store the data - Edge Logical View 1. Snapshot edges : Up-to-date status of edge row column Tgt Vertex ID1 Tgt Vertex ID2 Tgt Vertex ID3 Src Vertex ID1 Properties Properties Properties Src Vertex ID2 Properties Properties Properties a. Fetching an edge between two specific vertex b. Lookup Table to reach indexed edges for update, increment, delete operations 15
16 How to store the data - Edge Logical View 2. Indexed edges : Edges with index row column Index Values Tgt Vertex ID1 Index Values Tgt Vertex ID2 Src Vertex ID1 Non-index Properties Non-index Properties a. Fetches edges originating from a certain vertex in order of index 16
17 How to store the data - Edge Physical View - table schema 1. Snapshot Edge a. Rowkey Murmur Hash Src Vertex ID Label ID Direction Index Sequence Is Inverted 16 bit variable length 30 bit 2 bit 7bit 1 bit Vertex IDs can be encoded with 8 bit header + byte array (long, integer, short, byte, string) 17
18 How to store the data - Edge Physical View - table schema 1. Snapshot Edge b. Qualifier Target Vertex ID variable length c. Value All Property Key Value Pairs variable length 18
19 How to store the data - Edge Physical View - table schema 2. Indexed Edge a. Rowkey Murmur Hash Src Vertex ID Label ID Direction Index Sequence Is Inverted 16 bit variable length 30 bit 2 bit 7bit 1 bit Vertex IDs can be encoded with 8 bit header + byte array (long, integer, short, byte, string) 19
20 How to store the data - Edge Physical View - table schema 2. Indexed Edge b. Qualifier Index Property Values variable length Tgt Vertex ID variable length c. Value Non-index Property Key Value Pairs variable length 20
21 How to store the data - Vertex Logical View 1. Vertex : Up-to-date status of Vertex row column Property Key1 Property Key2 Src Vertex ID1 Value1 Value2 Vertex ID2 Value1 Value2 21
22 How to store the data - Vertex Physical View - table schema 1. Vertex : Up-to-date status of Vertex a. Rowkey Murmur Hash Column ID Vertex ID 16 bit integer(32bit) variable length b. Qualifier c. Value Property Key Byte(8 bit) Property Value variable length 22
23 How to read the data - GetEdges Using a custom query DSL on top of HTTP User 1 curl -XPOST localhost:9000/graphs/getedges -H 'Content-Type: Application/json' -d ' { "srcvertices": [{"servicename": "s2graph", "columnname": "account_id", "id":1}], Friends Friends Step 1 "steps": [ [{"label": "friends", "direction": "out", "limit": 100}], // step friend 1 friend 2 [{"label": "hear", "direction": "out", "limit": 10}] } ] hear time: hear time: hear time: Step 2 ' Steps = a list of Step Step = contains the labels to traverse and how to rank them in the result Don t let go let it be let it go 23
24 How to read the data - GetEdges Example Friend list User 1 curl -XPOST localhost:9000/graphs/getedges -H 'Content-Type: Application/json' -d ' { "srcvertices": [{"servicename": "s2graph", "columnname": "account_id", "id":1}], } ' "steps": [ ] [{"label": "friends", "direction": "out", "limit": 100}], // step Friends Friends friend 1 friend 2 24
25 How to read the data - GetEdges Example Songs my friends have listened User 1 curl -XPOST localhost:9000/graphs/getedges -H 'Content-Type: Application/json' -d ' { Friends Friends "srcvertices": [{"servicename": "s2graph", "columnname": "account_id", "id":1}], "steps": [ [{"label": "friends", "direction": "out", "limit": 50, scoring : { score : 1.0}], friend 1 friend 2 [{"label": "listen", "direction": "out", "limit": 10}] ] } hear time: hear time: hear time: ' Don t let go let it be let it go Reference : 25
26 How to read the data - GetEdges Example Similar songs to songs that I have listened to. User 1 curl -XPOST localhost:9000/graphs/getedges -H 'Content-Type: Application/json' -d ' { } "srcvertices": [{"servicename": "s2graph", "columnname": "account_id", "id":1}], "steps": [ [{"label": "listen", "direction": "out", "limit": 50}], [{"label": "similar_song", "direction": "out", "limit": 10, scoring : { score : 1.0}] ] hear time: hear time: hear time: Don t let go let it be let it go similar_song similarity: 0.3 similar_song similarity: 0.4 similar_song similarity: 0.6 let it bleed Hey jude Do you wanna build a snowman? 26
27 How to read the data - GetVertices curl -XPOST localhost:9000/graphs/getvertices -H 'Content-Type: Application/json' -d ' [ {"servicename": "s2graph", "columnname": "account_id", "ids": [1, 2, 3]}, {"servicename": "kakaomusic", "columnname": "user_id", "ids": [1, 2, 3]} ] ' User 1 {created_at: , updated_at: } User 2 {created_at: , updated_at: } 27
28 How to write the data - Insert curl -XPOST localhost:9000/graphs/edges/insert -H 'Content-Type: Application/json' -d ' [ {"from":1,"to":2,"label":"graph_test","props":{"time":-1, "weight":10},"timestamp": }, ] ' User 1 User 2 28
29 How to write the data - Delete curl -XPOST localhost:9000/graphs/edges/delete -H 'Content-Type: Application/json' -d ' [ {"from":1,"to":2,"label":"graph_test","timestamp": }, {"from":1,"to":3,"label":"graph_test","timestamp": }, ] ' User 1 User 2 29
30 How to write the data - Update curl -XPOST localhost:9000/graphs/edges/update -H 'Content-Type: Application/json' -d ' [ {"from":1,"to":2,"label":"graph_test","timestamp": , "props": {"is_hidden": true, status : 200}, {"from":1,"to":3,"label":"graph_test","timestamp": , "props": {"status": -500} ] friend {is_hidden:true, User 1 User 2 status:200} 30
31 REST API Spec. Read Write Management 1. getedges 2. checkedge 3. getedgescount 4. getvertices 1. insert 2. delete 3. update 4. increment 1. create service (vertex type) 2. create label (edge type) 3. add Index 31
32 HBase Table Configuration 1. setdurability(durability.async_wal) 2. setcompressiontype(compression.algorithm.lz4) 3. setbloomfiltertype(bloomtype.row) 4. setdatablockencoding(datablockencoding.fast_diff) 5. setblocksize(32768) 6. setblockcacheenabled(true) 7. pre-split by (Intger.MaxValue / regioncount). regioncount = 120 when create table(on 20 region server). 32
33 HBase Cluster Configuration each machine: 8core, 32G memory hfile.block.cache.size: 0.6 hbase.hregion.memstore.flush.size: 128MB otherwise use default value from CDH
34 Overall Architecture Clients S2graph 34
35 Compare to other Online GraphDBs Titan (v0.4.2) a. Pros - Rich API and easy to setup - Relatively large community - Transaction handling b. Cons - Using it s own ID system; less efficient for graph traversal (details in next slide) - Index data stored on one region (hotspot) with strong consistency option - Not many references on Titan with HBase comparing to other storages 35
36 Compare to Titan Titan is less efficient for graph traversal - For following 1 normal graph traversal query, Vertex( userid: 1 ).out( friends ).limit(10).out( friends ).limit(10) User 1 friends friends 36
37 Compare to Titan (cont) Vertex( userid:1 ).out( friend ).limit(10).out( friend ).limit(10) Titan A S2grap B h e C f D Titan S2graph # of read requests on HBase 112 = 1 (Vertex Lookup : a) + 1 (1st step edges : b) + 10 (2nd step edges : c) (Destination Vertices : d) 11 = 1 (1step edges : e) + 10 (2nd step edges : f) 37
38 Performance 1. Test data a. Total # of Edges: 9,000,000,000 b. Average # of adjacent edges per vertex: 500 c. Seed vertex: vertices that has more than 100 adjacent edges. 2. Test environment a. Zookeeper server: 3 b. HBase Masterserver: 2 c. HBase Regionserver: 20 d. App server: 8 core, 16GB Ram 38
39 Performance 2. Linear scalability QPS(Query Per Second) Latency(ms) QPS # of app server QPS 4,000 3,000 2,000 1, Benchmark Query : src.out( friend ).limit(50).out( friend ).limit(10) - Total concurrency: 20 * # of app server 3,097 1, # of app server Latency 39
40 Performance 3. Varying width of traverse (tested with a single server) QPS Latency(ms) 2, , QPS 1, Latency Limit on first step - Benchmark Query : src.out( friend ).limit(x).out( friend ).limit(10) - Total concurrency = 20 * 1(# of app server) 40
41 Performance 3. Varying width of traverse (tested with a single server) QPS Latency(ms) 2, , QPS 1, Latency Limit on first step - Benchmark Query : src.out( friend ).limit(x).out( friend ).limit(10) - Total concurrency = 20 * 1(# of app server)
42 Performance 4. Different query path(different I/O pattern) QPS Latency(ms) QPS Latency > > > 10 -> > 5 -> 10 -> > 5 -> 2 -> 5 -> 10 0 limits on path - All query touch 1000 edges. - each step` limit is on x axis. - Can expect performance with given query`s search space. 42
43 Performance 5. Write throughput per operation on single app server 5 Insert operation 3.75 Latency Request per second 43
44 Performance 6. write throughput per operation on single app server 8 Update(increment/update/delete) operation 6 Latency Request per second 44
45 Stats 1. HBase cluster per IDC (2 IDC) - 3 Zookeeper Server - 2 HBase Master - 20 HBase Slave 2. App server per IDC - 10 server for write-only - 20 server for query only 3. Real traffic - read: over 10K request per second - now mostly 2 step queries with limit 100 on first step. - write: over 5k request per second * Deep traversal queries are not counted since it is in test stage for production 45
46 46
47 Through HBase! 47
48 Now Available As an Open Source Finding a mentor Contact - Taejin Chin : taejin.chin@gmail.com - Doyoung Yoon : shom83@gmail.com 48
49 Appendix 3.5x performance improvement using Asynchbase Native Client QPS Native Client Latency(ms) Asnychbase QPS Asynchbase Latency(ms) 2, ,000 1, , ,500 1, ,192 QPS 1, Latency QPS 1, Latency # of app server # of app server - Benchmark Query : src.out( friend ).limit(50).out( friend ).limit(10) - Test seed edges have adjacent edges more than 100: 30millions - Total concurrency: 20 * # of app server 49
S2Graph : A large-scale graph database with Hbase
daumkakao S2Graph : A large-scale graph database with Hbase Reference 1. HBase Conference 2015 1.http://www.slideshare.net/HBaseCon/use-cases-session-5 2.https://vimeo.com/128203919 2. Deview 2015 3. Apache
More informationEfficient and Scalable Friend Recommendations
Efficient and Scalable Friend Recommendations Comparing Traditional and Graph-Processing Approaches Nicholas Tietz Software Engineer at GraphSQL nicholas@graphsql.com January 13, 2014 1 Introduction 2
More informationCISC 7610 Lecture 2b The beginnings of NoSQL
CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone
More informationMobile Life Platform. November 2014 Investor Relations
Mobile Life Platform November 2014 Investor Relations Contents Overview About Daum Kakao Merger Summary Our Vision Business Review Service Portfolio KakaoTalk Advertising Game Commerce Contents Payments
More informationData Informatics. Seon Ho Kim, Ph.D.
Data Informatics Seon Ho Kim, Ph.D. seonkim@usc.edu HBase HBase is.. A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage. Designed to operate
More information4 th Quarter and FY2014 Results Investor Relations
4 th Quarter and FY2014 Results 2015. 2. 12 Investor Relations Disclaimer Financial information contained in this document is based on consolidated K-IFRS that have not been audited by an independent auditor:
More informationRails on HBase. Zachary Pinter and Tony Hillerson RailsConf 2011
Rails on HBase Zachary Pinter and Tony Hillerson RailsConf 2011 What we will cover What is it? What are the tradeoffs that HBase makes? Why HBase is probably the wrong choice for your app Why HBase might
More informationMobile Life Platform. February 2015 Investor Relations
Mobile Life Platform February 2015 Investor Relations Contents Overview About Daum Kakao History Our Vision Business Review Service Portfolio KakaoTalk Advertising Game Commerce Contents New Initiatives
More informationCOSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables
COSC 6339 Big Data Analytics NoSQL (II) HBase Edgar Gabriel Fall 2018 HBase Column-Oriented data store Distributed designed to serve large tables Billions of rows and millions of columns Runs on a cluster
More informationData Infrastructure at LinkedIn. Shirshanka Das XLDB 2011
Data Infrastructure at LinkedIn Shirshanka Das XLDB 2011 1 Me UCLA Ph.D. 2005 (Distributed protocols in content delivery networks) PayPal (Web frameworks and Session Stores) Yahoo! (Serving Infrastructure,
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 10: Mutable State (1/2) March 15, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data
More informationApache Hadoop Goes Realtime at Facebook. Himanshu Sharma
Apache Hadoop Goes Realtime at Facebook Guide - Dr. Sunny S. Chung Presented By- Anand K Singh Himanshu Sharma Index Problem with Current Stack Apache Hadoop and Hbase Zookeeper Applications of HBase at
More informationBigTable: A Distributed Storage System for Structured Data
BigTable: A Distributed Storage System for Structured Data Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) BigTable 1393/7/26
More information10 Million Smart Meter Data with Apache HBase
10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on
More informationDistributed Systems 16. Distributed File Systems II
Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS
More informationHBase Solutions at Facebook
HBase Solutions at Facebook Nicolas Spiegelberg Software Engineer, Facebook QCon Hangzhou, October 28 th, 2012 Outline HBase Overview Single Tenant: Messages Selection Criteria Multi-tenant Solutions
More informationUnlocking the Power of Mobile (Global vs Local) 2 nd Aug, 2010 UM Digital Media Team
Unlocking the Power of Mobile (Global vs Local) 2 nd Aug, 2010 UM Digital Media Team Mobile adoption is happening in big ways but there s no roadmap for brands and consumers. ipad crushes Obsession Did
More informationArchitecture of a Real-Time Operational DBMS
Architecture of a Real-Time Operational DBMS Srini V. Srinivasan Founder, Chief Development Officer Aerospike CMG India Keynote Thane December 3, 2016 [ CMGI Keynote, Thane, India. 2016 Aerospike Inc.
More informationComparing SQL and NOSQL databases
COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2014 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations
More informationDistributed Systems. 05r. Case study: Google Cluster Architecture. Paul Krzyzanowski. Rutgers University. Fall 2016
Distributed Systems 05r. Case study: Google Cluster Architecture Paul Krzyzanowski Rutgers University Fall 2016 1 A note about relevancy This describes the Google search cluster architecture in the mid
More informationCIB Session 12th NoSQL Databases Structures
CIB Session 12th NoSQL Databases Structures By: Shahab Safaee & Morteza Zahedi Software Engineering PhD Email: safaee.shx@gmail.com, morteza.zahedi.a@gmail.com cibtrc.ir cibtrc cibtrc 2 Agenda What is
More informationJargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems
Jargons, Concepts, Scope and Systems Key Value Stores, Document Stores, Extensible Record Stores Overview of different scalable relational systems Examples of different Data stores Predictions, Comparisons
More informationHBase. Леонид Налчаджи
HBase Леонид Налчаджи leonid.nalchadzhi@gmail.com HBase Overview Table layout Architecture Client API Key design 2 Overview 3 Overview NoSQL Column oriented Versioned 4 Overview All rows ordered by row
More informationIntroduction to Database Services
Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 10: Mutable State (1/2) March 14, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These
More informationIntroduction to NoSQL Databases
Introduction to NoSQL Databases Roman Kern KTI, TU Graz 2017-10-16 Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 1 / 31 Introduction Intro Why NoSQL? Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 2 / 31 Introduction
More informationMDHIM: A Parallel Key/Value Store Framework for HPC
MDHIM: A Parallel Key/Value Store Framework for HPC Hugh Greenberg 7/6/2015 LA-UR-15-25039 HPC Clusters Managed by a job scheduler (e.g., Slurm, Moab) Designed for running user jobs Difficult to run system
More informationSCALING COUCHDB WITH BIGCOUCH. Adam Kocoloski Cloudant
SCALING COUCHDB WITH BIGCOUCH Adam Kocoloski Cloudant Erlang Factory SF Bay Area 2011 OUTLINE Introductions Brief intro to CouchDB BigCouch Usage Overview BigCouch Internals Reports from the Trenches 2
More informationBigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis
BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis Motivation Lots of (semi-)structured data at Google URLs: Contents, crawl metadata, links, anchors, pagerank,
More informationNoSQL Databases Analysis
NoSQL Databases Analysis Jeffrey Young Intro I chose to investigate Redis, MongoDB, and Neo4j. I chose Redis because I always read about Redis use and its extreme popularity yet I know little about it.
More informationBigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13
Bigtable A Distributed Storage System for Structured Data Presenter: Yunming Zhang Conglong Li References SOCC 2010 Key Note Slides Jeff Dean Google Introduction to Distributed Computing, Winter 2008 University
More informationSpotify. Scaling storage to million of users world wide. Jimmy Mårdell October 14, 2014
Cassandra @ Spotify Scaling storage to million of users world wide! Jimmy Mårdell October 14, 2014 2 About me Jimmy Mårdell Tech Product Owner in the Cassandra team 4 years at Spotify
More informationDigital Media Market Profile: South Korea.
Digital Media Market Profile: South Korea. Digital Media Market Profile: South Korea. 01 Introduction South Korea has been among the leading nations for years in terms of internet advancements and broadband
More informationEfficiency. Efficiency: Indexing. Indexing. Efficiency Techniques. Inverted Index. Inverted Index (COSC 488)
Efficiency Efficiency: Indexing (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Difficult to analyze sequential IR algorithms: data and query dependency (query selectivity). O(q(cf max )) -- high estimate-
More informationADVANCED HBASE. Architecture and Schema Design GeeCON, May Lars George Director EMEA Services
ADVANCED HBASE Architecture and Schema Design GeeCON, May 2013 Lars George Director EMEA Services About Me Director EMEA Services @ Cloudera Consulting on Hadoop projects (everywhere) Apache Committer
More informationA Glimpse of the Hadoop Echosystem
A Glimpse of the Hadoop Echosystem 1 Hadoop Echosystem A cluster is shared among several users in an organization Different services HDFS and MapReduce provide the lower layers of the infrastructures Other
More informationHadoop. copyright 2011 Trainologic LTD
Hadoop Hadoop is a framework for processing large amounts of data in a distributed manner. It can scale up to thousands of machines. It provides high-availability. Provides map-reduce functionality. Hides
More informationBIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,
BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 1 OBJECTIVES ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 2 WHAT
More informationCloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018
Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster
More informationStorageTapper. Real-time MySQL Change Data Uber. Ovais Tariq, Shriniket Kale & Yevgeniy Firsov. October 03, 2017
StorageTapper Real-time MySQL Change Data Streaming @ Uber Ovais Tariq, Shriniket Kale & Yevgeniy Firsov October 03, 2017 Overview What we will cover today Background & Motivation High Level Features System
More informationCS November 2018
Bigtable Highly available distributed storage Distributed Systems 19. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account
More informationHBASE MOCK TEST HBASE MOCK TEST III
http://www.tutorialspoint.com HBASE MOCK TEST Copyright tutorialspoint.com This section presents you various set of Mock Tests related to HBase. You can download these sample mock tests at your local machine
More informationUsing space-filling curves for multidimensional
Using space-filling curves for multidimensional indexing Dr. Bisztray Dénes Senior Research Engineer 1 Nokia Solutions and Networks 2014 In medias res Performance problems with RDBMS Switch to NoSQL store
More informationCS November 2017
Bigtable Highly available distributed storage Distributed Systems 18. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account
More informationNew Oracle NoSQL Database APIs that Speed Insertion and Retrieval
New Oracle NoSQL Database APIs that Speed Insertion and Retrieval O R A C L E W H I T E P A P E R F E B R U A R Y 2 0 1 6 1 NEW ORACLE NoSQL DATABASE APIs that SPEED INSERTION AND RETRIEVAL Introduction
More informationCSE 124: Networked Services Lecture-17
Fall 2010 CSE 124: Networked Services Lecture-17 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/30/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments
More informationExtreme Computing. NoSQL.
Extreme Computing NoSQL PREVIOUSLY: BATCH Query most/all data Results Eventually NOW: ON DEMAND Single Data Points Latency Matters One problem, three ideas We want to keep track of mutable state in a scalable
More informationADVANCED DATABASES CIS 6930 Dr. Markus Schneider. Group 5 Ajantha Ramineni, Sahil Tiwari, Rishabh Jain, Shivang Gupta
ADVANCED DATABASES CIS 6930 Dr. Markus Schneider Group 5 Ajantha Ramineni, Sahil Tiwari, Rishabh Jain, Shivang Gupta WHAT IS ELASTIC SEARCH? Elastic Search Elasticsearch is a search engine based on Lucene.
More informationHBASE INTERVIEW QUESTIONS
HBASE INTERVIEW QUESTIONS http://www.tutorialspoint.com/hbase/hbase_interview_questions.htm Copyright tutorialspoint.com Dear readers, these HBase Interview Questions have been designed specially to get
More informationHow Tencent uses PGXZ(PGXC) in WeChat payment system. jason( 李跃森 )
How Tencent uses PGXZ(PGXC) in WeChat payment system jason( 李跃森 ) jasonysli@tencent.com About Tencent and Wechat Payment Tencent, one of the biggest internet companies in China. Wechat, the most popular
More informationHypertable: The Storage Infrastructure behind Rediffmail - one of the World s Largest Services. Introduction. Current Architecture
Hypertable: The Storage Infrastructure behind Rediffmail - one of the World s Largest Email Services Doug Judd CEO, Hypertable Inc. Introduction Rediff.com India (Nasdaq: REDF) is one of India's top Internet
More informationVoldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation
Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data
More informationIntroduction to Column Oriented Databases in PHP
Introduction to Column Oriented Databases in PHP By Slavey Karadzhov About Me Name: Slavey Karadzhov (Славей Караджов) Programmer since my early days PHP programmer since 1999
More informationA Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers
A Distributed System Case Study: Apache Kafka High throughput messaging for diverse consumers As always, this is not a tutorial Some of the concepts may no longer be part of the current system or implemented
More informationPutting it together. Data-Parallel Computation. Ex: Word count using partial aggregation. Big Data Processing. COS 418: Distributed Systems Lecture 21
Big Processing -Parallel Computation COS 418: Distributed Systems Lecture 21 Michael Freedman 2 Ex: Word count using partial aggregation Putting it together 1. Compute word counts from individual files
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2015 Lecture 14 NoSQL References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No.
More information#IoT #BigData. 10/31/14
#IoT #BigData Seema Jethani @seemaj @basho 1 10/31/14 Why should we care? 2 11/2/14 Source: http://en.wikipedia.org/wiki/internet_of_things Motivation for Specialized Big Data Systems Rate of data capture
More information10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414
Announcements Database Systems CSE 414 Lecture 11: NoSQL & JSON (mostly not in textbook only Ch 11.1) HW5 will be posted on Friday and due on Nov. 14, 11pm [No Web Quiz 5] Today s lecture: NoSQL & JSON
More informationBox: Using HBase as a message queue. David MacKenzie Staff So2ware Engineer
/events @ Box: Using HBase as a message queue David MacKenzie Staff So2ware Engineer Share, manage and access your content from any device, anywhere 2 What is the /events API? RealOme stream of all acovity
More informationMapR Enterprise Hadoop
2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS
More informationFattane Zarrinkalam کارگاه ساالنه آزمایشگاه فناوری وب
Fattane Zarrinkalam کارگاه ساالنه آزمایشگاه فناوری وب 1391 زمستان Outlines Introduction DataModel Architecture HBase vs. RDBMS HBase users 2 Why Hadoop? Datasets are growing to Petabytes Traditional datasets
More informationDistributed Filesystem
Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the
More informationTrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa
TrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa EPL646: Advanced Topics in Databases Christos Hadjistyllis
More informationDevelopment And Management
Development And Management Native and Instant Applications (PWA) Functionality List for 2017/2018 Around Us The Around Us feature is a great way to display relevant points of interest within your app,
More informationHBase... And Lewis Carroll! Twi:er,
HBase... And Lewis Carroll! jw4ean@cloudera.com Twi:er, LinkedIn: @jw4ean 1 Introduc@on 2010: Cloudera Solu@ons Architect 2011: Cloudera TAM/DSE 2012-2013: Cloudera Training focusing on Partners and Newbies
More informationHBase Practice At Xiaomi.
HBase Practice At Xiaomi huzheng@xiaomi.com About This Talk Async HBase Client Why Async HBase Client Implementation Performance How do we tuning G1GC for HBase CMS vs G1 Tuning G1GC G1GC in XiaoMi HBase
More informationMySQL Group Replication. Bogdan Kecman MySQL Principal Technical Engineer
MySQL Group Replication Bogdan Kecman MySQL Principal Technical Engineer Bogdan.Kecman@oracle.com 1 Safe Harbor Statement The following is intended to outline our general product direction. It is intended
More informationPage 1. Key Value Storage" System Examples" Key Values: Examples "
CS162 Operating Systems and Systems Programming Lecture 15 Key-Value Storage, Network Protocols" October 22, 2012! Ion Stoica! http://inst.eecs.berkeley.edu/~cs162! Key Value Storage" Interface! put(key,
More informationConceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.
Conceptual Modeling on Tencent s Distributed Database Systems Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc. Outline Introduction System overview of TDSQL Conceptual Modeling on TDSQL Applications Conclusion
More informationDocument stores using CouchDB
2018 Document stores using CouchDB ADVANCED DATABASE PROJECT APARNA KHIRE, MINGRUI DONG aparna.khire@vub.be, mingdong@ulb.ac.be 1 Table of Contents 1. Introduction... 3 2. Background... 3 2.1 NoSQL Database...
More informationDistributed computing: index building and use
Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput
More informationDeep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services
Deep Dive Amazon Kinesis Ian Meyers, Principal Solution Architect - Amazon Web Services Analytics Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
More informationColumn-Family Databases Cassandra and HBase
Column-Family Databases Cassandra and HBase Kevin Swingler Google Big Table Google invented BigTableto store the massive amounts of semi-structured data it was generating Basic model stores items indexed
More informationHadoop/MapReduce Computing Paradigm
Hadoop/Reduce Computing Paradigm 1 Large-Scale Data Analytics Reduce computing paradigm (E.g., Hadoop) vs. Traditional database systems vs. Database Many enterprises are turning to Hadoop Especially applications
More informationA Non-Relational Storage Analysis
A Non-Relational Storage Analysis Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Cloud Computing - 2nd semester 2012/2013 Universitat Politècnica de Catalunya Microblogging - big data?
More informationDistributed File Systems II
Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation
More informationCISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL
CISC 7610 Lecture 5 Distributed multimedia databases Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL Motivation YouTube receives 400 hours of video per minute That is 200M hours
More informationPractical Big Data Processing An Overview of Apache Flink
Practical Big Data Processing An Overview of Apache Flink Tilmann Rabl Berlin Big Data Center www.dima.tu-berlin.de bbdc.berlin rabl@tu-berlin.de With slides from Volker Markl and data artisans 1 2013
More information5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414
Announcements Database Systems CSE 414 Lecture 15: NoSQL & JSON (mostly not in textbook only Ch 11.1) 1 Homework 4 due tomorrow night [No Web Quiz 5] Midterm grading hopefully finished tonight post online
More informationCassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent
Tanton Jeppson CS 401R Lab 3 Cassandra, MongoDB, and HBase Introduction For my report I have chosen to take a deeper look at 3 NoSQL database systems: Cassandra, MongoDB, and HBase. I have chosen these
More informationLecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013
Lecture 24: Image Retrieval: Part II Visual Computing Systems Review: K-D tree Spatial partitioning hierarchy K = dimensionality of space (below: K = 2) 3 2 1 3 3 4 2 Counts of points in leaf nodes Nearest
More informationMySQL & NoSQL: The Best of Both Worlds
MySQL & NoSQL: The Best of Both Worlds Mario Beck Principal Sales Consultant MySQL mario.beck@oracle.com 1 Copyright 2012, Oracle and/or its affiliates. All rights Safe Harbour Statement The following
More informationPage 1. Key Value Storage" System Examples" Key Values: Examples "
Key Value Storage" CS162 Operating Systems and Systems Programming Lecture 15 Key-Value Storage, Network Protocols" March 20, 2013 Anthony D. Joseph http://inst.eecs.berkeley.edu/~cs162 Interface put(key,
More informationOne Trillion Edges. Graph processing at Facebook scale
One Trillion Edges Graph processing at Facebook scale Introduction Platform improvements Compute model extensions Experimental results Operational experience How Facebook improved Apache Giraph Facebook's
More informationFacebook Tao Distributed Data Store for the Social Graph
L. Lancia, G. Salillari Cloud Computing Master Degree in Data Science Sapienza Università di Roma Facebook Tao Distributed Data Store for the Social Graph L. Lancia & G. Salillari 1 / 40 Table of Contents
More informationApsaraDB for Redis. Product Introduction
ApsaraDB for Redis is compatible with open-source Redis protocol standards and provides persistent memory database services. Based on its high-reliability dual-machine hot standby architecture and seamlessly
More informationStoring data in databases
Storing data in databases The webinar will begin at 3pm You now have a menu in the top right corner of your screen. The red button with a white arrow allows you to expand and contract the webinar menu,
More informationGoogle Cloud Bigtable. And what it's awesome at
Google Cloud Bigtable And what it's awesome at Jen Tong Developer Advocate Google Cloud Platform @MimmingCodes Agenda 1 Research 2 A story about bigness 3 How it works 4 When it's awesome Google Research
More informationSearch Engines. Information Retrieval in Practice
Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Web Crawler Finds and downloads web pages automatically provides the collection for searching Web is huge and constantly
More informationHDFS: Hadoop Distributed File System. Sector: Distributed Storage System
GFS: Google File System Google C/C++ HDFS: Hadoop Distributed File System Yahoo Java, Open Source Sector: Distributed Storage System University of Illinois at Chicago C++, Open Source 2 System that permanently
More informationBespoKV: Application Tailored Scale-Out Key-Value Stores
BespoKV: Application Tailored Scale-Out Key-Value Stores Ali Anwar, Yue Cheng, Hai Huang, Jingoo Han, Hyogi Sim, Dongyoon Lee, Fred Douglis, and Ali R. Butt BespoKV Role of Distributed KV stores in HPC
More informationHYRISE In-Memory Storage Engine
HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University
More informationNoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014
NoSQL Databases Amir H. Payberah Swedish Institute of Computer Science amir@sics.se April 10, 2014 Amir H. Payberah (SICS) NoSQL Databases April 10, 2014 1 / 67 Database and Database Management System
More informationScalable Search Engine Solution
Scalable Search Engine Solution A Case Study of BBS Yifu Huang School of Computer Science, Fudan University huangyifu@fudan.edu.cn COMP620028 Information Retrieval Project, 2013 Yifu Huang (FDU CS) COMP620028
More informationThe Internet and World Wide Web. Chapter4
The Internet and World Wide Web Chapter4 ITBIS105 IS-IT-UOB 2016 The Internet What is the Internet? Worldwide collection of millions of computers networks that connects ITBIS105 IS-IT-UOB 2016 2 History
More informationOnline Marketing Strategies to Increase Sales & Guest Loyalty. Monday, March 8, 2010
Online Marketing Strategies to Increase Sales & Guest Loyalty Monday, March 8, 2010 Fishbowl Cocktail Facts 10 years old, 130+ employees Alexandria, VA based Product Suite includes Email Marketing, Social
More informationSME Developing and managing your online presence. Presented by: Rasheed Girvan Global Directories
SME Developing and managing your online presence Presented by: Rasheed Girvan Global Directories DIGITAL MEDIA What is Digital Media Any media type in an electronic or digital format for the convenience
More informationMap Reduce. Yerevan.
Map Reduce Erasmus+ @ Yerevan dacosta@irit.fr Divide and conquer at PaaS 100 % // Typical problem Iterate over a large number of records Extract something of interest from each Shuffle and sort intermediate
More informationWHY EVERY ONLINE ECOMMERCE STORE NEEDS MOBILE APP?
WHY EVERY ONLINE ECOMMERCE STORE NEEDS MOBILE APP? ABSTRACT Mobile Apps Replacing Ecommerce Websites. Statistics Sales Forecast. Statistics Customer Engagement. What Google Has To Say About Mobile App?
More information