Consistency-preserving Caching of Dynamic Database Content

Size: px
Start display at page:

Download "Consistency-preserving Caching of Dynamic Database Content"

Transcription

1 Consistency-preserving Caching of Dynamic Database Content Niraj Tolia M. Satyanarayanan Carnegie Mellon University

2 Motivation Database Server Web and App. Servers Easy to geographically distribute web and app. servers Harder to distribute databases Pick Two: Consistency, Availability, Tolerance to Partitions How can you optimize the use of the WAN? 2

3 Outline Motivation Ganesh Overview Design and Implementation Evaluation Conclusion 3

4 High-level Overview of Ganesh Optimizes WAN usage for database accesses Without interpreting queries or results In a database-independent manner Uses a Content Addressable cache to Detect similarity between query results Eliminate redundancy over WAN link Tradeoff increased computation for network savings 4

5 Finding Result Similarity (A 10,000 foot view) SELECT queries 5

6 Finding Result Similarity (A 10,000 foot view) Query 1 SELECT queries 6

7 Finding Result Similarity (A 10,000 foot view) Query 1 SELECT queries 7

8 Finding Result Similarity (A 10,000 foot view) SELECT queries 8

9 Finding Result Similarity (A 10,000 foot view) SELECT queries 9

10 Finding Result Similarity (A 10,000 foot view) SELECT queries 10

11 Finding Result Similarity (A 10,000 foot view) SELECT queries 11

12 Finding Result Similarity (A 10,000 foot view) Query 2 SELECT queries 12

13 Finding Result Similarity (A 10,000 foot view) Query 2 SELECT queries 13

14 Finding Result Similarity (A 10,000 foot view) Data ID SELECT queries 14

15 Finding Result Similarity (A 10,000 foot view) Data ID SELECT queries 15

16 Finding Result Similarity (A 10,000 foot view) SELECT queries 16

17 Finding Result Similarity (A 10,000 foot view) SELECT queries 17

18 Finding Result Similarity (A 10,000 foot view) SELECT queries UPDATE queries are not optimized 18

19 Design Goals Transparent To both the application server and the database Does not weaken consistency Efficiently detects similarity 19

20 Design - Transparency Doesn t require modifications to the application and database server 20

21 Design - Transparency Doesn t require modifications to the application and database server Web and Application Server JDBC Driver DB 21

22 Design - Transparency Doesn t require modifications to the application and database server Web and Application Server JDBC Driver Ganesh Proxy DB 22

23 Design - Transparency Doesn t require modifications to the application and database server Web and Application Server Ganesh Proxy JDBC Driver DB 23

24 Design - Transparency Doesn t require modifications to the application and database server Web and Application Server Ganesh JDBC JDBC Driver Driver Ganesh Proxy JDBC Driver DB 24

25 Proxy-based Architecture Ganesh JDBC Driver Thin but smart -- conservative network use Contains in-memory cache Caches previous results at different granularities Uses a LRU replacement policy Ganesh Proxy Forwards queries to the database Eliminates redundancy in results 25

26 Detecting Similarity Ganesh uses Content Addressability Result Result Hash Cryptographic Hash Hash value is a globally unique identifier Independent of any particular system Infeasible to find another object with same hash If hash values are equal, so are the source objects Any small change in source completely changes hash 26

27 Exploiting Structure Exploit structure in results they look like tables SELECT name, address, zip, FROM USERS Name Address Zip John Doe 412 Avenue Mary Major 821 Lane John Stiles 701 Street T1L1J4 27

28 Exploiting Structure Exploit structure in results they look like tables SELECT name, address, zip, FROM USERS Name Address Zip John Doe 412 Avenue Mary Major 821 Lane John Stiles 701 Street T1L1J4 Result Hash 28

29 Exploiting Structure Exploit structure in results they look like tables SELECT name, address, zip, FROM USERS Name Address Zip John Doe 412 Avenue Mary Major 821 Lane John Stiles 701 Street T1L1J4 Result Hash 29

30 Exploiting Structure Exploit structure in results they look like tables SELECT name, address, zip, FROM USERS Name Address Zip John Doe 412 Avenue Row Hash 1 Mary Major 821 Lane mm@eg.com John Stiles 701 Street T1L1J4 js@eg.com Result Hash 30

31 Exploiting Structure Exploit structure in results they look like tables SELECT name, address, zip, FROM USERS Name Address Zip John Doe 412 Avenue Row Hash 1 Mary Major John Stiles 821 Lane 701 Street T1L1J4 mm@eg.com js@eg.com Row Hash 2 Result Hash 31

32 Exploiting Structure Exploit structure in results they look like tables SELECT name, address, zip, FROM USERS Name Address Zip John Doe 412 Avenue Row Hash 1 Mary Major John Stiles 821 Lane 701 Street T1L1J4 mm@eg.com js@eg.com Row Hash 2 Row Hash 3 Result Hash 32

33 Exploiting Structure Exploit structure in results they look like tables SELECT name, address, zip, FROM USERS Name Address Zip John Doe 412 Avenue Row Hash 1 Mary Major John Stiles 821 Lane 701 Street T1L1J4 mm@eg.com js@eg.com Row Hash 2 Row Hash 3 This process does not interpret data Result Hash 33

34 Putting it all together 34

35 Putting it all together Query 1 35

36 Putting it all together Query 1 36

37 Putting it all together 0x8c3aea3b Query 1 37

38 Putting it all together 0x8c3aea3b Query 1 0x8c3aea3b 38

39 Putting it all together 0x8c3aea3b 39

40 Putting it all together 0x8c3aea3b 40

41 Putting it all together 0x8c3aea3b 41

42 Putting it all together 0xda39a3ee 0x8c3aea3b 42

43 Putting it all together 0xda39a3ee 0x8c3aea3b 43

44 Putting it all together Query 2 0xda39a3ee 0x8c3aea3b 44

45 Putting it all together Query 2 0xda39a3ee 0x8c3aea3b 45

46 Putting it all together 0x8c3aea3b Query 2 0xda39a3ee 0x8c3aea3b 46

47 Putting it all together 0x8c3aea3b Query 2 0xda39a3ee 0x8c3aea3b 47

48 Putting it all together 0x8c3aea3b Query 2 0xda39a3ee 0x8c3aea3b 48

49 Putting it all together 0x8c3aea3b 0xda39a3ee 0x8c3aea3b 49

50 Putting it all together 0x8c3aea3b 0xda39a3ee 0x8c3aea3b 50

51 Putting it all together 0x8c3aea3b 0xda39a3ee 0x8c3aea3b 51

52 Putting it all together 0x8c3aea3b 0xda39a3ee 0x8c3aea3b 52

53 Putting it all together 0x8c3aea3b 0xda39a3ee 0x8c3aea3b 0x8c3aea3b 53

54 Putting it all together 0x8c3aea3b 0xda39a3ee 0x8c3aea3b 0x8c3aea3b No explicit cache-coherence algorithm needed 54

55 Outline Motivation Ganesh Overview Design and Implementation Evaluation Benchmarks and Experimental Setup Results Did exploiting structure help? Conclusion 55

56 Evaluation Benchmarks AUCTION (RUBiS) Models ebay Browsing, bidding, add auctions and feedback, BBOARD (RUBBoS) Models Slashdot Reading articles, adding comments, moderating, Benchmarks have RW and RO variants Performance metrics Throughput (Requests/sec) Latency (Average request response time) 56

57 Experimental Setup Clients Two configurations: Native: Unmodified Setup 57

58 Experimental Setup Clients Ganesh Proxy Two configurations: Native: Unmodified Setup Ganesh: Content-based optimizations used 58

59 Experimental Setup Clients Ganesh Proxy Two configurations: Native: Unmodified Setup Ganesh: Content-based optimizations used Evaluation Parameters 5 Mbit/s + 66ms, 20 Mbit/s + 33ms, 100 Mbit/s 400, 800, 1200, and 1600 clients 59

60 AUCTION Throughput Requests / sec Native Ganesh Higher is better Read-Only Mbit/s 20 Mbit/s 100 Mbit/s Test Clients 60

61 AUCTION - Latency Read-Only 10 1 Native Ganesh Lower is better Mbit/s 20 Mbit/s 100 Mbit/s Test Clients Avg. Resp. Time (sec)

62 BBOARD - Throughput Read-Write Requests / sec Native Ganesh Higher is better Mbit/s 20 Mbit/s 100 Mbit/s Test Clients 62

63 BBOARD - Latency Read-Write Native Ganesh Lower is better Mbit/s 20 Mbit/s 100 Mbit/s Test Clients Avg. Resp. Time (sec)

64 Rabin Fingerprinting vs. Structure Could we have used Rabin fingerprinting? Extensive use in storage systems Chunks data using a stochastic process ` Works well for in-place updates, deletes, insertions Does not work well for query results Reordering of data (ORDER BY) Also hard to pick average chunk size 64

65 Rabin vs. Exploiting Structure BBOARD - Read-Only Mb/s 20 Mb/s 100 Mb/s Test Clients Norm. Throughput

66 Related Work Caching Dynamic Database Content DBCache, DBProxy, MTCache, Per-application consistency model [Gao03, GlobeDB] Backend-scalability [C-JDBC, SSS, Ganeymed] Content-Addressable Systems TCP-level duplicate elimination [Riverbed, Spring00] P2P backup [Pastiche], Storage [Venti, SiS] Distributed File Systems [LBFS, CASPER, PAST] 66

67 Conclusion Ganesh: Optimizes database access over the WAN Transparent Does not weaken consistency Prototype built around Java and the JDBC interface 67

No-Compromise Caching of Dynamic Content from Relational Databases

No-Compromise Caching of Dynamic Content from Relational Databases No-Compromise Caching of Dynamic Content from Relational Databases Niraj Tolia and M. Satyanarayanan August 2006 CMU-CS-06-146 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213

More information

Consistency-preserving Caching of Dynamic Database Content

Consistency-preserving Caching of Dynamic Database Content Consistency-preserving Caching of Dynamic Database Content Niraj Tolia Carnegie Mellon University ntolia@cmu.edu M. Satyanarayanan Carnegie Mellon University satya@cs.cmu.edu ABSTRACT With the growing

More information

Opportunistic Use of Content Addressable Storage for Distributed File Systems

Opportunistic Use of Content Addressable Storage for Distributed File Systems Opportunistic Use of Content Addressable Storage for Distributed File Systems Niraj Tolia *, Michael Kozuch, M. Satyanarayanan *, Brad Karp, Thomas Bressoud, and Adrian Perrig * * Carnegie Mellon University,

More information

status Emmanuel Cecchet

status Emmanuel Cecchet status Emmanuel Cecchet c-jdbc@objectweb.org JOnAS developer workshop http://www.objectweb.org - c-jdbc@objectweb.org 1-23/02/2004 Outline Overview Advanced concepts Query caching Horizontal scalability

More information

Transactional Consistency and Automatic Management in an Application Data Cache Dan R. K. Ports MIT CSAIL

Transactional Consistency and Automatic Management in an Application Data Cache Dan R. K. Ports MIT CSAIL Transactional Consistency and Automatic Management in an Application Data Cache Dan R. K. Ports MIT CSAIL joint work with Austin Clements Irene Zhang Samuel Madden Barbara Liskov Applications are increasingly

More information

Improving Bandwidth Efficiency of Peer-to-Peer Storage

Improving Bandwidth Efficiency of Peer-to-Peer Storage Improving Bandwidth Efficiency of Peer-to-Peer Storage Patrick Eaton, Emil Ong, John Kubiatowicz University of California, Berkeley http://oceanstore.cs.berkeley.edu/ P2P Storage: Promise vs.. Reality

More information

A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU

A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU PRESENTED BY ROMAN SHOR Overview Technics of data reduction in storage systems:

More information

Migratory TCP (MTCP) Transport Layer Support for Highly Available Network Services

Migratory TCP (MTCP) Transport Layer Support for Highly Available Network Services Migratory TCP (MTCP) Transport Layer Support for Highly Available Network Services DisCo Lab Division of Computer and Information Sciences Rutgers University Nov. 29, 2001 CONS Light seminar 1 The People

More information

S. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda. The Ohio State University

S. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda. The Ohio State University Architecture for Caching Responses with Multiple Dynamic Dependencies in Multi-Tier Data- Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda The Ohio State University

More information

A native InfiniBand Transporter for MySQL Cluster

A native InfiniBand Transporter for MySQL Cluster A native for, Dirk Dunger, Torsten Mehlan, Torsten Höfler and Wolfgang Rehm Computer Architecture Group Department of Computer Science Chemnitz University of Technology February 8, 2007 Outline 1 2 3 4

More information

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 11: NoSQL & JSON (mostly not in textbook only Ch 11.1) HW5 will be posted on Friday and due on Nov. 14, 11pm [No Web Quiz 5] Today s lecture: NoSQL & JSON

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture X: Parallel Databases Topics Motivation and Goals Architectures Data placement Query processing Load balancing

More information

GlobeTP: Template-Based Database Replication for Scalable. Web Applications

GlobeTP: Template-Based Database Replication for Scalable. Web Applications GlobeTP: Template-Based Database Replication for Scalable Page 1 of 18 Web Applications Tobias Groothuyse, Swaminathan Sivasubramanian, and Guillaume Pierre. In procedings of WWW 2007, May 8-12, 2007,

More information

<Insert Picture Here> MySQL Cluster What are we working on

<Insert Picture Here> MySQL Cluster What are we working on MySQL Cluster What are we working on Mario Beck Principal Consultant The following is intended to outline our general product direction. It is intended for information purposes only,

More information

5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 15: NoSQL & JSON (mostly not in textbook only Ch 11.1) 1 Homework 4 due tomorrow night [No Web Quiz 5] Midterm grading hopefully finished tonight post online

More information

Crossing the Chasm: Sneaking a parallel file system into Hadoop

Crossing the Chasm: Sneaking a parallel file system into Hadoop Crossing the Chasm: Sneaking a parallel file system into Hadoop Wittawat Tantisiriroj Swapnil Patil, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University In this work Compare and contrast large

More information

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr

More information

Database Systems CSE 414

Database Systems CSE 414 Database Systems CSE 414 Lecture 16: NoSQL and JSon CSE 414 - Spring 2016 1 Announcements Current assignments: Homework 4 due tonight Web Quiz 6 due next Wednesday [There is no Web Quiz 5] Today s lecture:

More information

Opportunistic Use of Content Addressable Storage for Distributed File Systems

Opportunistic Use of Content Addressable Storage for Distributed File Systems Opportunistic Use of Content Addressable Storage for Distributed File Systems Niraj Tolia, Michael Kozuch, Mahadev Satyanarayanan, Brad Karp, Thomas Bressoud, and Adrian Perrig Intel Research Pittsburgh,

More information

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 16: NoSQL and JSon Current assignments: Homework 4 due tonight Web Quiz 6 due next Wednesday [There is no Web Quiz 5 Today s lecture: JSon The book covers

More information

NSF Future Internet Architecture. Outline. Predicting the Future is Hard! The expressive Internet Architecture: from Architecture to Network

NSF Future Internet Architecture. Outline. Predicting the Future is Hard! The expressive Internet Architecture: from Architecture to Network The expressive Internet Architecture: from Architecture to Network Peter Steenkiste Dave Andersen, David Eckhardt, Sara Kiesler, Jon Peha, Adrian Perrig, Srini Seshan, Marvin Sirbu, Hui Zhang Carnegie

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 CS 347 Notes 12 5 Web Search Engine Crawling

More information

Design Tradeoffs for Data Deduplication Performance in Backup Workloads

Design Tradeoffs for Data Deduplication Performance in Backup Workloads Design Tradeoffs for Data Deduplication Performance in Backup Workloads Min Fu,DanFeng,YuHua,XubinHe, Zuoning Chen *, Wen Xia,YuchengZhang,YujuanTan Huazhong University of Science and Technology Virginia

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 Web Search Engine Crawling Indexing Computing

More information

Bridging the Processor/Memory Performance Gap in Database Applications

Bridging the Processor/Memory Performance Gap in Database Applications Bridging the Processor/Memory Performance Gap in Database Applications Anastassia Ailamaki Carnegie Mellon http://www.cs.cmu.edu/~natassa Memory Hierarchies PROCESSOR EXECUTION PIPELINE L1 I-CACHE L1 D-CACHE

More information

Prophecy: Using History for High Throughput Fault Tolerance

Prophecy: Using History for High Throughput Fault Tolerance Prophecy: Using History for High Throughput Fault Tolerance Siddhartha Sen Joint work with Wyatt Lloyd and Mike Freedman Princeton University Non crash failures happen Non crash failures happen Model as

More information

Class Overview. Two Classes of Database Applications. NoSQL Motivation. RDBMS Review: Client-Server. RDBMS Review: Serverless

Class Overview. Two Classes of Database Applications. NoSQL Motivation. RDBMS Review: Client-Server. RDBMS Review: Serverless Introduction to Database Systems CSE 414 Lecture 12: NoSQL 1 Class Overview Unit 1: Intro Unit 2: Relational Data Models and Query Languages Unit 3: Non-relational data NoSQL Json SQL++ Unit 4: RDMBS internals

More information

Performance Analysis for Crawling

Performance Analysis for Crawling Scalable Servers and Load Balancing Kai Shen Online Applications online applications Applications accessible to online users through. Examples Online keyword search engine: Google. Web email: Gmail. News:

More information

Amol Deshpande, UC Berkeley. Suman Nath, CMU. Srinivasan Seshan, CMU

Amol Deshpande, UC Berkeley. Suman Nath, CMU. Srinivasan Seshan, CMU Amol Deshpande, UC Berkeley Suman Nath, CMU Phillip Gibbons, Intel Research Pittsburgh Srinivasan Seshan, CMU IrisNet Overview of IrisNet Example application: Parking Space Finder Query processing in IrisNet

More information

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013 PNUTS: Yahoo! s Hosted Data Serving Platform Reading Review by: Alex Degtiar (adegtiar) 15-799 9/30/2013 What is PNUTS? Yahoo s NoSQL database Motivated by web applications Massively parallel Geographically

More information

CS 655 Advanced Topics in Distributed Systems

CS 655 Advanced Topics in Distributed Systems Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3

More information

Deduplication Storage System

Deduplication Storage System Deduplication Storage System Kai Li Charles Fitzmorris Professor, Princeton University & Chief Scientist and Co-Founder, Data Domain, Inc. 03/11/09 The World Is Becoming Data-Centric CERN Tier 0 Business

More information

Announcements. Two Classes of Database Applications. Class Overview. NoSQL Motivation. RDBMS Review: Serverless

Announcements. Two Classes of Database Applications. Class Overview. NoSQL Motivation. RDBMS Review: Serverless Introduction to Database Systems CSE 414 Lecture 11: NoSQL 1 HW 3 due Friday Announcements Upload data with DataGrip editor see message board Azure timeout for question 5: Try DataGrip or SQLite HW 2 Grades

More information

Performance Evaluation of Virtualization Technologies

Performance Evaluation of Virtualization Technologies Performance Evaluation of Virtualization Technologies Saad Arif Dept. of Electrical Engineering and Computer Science University of Central Florida - Orlando, FL September 19, 2013 1 Introduction 1 Introduction

More information

More on Testing and Large Scale Web Apps

More on Testing and Large Scale Web Apps More on Testing and Large Scale Web Apps Testing Functionality Tests - Unit tests: E.g. Mocha - Integration tests - End-to-end - E.g. Selenium - HTML CSS validation - forms and form validation - cookies

More information

XCo: Explicit Coordination for Preventing Congestion in Data Center Ethernet

XCo: Explicit Coordination for Preventing Congestion in Data Center Ethernet XCo: Explicit Coordination for Preventing Congestion in Data Center Ethernet Vijay Shankar Rajanna, Smit Shah, Anand Jahagirdar and Kartik Gopalan Computer Science, State University of New York at Binghamton

More information

Typical scenario in shared infrastructures

Typical scenario in shared infrastructures Got control? AutoControl: Automated Control of MultipleVirtualized Resources Pradeep Padala, Karen Hou, Xiaoyun Zhu*, Mustfa Uysal, Zhikui Wang, Sharad Singhal, Arif Merchant, Kang G. Shin University of

More information

Weaving Relations for Cache Performance

Weaving Relations for Cache Performance Weaving Relations for Cache Performance Anastassia Ailamaki Carnegie Mellon Computer Platforms in 198 Execution PROCESSOR 1 cycles/instruction Data and Instructions cycles

More information

Introduction. Distributed Systems IT332

Introduction. Distributed Systems IT332 Introduction Distributed Systems IT332 2 Outline Definition of A Distributed System Goals of Distributed Systems Types of Distributed Systems 3 Definition of A Distributed System A distributed systems

More information

Introduction to Distributed Systems. INF5040/9040 Autumn 2018 Lecturer: Eli Gjørven (ifi/uio)

Introduction to Distributed Systems. INF5040/9040 Autumn 2018 Lecturer: Eli Gjørven (ifi/uio) Introduction to Distributed Systems INF5040/9040 Autumn 2018 Lecturer: Eli Gjørven (ifi/uio) August 28, 2018 Outline Definition of a distributed system Goals of a distributed system Implications of distributed

More information

IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 4, NO. X, XXXXX Boafft: Distributed Deduplication for Big Data Storage in the Cloud

IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 4, NO. X, XXXXX Boafft: Distributed Deduplication for Big Data Storage in the Cloud TRANSACTIONS ON CLOUD COMPUTING, VOL. 4, NO. X, XXXXX 2016 1 Boafft: Distributed Deduplication for Big Data Storage in the Cloud Shengmei Luo, Guangyan Zhang, Chengwen Wu, Samee U. Khan, Senior Member,,

More information

Simplifying Wide-Area Application Development with WheelFS

Simplifying Wide-Area Application Development with WheelFS Simplifying Wide-Area Application Development with WheelFS Jeremy Stribling In collaboration with Jinyang Li, Frans Kaashoek, Robert Morris MIT CSAIL & New York University Resources Spread over Wide-Area

More information

Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information

Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information Min Fu, Dan Feng, Yu Hua, Xubin He, Zuoning Chen *, Wen Xia, Fangting Huang, Qing

More information

Optimizing Performance for Partitioned Mappings

Optimizing Performance for Partitioned Mappings Optimizing Performance for Partitioned Mappings 1993-2015 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)

More information

SQL Azure. Abhay Parekh Microsoft Corporation

SQL Azure. Abhay Parekh Microsoft Corporation SQL Azure By Abhay Parekh Microsoft Corporation Leverage this Presented by : - Abhay S. Parekh MSP & MSP Voice Program Representative, Microsoft Corporation. Before i begin Demo Let s understand SQL Azure

More information

CA485 Ray Walshe Google File System

CA485 Ray Walshe Google File System Google File System Overview Google File System is scalable, distributed file system on inexpensive commodity hardware that provides: Fault Tolerance File system runs on hundreds or thousands of storage

More information

CS 514: Transport Protocols for Datacenters

CS 514: Transport Protocols for Datacenters Department of Computer Science Cornell University Outline Motivation 1 Motivation 2 3 Commodity Datacenters Blade-servers, Fast Interconnects Different Apps: Google -> Search Amazon -> Etailing Computational

More information

Designing Next Generation Data-Centers with Advanced Communication Protocols and Systems Services

Designing Next Generation Data-Centers with Advanced Communication Protocols and Systems Services Designing Next Generation Data-Centers with Advanced Communication Protocols and Systems Services P. Balaji, K. Vaidyanathan, S. Narravula, H. W. Jin and D. K. Panda Network Based Computing Laboratory

More information

HYBRID TRANSACTION/ANALYTICAL PROCESSING COLIN MACNAUGHTON

HYBRID TRANSACTION/ANALYTICAL PROCESSING COLIN MACNAUGHTON HYBRID TRANSACTION/ANALYTICAL PROCESSING COLIN MACNAUGHTON WHO IS NEEVE RESEARCH? Headquartered in Silicon Valley Creators of the X Platform - Memory Oriented Application Platform Passionate about high

More information

AMGA metadata catalogue system

AMGA metadata catalogue system AMGA metadata catalogue system Hurng-Chun Lee ACGrid School, Hanoi, Vietnam www.eu-egee.org EGEE and glite are registered trademarks Outline AMGA overview AMGA Background and Motivation for AMGA Interface,

More information

Oracle WebCenter Portal Performance Tuning

Oracle WebCenter Portal Performance Tuning ORACLE PRODUCT LOGO Oracle WebCenter Portal Performance Tuning Rich Nessel - Principal Product Manager Christina Kolotouros - Product Management Director 1 Copyright 2011, Oracle and/or its affiliates.

More information

Packet-Level Network Analytics without Compromises NANOG 73, June 26th 2018, Denver, CO. Oliver Michel

Packet-Level Network Analytics without Compromises NANOG 73, June 26th 2018, Denver, CO. Oliver Michel Packet-Level Network Analytics without Compromises NANOG 73, June 26th 2018, Denver, CO Oliver Michel Network monitoring is important Security issues Performance issues Equipment failure Analytics Platform

More information

ApsaraDB for Redis. Product Introduction

ApsaraDB for Redis. Product Introduction ApsaraDB for Redis is compatible with open-source Redis protocol standards and provides persistent memory database services. Based on its high-reliability dual-machine hot standby architecture and seamlessly

More information

Staged Memory Scheduling

Staged Memory Scheduling Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:

More information

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services SEDA: An Architecture for Well-Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University of California, Berkeley Operating Systems Principles

More information

SCALABLE WEB PROGRAMMING. CS193S - Jan Jannink - 2/02/10

SCALABLE WEB PROGRAMMING. CS193S - Jan Jannink - 2/02/10 SCALABLE WEB PROGRAMMING CS193S - Jan Jannink - 2/02/10 Weekly Syllabus 1.Scalability: (Jan.) 2.Agile Practices 3.Ecology/Mashups 4.Browser/Client 5.Data/Server: (Feb.) 6.Security/Privacy 7.Analytics*

More information

Crossing the Chasm: Sneaking a parallel file system into Hadoop

Crossing the Chasm: Sneaking a parallel file system into Hadoop Crossing the Chasm: Sneaking a parallel file system into Hadoop Wittawat Tantisiriroj Swapnil Patil, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University In this work Compare and contrast large

More information

Multi-Channel Clustered Web Application Servers

Multi-Channel Clustered Web Application Servers Multi-Channel Clustered Web Application Servers Masters Thesis Proposal Progress American University in Cairo Proposed by Karim Sobh (kmsobh@aucegypt.edu) Supervised by Dr. Ahmed Sameh (sameh@aucegypt.edu)

More information

Making Session Stores More Intelligent KYLE J. DAVIS TECHNICAL MARKETING MANAGER REDIS LABS

Making Session Stores More Intelligent KYLE J. DAVIS TECHNICAL MARKETING MANAGER REDIS LABS Making Session Stores More Intelligent KYLE J. DAVIS TECHNICAL MARKETING MANAGER REDIS LABS What is a session store? A session store is An chunk of data that is connected to one user of a service user

More information

When Average is Not Average: Large Response Time Fluctuations in n-tier Applications. Qingyang Wang, Yasuhiko Kanemasa, Calton Pu, Motoyuki Kawaba

When Average is Not Average: Large Response Time Fluctuations in n-tier Applications. Qingyang Wang, Yasuhiko Kanemasa, Calton Pu, Motoyuki Kawaba When Average is Not Average: Large Response Time Fluctuations in n-tier Applications Qingyang Wang, Yasuhiko Kanemasa, Calton Pu, Motoyuki Kawaba Background & Motivation Analysis of the Large Response

More information

systems & research project

systems & research project class 4 systems & research project prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ index index knows order about the data data filtering data: point/range queries index data A B C sorted A B C initial

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

15-740/ Computer Architecture

15-740/ Computer Architecture 15-740/18-740 Computer Architecture Lecture 19: Caching II Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/31/2011 Announcements Milestone II Due November 4, Friday Please talk with us if you

More information

Citation for published version (APA): Ydraios, E. (2010). Database cracking: towards auto-tunning database kernels

Citation for published version (APA): Ydraios, E. (2010). Database cracking: towards auto-tunning database kernels UvA-DARE (Digital Academic Repository) Database cracking: towards auto-tunning database kernels Ydraios, E. Link to publication Citation for published version (APA): Ydraios, E. (2010). Database cracking:

More information

Stanford University Computer Science Department CS 240 Quiz 2 with Answers Spring May 24, total

Stanford University Computer Science Department CS 240 Quiz 2 with Answers Spring May 24, total Stanford University Computer Science Department CS 240 Quiz 2 with Answers Spring 2004 May 24, 2004 This is an open-book exam. You have 50 minutes to answer eight out of ten questions. Write all of your

More information

Performance Extrapolation across Servers

Performance Extrapolation across Servers Performance Extrapolation across Servers Subhasri Duttagupta www.cmgindia.org 1 Outline Why do performance extrapolation across servers? What are the techniques for extrapolation? SPEC-Rates of servers

More information

WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression

WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression Philip Shilane, Mark Huang, Grant Wallace, & Windsor Hsu Backup Recovery Systems Division EMC Corporation Introduction

More information

Lecture 5: Active & Overlay Networks"

Lecture 5: Active & Overlay Networks Lecture 5: Active & Overlay Networks" CSE 222A: Computer Communication Networks Alex C. Snoeren Thanks: Amin Vahdat Lecture 5 Overview" Project discussion Brief intro to overlay networking Active networking

More information

SHHC: A Scalable Hybrid Hash Cluster for Cloud Backup Services in Data Centers

SHHC: A Scalable Hybrid Hash Cluster for Cloud Backup Services in Data Centers 2011 31st International Conference on Distributed Computing Systems Workshops SHHC: A Scalable Hybrid Hash Cluster for Cloud Backup Services in Data Centers Lei Xu, Jian Hu, Stephen Mkandawire and Hong

More information

The Power of Prediction: Cloud Bandwidth and Cost Reduction

The Power of Prediction: Cloud Bandwidth and Cost Reduction The Power of Prediction: Cloud Bandwidth and Cost Reduction Eyal Zohar Israel Cidon Technion Osnat(Ossi) Mokryn Tel-Aviv College Traffic Redundancy Elimination (TRE) Traffic redundancy stems from downloading

More information

Lecture 11 Hadoop & Spark

Lecture 11 Hadoop & Spark Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem

More information

Improving Backup and Restore Performance for Deduplication-based Cloud Backup Services

Improving Backup and Restore Performance for Deduplication-based Cloud Backup Services University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Computer Science and Engineering: Theses, Dissertations, and Student Research Computer Science and Engineering, Department

More information

Goal. Outline. Outline. J2EE architecture. Enterprise JavaBeans. J2EE Performance Scalability and Clustering Part 1

Goal. Outline. Outline. J2EE architecture. Enterprise JavaBeans. J2EE Performance Scalability and Clustering Part 1 Emmanuel Cecchet INRIA Rhône-Alpes, ObjectWeb J2EE Performance Scalability and Clustering Part 1 Goal J2EE performance scalability evaluation design patterns communication layers Java Virtual Machine J2EE

More information

Contents Part I Traditional Deduplication Techniques and Solutions Introduction Existing Deduplication Techniques

Contents Part I Traditional Deduplication Techniques and Solutions Introduction Existing Deduplication Techniques Contents Part I Traditional Deduplication Techniques and Solutions 1 Introduction... 3 1.1 Data Explosion... 3 1.2 Redundancies... 4 1.3 Existing Deduplication Solutions to Remove Redundancies... 5 1.4

More information

Efficient Resource Allocation And Avoid Traffic Redundancy Using Chunking And Indexing

Efficient Resource Allocation And Avoid Traffic Redundancy Using Chunking And Indexing Efficient Resource Allocation And Avoid Traffic Redundancy Using Chunking And Indexing S.V.Durggapriya 1, N.Keerthika 2 M.E, Department of CSE, Vandayar Engineering College, Pulavarnatham, Thanjavur, T.N,

More information

Building Adaptive Performance Models for Dynamic Resource Allocation in Cloud Data Centers

Building Adaptive Performance Models for Dynamic Resource Allocation in Cloud Data Centers Building Adaptive Performance Models for Dynamic Resource Allocation in Cloud Data Centers Jin Chen University of Toronto Joint work with Gokul Soundararajan and Prof. Cristiana Amza. Today s Cloud Pay

More information

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value

More information

A Comparison of Capacity Management Schemes for Shared CMP Caches

A Comparison of Capacity Management Schemes for Shared CMP Caches A Comparison of Capacity Management Schemes for Shared CMP Caches Carole-Jean Wu and Margaret Martonosi Princeton University 7 th Annual WDDD 6/22/28 Motivation P P1 P1 Pn L1 L1 L1 L1 Last Level On-Chip

More information

A New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd.

A New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd. A New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd. 1 Agenda Introduction Background and Motivation Hybrid Key-Value Data Store Architecture Overview Design details Performance

More information

Lamassu: Storage-Efficient Host-Side Encryption

Lamassu: Storage-Efficient Host-Side Encryption Lamassu: Storage-Efficient Host-Side Encryption Peter Shah, Won So Advanced Technology Group 9 July, 2015 1 2015 NetApp, Inc. All rights reserved. Agenda 1) Overview 2) Security 3) Solution Architecture

More information

BenchLab An Open Testbed for Realistic Benchmarking of Web Applications

BenchLab An Open Testbed for Realistic Benchmarking of Web Applications BenchLab An Open Testbed for Realistic Benchmarking of Web Applications http://lass.cs.umass.edu/projects/benchlab/ Emmanuel Cecchet, Veena Udayabhanu, Timothy Wood, Prashant Shenoy University of Massachusetts

More information

c-through: Part-time Optics in Data Centers

c-through: Part-time Optics in Data Centers Data Center Network Architecture c-through: Part-time Optics in Data Centers Guohui Wang 1, T. S. Eugene Ng 1, David G. Andersen 2, Michael Kaminsky 3, Konstantina Papagiannaki 3, Michael Kozuch 3, Michael

More information

CIFS Acceleration Techniques

CIFS Acceleration Techniques CIFS Acceleration Techniques How to improve SMB traffic Plan Introduction Before Acceleration CIFS Acceleration Overview CIFS Acceleration Methods CIFS Acceleration Experience Platforms Q & A 2 Introduction

More information

Oracle Enterprise Architecture. Software. Hardware. Complete. Oracle Exalogic.

Oracle Enterprise Architecture. Software. Hardware. Complete. Oracle Exalogic. Oracle Enterprise Architecture Software. Hardware. Complete Oracle Exalogic edward.zhang@oracle.com Exalogic Exalogic Exalogic -- Exalogic Design Center Exalogic - Sun Oracle - - - CPU/Memory/Networking/Storage

More information

18-447: Computer Architecture Lecture 30B: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013

18-447: Computer Architecture Lecture 30B: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013 18-447: Computer Architecture Lecture 30B: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013 Readings: Multiprocessing Required Amdahl, Validity of the single processor

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

Software-defined Storage: Fast, Safe and Efficient

Software-defined Storage: Fast, Safe and Efficient Software-defined Storage: Fast, Safe and Efficient TRY NOW Thanks to Blockchain and Intel Intelligent Storage Acceleration Library Every piece of data is required to be stored somewhere. We all know about

More information

CS4513 Distributed Computer Systems

CS4513 Distributed Computer Systems Outline CS4513 Distributed Computer Systems Overview Goals Software Client Server Introduction (Ch 1: 1.1-1.2, 1.4-1.5) The Rise of Distributed Systems Computer hardware prices falling, power increasing

More information

Single Instance Storage Strategies

Single Instance Storage Strategies Single Instance Storage Strategies Michael Fahey, Hitachi Data Systems SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this

More information

DEMYSTIFYING BIG DATA WITH RIAK USE CASES. Martin Schneider Basho Technologies!

DEMYSTIFYING BIG DATA WITH RIAK USE CASES. Martin Schneider Basho Technologies! DEMYSTIFYING BIG DATA WITH RIAK USE CASES Martin Schneider Basho Technologies! Agenda Defining Big Data in Regards to Riak A Series of Trade-Offs Use Cases Q & A About Basho & Riak Basho Technologies is

More information

CIT 668: System Architecture. Caching

CIT 668: System Architecture. Caching CIT 668: System Architecture Caching Topics 1. Cache Types 2. Web Caching 3. Replacement Algorithms 4. Distributed Caches 5. memcached A cache is a system component that stores data so that future requests

More information

QuartzV: Bringing Quality of Time to Virtual Machines

QuartzV: Bringing Quality of Time to Virtual Machines QuartzV: Bringing Quality of Time to Virtual Machines Sandeep D souza and Raj Rajkumar Carnegie Mellon University IEEE RTAS @ CPS Week 2018 1 A Shared Notion of Time Coordinated Actions Ordering of Events

More information

Distributed Versioning: Consistent Replication for Scaling Back-end Databases of Dynamic Content Web Sites

Distributed Versioning: Consistent Replication for Scaling Back-end Databases of Dynamic Content Web Sites Distributed Versioning: Consistent Replication for Scaling Back-end Databases of Dynamic Content Web Sites Cristiana Amza, Alan L. Cox and Willy Zwaenepoel Department of Computer Science, Rice University,

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 (Fall 2018) Part 7: Mutable State (2/2) November 13, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides are

More information

Application-specific Compression for Remote Visualization of Genomics Applications

Application-specific Compression for Remote Visualization of Genomics Applications Application-specific Compression for Remote Visualization of Genomics Applications Lars Ailo Bongo, Kai Li, Olga Troyanskaya,, Tore Larsen and Grant Wallace Motivation Outline Genomics applications WAN

More information

Network Swapping. Outline Motivations HW and SW support for swapping under Linux OS

Network Swapping. Outline Motivations HW and SW support for swapping under Linux OS Network Swapping Emanuele Lattanzi, Andrea Acquaviva and Alessandro Bogliolo STI University of Urbino, ITALY Outline Motivations HW and SW support for swapping under Linux OS Local devices (CF, µhd) Network

More information

Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures

Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures Min Li, Sudharshan S. Vazhkudai, Ali R. Butt, Fei Meng, Xiaosong Ma, Youngjae Kim,Christian Engelmann, and Galen Shipman

More information

GD-Wheel: A Cost-Aware Replacement Policy for Key-Value Stores

GD-Wheel: A Cost-Aware Replacement Policy for Key-Value Stores GD-Wheel: A Cost-Aware Replacement Policy for Key-Value Stores Conglong Li Carnegie Mellon University conglonl@cs.cmu.edu Alan L. Cox Rice University alc@rice.edu Abstract Memory-based key-value stores,

More information

Introduction to MySQL Cluster: Architecture and Use

Introduction to MySQL Cluster: Architecture and Use Introduction to MySQL Cluster: Architecture and Use Arjen Lentz, MySQL AB (arjen@mysql.com) (Based on an original paper by Stewart Smith, MySQL AB) An overview of the MySQL Cluster architecture, what's

More information

Solros: A Data-Centric Operating System Architecture for Heterogeneous Computing

Solros: A Data-Centric Operating System Architecture for Heterogeneous Computing Solros: A Data-Centric Operating System Architecture for Heterogeneous Computing Changwoo Min, Woonhak Kang, Mohan Kumar, Sanidhya Kashyap, Steffen Maass, Heeseung Jo, Taesoo Kim Virginia Tech, ebay, Georgia

More information