Distributed Systems. 29. Distributed Caching Paul Krzyzanowski. Rutgers University. Fall 2014

Similar documents
CS December 2017

Distributed Systems. 25. Distributed Caching & Some Peer-to-Peer Systems Paul Krzyzanowski. Rutgers University. Fall 2017

Memcached is an open source, high-performance, distributed memory object caching system.

Give Your Site A Boost With Memcache. Ben Ramsey September 25, 2009

Caching with Memcached & APC. Ben Ramsey TEK X May 21, 2010

CS November 2018

CS November 2017

Memcached is an open source, high-performance, distributed memory object caching system.

MEMCACHED - QUICK GUIDE MEMCACHED - OVERVIEW

Cassandra- A Distributed Database

DEMYSTIFYING BIG DATA WITH RIAK USE CASES. Martin Schneider Basho Technologies!

Distributed Systems 16. Distributed File Systems II

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2016

DIVING IN: INSIDE THE DATA CENTER

CS /30/17. Paul Krzyzanowski 1. Google Chubby ( Apache Zookeeper) Distributed Systems. Chubby. Chubby Deployment.

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017

Developer Internship Opportunity at I-CC

pgmemcache and the over reliance on RDBMSs Copyright (c) 2004 Sean Chittenden. All rights reserved.

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

CISC 7610 Lecture 2b The beginnings of NoSQL

1

CS5412: DIVING IN: INSIDE THE DATA CENTER

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

Distributed File Systems II

memcached Functions For MySQL: Seemless caching for MySQL Patrick Galbraith, Lycos Inc.

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

A memcached implementation in Java. Bela Ban JBoss 2340

<Insert Picture Here> MySQL Cluster What are we working on

App Engine: Datastore Introduction

The Google File System

InnoDB: Status, Architecture, and Latest Enhancements

Distributed Systems. 18. MapReduce. Paul Krzyzanowski. Rutgers University. Fall 2015

CS /5/18. Paul Krzyzanowski 1. Credit. Distributed Systems 18. MapReduce. Simplest environment for parallel processing. Background.

CS5412: OTHER DATA CENTER SERVICES

MySQL. The Right Database for GIS Sometimes

vbuckets: The Core Enabling Mechanism for Couchbase Server Data Distribution (aka Auto-Sharding )

Distributed Systems. Fall 2017 Exam 3 Review. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 16. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2017

CIB Session 12th NoSQL Databases Structures

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

pymemcache Documentation

Distributed Data Store

CS5412: DIVING IN: INSIDE THE DATA CENTER

NPTEL Course Jan K. Gopinath Indian Institute of Science

Extreme Computing. NoSQL.

Internet Technology. 06. Exam 1 Review Paul Krzyzanowski. Rutgers University. Spring 2016

Distributed Architectures & Microservices. CS 475, Spring 2018 Concurrent & Distributed Systems

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

Hive Metadata Caching Proposal

Operating Systems. 09. Memory Management Part 1. Paul Krzyzanowski. Rutgers University. Spring 2015

Internet Technology 3/2/2016

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like

<Insert Picture Here> Introduction to MySQL

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable

Comparing SQL and NOSQL databases

Accelerating NoSQL. Running Voldemort on HailDB. Sunny Gleason March 11, 2011

CS 655 Advanced Topics in Distributed Systems

Transitioning from C# to Scala Using Apache Thrift. Twitter Finagle

/ Cloud Computing. Recitation 6 October 2 nd, 2018

IT Best Practices Audit TCS offers a wide range of IT Best Practices Audit content covering 15 subjects and over 2200 topics, including:

Large Scale MySQL Migration

Give Your Site a Boost With memcached. Ben Ramsey

ZHT A Fast, Reliable and Scalable Zero- hop Distributed Hash Table

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo

Putting together the platform: Riak, Redis, Solr and Spark. Bryan Hunt

BigTable. CSE-291 (Cloud Computing) Fall 2016

Distributed Systems. 05r. Case study: Google Cluster Architecture. Paul Krzyzanowski. Rutgers University. Fall 2016

Utilizing Databases in Grid Engine 6.0

Introduction to NoSQL Databases

CloudI Integration Framework. Chicago Erlang User Group May 27, 2015

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

Manual Trigger Sql Server 2008 Inserted Table Examples Insert

Real World Web Scalability. Ask Bjørn Hansen Develooper LLC

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems

Getting Started with Memcached. Ahmed Soliman

Give Your Site a Boost With memcached. Ben Ramsey

W b b 2.0. = = Data Ex E pl p o l s o io i n

Binghamton University. CS-220 Spring Cached Memory. Computer Systems Chapter

Intro To Big Data. John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center. Copyright 2017

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

Oral Questions and Answers (DBMS LAB) Questions & Answers- DBMS

DATABASE DESIGN II - 1DL400

CLOUD-SCALE FILE SYSTEMS

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016

Distributed Systems Exam 1 Review Paul Krzyzanowski. Rutgers University. Fall 2016

What is database? Types and Examples

Postgres-XC PG session #3. Michael PAQUIER Paris, 2012/02/02

CSI33 Data Structures

PHP. MIT 6.470, IAP 2010 Yafim Landa

The Google File System (GFS)

Real Life Web Development. Joseph Paul Cohen

Distributed Filesystem

How to pimp high volume PHP websites. 27. September 2008, PHP conference Barcelona. By Jens Bierkandt

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

PROFESSIONAL. NoSQL. Shashank Tiwari WILEY. John Wiley & Sons, Inc.

CA485 Ray Walshe Google File System

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe

Applications of Paxos Algorithm

Transcription:

Distributed Systems 29. Distributed Caching Paul Krzyzanowski Rutgers University Fall 2014 December 5, 2014 2013 Paul Krzyzanowski 1

Caching Purpose of a cache Temporary storage to increase data access speeds Increase effective bandwidth by caching most frequently used data Raw data from slow devices Memory cache on CPUs Buffer cache in operating system Chubby file data and metadata GFS master caches all metadata in memory Computed data Results of database queries or file searches Avoid the need to look the same thing up again 2

Distributed In-Memory Caching A network memory-based caching service Shared by many typically used by front-end services Stores frequently-used (key, value) data Old data gets evicted General purpose Not tied to a specific back-end service Not transparent (usually) Because it s a general-purpose service, the programmer gets involved User-facing User-facing User-facing Service Service User-facing User-facing Service Service Service look here first if not found then look here Cache Back-end service 3

Deployment Models Separate caching server One or more computers whose sole purpose is to provide a caching service User-facing Service User-facing Service User-facing Service Cache server Cache server Cache server Or share cache memory among servers Take advantage of free memory from lightly-loaded nodes User-facing Service User-facing Service User-facing Service Cache server Cache server Cache server 4

What would you use it for? Cache user session state an web application servers No need to keep user coming back to the same computer Cache user preferences, shopping carts, etc. Avoid repeated database lookup Cache rendered HTML pages Avoid server-side includes, JSP/ASP/PHP code 5

Example: memcached Free & open source distributed memory caching Used by Facebook, Wikipedia, Flickr, Twitter, YouTube, Digg, Bebo, WordPress, Craigslist, memcached.org Protocol Binary & ASCII versions Client APIs for command line, C/C++, Go, PHP, Java, Python, Ruby, Perl, Windows/.NET, mysql, PostgreSQL, Erlang, Lua, LISP, ColdFusion, io 6

Example: memcached Key-Value store Cache is made up of { key, value, expiration time, flags } All access is O(1) Client software Provided with a list of memcached servers Hashing algorithm: chooses a server based on the key Server software Stores keys and values in an in-memory hash table Throw out old data when necessary LRU cache and time-based expiration Objects expire after a minute to ensure stale data is not returned Servers are unaware of each other 7

Memcached API Commands sent over TCP (UDP also available) Connection may be kept open indefinitely. Commands Storage Storage commands take an expiration time in seconds from current time or 0 = forever (but may be deleted) set store data add store data only if the server does not have data for the key replace store data if the server does have data for the key append add data after existing data prepend add data before existing data cas check & set: store data only if no one else updated it since I fetched it (cas = unique, 64-bit value associated with the item) Retrieval get retrieve one or more keys: returns key, flags, bytes, and cas unique 8

Memcached API Commands Deletion delete key Increment/decrement Treat data as a 64-bit unsigned integer and add/subtract value incr key value increment key by value decr key value decrement key by value Update expiration touch key exptime Update the expiration time Get Statistics stats various options for reporting statistics Flush flush_all clear the cache 9

Memcached Implementations Servers are pretty dumb They can store, get, delete, and report Clients need to know how to find servers Need to be given a list of servers Some clients are smarter than others Example: Ruby client uses Chord-like consistent hashing Allows incremental addition of servers BUT unlike Dynamo no migration is needed items just expire 10

What distributed caching is not It s not persistent or fault tolerant It s not Dynamo No back-end storage It s designed for data to expire Usually no transparent addition of new caching servers It s not an in-memory database No queries beyond a key lookup No locking or transactions MongoDB, CouchDB, SAP HANA, Oracle 12 Run (or can be made to run) as in-memory databases Distributed caching does not help if the data is already in memory at a server with fast lookup 11

Lots of other systems Terracotta BigMemory More than caching: Hadoop support, high availability) Ehcache (developed & maintained by Terracotta) Java-based, supports HA Facebook Flashcache 3.0 Designed to optimize use of SSDs Microsoft Velocity offers high availability through redundancy One Velocity service hosts provides clients with info about the cluster Infinispan Key/value store written in Java 12

Adding features: Microsoft Velocity Memcached is trivially simple by design Memory caching can be made more complex (full-featured?) Expirations Elapsed time or time of day Relations among objects If A depends on B and B depends on C If you remove A, then B and C will be removed as well Working with environments that update data but do not use the cache Specify that a cached item depends on a file Cache monitors the file Changes in the file invalidate/update the cached item 13

Adding features: Microsoft Velocity Working with environments that update a database Specify a SQL string to match rows in a DB If rows updated, DB causes cache to execute event-management code Read-through Make the cache transparent If data is not cached, the cache performs a DB query to get & cache data Write-through Again, make the cache transparent Updates to the cache will lead to updates in the database Handler defined in the cache system Item modification notification Subscribe to be notified if a cached entry changes Replication Data is replicated on multiple servers for high availability 14

The End December 5, 2014 2013 Paul Krzyzanowski 15