Counting is Hard: Probabilistically Counting Views at Reddit. Krishnan Chandra, Data Engineer

1 Counting is Hard: Probabilistically Counting Views at Reddit Krishnan Chandra, Data Engineer

2 Overview What is probabilistic counting? How did probabilistic counting help us scale? What issues did we face along the way?

3 What is Reddit? Reddit is the front page of the internet. A social network where there are tens of thousands of communities around whatever passions or interests you might have. It's where people converse about the things that are most important to them.

4 Reddit by the numbers 4th/7th Alexa Rank (US/World) 330M+ MAU 138K+ Active Communities 10.7M Posts per month 14B Screenviews per month

5 Counting Views

6 Why Count Views? Includes logged-out users Better measure of reach than votes Currently exposed to moderators and content creators

7 Cat Fist Bumping Cat Walking a Human

8 Why is Counting Hard?

9 Product Requirements Counts are over the life of a post The same user should not count multiple times within a short time frame Should build in some protections against spamming/cheating (similar to votes) Should provide (near) real-time feedback

10 Exact vs. Approximate Counting Exact counting: Requires storing state per user per post. Approximate counting: Requires much less state and storage; provides an estimate of reach within a few percentage points of the exact number.

11 HyperLogLog (And Friends) HyperLogLog (HLL): Hash-based probabilistic algorithm published in 2007; approximates set cardinality; works well for large cardinalities, but not for small ones. HyperLogLog++: Introduced by Google in 2013; uses sparse and dense HLL representations; switches over to the dense HLL representation once needed.

12

13 How does HLL work? Hash table consisting of m registers or buckets, each of width k bits. Hash the input value, and split the hash value into 2 portions. First portion (log2(m) bits) used to index to a register. Second portion used to count the number of leading zeros and set the register value.

14 [Figure: worked example. Assume: m=8 registers, k=3 bits. The first portion of the input hash selects Register# 7; the second portion has 3 leading zeroes, so record 3+1=4 into Register# 7. Adapted from "HyperLogLog - A Layman's Overview".]
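
The register update described on the two slides above can be sketched in a few lines. This is a minimal illustration, not the code behind the talk: the helper names `split_hash` and `update` are hypothetical, and the bit string `"1110001"` is an assumed input chosen only so that it lands in Register# 7 with 3 leading zeroes, matching the slide's outcome.

```python
def split_hash(hash_bits: str, m: int):
    """Split a hash (given as a bit string) into (register index, leading zeroes + 1)."""
    index_bits = m.bit_length() - 1          # log2(m), assuming m is a power of two
    index = int(hash_bits[:index_bits], 2)   # first portion: which register
    rest = hash_bits[index_bits:]            # second portion: count leading zeroes
    leading_zeroes = len(rest) - len(rest.lstrip("0"))
    return index, leading_zeroes + 1


def update(registers, hash_bits):
    """Keep the largest value seen so far for the selected register."""
    index, value = split_hash(hash_bits, len(registers))
    registers[index] = max(registers[index], value)
    return registers


# Hypothetical 7-bit hash: "111" selects register 7, "0001" has 3 leading zeroes.
registers = update([0] * 8, "1110001")
print(registers)  # [0, 0, 0, 0, 0, 0, 0, 4]
```

Because each register only ever keeps the maximum, replaying the same hash leaves the registers unchanged.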

15 Computing Cardinality Estimate of cardinality is computed by taking the harmonic mean of 2^(register value) over all registers and scaling it by the number of registers and a bias-correction constant. Intuition: HLL is like flipping a coin! Largest run of heads gives an estimate of total number of flips.
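
To make the estimate step concrete, here is a small self-contained HyperLogLog sketch in Python. It is an illustration under stated assumptions, not the Redis implementation used in the talk: the class name, the choice of SHA-1 truncated to 64 bits as the hash, and the default of 2^12 registers are all assumptions. The bias constant and the small-range (linear counting) correction follow the original HyperLogLog paper.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative only)."""

    def __init__(self, p: int = 12):
        self.p = p                    # number of index bits (assumed default)
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def _hash64(self, value: str) -> int:
        # Assumption: first 8 bytes of SHA-1 serve as a 64-bit hash.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value: str) -> None:
        h = self._hash64(value)
        rest_bits = 64 - self.p
        index = h >> rest_bits                      # first portion: register index
        rest = h & ((1 << rest_bits) - 1)           # second portion
        rank = rest_bits - rest.bit_length() + 1    # leading zeroes + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self) -> int:
        # Bias constant from the HyperLogLog paper, valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        harmonic_sum = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * self.m * self.m / harmonic_sum
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


hll = HyperLogLog()
for i in range(50_000):
    hll.add(f"user-{i}")
print(hll.count())  # an estimate close to 50000, not an exact count
```

With 4096 registers the expected standard error is about 1.6%, so the printed value should land within a few percent of 50,000.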

16 Counting Error HLL standard error depends on the number of registers/hash buckets m. Standard error = 1.04/sqrt(m). Using Redis's HLL implementation, standard error is 0.81%!
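
The 0.81% figure follows directly from the formula on this slide once the register count is plugged in. The sketch below assumes m = 16384, the register count Redis documents for its HyperLogLog; the function name is illustrative.

```python
import math


def hll_standard_error(m: int) -> float:
    """Standard error of an HLL with m registers: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(m)


# Assumption: Redis's HLL uses 16384 registers, so sqrt(m) = 128.
print(f"{hll_standard_error(16384):.2%}")  # 0.81%
```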

17 Using HLL to Count Views 1 HLL per post HLL inserts are idempotent! Allows reprocessing data if needed How to manage de-duping over short time window? Store user + truncated timestamp as the value
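
A rough sketch of the de-duping trick from this slide: the value inserted into the post's HLL is the user combined with a timestamp truncated to the start of its window, so repeat views inside one window collapse to the same value. The 600-second window, the `view_key` name, and the key format are assumptions for illustration (the slide does not give the window length), and a plain Python set stands in for the HLL.

```python
def view_key(user_id: str, timestamp: int, window_seconds: int = 600) -> str:
    """Combine the user with the timestamp truncated to the start of its window."""
    truncated = timestamp - (timestamp % window_seconds)
    return f"{user_id}:{truncated}"


seen = set()  # stand-in for the per-post HLL; inserts are idempotent in both
for user, ts in [("u1", 1000), ("u1", 1100), ("u1", 1300), ("u2", 1000)]:
    seen.add(view_key(user, ts))

# u1 at 1000 and 1100 share a window; u1 at 1300 falls in the next one.
print(len(seen))  # 3
```

Since re-inserting the same key changes nothing, replaying a batch of events during reprocessing does not inflate the count.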

18

19 Space Usage Exact counting: User id = 8 byte long; ~1.5M users * 8 bytes = 12 MB. HLL (Redis implementation): Max size = 12 KB; 0.1% of the exact counting storage.
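
The arithmetic behind this slide, spelled out. The constants come from the slide; treating "12 KB" as 12,000 bytes and "12 MB" as 12,000,000 bytes is an assumption made to keep the numbers round.

```python
USER_ID_BYTES = 8          # one 8-byte long per user id
USERS = 1_500_000          # ~1.5M users viewing a single post
HLL_MAX_BYTES = 12_000     # "12 KB", taken here as 12,000 bytes

exact_bytes = USERS * USER_ID_BYTES
ratio = HLL_MAX_BYTES / exact_bytes

print(exact_bytes / 1_000_000, "MB")  # 12.0 MB
print(f"{ratio:.1%}")                 # 0.1%
```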

20 Counting Architecture

21 Architecture Goals 1. Consume a stream of view events and filter out spam/bad events 2. For good events, insert into an HLL in real time 3. Allow clients to consume views values in real time

22 [Architecture diagram. Labels: App Servers, Server Side Events, Client Side Events, Anti-Spam, Counting]

23 Stream Processing Infrastructure Kafka: Main message bus for view events. Redis: Used for storing state + HLLs; intended as short term storage; functions as a cache for Cassandra. Cassandra: Used to store the final counts and HLLs in separate column families; intended as long term storage.

24 Counting Application (Part 1) Anti-Spam Consumer Consumes the stream of views from Kafka Basic rules engine backed by Redis Consumer outputs a decision to a Kafka topic

25 Counting Application (Part 2) Counting Consumer Consumes the decisions topic output by the anti-spam consumer Creates/updates the HLL for the post in Redis. Stores both the count and the HLL filter out to Cassandra.
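
The two-stage flow from the last two slides can be sketched with plain Python stand-ins. Everything here is an assumption for illustration: generators replace the Kafka topics, a set replaces each post's Redis HLL, the event fields `post_id` and `user_id` are hypothetical, and the spam rule is a toy predicate rather than the real rules engine.

```python
from collections import defaultdict


def anti_spam_consumer(view_events, is_spam):
    """Stage 1: emit (event, is_good) decisions, standing in for the decisions topic."""
    for event in view_events:
        yield event, not is_spam(event)


def counting_consumer(decisions):
    """Stage 2: insert good events into a per-post structure and report counts."""
    per_post = defaultdict(set)  # a set stands in for the per-post HLL
    for event, is_good in decisions:
        if is_good:
            per_post[event["post_id"]].add(event["user_id"])
    return {post: len(users) for post, users in per_post.items()}


events = [
    {"post_id": "p1", "user_id": "u1"},
    {"post_id": "p1", "user_id": "u1"},   # repeat view, counted once
    {"post_id": "p1", "user_id": "bot"},  # rejected by the toy spam rule
    {"post_id": "p2", "user_id": "u2"},
]
counts = counting_consumer(anti_spam_consumer(events, lambda e: e["user_id"] == "bot"))
print(counts)  # {'p1': 1, 'p2': 1}
```

Keeping the decision stage separate from the counting stage mirrors the split between the two consumers, which is also what lets each be backed by its own Redis instance.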

26 Scaling Challenges

27 Redis Problems Rules engine is very memory heavy HLL counting is very CPU-heavy Rules engine data is generally time-bound with expiry HLL data should be kept in Redis as long as possible to avoid reading from Cassandra

28 Solutions Separate Redis instances for the 2 parts of the application. Different instance types to reflect the different workloads. allkeys-lru eviction policy on the HLL instance, volatile-ttl eviction policy on the rules engine instance.

29

30 Cassandra Problems 1 row per post - overwritten frequently Read rate on page loads overwhelming the cluster Issues with load when catching up Storage grows forever with the number of posts!

31 Solutions Updates to the same row in Cassandra throttled to every 10 seconds Read caching Slow the update rate when catching up More disk!
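
One way to picture the write throttling on this slide is a small wrapper that lets at most one write per row through every 10 seconds. This is a sketch under assumptions, not the production code: the `ThrottledWriter` class, the `write` callback, and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are all hypothetical.

```python
class ThrottledWriter:
    """Allow at most one write per post every min_interval seconds."""

    def __init__(self, write, min_interval: float = 10.0):
        self.write = write                # callback standing in for the Cassandra write
        self.min_interval = min_interval
        self.last_write = {}              # post_id -> time of the last write
        self.pending = {}                 # post_id -> latest count not yet written

    def update(self, post_id, count, now: float) -> bool:
        """Record the latest count; write it only if the interval has elapsed."""
        self.pending[post_id] = count
        last = self.last_write.get(post_id)
        if last is None or now - last >= self.min_interval:
            self.write(post_id, self.pending.pop(post_id))
            self.last_write[post_id] = now
            return True
        return False


writes = []
writer = ThrottledWriter(lambda post_id, count: writes.append((post_id, count)))
writer.update("p1", 1, now=0)    # written
writer.update("p1", 2, now=5)    # held back, only 5 seconds since the last write
writer.update("p1", 3, now=10)   # written, carrying the latest count
print(writes)  # [('p1', 1), ('p1', 3)]
```

A held-back count in this sketch is only written when a later update arrives, so a real implementation would also need a periodic flush for posts that stop receiving views.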

32

33 Observations Views on Reddit skew towards newer posts Allows most views to be served by Redis Keeps read rate on Cassandra very low

34

35 Takeaways Thanks to HLLs, counting views became much more efficient. Current storage usage is ~1TB for a full year of posts! Delivery was possible in a quarter with an engineering team of 3 (not always full time).

36 Thanks to our team! /u/gooeyblob - Cassandra + Backend; /u/d3fect - Backend + API; /u/powerlanguage - Product Management

37 Thanks! Krishnan Chandra u/shrink_and_an_arch PS: We're hiring!

38 References View Counting at Reddit (Blog Post from 2017); Original HyperLogLog paper; Redis blog announcing HLL support; Google paper announcing HLL++ algorithm; HyperLogLog - A Layman's Overview


More information

CS 550 Operating Systems Spring File System

CS 550 Operating Systems Spring File System 1 CS 550 Operating Systems Spring 2018 File System 2 OS Abstractions Process: virtualization of CPU Address space: virtualization of memory The above to allow a program to run as if it is in its own private,

More information

CMU Storage Systems 20 Feb 2004 Fall 2005 Exam 1. Name: SOLUTIONS

CMU Storage Systems 20 Feb 2004 Fall 2005 Exam 1. Name: SOLUTIONS CMU 18 746 Storage Systems 20 Feb 2004 Fall 2005 Exam 1 Instructions Name: SOLUTIONS There are three (3) questions on the exam. You may find questions that could have several answers and require an explanation

More information

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng Bigtable: A Distributed Storage System for Structured Data Andrew Hon, Phyllis Lau, Justin Ng What is Bigtable? - A storage system for managing structured data - Used in 60+ Google services - Motivation:

More information

Outlook. File-System Interface Allocation-Methods Free Space Management

Outlook. File-System Interface Allocation-Methods Free Space Management File System Outlook File-System Interface Allocation-Methods Free Space Management 2 File System Interface File Concept File system is the most visible part of an OS Files storing related data Directory

More information

Computer Organization

Computer Organization University of Pune S.E. I.T. Subject code: 214442 Computer Organization Part 20 : Memory Organization Basics UNIT IV Tushar B. Kute, Department of Information Technology, Sandip Institute of Technology

More information

Chapter 8: Main Memory

Chapter 8: Main Memory Chapter 8: Main Memory Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and 64-bit Architectures Example:

More information

Desktop Crawls. Document Feeds. Document Feeds. Information Retrieval

Desktop Crawls. Document Feeds. Document Feeds. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Web crawlers Retrieving web pages Crawling the web» Desktop crawlers» Document feeds File conversion Storing the documents Removing noise Desktop Crawls! Used

More information

CISC 360. Cache Memories Exercises Dec 3, 2009

CISC 360. Cache Memories Exercises Dec 3, 2009 Topics ν CISC 36 Cache Memories Exercises Dec 3, 29 Review of cache memory mapping Cache Memories Cache memories are small, fast SRAM-based memories managed automatically in hardware. ν Hold frequently

More information

Armon HASHICORP

Armon HASHICORP Nomad Armon Dadgar @armon Cluster Manager Scheduler Nomad Cluster Manager Scheduler Nomad Schedulers map a set of work to a set of resources Work (Input) Resources Web Server -Thread 1 Web Server -Thread

More information

Nicholas Dritsas Principal Program Manager Microsoft Corporation Microsoft Corporation. All rights reserved

Nicholas Dritsas Principal Program Manager Microsoft Corporation Microsoft Corporation. All rights reserved Nicholas Dritsas Principal Program Manager Microsoft Corporation Who is SQL Customer Advisory Team (SQL CAT) Overview of large AS projects Lessons Learned People and Infrastructure Performance Improving

More information

The Memory Hierarchy 1

The Memory Hierarchy 1 The Memory Hierarchy 1 What is a cache? 2 What problem do caches solve? 3 Memory CPU Abstraction: Big array of bytes Memory memory 4 Performance vs 1980 Processor vs Memory Performance Memory is very slow

More information

MEMORY. Objectives. L10 Memory

MEMORY. Objectives. L10 Memory MEMORY Reading: Chapter 6, except cache implementation details (6.4.1-6.4.6) and segmentation (6.5.5) https://en.wikipedia.org/wiki/probability 2 Objectives Understand the concepts and terminology of hierarchical

More information

ELE 758 * DIGITAL SYSTEMS ENGINEERING * MIDTERM TEST * Circle the memory type based on electrically re-chargeable elements

ELE 758 * DIGITAL SYSTEMS ENGINEERING * MIDTERM TEST * Circle the memory type based on electrically re-chargeable elements ELE 758 * DIGITAL SYSTEMS ENGINEERING * MIDTERM TEST * Student name: Date: Example 1 Section: Memory hierarchy (SRAM, DRAM) Question # 1.1 Circle the memory type based on electrically re-chargeable elements

More information

HY225 Lecture 12: DRAM and Virtual Memory

HY225 Lecture 12: DRAM and Virtual Memory HY225 Lecture 12: DRAM and irtual Memory Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS May 16, 2011 Dimitrios S. Nikolopoulos Lecture 12: DRAM and irtual Memory 1 / 36 DRAM Fundamentals Random-access

More information

Study of NoSQL Database Along With Security Comparison

Study of NoSQL Database Along With Security Comparison Study of NoSQL Database Along With Security Comparison Ankita A. Mall [1], Jwalant B. Baria [2] [1] Student, Computer Engineering Department, Government Engineering College, Modasa, Gujarat, India ank.fetr@gmail.com

More information

ADVANCED REVIEW FOR MAGENTO 2

ADVANCED REVIEW FOR MAGENTO 2 1 User Guide Advanced Review for Magento 2 ADVANCED REVIEW FOR MAGENTO 2 USER GUIDE BSS COMMERCE 1 2 User Guide Advanced Review for Magento 2 Contents 1. Advanced Review for Magento 2 Overview... 3 2.

More information

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level

More information

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu NoSQL Databases MongoDB vs Cassandra Kenny Huynh, Andre Chik, Kevin Vu Introduction - Relational database model - Concept developed in 1970 - Inefficient - NoSQL - Concept introduced in 1980 - Related

More information

Index Construction. Dictionary, postings, scalable indexing, dynamic indexing. Web Search

Index Construction. Dictionary, postings, scalable indexing, dynamic indexing. Web Search Index Construction Dictionary, postings, scalable indexing, dynamic indexing Web Search 1 Overview Indexes Query Indexing Ranking Results Application Documents User Information analysis Query processing

More information