From architecture to algorithms: Lessons from the FAWN project
|
|
- Rudolf Norris
- 5 years ago
- Views:
Transcription
1 From architecture to algorithms: Lessons from the FAWN project David Andersen, Vijay Vasudevan, Michael Kaminsky*, Michael A. Kozuch*, Amar Phanishayee, Lawrence Tan, Jason Franklin, Iulian Moraru, Sang Kil Cha, Hyeontaek Lim, Bin Fan, Reinhard Munz, Nathan Wan, Jack Ferris, Hrishikesh Amur**, Wolfgang Richter, Michael Freedman***, Wyatt Lloyd***, Padmanabhan Pillali*, Dong Zhou Carnegie Mellon University *Intel Labs Pittsburgh ** Princeton University *** Georgia Tech
2 Power limits computing The Dalles Dam 2
3 Infrastructure: PUE 2000W 2005: : ~1.1 Leave it to industry 100% 100% Efficiency 1000W 300W Servers FAWNs Proportionality 20% 20% Combined W <100W 200W
4 Builds up to... Computation / total energy = 1/PUE * 1/SPUE * Computational Work/ Total Energy to Components 4
5 Efficiency vs load Source: WSC book 5
6 Points to make DVFS used to be effective; not so good anymore Core is decently proportional; rest of components aren t. 6
7 Slow CPU Flash SSD Less DRAM 7
8 Efficiency is not free Because speed is not cheap 8
9 Principles Key-Value Systems FAWN-KV Design Evaluation Gigahertz is not free Speed and power calculated from specification sheets Power includes system overhead (e.g., Ethernet) 2500 Instructions/sec/W in millions Instructions Joule Custom ARM Mote XScale 800Mhz Atom Z500 Xeon Instructions/sec in millions (Speed)
10 Principles Key-Value Systems FAWN-KV Design Evaluation The Memory Wall Transistors 1E+08 1E+07 1ms 1E+06 Disk Seek Have the soul of a capacitor Nanoseconds 1E+05 1E+04 1μs 1E+03 1E+02 1E+01 1ns 1E+00 Bridge gap: Caching, speculation, etc. DRAM Access CPU Cycle 1E Year Charge Here Moves charge carriers here Which lets current flow
11 Two pillars Gigahertz costs twice: Once for the switching speed Once for the memory wall Memory capacity costs (at least) once: Longer buses < efficient
12 Wimpy Nodes 1.6 GHz Dual- core Atom GB Flash SSD Only 1 GB DRAM! 12
13 Each decimal order of magnitude increase in parallelism requires a major redesign and rewrite of parallel code - Kathy Yelick 13
14 The FAWN Quad of Pain Load Balancing Parallelization Bigger Clusters Wimpy Nodes Hardware Specificity Memory Capacity
15 It s not just masochism Moore Dennard (Figures from Danowitz, Kelley, Mao, Stevenson, and Horowitz: CPU DB) All systems will face this challenge over time
16 FAWN: It started with a key-value store 16
17 Principles Key-Value Systems FAWN-KV Design Evaluation Key-value storage systems Critical infrastructure service Performance-conscious Random-access, read-mostly, hard to cache 17
18 Principles Key-Value Systems FAWN-KV Design Evaluation Small record, random access Select name,photo from users where uid=513542; Select wallpost from posts where pid= ; Select name,photo from users where uid=818503; Select wallpost from posts where pid= ; Select name,photo from users where uid=474488; Select name,photo from users where uid=124111; Select name,photo from users where uid=124566; Select name,photo from users where uid=42223; Select name,photo from users where uid=468883; Select wallpost from posts where pid= ; Select name,photo from users where uid=097788; Select wallpost from posts where pid= ; Select name,photo from users where uid=357845; 18
19 FAWN-DS FAWN-KV SILT Small Cache Cuckoo key-value Parallel, backend fast, store backend A cluster-distributed store memory-efficient Provable load one node hyper-optimized key-value balancing store optimized memcached for wimpy Minimizes using for low DRAM work on churn using a tiny cache nodes optimistic and flash and large flash cuckoo hashing Cuckoo Fawn-KV Cache Cache Fawn-DS SILT Fawn-DS SILT Fawn-DS SILT
20 Principles Key-Value Systems FAWN-KV Design Evaluation FAWN-DS and -KV: Key-value Storage System Goal: improve Queries/Joule Unique Challenges: Wimpy CPUs, limited DRAM Flash poor at small random writes Sustain performance during membership changes 500MHz CPU 256MB DRAM 4GB CompactFlash 20
21 Principles Key-Value Systems FAWN-KV Design Evaluation FAWN-KV Architecture Front-end Switch Back-end Back-end FAWN Back-end FAWN-DS FAWN Back-end FAWN-DS Back-end FAWN Back-end FAWN-DS Back-end FAWN Back-end Back-end FAWN Back-end FAWN-DS FAWN-DS 21
22 Principles Key-Value Systems FAWN-KV Design Evaluation Avoiding random writes In DRAM Hashtable Hash Index In Flash Data region Put K1,V Get K2 All writes to Flash are sequential 22
23 With limited DRAM KeyFrag!= Key Potential collisions! DRAM 160-bit key Flash Log Entry Key Len Data Low probability of multiple Flash reads Hashtable Hash Index Data region KeyFrag Valid. { 12 bytes per entry Offset (a) Partial-key hashing 23
24 Principles Key-Value Systems FAWN-KV Design Evaluation Evaluation Takeaways 2008: FAWN-based system 6x more efficient than traditional systems Partial-key hashing enabled memoryefficient DRAM index for flash-resident data Can create high-peformance, predictable storage service for small key-value pairs 24
25 And then we moved to Atom + SSD Geode 500Mhz 256MB 4GB CF Card ~2k IOPS 6x 8x 30-60x Atom 1.6 Ghz single-core 2GB 120GB SSD ~60k IOPS 25
26 FAWN-DS FAWN-KV SILT Small Cache Cuckoo backend store Fawn-DS SILT hyper-optimized for low DRAM and large flash Fawn-KV Fawn-DS SILT Fawn-DS SILT
27 Flash Must be Used Carefully Random reads / sec 48,000 Fast, but not THAT fast $ / GB 1.83 Space is precious Another long- standing problem: random writes are slow and bad for flash life (wearout)
28 Three Metrics to Minimize Memory overhead = Index size per entry Ideally 0 bytes/entry (no memory overhead) Read amplificadon = Flash reads per query Limits query throughput Ideally 1 (no wasted flash reads) Write amplificadon = Flash writes per entry Limits insert throughput Also reduces flash life expectancy Must be small enough for flash to last a few years
29 Read amplificadon SkimpyStash (2011) HashCache (2009) BufferHash (2010) FlashStore (2010) FAWN- DS (2009) Memory overhead (bytes/entry)
30 SkimpyStash FAWN- DS FlashStore HashCache BufferHash Memory efficiency High performance
31 (staec) External DicEonary DRAM Index Hash Index Flash Data Prior state of the art: EPH : ~3.8 bits/entry Ours: Entropy-coded tries, ~2.5 bits/entry Important considerations: Construction speed; query speed Aw, it s read-only...
32 SoluEon: (1) Three Stores with (2) New Index Data Structures Queries look up stores in sequence (from new to old) Inserts only go to Log Data are moved in background Entropy- Coded Tries (Memory efficient) SILT Filter SILT Log Index (Write friendly) Memory Flash Sorted 32
33 Workload: 90% GET (100~ M keys) + 10% PUT Caveat: Not on wimpies. SEll working on reducing CPU cost! :- )
34 And now... Load imbalance Distributed key- value system Atom CPU Backend1 SSD SLA: 850,000 queries/sec 1. get(key) 4. return val FrontEnd 10,000 queries/sec Backend2 3. val=lookup(key) 2. BackendID=hash(key) Back.. 85 Back
35 Measured tput on FAWN testbed Overall throughput (KQPS) uniform Zipf (1.01) adversarial n: number of nodes 35
36 Overall throughput (KQPS) uniform Zipf (1.01) adversarial Backend n: number of nodes Queries FrontEnd cache Backend2 Backend8 How many items to cache? Backend8 36
37 small/fast cache is enough! E.g., for 1KB (k,v) pair, 85 nodes, 3MB needed, fifng in CPU L3 cache Backend1 Queries FrontEnd cache Backend2 Backend8 We prove that, for n nodes - Only need to cache O(n log n) most popular entries - worst case perf. = (1 - ε) * n * single node capacity Backend8 37
38 Cache forces near- uniform dist. Popularity Cached Keys Uncached keys KeyID 38
39 Worst case? Now best case Overall throughput (KQPS) uniform Zipf (1.01) adversarial n: number of nodes Overall throughput (KQPS) uniform Zipf (1.01) adversarial n: number of nodes 39
40 Thus... 40
41 FAWN-DS FAWN-KV SILT Small Cache Cuckoo Wimpy servers [FAWN, SOSP 2009] Brawny server SILT Insanely SILT Fast Cache SILT [SILT, SOSP 2011] O(N log N) [ small cache socc 2011] Entropy-coded tries [SILT + under submission] Multi-reader parallel cuckoo hashing [under submission] Partial-key cuckoo hashing Cuckoo filter
42 Moore Dennard highly parallel, lower-ghz, (memoryconstrained?): Architectures, algorithms, and programming
FAWN. A Fast Array of Wimpy Nodes. David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan
FAWN A Fast Array of Wimpy Nodes David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan Carnegie Mellon University *Intel Labs Pittsburgh Energy in computing
More informationSystem and Algorithmic Adaptation for Flash
System and Algorithmic Adaptation for Flash The FAWN Perspective David G. Andersen, Vijay Vasudevan, Michael Kaminsky* Amar Phanishayee, Jason Franklin, Iulian Moraru, Lawrence Tan Carnegie Mellon University
More informationFAWN: A Fast Array of Wimpy Nodes
FAWN: A Fast Array of Wimpy Nodes David G. Andersen, Jason Franklin, Michael Kaminsky *, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan Carnegie Mellon University, * Intel Labs SOSP 09 CAS ICT Storage
More informationBe Fast, Cheap and in Control with SwitchKV Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Raghav Sethi Michael Kaminsky David G. Andersen Michael J. Freedman Goal: fast and cost-effective key-value store Target: cluster-level storage for
More informationSILT: A MEMORY-EFFICIENT, HIGH-PERFORMANCE KEY- VALUE STORE PRESENTED BY PRIYA SRIDHAR
SILT: A MEMORY-EFFICIENT, HIGH-PERFORMANCE KEY- VALUE STORE PRESENTED BY PRIYA SRIDHAR AGENDA INTRODUCTION Why SILT? MOTIVATION SILT KV STORAGE SYSTEM LOW READ AMPLIFICATION CONTROLLABLE WRITE AMPLIFICATION
More informationMemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing
MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing Bin Fan (CMU), Dave Andersen (CMU), Michael Kaminsky (Intel Labs) NSDI 2013 http://www.pdl.cmu.edu/ 1 Goal: Improve Memcached 1. Reduce space overhead
More informationFAWN as a Service. 1 Introduction. Jintian Liang CS244B December 13, 2017
Liang 1 Jintian Liang CS244B December 13, 2017 1 Introduction FAWN as a Service FAWN, an acronym for Fast Array of Wimpy Nodes, is a distributed cluster of inexpensive nodes designed to give users a view
More informationBe Fast, Cheap and in Control with SwitchKV. Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level
More informationPart II: Software Infrastructure in Data Centers: Key-Value Data Management Systems
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Software Infrastructure in Data Centers: Key-Value Data Management Systems 1 Key-Value Store Clients
More informationSILT: A Memory-Efficient, High- Performance Key-Value Store
SILT: A Memory-Efficient, High- Performance Key-Value Store SOSP 11 Presented by Fan Ni March, 2016 SILT is Small Index Large Tables which is a memory efficient high performance key value store system
More informationFAWN: A Fast Array of Wimpy Nodes
To appear in 22nd ACM Symposium on Operating Systems Principles (SOSP 09) This version is reformatted from the official version that appears in the conference proceedings. FAWN: A Fast Array of Wimpy Nodes
More informationImproving Datacenter Energy Efficiency Using a Fast Array of Wimpy Nodes
Improving Datacenter Energy Efficiency Using a Fast Array of Wimpy Nodes Vijay Vasudevan vrv+@cs.cmu.edu October 12, 2010 THESIS PROPOSAL Computer Science Department Carnegie Mellon University Pittsburgh,
More informationPart II: Software Infrastructure in Data Centers: Key-Value Data Management Systems
CSE 6350 File and Storage System Infrastructure in Data centers Supporting Internet-wide Services Part II: Software Infrastructure in Data Centers: Key-Value Data Management Systems 1 Key-Value Store Clients
More informationC 1. Recap. CSE 486/586 Distributed Systems Distributed File Systems. Traditional Distributed File Systems. Local File Systems.
Recap CSE 486/586 Distributed Systems Distributed File Systems Optimistic quorum Distributed transactions with replication One copy serializability Primary copy replication Read-one/write-all replication
More informationCS3600 SYSTEMS AND NETWORKS
CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 11: File System Implementation Prof. Alan Mislove (amislove@ccs.neu.edu) File-System Structure File structure Logical storage unit Collection
More informationHashCache: Cache Storage for the Next Billion
HashCache: Cache Storage for the Next Billion Anirudh Badam KyoungSoo Park Vivek S. Pai Larry L. Peterson Princeton University 1 Next Billion Internet Users 2 Next Billion Internet Users Schools, urban
More informationFAWN: A Fast Array of Wimpy Nodes By David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan
FAWN: A Fast Array of Wimpy Nodes By David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan doi:10.1145/1965724.1965747 Abstract This paper presents a
More informationCheap and Large CAMs for High Performance Data-Intensive Networked Systems- The Bufferhash KV Store
Cheap and Large CAMs for High Performance Data-Intensive Networked Systems- The Bufferhash KV Store Presented by Akhila Nookala M.S EE Wayne State University ECE7650 Scalable and Secure Internet Services
More informationKVZone and the Search for a Write-Optimized Key-Value Store
KVZone and the Search for a Write-Optimized Key-Value Store Salil Gokhale, Nitin Agrawal, Sean Noonan, Cristian Ungureanu NEC Laboratories America, Princeton, NJ {salil, nitin, sean, cristian}@nec-labs.com
More informationMlcached: Multi-level DRAM-NAND Key-value Cache
Mlcached: Multi-level DRAM-NAND Key-value Cache I. Stephen Choi, Byoung Young Ahn, and Yang-Suk Kee Samsung Memory Solutions Lab Abstract We present Mlcached, multi-level DRAM-NAND keyvalue cache, that
More informationEECS 482 Introduction to Operating Systems
EECS 482 Introduction to Operating Systems Winter 2018 Baris Kasikci Slides by: Harsha V. Madhyastha OS Abstractions Applications Threads File system Virtual memory Operating System Next few lectures:
More informationMemory Hierarchy Y. K. Malaiya
Memory Hierarchy Y. K. Malaiya Acknowledgements Computer Architecture, Quantitative Approach - Hennessy, Patterson Vishwani D. Agrawal Review: Major Components of a Computer Processor Control Datapath
More informationCSE 4/521 Introduction to Operating Systems. Lecture 23 File System Implementation II (Allocation Methods, Free-Space Management) Summer 2018
CSE 4/521 Introduction to Operating Systems Lecture 23 File System Implementation II (Allocation Methods, Free-Space Management) Summer 2018 Overview Objective: To discuss how the disk is managed for a
More informationPS2 out today. Lab 2 out today. Lab 1 due today - how was it?
6.830 Lecture 7 9/25/2017 PS2 out today. Lab 2 out today. Lab 1 due today - how was it? Project Teams Due Wednesday Those of you who don't have groups -- send us email, or hand in a sheet with just your
More informationChunkStash: Speeding Up Storage Deduplication using Flash Memory
ChunkStash: Speeding Up Storage Deduplication using Flash Memory Biplob Debnath +, Sudipta Sengupta *, Jin Li * * Microsoft Research, Redmond (USA) + Univ. of Minnesota, Twin Cities (USA) Deduplication
More informationPebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees
PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees Pandian Raju 1, Rohan Kadekodi 1, Vijay Chidambaram 1,2, Ittai Abraham 2 1 The University of Texas at Austin 2 VMware Research
More informationL7: Performance. Frans Kaashoek Spring 2013
L7: Performance Frans Kaashoek kaashoek@mit.edu 6.033 Spring 2013 Overview Technology fixes some performance problems Ride the technology curves if you can Some performance requirements require thinking
More informationMemory-Based Cloud Architectures
Memory-Based Cloud Architectures ( Or: Technical Challenges for OnDemand Business Software) Jan Schaffner Enterprise Platform and Integration Concepts Group Example: Enterprise Benchmarking -) *%'+,#$)
More informationNear Memory Key/Value Lookup Acceleration MemSys 2017
Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy
More informationComputer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg
Computer Architecture and System Software Lecture 09: Memory Hierarchy Instructor: Rob Bergen Applied Computer Science University of Winnipeg Announcements Midterm returned + solutions in class today SSD
More informationPerformance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Performance COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals What is Performance? How do we measure the performance of
More informationMulticore Programming
Multi Programming Parallel Hardware and Performance 8 Nov 00 (Part ) Peter Sewell Jaroslav Ševčík Tim Harris Merge sort 6MB input (-bit integers) Recurse(left) ~98% execution time Recurse(right) Merge
More informationCaches. Han Wang CS 3410, Spring 2012 Computer Science Cornell University. See P&H 5.1, 5.2 (except writes)
Caches Han Wang CS 3410, Spring 2012 Computer Science Cornell University See P&H 5.1, 5.2 (except writes) This week: Announcements PA2 Work-in-progress submission Next six weeks: Two labs and two projects
More informationQuiz for Chapter 6 Storage and Other I/O Topics 3.10
Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: 1. [6 points] Give a concise answer to each of the following
More informationSSDs vs HDDs for DBMS by Glen Berseth York University, Toronto
SSDs vs HDDs for DBMS by Glen Berseth York University, Toronto So slow So cheap So heavy So fast So expensive So efficient NAND based flash memory Retains memory without power It works by trapping a small
More informationA Comparison of Performance and Accuracy of Measurement Algorithms in Software
A Comparison of Performance and Accuracy of Measurement Algorithms in Software Omid Alipourfard, Masoud Moshref 1, Yang Zhou 2, Tong Yang 2, Minlan Yu 3 Yale University, Barefoot Networks 1, Peking University
More informationCS429: Computer Organization and Architecture
CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: November 28, 2017 at 14:31 CS429 Slideset 18: 1 Random-Access Memory
More informationCS429: Computer Organization and Architecture
CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: April 9, 2018 at 12:16 CS429 Slideset 17: 1 Random-Access Memory
More informationLow memory Map-Reduce
Low memory -Reduce Hrishikesh Amur, Karsten Schwan, Georgia Tech. Wolf Richter, Athula Balachandran, Erik Zawadzki, Dave Andersen, CMU Michael Kaminsky, Intel 1 Datasets are growing People see value (even
More informationRAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University
RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel Rosenblum and John Ousterhout) a Storage System
More informationDisks and RAID. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, E. Sirer, R. Van Renesse]
Disks and RAID CS 4410 Operating Systems [R. Agarwal, L. Alvisi, A. Bracy, E. Sirer, R. Van Renesse] Storage Devices Magnetic disks Storage that rarely becomes corrupted Large capacity at low cost Block
More informationMemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
To appear in Proceedings of the 1th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), Lombard, IL, April 213 MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
More informationMemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing Bin Fan, David G. Andersen, Michael Kaminsky Carnegie Mellon University, Intel Labs CMU-PDL-12-116 November 2012 Parallel
More informationThe Memory Hierarchy 1
The Memory Hierarchy 1 What is a cache? 2 What problem do caches solve? 3 Memory CPU Abstraction: Big array of bytes Memory memory 4 Performance vs 1980 Processor vs Memory Performance Memory is very slow
More informationCS252 S05. CMSC 411 Computer Systems Architecture Lecture 18 Storage Systems 2. I/O performance measures. I/O performance measures
CMSC 411 Computer Systems Architecture Lecture 18 Storage Systems 2 I/O performance measures I/O performance measures diversity: which I/O devices can connect to the system? capacity: how many I/O devices
More informationScalable Enterprise Networks with Inexpensive Switches
Scalable Enterprise Networks with Inexpensive Switches Minlan Yu minlanyu@cs.princeton.edu Princeton University Joint work with Alex Fabrikant, Mike Freedman, Jennifer Rexford and Jia Wang 1 Enterprises
More informationChapter 12: File System Implementation
Chapter 12: File System Implementation Silberschatz, Galvin and Gagne 2013 Chapter 12: File System Implementation File-System Structure File-System Implementation Allocation Methods Free-Space Management
More informationDifferential RAID: Rethinking RAID for SSD Reliability
Differential RAID: Rethinking RAID for SSD Reliability Mahesh Balakrishnan Asim Kadav 1, Vijayan Prabhakaran, Dahlia Malkhi Microsoft Research Silicon Valley 1 The University of Wisconsin-Madison Solid
More informationEI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)
EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture) Dept. of Computer Science & Engineering Chentao Wu wuct@cs.sjtu.edu.cn Download lectures ftp://public.sjtu.edu.cn User:
More informationBUFFER HASH KV TABLE
BUFFER HASH KV TABLE CHEAP AND LARGE CAMS FOR HIGH PERFORMANCE DATA-INTENSIVE NETWORKED SYSTEMS PAPER BY ASHOK ANAND, CHITRA MUTHUKRISHNAN, STEVEN KAPPES, ADITYA AKELLA AND SUMAN NATH PRESENTED BY PRAMOD
More informationCS 201 The Memory Hierarchy. Gerson Robboy Portland State University
CS 201 The Memory Hierarchy Gerson Robboy Portland State University memory hierarchy overview (traditional) CPU registers main memory (RAM) secondary memory (DISK) why? what is different between these
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More information[537] Flash. Tyler Harter
[537] Flash Tyler Harter Flash vs. Disk Disk Overview I/O requires: seek, rotate, transfer Inherently: - not parallel (only one head) - slow (mechanical) - poor random I/O (locality around disk head) Random
More informationAssignment 1 due Mon (Feb 4pm
Announcements Assignment 1 due Mon (Feb 19) @ 4pm Next week: no classes Inf3 Computer Architecture - 2017-2018 1 The Memory Gap 1.2x-1.5x 1.07x H&P 5/e, Fig. 2.2 Memory subsystem design increasingly important!
More informationLecture-14 (Memory Hierarchy) CS422-Spring
Lecture-14 (Memory Hierarchy) CS422-Spring 2018 Biswa@CSE-IITK The Ideal World Instruction Supply Pipeline (Instruction execution) Data Supply - Zero-cycle latency - Infinite capacity - Zero cost - Perfect
More informationBlueDBM: An Appliance for Big Data Analytics*
BlueDBM: An Appliance for Big Data Analytics* Arvind *[ISCA, 2015] Sang-Woo Jun, Ming Liu, Sungjin Lee, Shuotao Xu, Arvind (MIT) and Jamey Hicks, John Ankcorn, Myron King(Quanta) BigData@CSAIL Annual Meeting
More informationSecondary storage. CS 537 Lecture 11 Secondary Storage. Disk trends. Another trip down memory lane
Secondary storage CS 537 Lecture 11 Secondary Storage Michael Swift Secondary storage typically: is anything that is outside of primary memory does not permit direct execution of instructions or data retrieval
More informationMain-Memory Databases 1 / 25
1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low
More informationPerformance Analysis in the Real World of Online Services
Performance Analysis in the Real World of Online Services Dileep Bhandarkar, Ph. D. Distinguished Engineer 2009 IEEE International Symposium on Performance Analysis of Systems and Software My Background:
More informationThe Right Read Optimization is Actually Write Optimization. Leif Walsh
The Right Read Optimization is Actually Write Optimization Leif Walsh leif@tokutek.com The Right Read Optimization is Write Optimization Situation: I have some data. I want to learn things about the world,
More informationChapter 11: Implementing File Systems
Chapter 11: Implementing File Systems Operating System Concepts 99h Edition DM510-14 Chapter 11: Implementing File Systems File-System Structure File-System Implementation Directory Implementation Allocation
More informationNo Tradeoff Low Latency + High Efficiency
No Tradeoff Low Latency + High Efficiency Christos Kozyrakis http://mast.stanford.edu Latency-critical Applications A growing class of online workloads Search, social networking, software-as-service (SaaS),
More informationChapter 11: Implementing File
Chapter 11: Implementing File Systems Chapter 11: Implementing File Systems File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management Efficiency
More informationRandom-Access Memory (RAM) CS429: Computer Organization and Architecture. SRAM and DRAM. Flash / RAM Summary. Storage Technologies
Random-ccess Memory (RM) CS429: Computer Organization and rchitecture Dr. Bill Young Department of Computer Science University of Texas at ustin Key Features RM is packaged as a chip The basic storage
More informationChapter 11: Implementing File Systems. Operating System Concepts 9 9h Edition
Chapter 11: Implementing File Systems Operating System Concepts 9 9h Edition Silberschatz, Galvin and Gagne 2013 Chapter 11: Implementing File Systems File-System Structure File-System Implementation Directory
More informationDell PowerEdge R730xd Servers with Samsung SM1715 NVMe Drives Powers the Aerospike Fraud Prevention Benchmark
Dell PowerEdge R730xd Servers with Samsung SM1715 NVMe Drives Powers the Aerospike Fraud Prevention Benchmark Testing validation report prepared under contract with Dell Introduction As innovation drives
More informationJinho Hwang and Timothy Wood George Washington University
Jinho Hwang and Timothy Wood George Washington University Background: Memory Caching Two orders of magnitude more reads than writes Solution: Deploy memcached hosts to handle the read capacity 6. HTTP
More informationStorage Technologies and the Memory Hierarchy
Storage Technologies and the Memory Hierarchy 198:231 Introduction to Computer Organization Lecture 12 Instructor: Nicole Hynes nicole.hynes@rutgers.edu Credits: Slides courtesy of R. Bryant and D. O Hallaron,
More informationPart II: Data Center Software Architecture: Topic 2: Key-value Data Management Systems. SkimpyStash: Key Value Store on Flash-based Storage
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 2: Key-value Data Management Systems SkimpyStash: Key Value
More informationShifting Gears with SSDs
Shifting Gears with SSDs SNIA Developers Conference Denis Vilfort 2008 Sun Microsystems, Inc. 1 What We ll Cover In the next 45 minutes Why SSDs Now? Were do SSDs Fit? SSDs Impact on Networked Storage
More informationCHAPTER 11: IMPLEMENTING FILE SYSTEMS (COMPACT) By I-Chen Lin Textbook: Operating System Concepts 9th Ed.
CHAPTER 11: IMPLEMENTING FILE SYSTEMS (COMPACT) By I-Chen Lin Textbook: Operating System Concepts 9th Ed. File-System Structure File structure Logical storage unit Collection of related information File
More informationBuilding High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL
Building High Performance Apps using NoSQL Swami Sivasubramanian General Manager, AWS NoSQL Building high performance apps There is a lot to building high performance apps Scalability Performance at high
More informationNext-Generation Cloud Platform
Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology
More informationIJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 12, 2016 ISSN (online):
IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 12, 2016 ISSN (online): 2321-0613 Search Engine Big Data Management and Computing RagavanN 1 Athinarayanan S 2 1 M.Tech
More informationAlbis: High-Performance File Format for Big Data Systems
Albis: High-Performance File Format for Big Data Systems Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle, Adrian Schuepbach, Bernard Metzler, IBM Research, Zurich 2018 USENIX Annual Technical Conference
More informationChe-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University
Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University Chapter 10: File System Chapter 11: Implementing File-Systems Chapter 12: Mass-Storage
More informationA Comparative Study of Microsoft Exchange 2010 on Dell PowerEdge R720xd with Exchange 2007 on Dell PowerEdge R510
A Comparative Study of Microsoft Exchange 2010 on Dell PowerEdge R720xd with Exchange 2007 on Dell PowerEdge R510 Incentives for migrating to Exchange 2010 on Dell PowerEdge R720xd Global Solutions Engineering
More informationComparing Performance of Solid State Devices and Mechanical Disks
Comparing Performance of Solid State Devices and Mechanical Disks Jiri Simsa Milo Polte, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University Motivation Performance gap [Pugh71] technology
More informationEmulex LPe16000B 16Gb Fibre Channel HBA Evaluation
Demartek Emulex LPe16000B 16Gb Fibre Channel HBA Evaluation Evaluation report prepared under contract with Emulex Executive Summary The computing industry is experiencing an increasing demand for storage
More informationNon-Volatile Memory Through Customized Key-Value Stores
Non-Volatile Memory Through Customized Key-Value Stores Leonardo Mármol 1 Jorge Guerra 2 Marcos K. Aguilera 2 1 Florida International University 2 VMware L. Mármol, J. Guerra, M. K. Aguilera (FIU and VMware)
More informationICS RESEARCH TECHNICAL TALK DRAKE TETREAULT, ICS H197 FALL 2013
ICS RESEARCH TECHNICAL TALK DRAKE TETREAULT, ICS H197 FALL 2013 TOPIC: RESEARCH PAPER Title: Data Management for SSDs for Large-Scale Interactive Graphics Applications Authors: M. Gopi, Behzad Sajadi,
More informationUnblinding the OS to Optimize User-Perceived Flash SSD Latency
Unblinding the OS to Optimize User-Perceived Flash SSD Latency Woong Shin *, Jaehyun Park **, Heon Y. Yeom * * Seoul National University ** Arizona State University USENIX HotStorage 2016 Jun. 21, 2016
More informationTowards Energy-Proportional Datacenter Memory with Mobile DRAM
Towards Energy-Proportional Datacenter Memory with Mobile DRAM Krishna Malladi 1 Frank Nothaft 1 Karthika Periyathambi Benjamin Lee 2 Christos Kozyrakis 1 Mark Horowitz 1 Stanford University 1 Duke University
More informationIntroduction to Database Services
Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to
More informationCascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching
Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value
More informationLocality and The Fast File System. Dongkun Shin, SKKU
Locality and The Fast File System 1 First File System old UNIX file system by Ken Thompson simple supported files and the directory hierarchy Kirk McKusick The problem: performance was terrible. Performance
More informationS. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda. The Ohio State University
Architecture for Caching Responses with Multiple Dynamic Dependencies in Multi-Tier Data- Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda The Ohio State University
More informationOPERATING SYSTEM. Chapter 12: File System Implementation
OPERATING SYSTEM Chapter 12: File System Implementation Chapter 12: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management
More informationECE 486/586. Computer Architecture. Lecture # 2
ECE 486/586 Computer Architecture Lecture # 2 Spring 2015 Portland State University Recap of Last Lecture Old view of computer architecture: Instruction Set Architecture (ISA) design Real computer architecture:
More informationMemory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology
Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast
More informationHyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University
HyperDex A Distributed, Searchable Key-Value Store Robert Escriva Bernard Wong Emin Gün Sirer Department of Computer Science Cornell University School of Computer Science University of Waterloo ACM SIGCOMM
More informationCSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI Contents 1.7 - End of Chapter 1 Power wall The multicore era
More informationCuckoo Filter: Practically Better Than Bloom
Cuckoo Filter: Practically Better Than Bloom Bin Fan (CMU/Google) David Andersen (CMU) Michael Kaminsky (Intel Labs) Michael Mitzenmacher (Harvard) 1 What is Bloom Filter? A Compact Data Structure Storing
More informationLECTURE 1. Introduction
LECTURE 1 Introduction CLASSES OF COMPUTERS When we think of a computer, most of us might first think of our laptop or maybe one of the desktop machines frequently used in the Majors Lab. Computers, however,
More informationAlgorithm Performance Factors. Memory Performance of Algorithms. Processor-Memory Performance Gap. Moore s Law. Program Model of Memory I
Memory Performance of Algorithms CSE 32 Data Structures Lecture Algorithm Performance Factors Algorithm choices (asymptotic running time) O(n 2 ) or O(n log n) Data structure choices List or Arrays Language
More informationE-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems
E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andrew Pavlo, Michael
More informationUMBC. Rubini and Corbet, Linux Device Drivers, 2nd Edition, O Reilly. Systems Design and Programming
Systems Design and Programming Instructor: Professor Jim Plusquellic Text: Barry B. Brey, The Intel Microprocessors, 8086/8088, 80186/80188, 80286, 80386, 80486, Pentium and Pentium Pro Processor Architecture,
More informationOverview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM
Memories Overview Memory Classification Read-Only Memory (ROM) Types of ROM PROM, EPROM, E 2 PROM Flash ROMs (Compact Flash, Secure Digital, Memory Stick) Random Access Memory (RAM) Types of RAM Static
More informationCS 310: Memory Hierarchy and B-Trees
CS 310: Memory Hierarchy and B-Trees Chris Kauffman Week 14-1 Matrix Sum Given an M by N matrix X, sum its elements M rows, N columns Sum R given X, M, N sum = 0 for i=0 to M-1{ for j=0 to N-1 { sum +=
More information