Closing the Performance Gap Between Volatile and Persistent K-V Stores
|
|
- Ambrose Marsh
- 5 years ago
- Views:
Transcription
1 Closing the Performance Gap Between Volatile and Persistent K-V Stores Yihe Huang, Harvard University Matej Pavlovic, EPFL Virendra Marathe, Oracle Labs Margo Seltzer, Oracle Labs Tim Harris, Oracle Labs Steve Byan, Oracle Labs ATC 2018, 07/13/2018, Boston MA, USA 1
2 Key-Value Stores: Persistent or Fast? Persistent Slow or Volatile Fast 2
3 Key-Value Stores: Persistent or Fast? Persistent Slow or Volatile Fast What about both? 2
4 Non-Volatile Memory Faster than HDD/SSD Byte-addressable 3
5 Non-Volatile Memory Faster than HDD/SSD Slower than DRAM Byte-addressable Tricky to manage consistently 3
6 Non-Volatile Memory Faster than HDD/SSD Slower than DRAM Byte-addressable Tricky to manage consistently DRAM K-V stores don t easily translate to NVM 3
7 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store 4
8 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store 5
9 Caching Volatile store (frontend) caches Persistent store (backend) 6
10 Caching Volatile store (frontend) caches Persistent store (backend) NVM 7
11 Caching Volatile store (frontend) How? caches Persistent store (backend) NVM 7
12 Reading: Standard Volatile store (frontend) Lookup on cache miss Persistent store (backend) NVM 8
13 Writing: Logs Volatile store (frontend) 1. Persistently log op. 2. Update frontend 3. Return to client 4. Update backend in background Persistent store (backend) NVM 9
14 Writing: How? Logs Volatile store (frontend) 1. Persistently log op. 2. Update frontend 3. Return to client 4. Update backend in background Persistent store (backend) NVM 9
15 Writing: How? Logs Volatile store (frontend) 1. Persistently log op. 2. Update frontend 3. Return to client 4. Update backend in background Persistent store (backend) NVM Challenges: persistence, low latency, scalability, consistency 9
16 Cross-Referencing Logs Persistence: Place logs in NVM Low latency Few NVM accesses per log append Scalability Multiple logs (one per thread) Consistency Cross-log references 10
17 Cross-Referencing Logs Persistence: Place logs in NVM Low latency Few NVM accesses per log append Scalability Multiple logs (one per thread) Consistency Cross-log references 10
18 Cross-Referencing Logs Log entry: Key OP Applied Prev append consume 11
19 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 1, OP: put A append consume 11
20 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 1, OP: put A append Key 2, OP: put B consume 11
21 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume 11
22 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume 11
23 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 3, OP: delete 11
24 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete 11
25 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete 11
26 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete 12
27 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 13
28 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 14
29 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 15
30 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 16
31 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B Applied, Pr: consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 17
32 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Applied, Pr: Key 1, OP: put A append Key 2, OP: put B Applied, Pr: consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 18
33 Cross-Referencing Logs Log entry: Key OP Applied Prev Key 2, OP: delete Applied, Pr: Key 1, OP: put A append Key 2, OP: put B Applied, Pr: consume Key 2, OP: put C Applied, Pr: Key 3, OP: delete Applied, Pr: 19
34 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store 20
35 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store Basic design Optimizations 20
36 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store Basic design Optimizations 21
37 Bullet K-V Store Volatile store (frontend) caches Persistent store (backend) NVM 22
38 Bullet K-V Store Volatile store (frontend) caches Persistent store (backend) NVM 23
39 Bullet K-V Store Volatile store (frontend) Persistent store (backend) caches 24
40 Experimental Setup 16 CPU cores, 512 GB DRAM Intel s NVM emulation platform Zipfian distribution of keys (YCSB) Measuring 99%ile latency Comparing against HiKV 25
41 Raw Hash Table Performance 26
42 Bullet K-V Store Volatile store (frontend) Persistent store (backend) caches 27
43 Bullet K-V Store Volatile store (frontend) Persistent store (backend) X-Ref Logs 28
44 Bullet K-V Store Volatile store (frontend) Writers Persistent store (backend) X-Ref Logs 29
45 Bullet K-V Store Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 29
46 Latency in ns Bullet Performance Read-only volatile phash Throughput in ops/sec 30
47 Latency in ns Bullet Performance Read-only volatile phash hikv-ht Throughput in ops/sec 31
48 Latency in ns Bullet Performance Read-only volatile phash hikv-ht bullet-st Throughput in ops/sec 32
49 Latency in ns Bullet Performance % writes volatile phash hikv-ht bullet-st Throughput in ops/sec 33
50 Outline Cross-Referencing Logs: Getting the best of DRAM and NVM Bullet: Fast and persistent key-value store Basic design Optimizations 34
51 Optimization: Dynamic Worker Threads Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 35
52 Optimization: Dynamic Worker Threads Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 36
53 Read-Heavy Workload Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 37
54 Write-Heavy Workload Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 38
55 Optimization: Overwriting Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 39
56 Optimization: Overwriting Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 40
57 Optimization: Overwriting Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 41
58 Optimization: Overwriting Log entry: Key OP Applied Prev Key 2, OP: delete Key 1, OP: put A append Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 42
59 Optimization: Overwriting Log entry: Key OP Applied Prev append IGNORE Key 2, OP: delete Key 1, OP: put A IGNORE Key 2, OP: put B consume Key 2, OP: put C Key 3, OP: delete Applied, Pr: 43
60 Optimization: Overwriting Volatile store (frontend) Writers Gleaners Persistent store (backend) X-Ref Logs 44
61 Latency in ns Bullet Performance Read-only volatile phash hikv-ht bullet-st Throughput in ops/sec 45
62 Latency in ns Bullet Performance Read-only volatile phash hikv-ht bullet-st bullet-dyn-ovr Throughput in ops/sec 46
63 Latency in ns Bullet Performance % writes volatile phash hikv-ht bullet-st bullet-dyn-ovr Throughput in ops/sec 47
64 Latency in ns Bullet Performance % writes volatile phash hikv-ht bullet-st bullet-dyn-ovr Throughput in ops/sec 48
65 Summary DRAM cache for NVM DRAM frontend, NVM backend Novel caching scheme: Cross-referencing logs 49
66 Summary DRAM cache for NVM DRAM frontend, NVM backend Novel caching scheme: Cross-referencing logs Bullet K-V store Almost identical hash tables in DRAM and NVM DRAM performance for reads Substantial latency improvements for writes 49
67 Summary DRAM cache for NVM DRAM frontend, NVM backend Novel caching scheme: Cross-referencing logs Bullet K-V store Almost identical hash tables in DRAM and NVM DRAM performance for reads Substantial latency improvements for writes More details in the paper! 49
68 Backup Slides 50
69 Optimization Summary Dynamic worker threads Overwrites Non-blocking reads (see paper) Backend access off critical path (see paper) 51
70 Bullet Performance Read-only 52
71 Bullet Performance 2% writes 53
72 Bullet Performance 15% writes 54
73 Bullet Performance 50% writes 55
74 Bullet Performance (end-to-end) Read-only 56
75 Bullet Performance (end-to-end) 2% writes 57
76 Bullet Performance (end-to-end) 15% writes 58
77 Bullet Performance (end-to-end) 50% writes 59
78 Latency Distribution Reads 60
79 Latency Distribution Writes 61
80 Log Size Senzitivity 62
81 Experimental Setup 16 CPU cores 512 GB DRAM NVM simulated using DRAM 384 GB (out of 512GB) 300ns load (2x DRAM) 100ns persist barrier Same throughput DRAM cache fits all data Zipfian distribution of keys (YCSB) 50M K/V pairs, 16 B keys, 100 B values Measuring 99%ile latency 63
82 Dynamic Worker Thread Switching 64
83 Cross-Referencing Logs Persistence: Circular buffers in NVM Low latency Fast appends (3 persist barriers) Scalability Multiple logs, stall only when all full Coarse-grained synchronization Epoch-based space reclamation Consistency Cross-log references 65
84 Cross-Referencing Logs Summary Persistent circular buffers (in NVM) Cross-log pointers for consistency Coarse-grained synchronization Fast appends Epoch-based space reclamation 66
85 Latency in ns Read-only volatile phash hikv-ht bullet-st bullet-dyn bullet-dyn-ovr Throughput in ops/sec 67
Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant
Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model
More informationModule Outline. CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies.
M6 Memory Hierarchy Module Outline CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies. Events on a Cache Miss Events on a Cache Miss Stall the pipeline.
More informationSoftware and Tools for HPE s The Machine Project
Labs Software and Tools for HPE s The Machine Project Scalable Tools Workshop Aug/1 - Aug/4, 2016 Lake Tahoe Milind Chabbi Traditional Computing Paradigm CPU DRAM CPU DRAM CPU-centric computing 2 CPU-Centric
More informationDalí: A Periodically Persistent Hash Map
Dalí: A Periodically Persistent Hash Map Faisal Nawab* 1, Joseph Izraelevitz* 2, Terence Kelly*, Charles B. Morrey III*, Dhruva R. Chakrabarti*, and Michael L. Scott 2 1 Department of Computer Science
More informationStrata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson
A Cross Media File System Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson 1 Let s build a fast server NoSQL store, Database, File server, Mail server Requirements
More informationNon-Volatile Memory Through Customized Key-Value Stores
Non-Volatile Memory Through Customized Key-Value Stores Leonardo Mármol 1 Jorge Guerra 2 Marcos K. Aguilera 2 1 Florida International University 2 VMware L. Mármol, J. Guerra, M. K. Aguilera (FIU and VMware)
More informationPebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees
PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees Pandian Raju 1, Rohan Kadekodi 1, Vijay Chidambaram 1,2, Ittai Abraham 2 1 The University of Texas at Austin 2 VMware Research
More informationAn Analysis of Persistent Memory Use with WHISPER
An Analysis of Persistent Memory Use with WHISPER Sanketh Nalli, Swapnil Haria, Michael M. Swift, Mark D. Hill, Haris Volos*, Kimberly Keeton* University of Wisconsin- Madison & *Hewlett- Packard Labs
More informationRequest-Oriented Durable Write Caching for Application Performance appeared in USENIX ATC '15. Jinkyu Jeong Sungkyunkwan University
Request-Oriented Durable Write Caching for Application Performance appeared in USENIX ATC '15 Jinkyu Jeong Sungkyunkwan University Introduction Volatile DRAM cache is ineffective for write Writes are dominant
More informationHigh-Performance Key-Value Store on OpenSHMEM
High-Performance Key-Value Store on OpenSHMEM Huansong Fu*, Manjunath Gorentla Venkata, Ahana Roy Choudhury*, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline Background
More informationRedesigning LSMs for Nonvolatile Memory with NoveLSM
Redesigning LSMs for Nonvolatile Memory with NoveLSM Sudarsun Kannan, University of Wisconsin-Madison; Nitish Bhat and Ada Gavrilovska, Georgia Tech; Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau, University
More informationNV-Tree Reducing Consistency Cost for NVM-based Single Level Systems
NV-Tree Reducing Consistency Cost for NVM-based Single Level Systems Jun Yang 1, Qingsong Wei 1, Cheng Chen 1, Chundong Wang 1, Khai Leong Yong 1 and Bingsheng He 2 1 Data Storage Institute, A-STAR, Singapore
More informationAn Analysis of Persistent Memory Use with WHISPER
An Analysis of Persistent Memory Use with WHISPER Sanketh Nalli, Swapnil Haria, Michael M. Swift, Mark D. Hill, Haris Volos*, Kimberly Keeton* University of Wisconsin- Madison & *Hewlett- Packard Labs
More informationBzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory
BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory JOY ARULRAJ JUSTIN LEVANDOSKI UMAR FAROOQ MINHAS PER-AKE LARSON Microsoft Research NON-VOLATILE MEMORY [NVM] PERFORMANCE DRAM VOLATILE
More informationSTORAGE LATENCY x. RAMAC 350 (600 ms) NAND SSD (60 us)
1 STORAGE LATENCY 2 RAMAC 350 (600 ms) 1956 10 5 x NAND SSD (60 us) 2016 COMPUTE LATENCY 3 RAMAC 305 (100 Hz) 1956 10 8 x 1000x CORE I7 (1 GHZ) 2016 NON-VOLATILE MEMORY 1000x faster than NAND 3D XPOINT
More informationPortable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI
Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI Huansong Fu*, Manjunath Gorentla Venkata, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline
More informationHardware Support for NVM Programming
Hardware Support for NVM Programming 1 Outline Ordering Transactions Write endurance 2 Volatile Memory Ordering Write-back caching Improves performance Reorders writes to DRAM STORE A STORE B CPU CPU B
More informationData Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research
Data Processing at the Speed of 100 Gbps using Apache Crail Patrick Stuedi IBM Research The CRAIL Project: Overview Data Processing Framework (e.g., Spark, TensorFlow, λ Compute) Spark-IO Albis Pocket
More informationScalable and Secure Internet Services and Architecture PRESENTATION REPORT Semester: Winter 2015 Course: ECE 7650
Scalable and Secure Internet Services and Architecture PRESENTATION REPORT Semester: Winter 2015 Course: ECE 7650 SUBMITTED BY: Yashwanth Boddu fq9316@wayne.edu (1) Our base design uses less than 1 byte
More informationAccelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740
Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740 A performance study with NVDIMM-N Dell EMC Engineering September 2017 A Dell EMC document category Revisions Date
More informationRequest-Oriented Durable Write Caching for Application Performance
Request-Oriented Durable Write Caching for Application Performance Sangwook Kim 1, Hwanju Kim 2, Sang-Hoon Kim 3, Joonwon Lee 1, and Jinkyu Jeong 1 Sungkyunkwan University 1 University of Cambridge 2 Korea
More informationSoft Updates Made Simple and Fast on Non-volatile Memory
Soft Updates Made Simple and Fast on Non-volatile Memory Mingkai Dong, Haibo Chen Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University @ NVMW 18 Non-volatile Memory (NVM) ü Non-volatile
More informationA New Key-Value Data Store For Heterogeneous Storage Architecture
A New Key-Value Data Store For Heterogeneous Storage Architecture brien.porter@intel.com wanyuan.yang@intel.com yuan.zhou@intel.com jian.zhang@intel.com Intel APAC R&D Ltd. 1 Agenda Introduction Background
More informationFOEDUS: OLTP Engine for a Thousand Cores and NVRAM
FOEDUS: OLTP Engine for a Thousand Cores and NVRAM Hideaki Kimura HP Labs, Palo Alto, CA Slides By : Hideaki Kimura Presented By : Aman Preet Singh Next-Generation Server Hardware? HP The Machine UC Berkeley
More informationTailwind: Fast and Atomic RDMA-based Replication. Yacine Taleb, Ryan Stutsman, Gabriel Antoniu, Toni Cortes
Tailwind: Fast and Atomic RDMA-based Replication Yacine Taleb, Ryan Stutsman, Gabriel Antoniu, Toni Cortes In-Memory Key-Value Stores General purpose in-memory key-value stores are widely used nowadays
More informationCascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching
Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value
More informationFine-grained Metadata Journaling on NVM
32nd International Conference on Massive Storage Systems and Technology (MSST 2016) May 2-6, 2016 Fine-grained Metadata Journaling on NVM Cheng Chen, Jun Yang, Qingsong Wei, Chundong Wang, and Mingdi Xue
More informationData Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research
Data Processing at the Speed of 100 Gbps using Apache Crail Patrick Stuedi IBM Research The CRAIL Project: Overview Data Processing Framework (e.g., Spark, TensorFlow, λ Compute) Spark-IO FS Albis Streaming
More informationPASTE: A Network Programming Interface for Non-Volatile Main Memory
PASTE: A Network Programming Interface for Non-Volatile Main Memory Michio Honda (NEC Laboratories Europe) Giuseppe Lettieri (Università di Pisa) Lars Eggert and Douglas Santry (NetApp) USENIX NSDI 2018
More informationBlurred Persistence in Transactional Persistent Memory
Blurred Persistence in Transactional Persistent Memory Youyou Lu, Jiwu Shu, Long Sun Tsinghua University Overview Problem: high performance overhead in ensuring storage consistency of persistent memory
More informationMemory: Page Table Structure. CSSE 332 Operating Systems Rose-Hulman Institute of Technology
Memory: Page Table Structure CSSE 332 Operating Systems Rose-Hulman Institute of Technology General address transla+on CPU virtual address data cache MMU Physical address Global memory Memory management
More informationModule 1: Basics and Background Lecture 4: Memory and Disk Accesses. The Lecture Contains: Memory organisation. Memory hierarchy. Disks.
The Lecture Contains: Memory organisation Example of memory hierarchy Memory hierarchy Disks Disk access Disk capacity Disk access time Typical disk parameters Access times file:///c /Documents%20and%20Settings/iitkrana1/My%20Documents/Google%20Talk%20Received%20Files/ist_data/lecture4/4_1.htm[6/14/2012
More informationHigh Performance Transactions in Deuteronomy
High Performance Transactions in Deuteronomy Justin Levandoski, David Lomet, Sudipta Sengupta, Ryan Stutsman, and Rui Wang Microsoft Research Overview Deuteronomy: componentized DB stack Separates transaction,
More informationRAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University
RAMCloud and the Low- Latency Datacenter John Ousterhout Stanford University Most important driver for innovation in computer systems: Rise of the datacenter Phase 1: large scale Phase 2: low latency Introduction
More informationNVthreads: Practical Persistence for Multi-threaded Applications
NVthreads: Practical Persistence for Multi-threaded Applications Terry Hsu*, Purdue University Helge Brügner*, TU München Indrajit Roy*, Google Inc. Kimberly Keeton, Hewlett Packard Labs Patrick Eugster,
More informationLog-Free Concurrent Data Structures
Log-Free Concurrent Data Structures Abstract Tudor David IBM Research Zurich udo@zurich.ibm.com Rachid Guerraoui EPFL rachid.guerraoui@epfl.ch Non-volatile RAM (NVRAM) makes it possible for data structures
More informationSystem Software for Persistent Memory
System Software for Persistent Memory Subramanya R Dulloor, Sanjay Kumar, Anil Keshavamurthy, Philip Lantz, Dheeraj Reddy, Rajesh Sankaran and Jeff Jackson 72131715 Neo Kim phoenixise@gmail.com Contents
More informationWrite-Optimized and High-Performance Hashing Index Scheme for Persistent Memory
Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory Pengfei Zuo, Yu Hua, and Jie Wu, Huazhong University of Science and Technology https://www.usenix.org/conference/osdi18/presentation/zuo
More informationWORT: Write Optimal Radix Tree for Persistent Memory Storage Systems
WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems Se Kwon Lee K. Hyun Lim 1, Hyunsub Song, Beomseok Nam, Sam H. Noh UNIST 1 Hongik University Persistent Memory (PM) Persistent memory
More informationMemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing
MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing Bin Fan (CMU), Dave Andersen (CMU), Michael Kaminsky (Intel Labs) NSDI 2013 http://www.pdl.cmu.edu/ 1 Goal: Improve Memcached 1. Reduce space overhead
More informationSpeeding Up Cloud/Server Applications Using Flash Memory
Speeding Up Cloud/Server Applications Using Flash Memory Sudipta Sengupta and Jin Li Microsoft Research, Redmond, WA, USA Contains work that is joint with Biplob Debnath (Univ. of Minnesota) Flash Memory
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationHigh-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han
High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han Seoul National University, Korea Dongduk Women s University, Korea Contents Motivation and Background
More informationSLM-DB: Single-Level Key-Value Store with Persistent Memory
SLM-DB: Single-Level Key-Value Store with Persistent Memory Olzhas Kaiyrakhmet and Songyi Lee, UNIST; Beomseok Nam, Sungkyunkwan University; Sam H. Noh and Young-ri Choi, UNIST https://www.usenix.org/conference/fast19/presentation/kaiyrakhmet
More informationOutline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA
CS 258 Parallel Computer Architecture Data Speculation Support for a Chip Multiprocessor (Hydra CMP) Lance Hammond, Mark Willey and Kunle Olukotun Presented: May 7 th, 2008 Ankit Jain Outline The Hydra
More informationAerie: Flexible File-System Interfaces to Storage-Class Memory [Eurosys 2014] Operating System Design Yongju Song
Aerie: Flexible File-System Interfaces to Storage-Class Memory [Eurosys 2014] Operating System Design Yongju Song Outline 1. Storage-Class Memory (SCM) 2. Motivation 3. Design of Aerie 4. File System Features
More informationPay Migration Tax to Homeland: Anchor-based Scalable Reference Counting for Multicores
Pay Migration Tax to Homeland: Anchor-based Scalable Reference Counting for Multicores Seokyong Jung, Jongbin Kim, Minsoo Ryu, Sooyong Kang, Hyungsoo Jung Hanyang University 1 Reference counting It is
More informationMarch NVM Solutions Group
March 2017 NVM Solutions Group Ideally one would desire an indefinitely large memory capacity such that any particular word would be immediately available. It does not seem possible physically to achieve
More informationCGAR: Strong Consistency without Synchronous Replication. Seo Jin Park Advised by: John Ousterhout
CGAR: Strong Consistency without Synchronous Replication Seo Jin Park Advised by: John Ousterhout Improved update performance of storage systems with master-back replication Fast: updates complete before
More informationS. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda. The Ohio State University
Architecture for Caching Responses with Multiple Dynamic Dependencies in Multi-Tier Data- Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda The Ohio State University
More informationIntroduction to cache memories
Course on: Advanced Computer Architectures Introduction to cache memories Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Summary Summary Main goal Spatial and temporal
More informationCaches. Hiding Memory Access Times
Caches Hiding Memory Access Times PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O N T R O L ALU CTL INSTRUCTION FETCH INSTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMORY
More informationHigh Performance Packet Processing with FlexNIC
High Performance Packet Processing with FlexNIC Antoine Kaufmann, Naveen Kr. Sharma Thomas Anderson, Arvind Krishnamurthy University of Washington Simon Peter The University of Texas at Austin Ethernet
More information* Contributed while interning at SAP. September 1 st, 2017 PUBLIC
Adaptive Recovery for SCM-Enabled Databases Ismail Oukid (TU Dresden & SAP), Daniel Bossle* (SAP), Anisoara Nica (SAP), Peter Bumbulis (SAP), Wolfgang Lehner (TU Dresden), Thomas Willhalm (Intel) * Contributed
More informationAnalyzing I/O Performance on a NEXTGenIO Class System
Analyzing I/O Performance on a NEXTGenIO Class System holger.brunst@tu-dresden.de ZIH, Technische Universität Dresden LUG17, Indiana University, June 2 nd 2017 NEXTGenIO Fact Sheet Project Research & Innovation
More informationCHAPTER 8: CPU and Memory Design, Enhancement, and Implementation
CHAPTER 8: CPU and Memory Design, Enhancement, and Implementation The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 5th Edition, Irv Englander John
More informationKartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18
Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation
More information3D Xpoint Status and Forecast 2017
3D Xpoint Status and Forecast 2017 Mark Webb MKW 1 Ventures Consulting, LLC Memory Technologies Latency Density Cost HVM ready DRAM ***** *** *** ***** NAND * ***** ***** ***** MRAM ***** * * *** 3DXP
More informationVirtual to physical address translation
Virtual to physical address translation Virtual memory with paging Page table per process Page table entry includes present bit frame number modify bit flags for protection and sharing. Page tables can
More informationEnlightening the I/O Path: A Holistic Approach for Application Performance
Enlightening the I/O Path: A Holistic Approach for Application Performance Sangwook Kim 13, Hwanju Kim 2, Joonwon Lee 3, and Jinkyu Jeong 3 Apposha 1 Dell EMC 2 Sungkyunkwan University 3 Data-Intensive
More informationThe End of a Myth: Distributed Transactions Can Scale
The End of a Myth: Distributed Transactions Can Scale Erfan Zamanian, Carsten Binnig, Tim Harris, and Tim Kraska Rafael Ketsetsides Why distributed transactions? 2 Cheaper for the same processing power
More informationCS6453. Data-Intensive Systems: Rachit Agarwal. Technology trends, Emerging challenges & opportuni=es
CS6453 Data-Intensive Systems: Technology trends, Emerging challenges & opportuni=es Rachit Agarwal Slides based on: many many discussions with Ion Stoica, his class, and many industry folks Servers Typical
More informationRethink the Sync 황인중, 강윤지, 곽현호. Embedded Software Lab. Embedded Software Lab.
1 Rethink the Sync 황인중, 강윤지, 곽현호 Authors 2 USENIX Symposium on Operating System Design and Implementation (OSDI 06) System Structure Overview 3 User Level Application Layer Kernel Level Virtual File System
More informationFlash-Conscious Cache Population for Enterprise Database Workloads
IBM Research ADMS 214 1 st September 214 Flash-Conscious Cache Population for Enterprise Database Workloads Hyojun Kim, Ioannis Koltsidas, Nikolas Ioannou, Sangeetha Seshadri, Paul Muench, Clem Dickey,
More informationAmnesic Cache Management for Non-Volatile Memory
Amnesic Cache Management for Non-Volatile Memory Dongwoo Kang, Seungjae Baek, Jongmoo Choi Dankook University, South Korea {kangdw, baeksj, chiojm}@dankook.ac.kr Donghee Lee University of Seoul, South
More informationPacketShader: A GPU-Accelerated Software Router
PacketShader: A GPU-Accelerated Software Router Sangjin Han In collaboration with: Keon Jang, KyoungSoo Park, Sue Moon Advanced Networking Lab, CS, KAIST Networked and Distributed Computing Systems Lab,
More informationW H I T E P A P E R. Comparison of Storage Protocol Performance in VMware vsphere 4
W H I T E P A P E R Comparison of Storage Protocol Performance in VMware vsphere 4 Table of Contents Introduction................................................................... 3 Executive Summary............................................................
More informationREMOTE PERSISTENT MEMORY ACCESS WORKLOAD SCENARIOS AND RDMA SEMANTICS
13th ANNUAL WORKSHOP 2017 REMOTE PERSISTENT MEMORY ACCESS WORKLOAD SCENARIOS AND RDMA SEMANTICS Tom Talpey Microsoft [ March 31, 2017 ] OUTLINE Windows Persistent Memory Support A brief summary, for better
More informationCycle Time for Non-pipelined & Pipelined processors
Cycle Time for Non-pipelined & Pipelined processors Fetch Decode Execute Memory Writeback 250ps 350ps 150ps 300ps 200ps For a non-pipelined processor, the clock cycle is the sum of the latencies of all
More informationDatabase Management and Tuning
Database Management and Tuning Concurrency Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 8 May 10, 2012 Acknowledgements: The slides are provided by Nikolaus
More informationNVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory
NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory Dhananjoy Das, Sr. Systems Architect SanDisk Corp. 1 Agenda: Applications are KING! Storage landscape (Flash / NVM)
More informationComputer Science 146. Computer Architecture
Computer Architecture Spring 24 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 2: More Multiprocessors Computation Taxonomy SISD SIMD MISD MIMD ILP Vectors, MM-ISAs Shared Memory
More informationLecture: Memory Technology Innovations
Lecture: Memory Technology Innovations Topics: memory schedulers, refresh, state-of-the-art and upcoming changes: buffer chips, 3D stacking, non-volatile cells, photonics Multiprocessor intro 1 Row Buffers
More informationFAWN. A Fast Array of Wimpy Nodes. David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan
FAWN A Fast Array of Wimpy Nodes David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan Carnegie Mellon University *Intel Labs Pittsburgh Energy in computing
More informationChip-Multithreading Systems Need A New Operating Systems Scheduler
Chip-Multithreading Systems Need A New Operating Systems Scheduler Alexandra Fedorova Christopher Small Daniel Nussbaum Margo Seltzer Harvard University, Sun Microsystems Sun Microsystems Sun Microsystems
More informationBenchmarking Persistent Memory in Computers
Benchmarking Persistent Memory in Computers Testing with MongoDB Presenter: Adam McPadden Co-Authors: Moshik Hershcovitch and Revital Eres August 2017 1 Overview Objective Background System Configuration
More informationMultiprocessor Support
CSC 256/456: Operating Systems Multiprocessor Support John Criswell University of Rochester 1 Outline Multiprocessor hardware Types of multi-processor workloads Operating system issues Where to run the
More informationInterrupt Coalescing in Xen
Interrupt Coalescing in Xen with Scheduler Awareness Michael Peirce & Kevin Boos Outline Background Hypothesis vic-style Interrupt Coalescing Adding Scheduler Awareness Evaluation 2 Background Xen split
More informationRuntime Data Management on Non-volatile Memory-based Heterogeneous Memory for Task-Parallel Programs
Runtime Data Management on Non-volatile Memory-based Heterogeneous Memory for Task-Parallel Programs Kai Wu Jie Ren University of California, Merced PASA Lab Dong Li SC 18 1 Non-volatile Memory is Promising
More informationBIBIM: A Prototype Multi-Partition Aware Heterogeneous New Memory
HotStorage 18 BIBIM: A Prototype Multi-Partition Aware Heterogeneous New Memory Gyuyoung Park 1, Miryeong Kwon 1, Pratyush Mahapatra 2, Michael Swift 2, and Myoungsoo Jung 1 Yonsei University Computer
More informationand data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed
5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid
More informationPart II: Data Center Software Architecture: Topic 2: Key-value Data Management Systems. SkimpyStash: Key Value Store on Flash-based Storage
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 2: Key-value Data Management Systems SkimpyStash: Key Value
More informationThyNVM. Enabling So1ware- Transparent Crash Consistency In Persistent Memory Systems
ThyNVM Enabling So1ware- Transparent Crash Consistency In Persistent Memory Systems Jinglei Ren, Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei Wu, and Onur Mutlu TWO- LEVEL STORAGE MODEL MEMORY CPU STORAGE
More informationHybrid Memory Platform
Hybrid Memory Platform Kenneth Wright, Sr. Driector Rambus / Emerging Solutions Division Join the Conversation #OpenPOWERSummit 1 Outline The problem / The opportunity Project goals Roadmap - Sub-projects/Tracks
More informationKaladhar Voruganti Senior Technical Director NetApp, CTO Office. 2014, NetApp, All Rights Reserved
Kaladhar Voruganti Senior Technical Director NetApp, CTO Office Storage Used to Be Simple DRAM $$$ DISK Nearline TAPE volatile persistent Access Latency 2 Talk Focus: Persistent Memory Design Center DRAM
More informationTales of the Tail Hardware, OS, and Application-level Sources of Tail Latency
Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports and Steven D. Gribble February 2, 2015 1 Introduction What is Tail Latency? What
More informationStorage Systems : Disks and SSDs. Manu Awasthi CASS 2018
Storage Systems : Disks and SSDs Manu Awasthi CASS 2018 Why study storage? Scalable High Performance Main Memory System Using Phase-Change Memory Technology, Qureshi et al, ISCA 2009 Trends Total amount
More informationStreamBox: Modern Stream Processing on a Multicore Machine
StreamBox: Modern Stream Processing on a Multicore Machine Hongyu Miao and Heejin Park, Purdue ECE; Myeongjae Jeon and Gennady Pekhimenko, Microsoft Research; Kathryn S. McKinley, Google; Felix Xiaozhu
More informationarxiv: v1 [cs.db] 19 Jan 2019
Guaranteeing Recoverability via Partially Constrained Transaction Logs Huan Zhou, Jinwei Guo, Huiqi Hu, Weining Qian, Xuan Zhou, Aoying Zhou School of Data Science & Engineering East China Normal University
More informationFalcon: Scaling IO Performance in Multi-SSD Volumes. The George Washington University
Falcon: Scaling IO Performance in Multi-SSD Volumes Pradeep Kumar H Howie Huang The George Washington University SSDs in Big Data Applications Recent trends advocate using many SSDs for higher throughput
More informationSoftWrAP: A Lightweight Framework for Transactional Support of Storage Class Memory
SoftWrAP: A Lightweight Framework for Transactional Support of Storage Class Memory Ellis Giles Rice University Houston, Texas erg@rice.edu Kshitij Doshi Intel Corp. Portland, OR kshitij.a.doshi@intel.com
More informationMoneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010
Moneta: A High-performance Storage Array Architecture for Nextgeneration, Non-volatile Memories Micro 2010 NVM-based SSD NVMs are replacing spinning-disks Performance of disks has lagged NAND flash showed
More informationCS 550 Operating Systems Spring File System
1 CS 550 Operating Systems Spring 2018 File System 2 OS Abstractions Process: virtualization of CPU Address space: virtualization of memory The above to allow a program to run as if it is in its own private,
More informationSAY-Go: Towards Transparent and Seamless Storage-As-You-Go with Persistent Memory
SAY-Go: Towards Transparent and Seamless Storage-As-You-Go with Persistent Memory Hyeonho Song, Sam H. Noh UNIST HotStorage 2018 Contents Persistent Memory Motivation SAY-Go Design Implementation Evaluation
More informationMnemosyne Lightweight Persistent Memory
Mnemosyne Lightweight Persistent Memory Haris Volos Andres Jaan Tack, Michael M. Swift University of Wisconsin Madison Executive Summary Storage-Class Memory (SCM) enables memory-like storage Persistent
More informationParallel pg_dump. Parallel pg_dump. Maxing out hardware resources for database copy, backup and restore. Joachim Wieland. CHAR(10), 2nd July 2010
Parallel pg_dump Maxing out hardware resources for database copy, backup and restore Joachim Wieland CHAR(10), 2nd July 2010 The Idea Parallel pg_dump Current pg_dump does not use too much of the available
More informationA New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd.
A New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd. 1 Agenda Introduction Background and Motivation Hybrid Key-Value Data Store Architecture Overview Design details Performance
More informationNEXTGenIO Performance Tools for In-Memory I/O
NEXTGenIO Performance Tools for In- I/O holger.brunst@tu-dresden.de ZIH, Technische Universität Dresden 22 nd -23 rd March 2017 Credits Intro slides by Adrian Jackson (EPCC) A new hierarchy New non-volatile
More informationCS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches
CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates
More informationPCIe Storage Beyond SSDs
PCIe Storage Beyond SSDs Fabian Trumper NVM Solutions Group PMC-Sierra Santa Clara, CA 1 Classic Memory / Storage Hierarchy FAST, VOLATILE CPU Cache DRAM Performance Gap Performance Tier (SSDs) SLOW, NON-VOLATILE
More information