Hash Table Design and Optimization for Software Virtual Switches
|
|
- Lucas Griffin
- 5 years ago
- Views:
Transcription
1 Hash Table Design and Optimization for Software Virtual Switches P R E S E N T E R : R E N WA N G Y I P E N G WA N G, S A M E H G O B R I E L, R E N WA N G, C H A R L I E TA I, C R I S T I A N D U M I T R E S C U I N T E L L A B S
2 OUTLINE Background and motivation Survey and understanding Analysis
3 Background We found the most common data structure used in virtual switch is hash table. wildcarding match (tuple space search): routing table, ACL exact match: con-track table, flow cache, etc. Comparing to tree based data structure, hash table based data structure has certain advantages: More parallelism: no pointer chasing Faster rule updates 3
4 Background Hash table lookup is also one of the most time consuming stage during packet processing: E.g. Open vswitch (100k rules, 20 subtables) Execution percentage IO Preprocessing Rule lookup others ~8% ~5% ~78% ~9% A major source of hash table lookup overhead is memory access latency. 4
5 Motivation Hash table is a simple data structure, but there are many different design and implementations. Understanding of hash table performance and how to design an efficient hash table structure is the key to a good software switch. A general guideline to hash table designs will benefit future vswitch development. 5
6 Basic hash table structure The evolution of hash table algorithms: single array -> bucket-based -> n-hash Key A Key A Key A 6
7 Cuckoo hashing Cuckoo hash algorithm: existing keys can be displaced to alternative bucket Key B Key A Key A Hash func Key B 7
8 Survey We also studied into various open source virtual switch applications to learn their implementations. Three major purposes these applications use hash table for: Routing table/acl tuple space search Connect tracking table exact match Flow cache exact/signature match with replacement policy 8
9 Observations Set-associative table and cuckoo hash are widely used. Bucket size is usually 4-8 entries Cache alignment Vectorization Capacity guarantee is needed in telcom use cases Linked list based hash table as extended table Software techniques to improve performance: Software pipelining Batching Read write concurrency Optimistic locking Intel TSX 9
10 Analysis - Table organization and data structure Number of keys per bucket More entries in a bucket can directly improve the table utilizations. 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% load factors vs. assoc hash 2hash cuckoo Cuckoo hash insertion cycles vs. load factor Conclusion: when table utilization is important, cuckoo hash should be used. Multiple hash function and multiple ways per bucket also help a lot. 10
11 Analysis - Separate key storage and cache alignment Hash tables could store key-data pair in a separate memory location, and only keep signatures and index in the table. Pros: signature and index are easily to be cache aligned, benefit cache miss case. Cons: requires another memory jump when hit. SIG Key+data Out tests show that with optimized DPDK hash tables, storing keys in or outside the table does not show major difference with 16 or 32-byte key size. However, cache alignment will improve hash table lookup speed by % in our DPDK based performance test. 11
12 Analysis - Hash table based cache When use hash table for flow cache we need to consider cache miss ratio. 4-8 ways per bucket can already keep the miss ratio to be reasonable low. lookup We propose a new AVX-based LRU implementation. Use Intel AVX instruction to permute the bucket % 25.00% miss ratio vs. ways/bucket miss ratio 20.00% 15.00% 10.00% 5.00% 0.00% speed comparison (lower the better) avx age ll bplru tplru insert lookup void adjust_location(int location, bucket* bucket){ m256i array = avx_load(bucket) m256i permute_pattern = avx_load(permute_index[location]) m256i permuted_array = avx_permute(array, permute_pattern) avx_store (bucket, permuted_array) } 12
13 Analysis - Software pipelining and batching Batching can enable us to prefetch hash table bucket for different lookup keys. Together with batching, software pipelining can further improve performance. Software pipelining + batching easily improve performance by 2X in our test case cycle/pkt with prefetch or pipelining no opt. prefetch pipelined 13
14 Analysis- Vectorization Besides using Intel AVX instruction for LRU operation, we can also use AVX instruction to perform signature comparison. We compare three mechanisms: No vectorization. Horizontal vectorization: compare one key s signature to all signatures in a bucket. Vertical vecotrization: compare all key s signatures in a batch to different entries across different buckets. Observation: Vertical or scalar better for low table utilization. Horizontal better for high table utilization. An adaptive method could benefit lookup cycle vs. avg entry location scalar vert hori 14
15 Future directions of Hash Table Design Cuckoo hash + extended linked list design Linked list based hash table provides capacity guarantee. Cuckoo hash table provides high table utilization and constant table lookup time. The combination of both to achieve both capacity guarantee and better utilization. Adaptive vrouter From the study, we found no single data structure could fit all use cases. Runtime decision based on traffic patterns could benefit. During runtime, a learning (e.g., trial and rank) phase to try various hash table data structures. 15
16 Conclusion We investigated multiple hash table algorithms and implementations in popular virtual switches. We analyzed various hash table designs and provide guide lines for different use cases. We proposed Intel AVX based LRU cache implementation and adaptive signature comparison. We proposed future directions on hash table design in virtual switches. 16
Tungsten Fabric Optimization by DPDK ZHAOYAN CHEN YIPENG WANG
x Tungsten Fabric Optimization by DPDK ZHAOYAN CHEN YIPENG WANG Agenda Introduce Tungsten Fabric Support More CPU cores MPLS over GRE Optimization Hash Table Optimization Batch RX for VM and Fabric What
More informationFlow Caching for High Entropy Packet Fields
Flow Caching for High Entropy Packet Fields Nick Shelly Nick McKeown! Ethan Jackson Teemu Koponen Jarno Rajahalme Outline Current flow classification in OVS Problems with high entropy packets Proposed
More informationExtending DPDK Flow Classification Libraries for Determinism & Cloud Usages SAMEH GOBRIEL INTEL LABS
x Extending DPDK Flow Classification Libraries for Determinism & Cloud Usages SAMEH GOBRIEL INTEL LABS Contributors Yipeng Wang yipeng1.wang@intel.com Ren Wang ren.wang@intel.com Charlie Tai charlie.tai@intel.com
More informationMemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing
MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing Bin Fan (CMU), Dave Andersen (CMU), Michael Kaminsky (Intel Labs) NSDI 2013 http://www.pdl.cmu.edu/ 1 Goal: Improve Memcached 1. Reduce space overhead
More informationHigh-Performance Key-Value Store on OpenSHMEM
High-Performance Key-Value Store on OpenSHMEM Huansong Fu*, Manjunath Gorentla Venkata, Ahana Roy Choudhury*, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline Background
More informationSpring 2018 :: CSE 502. Cache Design Basics. Nima Honarmand
Cache Design Basics Nima Honarmand Storage Hierarchy Make common case fast: Common: temporal & spatial locality Fast: smaller, more expensive memory Bigger Transfers Registers More Bandwidth Controlled
More informationBe Fast, Cheap and in Control with SwitchKV. Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level
More informationCascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching
Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value
More informationMain Memory (Part II)
Main Memory (Part II) Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Main Memory 1393/8/17 1 / 50 Reminder Amir H. Payberah
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #21: Caches 3 2005-07-27 CS61C L22 Caches III (1) Andy Carle Review: Why We Use Caches 1000 Performance 100 10 1 1980 1981 1982 1983
More informationAccelerating OpenFlow SDN Switches with Per-Port Cache
Accelerating OpenFlow SDN Switches with Per-Port Cache Cheng-Yi Lin Youn-Long Lin Department of Computer Science National Tsing Hua University 1 Outline 1. Introduction 2. Related Work 3. Per-Port Cache
More informationA Comparison of Performance and Accuracy of Measurement Algorithms in Software
A Comparison of Performance and Accuracy of Measurement Algorithms in Software Omid Alipourfard, Masoud Moshref 1, Yang Zhou 2, Tong Yang 2, Minlan Yu 3 Yale University, Barefoot Networks 1, Peking University
More informationHigh Performance Computing and Programming, Lecture 3
High Performance Computing and Programming, Lecture 3 Memory usage and some other things Ali Dorostkar Division of Scientific Computing, Department of Information Technology, Uppsala University, Sweden
More informationPVPP: A Programmable Vector Packet Processor. Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim
PVPP: A Programmable Vector Packet Processor Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim Fixed Set of Protocols Fixed-Function Switch Chip TCP IPv4 IPv6
More informationMultimedia Streaming. Mike Zink
Multimedia Streaming Mike Zink Technical Challenges Servers (and proxy caches) storage continuous media streams, e.g.: 4000 movies * 90 minutes * 10 Mbps (DVD) = 27.0 TB 15 Mbps = 40.5 TB 36 Mbps (BluRay)=
More informationIMPORTANT: Circle the last two letters of your class account:
Spring 2011 University of California, Berkeley College of Engineering Computer Science Division EECS MIDTERM I CS 186 Introduction to Database Systems Prof. Michael J. Franklin NAME: STUDENT ID: IMPORTANT:
More informationOpenFlow Software Switch & Intel DPDK. performance analysis
OpenFlow Software Switch & Intel DPDK performance analysis Agenda Background Intel DPDK OpenFlow 1.3 implementation sketch Prototype design and setup Results Future work, optimization ideas OF 1.3 prototype
More informationStochastic Pre-Classification for SDN Data Plane Matching
Stochastic Pre-Classification for SDN Data Plane Matching Luke McHale, C. Jasson Casey, Paul V. Gratz, Alex Sprintson Presenter: Luke McHale Ph.D. Student, Texas A&M University Contact: luke.mchale@tamu.edu
More informationLearning with Purpose
Network Measurement for 100Gbps Links Using Multicore Processors Xiaoban Wu, Dr. Peilong Li, Dr. Yongyi Ran, Prof. Yan Luo Department of Electrical and Computer Engineering University of Massachusetts
More informationEECS 470. Lecture 14 Advanced Caches. DEC Alpha. Fall Jon Beaumont
Lecture 14 Advanced Caches DEC Alpha Fall 2018 Instruction Cache BIU Jon Beaumont www.eecs.umich.edu/courses/eecs470/ Data Cache Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti,
More informationMemory Hierarchy. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Memory Hierarchy Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 35 Caches IV / VM I 2004-11-19 Andy Carle inst.eecs.berkeley.edu/~cs61c-ta Google strikes back against recent encroachments into the Search
More informationBasics of Performance Engineering
ERLANGEN REGIONAL COMPUTING CENTER Basics of Performance Engineering J. Treibig HiPerCH 3, 23./24.03.2015 Why hardware should not be exposed Such an approach is not portable Hardware issues frequently
More informationLecture 21. Monday, February 28 CS 470 Operating Systems - Lecture 21 1
Lecture 21 Case study guidelines posted Case study assignments Third project (virtual memory simulation) will go out after spring break. Extra credit fourth project (shell program) will go out after that.
More informationJackson Marusarz Intel Corporation
Jackson Marusarz Intel Corporation Intel VTune Amplifier Quick Introduction Get the Data You Need Hotspot (Statistical call tree), Call counts (Statistical) Thread Profiling Concurrency and Lock & Waits
More informationOverview IN this chapter we will study. William Stallings Computer Organization and Architecture 6th Edition
William Stallings Computer Organization and Architecture 6th Edition Chapter 4 Cache Memory Overview IN this chapter we will study 4.1 COMPUTER MEMORY SYSTEM OVERVIEW 4.2 CACHE MEMORY PRINCIPLES 4.3 ELEMENTS
More informationBe Fast, Cheap and in Control with SwitchKV Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Raghav Sethi Michael Kaminsky David G. Andersen Michael J. Freedman Goal: fast and cost-effective key-value store Target: cluster-level storage for
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #24 Cache II 27-8-6 Scott Beamer, Instructor New Flow Based Routers CS61C L24 Cache II (1) www.anagran.com Caching Terminology When we try
More informationMaximizing Face Detection Performance
Maximizing Face Detection Performance Paulius Micikevicius Developer Technology Engineer, NVIDIA GTC 2015 1 Outline Very brief review of cascaded-classifiers Parallelization choices Reducing the amount
More informationSome Practice Problems on Hardware, File Organization and Indexing
Some Practice Problems on Hardware, File Organization and Indexing Multiple Choice State if the following statements are true or false. 1. On average, repeated random IO s are as efficient as repeated
More informationA distributed in-memory key-value store system on heterogeneous CPU GPU cluster
The VLDB Journal DOI 1.17/s778-17-479- REGULAR PAPER A distributed in-memory key-value store system on heterogeneous CPU GPU cluster Kai Zhang 1 Kaibo Wang 2 Yuan Yuan 3 Lei Guo 2 Rubao Li 3 Xiaodong Zhang
More informationECE 30 Introduction to Computer Engineering
ECE 0 Introduction to Computer Engineering Study Problems, Set #9 Spring 01 1. Given the following series of address references given as word addresses:,,, 1, 1, 1,, 8, 19,,,,, 7,, and. Assuming a direct-mapped
More informationECE 5730 Memory Systems
ECE 5730 Memory Systems Spring 2009 Command Scheduling Disk Caching Lecture 23: 1 Announcements Quiz 12 I ll give credit for #4 if you answered (d) Quiz 13 (last one!) on Tuesday Make-up class #2 Thursday,
More informationScalable Concurrent Hash Tables via Relativistic Programming
Scalable Concurrent Hash Tables via Relativistic Programming Josh Triplett September 24, 2009 Speed of data < Speed of light Speed of light: 3e8 meters/second Processor speed: 3 GHz, 3e9 cycles/second
More informationOperating Systems. 09. Memory Management Part 1. Paul Krzyzanowski. Rutgers University. Spring 2015
Operating Systems 09. Memory Management Part 1 Paul Krzyzanowski Rutgers University Spring 2015 March 9, 2015 2014-2015 Paul Krzyzanowski 1 CPU Access to Memory The CPU reads instructions and reads/write
More informationShow Me the $... Performance And Caches
Show Me the $... Performance And Caches 1 CPU-Cache Interaction (5-stage pipeline) PCen 0x4 Add bubble PC addr inst hit? Primary Instruction Cache IR D To Memory Control Decode, Register Fetch E A B MD1
More informationHashing Round-down Prefixes for Rapid Packet Classification
Hashing Round-down Prefixes for Rapid Packet Classification Fong Pong and Nian-Feng Tzeng* *Center for Advanced Computer Studies University of Louisiana, Lafayette, USA 1 Outline Packet Classification
More informationUCB CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 32 Caches III 2008-04-16 Lecturer SOE Dan Garcia Hi to Chin Han from U Penn! Prem Kumar of Northwestern has created a quantum inverter
More informationWilliam Stallings Computer Organization and Architecture 8th Edition. Cache Memory
William Stallings Computer Organization and Architecture 8th Edition Chapter 4 Cache Memory Characteristics Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics
More informationTopic: A Deep Dive into Memory Access. Company: Intel Title: Software Engineer Name: Wang, Zhihong
Topic: A Deep Dive into Memory Access Company: Intel Title: Software Engineer Name: Wang, Zhihong A Typical NFV Scenario: PVP Guest Forwarding Engine virtio vhost Forwarding Engine NIC Ring ops What s
More informationMemory Management. Memory
Memory Management These slides are created by Dr. Huang of George Mason University. Students registered in Dr. Huang s courses at GMU can make a single machine readable copy and print a single copy of
More informationCOSC3330 Computer Architecture Lecture 20. Virtual Memory
COSC3330 Computer Architecture Lecture 20. Virtual Memory Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston Virtual Memory Topics Reducing Cache Miss Penalty (#2) Use
More informationReliably Scalable Name Prefix Lookup! Haowei Yuan and Patrick Crowley! Washington University in St. Louis!! ANCS 2015! 5/8/2015!
Reliably Scalable Name Prefix Lookup! Haowei Yuan and Patrick Crowley! Washington University in St. Louis!! ANCS 2015! 5/8/2015! ! My Topic for Today! Goal: a reliable longest name prefix lookup performance
More informationWrite-Optimized and High-Performance Hashing Index Scheme for Persistent Memory
Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory Pengfei Zuo, Yu Hua, Jie Wu Huazhong University of Science and Technology, China 3th USENIX Symposium on Operating Systems
More informationWhy memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho
Why memory hierarchy? L1 cache design Sangyeun Cho Computer Science Department Memory hierarchy Memory hierarchy goals Smaller Faster More expensive per byte CPU Regs L1 cache L2 cache SRAM SRAM To provide
More informationComputer Systems Laboratory Sungkyunkwan University
ARM & IA-32 Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ARM (1) ARM & MIPS similarities ARM: the most popular embedded core Similar basic set
More informationHuge market -- essentially all high performance databases work this way
11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch
More informationDistributed File Systems Part II. Distributed File System Implementation
s Part II Daniel A. Menascé Implementation File Usage Patterns File System Structure Caching Replication Example: NFS 1 Implementation: File Usage Patterns Static Measurements: - distribution of file size,
More informationAdapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK]
Lecture 17 Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] SRAM / / Flash / RRAM / HDD SRAM / / Flash / RRAM/ HDD SRAM
More informationUCB CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 14 Caches III Lecturer SOE Dan Garcia Google Glass may be one vision of the future of post-pc interfaces augmented reality with video
More informationProcessors. Young W. Lim. May 12, 2016
Processors Young W. Lim May 12, 2016 Copyright (c) 2016 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version
More informationMemory Management Outline. Operating Systems. Motivation. Paging Implementation. Accessing Invalid Pages. Performance of Demand Paging
Memory Management Outline Operating Systems Processes (done) Memory Management Basic (done) Paging (done) Virtual memory Virtual Memory (Chapter.) Motivation Logical address space larger than physical
More informationShepard - A Fast Exact Match Short Read Aligner
Shepard A Fast Exact Match Short Read Aligner Chad Nelson, Kevin Townsend, Bhavani Rao, Phillip Jones, Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Ames, IA,
More informationMemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
To appear in Proceedings of the 1th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), Lombard, IL, April 213 MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
More informationForget IOPS: A Proper Way to Characterize & Test Storage Performance Peter Murray SwiftTest
Forget IOPS: A Proper Way to Characterize & Test Storage Performance Peter Murray peter@swifttest.com SwiftTest Storage Performance Validation Rely on vendor IOPS claims Test in production and pray Validate
More informationThe Gap Between the Virtual Machine and the Real Machine. Charles Forgy Production Systems Tech
The Gap Between the Virtual Machine and the Real Machine Charles Forgy Production Systems Tech How to Improve Performance Use better algorithms. Use parallelism. Make better use of the hardware. Argument
More informationUnit 2. Chapter 4 Cache Memory
Unit 2 Chapter 4 Cache Memory Characteristics Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics Organisation Location CPU Internal External Capacity Word
More informationProgrammable NICs. Lecture 14, Computer Networks (198:552)
Programmable NICs Lecture 14, Computer Networks (198:552) Network Interface Cards (NICs) The physical interface between a machine and the wire Life of a transmitted packet Userspace application NIC Transport
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationMulti-level Translation. CS 537 Lecture 9 Paging. Example two-level page table. Multi-level Translation Analysis
Multi-level Translation CS 57 Lecture 9 Paging Michael Swift Problem: what if you have a sparse address space e.g. out of GB, you use MB spread out need one PTE per page in virtual address space bit AS
More informationChapter 8 & Chapter 9 Main Memory & Virtual Memory
Chapter 8 & Chapter 9 Main Memory & Virtual Memory 1. Various ways of organizing memory hardware. 2. Memory-management techniques: 1. Paging 2. Segmentation. Introduction Memory consists of a large array
More informationCS 261 Fall Caching. Mike Lam, Professor. (get it??)
CS 261 Fall 2017 Mike Lam, Professor Caching (get it??) Topics Caching Cache policies and implementations Performance impact General strategies Caching A cache is a small, fast memory that acts as a buffer
More informationGrowth of the Internet Network capacity: A scarce resource Good Service
IP Route Lookups 1 Introduction Growth of the Internet Network capacity: A scarce resource Good Service Large-bandwidth links -> Readily handled (Fiber optic links) High router data throughput -> Readily
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More informationMega KV: A Case for GPUs to Maximize the Throughput of In Memory Key Value Stores
Mega KV: A Case for GPUs to Maximize the Throughput of In Memory Key Value Stores Kai Zhang Kaibo Wang 2 Yuan Yuan 2 Lei Guo 2 Rubao Lee 2 Xiaodong Zhang 2 University of Science and Technology of China
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2
More information6.033 Computer System Engineering
MIT OpenCourseWare http://ocw.mit.edu 6.033 Computer System Engineering Spring 2009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 6.033 2009 Lecture
More informationCS3350B Computer Architecture
CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &
More informationCOS 318: Operating Systems. Virtual Memory and Address Translation
COS 318: Operating Systems Virtual Memory and Address Translation Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall10/cos318/ Today s Topics
More informationLecture notes for CS Chapter 2, part 1 10/23/18
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationOpenEdge 12.0 Database Performance and Server Side Joins. Richard Banville Fellow, OpenEdge Development October 12, 2018
OpenEdge 12.0 Database Performance and Server Side Joins Richard Banville Fellow, OpenEdge Development October 12, 2018 Data Access Performance Enhancements Increasing overall throughput Provide more concurrency
More informationChapter 8 Main Memory
COP 4610: Introduction to Operating Systems (Spring 2014) Chapter 8 Main Memory Zhi Wang Florida State University Contents Background Swapping Contiguous memory allocation Paging Segmentation OS examples
More informationBig and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant
Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model
More informationHigh-Speed Forwarding: A P4 Compiler with a Hardware Abstraction Library for Intel DPDK
High-Speed Forwarding: A P4 Compiler with a Hardware Abstraction Library for Intel DPDK Sándor Laki Eötvös Loránd University Budapest, Hungary lakis@elte.hu Motivation Programmability of network data plane
More informationCuckoo Filter: Practically Better Than Bloom
Cuckoo Filter: Practically Better Than Bloom Bin Fan (CMU/Google) David Andersen (CMU) Michael Kaminsky (Intel Labs) Michael Mitzenmacher (Harvard) 1 What is Bloom Filter? A Compact Data Structure Storing
More informationRuler: High-Speed Packet Matching and Rewriting on Network Processors
Ruler: High-Speed Packet Matching and Rewriting on Network Processors Tomáš Hrubý Kees van Reeuwijk Herbert Bos Vrije Universiteit, Amsterdam World45 Ltd. ANCS 2007 Tomáš Hrubý (VU Amsterdam, World45)
More informationBuilding high performance network functions in VPP. Ole Trøan, VPP contributor FOSDEM 2018
Building high performance network functions in VPP Ole Trøan, ot@cisco.com, VPP contributor FOSDEM 2018 1 2 This talk? Goal: Make you into VPP developers Agenda: VPP architecture An example decomposed
More informationLecture 4: RISC Computers
Lecture 4: RISC Computers Introduction Program execution features RISC characteristics RISC vs. CICS Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) represents an important
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationSIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto
SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES University of Toronto Interaction of Coherence and Network 2 Cache coherence protocol drives network-on-chip traffic Scalable coherence protocols
More informationExam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence
Exam-2 Scope 1. Memory Hierarchy Design (Cache, Virtual memory) Chapter-2 slides memory-basics.ppt Optimizations of Cache Performance Memory technology and optimizations Virtual memory 2. SIMD, MIMD, Vector,
More informationAgilio OVS Software Architecture
WHITE PAPER Agilio OVS Software Architecture FOR SERVER-BASED NETWORKING THERE IS CONSTANT PRESSURE TO IMPROVE SERVER- BASED NETWORKING PERFORMANCE DUE TO THE INCREASED USE OF SERVER AND NETWORK VIRTUALIZATION
More informationBackground Swapping Contiguous Memory Allocation Paging Structure of the Page Table Segmentation
Background Swapping Contiguous Memory Allocation Paging Structure of the Page Table Segmentation Basic Hardware Address Binding Logical VS Physical Address Space Dynamic Loading Dynamic Linking and Shared
More informationMartin Kruliš, v
Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal
More informationChapter Seven Morgan Kaufmann Publishers
Chapter Seven Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored as a charge on capacitor (must be
More informationEEC 483 Computer Organization. Chapter 5.3 Measuring and Improving Cache Performance. Chansu Yu
EEC 483 Computer Organization Chapter 5.3 Measuring and Improving Cache Performance Chansu Yu Cache Performance Performance equation execution time = (execution cycles + stall cycles) x cycle time stall
More informationMemory Management Topics. CS 537 Lecture 11 Memory. Virtualizing Resources
Memory Management Topics CS 537 Lecture Memory Michael Swift Goals of memory management convenient abstraction for programming isolation between processes allocate scarce memory resources between competing
More information6232B: Implementing a Microsoft SQL Server 2008 R2 Database
6232B: Implementing a Microsoft SQL Server 2008 R2 Database Course Overview This instructor-led course is intended for Microsoft SQL Server database developers who are responsible for implementing a database
More informationScalable Enterprise Networks with Inexpensive Switches
Scalable Enterprise Networks with Inexpensive Switches Minlan Yu minlanyu@cs.princeton.edu Princeton University Joint work with Alex Fabrikant, Mike Freedman, Jennifer Rexford and Jia Wang 1 Enterprises
More informationMemory hierarchy review. ECE 154B Dmitri Strukov
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Six basic optimizations Virtual memory Cache performance Opteron example Processor-DRAM gap in latency Q1. How to deal
More informationAddress Translation. Tore Larsen Material developed by: Kai Li, Princeton University
Address Translation Tore Larsen Material developed by: Kai Li, Princeton University Topics Virtual memory Virtualization Protection Address translation Base and bound Segmentation Paging Translation look-ahead
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures CS61C L22 Caches II (1) CPS today! Lecture #22 Caches II 2005-11-16 There is one handout today at the front and back of the room! Lecturer PSOE,
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 20 Main Memory Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Pages Pages and frames Page
More informationNetBricks: Taking the V out of NFV. Aurojit Panda, Sangjin Han, Keon Jang, Melvin Walls, Sylvia Ratnasamy, Scott Shenker UC Berkeley, Google, ICSI
NetBricks: Taking the V out of NFV Aurojit Panda, Sangjin Han, Keon Jang, Melvin Walls, Sylvia Ratnasamy, Scott Shenker UC Berkeley, Google, ICSI What the heck is NFV? A Short Introduction to NFV A Short
More informationComputer Architecture Spring 2016
omputer Architecture Spring 2016 Lecture 09: Prefetching Shuai Wang Department of omputer Science and Technology Nanjing University Prefetching(1/3) Fetch block ahead of demand Target compulsory, capacity,
More informationVirtual Memory 2. Today. Handling bigger address spaces Speeding translation
Virtual Memory 2 Today Handling bigger address spaces Speeding translation Considerations with page tables Two key issues with page tables Mapping must be fast Done on every memory reference, at least
More informationEECS 470 Lecture 13. Basic Caches. Fall 2018 Jon Beaumont
Basic Caches Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, and Vijaykumar of
More informationHigh-Speed Network Processors. EZchip Presentation - 1
High-Speed Network Processors EZchip Presentation - 1 NP-1c Interfaces Switch Fabric 10GE / N x1ge or Switch Fabric or Lookup Tables Counters SDRAM/FCRAM 64 x166/175mhz SRAM DDR NBT CSIX c XGMII HiGig
More information12 Cache-Organization 1
12 Cache-Organization 1 Caches Memory, 64M, 500 cycles L1 cache 64K, 1 cycles 1-5% misses L2 cache 4M, 10 cycles 10-20% misses L3 cache 16M, 20 cycles Memory, 256MB, 500 cycles 2 Improving Miss Penalty
More informationSwapping. Operating Systems I. Swapping. Motivation. Paging Implementation. Demand Paging. Active processes use more physical memory than system has
Swapping Active processes use more physical memory than system has Operating Systems I Address Binding can be fixed or relocatable at runtime Swap out P P Virtual Memory OS Backing Store (Swap Space) Main
More information