Network Architecture Laboratory

Size: px
Start display at page:

Download "Network Architecture Laboratory"

Transcription

1 Automated Synthesis of Adversarial Workloads for Network Functions Luis Pedrosa, Rishabh Iyer, Arseniy Zaostrovnykh, Jonas Fietz, Katerina Argyraki Network Architecture Laboratory

2 Software NFs The good: The flexibility of software The software development cycle The bad: The reliability of software Inconsistent performance The ugly: Adversarial traffic / DoS / Slowdowns 2

3 We need better tools... Dynamic analysis: profiling Reasons about known inputs Helps find root cause / debug Only as good as the inputs used 3

4 We need better tools... Static analysis Reasons about potential inputs in abstract Over-approximating: WCET Under-approximating: adversarial inputs 0 Typical Adversarial MAX WCET Latency (not to scale) 4

5 CASTAN Cycle Approximating Symbolic Timing Analysis for NFs Statically analyze NF Analyze code Generate PCAP file with adversarial workload Exploit The CPU cache hierarchy Algorithmic complexity It works! Increased NF latency up to 3 5

6 Outline Introduction SymbEx in a Nutshell CASTAN Evaluation Conclusion 6

7 SymbEx in a Nutshell Procedure Interpret code with symbolic values 01: int var = input(); // α 02: return var++; // α+1 J. C. King, Symbolic Execution and Program Testing,

8 SymbEx in a Nutshell Procedure Interpret code with symbolic values 01: 02: 03: 04: 05: 06: int var = input(); // α if (var >= 0) { return var; } else { return -var; } 8

9 SymbEx in a Nutshell Procedure Interpret code with symbolic values Fork execution on symbolic conditions Keep track of path constraints 01: 02: 03: 04: 05: 06: int var = input(); // α if (var >= 0) { return var; // α if α>=0 } else { return -var; // -α if α<0 } 9

10 SymbEx in a Nutshell Procedure Interpret code with symbolic values Fork execution on symbolic conditions Keep track of path constraints SMT solver finds concrete inputs 01: 02: 03: 04: 05: 06: int var = input(); // α if (var >= 0) { return var; // α if α>=0, e.g. α=0 } else { return -var; // -α if α<0, e.g. α=-1 } 10

11 SymbEx in a Nutshell Challenges Path Explosion! Typically exponential # of paths / branch Unbounded with loops Impractical to SymbEx exhaustively 11

12 SymbEx in a Nutshell Mitigation Can t do everything: prioritize! Directed Symbolic Execution Prioritize executing relevant paths over others Graph search with heuristic Try to reach a bug / increase coverage / etc. Stop SEE when satisfied (or impatient) 12

13 CASTAN Overview Generate adversarial NF workloads Packet sequence more CPU cycles / packet Under-approximate: not WCET Largely automated 13

14 CASTAN Approach Exploits performance variation 1. CPU cache: +DRAM accesses 2. Algorithmic complexity: +instructions 3. Hashing: reverse to expose internals 14

15 CASTAN Attacking the CPU Cache Symbolic Pointers Index into memory with packet: array[packet.dst_addr] Find packets memory addresses DRAM access CPU Cache Model Simple 1-tier model of the LLC Models contention, associativity, write-back Empirical contention set model 15

16 CASTAN Attacking Algorithmic Complexity Maximize Instructions / Packet Find packets longer code paths Guide SymbEx with a Heuristic Maximize cycles w/o inducing breadth-first-search Estimate cycles / packet Receive Packet Receive Packet 16

17 CASTAN Attacking Algorithmic Complexity CFG Distance Heuristic max(successors)+cost<current> cost = cycles conservatively assuming an L1 hit 17

18 CASTAN Attacking Algorithmic Complexity CFG Distance Heuristic max(successors)+cost<current> cost = cycles conservatively assuming an L1 hit 0 18

19 CASTAN Attacking Algorithmic Complexity CFG Distance Heuristic max(successors)+cost<current> cost = cycles conservatively assuming an L1 hit

20 CASTAN Attacking Algorithmic Complexity CFG Distance Heuristic max(successors)+cost<current> cost = cycles conservatively assuming an L1 hit

21 CASTAN Attacking Algorithmic Complexity Handling Loops Distance vector algorithm Limit repeats to 2 (unrolls loops once) 21

22 CASTAN Attacking Algorithmic Complexity Handling Loops Distance vector algorithm Limit repeats to 2 (unrolls loops once) 0 22

23 CASTAN Attacking Algorithmic Complexity Handling Loops Distance vector algorithm Limit repeats to 2 (unrolls loops once)

24 CASTAN Attacking Algorithmic Complexity Handling Loops Distance vector algorithm Limit repeats to 2 (unrolls loops once)

25 CASTAN Attacking Algorithmic Complexity Handling Loops Distance vector algorithm Limit repeats to 2 (unrolls loops once)

26 CASTAN Handling Hash Functions SymbExing hash functions is hard Complex expression / Path explosion Reason about hash value, without computing it? 26

27 CASTAN Handling Hash Functions SymbExing hash functions is hard Complex expression / Path explosion Reason about hash value, without computing it? Havocing Annotate and disable hash function Assign hash value a new symbol Analyze data structure internals unencumbered Find packet hash value expected behavior 27

28 CASTAN Handling Hash Functions Packets Solve Packets Hash Inputs Reverse Hashes Hashes Solve Hashes 28

29 Evaluation Setup Network Measurement Campaign E2E Latency / Throughput Intel Xeon E5-2667v2 3.3GHz 25.6MB LLC / 32GB RAM Intel 82599ES 10Gb NICs Tester DUT 29

30 Evaluation NFs 11 NF Implementations 3 types, different data structures Unbalanced Tree Red-Black Tree Hash Ring Hash Table Hierarchical Lookup (DPDK) Single Lookup Patricia Trie NAT LB LPM 30

31 Evaluation NFs 11 NF Implementations 3 types, different data structures Unbalanced Tree Red-Black Tree Hash Ring Hash Table Hierarchical Lookup (DPDK) Single Lookup Patricia Trie NAT Algorithmic Complexity LB Cache LPM 31

32 Evaluation Workloads Baseline NOP Adversarial CASTAN (~50 flows), Manual (~50 flows) Random UniRand (1Mflows) Zipf (100kpkts, 6.7kflows) UniRand CASTAN (# flows = CASTAN) 32

33 Evaluation CDF LPM / Single Lookup Table 3 33

34 Evaluation LPM / Single Lookup Table CDF CASTAN induces DRAM accesses 3 Latency 3 UniRand; fewer flows 34

35 Evaluation LPM / Single Lookup Table -19% 35

36 Evaluation CDF NAT / Unbalanced Tree

37 Evaluation NAT / Unbalanced Tree CDF CASTAN skews the tree +70% Latency / -7% Throughput 1.7 Manual; without intuition 37

38 Conclusion CASTAN Attacks complexity, CPU cache, hash functions Little developer input Adversarial Workloads Manual when available > Uniform random for same number of flows Up to +201% latency / -19% throughput 38

39 Find out more! Look for our poster! Get the source and more: 39

40 Backup Slides 40

41 Cache Structure L3 slice L2 line 34 bits 1GB page index 15 bits 3 bits L1d line byte offset 6 bits 6 bits 1GB page offset 41

42 Latency Deviation from NOP 42

43 Throughput 43

44 LPM / Single Lookup Table 44

45 NAT / Unbalanced Tree 45

46 NAT / Hash Ring 46

47 NAT / Red-Black Tree 47

48 NAT / Hash Table 48

49 LPM / Hierarchical Lookup (DPDK) 49

50 LPM / Patricia Trie 50

51 LB / Unbalanced Tree 51

52 LB / Red-Black Tree 52

53 LB / Hash Ring 53

54 LB / Hash Table 54

Performance Contracts for Software Network Functions

Performance Contracts for Software Network Functions Performance Contracts for Software Network Functions Rishabh Iyer, Luis Pedrosa, Arseniy Zaostrovnykh, Solal Pirelli, Katerina Argyraki, and George Candea, EPFL https://www.usenix.org/conference/nsdi19/presentation/iyer

More information

Reliably Scalable Name Prefix Lookup! Haowei Yuan and Patrick Crowley! Washington University in St. Louis!! ANCS 2015! 5/8/2015!

Reliably Scalable Name Prefix Lookup! Haowei Yuan and Patrick Crowley! Washington University in St. Louis!! ANCS 2015! 5/8/2015! Reliably Scalable Name Prefix Lookup! Haowei Yuan and Patrick Crowley! Washington University in St. Louis!! ANCS 2015! 5/8/2015! ! My Topic for Today! Goal: a reliable longest name prefix lookup performance

More information

Program Testing and Analysis: Manual Testing Prof. Dr. Michael Pradel Software Lab, TU Darmstadt

Program Testing and Analysis: Manual Testing Prof. Dr. Michael Pradel Software Lab, TU Darmstadt Program Testing and Analysis: Manual Testing Prof. Dr. Michael Pradel Software Lab, TU Darmstadt Partly based on slides from Peter Müller, ETH Zurich 1 Warm-up Quiz What does the following code print?

More information

DPDK Performance Report Release Test Date: Nov 16 th 2016

DPDK Performance Report Release Test Date: Nov 16 th 2016 Test Date: Nov 16 th 2016 Revision History Date Revision Comment Nov 16 th, 2016 1.0 Initial document for release 2 Contents Audience and Purpose... 4 Test setup:... 4 Intel Xeon Processor E5-2699 v4 (55M

More information

DPDK Intel NIC Performance Report Release 18.02

DPDK Intel NIC Performance Report Release 18.02 DPDK Intel NIC Performance Report Test Date: Mar 14th 2018 Author: Intel DPDK Validation team Revision History Date Revision Comment Mar 15th, 2018 1.0 Initial document for release 2 Contents Audience

More information

DPDK Intel NIC Performance Report Release 18.05

DPDK Intel NIC Performance Report Release 18.05 DPDK Intel NIC Performance Report Test Date: Jun 1th 2018 Author: Intel DPDK Validation team Revision History Date Revision Comment Jun 4th, 2018 1.0 Initial document for release 2 Contents Audience and

More information

A Comparison of Performance and Accuracy of Measurement Algorithms in Software

A Comparison of Performance and Accuracy of Measurement Algorithms in Software A Comparison of Performance and Accuracy of Measurement Algorithms in Software Omid Alipourfard, Masoud Moshref 1, Yang Zhou 2, Tong Yang 2, Minlan Yu 3 Yale University, Barefoot Networks 1, Peking University

More information

The Power of Batching in the Click Modular Router

The Power of Batching in the Click Modular Router The Power of Batching in the Click Modular Router Joongi Kim, Seonggu Huh, Keon Jang, * KyoungSoo Park, Sue Moon Computer Science Dept., KAIST Microsoft Research Cambridge, UK * Electrical Engineering

More information

DPDK Intel NIC Performance Report Release 17.08

DPDK Intel NIC Performance Report Release 17.08 DPDK Intel NIC Performance Report Test Date: Aug 23th 2017 Author: Intel DPDK Validation team Revision History Date Revision Comment Aug 24th, 2017 1.0 Initial document for release 2 Contents Audience

More information

FaRM: Fast Remote Memory

FaRM: Fast Remote Memory FaRM: Fast Remote Memory Problem Context DRAM prices have decreased significantly Cost effective to build commodity servers w/hundreds of GBs E.g. - cluster with 100 machines can hold tens of TBs of main

More information

ResQ: Enabling SLOs in Network Function Virtualization

ResQ: Enabling SLOs in Network Function Virtualization ResQ: Enabling SLOs in Network Function Virtualization Amin Tootoonchian* Aurojit Panda Chang Lan Melvin Walls Katerina Argyraki Sylvia Ratnasamy Scott Shenker *Intel Labs UC Berkeley ICSI NYU Nefeli EPFL

More information

Introducing the Cray XMT. Petr Konecny May 4 th 2007

Introducing the Cray XMT. Petr Konecny May 4 th 2007 Introducing the Cray XMT Petr Konecny May 4 th 2007 Agenda Origins of the Cray XMT Cray XMT system architecture Cray XT infrastructure Cray Threadstorm processor Shared memory programming model Benefits/drawbacks/solutions

More information

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level

More information

Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li

Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Raghav Sethi Michael Kaminsky David G. Andersen Michael J. Freedman Goal: fast and cost-effective key-value store Target: cluster-level storage for

More information

Two hours - online. The exam will be taken on line. This paper version is made available as a backup

Two hours - online. The exam will be taken on line. This paper version is made available as a backup COMP 25212 Two hours - online The exam will be taken on line. This paper version is made available as a backup UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE System Architecture Date: Monday 21st

More information

RouteBricks: Exploiting Parallelism To Scale Software Routers

RouteBricks: Exploiting Parallelism To Scale Software Routers outebricks: Exploiting Parallelism To Scale Software outers Mihai Dobrescu & Norbert Egi, Katerina Argyraki, Byung-Gon Chun, Kevin Fall, Gianluca Iannaccone, Allan Knies, Maziar Manesh, Sylvia atnasamy

More information

Modeling the impact of CPU properties to optimize and predict packet-processing performance

Modeling the impact of CPU properties to optimize and predict packet-processing performance Case Study Modeling the impact of CPU properties to optimize and predict packet-processing performance Intel and AT&T have collaborated in a proof of concept (POC) to model and analyze the performance

More information

TOWARD PREDICTABLE PERFORMANCE IN SOFTWARE PACKET-PROCESSING PLATFORMS. Mihai Dobrescu, EPFL Katerina Argyraki, EPFL Sylvia Ratnasamy, UC Berkeley

TOWARD PREDICTABLE PERFORMANCE IN SOFTWARE PACKET-PROCESSING PLATFORMS. Mihai Dobrescu, EPFL Katerina Argyraki, EPFL Sylvia Ratnasamy, UC Berkeley TOWARD PREDICTABLE PERFORMANCE IN SOFTWARE PACKET-PROCESSING PLATFORMS Mihai Dobrescu, EPFL Katerina Argyraki, EPFL Sylvia Ratnasamy, UC Berkeley Programmable Networks 2 Industry/research community efforts

More information

Memory: Page Table Structure. CSSE 332 Operating Systems Rose-Hulman Institute of Technology

Memory: Page Table Structure. CSSE 332 Operating Systems Rose-Hulman Institute of Technology Memory: Page Table Structure CSSE 332 Operating Systems Rose-Hulman Institute of Technology General address transla+on CPU virtual address data cache MMU Physical address Global memory Memory management

More information

Supporting Fine-Grained Network Functions through Intel DPDK

Supporting Fine-Grained Network Functions through Intel DPDK Supporting Fine-Grained Network Functions through Intel DPDK Ivano Cerrato, Mauro Annarumma, Fulvio Risso - Politecnico di Torino, Italy EWSDN 2014, September 1st 2014 This project is co-funded by the

More information

P4 Pub/Sub. Practical Publish-Subscribe in the Forwarding Plane

P4 Pub/Sub. Practical Publish-Subscribe in the Forwarding Plane P4 Pub/Sub Practical Publish-Subscribe in the Forwarding Plane Outline Address-oriented routing Publish/subscribe How to do pub/sub in the network Implementation status Outlook Subscribers Publish/Subscribe

More information

Backend for Software Data Planes

Backend for Software Data Planes The Case for a Flexible Low-Level Backend for Software Data Planes Sean Choi 1, Xiang Long 2, Muhammad Shahbaz 3, Skip Booth 4, Andy Keep 4, John Marshall 4, Changhoon Kim 5 1 2 3 4 5 Why software data

More information

Hardware-Software Codesign. 9. Worst Case Execution Time Analysis

Hardware-Software Codesign. 9. Worst Case Execution Time Analysis Hardware-Software Codesign 9. Worst Case Execution Time Analysis Lothar Thiele 9-1 System Design Specification System Synthesis Estimation SW-Compilation Intellectual Prop. Code Instruction Set HW-Synthesis

More information

HW/SW Codesign. WCET Analysis

HW/SW Codesign. WCET Analysis HW/SW Codesign WCET Analysis 29 November 2017 Andres Gomez gomeza@tik.ee.ethz.ch 1 Outline Today s exercise is one long question with several parts: Basic blocks of a program Static value analysis WCET

More information

MoonGen. A Scriptable High-Speed Packet Generator. Paul Emmerich. January 31st, 2016 FOSDEM Chair for Network Architectures and Services

MoonGen. A Scriptable High-Speed Packet Generator. Paul Emmerich. January 31st, 2016 FOSDEM Chair for Network Architectures and Services MoonGen A Scriptable High-Speed Packet Generator Paul Emmerich January 31st, 216 FOSDEM 216 Chair for Network Architectures and Services Department of Informatics Paul Emmerich MoonGen: A Scriptable High-Speed

More information

Caches. Samira Khan March 23, 2017

Caches. Samira Khan March 23, 2017 Caches Samira Khan March 23, 2017 Agenda Review from last lecture Data flow model Memory hierarchy More Caches The Dataflow Model (of a Computer) Von Neumann model: An instruction is fetched and executed

More information

Caches 3/23/17. Agenda. The Dataflow Model (of a Computer)

Caches 3/23/17. Agenda. The Dataflow Model (of a Computer) Agenda Caches Samira Khan March 23, 2017 Review from last lecture Data flow model Memory hierarchy More Caches The Dataflow Model (of a Computer) Von Neumann model: An instruction is fetched and executed

More information

Chapter 5. Memory Technology

Chapter 5. Memory Technology Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Cache Memories. From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6.

Cache Memories. From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6. Cache Memories From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6. Today Cache memory organization and operation Performance impact of caches The memory mountain Rearranging

More information

E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems

E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andrew Pavlo, Michael

More information

Advanced Memory Organizations

Advanced Memory Organizations CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU

More information

Towards High-performance Flow-level level Packet Processing on Multi-core Network Processors

Towards High-performance Flow-level level Packet Processing on Multi-core Network Processors Towards High-performance Flow-level level Packet Processing on Multi-core Network Processors Yaxuan Qi (presenter), Bo Xu, Fei He, Baohua Yang, Jianming Yu and Jun Li ANCS 2007, Orlando, USA Outline Introduction

More information

Key Point. What are Cache lines

Key Point. What are Cache lines Caching 1 Key Point What are Cache lines Tags Index offset How do we find data in the cache? How do we tell if it s the right data? What decisions do we need to make in designing a cache? What are possible

More information

PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees

PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees Pandian Raju 1, Rohan Kadekodi 1, Vijay Chidambaram 1,2, Ittai Abraham 2 1 The University of Texas at Austin 2 VMware Research

More information

Lecture Notes: Unleashing MAYHEM on Binary Code

Lecture Notes: Unleashing MAYHEM on Binary Code Lecture Notes: Unleashing MAYHEM on Binary Code Rui Zhang February 22, 2017 1 Finding Exploitable Bugs 1.1 Main Challenge in Exploit Generation Exploring enough of the state space of an application to

More information

Throughput & Latency Control in Ethernet Backplane Interconnects. Manoj Wadekar Gary McAlpine. Intel

Throughput & Latency Control in Ethernet Backplane Interconnects. Manoj Wadekar Gary McAlpine. Intel Throughput & Latency Control in Ethernet Backplane Interconnects Manoj Wadekar Gary McAlpine Intel Date 3/16/04 Agenda Discuss Backplane challenges to Ethernet Simulation environment and definitions Preliminary

More information

CS377P Programming for Performance Single Thread Performance Caches I

CS377P Programming for Performance Single Thread Performance Caches I CS377P Programming for Performance Single Thread Performance Caches I Sreepathi Pai UTCS September 21, 2015 Outline 1 Introduction 2 Caches 3 Performance of Caches Outline 1 Introduction 2 Caches 3 Performance

More information

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1) Lecture 11: SMT and Caching Basics Today: SMT, cache access basics (Sections 3.5, 5.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized for most of the time by doubling

More information

PVPP: A Programmable Vector Packet Processor. Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim

PVPP: A Programmable Vector Packet Processor. Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim PVPP: A Programmable Vector Packet Processor Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim Fixed Set of Protocols Fixed-Function Switch Chip TCP IPv4 IPv6

More information

Symbolic and Concolic Execution of Programs

Symbolic and Concolic Execution of Programs Symbolic and Concolic Execution of Programs Information Security, CS 526 Omar Chowdhury 10/7/2015 Information Security, CS 526 1 Reading for this lecture Symbolic execution and program testing - James

More information

Scalable Name-Based Packet Forwarding: From Millions to Billions. Tian Song, Beijing Institute of Technology

Scalable Name-Based Packet Forwarding: From Millions to Billions. Tian Song, Beijing Institute of Technology Scalable Name-Based Packet Forwarding: From Millions to Billions Tian Song, songtian@bit.edu.cn, Beijing Institute of Technology Haowei Yuan, Patrick Crowley, Washington University Beichuan Zhang, The

More information

Jinho Hwang and Timothy Wood George Washington University

Jinho Hwang and Timothy Wood George Washington University Jinho Hwang and Timothy Wood George Washington University Background: Memory Caching Two orders of magnitude more reads than writes Solution: Deploy memcached hosts to handle the read capacity 6. HTTP

More information

소프트웨어기반고성능침입탐지시스템설계및구현

소프트웨어기반고성능침입탐지시스템설계및구현 소프트웨어기반고성능침입탐지시스템설계및구현 KyoungSoo Park Department of Electrical Engineering, KAIST M. Asim Jamshed *, Jihyung Lee*, Sangwoo Moon*, Insu Yun *, Deokjin Kim, Sungryoul Lee, Yung Yi* Department of Electrical

More information

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016 Datenbanksysteme II: Modern Hardware Stefan Sprenger November 23, 2016 Content of this Lecture Introduction to Modern Hardware CPUs, Cache Hierarchy Branch Prediction SIMD NUMA Cache-Sensitive Skip List

More information

Adaptec MaxIQ SSD Cache Performance Solution for Web Server Environments Analysis

Adaptec MaxIQ SSD Cache Performance Solution for Web Server Environments Analysis Adaptec MaxIQ SSD Cache Performance Solution for Web Server Environments Analysis September 22, 2009 Page 1 of 7 Introduction Adaptec has requested an evaluation of the performance of the Adaptec MaxIQ

More information

Module Outline. CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies.

Module Outline. CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies. M6 Memory Hierarchy Module Outline CPU Memory interaction Organization of memory modules Cache memory Mapping and replacement policies. Events on a Cache Miss Events on a Cache Miss Stall the pipeline.

More information

Near Memory Key/Value Lookup Acceleration MemSys 2017

Near Memory Key/Value Lookup Acceleration MemSys 2017 Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy

More information

Accelerate Applications Using EqualLogic Arrays with directcache

Accelerate Applications Using EqualLogic Arrays with directcache Accelerate Applications Using EqualLogic Arrays with directcache Abstract This paper demonstrates how combining Fusion iomemory products with directcache software in host servers significantly improves

More information

Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks

Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks Charles Eckert Xiaowei Wang Jingcheng Wang Arun Subramaniyan Ravi Iyer Dennis Sylvester David Blaauw Reetuparna Das M-Bits Research

More information

Naïve (1) CS 6354: Memory Hierarchy III. Goto Fig. 4. Naïve (2)

Naïve (1) CS 6354: Memory Hierarchy III. Goto Fig. 4. Naïve (2) Naïve (1) CS 6354: Memory Hierarchy III 5 September 2016 for (int i = 0; i < I; ++i) { for (int j = 0; j < J; ++j) { for (int k = 0; k < K; ++k) { C[i * K + k] += A[i * J + j] * B[j * K + k]; 1 2 Naïve

More information

CS3600 SYSTEMS AND NETWORKS

CS3600 SYSTEMS AND NETWORKS CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 11: File System Implementation Prof. Alan Mislove (amislove@ccs.neu.edu) File-System Structure File structure Logical storage unit Collection

More information

Closing the Performance Gap Between Volatile and Persistent K-V Stores

Closing the Performance Gap Between Volatile and Persistent K-V Stores Closing the Performance Gap Between Volatile and Persistent K-V Stores Yihe Huang, Harvard University Matej Pavlovic, EPFL Virendra Marathe, Oracle Labs Margo Seltzer, Oracle Labs Tim Harris, Oracle Labs

More information

Modern Virtual Memory Systems. Modern Virtual Memory Systems

Modern Virtual Memory Systems. Modern Virtual Memory Systems 6.823, L12--1 Modern Virtual Systems Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 6.823, L12--2 Modern Virtual Systems illusion of a large, private, uniform store Protection

More information

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary

More information

Deduplication Storage System

Deduplication Storage System Deduplication Storage System Kai Li Charles Fitzmorris Professor, Princeton University & Chief Scientist and Co-Founder, Data Domain, Inc. 03/11/09 The World Is Becoming Data-Centric CERN Tier 0 Business

More information

V. Primary & Secondary Memory!

V. Primary & Secondary Memory! V. Primary & Secondary Memory! Computer Architecture and Operating Systems & Operating Systems: 725G84 Ahmed Rezine 1 Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM)

More information

Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach

Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach Tiziano De Matteis, Gabriele Mencagli University of Pisa Italy INTRODUCTION The recent years have

More information

Flash-Conscious Cache Population for Enterprise Database Workloads

Flash-Conscious Cache Population for Enterprise Database Workloads IBM Research ADMS 214 1 st September 214 Flash-Conscious Cache Population for Enterprise Database Workloads Hyojun Kim, Ioannis Koltsidas, Nikolas Ioannou, Sangeetha Seshadri, Paul Muench, Clem Dickey,

More information

Scalable Locking. Adam Belay

Scalable Locking. Adam Belay Scalable Locking Adam Belay Problem: Locks can ruin performance 12 finds/sec 9 6 Locking overhead dominates 3 0 0 6 12 18 24 30 36 42 48 Cores Problem: Locks can ruin performance the locks

More information

Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation

Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation Large-Scale Data & Systems Group Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation Georgios Theodorakis, Alexandros Koliousis, Peter Pietzuch, Holger Pirk Large-Scale Data & Systems (LSDS)

More information

Securing Grid Data Transfer Services with Active Network Portals

Securing Grid Data Transfer Services with Active Network Portals Securing Grid Data Transfer Services with Active Network Portals Onur Demir 1 2 Kanad Ghose 3 Madhusudhan Govindaraju 4 Department of Computer Science Binghamton University (SUNY) {onur 1, mike 2, ghose

More information

PDP : A Flexible and Programmable Data Plane. Massimo Gallo et al.

PDP : A Flexible and Programmable Data Plane. Massimo Gallo et al. PDP : A Flexible and Programmable Data Plane Massimo Gallo et al. Introduction Network Function evolution L7 Load Balancer TLS/SSL Server Proxy Server Firewall Introduction Network Function evolution Can

More information

Program Analysis and Constraint Programming

Program Analysis and Constraint Programming Program Analysis and Constraint Programming Joxan Jaffar National University of Singapore CPAIOR MasterClass, 18-19 May 2015 1 / 41 Program Testing, Verification, Analysis (TVA)... VS... Satifiability/Optimization

More information

The dark powers on Intel processor boards

The dark powers on Intel processor boards The dark powers on Intel processor boards Processing Resources (3U VPX) Boards with Multicore CPUs: Up to 16 cores using Intel Xeon D-1577 on TR C4x/msd Boards with 4-Core CPUs and Multiple Graphical Execution

More information

Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation!

Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation! Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation! Xiangyao Yu 1, Christopher Hughes 2, Nadathur Satish 2, Onur Mutlu 3, Srinivas Devadas 1 1 MIT 2 Intel Labs 3 ETH Zürich 1 High-Bandwidth

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

Learning with Purpose

Learning with Purpose Network Measurement for 100Gbps Links Using Multicore Processors Xiaoban Wu, Dr. Peilong Li, Dr. Yongyi Ran, Prof. Yan Luo Department of Electrical and Computer Engineering University of Massachusetts

More information

Main Memory and the CPU Cache

Main Memory and the CPU Cache Main Memory and the CPU Cache CPU cache Unrolled linked lists B Trees Our model of main memory and the cost of CPU operations has been intentionally simplistic The major focus has been on determining

More information

Secure Hierarchy-Aware Cache Replacement Policy (SHARP): Defending Against Cache-Based Side Channel Attacks

Secure Hierarchy-Aware Cache Replacement Policy (SHARP): Defending Against Cache-Based Side Channel Attacks : Defending Against Cache-Based Side Channel Attacks Mengjia Yan, Bhargava Gopireddy, Thomas Shull, Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu Presented by Mengjia

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2

More information

G-NET: Effective GPU Sharing In NFV Systems

G-NET: Effective GPU Sharing In NFV Systems G-NET: Effective Sharing In NFV Systems Kai Zhang*, Bingsheng He^, Jiayu Hu #, Zeke Wang^, Bei Hua #, Jiayi Meng #, Lishan Yang # *Fudan University ^National University of Singapore #University of Science

More information

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2) The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache

More information

PARDA: Proportional Allocation of Resources for Distributed Storage Access

PARDA: Proportional Allocation of Resources for Distributed Storage Access PARDA: Proportional Allocation of Resources for Distributed Storage Access Ajay Gulati, Irfan Ahmad, Carl Waldspurger Resource Management Team VMware Inc. USENIX FAST 09 Conference February 26, 2009 The

More information

The.pdf version of this slide deck will have missing info, due to use of animations. The original.pptx deck is available here:

The.pdf version of this slide deck will have missing info, due to use of animations. The original.pptx deck is available here: The.pdf version of this slide deck will have missing info, due to use of animations. The original.pptx deck is available here: https://wiki.opnfv.org/download/attachments/10293193/vsperf-dataplane-perf-cap-bench.pptx?api=v2

More information

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions 1. (10 marks) Short answers. CPSC 313, 04w Term 2 Midterm Exam 2 Solutions Date: March 11, 2005; Instructor: Mike Feeley 1a. Give an example of one important CISC feature that is normally not part of a

More information

COS 318: Operating Systems. File Systems. Topics. Evolved Data Center Storage Hierarchy. Traditional Data Center Storage Hierarchy

COS 318: Operating Systems. File Systems. Topics. Evolved Data Center Storage Hierarchy. Traditional Data Center Storage Hierarchy Topics COS 318: Operating Systems File Systems hierarchy File system abstraction File system operations File system protection 2 Traditional Data Center Hierarchy Evolved Data Center Hierarchy Clients

More information

SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores

SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores Xiaoning Ding The Ohio State University dingxn@cse.ohiostate.edu

More information

Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR

Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR Presentation by Eric Newberry and Youssef Tobah Paper by Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh 1 Motivation Buffer overflow

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 22: Direct Mapped Cache Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Intel 8-core i7-5960x 3 GHz, 8-core, 20 MB of cache, 140

More information

Fundamentals of Computer Systems

Fundamentals of Computer Systems Fundamentals of Computer Systems Caches Stephen A. Edwards Columbia University Summer 217 Illustrations Copyright 27 Elsevier Computer Systems Performance depends on which is slowest: the processor or

More information

Maximizing Face Detection Performance

Maximizing Face Detection Performance Maximizing Face Detection Performance Paulius Micikevicius Developer Technology Engineer, NVIDIA GTC 2015 1 Outline Very brief review of cascaded-classifiers Parallelization choices Reducing the amount

More information

A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU

A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU PRESENTED BY ROMAN SHOR Overview Technics of data reduction in storage systems:

More information

Meltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies

Meltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies Meltdown and Spectre Interconnect Evaluation Jan 2018 1 Meltdown and Spectre - Background Most modern processors perform speculative execution This speculation can be measured, disclosing information about

More information

CS 261 Fall Caching. Mike Lam, Professor. (get it??)

CS 261 Fall Caching. Mike Lam, Professor. (get it??) CS 261 Fall 2017 Mike Lam, Professor Caching (get it??) Topics Caching Cache policies and implementations Performance impact General strategies Caching A cache is a small, fast memory that acts as a buffer

More information

Hello from the Other Side: SSH over Robust Cache Covert Channels in the Cloud

Hello from the Other Side: SSH over Robust Cache Covert Channels in the Cloud Hello from the Other Side: SSH over Robust Cache Covert Channels in the Cloud Clémentine Maurice, Manuel Weber, Michael Schwarz, Lukas Giner, Daniel Gruss, Carlo Alberto Boano, Stefan Mangard, Kay Römer

More information

Virtual Memory. CS 351: Systems Programming Michael Saelee

Virtual Memory. CS 351: Systems Programming Michael Saelee Virtual Memory CS 351: Systems Programming Michael Saelee registers cache (SRAM) main memory (DRAM) local hard disk drive (HDD/SSD) remote storage (networked drive / cloud) previously: SRAM

More information

MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing

MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing Bin Fan (CMU), Dave Andersen (CMU), Michael Kaminsky (Intel Labs) NSDI 2013 http://www.pdl.cmu.edu/ 1 Goal: Improve Memcached 1. Reduce space overhead

More information

Fundamentals of Computer Systems

Fundamentals of Computer Systems Fundamentals of Computer Systems Caches Martha A. Kim Columbia University Fall 215 Illustrations Copyright 27 Elsevier 1 / 23 Computer Systems Performance depends on which is slowest: the processor or

More information

Design and Performance of the OpenBSD Stateful Packet Filter (pf)

Design and Performance of the OpenBSD Stateful Packet Filter (pf) Usenix 2002 p.1/22 Design and Performance of the OpenBSD Stateful Packet Filter (pf) Daniel Hartmeier dhartmei@openbsd.org Systor AG Usenix 2002 p.2/22 Introduction part of a firewall, working on IP packet

More information

Chapter 2. Parallel Hardware and Parallel Software. An Introduction to Parallel Programming. The Von Neuman Architecture

Chapter 2. Parallel Hardware and Parallel Software. An Introduction to Parallel Programming. The Von Neuman Architecture An Introduction to Parallel Programming Peter Pacheco Chapter 2 Parallel Hardware and Parallel Software 1 The Von Neuman Architecture Control unit: responsible for deciding which instruction in a program

More information

Virtual CDN Implementation

Virtual CDN Implementation Virtual CDN Implementation Eugene E. Otoakhia - eugene.otoakhia@bt.com, BT Peter Willis peter.j.willis@bt.com, BT October 2017 1 Virtual CDN Implementation - Contents 1.What is BT s vcdn Concept 2.Lab

More information

Mnemosyne Lightweight Persistent Memory

Mnemosyne Lightweight Persistent Memory Mnemosyne Lightweight Persistent Memory Haris Volos Andres Jaan Tack, Michael M. Swift University of Wisconsin Madison Executive Summary Storage-Class Memory (SCM) enables memory-like storage Persistent

More information

Virtual Memory Virtual memory first used to relive programmers from the burden of managing overlays.

Virtual Memory Virtual memory first used to relive programmers from the burden of managing overlays. CSE420 Virtual Memory Prof. Mokhtar Aboelaze York University Based on Slides by Prof. L. Bhuyan (UCR) Prof. M. Shaaban (RIT) Virtual Memory Virtual memory first used to relive programmers from the burden

More information

Rethinking DRAM Power Modes for Energy Proportionality

Rethinking DRAM Power Modes for Energy Proportionality Rethinking DRAM Power Modes for Energy Proportionality Krishna Malladi 1, Ian Shaeffer 2, Liji Gopalakrishnan 2, David Lo 1, Benjamin Lee 3, Mark Horowitz 1 Stanford University 1, Rambus Inc 2, Duke University

More information

Hardware Acceleration in Computer Networks. Jan Kořenek Conference IT4Innovations, Ostrava

Hardware Acceleration in Computer Networks. Jan Kořenek Conference IT4Innovations, Ostrava Hardware Acceleration in Computer Networks Outline Motivation for hardware acceleration Longest prefix matching using FPGA Hardware acceleration of time critical operations Framework and applications Contracted

More information

Optimizing Flash-based Key-value Cache Systems

Optimizing Flash-based Key-value Cache Systems Optimizing Flash-based Key-value Cache Systems Zhaoyan Shen, Feng Chen, Yichen Jia, Zili Shao Department of Computing, Hong Kong Polytechnic University Computer Science & Engineering, Louisiana State University

More information

Virtual to physical address translation

Virtual to physical address translation Virtual to physical address translation Virtual memory with paging Page table per process Page table entry includes present bit frame number modify bit flags for protection and sharing. Page tables can

More information

HW1 Solutions. Type Old Mix New Mix Cost CPI

HW1 Solutions. Type Old Mix New Mix Cost CPI HW1 Solutions Problem 1 TABLE 1 1. Given the parameters of Problem 6 (note that int =35% and shift=5% to fix typo in book problem), consider a strength-reducing optimization that converts multiplies by

More information

Addressing the Memory Wall

Addressing the Memory Wall Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the

More information

Benchmarking and Analysis of Software Network Data Planes

Benchmarking and Analysis of Software Network Data Planes Benchmarking and Analysis of Software Network Data Planes Maciek Konstantynowicz Distinguished Engineer, Cisco (FD.io CSIT Project Lead) Patrick Lu Performance Engineer, Intel Corporation, (FD.io pma_tools

More information