Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems

Size: px
Start display at page:

Download "Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems"

Transcription

1 Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems 1 Presented by Hadeel Alabandi

2 Introduction and Motivation 2 A serious issue to the effective utilization of multicore processors is cache partitioning and sharing Simulation were used to evaluate cache partitioning in the existing studies, however, it has some limitations Excessive simulation time Absence of OS activities Proneness to simulation inaccuracy

3 Introduction and Motivation (cont.) 3 In this paper, a software approach has been used It supports static and dynamic cache partitioning by using memory address mapping It emulates hardware partitioning mechanism will examine cache partitioning policies on real time systems Three metrics were used through evaluation for optimization purposes Performance Fairness QoS

4 Cache Partitioning for Multicore Processors 4 It has two interdependent parts Mechanism Forces cache partitioning Provides partitioning policy input Policy Decides how much cache resources will be allocated to each program with an optimization objective

5 Adopted Evaluation Metrics in The Study 5 Performance Metrics Throughput (IPCs) Absolute number of IPCs Combined miss rates Summarizes miss rates Combined misses Summarizes number of cache misses QoS Metrics Suppose that QoS constraints are never violated in their case

6 Adopted Evaluation Metrics in The Study (cont.) 6 Fairness Metrics Miss rates The number of misses The slowdown for each co- secheduled program should be identical after cache partitioning In the study, fairness metrics related to single core execution with dedicated L2 cache Date required for policy metric and the evaluation metric were acquired by running a workload with different cache partitioning The result value will be in the range (-1 to 1) If the result is 1, the correlation between the 2 metrics is perfect

7 Static OS-based Cache Partitioning 7 Static cache partitioning policy predetermines the amount of cache blocks allocated to each program at the beginning of its execution Page coloring will be used in the partitioning mechanism There several bits between cache index and physical page number in the physical address It will be used for page color Addressed cache will be divided to non-intersecting regions by page color Pages with the same color are mapped to the same cache region

8 Cache Partitioning Page Coloring 8

9 Cache Partitioning Page Coloring 9

10 Dynamic OS-based Cache Partitioning 10 Adjust cache quotas among processes dynamically Page recoloring procedure Increasing the process cache resources ( i.e number of colors used by the process) The kernel rearrange the virtual memory mapping of the process Allocating physical pages of the new color Copying the memory contents Freeing the old pages Remapping virtual pages cause performance overhead Reduce the overall overhead by lowering the frequency of cache allocation adjustment Another option is using lazy method of page migration, so the content of colored page is moved only when it s accessed Average overhead of dynamic partitioning reduced to 2% Highest migration overhead observed 7%

11 Page Recoloring 11

12 Dynamic Cache Partitioning Policies 12 Cache partitioning will be adjusted periodically by the policies at the end of each epoch Dynamic cache partitioning policy for performance Adjust cache partitioning dynamically Metrics Throughput (IPCs) Combined miss rate Combined misses Fair speedup Dynamic cache partitioning policy for fairness Two dynamic policies were implemented based on FM0 and FM4 FM0 is the evaluation metric ( i.e. the ratio of the current cumulative IPC over the baseline IPC) FM4 is the cache miss rates

13 Dynamic Cache Partitioning Policies (cont.) 13 Dynamic cache partitioning policy for QoS consideration Two core workload of two programs The first is the target program The second is the partner program QoS guarantee Ensure the target program performance is larger than or equal to X% of a baseline execution of homogeneous workload on a dual core processor with half of the cache capacity allocated for each program Increase the performance of the partner program

14 Experimental Methodology 14 Hardware and software platform Dell PowerEdge1950 Two dual core, 3.0GHz Intel Xeon 5160 processors and 8GB fully Buffered DIMM (FB-DIMM) main memory Shared, 4MB, 16-way set associative L2 cache Each core has a private 32KB instruction cache and a private 32KB data cache Red Hat Enterprise Linux 4.0 Kernel linux Performance collected using pfmon

15 Evaluation Results 15 Show the improvement with the best static partitioning of each workload over shared cache

16 The Performance Static & Dynamic 16

17 Fairness Correlation between Evaluation Metrics and Policy Metrics 17

18 QoS Static & Dynamic 18

19 Related Work 19 Cache partitioning for multicore processors Page Coloring

20 Summary 20 An OS-based cache partitioning mechanism on multicore processors were designed and implemented Using it to study different cache partitioning polices Some simulation-based study findings were confirmed, however, this approach shows new insights haven t been shown by simulation Future work Reduce cache partitioning overhead Adding easy user interface Conducting partitioning research at the compiler level for both multiprogramming and multithreaded applications

21 Discussion 21 Does OS-based approach had provided new insights and observations that simulation couldn t or failed to show it?

22 References 22 Gaining Insights into Multicore Cache Partitioning:Bridging the Gap between Simulation and Real Systems hyos-slides.pdf

SRM-Buffer: An OS Buffer Management Technique to Prevent Last Level Cache from Thrashing in Multicores

SRM-Buffer: An OS Buffer Management Technique to Prevent Last Level Cache from Thrashing in Multicores SRM-Buffer: An OS Buffer Management Technique to Prevent Last Level Cache from Thrashing in Multicores Xiaoning Ding et al. EuroSys 09 Presented by Kaige Yan 1 Introduction Background SRM buffer design

More information

SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores

SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores Xiaoning Ding The Ohio State University dingxn@cse.ohiostate.edu

More information

SWAP: EFFECTIVE FINE-GRAIN MANAGEMENT

SWAP: EFFECTIVE FINE-GRAIN MANAGEMENT : EFFECTIVE FINE-GRAIN MANAGEMENT OF SHARED LAST-LEVEL CACHES WITH MINIMUM HARDWARE SUPPORT Xiaodong Wang, Shuang Chen, Jeff Setter, and José F. Martínez Computer Systems Lab Cornell University Page 1

More information

A Comparative Study of Microsoft Exchange 2010 on Dell PowerEdge R720xd with Exchange 2007 on Dell PowerEdge R510

A Comparative Study of Microsoft Exchange 2010 on Dell PowerEdge R720xd with Exchange 2007 on Dell PowerEdge R510 A Comparative Study of Microsoft Exchange 2010 on Dell PowerEdge R720xd with Exchange 2007 on Dell PowerEdge R510 Incentives for migrating to Exchange 2010 on Dell PowerEdge R720xd Global Solutions Engineering

More information

Accelerate Applications Using EqualLogic Arrays with directcache

Accelerate Applications Using EqualLogic Arrays with directcache Accelerate Applications Using EqualLogic Arrays with directcache Abstract This paper demonstrates how combining Fusion iomemory products with directcache software in host servers significantly improves

More information

CSC501 Operating Systems Principles. OS Structure

CSC501 Operating Systems Principles. OS Structure CSC501 Operating Systems Principles OS Structure 1 Announcements q TA s office hour has changed Q Thursday 1:30pm 3:00pm, MRC-409C Q Or email: awang@ncsu.edu q From department: No audit allowed 2 Last

More information

Impact of Dell FlexMem Bridge on Microsoft SQL Server Database Performance

Impact of Dell FlexMem Bridge on Microsoft SQL Server Database Performance Impact of Dell FlexMem Bridge on Microsoft SQL Server Database Performance A Dell Technical White Paper Dell Database Solutions Engineering Jisha J Leena Basanthi October 2010 THIS WHITE PAPER IS FOR INFORMATIONAL

More information

Learning with Purpose

Learning with Purpose Network Measurement for 100Gbps Links Using Multicore Processors Xiaoban Wu, Dr. Peilong Li, Dr. Yongyi Ran, Prof. Yan Luo Department of Electrical and Computer Engineering University of Massachusetts

More information

Background Heterogeneous Architectures Performance Modeling Single Core Performance Profiling Multicore Performance Estimation Test Cases Multicore

Background Heterogeneous Architectures Performance Modeling Single Core Performance Profiling Multicore Performance Estimation Test Cases Multicore By Dan Stafford Background Heterogeneous Architectures Performance Modeling Single Core Performance Profiling Multicore Performance Estimation Test Cases Multicore Design Space Results & Observations General

More information

Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality-of-Service

Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality-of-Service Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality-of-Service * Kshitij Sudan* Sadagopan Srinivasan Rajeev Balasubramonian* Ravi Iyer Executive Summary Goal: Co-schedule N applications

More information

EECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun

EECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun EECS750: Advanced Operating Systems 2/24/2014 Heechul Yun 1 Administrative Project Feedback of your proposal will be sent by Wednesday Midterm report due on Apr. 2 3 pages: include intro, related work,

More information

Accelerating Microsoft SQL Server 2016 Performance With Dell EMC PowerEdge R740

Accelerating Microsoft SQL Server 2016 Performance With Dell EMC PowerEdge R740 Accelerating Microsoft SQL Server 2016 Performance With Dell EMC PowerEdge R740 A performance study of 14 th generation Dell EMC PowerEdge servers for Microsoft SQL Server Dell EMC Engineering September

More information

Improving Real-Time Performance on Multicore Platforms Using MemGuard

Improving Real-Time Performance on Multicore Platforms Using MemGuard Improving Real-Time Performance on Multicore Platforms Using MemGuard Heechul Yun University of Kansas 2335 Irving hill Rd, Lawrence, KS heechul@ittc.ku.edu Abstract In this paper, we present a case-study

More information

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1) Lecture 11: SMT and Caching Basics Today: SMT, cache access basics (Sections 3.5, 5.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized for most of the time by doubling

More information

Deterministic Memory Abstraction and Supporting Multicore System Architecture

Deterministic Memory Abstraction and Supporting Multicore System Architecture Deterministic Memory Abstraction and Supporting Multicore System Architecture Farzad Farshchi $, Prathap Kumar Valsan^, Renato Mancuso *, Heechul Yun $ $ University of Kansas, ^ Intel, * Boston University

More information

Chapter 3 Virtualization Model for Cloud Computing Environment

Chapter 3 Virtualization Model for Cloud Computing Environment Chapter 3 Virtualization Model for Cloud Computing Environment This chapter introduces the concept of virtualization in Cloud Computing Environment along with need of virtualization, components and characteristics

More information

Arrakis: The Operating System is the Control Plane

Arrakis: The Operating System is the Control Plane Arrakis: The Operating System is the Control Plane Simon Peter, Jialin Li, Irene Zhang, Dan Ports, Doug Woos, Arvind Krishnamurthy, Tom Anderson University of Washington Timothy Roscoe ETH Zurich Building

More information

Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy)

Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy) 2011 NVRAMOS Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy) 2011. 4. 19 Jongmoo Choi http://embedded.dankook.ac.kr/~choijm Contents Overview Motivation Observations Proposal:

More information

Two hours - online. The exam will be taken on line. This paper version is made available as a backup

Two hours - online. The exam will be taken on line. This paper version is made available as a backup COMP 25212 Two hours - online The exam will be taken on line. This paper version is made available as a backup UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE System Architecture Date: Monday 21st

More information

Simultaneous Multithreading on Pentium 4

Simultaneous Multithreading on Pentium 4 Hyper-Threading: Simultaneous Multithreading on Pentium 4 Presented by: Thomas Repantis trep@cs.ucr.edu CS203B-Advanced Computer Architecture, Spring 2004 p.1/32 Overview Multiple threads executing on

More information

A task migration algorithm for power management on heterogeneous multicore Manman Peng1, a, Wen Luo1, b

A task migration algorithm for power management on heterogeneous multicore Manman Peng1, a, Wen Luo1, b 5th International Conference on Advanced Materials and Computer Science (ICAMCS 2016) A task migration algorithm for power management on heterogeneous multicore Manman Peng1, a, Wen Luo1, b 1 School of

More information

White Paper. File System Throughput Performance on RedHawk Linux

White Paper. File System Throughput Performance on RedHawk Linux White Paper File System Throughput Performance on RedHawk Linux By: Nikhil Nanal Concurrent Computer Corporation August Introduction This paper reports the throughput performance of the,, and file systems

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective. Part I: Operating system overview: Memory Management

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective. Part I: Operating system overview: Memory Management ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part I: Operating system overview: Memory Management 1 Hardware background The role of primary memory Program

More information

Presented by: Nafiseh Mahmoudi Spring 2017

Presented by: Nafiseh Mahmoudi Spring 2017 Presented by: Nafiseh Mahmoudi Spring 2017 Authors: Publication: Type: ACM Transactions on Storage (TOS), 2016 Research Paper 2 High speed data processing demands high storage I/O performance. Flash memory

More information

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors

More information

Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior. Yoongu Kim Michael Papamichael Onur Mutlu Mor Harchol-Balter

Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior. Yoongu Kim Michael Papamichael Onur Mutlu Mor Harchol-Balter Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior Yoongu Kim Michael Papamichael Onur Mutlu Mor Harchol-Balter Motivation Memory is a shared resource Core Core Core Core

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 10 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Chapter 6: CPU Scheduling Basic Concepts

More information

Hierarchical PLABs, CLABs, TLABs in Hotspot

Hierarchical PLABs, CLABs, TLABs in Hotspot Hierarchical s, CLABs, s in Hotspot Christoph M. Kirsch ck@cs.uni-salzburg.at Hannes Payer hpayer@cs.uni-salzburg.at Harald Röck hroeck@cs.uni-salzburg.at Abstract Thread-local allocation buffers (s) are

More information

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi University of Kansas 1 Why? High-Performance Multicores for Real-Time Systems

More information

DELL Reference Configuration Microsoft SQL Server 2008 Fast Track Data Warehouse

DELL Reference Configuration Microsoft SQL Server 2008 Fast Track Data Warehouse DELL Reference Configuration Microsoft SQL Server 2008 Fast Track Warehouse A Dell Technical Configuration Guide base Solutions Engineering Dell Product Group Anthony Fernandez Jisha J Executive Summary

More information

PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads

PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads Ran Xu (Purdue), Subrata Mitra (Adobe Research), Jason Rahman (Facebook), Peter Bai (Purdue),

More information

WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES BIG AND SMALL SERVER PLATFORMS

WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES BIG AND SMALL SERVER PLATFORMS WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES ON BIG AND SMALL SERVER PLATFORMS Shuang Chen*, Shay Galon**, Christina Delimitrou*, Srilatha Manne**, and José Martínez* *Cornell University **Cavium

More information

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded Reference Case Study of a Multi-core, Multithreaded Processor The Sun T ( Niagara ) Computer Architecture, A Quantitative Approach, Fourth Edition, by John Hennessy and David Patterson, chapter. :/C:8

More information

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary

More information

Competitive Power Savings with VMware Consolidation on the Dell PowerEdge 2950

Competitive Power Savings with VMware Consolidation on the Dell PowerEdge 2950 Competitive Power Savings with VMware Consolidation on the Dell PowerEdge 2950 By Scott Hanson Dell Enterprise Technology Center Dell Enterprise Technology Center www.delltechcenter.com August 2007 Contents

More information

A Case Study in Optimizing GNU Radio s ATSC Flowgraph

A Case Study in Optimizing GNU Radio s ATSC Flowgraph A Case Study in Optimizing GNU Radio s ATSC Flowgraph Presented by Greg Scallon and Kirby Cartwright GNU Radio Conference 2017 Thursday, September 14 th 10am ATSC FLOWGRAPH LOADING 3% 99% 76% 36% 10% 33%

More information

OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI

OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI CMPE 655- MULTIPLE PROCESSOR SYSTEMS OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI What is MULTI PROCESSING?? Multiprocessing is the coordinated processing

More information

ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS

ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS Ferdinando Alessi Annalisa Massini Roberto Basili INGV Introduction The simulation of wave propagation

More information

Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design

Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design Based on papers by: A.Fedorova, M.Seltzer, C.Small, and D.Nussbaum Pisa November 6, 2006 Multithreaded Chip

More information

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage Performance Study of Microsoft SQL Server 2016 Dell Engineering February 2017 Table of contents

More information

Are You Insured Against Your Noisy Neighbor Sunku Ranganath, Intel Corporation Sridhar Rao, Spirent Communications

Are You Insured Against Your Noisy Neighbor Sunku Ranganath, Intel Corporation Sridhar Rao, Spirent Communications Are You Insured Against Your Noisy Neighbor Sunku Ranganath, Intel Corporation Sridhar Rao, Spirent Communications @SunkuRanganath, @ngignir Legal Disclaimer 2018 Intel Corporation. Intel, the Intel logo,

More information

Fit for Purpose Platform Positioning and Performance Architecture

Fit for Purpose Platform Positioning and Performance Architecture Fit for Purpose Platform Positioning and Performance Architecture Joe Temple IBM Monday, February 4, 11AM-12PM Session Number 12927 Insert Custom Session QR if Desired. Fit for Purpose Categorized Workload

More information

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr

More information

CPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou ( Zhejiang University

CPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou (  Zhejiang University Operating Systems (Fall/Winter 2018) CPU Scheduling Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review Motivation to use threads

More information

Utilizing the IOMMU scalably

Utilizing the IOMMU scalably Utilizing the IOMMU scalably Omer Peleg, Adam Morrison, Benjamin Serebrin, and Dan Tsafrir USENIX ATC 15 2017711456 Shin Seok Ha 1 Introduction What is an IOMMU? Provides the translation between IO addresses

More information

Dell PowerEdge 11 th Generation Servers: R810, R910, and M910 Memory Guidance

Dell PowerEdge 11 th Generation Servers: R810, R910, and M910 Memory Guidance Dell PowerEdge 11 th Generation Servers: R810, R910, and M910 Memory Guidance A Dell Technical White Paper Dell Product Group Armando Acosta and James Pledge THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES

More information

46PaQ. Dimitris Miras, Saleem Bhatti, Peter Kirstein Networks Research Group Computer Science UCL. 46PaQ AHM 2005 UKLIGHT Workshop, 19 Sep

46PaQ. Dimitris Miras, Saleem Bhatti, Peter Kirstein Networks Research Group Computer Science UCL. 46PaQ AHM 2005 UKLIGHT Workshop, 19 Sep 46PaQ Dimitris Miras, Saleem Bhatti, Peter Kirstein Networks Research Group Computer Science UCL 46PaQ AHM 2005 UKLIGHT Workshop, 19 Sep 2005 1 Today s talk Overview Current Status and Results Future Work

More information

740: Computer Architecture, Fall 2013 Midterm I

740: Computer Architecture, Fall 2013 Midterm I Instructions: Full Name: Andrew ID (print clearly!): 740: Computer Architecture, Fall 2013 Midterm I October 23, 2013 Make sure that your exam has 17 pages and is not missing any sheets, then write your

More information

File Memory for Extended Storage Disk Caches

File Memory for Extended Storage Disk Caches File Memory for Disk Caches A Master s Thesis Seminar John C. Koob January 16, 2004 John C. Koob, January 16, 2004 File Memory for Disk Caches p. 1/25 Disk Cache File Memory ESDC Design Experimental Results

More information

Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet

Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet Pilar González-Férez and Angelos Bilas 31 th International Conference on Massive Storage Systems

More information

Memory Management (2)

Memory Management (2) EECS 3221.3 Operating System Fundamentals No.9 Memory Management (2) Prof. Hui Jiang Dept of Electrical Engineering and Computer Science, York University Memory Management Approaches Contiguous Memory

More information

Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions

Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Ziming Zhong Vladimir Rychkov Alexey Lastovetsky Heterogeneous Computing

More information

Performance and Energy Efficiency of the 14 th Generation Dell PowerEdge Servers

Performance and Energy Efficiency of the 14 th Generation Dell PowerEdge Servers Performance and Energy Efficiency of the 14 th Generation Dell PowerEdge Servers This white paper details the performance improvements of Dell PowerEdge servers with the Intel Xeon Processor Scalable CPU

More information

Chapter 8: Memory- Management Strategies. Operating System Concepts 9 th Edition

Chapter 8: Memory- Management Strategies. Operating System Concepts 9 th Edition Chapter 8: Memory- Management Strategies Operating System Concepts 9 th Edition Silberschatz, Galvin and Gagne 2013 Chapter 8: Memory Management Strategies Background Swapping Contiguous Memory Allocation

More information

Performance Scaling. When deciding how to implement a virtualized environment. with Dell PowerEdge 2950 Servers and VMware Virtual Infrastructure 3

Performance Scaling. When deciding how to implement a virtualized environment. with Dell PowerEdge 2950 Servers and VMware Virtual Infrastructure 3 Scaling with Dell PowerEdge 2950 Servers and VMware Virtual Infrastructure 3 To assess virtualization scalability and performance, Dell engineers tested two Dell PowerEdge 2950 servers with dual-core Intel

More information

Understanding The Performance of DPDK as a Computer Architect

Understanding The Performance of DPDK as a Computer Architect Understanding The Performance of DPDK as a Computer Architect XIAOBAN WU *, PEILONG LI *, YAN LUO *, LIANG- MIN (LARRY) WANG +, MARC PEPIN +, AND JOHN MORGAN + * UNIVERSITY OF MASSACHUSETTS LOWELL + INTEL

More information

Profiling Grid Data Transfer Protocols and Servers. George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA

Profiling Grid Data Transfer Protocols and Servers. George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA Profiling Grid Data Transfer Protocols and Servers George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA Motivation Scientific experiments are generating large amounts of data Education

More information

Cisco Prime Home 6.X Minimum System Requirements: Standalone and High Availability

Cisco Prime Home 6.X Minimum System Requirements: Standalone and High Availability White Paper Cisco Prime Home 6.X Minimum System Requirements: Standalone and High Availability White Paper August 2014 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public

More information

Accelerating storage performance in the PowerEdge FX2 converged architecture modular chassis

Accelerating storage performance in the PowerEdge FX2 converged architecture modular chassis Accelerating storage performance in the PowerEdge FX2 converged architecture modular chassis This white paper highlights the impressive storage performance of the PowerEdge FD332 storage node in the FX2

More information

Enhancements to Linux I/O Scheduling

Enhancements to Linux I/O Scheduling Enhancements to Linux I/O Scheduling Seetharami R. Seelam, UTEP Rodrigo Romero, UTEP Patricia J. Teller, UTEP William Buros, IBM-Austin 21 July 2005 Linux Symposium 2005 1 Introduction Dynamic Adaptability

More information

Improving Virtual Machine Scheduling in NUMA Multicore Systems

Improving Virtual Machine Scheduling in NUMA Multicore Systems Improving Virtual Machine Scheduling in NUMA Multicore Systems Jia Rao, Xiaobo Zhou University of Colorado, Colorado Springs Kun Wang, Cheng-Zhong Xu Wayne State University http://cs.uccs.edu/~jrao/ Multicore

More information

IBM B2B INTEGRATOR BENCHMARKING IN THE SOFTLAYER ENVIRONMENT

IBM B2B INTEGRATOR BENCHMARKING IN THE SOFTLAYER ENVIRONMENT IBM B2B INTEGRATOR BENCHMARKING IN THE SOFTLAYER ENVIRONMENT 215-4-14 Authors: Deep Chatterji (dchatter@us.ibm.com) Steve McDuff (mcduffs@ca.ibm.com) CONTENTS Disclaimer...3 Pushing the limits of B2B Integrator...4

More information

IBM System x servers. Innovation comes standard

IBM System x servers. Innovation comes standard IBM System x servers Innovation comes standard IBM System x servers Highlights Build a cost-effective, flexible IT environment with IBM X-Architecture technology. Achieve maximum performance per watt with

More information

HP SAS benchmark performance tests

HP SAS benchmark performance tests HP SAS benchmark performance tests technology brief Abstract... 2 Introduction... 2 Test hardware... 2 HP ProLiant DL585 server... 2 HP ProLiant DL380 G4 and G4 SAS servers... 3 HP Smart Array P600 SAS

More information

Consolidation Assessment Final Report

Consolidation Assessment Final Report Consolidation Assessment Final Report January 2009 The foundation for a lasting relationship starts with a connection. 1.800.800.0014 biz.pcconnection.com Table of Contents Executive Overview..............................................

More information

Performance of Virtual Desktops in a VMware Infrastructure 3 Environment VMware ESX 3.5 Update 2

Performance of Virtual Desktops in a VMware Infrastructure 3 Environment VMware ESX 3.5 Update 2 Performance Study Performance of Virtual Desktops in a VMware Infrastructure 3 Environment VMware ESX 3.5 Update 2 Workload The benefits of virtualization for enterprise servers have been well documented.

More information

A Performance Characterization of Microsoft SQL Server 2005 Virtual Machines on Dell PowerEdge Servers Running VMware ESX Server 3.

A Performance Characterization of Microsoft SQL Server 2005 Virtual Machines on Dell PowerEdge Servers Running VMware ESX Server 3. A Performance Characterization of Microsoft SQL Server 2005 Virtual Machines on Dell PowerEdge Servers Running VMware ESX Server 3.5 Todd Muirhead Dell Enterprise Technology Center www.delltechcenter.com

More information

Lecture: SMT, Cache Hierarchies. Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1)

Lecture: SMT, Cache Hierarchies. Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1) Lecture: SMT, Cache Hierarchies Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized

More information

VERITAS Foundation Suite TM 2.0 for Linux PERFORMANCE COMPARISON BRIEF - FOUNDATION SUITE, EXT3, AND REISERFS WHITE PAPER

VERITAS Foundation Suite TM 2.0 for Linux PERFORMANCE COMPARISON BRIEF - FOUNDATION SUITE, EXT3, AND REISERFS WHITE PAPER WHITE PAPER VERITAS Foundation Suite TM 2.0 for Linux PERFORMANCE COMPARISON BRIEF - FOUNDATION SUITE, EXT3, AND REISERFS Linux Kernel 2.4.9-e3 enterprise VERSION 2 1 Executive Summary The SPECsfs3.0 NFS

More information

Exchange Server 2007 Performance Comparison of the Dell PowerEdge 2950 and HP Proliant DL385 G2 Servers

Exchange Server 2007 Performance Comparison of the Dell PowerEdge 2950 and HP Proliant DL385 G2 Servers Exchange Server 2007 Performance Comparison of the Dell PowerEdge 2950 and HP Proliant DL385 G2 Servers By Todd Muirhead Dell Enterprise Technology Center Dell Enterprise Technology Center dell.com/techcenter

More information

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Juan C. Pichel Centro de Investigación en Tecnoloxías da Información (CITIUS) Universidade de Santiago de Compostela, Spain

More information

Implementing SQL Server 2016 with Microsoft Storage Spaces Direct on Dell EMC PowerEdge R730xd

Implementing SQL Server 2016 with Microsoft Storage Spaces Direct on Dell EMC PowerEdge R730xd Implementing SQL Server 2016 with Microsoft Storage Spaces Direct on Dell EMC PowerEdge R730xd Performance Study Dell EMC Engineering October 2017 A Dell EMC Performance Study Revisions Date October 2017

More information

Comparing Performance of Solid State Devices and Mechanical Disks

Comparing Performance of Solid State Devices and Mechanical Disks Comparing Performance of Solid State Devices and Mechanical Disks Jiri Simsa Milo Polte, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University Motivation Performance gap [Pugh71] technology

More information

CS330: Operating System and Lab. (Spring 2006) I/O Systems

CS330: Operating System and Lab. (Spring 2006) I/O Systems CS330: Operating System and Lab. (Spring 2006) I/O Systems Today s Topics Block device vs. Character device Direct I/O vs. Memory-mapped I/O Polling vs. Interrupts Programmed I/O vs. DMA Blocking vs. Non-blocking

More information

Memory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts

Memory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts Memory management Last modified: 26.04.2016 1 Contents Background Logical and physical address spaces; address binding Overlaying, swapping Contiguous Memory Allocation Segmentation Paging Structure of

More information

Dell Reference Configuration for Large Oracle Database Deployments on Dell EqualLogic Storage

Dell Reference Configuration for Large Oracle Database Deployments on Dell EqualLogic Storage Dell Reference Configuration for Large Oracle Database Deployments on Dell EqualLogic Storage Database Solutions Engineering By Raghunatha M, Ravi Ramappa Dell Product Group October 2009 Executive Summary

More information

Nested Virtualization and Server Consolidation

Nested Virtualization and Server Consolidation Nested Virtualization and Server Consolidation Vara Varavithya Department of Electrical Engineering, KMUTNB varavithya@gmail.com 1 Outline Virtualization & Background Nested Virtualization Hybrid-Nested

More information

Pexip Infinity Server Design Guide

Pexip Infinity Server Design Guide Pexip Infinity Server Design Guide Introduction This document describes the recommended specifications and deployment for servers hosting the Pexip Infinity platform. It starts with a Summary of recommendations

More information

DEMM: a Dynamic Energy-saving mechanism for Multicore Memories

DEMM: a Dynamic Energy-saving mechanism for Multicore Memories DEMM: a Dynamic Energy-saving mechanism for Multicore Memories Akbar Sharifi, Wei Ding 2, Diana Guttman 3, Hui Zhao 4, Xulong Tang 5, Mahmut Kandemir 5, Chita Das 5 Facebook 2 Qualcomm 3 Intel 4 University

More information

Optimizing Virtualized Datacenters

Optimizing Virtualized Datacenters 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com 212.367.7400 Optimizing Virtualized Datacenters Comparing the Scalability of Intel Xeon Server Processor Platforms for VMware-based Consolidation

More information

Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays

Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays This whitepaper describes Dell Microsoft SQL Server Fast Track reference architecture configurations

More information

Performance Modeling and Analysis of Flash based Storage Devices

Performance Modeling and Analysis of Flash based Storage Devices Performance Modeling and Analysis of Flash based Storage Devices H. Howie Huang, Shan Li George Washington University Alex Szalay, Andreas Terzis Johns Hopkins University MSST 11 May 26, 2011 NAND Flash

More information

Meet the Increased Demands on Your Infrastructure with Dell and Intel. ServerWatchTM Executive Brief

Meet the Increased Demands on Your Infrastructure with Dell and Intel. ServerWatchTM Executive Brief Meet the Increased Demands on Your Infrastructure with Dell and Intel ServerWatchTM Executive Brief a QuinStreet Excutive Brief. 2012 Doing more with less is the mantra that sums up much of the past decade,

More information

Intel profiling tools and roofline model. Dr. Luigi Iapichino

Intel profiling tools and roofline model. Dr. Luigi Iapichino Intel profiling tools and roofline model Dr. Luigi Iapichino luigi.iapichino@lrz.de Which tool do I use in my project? A roadmap to optimization (and to the next hour) We will focus on tools developed

More information

Scheduling the Intel Core i7

Scheduling the Intel Core i7 Third Year Project Report University of Manchester SCHOOL OF COMPUTER SCIENCE Scheduling the Intel Core i7 Ibrahim Alsuheabani Degree Programme: BSc Software Engineering Supervisor: Prof. Alasdair Rawsthorne

More information

Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters

Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters Hari Subramoni, Ping Lai, Sayantan Sur and Dhabhaleswar. K. Panda Department of

More information

Dynamic Fine Grain Scheduling of Pipeline Parallelism. Presented by: Ram Manohar Oruganti and Michael TeWinkle

Dynamic Fine Grain Scheduling of Pipeline Parallelism. Presented by: Ram Manohar Oruganti and Michael TeWinkle Dynamic Fine Grain Scheduling of Pipeline Parallelism Presented by: Ram Manohar Oruganti and Michael TeWinkle Overview Introduction Motivation Scheduling Approaches GRAMPS scheduling method Evaluation

More information

Method-Level Phase Behavior in Java Workloads

Method-Level Phase Behavior in Java Workloads Method-Level Phase Behavior in Java Workloads Andy Georges, Dries Buytaert, Lieven Eeckhout and Koen De Bosschere Ghent University Presented by Bruno Dufour dufour@cs.rutgers.edu Rutgers University DCS

More information

Cauldron: A Framework to Defend Against Cache-based Side-channel Attacks in Clouds

Cauldron: A Framework to Defend Against Cache-based Side-channel Attacks in Clouds Cauldron: A Framework to Defend Against Cache-based Side-channel Attacks in Clouds Mohammad Ahmad, Read Sprabery, Konstantin Evchenko, Abhilash Raj, Dr. Rakesh Bobba, Dr. Sibin Mohan, Dr. Roy Campbell

More information

Chapter 8: Memory-Management Strategies

Chapter 8: Memory-Management Strategies Chapter 8: Memory-Management Strategies Chapter 8: Memory Management Strategies Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and

More information

Lixia Liu, Zhiyuan Li Purdue University, USA. grants ST-HEC , CPA and CPA , and by a Google Fellowship

Lixia Liu, Zhiyuan Li Purdue University, USA. grants ST-HEC , CPA and CPA , and by a Google Fellowship Lixia Liu, Zhiyuan Li Purdue University, USA PPOPP 2010, January 2009 Work supported in part by NSF through Work supported in part by NSF through grants ST-HEC-0444285, CPA-0702245 and CPA-0811587, and

More information

ISA-L Performance Report Release Test Date: Sept 29 th 2017

ISA-L Performance Report Release Test Date: Sept 29 th 2017 Test Date: Sept 29 th 2017 Revision History Date Revision Comment Sept 29 th, 2017 1.0 Initial document for release 2 Contents Audience and Purpose... 4 Test setup:... 4 Intel Xeon Platinum 8180 Processor

More information

Continuous Shape Shifting: Enabling Loop Co-optimization via Near-Free Dynamic Code Rewriting

Continuous Shape Shifting: Enabling Loop Co-optimization via Near-Free Dynamic Code Rewriting Continuous Shape Shifting: Enabling Loop Co-optimization via Near-Free Dynamic Code Rewriting Animesh Jain, Michael A. Laurenzano, Lingjia Tang and Jason Mars International Symposium on Microarchitecture

More information

Database Workload. from additional misses in this already memory-intensive databases? interference could be a problem) Key question:

Database Workload. from additional misses in this already memory-intensive databases? interference could be a problem) Key question: Database Workload + Low throughput (0.8 IPC on an 8-wide superscalar. 1/4 of SPEC) + Naturally threaded (and widely used) application - Already high cache miss rates on a single-threaded machine (destructive

More information

A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE

A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE Abu Asaduzzaman, Deepthi Gummadi, and Chok M. Yip Department of Electrical Engineering and Computer Science Wichita State University Wichita, Kansas, USA

More information

A Fine-grained Performance-based Decision Model for Virtualization Application Solution

A Fine-grained Performance-based Decision Model for Virtualization Application Solution A Fine-grained Performance-based Decision Model for Virtualization Application Solution Jianhai Chen College of Computer Science Zhejiang University Hangzhou City, Zhejiang Province, China 2011/08/29 Outline

More information

WHITE PAPER SINGLE & MULTI CORE PERFORMANCE OF AN ERASURE CODING WORKLOAD ON AMD EPYC

WHITE PAPER SINGLE & MULTI CORE PERFORMANCE OF AN ERASURE CODING WORKLOAD ON AMD EPYC WHITE PAPER SINGLE & MULTI CORE PERFORMANCE OF AN ERASURE CODING WORKLOAD ON AMD EPYC INTRODUCTION With the EPYC processor line, AMD is expected to take a strong position in the server market including

More information

Open Benchmark Phase 3: Windows NT Server 4.0 and Red Hat Linux 6.0

Open Benchmark Phase 3: Windows NT Server 4.0 and Red Hat Linux 6.0 Open Benchmark Phase 3: Windows NT Server 4.0 and Red Hat Linux 6.0 By Bruce Weiner (PDF version, 87 KB) June 30,1999 White Paper Contents Overview Phases 1 and 2 Phase 3 Performance Analysis File-Server

More information

... IBM Advanced Technical Skills IBM Oracle International Competency Center September 2013

... IBM Advanced Technical Skills IBM Oracle International Competency Center September 2013 Performance benefits of IBM Power Systems and IBM FlashSystem for JD Edwards EnterpriseOne IBM Power 780 server with AIX and IBM FlashSystem 820 flash storage improves batch performance in a client proof

More information

Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740

Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740 Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740 A performance study with NVDIMM-N Dell EMC Engineering September 2017 A Dell EMC document category Revisions Date

More information