OneCore Storage Performance Tuning

April 2015

Overview

To improve Emulex adapter performance while using the OneCore Storage Linux drivers in a multi-core CPU environment, you can apply several performance tuning features by using the ocs_config.py script:

- Specifying CPU affinity settings:
  - Mapping a driver's interrupt requests (IRQs) to individual CPU cores (CPU affinity).
  - Mapping driver threads to CPU cores.
- Setting the CPU frequency scaling governor to performance mode, if appropriate.
- Using OCS driver port to non-uniform memory access (NUMA) node mapping to minimize latency by running OCS driver processes on local memory (versus remote memory).

The ocs_config.py script is provided in the tools/bin directory in the SDK release package. Run this script after the driver is loaded.

Note: The ocs_config.py script combines and replaces the previous ocsmknod, ocs_perf_conf, and ocs_thread_perf_config.py scripts.

The following sections describe the performance tuning details.

MSI-X Interrupts and CPU Affinity

Introduction

The OneCore Storage (OCS) Linux reference drivers use MSI-X interrupts. Each PCI function uses one assigned MSI-X interrupt vector. In a multi-core CPU environment, each OCS interrupt handler instance is generally expected to run on its own CPU core. However, on some Linux systems, multiple OCS interrupt handlers can share a CPU core. This condition does not level CPU utilization across CPU cores and can reduce input/output operations per second (IOPS) performance. To ensure that OCS interrupt handlers are distributed across CPU cores and to optimize IOPS performance, each OCS interrupt handler can be explicitly bound to its own CPU core.

Determining if Multiple Interrupt Handlers Share the Same CPU Core

Before explicitly binding OCS interrupt handlers, determine whether multiple interrupt handlers are sharing the same CPU core. The interrupt counts for each CPU and driver can be observed by examining the /proc/interrupts file after running I/O on all OCS adapter ports. To display the number of interrupts that each OCS interrupt handler has serviced on each CPU, issue a grep ocs /proc/interrupts command.

The following example shows grep ocs /proc/interrupts output where two OCS interrupt handlers are sharing the same CPU core (CPU2):

         CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
    30:  PCI-MSI-edge ocs
    31:  PCI-MSI-edge ocs

In contrast, the following output shows two OCS interrupt handlers that are using different CPU cores (CPU2 and CPU3):

         CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
    30:  PCI-MSI-edge ocs
    31:  PCI-MSI-edge ocs

Note: The number in the first column (30 and 31 in the previous examples) is the MSI-X interrupt vector assigned to each OCS interrupt handler.

Binding an Interrupt Handler (IRQ) to Its Own CPU Core

Each interrupt vector is represented by an entry in /proc/irq/<vector>. The /proc/irq/<vector>/smp_affinity entry is a read/write value that indicates the vector-to-CPU binding. This value is a hexadecimal bitmask of the CPUs that may service the interrupt. For example, a value of 1 in bit position 0 indicates that the interrupt handler runs only on CPU0. By default, all bit positions are set to 1. For example, on an 8-core CPU, the typical value is 0xff, which allows the interrupt handler to run on any CPU. If you write a single-bit value (1, 2, 4, 8, and so on) to the smp_affinity entry, the interrupt handler runs only on the corresponding CPU core.

The interrupt vectors can be determined by examining the /proc/ocs_* entries:

    # cat /proc/ocs_fc_ramd
    247,4
    0,113,10716
    1,114,10717
    2,115,10718
    3,116,10719

In this example, there are four devices, using interrupt vectors 113, 114, 115, and 116. The CPU affinity may be set to CPUs 2-5 as follows. Because the values are hexadecimal, 10 and 20 select CPU4 and CPU5:

    echo 4 > /proc/irq/113/smp_affinity
    echo 8 > /proc/irq/114/smp_affinity
    echo 10 > /proc/irq/115/smp_affinity
    echo 20 > /proc/irq/116/smp_affinity

When binding interrupts to CPU cores, the following recommendations may improve performance:

- Avoid binding to CPU0 and CPU1 (depending on the kernel version).
- If using hyper-threaded processors with physical cores and virtual hyper-threaded cores, bind the per-port interrupts to physical cores. That is, do not bind two interrupts to hyper-threads that share the same physical core.

Binding Threads to CPU Cores and Setting Thread Priority

By default, the OneCore Storage drivers execute within kernel threads. The kernel thread priority and CPU affinity may be set to improve performance. The taskset and chrt programs are used to specify the CPU affinity and priority.

The first step is to determine the process IDs for the OCS kernel threads:

    # cat /proc/ocs_fc_ramd
    247,4
    0,113,10716
    1,114,10717
    2,115,10718
    3,116,10719

In this example, there are four devices, with process IDs 10716, 10717, 10718, and 10719. The taskset command may be used to bind the threads to CPU cores 2-5:

    taskset -c -p 2 10716
    taskset -c -p 3 10717
    taskset -c -p 4 10718
    taskset -c -p 5 10719
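The IRQ masks and thread bindings in this section follow mechanically from the /proc/ocs_* table. The following Python sketch is illustrative only and is not taken from ocs_config.py. It assumes that each row after the first has the form index,vector,pid (inferred from the example output, not from driver documentation) and that device N is bound to CPU N+2. The helper names are hypothetical. It prints the commands instead of running them.

```python
def cpu_mask(cpu):
    """Return the hexadecimal smp_affinity mask that selects a single CPU."""
    return format(1 << cpu, "x")


def parse_ocs_table(text):
    """Parse rows of the assumed form 'index,vector,pid', skipping the first row."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        index, vector, pid = (int(field) for field in line.split(","))
        rows.append((index, vector, pid))
    return rows


def binding_commands(rows, first_cpu=2):
    """Build one IRQ binding and one thread binding command per device."""
    commands = []
    for index, vector, pid in rows:
        cpu = first_cpu + index  # assumption: device N goes to CPU first_cpu + N
        commands.append(f"echo {cpu_mask(cpu)} > /proc/irq/{vector}/smp_affinity")
        commands.append(f"taskset -c -p {cpu} {pid}")
    return commands


# Sample table using the vectors and process IDs named in this section.
SAMPLE = """247,4
0,113,10716
1,114,10717
2,115,10718
3,116,10719"""

if __name__ == "__main__":
    for command in binding_commands(parse_ocs_table(SAMPLE)):
        print(command)
```

Generating the mask with a bit shift avoids the common mistake of writing a decimal CPU number into smp_affinity, which expects a hexadecimal bitmask.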

The thread scheduler policy and priority may be set using the chrt command. In this example, the threads are set to a policy of SCHED_FIFO with a priority of 99:

    chrt -f -p 99 10716
    chrt -f -p 99 10717
    chrt -f -p 99 10718
    chrt -f -p 99 10719

Setting the CPU Frequency Scaling Governor to Performance Mode

The ocs_config.py script sets the CPU frequency scaling governor to performance mode. CPU frequency scaling (also known as CPU throttling) is a technique in which the microprocessor is regulated to run at less than its maximum frequency to conserve power. When the CPU frequency scaling governor is set to performance mode, the CPU runs at the highest frequency within the specified minimum and maximum frequency limits.

Using OCS Driver Port to NUMA Node Mapping

Note: This feature is applicable only on systems that support NUMA and have NUMA enabled.

In some systems and applications, OCS driver port processes may run on remote memory (memory that is not on the same bus as the adapter). In this scenario, the processes use a longer signal path than processes running on local memory (memory that is on the same bus as the adapter). This longer signal path not only increases latency but can also become a throughput bottleneck if the signal path is shared by multiple CPUs.

NUMA was designed to alleviate these latency issues and bottlenecks by grouping compute resources and memory into nodes. A NUMA node typically includes multiple CPUs, shared memory, and a memory controller on the same bus. Memory located in the same NUMA node as the CPU currently running the process is referred to as local memory, while any memory that does not belong to the node on which the process is currently running is considered remote. Using OCS driver port to NUMA node mapping helps to keep the processes running locally within a NUMA node, which minimizes cross-node remote memory access.

The ocs_config.py script recognizes the available NUMA nodes and ensures that CPU mapping is balanced on the appropriate OCS driver port.
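As a rough illustration of NUMA-aware mapping, the Python sketch below assigns each port a CPU from the port's own NUMA node. It is not the logic of ocs_config.py, which is not shown in this document. The function names and the one-CPU-per-port, wrap-around policy are assumptions. The CPU list strings use the Linux range format printed by lscpu (for example 0-3,8-11); on Linux such inputs can usually be read from /sys/devices/system/node/node<N>/cpulist and /sys/bus/pci/devices/<address>/numa_node.

```python
def parse_cpulist(cpulist):
    """Expand a Linux CPU list string such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in cpulist.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def assign_ports(port_nodes, node_cpulists):
    """Give each port the next unused CPU from its own NUMA node.

    port_nodes maps port number to NUMA node; node_cpulists maps NUMA node
    to a CPU list string. CPUs are reused (wrap-around) if a node has more
    ports than CPUs.
    """
    node_cpus = {node: parse_cpulist(s) for node, s in node_cpulists.items()}
    used = {node: 0 for node in node_cpus}
    assignment = {}
    for port in sorted(port_nodes):
        node = port_nodes[port]
        cpus = node_cpus[node]
        assignment[port] = cpus[used[node] % len(cpus)]
        used[node] += 1
    return assignment


if __name__ == "__main__":
    # Illustrative two-node system: port 0 on node 0, ports 1-4 on node 1.
    nodes = {0: "0-3,8-11", 1: "4-7,12-15"}
    ports = {0: 0, 1: 1, 2: 1, 3: 1, 4: 1}
    print(assign_ports(ports, nodes))
```

Keeping the per-node counters separate means that ports on a lightly used node never consume CPUs from another node, which is the property that avoids remote memory access.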

Example

The following output shows a system with two NUMA nodes (two physical processors), each having eight CPUs:

    # lscpu | grep NUMA
    NUMA node(s): 2
    NUMA node0 CPU(s): 0-3,8-11
    NUMA node1 CPU(s): 4-7,12-15

The OCS driver port-to-NUMA mapping (physical adapter ports to physical CPUs) can be viewed using the elxsdkutil utility's list command. In the following output, OCS driver port 0 (in the example, [0] Emulex/Skyhawk FCoE) is located on NUMA node 0, whereas OCS driver ports 1-4 are located on NUMA node 1.

    # ./elxsdkutil list | egrep "Emulex|NUMA"
    [0] Emulex/Skyhawk FCoE
    modeldesc: Emulex OneConnect FCoE/NIC Adapter
    Physical NUMA Node: 0
    NUMA Map: cpu0:node0
    [1] Emulex/Lancer FCoE
    NUMA Map: cpu4:node1
    [2] Emulex/Lancer FCoE
    NUMA Map: cpu5:node1
    [3] Emulex/Lancer FCoE
    NUMA Map: cpu6:node1
    [4] Emulex/Lancer FCoE
    NUMA Map: cpu7:node1

When the ocs_config.py script is executed, it maps the OCS driver ports and the CPUs according to the following NUMA association:

    # ./ocs_config.py
    --- Checking for ocs drivers
    ocs_fc_ramd driver found
    /dev/ocs_0 already exists
    /dev/ocs_1 already exists
    /dev/ocs_2 already exists
    /dev/ocs_3 already exists
    /dev/ocs_4 already exists
    Bind ocs:0 vec 79 to cpu_id 0 << OCS port 0 maps to cpu 0 from NUMA node 0
    Bind ocs:1 vec 80 to cpu_id 4 << OCS port 1 maps to cpu 4 from NUMA node 1
    Bind ocs:2 vec 81 to cpu_id 5
    Bind ocs:3 vec 82 to cpu_id 6
    Bind ocs:4 vec 83 to cpu_id 7
    Bind ocs:0 pid 6939 to cpu_id 0
    Bind ocs:1 pid 6945 to cpu_id 4
    Bind ocs:2 pid 6951 to cpu_id 5
    Bind ocs:3 pid 6957 to cpu_id 6
    Bind ocs:4 pid 6974 to cpu_id 7
    Restarting irqbalance

Copyright Emulex. All rights reserved worldwide. This document refers to various companies and products by their trade names. In most, if not all cases, their respective companies claim these designations as trademarks or registered trademarks. This information is provided for reference only. Although this information is believed to be accurate and reliable at the time of publication, Emulex assumes no responsibility for errors or omissions. Emulex reserves the right to make changes or corrections without notice. This report is the property of Emulex and may not be duplicated without permission from the Company.


More information

Operating System Design Issues. I/O Management

Operating System Design Issues. I/O Management I/O Management Chapter 5 Operating System Design Issues Efficiency Most I/O devices slow compared to main memory (and the CPU) Use of multiprogramming allows for some processes to be waiting on I/O while

More information

OS In Action. Linux. /proc

OS In Action. Linux. /proc OS In Action Linux /proc Text taken from: http://www.tldp.org/ldp/linux-filesystem-hierar chy/html/proc.html Small additions and formatting by Dr.Enis KARAARSLAN, 2015 /proc is very special in that it

More information

Intel QuickAssist Technology

Intel QuickAssist Technology Performance Optimization Guide September 2018 Document Number: 330687-005 You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel

More information

Example Networks on chip Freescale: MPC Telematics chip

Example Networks on chip Freescale: MPC Telematics chip Lecture 22: Interconnects & I/O Administration Take QUIZ 16 over P&H 6.6-10, 6.12-14 before 11:59pm Project: Cache Simulator, Due April 29, 2010 NEW OFFICE HOUR TIME: Tuesday 1-2, McKinley Exams in ACES

More information

Optimizing Fusion iomemory on Red Hat Enterprise Linux 6 for Database Performance Acceleration. Sanjay Rao, Principal Software Engineer

Optimizing Fusion iomemory on Red Hat Enterprise Linux 6 for Database Performance Acceleration. Sanjay Rao, Principal Software Engineer Optimizing Fusion iomemory on Red Hat Enterprise Linux 6 for Database Performance Acceleration Sanjay Rao, Principal Software Engineer Version 1.0 August 2011 1801 Varsity Drive Raleigh NC 27606-2072 USA

More information

(MCQZ-CS604 Operating Systems)

(MCQZ-CS604 Operating Systems) command to resume the execution of a suspended job in the foreground fg (Page 68) bg jobs kill commands in Linux is used to copy file is cp (Page 30) mv mkdir The process id returned to the child process

More information

ATTO UL4D & UL5D Troubleshooting Guide

ATTO UL4D & UL5D Troubleshooting Guide ATTO UL4D & UL5D Troubleshooting Guide This document describes troubleshooting techniques that can be used to identify and resolve issues associated with the ATTO Ultra320 dual channel SCSI host adapter.

More information

ò mm_struct represents an address space in kernel ò task represents a thread in the kernel ò A task points to 0 or 1 mm_structs

ò mm_struct represents an address space in kernel ò task represents a thread in the kernel ò A task points to 0 or 1 mm_structs Last time We went through the high-level theory of scheduling algorithms Scheduling Today: View into how Linux makes its scheduling decisions Don Porter CSE 306 Lecture goals Understand low-level building

More information

LECTURE 3:CPU SCHEDULING

LECTURE 3:CPU SCHEDULING LECTURE 3:CPU SCHEDULING 1 Outline Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time CPU Scheduling Operating Systems Examples Algorithm Evaluation 2 Objectives

More information

Introducing Cache Pseudo-Locking to reduce memory access latency. Reinette Chatre

Introducing Cache Pseudo-Locking to reduce memory access latency. Reinette Chatre Introducing Cache Pseudo-Locking to reduce memory access latency Reinette Chatre About me Software Engineer at Intel (~12 years) Open Source Technology Center (OTC) Currently Enabling Cache Pseudo-Locking

More information

Scheduling. Don Porter CSE 306

Scheduling. Don Porter CSE 306 Scheduling Don Porter CSE 306 Last time ò We went through the high-level theory of scheduling algorithms ò Today: View into how Linux makes its scheduling decisions Lecture goals ò Understand low-level

More information

AMD EPYC Processors Showcase High Performance for Network Function Virtualization (NFV)

AMD EPYC Processors Showcase High Performance for Network Function Virtualization (NFV) White Paper December, 2018 AMD EPYC Processors Showcase High Performance for Network Function Virtualization (NFV) Executive Summary Data centers and cloud service providers are creating a technology shift

More information

Server Support Matrix ETERNUS Disk storage systems Server Connection Guide (Fibre Channel) ETERNUS Disk Storage System Settings

Server Support Matrix ETERNUS Disk storage systems Server Connection Guide (Fibre Channel) ETERNUS Disk Storage System Settings Preface This document briefly explains the operations that need to be performed by the user in order to connect an ETERNUS2000 model 100 or 200, ETERNUS4000 model 300, 400, 500, or 600, or ETERNUS8000

More information

Comparison of Solaris, Linux, and FreeBSD Kernels. Similarities and Differences in some major kernel subsystems.

Comparison of Solaris, Linux, and FreeBSD Kernels. Similarities and Differences in some major kernel subsystems. Comparison of Solaris, Linux, and FreeBSD Kernels Similarities and Differences in some major kernel subsystems. Topics Covered Scheduling Memory Management/Paging File Systems Observability Conclusions

More information

Chapter 5: CPU Scheduling

Chapter 5: CPU Scheduling COP 4610: Introduction to Operating Systems (Fall 2016) Chapter 5: CPU Scheduling Zhi Wang Florida State University Contents Basic concepts Scheduling criteria Scheduling algorithms Thread scheduling Multiple-processor

More information

Open Benchmark Phase 3: Windows NT Server 4.0 and Red Hat Linux 6.0

Open Benchmark Phase 3: Windows NT Server 4.0 and Red Hat Linux 6.0 Open Benchmark Phase 3: Windows NT Server 4.0 and Red Hat Linux 6.0 By Bruce Weiner (PDF version, 87 KB) June 30,1999 White Paper Contents Overview Phases 1 and 2 Phase 3 Performance Analysis File-Server

More information

Intel Hyper-Threading technology

Intel Hyper-Threading technology Intel Hyper-Threading technology technology brief Abstract... 2 Introduction... 2 Hyper-Threading... 2 Need for the technology... 2 What is Hyper-Threading?... 3 Inside the technology... 3 Compatibility...

More information

Chapter 19: Real-Time Systems. Operating System Concepts 8 th Edition,

Chapter 19: Real-Time Systems. Operating System Concepts 8 th Edition, Chapter 19: Real-Time Systems, Silberschatz, Galvin and Gagne 2009 Chapter 19: Real-Time Systems System Characteristics Features of Real-Time Systems Implementing Real-Time Operating Systems Real-Time

More information

Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System

Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System Best practices Roland Mueller IBM Systems and Technology Group ISV Enablement April 2012 Copyright IBM Corporation, 2012

More information

Performance Analysis on SMP and Non-SMP for Multicore Technology

Performance Analysis on SMP and Non-SMP for Multicore Technology June, 2010 Performance Analysis on SMP and Non-SMP for Multicore Technology FTF-ENT-F0697 TieFei Zang Principle Software Engineer Introduction Multicore in communication processor technology Dual cores

More information

Emulex Driver for FreeBSD

Emulex Driver for FreeBSD Emulex Driver for FreeBSD User Manual Versions 11.0 and 11.1 pub-005374 Corporate Headquarters San Jose, CA Website www.broadcom.com Broadcom, the pulse logo, Connecting everything, the Connecting everything

More information

Maximizing VMware ESX Performance Through Defragmentation of Guest Systems

Maximizing VMware ESX Performance Through Defragmentation of Guest Systems Maximizing VMware ESX Performance Through Defragmentation of Guest Systems This paper details the results of testing performed to determine if there was any measurable performance benefit to be derived

More information

Gen 6 Fibre Channel Evaluation of Products from Emulex and Brocade

Gen 6 Fibre Channel Evaluation of Products from Emulex and Brocade Gen 6 Fibre Channel Evaluation of Products from Emulex and Brocade Gen 6 Fibre Channel provides new speeds and features for enterprise datacenters. Executive Summary Large enterprises choose Fibre Channel

More information

Emulex Drivers for VMware ESXi for OneConnect Adapters Release Notes

Emulex Drivers for VMware ESXi for OneConnect Adapters Release Notes Emulex Drivers for VMware ESXi for OneConnect Adapters Release Notes Versions: ESXi 5.5 driver FCoE: 11.2.1153.13 NIC: 11.2.1149.0 iscsi: 11.2.1153.2 ESXi 6.0 driver FCoE: 11.2.1153.13 NIC: 11.2.1149.0

More information

NVMe Performance Testing and Optimization Application Note

NVMe Performance Testing and Optimization Application Note NVMe Performance Testing and Optimization Application Note Publication # 56163 Revision: 0.72 Issue Date: December 2017 Advanced Micro Devices 2017 Advanced Micro Devices, Inc. All rights reserved. The

More information

S is infinite here 1 =

S is infinite here 1 = Lecture 12 ECEN 5653 CPU & IO Threading, Scaling, and Speed-up April 7, 2008 Sam Siewert Reminders Help Sessions E-mail siewerts@colorado.edu with ECEN5033 DEBUG in Subject Choose Meeting Date and Time

More information

Multiprocessor Systems. Chapter 8, 8.1

Multiprocessor Systems. Chapter 8, 8.1 Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor

More information

NightStar. NightView Source Level Debugger. Real-Time Linux Debugging and Analysis Tools BROCHURE

NightStar. NightView Source Level Debugger. Real-Time Linux Debugging and Analysis Tools BROCHURE NightStar Real-Time Linux Debugging and Analysis Tools Concurrent s NightStar is a powerful, integrated tool set for debugging and analyzing time-critical Linux applications. NightStar tools run with minimal

More information