Efficient Securing of Multithreaded Server Applications

1 Efficient Securing of Multithreaded Server Applications
Mark Grechanik, Qing Xie, Matthew Hellige, Manoj Seshadrinathan, and Kelly L. Dempski
Accenture Technology Labs
Copyright 2007 Accenture. All rights reserved.

2 Defense In Depth
A single point or layer of defense, no matter how robust, can leave the system exposed.
Defense in Depth establishes multiple layers of defense, all working in parallel.
Source: Information Assurance through Defense-in-Depth, Directorate for Command, Control, Communications, and Computer Systems, U.S. Department of Defense Joint Staff, February 2000.
Defense in Depth is a strategy used by U.S. federal agencies and by a majority of the top 500 corporations to protect their infrastructures.

3 Principles of Defense In Depth
Deploy security solutions everywhere: use cryptographic accelerators (SSL cards).
Use multiple layers of security solutions to protect the network against intrusions and attacks: content filtering, PKI.
Protect the support infrastructure: encryption, decryption.
Collect and analyze security events to determine threat levels: encryption, decryption.

4 One Size Does Not Fit All
Why not simply choose the strongest and fastest security solution?
Finding an acceptable solution means trading off many factors.
A solution must fit certain constraints; among other things, it should be adaptable, cost-effective, and maintainable.

5 Performance and Security: Conflicting Goals
Performance is a key challenge in building large-scale applications because it is inherently difficult to predict.
Weaving security solutions into the architectures of these applications often worsens the performance of the resulting systems.
The performance degradation is more than 90% when all application data is protected, and it is worse when other security mechanisms are applied.

6 Cryptographic Algorithms Are Costly
Every time a data message crosses a security boundary, it is encrypted and later decrypted, which incurs a performance penalty.
Cryptographic operations in the Secure Sockets Layer (SSL) protocol slow file downloads from servers by a factor of 10 to 100, and they reduce web server performance by a factor of 3.4 to 9.

7 User-View Model
[Figure: transactions with response times $\tau_1, \tau_2, \tau_3, \tau_4$.]
Average response time of the system per transaction: $\tau = \frac{1}{n}\sum_{i=1}^{n} \tau_i$

8 Multithreaded Server Applications
Multiple threads execute different payloads.
Some threads perform crypto operations while other threads execute business logic.
When a thread executes crypto operations on the CPU, it takes time from other threads that could execute business logic and send responses to users faster.
If crypto operations are offloaded to another device, the threads that requested them are put to sleep and the CPU executes other threads in parallel with the crypto operations, thereby improving the overall response time.

9 Our Goal
Increase the throughput (reduce the average time per transaction) while implementing the Defense-in-Depth strategy.
We do not try to optimize specific transactions.
We design an approach that does not require changing applications.

10 Insight
Since cryptographic algorithms are costly and orthogonal to the core business logic of applications, offload them to idle services:
Additional CPUs
Cryptographic accelerators (cards)
Cryptographic processors
Extending compilers with security-related optimizations
Extending kernels of operating systems
Graphics Processing Units (GPUs)

11 Considerations For Using the GPU
Since GPUs are often installed on computers by default, no additional work is required to add them.
Many servers come with preinstalled GPUs; Dell PowerEdge™ servers come with the integrated ATI ES1000 controller.
The GPU programming model is well known and documented.
The GPU inherently supports parallel computations.
GPUs are cheaper than other hardware solutions.
GPUs are resistant to different attacks.

12 Our Solution
We offer a novel solution, called PErformance and Security TOgether (PESTO), for securing applications efficiently.
Our key idea is a Graphics Processing Unit (GPU) management scheme that frees the CPU from cryptographic computations.
PESTO comprises offloading, batching, and scheduling mechanisms for executing cryptographic algorithms efficiently on the GPU.

14 Benefits For Accenture
Achieve better performance of various applications when securing them.
Cheaper and easier-to-use solution.
Utilize previously unused resources.
[Figure labels: PESTO, Tiger]

15 CPU vs GPU
The CPU is based on the Single Instruction Single Data (SISD) model, and the GPU is based on the Single Instruction Multiple Data (SIMD) model.

16 CPU (SISD) vs GPU (SIMD)
In SISD, the CPU executes one instruction at a time on a single data element that is loaded from storage into memory before the instruction executes.
In contrast, a SIMD processor (such as the GPU) comprises many processing units that simultaneously execute instructions from a single instruction stream on multiple data streams, one per processing unit.
The GPU achieves a higher level of parallelism than the CPU.

17 The GPU Is Designed For Graphics Computations
For the CPU, computer memory is treated as a one-dimensional array.
The GPU is designed for graphics computations, where memory is viewed as a two-dimensional array.

18 The CPU-GPU Computation Model
[Figure: the CPU and its memory connect over a bus to read-only GPU memory (texture); a draw call runs fragment processors P_1 through P_n, which write to a write-only texture.]

21 Bottlenecks
The CPU is utilized 100%.
Multiple threads execute payload code fragments.
Each transfer from the CPU to the GPU involves a certain startup cost.
Each invocation of the draw call on the GPU involves some initialization cost.

22 Model
[Figure, built up over four slides: a CPU and a GPU with points A, B, C, and D; a message of size M; times T_x and T_d; and threads Thread n, Thread k, and a crypto thread (CryptoThread, Thread T).]

26 Processing Times
Average transfer time per byte: $T_x = \frac{1}{k}\sum_{n=1}^{k} \frac{T_x(M_n)}{M_n}$
Average time of the draw call overhead per byte: $T_d = \frac{1}{k}\sum_{n=1}^{k} \frac{T_d(M_n)}{M_n}$
Average overhead time per byte: $T_b = T_x + T_d$
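A minimal sketch of how these per-byte averages could be computed from measurements. It assumes each average is the mean, over k messages, of the measured time divided by the message size; the sample values and the function name `per_byte_averages` are hypothetical and not taken from the slides.

```python
from statistics import mean

# Hypothetical samples: (message size in bytes, transfer time in s, draw-call overhead in s).
measurements = [
    (1024, 2.0e-4, 1.0e-4),
    (4096, 2.4e-4, 1.0e-4),
    (16384, 4.0e-4, 1.2e-4),
]


def per_byte_averages(samples):
    """Return (T_x, T_d, T_b): mean per-byte transfer, draw-call, and total overhead."""
    t_x = mean(tx / size for size, tx, _ in samples)
    t_d = mean(td / size for size, _, td in samples)
    return t_x, t_d, t_x + t_d


t_x, t_d, t_b = per_byte_averages(measurements)
print(f"T_x={t_x:.3e} s/byte, T_d={t_d:.3e} s/byte, T_b={t_b:.3e} s/byte")
```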

27 Dependencies
[Plots, built up over several slides: message size and $T_b$.]
As the message size increases, the average overhead time per byte decreases.
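One way to see why the per-byte overhead falls with message size is to assume that every transfer pays a fixed startup cost plus a small cost per byte. The sketch below uses that assumed cost model with made-up constants; it is an illustration, not the measured behavior reported in the slides.

```python
def overhead_per_byte(size_bytes, startup_s=1.0e-4, per_byte_s=1.0e-9):
    """Per-byte overhead under an assumed fixed-startup-cost model (hypothetical constants)."""
    return startup_s / size_bytes + per_byte_s


sizes = [2 ** e for e in range(6, 21, 2)]  # 64 bytes up to 1 MiB
costs = [overhead_per_byte(s) for s in sizes]
for s, c in zip(sizes, costs):
    print(f"{s:>8} bytes: {c:.3e} s/byte")
```

Under this model the startup cost is amortized over more bytes as messages grow, which is the motivation for batching small messages into larger ones.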

34 User-View Model
[Figure: a CPU and a GPU with points A, B, C, and D, and threads Thread n, Thread k, and Thread T.]

35 Experimental Results
[Plots: cost (seconds of processing) per byte versus log2 of the message size.]

37 Our Solution
Batch smaller messages into a large message.
Put the threads that need cryptographic operations to sleep; the CPU stays busy with other threads that perform business-logic operations.
Send the batched messages to the GPU, execute the crypto operations, send the results back to the CPU, and wake up the waiting threads.
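The batching and sleep/wake-up steps above can be sketched with standard threads and a queue. This is an illustrative sketch only: `fake_gpu_encrypt` is a stand-in for a single GPU draw call and performs no real cryptography, and the class name `CryptoBatcher` and its parameters are assumptions, not part of PESTO.

```python
import queue
import threading
from concurrent.futures import Future


def fake_gpu_encrypt(batch):
    """Stand-in for one GPU draw call over a whole batch (NOT real cryptography)."""
    return [bytes(b ^ 0x5A for b in msg) for msg in batch]


class CryptoBatcher:
    def __init__(self, max_batch=8, max_wait_s=0.01):
        self._queue = queue.Queue()
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s
        threading.Thread(target=self._run, daemon=True).start()

    def encrypt(self, msg):
        """Called by a request thread; it sleeps here until its result is ready."""
        fut = Future()
        self._queue.put((msg, fut))
        return fut.result()

    def _run(self):
        while True:
            batch = [self._queue.get()]  # block until the first message arrives
            try:
                while len(batch) < self._max_batch:
                    # Wait up to max_wait_s for each further message.
                    batch.append(self._queue.get(timeout=self._max_wait_s))
            except queue.Empty:
                pass  # nothing more arrived in time: process a partial batch
            results = fake_gpu_encrypt([m for m, _ in batch])
            for (_, fut), out in zip(batch, results):
                fut.set_result(out)  # wake up the waiting request thread


batcher = CryptoBatcher()
outputs = {}


def request_thread(i):
    outputs[i] = batcher.encrypt(f"message-{i}".encode())


threads = [threading.Thread(target=request_thread, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(outputs), "messages processed")
```

While a request thread is blocked in `encrypt`, the scheduler is free to run other threads, which is the effect the slide describes.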

38 Model
[Figure: messages M from the CPU collect in a message queue; Thread T sends them to the GPU (points B and C) as a single batched message.]

40 Model of Batch Server
[Figure: arrivals enter a queue; a batch of size L is served by the server (the GPU); departures leave the server.]
The server (the GPU) has a maximum capacity of K units.
After the service period ends, the GPU is idle, and a timer starts to keep track of the idle time.
Once the idle time exceeds the max idle time, service starts with the messages waiting in the queue.
The batch size, L, may be smaller than the max capacity K.
If the queue is empty when the max idle time expires, service is initiated by the first message to arrive.
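The policy above can be sketched as a small discrete-time simulation. The slot-based timing, the function name, and the example arrival pattern are assumptions made for illustration; the sketch also starts service as soon as the idle time reaches `max_idle` slots, which is one possible reading of "exceeds".

```python
from collections import deque


def simulate_batch_server(arrivals, capacity_k, max_idle, service_time):
    """Discrete-time sketch of the batch-service policy.

    arrivals[t] is the number of messages arriving in slot t (hypothetical input).
    Returns a list of (start_slot, batch_size) for every service that started.
    """
    waiting = deque()  # arrival slot of each queued message
    batches = []
    busy_until = 0     # first slot at which the server is idle again
    for t, count in enumerate(arrivals):
        waiting.extend([t] * count)
        if t < busy_until:
            continue  # server busy: messages keep queueing
        idle_for = t - busy_until
        if idle_for >= max_idle and waiting:  # timer expired and work is available
            size = min(capacity_k, len(waiting))
            for _ in range(size):
                waiting.popleft()
            batches.append((t, size))
            busy_until = t + service_time
    return batches


arrivals = [3, 0, 5] + [0] * 11 + [1]
batches = simulate_batch_server(arrivals, capacity_k=4, max_idle=2, service_time=3)
print(batches)
```

In this run the last batch has size 1: the queue was empty when the idle timer expired, so the next arriving message started a service on its own.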

41 Modeling Arrivals and Services
$X_{n-1}$: the number of messages in the queue immediately after the $(n-1)$th batch has left the GPU (server).
If $X_{n-1} \le K$, the queue will be empty after the next service starts.
If $X_{n-1} > K$, the queue will be left with $X_{n-1} - K$ messages after the next service starts.
$Y_n = \max(0, X_{n-1} - K)$: the number of messages in the queue immediately after the $n$th service starts at the GPU.
$G_n$: the number of messages that arrive during the $n$th batch service at the GPU.
$X_n = Y_n + G_n$: the number of messages in the queue immediately after the $n$th batch has left the GPU (server).
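The recurrence can be traced with a few lines of code. The starting queue length, the capacity, and the arrival counts below are hypothetical values chosen only to exercise both the $X_{n-1} \le K$ and $X_{n-1} > K$ cases.

```python
def queue_lengths(x0, capacity_k, arrivals_during_service):
    """Apply Y_n = max(0, X_{n-1} - K) and X_n = Y_n + G_n for each batch service."""
    x_prev = x0
    history = []
    for g_n in arrivals_during_service:
        y_n = max(0, x_prev - capacity_k)  # left in queue once service n starts
        x_n = y_n + g_n                    # queue when batch n leaves the server
        history.append((y_n, x_n))
        x_prev = x_n
    return history


history = queue_lengths(x0=6, capacity_k=4, arrivals_during_service=[3, 1, 0, 5])
for n, (y_n, x_n) in enumerate(history, start=1):
    print(f"n={n}: Y_n={y_n}, X_n={x_n}")
```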

42 Discrete-Time Analysis
$y_n(k)$ and $g_n(k)$ are the distributions of $Y_n$ and $G_n$; $x_n(k)$ is the distribution of the number of messages $X_n$.
$x_n(k) = y_n(k) \ast g_n(k)$ (discrete convolution, as $X_n = Y_n + G_n$)
The mean number of arrivals during an idle period: $E[P_{idle}] = \sum_{i=0}^{L-1} (L - i)\, x(i)$
The mean number of arrivals during a busy period is $E[P_{busy}]$, defined in terms of $f$, $L$, and $E[A]$.
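A short sketch of how the distribution of $X_n$ could be obtained from those of $Y_n$ and $G_n$. It assumes the two quantities are independent, so that the distribution of their sum is the discrete convolution of their distributions; the probability values are hypothetical.

```python
def convolve(y, g):
    """Distribution of Y + G for independent Y and G given as probability lists."""
    x = [0.0] * (len(y) + len(g) - 1)
    for i, p_y in enumerate(y):
        for j, p_g in enumerate(g):
            x[i + j] += p_y * p_g
    return x


y_dist = [0.5, 0.3, 0.2]  # hypothetical P(Y_n = 0), P(Y_n = 1), P(Y_n = 2)
g_dist = [0.6, 0.4]       # hypothetical P(G_n = 0), P(G_n = 1)
x_dist = convolve(y_dist, g_dist)
print([round(p, 4) for p in x_dist])
```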

43 Simulation
[Plot: mean waiting time versus max capacity for several arrival rates λ, including λ = 10 and λ = 8.]

44 Experimental Setup
We implemented the AES algorithm on the GPU.
We used the PetStore application to evaluate PESTO.
We wrote a program that simulates virtual users.
For different numbers of virtual users, we measured the average time per transaction in three configurations: the original system, the original system with crypto operations implemented on the CPU, and the system with crypto operations implemented on the GPU.

45 Results
[Plot: average time per transaction versus number of virtual users, with series for no encryption, AES encryption, and GPU offloading.]

46 Conclusions
We designed a system called PESTO that allows users to add security without affecting performance significantly.
The remaining work is to link the model to the characteristics of a system so that we can predict and advise how to fine-tune our approach for a specific system.

47 THANK YOU! QUESTIONS?


BUILDING A NEXT-GENERATION FIREWALL How to Add Network Intelligence, Security, and Speed While Getting to Market Faster INNOVATORS START HERE. EXECUTIVE SUMMARY Your clients are on the front line of cyberspace and they need your help. Faced

More information

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center SDAccel Environment The Xilinx SDAccel Development Environment Bringing The Best Performance/Watt to the Data Center Introduction Data center operators constantly seek more server performance. Currently

More information

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter Lecture Topics Today: Advanced Scheduling (Stallings, chapter 10.1-10.4) Next: Deadlock (Stallings, chapter 6.1-6.6) 1 Announcements Exam #2 returned today Self-Study Exercise #10 Project #8 (due 11/16)

More information

Episode 4. Flow and Congestion Control. Baochun Li Department of Electrical and Computer Engineering University of Toronto

Episode 4. Flow and Congestion Control. Baochun Li Department of Electrical and Computer Engineering University of Toronto Episode 4. Flow and Congestion Control Baochun Li Department of Electrical and Computer Engineering University of Toronto Recall the previous episode Detailed design principles in: The link layer The network

More information

Schema-Agnostic Indexing with Azure Document DB

Schema-Agnostic Indexing with Azure Document DB Schema-Agnostic Indexing with Azure Document DB Introduction Azure DocumentDB is Microsoft s multi-tenant distributed database service for managing JSON documents at Internet scale Multi-tenancy is an

More information

3. Memory Management

3. Memory Management Principles of Operating Systems CS 446/646 3. Memory Management René Doursat Department of Computer Science & Engineering University of Nevada, Reno Spring 2006 Principles of Operating Systems CS 446/646

More information

IBM Spectrum Scale IO performance

IBM Spectrum Scale IO performance IBM Spectrum Scale 5.0.0 IO performance Silverton Consulting, Inc. StorInt Briefing 2 Introduction High-performance computing (HPC) and scientific computing are in a constant state of transition. Artificial

More information

AN 831: Intel FPGA SDK for OpenCL

AN 831: Intel FPGA SDK for OpenCL AN 831: Intel FPGA SDK for OpenCL Host Pipelined Multithread Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1 Intel FPGA SDK for OpenCL Host Pipelined Multithread...3 1.1

More information

Scalable Streaming Analytics

Scalable Streaming Analytics Scalable Streaming Analytics KARTHIK RAMASAMY @karthikz TALK OUTLINE BEGIN I! II ( III b Overview Storm Overview Storm Internals IV Z V K Heron Operational Experiences END WHAT IS ANALYTICS? according

More information

MobiLink Performance. A whitepaper from ianywhere Solutions, Inc., a subsidiary of Sybase, Inc.

MobiLink Performance. A whitepaper from ianywhere Solutions, Inc., a subsidiary of Sybase, Inc. MobiLink Performance A whitepaper from ianywhere Solutions, Inc., a subsidiary of Sybase, Inc. Contents Executive summary 2 Introduction 3 What are the time-consuming steps in MobiLink synchronization?

More information

AES Cryptosystem Acceleration Using Graphics Processing Units. Ethan Willoner Supervisors: Dr. Ramon Lawrence, Scott Fazackerley

AES Cryptosystem Acceleration Using Graphics Processing Units. Ethan Willoner Supervisors: Dr. Ramon Lawrence, Scott Fazackerley AES Cryptosystem Acceleration Using Graphics Processing Units Ethan Willoner Supervisors: Dr. Ramon Lawrence, Scott Fazackerley Overview Introduction Compute Unified Device Architecture (CUDA) Advanced

More information

CS420: Operating Systems

CS420: Operating Systems Main Memory James Moscola Department of Engineering & Computer Science York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Background Program must

More information

Operating System Concepts Ch. 5: Scheduling

Operating System Concepts Ch. 5: Scheduling Operating System Concepts Ch. 5: Scheduling Silberschatz, Galvin & Gagne Scheduling In a multi-programmed system, multiple processes may be loaded into memory at the same time. We need a procedure, or

More information

Copyright Push Technology Ltd December Diffusion TM 4.4 Performance Benchmarks

Copyright Push Technology Ltd December Diffusion TM 4.4 Performance Benchmarks Diffusion TM 4.4 Performance Benchmarks November 2012 Contents 1 Executive Summary...3 2 Introduction...3 3 Environment...4 4 Methodology...5 4.1 Throughput... 5 4.2 Latency... 6 5 Results Summary...7

More information

Efficient Lists Intersection by CPU- GPU Cooperative Computing

Efficient Lists Intersection by CPU- GPU Cooperative Computing Efficient Lists Intersection by CPU- GPU Cooperative Computing Di Wu, Fan Zhang, Naiyong Ao, Gang Wang, Xiaoguang Liu, Jing Liu Nankai-Baidu Joint Lab, Nankai University Outline Introduction Cooperative

More information

CPS 110 Final Exam. Spring 2011

CPS 110 Final Exam. Spring 2011 CPS 110 Final Exam Spring 2011 Please answer all questions for a total of 300 points. Keep it clear and concise: answers are graded on content, not style. I expect that you can answer each question within

More information

Exposing Congestion Attack on Emerging Connected Vehicle based Traffic Signal Control

Exposing Congestion Attack on Emerging Connected Vehicle based Traffic Signal Control Exposing Congestion Attack on Emerging Connected Vehicle based Traffic Signal Control Qi Alfred Chen, Yucheng Yin, Yiheng Feng, Z. Morley Mao, Henry X. Liu University of Michigan Background: Connected

More information

EMBEDDED ENCRYPTION PLATFORM BENEFIT ANALYSIS

EMBEDDED ENCRYPTION PLATFORM BENEFIT ANALYSIS EMBEDDED ENCRYPTION PLATFORM BENEFIT ANALYSIS MerlinCryption s forward-looking technology proactively secures clients against today s threats and tomorrow s risks. A significant advantage to securing systems

More information

Trusted Platform Module explained

Trusted Platform Module explained Bosch Security Systems Video Systems Trusted Platform Module explained What it is, what it does and what its benefits are 3 August 2016 2 Bosch Security Systems Video Systems Table of contents Table of

More information

CS 426 Parallel Computing. Parallel Computing Platforms

CS 426 Parallel Computing. Parallel Computing Platforms CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:

More information

COMP SCI 3SH3: Operating System Concepts (Term 2 Winter 2006) Test 2 February 27, 2006; Time: 50 Minutes ;. Questions Instructor: Dr.

COMP SCI 3SH3: Operating System Concepts (Term 2 Winter 2006) Test 2 February 27, 2006; Time: 50 Minutes ;. Questions Instructor: Dr. COMP SCI 3SH3: Operating System Concepts (Term 2 Winter 2006) Test 2 February 27, 2006; Time: 50 Minutes ;. Questions Instructor: Dr. Kamran Sartipi Name: Student ID: Question 1 (Disk Block Allocation):

More information

IX: A Protected Dataplane Operating System for High Throughput and Low Latency

IX: A Protected Dataplane Operating System for High Throughput and Low Latency IX: A Protected Dataplane Operating System for High Throughput and Low Latency Belay, A. et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Reviewed by Chun-Yu and Xinghao Li Summary In this

More information

Enabling High Performance Bulk Data Transfers With SSH

Enabling High Performance Bulk Data Transfers With SSH Enabling High Performance Bulk Data Transfers With SSH Chris Rapier Benjamin Bennett TIP 08 Moving Data Still crazy after all these years Multiple solutions exist Protocols UDT, SABUL, etc Implementations

More information

I/O Buffering and Streaming

I/O Buffering and Streaming I/O Buffering and Streaming I/O Buffering and Caching I/O accesses are reads or writes (e.g., to files) Application access is arbitary (offset, len) Convert accesses to read/write of fixed-size blocks

More information

MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015

MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015 MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015 Video Codecs 70% of internet traffic will be video in 2018 [CISCO] Video

More information

UNIT I (Two Marks Questions & Answers)

UNIT I (Two Marks Questions & Answers) UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-

More information

Emulex LPe16000B Gen 5 Fibre Channel HBA Feature Comparison

Emulex LPe16000B Gen 5 Fibre Channel HBA Feature Comparison Demartek Emulex LPe16000B Gen 5 Fibre Channel HBA Feature Comparison Evaluation report prepared under contract with Emulex Executive Summary Explosive growth in the complexity and amount of data of today

More information

General Purpose GPU Programming. Advanced Operating Systems Tutorial 7

General Purpose GPU Programming. Advanced Operating Systems Tutorial 7 General Purpose GPU Programming Advanced Operating Systems Tutorial 7 Tutorial Outline Review of lectured material Key points Discussion OpenCL Future directions 2 Review of Lectured Material Heterogeneous

More information

OPERATING SYSTEMS. Systems with Multi-programming. CS 3502 Spring Chapter 4

OPERATING SYSTEMS. Systems with Multi-programming. CS 3502 Spring Chapter 4 OPERATING SYSTEMS CS 3502 Spring 2018 Systems with Multi-programming Chapter 4 Multiprogramming - Review An operating system can support several processes in memory. While one process receives service

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Assignment 3 (Due date: Thursday, 10/15/2009, in class) Part One: Provide brief answers to the following Chapter Exercises questions:

Assignment 3 (Due date: Thursday, 10/15/2009, in class) Part One: Provide brief answers to the following Chapter Exercises questions: Assignment 3 (Due date: Thursday, 10/15/2009, in class) Your name: Date: Part One: Provide brief answers to the following Chapter Exercises questions: 4.7 Provide two programming examples in which multithreading

More information

Incremental Risk Charge With cufft: A Case Study Of Enabling Multi Dimensional Gain With Few GPUs

Incremental Risk Charge With cufft: A Case Study Of Enabling Multi Dimensional Gain With Few GPUs Incremental Risk Charge With cufft: A Case Study Of Enabling Multi Dimensional Gain With Few GPUs Amit Kalele and Manoj Nambiar April 21, 2014 1 Optimization & Parallelization COE Center of Excellence

More information

BlueDBM: An Appliance for Big Data Analytics*

BlueDBM: An Appliance for Big Data Analytics* BlueDBM: An Appliance for Big Data Analytics* Arvind *[ISCA, 2015] Sang-Woo Jun, Ming Liu, Sungjin Lee, Shuotao Xu, Arvind (MIT) and Jamey Hicks, John Ankcorn, Myron King(Quanta) BigData@CSAIL Annual Meeting

More information

Securing the Frisbee Multicast Disk Loader

Securing the Frisbee Multicast Disk Loader Securing the Frisbee Multicast Disk Loader Robert Ricci, Jonathon Duerig University of Utah 1 What is Frisbee? 2 Frisbee is Emulab s tool to install whole disk images from a server to many clients using

More information

Internet Security in my Crystal Ball

Internet Security in my Crystal Ball Steven M. Bellovin June 21, 2001 1 Florham Park, NJ 07932 AT&T Labs Research +1 973-360-8656 http://www.research.att.com/ smb Steven M. Bellovin Internet Security in my Crystal Ball security speculation

More information

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture

More information

Scheduling - Overview

Scheduling - Overview Scheduling - Overview Quick review of textbook scheduling Linux 2.4 scheduler implementation overview Linux 2.4 scheduler code Modified Linux 2.4 scheduler Linux 2.6 scheduler comments Possible Goals of

More information

Linux multi-core scalability

Linux multi-core scalability Linux multi-core scalability Oct 2009 Andi Kleen Intel Corporation andi@firstfloor.org Overview Scalability theory Linux history Some common scalability trouble-spots Application workarounds Motivation

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming David Lifka lifka@cac.cornell.edu May 23, 2011 5/23/2011 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor or computer to complete

More information

Intel Software Guard Extensions (Intel SGX) Memory Encryption Engine (MEE) Shay Gueron

Intel Software Guard Extensions (Intel SGX) Memory Encryption Engine (MEE) Shay Gueron Real World Cryptography Conference 2016 6-8 January 2016, Stanford, CA, USA Intel Software Guard Extensions (Intel SGX) Memory Encryption Engine (MEE) Shay Gueron Intel Corp., Intel Development Center,

More information

GPUs and GPGPUs. Greg Blanton John T. Lubia

GPUs and GPGPUs. Greg Blanton John T. Lubia GPUs and GPGPUs Greg Blanton John T. Lubia PROCESSOR ARCHITECTURAL ROADMAP Design CPU Optimized for sequential performance ILP increasingly difficult to extract from instruction stream Control hardware

More information