Efficient Securing of Multithreaded Server Applications
1 Efficient Securing of Multithreaded Server Applications
Mark Grechanik, Qing Xie, Matthew Hellige, Manoj Seshadrinathan, and Kelly L. Dempski
Accenture Technology Labs
Copyright 2007 Accenture. All rights reserved.
2 Defense In Depth
- A single point or layer of defense, no matter how robust, can leave the system exposed
- Defense in Depth establishes multiple layers of defense, all working in parallel
  - Source: Information Assurance through Defense-in-Depth, Directorate for Command, Control, Communications, and Computer Systems, U.S. Department of Defense Joint Staff, February 2000
- Defense in Depth is a strategy used by U.S. federal agencies and by a majority of top 500 corporations to protect their infrastructures
3 Principles of Defense In Depth
- Deploy security solutions everywhere
  - Use cryptographic accelerators (SSL cards)
- Use multiple layers of security solutions to protect the network against intrusions and attacks
  - Content filtering, PKI
- Protect the support infrastructure
  - Encryption, decryption
- Collect and analyze security events to determine threat levels
  - Encryption, decryption
4 One Size Does Not Fit All
- Why not simply choose the strongest and fastest security solution?
- Finding an acceptable solution is a science of trade-offs among many factors
- A solution must fit certain constraints; among other things, it should be adaptable, cost-effective, and maintainable
5 Performance and Security: Conflicting Goals
- Performance is a key challenge in building large-scale applications because it is inherently difficult to predict
- Weaving security solutions into the architectures of these applications often worsens the performance of the resulting systems
- The performance degradation is more than 90% when all application data is protected, and it is worse when other security mechanisms are applied
6 Cryptographic Algorithms Are Costly
- Every time a data message crosses a security boundary, it is encrypted and later decrypted, which incurs a performance penalty
- Cryptographic operations in the Secure Sockets Layer (SSL) protocol slow file downloads from servers by a factor of 10 to 100, and they reduce web server performance by a factor of 3.4 to 9
7 User-View Model
[Figure: timeline of transactions with response times $\tau_1, \tau_2, \tau_3, \tau_4$]
- Average response time of the system per transaction: $\tau = \frac{1}{n} \sum_{i=1}^{n} \tau_i$
8 Multithreaded Server Applications
- Multiple threads execute different payloads
- Some threads perform crypto operations while other threads execute business logic
- When a thread executes crypto operations on the CPU, it takes time away from other threads that could execute business logic and send responses to users sooner
- If crypto operations are offloaded to another device, the threads that requested them are put to sleep and the CPU executes other threads in parallel with the crypto operations, thereby improving the overall response time
9 Our Goal
- Increase the throughput (reduce the average time per transaction) while implementing the Defense-in-Depth strategy
- We do not try to optimize specific transactions
- We design an approach that does not require changing applications
10 Insight
- Since cryptographic algorithms are costly and orthogonal to the core business logic of applications, offload them to idle services
- Options for offloading or speeding up cryptographic operations:
  - Additional CPUs
  - Cryptographic accelerators (cards)
  - Cryptographic processors
  - Extending compilers with security-related optimizations
  - Extending kernels of operating systems
  - Graphics Processing Units (GPUs)
11 Considerations For Using the GPU
- Since GPUs are often installed on computers by default, no additional work is required to add them
  - Many servers come with preinstalled GPUs
  - Dell PowerEdge™ servers come with the integrated ATI ES1000 controller
- The GPU programming model is well known and documented
- The GPU inherently supports parallel computations
- GPUs are cheaper than other hardware solutions
- GPUs are resistant to different attacks
13 Our Solution
- We offer a novel solution, called PErformance and Security TOgether (PESTO), for securing applications efficiently
- Our key idea is a Graphics Processing Unit (GPU) management scheme that frees the CPU from cryptographic computations
- PESTO comprises offloading, batching, and scheduling mechanisms for executing cryptographic algorithms efficiently on the GPU
14 Benefits For Accenture
(Figure: PESTO Tiger)
- Achieve better performance of various applications when securing them
- Cheaper and easier-to-use solution
- Utilize previously unused resources
15 CPU vs GPU
- The CPU is based on the Single Instruction Single Data (SISD) model, and the GPU is based on the Single Instruction Multiple Data (SIMD) model
16 CPU (SISD) vs GPU (SIMD)
- In SISD, the CPU executes one instruction at a time on a single data element that is loaded from storage into memory before the instruction executes
- In contrast, a SIMD processor (e.g., the GPU) comprises many processing units that simultaneously execute instructions from a single instruction stream on multiple data streams, one per processing unit
- The GPU achieves a higher level of parallelism than the CPU
17 The GPU Is Designed For Graphics Computations
- The CPU views computer memory as a one-dimensional array
- The GPU is designed for graphics computations, where memory is viewed as a two-dimensional array
18-20 The CPU-GPU Computation Model
[Figure, shown in three builds. Labels: CPU, Memory, Bus, Read-only GPU memory (texture), Bus, Draw call, Fragment P_1 ... P_n, Write-only texture]
21 Bottlenecks
- The CPU is utilized 100%
  - Multiple threads execute payload code fragments
- Each transfer from the CPU to the GPU involves a certain startup cost
- Each invocation of the draw call on the GPU involves some initialization cost
22-25 Model
[Figure, shown in four builds. Labels: CPU, GPU, points A, B, C, D, message M, transfer time T_x, draw-call time T_d, Thread n, Thread k, Thread T (CryptoThread)]
26 Processing Times
- Average transfer time per byte, over $k$ messages where $M_n$ is the size of the $n$-th message: $T_x = \frac{1}{k} \sum_{n=1}^{k} \frac{T_x(M_n)}{M_n}$
- Average time of the draw call overhead per byte: $T_d = \frac{1}{k} \sum_{n=1}^{k} \frac{T_d(M_n)}{M_n}$
- Average overhead time per byte: $T_b = T_x + T_d$
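The averages above can be computed directly from per-message measurements. The following is a minimal sketch, assuming that transfer and draw-call times have already been measured for each message; the function names and the sample numbers are hypothetical and do not come from the slides.

```python
from statistics import mean


def per_byte_average(times_sec, sizes_bytes):
    """Mean over all messages of (time for message n) / (size of message n)."""
    if len(times_sec) != len(sizes_bytes) or not sizes_bytes:
        raise ValueError("need one time per message and at least one message")
    return mean(t / m for t, m in zip(times_sec, sizes_bytes))


def average_overhead_per_byte(transfer_times, draw_times, sizes_bytes):
    """T_b = T_x + T_d, with both terms averaged per byte over the messages."""
    t_x = per_byte_average(transfer_times, sizes_bytes)
    t_d = per_byte_average(draw_times, sizes_bytes)
    return t_x + t_d


# Hypothetical measurements for two messages of 100 and 200 bytes.
t_b = average_overhead_per_byte([1.0, 1.0], [2.0, 2.0], [100, 200])
```

Because both sample messages take the same time regardless of size, the larger message contributes a smaller per-byte cost, which is the effect the following slides plot.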
27-33 Dependencies
[Figure, shown in seven builds: average overhead time per byte T_b plotted against message size]
- As the message size increases, the average overhead time per byte decreases
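One way to see why the per-byte overhead falls with message size is to assume that each offload pays a fixed cost (the transfer startup plus the draw-call initialization named on the Bottlenecks slide) that is spread over all bytes of the message. The sketch below uses that assumption; the cost model and the constants are illustrative and are not measurements from the presentation.

```python
def amortized_overhead_per_byte(message_bytes, fixed_cost_sec, per_byte_cost_sec):
    """Per-byte overhead under an assumed model: fixed cost / size + constant."""
    if message_bytes <= 0:
        raise ValueError("message size must be positive")
    return fixed_cost_sec / message_bytes + per_byte_cost_sec


# Hypothetical costs: 1 ms fixed cost per offload, 10 ns per byte transferred.
sizes = [2 ** k for k in range(4, 21, 4)]
curve = [amortized_overhead_per_byte(m, 1e-3, 1e-8) for m in sizes]
```

Under this model the curve approaches the per-byte constant as messages grow, which is the motivation for batching small messages into large ones.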
34 User-View Model
[Figure. Labels: CPU, GPU, points A, B, C, D, Thread n, Thread k, Thread T]
35-36 Experimental Results
[Figure, shown in two builds: cost (seconds of processing) per byte plotted against log2 of the message size]
37 Our Solution
- Batch smaller messages into a large message
- Put the threads that need cryptographic operations to sleep
  - The CPU stays busy with other threads that perform business-logic operations
- Send the batched messages to the GPU, execute the crypto operations, send the results back to the CPU, and wake up the waiting threads
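The batching steps above can be sketched with ordinary threads. This is a structural illustration only, not the PESTO implementation: `fake_gpu_encrypt` is a hypothetical placeholder that runs on the CPU and performs no real cryptography, and the class name, batch limit, and wait time are assumptions.

```python
import queue
import threading


class PendingRequest:
    """One message waiting for its crypto result."""

    def __init__(self, payload: bytes):
        self.payload = payload
        self.result = None
        self.done = threading.Event()


def fake_gpu_encrypt(batch: bytes) -> bytes:
    # Placeholder transform standing in for the GPU call. NOT cryptography.
    return bytes(b ^ 0x5A for b in batch)


class BatchingOffloader:
    def __init__(self, max_batch=8, max_wait_sec=0.05):
        self._requests = queue.Queue()
        self._max_batch = max_batch
        self._max_wait = max_wait_sec
        threading.Thread(target=self._run, daemon=True).start()

    def encrypt(self, payload: bytes) -> bytes:
        req = PendingRequest(payload)
        self._requests.put(req)
        req.done.wait()  # the calling thread sleeps until its batch is done
        return req.result

    def _run(self):
        while True:
            batch = [self._requests.get()]  # block until a first message arrives
            try:
                # Wait up to max_wait for each further message, up to max_batch.
                while len(batch) < self._max_batch:
                    batch.append(self._requests.get(timeout=self._max_wait))
            except queue.Empty:
                pass
            merged = b"".join(r.payload for r in batch)  # one large message
            out = fake_gpu_encrypt(merged)
            offset = 0
            for r in batch:  # split the result and wake each waiting thread
                r.result = out[offset:offset + len(r.payload)]
                offset += len(r.payload)
                r.done.set()
```

A caller simply invokes `BatchingOffloader().encrypt(data)` from any worker thread; the application code does not need to know that requests are merged, which matches the goal of not changing applications.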
38-39 Model
[Figure, shown in two builds. Labels: CPU, GPU, messages M, Message Queue, points B, C, Thread T]
40 Model of Batch Server
[Figure: arrivals enter a queue; a batch of size L is served by the server (the GPU); departures leave the server]
- The server (the GPU) has a maximum capacity of K units
- After the service period ends, the GPU is idle, and a timer starts to keep track of the idle time
- Once the idle time exceeds the max idle time, service starts with the messages waiting in the queue
- The batch size, L, may be smaller than the max capacity K
- If the queue is empty when the max idle time expires, service is initiated when the first message arrives
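The service rules above can be traced with a small deterministic simulation. This is a sketch under stated assumptions: the GPU is idle at time 0 with the timer already running, every batch takes the same fixed `service_time`, and service never starts before the idle timer expires. None of these details are given on the slide, and the function and parameter names are hypothetical.

```python
def simulate_batches(arrival_times, capacity, max_idle, service_time):
    """Return (start_time, batch_size) for each batch served by the GPU."""
    arrivals = sorted(arrival_times)
    batches = []
    i = 0              # index of the earliest message not yet served
    idle_start = 0.0   # moment the GPU last became idle
    while i < len(arrivals):
        timer_expiry = idle_start + max_idle
        # If the queue is empty at expiry, wait for the first arriving message.
        start = max(timer_expiry, arrivals[i])
        waiting = sum(1 for t in arrivals[i:] if t <= start)
        size = min(capacity, waiting)  # batch size L never exceeds K
        batches.append((start, size))
        i += size
        idle_start = start + service_time
    return batches


# Hypothetical trace: four arrivals, K = 2, max idle time 1.0, service time 0.5.
trace = simulate_batches([0.1, 0.2, 0.3, 5.0], capacity=2, max_idle=1.0, service_time=0.5)
```

In this trace the first batch is full, the second holds the single leftover message, and the third starts only when the late message arrives because the queue was empty when the timer expired.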
41 Modeling Arrivals and Services
- $X_{n-1}$: the number of messages in the queue immediately after the $(n-1)$-th batch has left the GPU (server)
  - $X_{n-1} \le K$: the queue will be empty after the next service starts
  - $X_{n-1} > K$: the queue will be left with $X_{n-1} - K$ messages after the next service starts
- $Y_n = \max(0, X_{n-1} - K)$: the number of messages in the queue immediately after the $n$-th service starts at the GPU
- $G_n$: the number of messages that arrive during the $n$-th batch service at the GPU
- $X_n = Y_n + G_n$: the number of messages in the queue immediately after the $n$-th batch has left the GPU (server)
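The recurrence above is easy to check by hand with a short script. The sketch below takes the arrival counts $G_n$ as given input rather than drawing them from a distribution; the function name and the sample values are hypothetical.

```python
def queue_lengths(arrivals_per_service, capacity, initial_queue=0):
    """Apply Y_n = max(0, X_{n-1} - K) and X_n = Y_n + G_n for each service."""
    x_prev = initial_queue
    history = []
    for g_n in arrivals_per_service:
        y_n = max(0, x_prev - capacity)  # messages left after service n starts
        x_n = y_n + g_n                  # messages queued after batch n leaves
        history.append((y_n, x_n))
        x_prev = x_n
    return history


# Hypothetical run: X_0 = 6, K = 4, arrivals G_1..G_4 = 3, 5, 0, 2.
steps = queue_lengths([3, 5, 0, 2], capacity=4, initial_queue=6)
```

Each tuple in `steps` is $(Y_n, X_n)$, so the run shows the queue draining by at most K per service and refilling with the arrivals that occur during that service.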
42 Discrete-Time Analysis
- $y_n(k)$ and $g_n(k)$ are the distributions of $Y_n$ and $G_n$
- $x_n(k)$ is the distribution of the number of messages $X_n$
- $x_n(k) = y_n(k) \ast g_n(k)$, where $\ast$ denotes discrete convolution (since $X_n = Y_n + G_n$)
- The mean number of arrivals during an idle period: $E[P_{idle}] = \sum_{i=0}^{L-1} (L - i)\, x(i)$
- The mean number of arrivals during a busy period: $E[P_{busy}] = \frac{E[f_L]}{E[A]}$
43 Simulation
[Figure: mean waiting time plotted against max capacity, with one curve per arrival rate λ, including λ = 10 and λ = 8]
44 Experimental Setup
- We implemented the AES algorithm on the GPU
- We used the PetStore application to evaluate PESTO
- We wrote a program that simulates virtual users
- We measured the average time per transaction for different numbers of virtual users in three configurations: the original system, the original system with crypto operations implemented on the CPU, and the system with crypto operations implemented on the GPU
45 Results
[Figure: average time per transaction plotted against the number of virtual users, with series for no encryption, AES encryption, and GPU offloading]
46 Conclusions
- We designed a system called PESTO that allows users to add security without significantly affecting performance
- The remaining work is to link the model to the characteristics of a system so that we can predict how to fine-tune our approach for that specific system and advise accordingly
47 THANK YOU! QUESTIONS?
Exposing Congestion Attack on Emerging Connected Vehicle based Traffic Signal Control Qi Alfred Chen, Yucheng Yin, Yiheng Feng, Z. Morley Mao, Henry X. Liu University of Michigan Background: Connected
More informationEMBEDDED ENCRYPTION PLATFORM BENEFIT ANALYSIS
EMBEDDED ENCRYPTION PLATFORM BENEFIT ANALYSIS MerlinCryption s forward-looking technology proactively secures clients against today s threats and tomorrow s risks. A significant advantage to securing systems
More informationTrusted Platform Module explained
Bosch Security Systems Video Systems Trusted Platform Module explained What it is, what it does and what its benefits are 3 August 2016 2 Bosch Security Systems Video Systems Table of contents Table of
More informationCS 426 Parallel Computing. Parallel Computing Platforms
CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:
More informationCOMP SCI 3SH3: Operating System Concepts (Term 2 Winter 2006) Test 2 February 27, 2006; Time: 50 Minutes ;. Questions Instructor: Dr.
COMP SCI 3SH3: Operating System Concepts (Term 2 Winter 2006) Test 2 February 27, 2006; Time: 50 Minutes ;. Questions Instructor: Dr. Kamran Sartipi Name: Student ID: Question 1 (Disk Block Allocation):
More informationIX: A Protected Dataplane Operating System for High Throughput and Low Latency
IX: A Protected Dataplane Operating System for High Throughput and Low Latency Belay, A. et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Reviewed by Chun-Yu and Xinghao Li Summary In this
More informationEnabling High Performance Bulk Data Transfers With SSH
Enabling High Performance Bulk Data Transfers With SSH Chris Rapier Benjamin Bennett TIP 08 Moving Data Still crazy after all these years Multiple solutions exist Protocols UDT, SABUL, etc Implementations
More informationI/O Buffering and Streaming
I/O Buffering and Streaming I/O Buffering and Caching I/O accesses are reads or writes (e.g., to files) Application access is arbitary (offset, len) Convert accesses to read/write of fixed-size blocks
More informationMAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015
MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015 Video Codecs 70% of internet traffic will be video in 2018 [CISCO] Video
More informationUNIT I (Two Marks Questions & Answers)
UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-
More informationEmulex LPe16000B Gen 5 Fibre Channel HBA Feature Comparison
Demartek Emulex LPe16000B Gen 5 Fibre Channel HBA Feature Comparison Evaluation report prepared under contract with Emulex Executive Summary Explosive growth in the complexity and amount of data of today
More informationGeneral Purpose GPU Programming. Advanced Operating Systems Tutorial 7
General Purpose GPU Programming Advanced Operating Systems Tutorial 7 Tutorial Outline Review of lectured material Key points Discussion OpenCL Future directions 2 Review of Lectured Material Heterogeneous
More informationOPERATING SYSTEMS. Systems with Multi-programming. CS 3502 Spring Chapter 4
OPERATING SYSTEMS CS 3502 Spring 2018 Systems with Multi-programming Chapter 4 Multiprogramming - Review An operating system can support several processes in memory. While one process receives service
More informationCISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP
CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationAssignment 3 (Due date: Thursday, 10/15/2009, in class) Part One: Provide brief answers to the following Chapter Exercises questions:
Assignment 3 (Due date: Thursday, 10/15/2009, in class) Your name: Date: Part One: Provide brief answers to the following Chapter Exercises questions: 4.7 Provide two programming examples in which multithreading
More informationIncremental Risk Charge With cufft: A Case Study Of Enabling Multi Dimensional Gain With Few GPUs
Incremental Risk Charge With cufft: A Case Study Of Enabling Multi Dimensional Gain With Few GPUs Amit Kalele and Manoj Nambiar April 21, 2014 1 Optimization & Parallelization COE Center of Excellence
More informationBlueDBM: An Appliance for Big Data Analytics*
BlueDBM: An Appliance for Big Data Analytics* Arvind *[ISCA, 2015] Sang-Woo Jun, Ming Liu, Sungjin Lee, Shuotao Xu, Arvind (MIT) and Jamey Hicks, John Ankcorn, Myron King(Quanta) BigData@CSAIL Annual Meeting
More informationSecuring the Frisbee Multicast Disk Loader
Securing the Frisbee Multicast Disk Loader Robert Ricci, Jonathon Duerig University of Utah 1 What is Frisbee? 2 Frisbee is Emulab s tool to install whole disk images from a server to many clients using
More informationInternet Security in my Crystal Ball
Steven M. Bellovin June 21, 2001 1 Florham Park, NJ 07932 AT&T Labs Research +1 973-360-8656 http://www.research.att.com/ smb Steven M. Bellovin Internet Security in my Crystal Ball security speculation
More informationCUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav
CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture
More informationScheduling - Overview
Scheduling - Overview Quick review of textbook scheduling Linux 2.4 scheduler implementation overview Linux 2.4 scheduler code Modified Linux 2.4 scheduler Linux 2.6 scheduler comments Possible Goals of
More informationLinux multi-core scalability
Linux multi-core scalability Oct 2009 Andi Kleen Intel Corporation andi@firstfloor.org Overview Scalability theory Linux history Some common scalability trouble-spots Application workarounds Motivation
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming David Lifka lifka@cac.cornell.edu May 23, 2011 5/23/2011 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor or computer to complete
More informationIntel Software Guard Extensions (Intel SGX) Memory Encryption Engine (MEE) Shay Gueron
Real World Cryptography Conference 2016 6-8 January 2016, Stanford, CA, USA Intel Software Guard Extensions (Intel SGX) Memory Encryption Engine (MEE) Shay Gueron Intel Corp., Intel Development Center,
More informationGPUs and GPGPUs. Greg Blanton John T. Lubia
GPUs and GPGPUs Greg Blanton John T. Lubia PROCESSOR ARCHITECTURAL ROADMAP Design CPU Optimized for sequential performance ILP increasingly difficult to extract from instruction stream Control hardware
More information