Kernel level AES Acceleration using GPUs


Table of Contents
1 Problem Definition
2 Motivations
3 Objective
4 Approach
5 Related Work
5.1 Acceleration of Cryptographic Functions using Graphics Hardware
5.2 GPU Accelerated Cryptography as an OS Service
5.3 Performance Evaluation of Parallel AES Implementations over CUDA
6 Conclusion
7 References

1 Problem Definition

Encryption is often not the main focus of an application; rather, it is something the application provides as part of its service. OpenSSL [2] is one example. AES encryption is computationally expensive [1]: encrypting and decrypting data can consume a large share of the CPU's time, which may degrade the performance of the whole system. Increasing the key size slows encryption further, potentially to an unsustainable level. Larger keys are nevertheless vital today, because modern hardware accelerators have made brute-force attacks on some current encryption schemes feasible.

2 Motivations

Acceleration makes stronger security practical for data transmission and encryption, since the accelerated solution can afford AES-192 or AES-256 instead of AES-128. It also yields better-performing web servers (faster SSL), which benefits almost every secured website on the Internet. Virtual private networks (accelerating IPsec VPNs), storage area networks (encrypting data in transit), and pay TV (securing pay TV through tamper-resistant services) are further examples of high-level benefits of GPU-accelerated encryption. Because the accelerator works as a separate unit, it raises the overall performance of the application and leaves headroom for future features that provide better functionality and security to users.
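To make the cost of larger keys concrete: AES performs 10, 12, and 14 rounds for 128-, 192-, and 256-bit keys respectively (FIPS 197 gives the round count as the key length in 32-bit words plus 6). The minimal sketch below estimates the relative per-block work under the simplifying assumption that cost scales with the number of rounds; it ignores key-schedule and memory effects, and the function names are illustrative only.

```python
def aes_rounds(key_bits: int) -> int:
    """Number of AES rounds for a key size (FIPS 197: Nr = Nk + 6)."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192, or 256 bits")
    key_words = key_bits // 32  # Nk: key length in 32-bit words
    return key_words + 6


def relative_cost(key_bits: int, baseline_bits: int = 128) -> float:
    """Per-block work relative to the baseline key size.

    Assumption: cost is proportional to the round count.
    """
    return aes_rounds(key_bits) / aes_rounds(baseline_bits)


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(bits, aes_rounds(bits), round(relative_cost(bits), 2))
```

Under this assumption AES-256 costs roughly 40% more per block than AES-128, which is the kind of extra work the proposal aims to move off the CPU.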

3 Objective

The objective of this research is twofold. The first goal is to reduce the work done by the CPU by offloading AES encryption to GPUs, since encrypting and decrypting data can consume a large share of the CPU's time. The second is to implement the accelerated algorithm at the kernel level, in order to avoid user-space overheads and to give the services that use the acceleration a clean abstraction.

4 Approach

The research follows four main steps.

First, leverage modern GPU frameworks to provide an accelerated algorithm that uses the full capabilities of modern GPUs. Whether the practical part of this research will use the Nvidia CUDA framework [6] or the OpenCL framework is still undecided.

Second, optimize memory transfers between the GPU memory layers and CPU RAM to minimize overhead. The key point is to exploit the newer capabilities and features of modern GPU programming frameworks that help with such optimizations.

Third, auto-tune the acceleration parameters to the GPU model and architecture. A small auto-tuning unit performs multiple sample runs of the algorithm on the present hardware and selects the best-performing configuration, thereby optimizing the parallelism parameters.

Fourth, integrate the GPU as a driver abstracted behind the OCF (OpenBSD Cryptographic Framework) [7], providing an accelerated version of the AES algorithm in it (and other algorithms in future work).

5 Related Work

5.1 Acceleration of Cryptographic Functions using Graphics Hardware

This paper investigates GPU acceleration of symmetric-key and asymmetric-key functions, using AES as an example of an algorithm that can be accelerated on modern GPUs to outperform CPUs. The aim of the investigation is to decide to what extent the GPU can act as an efficient hardware accelerator for cryptographic functions. Figure 5.1.1 shows the heavily parallel architecture of an Nvidia GPU and highlights its main computing unit, the streaming multiprocessor (SM), which executes the parallelized CUDA code.

Figure 5.1.1: Simplified block diagram of the GeForce architecture

Using this architecture, the authors report runtime statistics for the accelerated AES code which show that the GPU can be viewed as a highly parallel processor for general-purpose computation. The main challenge they faced was maintaining a high occupancy level on the GPU, because idle GPU cores represent lost potential performance. They used the SIMD (single instruction, multiple data) technique to increase the computational density of the architecture. Their GPU implementation achieved a 2.5x performance increase with data transfer included and a 6x increase without it.

5.2 GPU Accelerated Cryptography as an OS Service

This research tackles the absence of a method that allows operating system kernel services or user-space applications to make practical use of GPU acceleration. The paper investigates the integration of GPU-accelerated functions with an established service virtualization layer, the OCF-Linux framework [7], within the Linux kernel. OCF provides a standard method for integrating any cryptographic accelerator driver through its producer API. It receives calls from user-space or kernel-space applications and acts as a middleware layer between them and the accelerator.

Figure 5.2.1: OCF framework architecture

Figure 5.2.1 gives a high-level overview of the OCF framework. The core component, the main crypto layer, provides two APIs: the producer API for use by crypto-card device drivers and the consumer API for use by other kernel subsystems.
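The producer/consumer split can be illustrated with a small user-space model. This is only a sketch of the pattern, not the real OCF interface: the class `CryptoCore`, its methods `register`, `dispatch`, and `provider`, and the placeholder `fake_gpu_cipher` are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, Tuple

# (key, data) -> result; hypothetical signature for a driver entry point.
CipherFn = Callable[[bytes, bytes], bytes]


class CryptoCore:
    """Toy middleware layer: drivers register, consumers dispatch."""

    def __init__(self) -> None:
        self._drivers: Dict[str, Tuple[str, CipherFn]] = {}

    # Producer side: called by an accelerator driver to offer an algorithm.
    def register(self, algorithm: str, driver_name: str, fn: CipherFn) -> None:
        self._drivers[algorithm] = (driver_name, fn)

    # Consumer side: called by a subsystem that needs crypto done.
    def dispatch(self, algorithm: str, key: bytes, data: bytes) -> bytes:
        if algorithm not in self._drivers:
            raise LookupError(f"no driver registered for {algorithm}")
        _, fn = self._drivers[algorithm]
        return fn(key, data)

    def provider(self, algorithm: str) -> str:
        """Name of the driver currently serving an algorithm."""
        return self._drivers[algorithm][0]


def fake_gpu_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder transform (repeating-key XOR). NOT AES and not secure;
    it only marks the place where a GPU kernel launch would happen."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


if __name__ == "__main__":
    core = CryptoCore()
    core.register("aes", "gpu0", fake_gpu_cipher)   # producer API
    out = core.dispatch("aes", b"k" * 16, b"hello")  # consumer API
    print(core.provider("aes"), out.hex())
```

The consumer never names the driver, which is the point of the middleware layer: a GPU-backed driver can replace a software one without changes to the callers.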

The authors present a new general-purpose mechanism for processing multiple asymmetric-key requests on the GPU and find that preprocessing mixed key requests is crucial to maintaining performance. They show that the GPU can be integrated effectively into the OCF [7], although some aspects were challenging, such as a driver split between a kernel-space OCF driver and a user-space daemon. They also showed that calling the accelerated encryption function through the OCF may incur an overhead of 3.4% compared with calling it directly. They nevertheless conclude that GPU-accelerated cryptographic functions can be made available in a uniform, standard way to all operating system components in user space and kernel space without excessive overhead.

5.3 Performance Evaluation of Parallel AES Implementations over CUDA

In this research the authors evaluate multiple AES implementations on the GPU, motivated by the observation that a traditional AES GPU implementation does not necessarily provide optimal performance. Using the CUDA framework, they investigate both a basic parallelism mechanism (assigning one parallel thread to each AES data block) and enhanced mechanisms (optimizing the internal stages of each AES round for parallel execution). To keep the performance evaluation comparable, they focus only on parallelism and apply AES in ECB mode, where each 16-byte AES block can be encrypted individually. (CTR mode can similarly be encrypted without depending on previous blocks, but other modes cannot.) Parallel AES runs by assigning one GPU core to each AES block, producing a complete encrypted output in parallel, as shown in Figure 5.3.1.

Figure 5.3.1: AES Encryption Stage 1 and its mapping to GPU Blocks
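The one-thread-per-block mapping can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the authors' CUDA code: `toy_block_encrypt` is a placeholder (XOR with the key) standing in for single-block AES, a Python thread pool stands in for GPU threads, and all function names are hypothetical. The index arithmetic mirrors the usual CUDA form `blockIdx.x * blockDim.x + threadIdx.x`.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 16  # AES block size in bytes


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Placeholder for single-block AES (XOR with key). NOT AES, not secure."""
    return bytes(b ^ k for b, k in zip(block, key))


def grid_size(num_aes_blocks: int, threads_per_block: int) -> int:
    """Thread blocks needed so every AES block gets a thread (ceiling division)."""
    return (num_aes_blocks + threads_per_block - 1) // threads_per_block


def ecb_encrypt_parallel(key: bytes, data: bytes, threads_per_block: int = 4) -> bytes:
    """ECB: each 16-byte block is independent, so each 'thread' handles one."""
    if len(key) != BLOCK_SIZE or len(data) % BLOCK_SIZE:
        raise ValueError("key must be 16 bytes and data a multiple of 16 bytes")
    n = len(data) // BLOCK_SIZE
    out = [b""] * n

    def thread_body(block_idx: int, thread_idx: int) -> None:
        gid = block_idx * threads_per_block + thread_idx  # global thread index
        if gid < n:  # threads past the end of the data do nothing
            start = gid * BLOCK_SIZE
            out[gid] = toy_block_encrypt(key, data[start:start + BLOCK_SIZE])

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(thread_body, b, t)
                   for b in range(grid_size(n, threads_per_block))
                   for t in range(threads_per_block)]
        for f in futures:
            f.result()  # propagate any exception from a worker
    return b"".join(out)


def ecb_encrypt_sequential(key: bytes, data: bytes) -> bytes:
    """Reference result: the same blocks processed one after another."""
    return b"".join(toy_block_encrypt(key, data[i:i + BLOCK_SIZE])
                    for i in range(0, len(data), BLOCK_SIZE))


if __name__ == "__main__":
    key, data = bytes(range(16)), bytes(range(80))
    assert ecb_encrypt_parallel(key, data) == ecb_encrypt_sequential(key, data)
```

Because no block reads another block's output, the order in which the workers finish does not matter; a chained mode would break this property for encryption.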

Their experiment showed that parallel AES on a CUDA GPU improves on the CPU by a factor of about 20: 38 to 39 milliseconds on the GPU versus 779 milliseconds on the CPU.

Figure 5.3.2: Number of Threads versus Number of Blocks

They also showed that, in the GPU implementation, fewer blocks with more threads each gave the best performance, as shown in Figure 5.3.2, and that this also holds for large data sizes because of the reduction in inner-block communication. Finally, they evaluated optimizing each AES stage to exploit the parallelism within AES; the results showed a performance improvement over the basic GPU implementation and, especially, over a traditional CPU.

6 Conclusion

GPUs can provide efficient and reliable hardware acceleration for compute-intensive functions that run much faster on massively parallel architectures, reducing the load on the CPU and leaving it more room for other work. There is also a security benefit: with GPUs, the key size, and hence the complexity, of the encryption can be pushed further, giving better security overall.

7 References

[1] J. Daemen, V. Rijmen, AES Proposal: Rijndael, Ver 2,
[2] Cray Inc., Cray XD1 System Overview, Ver 1.1,
[3] Harrison, O., Acceleration of Cryptographic Functions using Graphics Hardware
[4] Harrison, O., & Waldron, J., GPU accelerated cryptography as an OS service, Springer, Transactions on Computational Science XI,
[5] Chakchai Soin., Sarayut Poolsanguan, Performance Evaluation of Parallel AES Implementations over CUDA GPU Framework, Int. Journal of Digital Content Technology and Its Applications,
[6] CUDA Framework, cuda.
[7] OCF Framework, linux.sourceforge.net/


More information

Cryptographic Algorithms - AES

Cryptographic Algorithms - AES Areas for Discussion Cryptographic Algorithms - AES CNPA - Network Security Joseph Spring Department of Computer Science Advanced Encryption Standard 1 Motivation Contenders Finalists AES Design Feistel

More information

Lecture 1 Applied Cryptography (Part 1)

Lecture 1 Applied Cryptography (Part 1) Lecture 1 Applied Cryptography (Part 1) Patrick P. C. Lee Tsinghua Summer Course 2010 1-1 Roadmap Introduction to Security Introduction to Cryptography Symmetric key cryptography Hash and message authentication

More information

TUNING CUDA APPLICATIONS FOR MAXWELL

TUNING CUDA APPLICATIONS FOR MAXWELL TUNING CUDA APPLICATIONS FOR MAXWELL DA-07173-001_v6.5 August 2014 Application Note TABLE OF CONTENTS Chapter 1. Maxwell Tuning Guide... 1 1.1. NVIDIA Maxwell Compute Architecture... 1 1.2. CUDA Best Practices...2

More information

Introduction to Cryptographic Systems. Asst. Prof. Mihai Chiroiu

Introduction to Cryptographic Systems. Asst. Prof. Mihai Chiroiu Introduction to Cryptographic Systems Asst. Prof. Mihai Chiroiu Vocabulary In cryptography, cyphertext is the result of encryption performed on plaintext using an algorithm, called a cipher. Decryption

More information

Presented by: Kevin Hieb May 2, 2005

Presented by: Kevin Hieb May 2, 2005 Presented by: Kevin Hieb May 2, 2005 Governments National Finances National Security Citizens Companies Data Loss Monetary Loss Individuals Identity Theft Data Loss Networks Firewalls Intrusion Detection

More information

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367

More information

SIMD. Utilization of a SIMD unit in the OS Kernel. Shogo Saito 1 and Shuichi Oikawa 2 2. SIMD. SIMD (Single SIMD SIMD SIMD SIMD

SIMD. Utilization of a SIMD unit in the OS Kernel. Shogo Saito 1 and Shuichi Oikawa 2 2. SIMD. SIMD (Single SIMD SIMD SIMD SIMD OS SIMD 1 2 SIMD (Single Instruction Multiple Data) SIMD OS (Operating System) SIMD SIMD OS Utilization of a SIMD unit in the OS Kernel Shogo Saito 1 and Shuichi Oikawa 2 Nowadays, it is very common that

More information

Neil Costigan School of Computing, Dublin City University PhD student / 2 nd year of research.

Neil Costigan School of Computing, Dublin City University PhD student / 2 nd year of research. Crypto On the Cell Neil Costigan School of Computing, Dublin City University. neil.costigan@computing.dcu.ie +353.1.700.6916 PhD student / 2 nd year of research. Supervisor : - Dr Michael Scott. IRCSET

More information

3 Symmetric Key Cryptography 3.1 Block Ciphers Symmetric key strength analysis Electronic Code Book Mode (ECB) Cipher Block Chaining Mode (CBC) Some

3 Symmetric Key Cryptography 3.1 Block Ciphers Symmetric key strength analysis Electronic Code Book Mode (ECB) Cipher Block Chaining Mode (CBC) Some 3 Symmetric Key Cryptography 3.1 Block Ciphers Symmetric key strength analysis Electronic Code Book Mode (ECB) Cipher Block Chaining Mode (CBC) Some popular block ciphers Triple DES Advanced Encryption

More information

GPU Architecture. Alan Gray EPCC The University of Edinburgh

GPU Architecture. Alan Gray EPCC The University of Edinburgh GPU Architecture Alan Gray EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? Architectural reasons for accelerator performance advantages Latest GPU Products From

More information

Scanned by CamScanner

Scanned by CamScanner Scanned by CamScanner Scanned by CamScanner Scanned by CamScanner Scanned by CamScanner Scanned by CamScanner Scanned by CamScanner Scanned by CamScanner Symmetric-Key Cryptography CS 161: Computer Security

More information

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems Ed Hinkel Senior Sales Engineer Agenda Overview - Rogue Wave & TotalView GPU Debugging with TotalView Nvdia CUDA Intel Phi 2

More information

Cuttingedge crypto graphy

Cuttingedge crypto graphy The latest cryptographic solutions from Linux on the System z platform BY PETER SPERA Cuttingedge crypto graphy Can Linux* for the IBM* System z* platform meet the cryptographic needs of today s enterprise

More information

Apps with Hardware Enabling Run-time Architectural Customization in Smart Phones

Apps with Hardware Enabling Run-time Architectural Customization in Smart Phones Apps with Hardware Enabling Run-time Architectural Customization in Smart Phones Michael Coughlin, Ali Ismail, Eric Keller University of Colorado Boulder Mobile Devices Devices are designed around certain

More information

G-NET: Effective GPU Sharing In NFV Systems

G-NET: Effective GPU Sharing In NFV Systems G-NET: Effective Sharing In NFV Systems Kai Zhang*, Bingsheng He^, Jiayu Hu #, Zeke Wang^, Bei Hua #, Jiayi Meng #, Lishan Yang # *Fudan University ^National University of Singapore #University of Science

More information

More on Cryptography CS 136 Computer Security Peter Reiher January 19, 2017

More on Cryptography CS 136 Computer Security Peter Reiher January 19, 2017 More on Cryptography CS 136 Computer Security Peter Reiher January 19, 2017 Page 1 Outline Desirable characteristics of ciphers Stream and block ciphers Cryptographic modes Uses of cryptography Symmetric

More information

Draft Version. GPU-Acceleration of Block Ciphers in the OpenSSL Cryptographic Library

Draft Version. GPU-Acceleration of Block Ciphers in the OpenSSL Cryptographic Library Draft Version GPU-Acceleration of Block Ciphers in the OpenSSL Cryptographic Library ABSTRACT Johannes Gilger Research Group IT-Security RWTH Aachen University Aachen, Germany Gilger@ITSec.RWTH-Aachen.de

More information

Parallel Processing SIMD, Vector and GPU s cont.

Parallel Processing SIMD, Vector and GPU s cont. Parallel Processing SIMD, Vector and GPU s cont. EECS4201 Fall 2016 York University 1 Multithreading First, we start with multithreading Multithreading is used in GPU s 2 1 Thread Level Parallelism ILP

More information

CSC574: Computer & Network Security

CSC574: Computer & Network Security CSC574: Computer & Network Security Lecture 3 Prof. William Enck Spring 2016 (Derived from slides by Micah Sherr, Patrick McDaniel, and Peng Ning) Modern Cryptography 2 Kerckhoffs Principles Modern cryptosystems

More information

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture

More information

n-bit Output Feedback

n-bit Output Feedback n-bit Output Feedback Cryptography IV Encrypt Encrypt Encrypt P 1 P 2 P 3 C 1 C 2 C 3 Steven M. Bellovin September 16, 2006 1 Properties of Output Feedback Mode No error propagation Active attacker can

More information

TUNING CUDA APPLICATIONS FOR MAXWELL

TUNING CUDA APPLICATIONS FOR MAXWELL TUNING CUDA APPLICATIONS FOR MAXWELL DA-07173-001_v7.0 March 2015 Application Note TABLE OF CONTENTS Chapter 1. Maxwell Tuning Guide... 1 1.1. NVIDIA Maxwell Compute Architecture... 1 1.2. CUDA Best Practices...2

More information

Cryptography & Key Exchange Protocols. Faculty of Computer Science & Engineering HCMC University of Technology

Cryptography & Key Exchange Protocols. Faculty of Computer Science & Engineering HCMC University of Technology Cryptography & Key Exchange Protocols Faculty of Computer Science & Engineering HCMC University of Technology Outline 1 Cryptography-related concepts 2 3 4 5 6 7 Key channel for symmetric cryptosystems

More information

Lecture Nov. 21 st 2006 Dan Wendlandt ISP D ISP B ISP C ISP A. Bob. Alice. Denial-of-Service. Password Cracking. Traffic.

Lecture Nov. 21 st 2006 Dan Wendlandt ISP D ISP B ISP C ISP A. Bob. Alice. Denial-of-Service. Password Cracking. Traffic. 15-441 Lecture Nov. 21 st 2006 Dan Wendlandt Worms & Viruses Phishing End-host impersonation Denial-of-Service Route Hijacks Traffic modification Spyware Trojan Horse Password Cracking IP Spoofing DNS

More information

Recent Advances in Heterogeneous Computing using Charm++

Recent Advances in Heterogeneous Computing using Charm++ Recent Advances in Heterogeneous Computing using Charm++ Jaemin Choi, Michael Robson Parallel Programming Laboratory University of Illinois Urbana-Champaign April 12, 2018 1 / 24 Heterogeneous Computing

More information

Cryptography Functions

Cryptography Functions Cryptography Functions Lecture 3 1/29/2013 References: Chapter 2-3 Network Security: Private Communication in a Public World, Kaufman, Perlman, Speciner Types of Cryptographic Functions Secret (Symmetric)

More information

VPN Overview. VPN Types

VPN Overview. VPN Types VPN Types A virtual private network (VPN) connection establishes a secure tunnel between endpoints over a public network such as the Internet. This chapter applies to Site-to-site VPNs on Firepower Threat

More information

Finite Element Integration and Assembly on Modern Multi and Many-core Processors

Finite Element Integration and Assembly on Modern Multi and Many-core Processors Finite Element Integration and Assembly on Modern Multi and Many-core Processors Krzysztof Banaś, Jan Bielański, Kazimierz Chłoń AGH University of Science and Technology, Mickiewicza 30, 30-059 Kraków,

More information

PARALLEL SYSTEMS PROJECT

PARALLEL SYSTEMS PROJECT PARALLEL SYSTEMS PROJECT CSC 548 HW6, under guidance of Dr. Frank Mueller Kaustubh Prabhu (ksprabhu) Narayanan Subramanian (nsubram) Ritesh Anand (ranand) Assessing the benefits of CUDA accelerator on

More information

Crypto: Symmetric-Key Cryptography

Crypto: Symmetric-Key Cryptography Computer Security Course. Song Crypto: Symmetric-Key Cryptography Slides credit: Dan Boneh, David Wagner, Doug Tygar Overview Cryptography: secure communication over insecure communication channels Three

More information

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor Multiprocessing Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast. Almasi and Gottlieb, Highly Parallel

More information

The question paper contains 40 multiple choice questions with four choices and students will have to pick the correct one (each carrying ½ marks.).

The question paper contains 40 multiple choice questions with four choices and students will have to pick the correct one (each carrying ½ marks.). Time: 3hrs BCA III Network security and Cryptography Examination-2016 Model Paper 2 M.M:50 The question paper contains 40 multiple choice questions with four choices and students will have to pick the

More information

Data Encryption Standard (DES)

Data Encryption Standard (DES) Data Encryption Standard (DES) Best-known symmetric cryptography method: DES 1973: Call for a public cryptographic algorithm standard for commercial purposes by the National Bureau of Standards Goals:

More information