Efficient Securing of Multithreaded Server Applications


1 Efficient Securing of Multithreaded Server Applications Mark Grechanik, Qing Xie, Matthew Hellige, Manoj Seshadrinathan, and Kelly L. Dempski Accenture Technology Labs Copyright 2007 Accenture. All rights reserved.

2 Defense In Depth
A single point or layer of defense, no matter how robust, can leave the system exposed. Defense in Depth establishes multiple layers of defense, all working in parallel (Information Assurance through Defense-in-Depth, Directorate for Command, Control, Communications, and Computer Systems, U.S. Department of Defense Joint Staff, February 2000). Defense in Depth is a strategy used by U.S. federal agencies and by a majority of top 500 corporations to protect their infrastructures.

3 Principles of Defense In Depth
Deploy security solutions everywhere: use cryptographic accelerators (SSL cards). Use multiple layers of security solutions to protect the network against intrusions and attacks: content filtering, PKI. Protect the support infrastructure: encryption, decryption. Collect and analyze security events to determine threat levels: encryption, decryption.

4 One Size Does Not Fit All
Why not simply choose the strongest and fastest security solution? Finding an acceptable solution is a matter of trade-offs among many factors. A solution must fit certain constraints: among other things, it should be adaptable, cost-effective, and maintainable.

5 Performance and Security: Conflicting Goals
Performance is a key challenge in building large-scale applications because it is inherently difficult to predict. Weaving security solutions into the architectures of these applications often worsens the performance of the resulting systems. Performance degrades by more than 90% when all application data is protected, and by more when other security mechanisms are also applied.

6 Cryptographic Algorithms Are Costly
Every time a data message crosses a security boundary, it is encrypted and later decrypted, which incurs a performance penalty. Cryptographic operations in the Secure Sockets Layer (SSL) protocol slow down file downloads from servers by a factor of 10 to 100, and they reduce web server performance by a factor of 3.4 to 9.

8 Multithreaded Server Applications
Multiple threads execute different payloads. Some threads perform crypto operations while others execute business logic. When a thread executes crypto operations on the CPU, it takes time from other threads that could execute business logic and send responses to users sooner. If crypto operations are offloaded to another device, the threads waiting on them are put to sleep and the CPU executes other threads in parallel with the crypto operations, thereby improving overall response time.

9 Our Goal
Increase throughput (reduce the average time per transaction) while implementing the Defense-in-Depth strategy. We do not try to optimize specific transactions. We design an approach that does not require changes to applications.

10 Insight
Since cryptographic algorithms are costly and orthogonal to the core business logic of applications, offload them to idle services. Options include: additional CPUs; cryptographic accelerators (cards); cryptographic processors; extending compilers with security-related optimizations; extending operating system kernels; Graphics Processing Units (GPUs).

11 Considerations For Using the GPU
GPUs are often installed on computers by default, so no additional work is required to add them. Many servers come with preinstalled GPUs; for example, Dell PowerEdge™ servers come with the integrated ATI ES1000 controller. The GPU programming model is well known and documented. The GPU inherently supports parallel computations. GPUs are cheaper than other hardware solutions. GPUs are resistant to different attacks.

13 Our Solution
We offer a novel solution, called PErformance and Security TOgether (PESTO), for securing applications efficiently. Our key idea is a Graphics Processing Unit (GPU) management scheme that frees the CPU from cryptographic computations. PESTO comprises offloading, batching, and scheduling mechanisms for executing cryptographic algorithms efficiently on the GPU.

14 Benefits For Accenture
Achieve better performance of various applications when securing them. Cheaper and easier-to-use solution. Utilize previously unused resources.
[Slide image labels: PESTO Tiger; Orthographical error]

16 CPU (SISD) vs GPU (SIMD)
In the SISD model, the CPU executes one instruction at a time on a single data element that is loaded from storage into memory before the instruction executes. In contrast, a SIMD processor (such as the GPU) comprises many processing units that simultaneously execute instructions from a single instruction stream on multiple data streams, one per processing unit. The GPU achieves a higher level of parallelism than the CPU.

17 The GPU Is Designed For Graphics Computations
For the CPU, computer memory is treated as a one-dimensional array. The GPU is designed for graphics computations, where memory is viewed as a two-dimensional array.
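One way to picture the difference between the two memory views is to lay a linear byte buffer out as a grid of texels. The sketch below is only an illustration: the function names, the row-major layout, and the zero padding of the last row are assumptions, not details taken from the slides.

```python
def to_texture(data: bytes, width: int) -> list[list[int]]:
    """Lay a 1-D byte buffer out row by row as a 2-D grid (assumed row-major)."""
    rows = -(-len(data) // width)                 # ceiling division
    padded = data.ljust(rows * width, b"\x00")    # assumed zero padding
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]


def texel_coords(index: int, width: int) -> tuple[int, int]:
    """Map a 1-D offset to its (row, column) position in the grid."""
    return divmod(index, width)


grid = to_texture(b"abcdef", width=4)
print(grid)
print(texel_coords(5, width=4))
```

The same offset arithmetic, run in reverse, recovers the linear buffer when results are read back.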

The CPU-GPU Computation Model
[Diagram labels: CPU, Memory, Bus, Read-only GPU memory (Texture), Bus, Draw]

21 Bottlenecks
The CPU is 100% utilized. Multiple threads execute payload code fragments. Each transfer from the CPU to the GPU incurs a startup cost. Each invocation of the draw call on the GPU incurs an initialization cost.
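A rough way to see why these fixed costs matter is to count how often they are paid. The sketch below assumes a simple cost model in which every GPU round trip pays one transfer startup cost and one draw-call initialization cost; the numbers are hypothetical and chosen only for illustration.

```python
def fixed_overhead(num_round_trips: int, transfer_startup: float, draw_init: float) -> float:
    """Total fixed cost across GPU round trips (per-byte costs are excluded)."""
    return num_round_trips * (transfer_startup + draw_init)


# Hypothetical costs in microseconds, not measurements from the slides.
separate = fixed_overhead(100, transfer_startup=50, draw_init=30)  # 100 messages sent one by one
combined = fixed_overhead(1, transfer_startup=50, draw_init=30)    # same messages in one round trip
print(separate, combined)
```

Under this assumed model the fixed cost grows linearly with the number of round trips, so fewer, larger transfers pay it less often.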

26 Processing Times
Average transfer time per byte:
$$T_x = \frac{1}{k} \sum_{n=1}^{k} \frac{T_x(M_n)}{M_n}$$
Average time of the draw call overhead per byte:
$$T_d = \frac{1}{k} \sum_{n=1}^{k} \frac{T_d(M_n)}{M_n}$$
Average overhead time per byte:
$$T_b = T_x + T_d$$
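As a worked example of these averages, the sketch below reads M_n as the size in bytes of the n-th measured message and T_x(M_n), T_d(M_n) as the transfer and draw-call times measured for it. That reading, the function name, and the sample values are assumptions made for illustration.

```python
def avg_time_per_byte(times: list[float], sizes: list[int]) -> float:
    """Mean over k measurements of (time for message n) / (size of message n)."""
    return sum(t / m for t, m in zip(times, sizes)) / len(sizes)


# Hypothetical measurements: message sizes in bytes, times in microseconds.
sizes = [128, 256]
transfer_times = [64.0, 64.0]
draw_times = [32.0, 32.0]

t_x = avg_time_per_byte(transfer_times, sizes)  # (0.5 + 0.25) / 2
t_d = avg_time_per_byte(draw_times, sizes)      # (0.25 + 0.125) / 2
t_b = t_x + t_d
print(t_x, t_d, t_b)
```

With identical times for both sizes, the larger message has the lower per-byte overhead, which is the effect that combining messages exploits.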

37 Our Solution
Batch smaller messages into one large message. Put CPU-bound threads that need to perform cryptographic operations to sleep; the CPU stays busy with other threads that perform business-logic operations. Send the batched messages to the GPU, execute the crypto operations, send the results back to the CPU, and wake up the waiting threads.
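The steps above can be sketched with ordinary threads. This is a minimal sketch, not the PESTO implementation: the `Batcher` class, its parameters, and `fake_gpu_crypto` are hypothetical, and the byte-wise XOR merely stands in for a GPU AES kernel (it is not real encryption).

```python
import queue
import threading


def fake_gpu_crypto(batch: bytes) -> bytes:
    """Placeholder for the GPU kernel. XOR is NOT encryption; illustration only."""
    return bytes(b ^ 0x5A for b in batch)


class Batcher:
    def __init__(self, max_batch: int = 8, wait_s: float = 0.01):
        self._requests = queue.Queue()
        self._max_batch = max_batch
        self._wait_s = wait_s
        threading.Thread(target=self._run, daemon=True).start()

    def encrypt(self, message: bytes) -> bytes:
        slot = {"msg": message, "done": threading.Event(), "out": None}
        self._requests.put(slot)
        slot["done"].wait()  # the calling thread sleeps; others keep the CPU busy
        return slot["out"]

    def _run(self):
        while True:
            batch = [self._requests.get()]  # block until a first request arrives
            while len(batch) < self._max_batch:
                try:
                    batch.append(self._requests.get(timeout=self._wait_s))
                except queue.Empty:
                    break
            # One large message goes to the (stand-in) GPU in a single call.
            result = fake_gpu_crypto(b"".join(s["msg"] for s in batch))
            offset = 0
            for s in batch:  # split the result and wake each waiting thread
                size = len(s["msg"])
                s["out"] = result[offset:offset + size]
                offset += size
                s["done"].set()


batcher = Batcher()
results = {}

def handle(i: int) -> None:
    results[i] = batcher.encrypt(f"msg{i}".encode())

threads = [threading.Thread(target=handle, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Splitting the batched result by offset works here because the stand-in operates byte by byte; a block cipher would additionally need each message padded to a block boundary.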

40 Model of Batch Server
[Diagram labels: Arrivals, Queue, Batch, Size L, Server (the GPU), Departures]
The server (the GPU) has a maximum capacity of K units. After a service period ends, the GPU is idle and a timer starts tracking the idle time. Once the idle time exceeds the maximum idle time, service starts with the messages waiting in the queue. The batch size, L, may be smaller than the maximum capacity K. If the queue is empty when the maximum idle time expires, service is initiated by the first message to arrive.
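The service rule described here can be traced with a small simulation. The sketch assumes a fixed service time per batch, an idle GPU whose timer starts at time 0, and a timer that always runs for the full maximum idle time before service begins; the function and parameter names are hypothetical.

```python
def simulate_batches(arrivals, capacity_k, max_idle, service_time):
    """Return (start_time, batch_size) for each service period of the batch server."""
    pending = sorted(arrivals)
    batches = []
    idx = 0             # index of the first message not yet served
    gpu_free_at = 0.0   # assumed: GPU is idle and the timer starts at time 0
    while idx < len(pending):
        timer_expiry = gpu_free_at + max_idle
        # If the queue is empty at expiry, the first arriving message starts service.
        start = max(timer_expiry, pending[idx])
        waiting = 0
        while idx + waiting < len(pending) and pending[idx + waiting] <= start:
            waiting += 1
        size = min(waiting, capacity_k)  # batch size L never exceeds capacity K
        batches.append((start, size))
        idx += size
        gpu_free_at = start + service_time
    return batches


schedule = simulate_batches([0, 1, 2, 3, 10], capacity_k=3, max_idle=2, service_time=1)
print(schedule)
```

In this run the second and third batches each carry a single message, showing that L can fall below K, and the third starts only when the message at time 10 arrives.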

41 Modeling Arrivals and Services
$X_{n-1}$: the number of messages in the queue immediately after the $(n-1)$th batch has left the GPU (server).
If $X_{n-1} \le K$, the queue will be empty after the next service starts.
If $X_{n-1} > K$, the queue will be left with $X_{n-1} - K$ messages after the next service starts.
$Y_n = \max(0, X_{n-1} - K)$: the number of messages in the queue immediately after the $n$th service starts at the GPU.
$G_n$: the number of messages that arrive during the $n$th batch service at the GPU.
$X_n = Y_n + G_n$: the number of messages in the queue immediately after the $n$th batch has left the GPU (server).
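The recurrence can be traced step by step. In the sketch below the initial queue length and the arrival counts G_n are made-up inputs, and the function name is hypothetical.

```python
def queue_lengths(x0: int, arrivals_per_service: list[int], capacity_k: int) -> list[tuple[int, int]]:
    """Return (Y_n, X_n) for each service period, starting from X_0 = x0."""
    x_prev = x0
    history = []
    for g_n in arrivals_per_service:
        y_n = max(0, x_prev - capacity_k)  # left in the queue once service n starts
        x_n = y_n + g_n                    # queue length once batch n has left
        history.append((y_n, x_n))
        x_prev = x_n
    return history


trace = queue_lengths(x0=5, arrivals_per_service=[2, 0, 4], capacity_k=3)
print(trace)
```

Starting from five queued messages with K = 3, two are left behind when the first service starts, and the queue only empties at the start of the third service.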

42 Discrete-Time Analysis
$y_n(k)$ and $g_n(k)$ are the distributions of $Y_n$ and $G_n$, and $x_n(k)$ is the distribution of the number of messages $X_n$. Since $X_n = Y_n + G_n$, the distribution of $X_n$ is the discrete convolution of the two:
$$x_n(k) = y_n(k) \ast g_n(k)$$
The mean number of arrivals during an idle period:
$$E[P_{idle}] = \sum_{i=0}^{L-1} (L - i)\, x(i)$$
The mean number of arrivals during a busy period:
$$E[P_{busy}] = E[f(L)]\, E[A]$$
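A small numeric example may help. The sketch reads the relation between x_n, y_n, and g_n as a discrete convolution, which assumes Y_n and G_n are independent, and reads the idle-period mean as the sum over i from 0 to L-1 of (L - i) x(i). Both readings, the function names, and the sample distributions are assumptions.

```python
def convolve(y: list[float], g: list[float]) -> list[float]:
    """Distribution of Y + G for independent Y and G given as probability lists."""
    x = [0.0] * (len(y) + len(g) - 1)
    for i, y_i in enumerate(y):
        for j, g_j in enumerate(g):
            x[i + j] += y_i * g_j
    return x


def mean_idle_arrivals(x: list[float], l_value: int) -> float:
    """Sum of (L - i) * x(i) for i = 0 .. L-1, with L passed as l_value."""
    return sum((l_value - i) * x[i] for i in range(min(l_value, len(x))))


# Hypothetical distributions: each of Y and G is 0 or 1 with probability 1/2.
x_dist = convolve([0.5, 0.5], [0.5, 0.5])
print(x_dist)
print(mean_idle_arrivals(x_dist, l_value=2))
```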

44 Experimental Setup
We implemented the AES algorithm on the GPU. We used the PetStore application to evaluate PESTO. We wrote a program that simulates virtual users. For different numbers of virtual users, we measured the average time per transaction in three configurations: the original system, the original system with crypto operations implemented on the CPU, and the system with crypto operations implemented on the GPU.

46 Conclusions
We designed a system called PESTO that allows users to add security without significantly affecting performance. The remaining work is to link the model to the characteristics of a system, so that we can predict how to fine-tune our approach for a specific system and advise accordingly.
