CAESAR: Cryptanalysis of the Full AES Using GPU-Like Hardware

Similar documents
Survey of Codebreaking Machines. Swathi Guruduth Vivekanand Kamanuri Harshad Patil

Outline Marquette University

Lightweight Cryptography: Designing Crypto for Low Energy and Low Power

Fundamentals of Computer Design

Daniel J. Bernstein University of Illinois at Chicago & Technische Universiteit Eindhoven

Automatic Search for Related-Key Differential Characteristics in Byte-Oriented Block Ciphers

CS 3410: Computer System Organization and Programming

ECE520 VLSI Design. Lecture 1: Introduction to VLSI Technology. Payman Zarkesh-Ha

The Mont-Blanc approach towards Exascale

Minimum Area Cost for a 30 to 70 Gbits/s AES Processor

Microelettronica. J. M. Rabaey, "Digital integrated circuits: a design perspective" EE141 Microelettronica

A Data-Parallel Genealogy: The GPU Family Tree. John Owens University of California, Davis

Multimedia in Mobile Phones. Architectures and Trends Lund

Implementation of Full -Parallelism AES Encryption and Decryption

Secret Key Algorithms (DES) Foundations of Cryptography - Secret Key pp. 1 / 34

How to build a Megacore microprocessor. by Andreas Olofsson (MULTIPROG WORKSHOP 2017)

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

Secret Key Algorithms (DES)

Fundamentals of Computer Design

Lecture 2: Performance

PacketShader: A GPU-Accelerated Software Router

ECE 486/586. Computer Architecture. Lecture # 2

Parallelism and Concurrency. COS 326 David Walker Princeton University

Fra superdatamaskiner til grafikkprosessorer og

Introduction. Summary. Why computer architecture? Technology trends Cost issues

Multi-Core Microprocessor Chips: Motivation & Challenges

Cryogenic Computing Complexity (C3) Marc Manheimer December 9, 2015 IEEE Rebooting Computing Summit 4

How What When Why CSC3501 FALL07 CSC3501 FALL07. Louisiana State University 1- Introduction - 1. Louisiana State University 1- Introduction - 2

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

Lecture 1: CS/ECE 3810 Introduction

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

Thomas Polzer Institut für Technische Informatik

Fundamentals of Computers Design

PARAMETRIC TROJANS FOR FAULT-BASED ATTACKS ON CRYPTOGRAPHIC HARDWARE

GPGPU introduction and network applications. PacketShaders, SSLShader

CIT 668: System Architecture

What is This Course About? CS 356 Unit 0. Today's Digital Environment. Why is System Knowledge Important?

Implementation Tradeoffs for Symmetric Cryptography

Computers: Inside and Out

45-year CPU Evolution: 1 Law -2 Equations

Data Encryption Standard

VLSI Design Automation. Maurizio Palesi

The Computer Revolution. Classes of Computers. Chapter 1

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp

Introduction to GPGPU and GPU-architectures

A 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation

Computer & Microprocessor Architecture HCA103

HPC Technology Trends

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

Data Encryption Standard

ECE Lecture 2. Basic Concepts of Cryptology. Basic Vocabulary CRYPTOLOGY. Symmetric Key Public Key Protocols

Computer Architecture s Changing Definition

VLSI Design Automation

Processor Lang Keying Code Clocks to Key Clocks to Encrypt Option Size PPro/II ASM Comp PPr

CS/EE 6810: Computer Architecture

The Design of the KiloCore Chip

CIT 668: System Architecture. Computer Systems Architecture

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

Experts in Application Acceleration Synective Labs AB

A Data-Parallel Genealogy: The GPU Family Tree

VLSI Design Automation

CS5222 Advanced Computer Architecture. Lecture 1 Introduction

Lecture 1: Gentle Introduction to GPUs

Moore s s Law, 40 years and Counting

Advanced Computer Architecture (CS620)

An Overview of Cryptanalysis Research for the Advanced Encryption Standard

Advanced processor designs

Complexity and Advanced Algorithms. Introduction to Parallel Algorithms

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

ENIAC - background. ENIAC - details. Structure of von Nuemann machine. von Neumann/Turing Computer Architecture

VLSI Design Automation. Calcolatori Elettronici Ing. Informatica

Parallelism: The Real Y2K Crisis. Darek Mihocka August 14, 2008

Computer Architecture

EE5780 Advanced VLSI CAD

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS

CS Computer Architecture Spring Lecture 01: Introduction

Trends in the Infrastructure of Computing

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

(software agnostic) Computational Considerations

COPACOBANA: RECONFIGURABLE COMPUTING IN CRYPTANALYSIS. Ben Johnstone

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

Fundamentals of Quantitative Design and Analysis

Computer Architecture. R. Poss

What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others

Symmetric Key Encryption. Symmetric Key Encryption. Advanced Encryption Standard ( AES ) DES DES DES 08/01/2015. DES and 3-DES.

EE586 VLSI Design. Partha Pande School of EECS Washington State University

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

Where Have We Been? Ch. 6 Memory Technology

Computer Architecture!

Homework 02. A, 8 letters, each letter is 7 bit, which makes 8*7=56bit, there will be 2^56= different combination of keys.

E-Passport: Cracking Basic Access Control Keys with COPACOBANA

TDT 4260 lecture 2 spring semester 2015

CS2214 COMPUTER ARCHITECTURE & ORGANIZATION SPRING Top 10 Supercomputers in the World as of November 2013*

Computer Architecture. Fall Dongkun Shin, SKKU

CS61C : Machine Structures

Marching Memory マーチングメモリ. UCAS-6 6 > Stanford > Imperial > Verify 中村維男 Based on Patent Application by Tadao Nakamura and Michael J.

Chapter 2. Perkembangan Komputer

Computer Architecture = CS/ECE 552: Introduction to Computer Architecture. 552 In Context. Why Study Computer Architecture?

Transcription:

CAESAR: Cryptanalysis of the Full AES Using GPU-Like Hardware Alex Biryukov and Johann Großschädl Laboratory of Algorithmics, Cryptology and Security University of Luxembourg SHARCS 2012, March 17, 2012

Motivation Cryptanalytic attack on the full AES Suppose complexity of < 2 100 computations AES-256: related-key cryptanalysis (2 99.5 ) AES-128: TMK trade-off (e.g. 2 96 ) Using special-purpose hardware Is it feasible? Totally infeasible? How would you design such hardware? Where are the bottlenecks?

Outline Cryptanalytic attacks on AES GPU-like AES processor for cryptanalysis Memory and storage CAESAR supercomputer Time and energy Outlook into the future

History Special-purpose hardware for cryptanalysis Cryptanalysis of Enigma and Lorenz Quasimodo for factoring DES-cracker (EFF) TWINKLE and TWIRL COPACOBANA Software Cell Processor GPUs

Cryptanalytic Attacks on AES

Related-Key Boomerang Attack Example: AES-256 Attacker knows relation betw. 4 secret keys Time and data complexity: 2 99.5 Memory complexity: 2 78 Motivation for using RK attack No practical impact due to reliance on related keys However, future single-key attacks may have similar structure and requirements

Time-Memory-Key Trade-Off Example: AES-128 Fixed plaintext encrypted under 2 32 keys 2 96 off-line pre-computation 2 80 on-line computation with 2 56 memory

GPU-like AES Processor GPU architecture Homogenous multi-core system local cache Much higher performance than a CPU Normal CPU: 4-8 ALUs GPU: several 100 ALUs Optimized for cryptanalysis of AES Keon Jang Replace CUDA cores by AES cores Data/keys either constant or generated on-chip

AES Core Optimized for high throughput Loop unrolling: 128-bit datapath for each round Do not need to support a mode of operation Inner-loop and outer-loop pipelining Example: Hodjat-Verbauwhede (2003) Architecture as above New plaintext each clock cycle Every round takes 4 clock cycles Latency of 41 cycles

Hodjat s AES Processor Performance 0.18 μm standard-cell library (2003) Can be clocked with 606 MHz (pipelining) Throughput: 77.6 Gbit/s Silicon area 473k gates (for 10 rounds) 660k gates (for 14 rounds for 256 bit keys)

NVIDIA GT200b Main Characteristics 240 shader cores, each up to 3 FLOP/cycle 1476 MHz, 350M gates (470 mm 2 ) 1000 GFLOPS (10x faster than Intel Core i7) GT200b is well-known in crypto community But today not state-of-the-art anymore

GPU-like AES Processor 500 AES cores based on Hodjat s design Each 660k gates, i.e. 330M gates in total Clocked at 2.0 GHz on 55 nm TSMC technology (requires better cooling) Throughput: 500 x 2 10 9 = 10 12 AES ops/sec

I/O Requirements Data (i.e. plaintexts) and keys Bandwidth no problem for data and key RKC: no need for key-agility (4 fixed keys); plaintext generated on chip (using counter) TMK: plaintext is constant; keys can be generated on chip Ciphertexts Only few ciphertexts are actually stored Transfer rate 2 21 slower than processing rate

Storage Requirements RKC: 2 78 bytes 3x10 10 harddisks of 10 TBytes each Bottleneck today, but feasible in 5-10 years Current state-of-the-art: 4 TB, 250$ retail price 100 TB @ 100$ in 5-10 years: 300 bln $ TMK: 2 61 bytes 92 mln $ now 2.3 mln $ in 5-10 years

CAESAR Supercomputer Assumptions about adversary 1 trillion (10 12 ) US$ for chip production Roughly US national defense budget in 2010 US budget deficit >1.4 trillion US$ in 2009 Cryptanalytic AES ARchitecture Hypothetical supercomputer to break AES 3x10 10 GPU-like AES processors (30$/proc.) 3x10 22 AES operations/sec 10 TB storage attached to each AES processor

Computation Time 3x10 22 AES operations/sec We assume attack time being determined by AES computations and not access to storage RKC attack 1 year for 2 99.5 AES ops (lower bound) TMK attack Pre-computation (2 96 AES ops): 1 month Online phase (2 80 AES ops): negligible 1/10th of budget: 1 year pre-comp, 8 min online

Production of Chips High capacity fab: 300,000 wafers/month About 100 AES processors per wafer We need 83 fabs (1 bln US$ each); time to build a fab: 18 months; fabs work 1 year

Energy 135 W per AES processor For 3x10 10 processors 4 TW US power consumption per year: 3.34 TW in 2005 Water cooling Energy seems to be the main bottleneck

Energy: Impact of Moore s Law Shrinking transistor sizes: From 55 nm in 2009 to 7 nm by 2020 Historical example: First 1 TFLOPS supercomputer was ASCI Red by Intel (1997) for Sandia Labs 10,000 Pentium Pro (333MHz): 500kW Now a single GT200b 1 TFLOPS @ 100 W Factor 5000 in 13 years

Summary Cryptanalytic AES ARchitecture

Outlook into Future Cryptanalytic breakthroughs? Moore s law (10 more years) Computers based on spin (spintronics) or optical computers New storage technologies Thermally assisted recording (10TB/inch) Further miniaturization (12 atoms/strorage cell) Quantum holography 3D optical storage (1TB DVD with 100 layers)

Conclusions Hypothetical supercomputer CAESAR TMK on AES-128 with 2 32 targets is well within reach of current VLSI technology For 2 99.5 time/data attack with 2 78 memory: memory complexity and power consumption are main bottlenecks, but not execution time Recommendations Focus on attacks with time complexity of up to 2 100, but memory complexity of less than 2 70 (and as little data as possible)

Questions?