Type-II optimal polynomial bases. D. J. Bernstein University of Illinois at Chicago. Joint work with: Tanja Lange Technische Universiteit Eindhoven

Similar documents
ECDLP course ECC2K-130. Daniel J. Bernstein University of Illinois at Chicago. Tanja Lange Technische Universiteit Eindhoven

Breaking ECC2K-130 on Cell processors and GPUs

Breaking ECC2K-130. May 20, Oberseminar Computer Security, COSEC group, B-IT, Bonn

ECC2K-130 on NVIDIA GPUs

ECC2K-130 on NVIDIA GPUs

My 2 hours today: 1. Efficient arithmetic in finite fields minute break 3. Elliptic curves. My 2 hours tomorrow:

ECDLP on GPU I. INTRODUCTION

McBits: fast constant-time code-based cryptography. (to appear at CHES 2013)

D. J. Bernstein University of Illinois at Chicago & Technische Universiteit Eindhoven. Joint work with: Tanja Lange Technische Universiteit Eindhoven

The Certicom Challenges ECC2-X

High-Performance Modular Multiplication on the Cell Broadband Engine

Vectorized implementations of post-quantum crypto

Hardware for Collision Search on Elliptic Curve over GF(2 m )

Example: Adding 1000 integers on Cortex-M4F. Lower bound: 2n + 1 cycles for n LDR + n ADD. Imagine not knowing this : : :

Efficient Implementation of the η T Pairing on GPU

Breaking ECC2K-130. Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium 3 Department of Computer Science

Recommendation to Protect Your Data in the Future

High-Performance Cryptography in Software

Algorithms and arithmetic for the implementation of cryptographic pairings

NEON: Faster Elliptic Curve Scalar Multiplications on ARM Processors

Initial recommendations of long-term secure post-quantum systems

Key Management and Distribution

High-Performance Modular Multiplication on the Cell Processor

The H2020 PQCRYPTO project

Understanding Cryptography by Christof Paar and Jan Pelzl. Chapter 9 Elliptic Curve Cryptography

ampire Virtual Application and Implementation Research Lab

HIGH PERFORMANCE ELLIPTIC CURVE CRYPTO-PROCESSOR FOR FPGA PLATFORMS

High-speed cryptography and cryptanalysis

Daniel J. Bernstein University of Illinois at Chicago & Technische Universiteit Eindhoven

SHARCS 09. Special-purpose Hardware for Attacking Cryptographic Systems September Lausanne, Switzerland

Collision Search for Elliptic Curve Discrete Logarithm over GF(2 m ) with FPGA

Outline. Fast Inversion Architectures over GF(2 233 ) using pre-com puted Exponentiation Matrices. SCD Nov , 2011.

Cryptography and Network Security

A High-Speed FPGA Implementation of an RSD- Based ECC Processor

Implementing the NewHope-Simple Key Exchange on Low-Cost FPGAs

Shapes of Elliptic Curves

This chapter continues our overview of public-key cryptography systems (PKCSs), and begins with a description of one of the earliest and simplest

Advances in Implementations of Code-based Cryptography on Embedded Systems

Cryptography and Network Security Chapter 10. Fourth Edition by William Stallings

ELLIPTIC CURVE CRYPTOGRAPHY ON SMART CARDS WITHOUT COPROCESSORS

Computers and Mathematics with Applications

Computer Security. 08. Cryptography Part II. Paul Krzyzanowski. Rutgers University. Spring 2018

Computer Security 3/23/18

Elliptic Curve Cryptography

Point Compression and Coordinate Recovery for Edwards Curves over Finite Field

Sneaking key escrow in through the back door

Low-Latency Elliptic Curve Scalar Multiplication

Usable assembly language for GPUs. D. J. Bernstein University of Illinois at Chicago

Montgomery Multiplication Using Vector Instructions

Implementing Practical leakage-resilient symmetric cryptography. University of Illinois at Chicago, Technische Universiteit Eindhoven

Digit-Level Semi-Systolic and Systolic Structures for the Shifted Polynomial Basis Multiplication Over Binary Extension Fields

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields

This is a repository copy of High Speed and Low Latency ECC Implementation over GF(2m) on FPGA.

Blind Differential Cryptanalysis for Enhanced Power Attacks

Deploying high-security cryptography Daniel J. Bernstein University of Illinois at Chicago

Elliptic Curves over Prime and Binary Fields in Cryptography

A Implementing Curve25519 for Side-Channel-Protected Elliptic Curve Cryptography

U k-2. U k-1 V 1. V k-1. W k-1 W 1 P 1. P k-1

IMPLEMENTATION OF MESSAGE AUTHENTICATION SCHEME WITH ELLIPTIC CURVE CRYPTOGRAPHY

High-Performance Integer Factoring with Reconfigurable Devices

Classic McEliece: conservative code-based cryptography

Hardware/Software Co-design for Hyperelliptic Curve Cryptography (HECC) on the 8051 µp

Basic Concepts and Definitions. CSC/ECE 574 Computer and Network Security. Outline

CPSC 467b: Cryptography and Computer Security

SEC 1: Elliptic Curve Cryptography

Stream Ciphers An Overview

SIMD Acceleration of Modular Arithmetic on Contemporary Embedded Platforms

FSBday: Implementing Wagner s generalized birthday attack against the SHA-3 round-1 candidate FSB

Introduction to Cryptography and Security Mechanisms: Unit 5. Public-Key Encryption

A Binary Redundant Scalar Point Multiplication in Secure Elliptic Curve Cryptosystems

Elliptic Curve Cryptography

MONTGOMERY MODULAR MULTIPLICATION ALGORITHM ON MULTI-CORE SYSTEMS. Junfeng Fan, Kazuo Sakiyama, and Ingrid Verbauwhede

Elliptic Curve Public Key Cryptography

McEliece Cryptosystem in real life: security and implementation

Missing a trick: Karatsuba variations

Performance Analysis of Contemporary Lightweight Block Ciphers on 8-bit Microcontrollers

Elliptic vs. hyperelliptic, part 1. D. J. Bernstein

Reduced Memory Meet-in-the-Middle Attack against the NTRU Private Key

Parallelizing Cryptography. Gordon Werner Samantha Kenyon

A Review of Key Length SelectionFormula for Elliptic Curve Cryptosystems

Public Key Cryptography and RSA

On the Practical Exploitability of Dual EC in TLS Implementations

FSBday: Implementing Wagner s generalized birthday attack against the SHA-3 candidate FSB

CRYPTOGRAPHY plays an important role in the security

How to manipulate curve standards: a white paper for the black hat

Studying Software Implementations of Elliptic Curve Cryptography

ECC Elliptic Curve Cryptography. Foundations of Cryptography - ECC pp. 1 / 31

Hardware/Software Co-Design of Elliptic Curve Cryptography on an 8051 Microcontroller

Talks at Mathematical Structures for Cryptography

Elliptic Curve Cryptography on a Palm OS Device

Crypto Basics. Recent block cipher: AES Public Key Cryptography Public key exchange: Diffie-Hellmann Homework suggestion

Benchmarking benchmarking, and optimizing optimization. Daniel J. Bernstein. University of Illinois at Chicago & Technische Universiteit Eindhoven

Chapter 9 Public Key Cryptography. WANG YANG

Cryptology complementary. Finite fields the practical side (1)

ELLIPTIC curve cryptography (ECC) is a public key cryptography

Generic GF(2 m ) Arithmetic in Software and its Application to ECC

Efficient Elliptic Curve Processor Architectures for Field Programmable Logic

Daniel V. Bailey 1 and Christof Paar 2. 1 Computer Science Department, Worcester Polytechnic Institute, Worcester, MA.

FSBday: Implementing Wagner s generalized birthday attack against the SHA-3 round-1 candidate FSB

ESP Egocentric Social Platform

Transcription:

Type-II optimal polynomial bases D. J. Bernstein University of Illinois at Chicago Joint work with: Tanja Lange Technische Universiteit Eindhoven

Bigger project: Breaking ECC2K-130. Daniel V. Bailey, Lejla Batina, Daniel J. Bernstein, Peter Birkner, Joppe W. Bos, Hsieh-Chung Chen, Chen-Mou Cheng, Gauthier van Damme, Giacomo de Meulenaer, Luis Julian Dominguez Perez, Junfeng Fan, Tim Güneysu, Frank Gürkaynak, Thorsten Kleinjung, Tanja Lange, Nele Mentens, Ruben Niederhagen, Christof Paar, Francesco Regazzoni, Peter Schwabe, Leif Uhsadel, Anthony Van Herrewege, Bo-Yin Yang.

The target: ECC2K-130 1997: Certicom announces several elliptic-curve challenges. The Challenge is to compute the ECC private keys from the given list of ECC public keys and associated system parameters. This is the type of problem facing an adversary who wishes to completely defeat an elliptic curve cryptosystem. Goals: help users select key sizes; compare random and Koblitz; compare F 2 m and F ; etc. p

1997: ECCp-79 broken by Baisley and Harley. 1997: ECC2-79 broken by Harley et al. 1998: ECCp-89, ECC2-89 broken by Harley et al. 1998: ECCp-97 broken by Harley et al. (1288 computers). 1998: ECC2K-95 broken by Harley et al. (200 computers). 1999: ECC2-97 broken by Harley et al. (740 computers). 2000: ECC2K-108 broken by Harley et al. (9500 computers).

Certicom: The 109-bit Level I challenges are feasible using a very large network of computers. The 131-bit Level I challenges are expected to be infeasible against realistic software and hardware attacks, unless of course, a new algorithm for the ECDLP is discovered. 2002: ECCp-109 broken by Monico et al. (10000 computers). 2004: ECC2-109 broken by Monico et al. (2600 computers). Next challenge: ECC2K-130.

The attacker: ECRYPT European Union has funded ECRYPT I network (2004 2008), ECRYPT II network (2008 2012). ECRYPT II: KU Leuven; ENS; EPFL; RU Bochum; RHUL; TU Eindhoven; TU Graz; U Bristol; U Salerno; France Télécom; IBM Research; 22 adjoint members. Work is handled by virtual labs : SymLab: secret-key crypto; MAYA: public-key crypto; VAMPIRE: implementations.

Working groups in VAMPIRE: VAM1: Efficient Implementation of Security Systems. VAM2: Physical Security. 2009.02: VAMPIRE (VAM1) sets its sights on ECC2K-130. Exactly how difficult is breaking ECC2K-130? Also ECC2-131 etc. Sensible topic for implementors. Optimizing ECC attacks isn t far from optimizing ECC.

With our latest implementations, ECC2K-130 is breakable in two years on average by 1595 Phenom II x4 955 CPUs,

With our latest implementations, ECC2K-130 is breakable in two years on average by 1595 Phenom II x4 955 CPUs, or 1231 Playstation 3s,

With our latest implementations, ECC2K-130 is breakable in two years on average by 1595 Phenom II x4 955 CPUs, or 1231 Playstation 3s, or 631 GTX 295 cards,

With our latest implementations, ECC2K-130 is breakable in two years on average by 1595 Phenom II x4 955 CPUs, or 1231 Playstation 3s, or 631 GTX 295 cards, or 308 XC3S5000 FPGAs,

With our latest implementations, ECC2K-130 is breakable in two years on average by 1595 Phenom II x4 955 CPUs, or 1231 Playstation 3s, or 631 GTX 295 cards, or 308 XC3S5000 FPGAs, or any combination thereof.

With our latest implementations, ECC2K-130 is breakable in two years on average by 1595 Phenom II x4 955 CPUs, or 1231 Playstation 3s, or 631 GTX 295 cards, or 308 XC3S5000 FPGAs, or any combination thereof. This is a computation that Certicom called infeasible?

With our latest implementations, ECC2K-130 is breakable in two years on average by 1595 Phenom II x4 955 CPUs, or 1231 Playstation 3s, or 631 GTX 295 cards, or 308 XC3S5000 FPGAs, or any combination thereof. This is a computation that Certicom called infeasible? Certicom has now backpedaled, saying that ECC2K-130 may be within reach.

ECC2K-130 optimization Breaking ECC2K-130 combines many levels of optimization. This talk focuses on arithmetic in the finite field F 2 131 inside ECC2K-130.

ECC2K-130 optimization Breaking ECC2K-130 combines many levels of optimization. This talk focuses on arithmetic in the finite field F 2 131 inside ECC2K-130. Many implementation decisions, platform-specific optimizations. This talk focuses on a simple but useful cost metric, namely # bit operations: i.e., #ANDs + #XORs.

Define M(n) as minimum # bit operations for multiplying n-bit polys. e.g. M(131) 34061 from schoolbook multiplication: 131 2 ANDs + 130 2 XORs.

Define M(n) as minimum # bit operations for multiplying n-bit polys. e.g. M(131) 34061 from schoolbook multiplication: 131 2 ANDs + 130 2 XORs. Much lower costs are known. Optimizations: Karatsuba (not Karatsuba Ofman : K O paper credits Karatsuba); refined Karatsuba; Toom; etc. Current record (CRYPTO 2009): M(131) 11961.

Your metric is too simple! Hardware has area-time tradeoffs! Software does not work on bits!

Your metric is too simple! Hardware has area-time tradeoffs! Software does not work on bits! Response: Optimizing bit operations is very close to optimizing the throughput of unrolled, pipelined hardware. See, e.g., ECC2K-130 FPGA paper to appear at FPL 2010.

Your metric is too simple! Hardware has area-time tradeoffs! Software does not work on bits! Response: Optimizing bit operations is very close to optimizing the throughput of unrolled, pipelined hardware. See, e.g., ECC2K-130 FPGA paper to appear at FPL 2010. Also very close to optimizing the speed of software using vectorized bit operations. See, e.g., ECC2K-130 Cell paper at AFRICACRYPT 2010.

All of the implementations of the ECC2K-130 attack started with the standard pentanomial basis 1; z; z 2 ; : : : ; z 130 of F 2 131 = F 2 [z]=(z 131 + z 13 + z 2 + z + 1). Cost M(131) + 455 for multiplication. Cost 203 for squaring. Our final attack iteration has 5 mults, 21 squarings, 1 normal-basis Hamming weight. Question at start of project: Work entirely in normal basis? Critical issue: mult speed.

Type-I normal basis of F 2 n is a permutation of ; 2 ; : : : ; n in F 2 []=( n + + + 1). Cost M(n) to multiply, obtaining coefficients of 2 ; 3 ; : : : ; 2n. Cost 2n 2 to reduce 2 ; 3 ; : : : ; 2n to ; 2 ; : : : ; n. Alternative (1989 Itoh Tsujii), slightly faster when n is large: redundant 1; ; : : : ; n ; cost M(n + 1) + n.

But F 2 131 doesn t have a type-i normal basis. F 2 131 has a type-ii normal basis + 1, 2 + 2, 4 + 4, : : :, 2130 + 2130 where is a primitive 263rd root of 1. 1995 Gao von zur Gathen Panario: Can multiply on type-ii normal basis of F 2 n by multiplying in F 2 []=( 2n + + 1). Cost > 2M(n).

2001 Bolotov Gashkov: Can quickly convert from type-ii normal basis c; c 2 ; c 4 ; : : : ; c 2n 1 to standard basis 1; c; c 2 ; : : : ; c n 1 where c = + 1. Cost (n=2) lg n + 3n. e.g. 853 for n = 131. Same cost for inverse. (Analysis is too pessimistic; actual cost is lower.)

Bolotov Gashkov multiply in this standard basis with a poly mult, cost M(n), and a reduction modulo the minimal polynomial of c. e.g. c 131 + c 130 + c 128 + c 124 + c 123 + c 122 + c 120 + c 115 + c 114 + c 112 + c 99 + c 98 + c 96 + c 67 + c 66 + c 64 + c 3 + c 2 + 1 = 0 for n = 131. Bolotov Gashkov reduction uses sparsity; cost 2340. Overall cost M(131) + 4899 for type-ii normal-basis mult. Still too slow to be useful.

2007 Shokrollahi (first published in Ph.D. thesis, then in WAIFI 2007 paper by von zur Gathen, Shokrollahi, and Shokrollahi): Convert from type-ii normal basis to redundant 1; c; c 2 ; : : : ; c n. Multiply polynomials, producing redundant 1; c; c 2 ; : : : ; c 2n. Convert to redundant 1, + 1, 2 + 2, 3 + 3, : : : 2n + 2n. Use 2n+1 = 1 to eliminate redundancy.

For n = 131: Shokrollahi s analysis says M(132) + 3462. Our analysis of Shokrollahi s algorithm says M(132) + 1559. Easy speedup: M(131) + 1559. 5 mults, other iteration overhead: 5M(131) + 12249. Compare to pentanomial basis: 5M(131) + 14372.

Do even better by mixing permuted type-ii optimal normal basis + 1, 2 + 2, 3 + 3, : : :, n + n with type-ii optimal polynomial basis + 1, ( + 1 ) 2, ( + 1 ) 3, : : :, ( + 1 ) n. Use normal basis for outputs that will be provided to squaring, poly basis for outputs that will be provided to mult. Use a new reduction algorithm for poly-basis output. See paper for details.

Current iteration cost: 5M(131) + 10305. Practical impact: All of the ECC2K-130 implementations have upgraded from pentanomial basis to type-ii bases, saving time. Ongoing project: We are working on fast high-security ECC software using big Koblitz curves; much bigger than ECC2K-130!