The Salsa20 Family of Stream Ciphers
Based on [Bernstein, 2008]
Erin Hales, Gregor Matl, Simon-Philipp Merz
Introduction to Cryptology, November 13, 2017
"From a security perspective, if you're connected, you're screwed."
Daniel J. Bernstein

"The average user doesn't give a damn what happens, as long as (1) it works and (2) it's fast."
Daniel J. Bernstein

"I won't be satisfied until I've put the entire security industry out of work."
Daniel J. Bernstein
The Salsa20 Family Introduction Low Level High Level Medium Level
Introduction
- Salsa20/r is a family of stream ciphers designed by Daniel J. Bernstein and submitted to eSTREAM in 2005.
- We will explain the decisions made while designing the cipher on three levels: the operations used, how blocks interact, and how blocks are generated.
- Bernstein later released a variant of Salsa20 named ChaCha, on which the SHA-3 finalist BLAKE is based.
- Salsa20 is free for any use.
Introduction: General facts on Salsa20/r
- Sender and receiver share a short secret key (128-bit and 256-bit keys are supported).
- The secret key, a 64-bit nonce, a 64-bit block counter and four 32-bit constants are used to construct a 512-bit initial state.
- The 512-bit initial state is updated over r rounds, and the algorithm finally outputs a 512-bit keystream block.
- The keystream can be used to encrypt a series of messages (short or long, just one or billions).
- Maximum keystream per key and nonce: $2^{64}$ blocks of 64 bytes, i.e. $2^{70}$ bytes.
Low Level: Goal of design
- A fast encryption function suitable for a wide range of applications (encrypting large amounts of data in little time with limited resources).
- Security.
- To satisfy different needs in the security vs. performance trade-off, three versions of Salsa20 are proposed: Salsa20/20, Salsa20/12 and Salsa20/8.
Low Level: Which operations are used?
Round transformations of Salsa20 use a combination of three simple operations:
- addition of two 32-bit words modulo $2^{32}$
- 32-bit XOR
- constant-distance 32-bit rotation
Could other operations give the same security at higher speed?
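The three operations listed above can be sketched in a few lines of Python. This is only an illustration: the helper names `add32`, `xor32` and `rotl32` are ours, not from the slides or from any library, and Python integers are masked explicitly because they are not fixed-width.

```python
MASK32 = 0xFFFFFFFF  # keeps every result inside 32 bits


def add32(a: int, b: int) -> int:
    """Addition of two 32-bit words modulo 2**32."""
    return (a + b) & MASK32


def xor32(a: int, b: int) -> int:
    """Bitwise XOR of two 32-bit words."""
    return (a ^ b) & MASK32


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by a constant distance n (0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32
```

None of the three helpers branches on its operands or uses them as table indices, which is the property the following slides contrast with multiplication and S-box lookups.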
Low Level: Why no integer multiplication?
Advantages:
- Output bits are complicated functions of the input bits (mixed very thoroughly).
- Some CPUs include impressively fast multiplication circuits.
Disadvantages:
- Massive speed penalties on other CPUs, whereas a comparable sequence of simple integer operations is always reasonably fast.
- Higher risk of timing leaks.
Low Level: Why no S-box lookups?
Advantage:
- A single table lookup can mangle its input thoroughly.
Disadvantages:
- An integer operation takes 32-bit inputs instead of 8-bit inputs, and thus mangles several bytes at once.
- Vulnerable to timing attacks (S-box lookups in constant time would be slow).
High Level: How do blocks interact?
What does Salsa20 do?
- Expands a 256-bit key and a 64-bit nonce into a $2^{70}$-byte stream.
- Encrypts a b-byte plaintext by XOR-ing it with the first b bytes of the stream and discarding the rest of the stream.
- Decrypts a b-byte ciphertext by XOR-ing it with the first b bytes of the stream.
What does this mean?
- There is no feedback from the plaintext or ciphertext into the stream.
- There is no chaining from one block to the next.
- Blocks can be computed in parallel.
- No preprocessing costs.
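The block interaction described above can be sketched as follows. To keep the example short, `toy_block` is a hypothetical stand-in built from SHA-512; it is NOT the Salsa20 block function and only mimics its interface (key, nonce and block counter in, 64 bytes out). All function names are assumptions made for this sketch.

```python
import hashlib

BLOCK_BYTES = 64  # Salsa20 produces the stream in 64-byte blocks


def toy_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Hypothetical stand-in for the block function (NOT Salsa20)."""
    data = key + nonce + counter.to_bytes(8, "little")
    return hashlib.sha512(data).digest()  # exactly 64 bytes


def keystream(key: bytes, nonce: bytes, length: int, first_block: int = 0) -> bytes:
    """Concatenate independent blocks; any block can be computed on its own."""
    out = bytearray()
    counter = first_block
    while len(out) < length:
        out += toy_block(key, nonce, counter)
        counter += 1
    return bytes(out[:length])  # the rest of the stream is discarded


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: the same XOR is used in both directions."""
    stream = keystream(key, nonce, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))
```

Because each block depends only on the key, nonce and counter, block 3 can be generated without blocks 0 to 2, which is what makes parallel computation and random access possible.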
High Level: Should encryption and decryption be different?
- In counter mode, OFB mode and CFB mode, each ciphertext block is the XOR of the plaintext block and the stream block at the same position.
- In contrast, some ciphers mangle the plaintext in a more complicated way. For example, AES in CBC mode converts the nth plaintext block $p_n$ into the nth ciphertext block $c_n$ by the formula $c_n = \mathrm{AES}_k(c_{n-1} \oplus p_n)$.
High Level: Should encryption and decryption be different?
Why is CBC mode so popular? Historical accident?
- Increased costs, as it requires different code for encryption and decryption.
- The extra communication from the plaintext into the cipher input is a security threat, so extra rounds are needed, which costs extra time.
- Its security proof assumes that block cipher outputs for attacker-controlled inputs are indistinguishable from uniform.
High Level: Should the stream cipher depend on the plaintext?
- For Salsa20 the stream does not depend on the plaintext or the ciphertext. However, some stream ciphers produce a stream which depends on the plaintext.
- Advantage: allows message authentication "for free".
- Disadvantages: "free" is an exaggeration, as it does take time. Incorporating the plaintext is a security threat.
- State-of-the-art 128-bit authenticators can be computed in just a few cycles per byte. While this may exceed the cost of "free" authentication for legitimate packets, it is much less expensive than "free" authentication for forged packets, since a forged packet can be rejected without being decrypted.
High Level: Should there be more state?
Salsa20 carries minimal state between blocks, whereas most stream ciphers carry a larger state, reusing part of the first block as an input to the second, and so on.
Advantage of larger state:
- Saves time after the first block, since fewer cipher rounds are needed to achieve the same security level.
Disadvantages of larger state:
- Ciphers that chain can handle fewer communication channels simultaneously.
- Reuse forces serialisation.
- Random access to the stream is impossible unless the stream is precomputed and saved (memory costs).
- "Inability to exploit parallelism is often a disaster".
High Level: Should blocks be larger than 64 bytes?
Salsa20 hashes the key, nonce and block counter into a 64-byte block. Should a larger block size be used?
Advantage of a larger block size:
- Not as many rounds are needed to achieve the same conjectured security level.
Disadvantages of a larger block size:
- Larger blocks lose time, since CPUs are designed to work with less data at once.
- Increased overhead for inconvenient message sizes.
High Level: Should keys be smaller than 256 bits?
- The original eSTREAM call was for 128-bit software ciphers. Salsa20 is a 256-bit cipher but allows smaller keys. The author recommends 256-bit keys.
- Larger keys are more expensive, so why are they necessary?
- The argument in favour of 128-bit keys is that they cannot be found by a brute-force attack because it is too expensive: if a CPU checking $2^{20}$ keys per second costs about $2^6$ euros, then searching $2^{128}$ keys in a year will cost $2^{89}$ euros.
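The $2^{89}$ figure can be checked with a short calculation. The sketch below takes the slide's numbers as given and works in base-2 logarithms; the function name and its parameters are ours, and a year is assumed to be 365 days.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # about 2**24.9


def brute_force_cost_log2(key_bits: int,
                          log2_keys_per_second: float,
                          log2_euros_per_cpu: float,
                          years: float = 1.0) -> float:
    """Return log2 of the cost in euros of searching the whole key space."""
    # Keys one CPU can check in the given time.
    log2_keys_per_cpu = log2_keys_per_second + math.log2(SECONDS_PER_YEAR * years)
    # CPUs needed to cover all 2**key_bits keys.
    log2_cpus = key_bits - log2_keys_per_cpu
    return log2_cpus + log2_euros_per_cpu


cost = brute_force_cost_log2(key_bits=128, log2_keys_per_second=20, log2_euros_per_cpu=6)
```

With these inputs one CPU covers roughly $2^{45}$ keys per year, so about $2^{83}$ CPUs at $2^6$ euros each are needed, which matches the slide's estimate after rounding.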
High Level: Should keys be smaller than 256 bits?
Why is this unrealistic?
- The time and cost can be reduced without any advances in technology.
- The attacker can succeed in fewer than $2^{128}$ computations: he reaches success probability p after just $2^{128} p$ computations.
- Each key-checking circuit costs less than $2^6$ euros: in bulk, one or more circuits fit on a single chip, which reduces the attacker's costs by a factor of $2^{10}$.
- The attacker can reduce the cost by a factor of $2^{40}$ by simultaneously attacking (say) $2^{40}$ keys. We can counter this by adding extra randomness to nonces. However, putting extra randomness into keys is less expensive.
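As a rough worked example, the reductions listed above can be combined with the earlier $2^{89}$-euro estimate. This assumes the two cost factors simply multiply, which the slides do not state explicitly, and the choice $p = 2^{-20}$ is purely illustrative.

```latex
\[
  \underbrace{2^{89}}_{\text{naive estimate}}
  \cdot \underbrace{2^{-10}}_{\text{bulk circuits}}
  \cdot \underbrace{2^{-40}}_{\text{attacking } 2^{40} \text{ keys at once}}
  = 2^{39} \text{ euros}
\]
\[
  \text{success probability } p = 2^{-20}
  \;\Longrightarrow\;
  2^{128} \cdot 2^{-20} = 2^{108} \text{ computations}
\]
```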
Medium Level: The Algorithm
Input:
- 32-byte key: $k_0 \dots k_{31}$
  - 16-byte key: repeat it twice
  - 10-byte key: pad it with 0s to 16 bytes and repeat that twice
- 8-byte nonce: $n_0 \dots n_7$
- 8-byte block counter: $c_0 \dots c_7$
Initial state (4×4 array of 32-bit little-endian words, each written most significant byte first):

0x61707865        k3 k2 k1 k0       k7 k6 k5 k4       k11 k10 k9 k8
k15 k14 k13 k12   0x3320646e        n3 n2 n1 n0       n7 n6 n5 n4
c3 c2 c1 c0       c7 c6 c5 c4       0x79622d32        k19 k18 k17 k16
k23 k22 k21 k20   k27 k26 k25 k24   k31 k30 k29 k28   0x6b206574
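The array above can be assembled as in the following sketch, which covers only the 32-byte key case. The names `SIGMA` and `initial_state` are ours. Passing the block counter as a Python integer is a convenience assumption; it is stored as eight little-endian bytes, matching $c_0 \dots c_7$.

```python
import struct

# The four diagonal constants; as little-endian bytes they spell "expand 32-byte k".
SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)


def initial_state(key: bytes, nonce: bytes, counter: int) -> list:
    """Return the 16 words of the initial array in row-major order."""
    if len(key) != 32 or len(nonce) != 8:
        raise ValueError("this sketch expects a 32-byte key and an 8-byte nonce")
    k = struct.unpack("<8I", key)    # k[0] holds bytes k0..k3, and so on
    n = struct.unpack("<2I", nonce)
    c = struct.unpack("<2I", counter.to_bytes(8, "little"))
    return [
        SIGMA[0], k[0],     k[1],     k[2],
        k[3],     SIGMA[1], n[0],     n[1],
        c[0],     c[1],     SIGMA[2], k[4],
        k[5],     k[6],     k[7],     SIGMA[3],
    ]
```

The constants sit on the diagonal, so every column mixes exactly one constant with three words of key, nonce or counter material.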
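Medium Level: The Algorithm
Confusion (applied to each column; entries are named by their position relative to the diagonal entry of that column, wrapping around at the bottom of the array):

$\text{below diagonal} := \text{below diagonal} \oplus ((\text{diagonal} + \text{above diagonal}) \lll 7)$
$\text{two below diagonal} := \text{two below diagonal} \oplus ((\text{below diagonal} + \text{diagonal}) \lll 9)$
$\text{above diagonal} := \text{above diagonal} \oplus ((\text{two below diagonal} + \text{below diagonal}) \lll 13)$
$\text{diagonal} := \text{diagonal} \oplus ((\text{above diagonal} + \text{two below diagonal}) \lll 18)$

Here $+$ is addition modulo $2^{32}$, $\oplus$ is XOR and $\lll$ is left rotation.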
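The four confusion steps translate almost literally into code. In the sketch below the function name `confuse` and its argument order are our choice; each step uses the values already updated by the previous steps, as in the assignments above.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def confuse(diagonal: int, below: int, two_below: int, above: int):
    """Apply the four confusion steps to the four words of one column."""
    below ^= rotl32((diagonal + above) & MASK32, 7)
    two_below ^= rotl32((below + diagonal) & MASK32, 9)
    above ^= rotl32((two_below + below) & MASK32, 13)
    diagonal ^= rotl32((above + two_below) & MASK32, 18)
    return diagonal, below, two_below, above
```

Each step is invertible given the other three words, so the whole confusion step is a permutation of the column.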
Medium Level: The Algorithm
Algorithm 1: Salsa20/r where $r \in \{8, 12, 20\}$
    assemble array from key, nonce and block counter;
    for r times do
        confuse each column;
        transpose array;
    add initial array;
Advantages:
- The key is part of the array, so it is not necessary to store it additionally.
- Locality: extremely cache-efficient.
- Short and simple code.
- Decryption is identical to encryption.
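Algorithm 1 can be sketched in Python as follows. This is an unoptimised illustration for 32-byte keys, not a reference implementation and not meant for production use. All function names are ours, and the sketch accepts only even r (as in Salsa20/8, /12 and /20) so that the transpositions cancel out.

```python
import struct

MASK32 = 0xFFFFFFFF
SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)  # diagonal constants


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def confuse(diagonal, below, two_below, above):
    """The four confusion steps for one column."""
    below ^= rotl32((diagonal + above) & MASK32, 7)
    two_below ^= rotl32((below + diagonal) & MASK32, 9)
    above ^= rotl32((two_below + below) & MASK32, 13)
    diagonal ^= rotl32((above + two_below) & MASK32, 18)
    return diagonal, below, two_below, above


def confuse_columns(x):
    """Confuse each column; column j has its diagonal entry in row j."""
    y = list(x)
    for j in range(4):
        # Indices of: diagonal, below, two below, above (wrapping around).
        idx = [4 * ((j + i) % 4) + j for i in range(4)]
        for pos, word in zip(idx, confuse(*(x[p] for p in idx))):
            y[pos] = word
    return y


def transpose(x):
    """Transpose the 4x4 array stored in row-major order."""
    return [x[4 * c + r] for r in range(4) for c in range(4)]


def salsa20_core(words, rounds=20):
    """r times (confuse columns, transpose), then add the initial array."""
    if rounds % 2:
        raise ValueError("this sketch only supports an even number of rounds")
    x = list(words)
    for _ in range(rounds):
        x = transpose(confuse_columns(x))
    return [(a + b) & MASK32 for a, b in zip(x, words)]


def salsa20_block(key, nonce, counter, rounds=20):
    """Return one 64-byte keystream block for a 32-byte key and 8-byte nonce."""
    k = struct.unpack("<8I", key)
    n = struct.unpack("<2I", nonce)
    c = (counter & MASK32, (counter >> 32) & MASK32)
    state = [SIGMA[0], k[0], k[1], k[2],
             k[3], SIGMA[1], n[0], n[1],
             c[0], c[1], SIGMA[2], k[4],
             k[5], k[6], k[7], SIGMA[3]]
    return struct.pack("<16I", *salsa20_core(state, rounds))
```

A message is then encrypted or decrypted by XOR-ing it with the blocks for counters 0, 1, 2 and so on. The final addition of the initial array matters: without it, the rounds could be run backwards from the output to recover the key.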
Medium Level: Cryptanalysis
- Heavily analysed since 2005.
- Best known attack by [Aumasson et al., 2008], based on probabilistic neutral bits (PNBs):
  - $2^{249}$-operation attack on Salsa20/8
  - $2^{153}$-operation attack on Salsa20/7
  - Salsa20/6 broken
References
Aumasson, J.-P., Fischer, S., Khazaei, S., Meier, W., and Rechberger, C. (2008). New Features of Latin Dances: Analysis of Salsa, ChaCha, and Rumba, pages 470–488. Springer Berlin Heidelberg, Berlin, Heidelberg.
Bernstein, D. J. (2008). The Salsa20 Family of Stream Ciphers, pages 84–97. Springer Berlin Heidelberg, Berlin, Heidelberg.
Thank you for your attention! Any questions? Discussion!