A practical integrated device for low-overhead, secure communications.
Gord Allan, Matt Lewis

Design Goals
Versatility:
-can be used in a range of devices
-compatibility, low/no infrastructure
Mobility:
-low power consumption
Security:
-2048 bit RSA key exchange
-RC6 (AES candidate) data exchange
-RSA based authentication
Throughput:
-150 ms for RSA cycle
-100+ Mbps RC6 throughput

Asymmetric Exchange of Symmetric Key
Use public key cryptography to agree on a mutual symmetric key for use in a symmetric cipher.
Assume all public keys are known (falls apart later).
A -> B: enc_B("let's use k")
B -> A: "sure..."
A -> B: enc_RC6(m)

Asymmetric Exchange of Symmetric Key
A -> B: enc_B("let's use k")
B -> A: "sure..."
A -> B: enc_RC6(m)
Vulnerabilities:
-Impersonation of origin
-Replay attack
Additionally:
-Public key lookup...

Key Exchange and Identity Authentication
Diffie-Hellman
Advantages:
-computational feasibility, small keys
-key agreement, resists replay attacks
Disadvantages:
-active intruder problem
-impractical solution, high overhead

Key Exchange and Identity Authentication
RSA
Advantages:
-message content can be controlled
-authentication is manageable
Disadvantages:
-computationally intensive (speed/power)
-requires specialized hardware (area)

User Certificates
To solve impersonation and public key distribution attacks, we rely on an authentication scheme that makes use of certificates.
Associated with a device we have:
-Extended IP / Phone Nbr (128 bit)
-MAC (48 bit)
-Control/Future Use (16 bit)
-Public Key n (2048 bit)

Certificate Format
Field:        Header | IP/ID | MAC | Public Key - N | Control | Footer
Width (bits): 160    | 128   | 48  | 2048           | 16      | 160
A certificate is 2560 bits and contains an encrypted version of A's public data. It is encrypted with the issuing authority's private key when the device is configured and saved into the chip's EEPROM. All devices can easily decode the certificate, but only the issuing authority can create one.

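The field layout above can be sketched as bit packing in Python. This is only an illustration under stated assumptions: the slide does not say where the 16 bit control field sits, so placing it between N and the footer is a guess, the header and footer patterns are hypothetical placeholders, and the issuing authority's RSA encryption of the block is left out.

```python
# Field widths in bits, most significant field first. Placing the 16 bit
# control field between N and the footer is an assumption.
FIELDS = [("header", 160), ("ip_id", 128), ("mac", 48),
          ("n", 2048), ("control", 16), ("footer", 160)]
CERT_BITS = sum(width for _, width in FIELDS)  # 2560

# Hypothetical fixed patterns (20 bytes = 160 bits each); the real
# header/footer values are not given in the slides.
HEADER = int.from_bytes(b"HDR-PLACEHOLDER-0001", "big")
FOOTER = int.from_bytes(b"FTR-PLACEHOLDER-0001", "big")


def pack_certificate(values):
    """Pack a dict of field values into one 2560 bit integer."""
    cert = 0
    for name, width in FIELDS:
        value = values[name]
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        cert = (cert << width) | value
    return cert


def unpack_certificate(cert):
    """Split a 2560 bit integer back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = cert & ((1 << width) - 1)
        cert >>= width
    return values


def looks_valid(values):
    """A decoded block is accepted only if both fixed patterns match."""
    return values["header"] == HEADER and values["footer"] == FOOTER
```

A receiver would decode the received block with the issuing authority's public key first and then call `unpack_certificate` and `looks_valid` on the result.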
Key Exchange Protocol
Call Protocol
A:
-calls B, knowing B's IP/ID
-sends certificate $C_A$
-prepares random 2047 bit key ($\mathrm{Nbr}_A$)
-decodes $C_B$ ($C_B^{F_4} \bmod N_{IA}$)
-encodes $S_A = \mathrm{Nbr}_A^{F_4} \bmod N_B$
-sends $S_A$ to B
-decodes $S_B$ ($S_B^{d} \bmod N_A$)
-$\mathrm{key}_{RC6} = (\mathrm{Nbr}_B \oplus \mathrm{Nbr}_A)_{1919-2047}$
B:
-decodes $C_A$ ($C_A^{F_4} \bmod N_{IA}$)
-sends $C_B$ to A's IP/ID
-prepares random 2047 bit key ($\mathrm{Nbr}_B$)
-encodes $S_B = \mathrm{Nbr}_B^{F_4} \bmod N_A$
-sends $S_B$ to A
-decodes $S_A$ ($S_A^{d} \bmod N_B$)
-$\mathrm{key}_{RC6} = (\mathrm{Nbr}_A \oplus \mathrm{Nbr}_B)_{1919-2047}$

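The number exchange at the heart of this protocol can be sketched in Python with toy parameters. Everything below is illustrative only: the moduli are built from small Mersenne primes rather than 2048 bit keys, the random numbers are 128 bits instead of 2047, the session key takes the top 64 bits instead of bits 1919-2047, and the certificate decoding step is omitted. The helper names are hypothetical.

```python
import secrets

F4 = 2**16 + 1  # public exponent shared by every device


def make_keypair(p, q):
    """Return (N, d) for primes p and q; needs Python 3.8+ for pow(x, -1, m)."""
    n = p * q
    d = pow(F4, -1, (p - 1) * (q - 1))
    return n, d


# Toy keys built from Mersenne primes (assumption: far too small for real use).
N_A, D_A = make_keypair(2**61 - 1, 2**89 - 1)
N_B, D_B = make_keypair(2**107 - 1, 2**127 - 1)

NONCE_BITS = 128  # the slides use 2047 bit random numbers
KEY_BITS = 64     # the slides take bits 1919-2047 of the XOR


def session_key(own_nbr, peer_nbr):
    """XOR both random numbers and keep the most significant KEY_BITS."""
    return (own_nbr ^ peer_nbr) >> (NONCE_BITS - KEY_BITS)


def exchange():
    """Run both sides of the exchange and return the key each side derives."""
    nbr_a = secrets.randbits(NONCE_BITS)
    nbr_b = secrets.randbits(NONCE_BITS)
    s_a = pow(nbr_a, F4, N_B)  # A encodes for B
    s_b = pow(nbr_b, F4, N_A)  # B encodes for A
    key_at_a = session_key(nbr_a, pow(s_b, D_A, N_A))  # A decodes S_B
    key_at_b = session_key(nbr_b, pow(s_a, D_B, N_B))  # B decodes S_A
    return key_at_a, key_at_b
```

Because each side contributes its own random number, neither party alone chooses the RC6 key, and a recorded $S_A$ replayed later yields a different key once B picks a fresh number.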
Additional Considerations
-Certificate forgery
-Device configuration
-DNS/Phone lookup
-Compatibility

Certificate Forgery
With the current signature scheme, the issuing authority (IA) is the only holder of the secret encryption key, so the IA is the only one able to encode a given bit sequence. An imposter can, however, generate a random bit string and decode it to see whether it meets the format requirements of a certificate. If it does, the intruder can intercept a call and present the forged certificate with a weak value of N, one that can be factored to recover d.
To prevent forgery, the header and footer of a certificate are composed of a fixed data sequence. This sequence must be matched to prove a certificate valid.

Certificate Forgery
With a 160 bit header and footer, the imposter would have to generate a block that, when decrypted, not only has a weak N but also matches 320 fixed bits. The odds against a randomly generated block decrypting to this are approximately $2^{320}$ to 1. This also assumes they can forge their IP and MAC address, which is feasible.
With this signature scheme, each party can be reasonably confident that signatures are authentic, provided the trusted authority is secure.

Device Configuration
The device is first configured in the factory. If the chip is part of a network card, the MAC address for the card is put into the signature. This can also be done with mobile phones, as they also carry a unique number that identifies them. Next, the IP/ID field is set to a default of zero. An N is then generated for the new chip, and the signature is encrypted together with the constant header and footer. The decryption key is then programmed into the chip. Now the device is ready for the real world.
One might note that we have not discussed the encryption key. This is because it is always $F_4$ (Fermat's prime, $2^{16}+1$). The same exponent is used for decrypting signatures.

Device Configuration
Once the chip is activated on a network for the first time, it must contact the trusted authority (TA) to get a new signature that includes its new IP/Phone number. The TA maintains a database containing the MAC, IP, and N for each card. Upon a valid issue, the TA can update future lookup/DNS servers.
The conversation with the TA uses the same protocol described earlier, except that the TA verifies the old signature and generates a new one based on the requested IP address of the client. Upon receiving the new signature, the client reprograms itself and is ready to go again. By keeping a list of MAC and IP numbers, one can verify that a client is not using an old signature.

DNS/Phone Lookup
The TA is now in the perfect position to act as a DNS/Phone server. This, however, would cause too much work for the TA, so a distributed approach is necessary. The TA can give up some of its power by allowing DNS/Phone servers to hold copies of the MAC/IP database. It does not, however, give away its secret key for encrypting signatures.
A DNS/Phone server must provide both the MAC and IP numbers to the client so that the client can verify that the other party's signature is up to date and valid. If the IP and MAC address don't match, the person is using an old signature. This scheme lets the TA deal only with new signatures, which will happen very rarely on phones and, once IPv6 takes over, on the Internet as well.

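The freshness check a client performs after a lookup can be sketched as follows. This is a minimal illustration, assuming the lookup server's answer is available as a mapping from IP/ID to MAC; the function and parameter names are hypothetical and not part of the described design.

```python
def certificate_is_current(cert_fields, directory):
    """Check a decoded peer certificate against the lookup server's records.

    cert_fields: dict with the 'ip_id' and 'mac' values decoded from the
                 peer's certificate (hypothetical representation).
    directory:   dict mapping IP/ID to the MAC currently registered for it,
                 as supplied by a DNS/Phone server.
    """
    registered_mac = directory.get(cert_fields["ip_id"])
    # An unknown IP/ID or a MAC mismatch means the certificate is stale.
    return registered_mac is not None and registered_mac == cert_fields["mac"]
```

A client would reject the call when this returns False, since a mismatch indicates the peer is presenting a signature issued for an earlier IP/ID assignment.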
RC6 Symmetric Key Cipher
Design Considerations:
-Simplicity
-Algorithm
-Implementation
-Analysis
-Performance
-Security
-Throughput
-Latency

RC6 Encryption Algorithm
RC6 Key Schedule
-Simple
-No known weak keys (P(weak key) = $10^{-1152}$)
-Again based on the RC5 key schedule (no known good attacks)
-Can't generate round keys concurrently with encryption (adds latency)

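For reference, the RC6 encryption rounds and key schedule can be sketched in Python. This follows the published RC6-32/20 description (32 bit words, 20 rounds) as a software illustration only; it is not the hardware design discussed in these slides, and it processes a single 16 byte block with no mode of operation.

```python
W, R = 32, 20                       # word size in bits, number of rounds
MASK = 0xFFFFFFFF
P32, Q32 = 0xB7E15163, 0x9E3779B9   # RC5/RC6 magic constants


def rol(x, n):
    n %= W
    return ((x << n) | (x >> (W - n))) & MASK


def ror(x, n):
    n %= W
    return ((x >> n) | (x << (W - n))) & MASK


def key_schedule(key):
    """Expand a byte-string key into the 2R+4 round keys."""
    c = max(1, (len(key) + 3) // 4)
    padded = key.ljust(4 * c, b"\0")
    L = [int.from_bytes(padded[4 * i:4 * i + 4], "little") for i in range(c)]
    S = [(P32 + i * Q32) & MASK for i in range(2 * R + 4)]
    A = B = i = j = 0
    for _ in range(3 * max(c, 2 * R + 4)):
        A = S[i] = rol((S[i] + A + B) & MASK, 3)
        B = L[j] = rol((L[j] + A + B) & MASK, A + B)
        i, j = (i + 1) % (2 * R + 4), (j + 1) % c
    return S


def encrypt_block(block, S):
    A, B, C, D = (int.from_bytes(block[4 * i:4 * i + 4], "little") for i in range(4))
    B, D = (B + S[0]) & MASK, (D + S[1]) & MASK
    for i in range(1, R + 1):
        t = rol((B * (2 * B + 1)) & MASK, 5)
        u = rol((D * (2 * D + 1)) & MASK, 5)
        A = (rol(A ^ t, u) + S[2 * i]) & MASK
        C = (rol(C ^ u, t) + S[2 * i + 1]) & MASK
        A, B, C, D = B, C, D, A
    A, C = (A + S[2 * R + 2]) & MASK, (C + S[2 * R + 3]) & MASK
    return b"".join(x.to_bytes(4, "little") for x in (A, B, C, D))


def decrypt_block(block, S):
    A, B, C, D = (int.from_bytes(block[4 * i:4 * i + 4], "little") for i in range(4))
    C, A = (C - S[2 * R + 3]) & MASK, (A - S[2 * R + 2]) & MASK
    for i in range(R, 0, -1):
        A, B, C, D = D, A, B, C
        u = rol((D * (2 * D + 1)) & MASK, 5)
        t = rol((B * (2 * B + 1)) & MASK, 5)
        C = ror((C - S[2 * i + 1]) & MASK, t) ^ u
        A = ror((A - S[2 * i]) & MASK, u) ^ t
    D, B = (D - S[1]) & MASK, (B - S[0]) & MASK
    return b"".join(x.to_bytes(4, "little") for x in (A, B, C, D))
```

Note that `encrypt_block` needs the complete round key array before the first round starts, which is the latency cost named in the key schedule bullets.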
RC6 Security
-RC6 design is based on RC5 with slight modifications that hold off possible attacks
-Linear and differential attacks are ineffective against the 20 round version
-Best way to crack RC6 is an exhaustive key search
-Best known attack on RC5 allows 12 rounds to be cracked in the long term

RC6 Symmetric Key Cipher
RC6 Simplicity:
-Smallest code memory requirement for Smart Cards (does require a small amount of RAM for data)
-Extremely easy to code in software
Performance Results:
-Fastest remaining AES candidate on both Intel and Alpha CPUs
-Excellent performance with ASIC design
-Good performance on 8 bit CPUs and FPGAs (3rd)
Security:
-Excellent (lots of work on RC5)
-Easy to do a thorough analysis of how secure it is
Future Expandability:
-The algorithm can be easily modified to a 256 bit block cipher that uses 64 bit words in encryption/decryption

Block Overview of Chip
Random Bit Generator
Modular Exponentiation: $P = X^E \bmod N$
-reduce the exponentiation into up to $2 n_e$ 2048 bit modular multiplications
-reduce each modular multiplication into n/u (2560/32) discrete operations with carries
Modular Exponentiation:
$P_0 = 1,\ Z_0 = X$
for i = 0 to n-1 do
  $Z_{i+1} = Z_i^2 \bmod N$
  if $e_i = 1$ then $P_{i+1} = P_i \cdot Z_i \bmod N$
Montgomery's Algorithm (Modular Multiplication: $A \cdot B \bmod N$):
$R_0 = 0$
for i = 0 to n+2
  $q_i = R_i(0)$
  $R_{i+1} = (R_i + a_i B + q_i N)/2$
Where:
-$Z_i$ is a 2048 bit partial product
-$P_i$ is a 2048 bit partial product
Storage: 2*Z + 2*P + N + E = 12288 bits
Work: Avg: $1.5 n_e$ modular multiplies
Example of the partial products $a_i B$:
    B = 0101 1101 1011
  * A = 1011 0110 0110
        0000 0000 0000
  +    0101 1101 1011
  +   0101 1101 1011
  etc...

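Both loops can be sketched in Python. This is a software illustration of the textbook forms, not the hardware design: the square-and-multiply loop follows the recurrence above directly, while the Montgomery loop is the standard radix-2 version, in which the quotient bit is taken from the low bit of $R_i + a_i B$ so that the division by 2 is exact (the slide's hardware formulation may arrange this differently). The function names and the iteration count `k` are assumptions made for the example.

```python
def mod_exp(x, e, n_mod):
    """Right-to-left square and multiply: returns x**e mod n_mod."""
    p, z = 1, x % n_mod
    while e:
        if e & 1:                 # e_i = 1: fold the current square into P
            p = (p * z) % n_mod
        z = (z * z) % n_mod       # Z_{i+1} = Z_i^2 mod N
        e >>= 1
    return p


def mont_mul(a, b, n_mod, k):
    """Radix-2 Montgomery product: a * b * 2**(-k) mod n_mod.

    Requires n_mod odd, a < 2**k and b < n_mod.
    """
    r = 0
    for i in range(k):
        r += ((a >> i) & 1) * b   # add a_i * B
        q = r & 1                 # quotient bit that makes r + q*N even
        r = (r + q * n_mod) >> 1  # exact division by 2
    return r - n_mod if r >= n_mod else r
```

Each loop iteration of `mont_mul` uses only additions and a one bit shift, which is what lets the hardware split a wide multiplication into u bit slices with carries between them.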
Proposed Architecture: Thomas Blum
$P = X^E \bmod N$
$P_0 = 1,\ Z_0 = X$
for i = 0 to n-1 do
  $Z_{i+1} = Z_i^2 \bmod N$
  if $e_i = 1$ then $P_{i+1} = P_i \cdot Z_i \bmod N$

Proposed Architecture: Thomas Blum
A systolic array for 2048/2560 bit modular multiplication.
n = 2560, u = 32 bit: 80 units
$R_{i+1} = (R_i + a_i B + q_i N)/2$

Proposed Architecture:
Architecture of a single cell to compute u bits:
$R_{i+1} = (R_i + a_i B + q_i N)/2$

RSA Hardware Estimates and Specifications
Memory: 197 bits/unit * 80 + 4 * 2048 = 23952 bits
Area: 2.9 x 2.9 mm$^2$ (12 um * 20 um * 23952 * 1.5)
-would need more for IO pads
Encode/Verify Time: 2 ms, using $F_4 = 2^{16}+1$
Decode Time: 160-180 ms: u = 32, n = 2560; $(2560/512)^2 \cdot t_{512}$
Power: 0.25 * 197 * 80 * 3 uW/MHz * 35 MHz = 413 mW @ 3.6 V
--> Draws 115 mA avg, 700 mA peak

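The arithmetic behind these estimates can be reproduced with a few lines of Python. The numbers are taken from the slide; the readings of the individual factors (12 um x 20 um as the area per stored bit, 1.5 as a layout overhead factor, 0.25 as an activity factor) are assumptions, since the slide does not label them.

```python
import math

UNITS = 80            # systolic array cells (2560 / 32)
BITS_PER_UNIT = 197   # storage per cell, from the slide
OPERAND_BITS = 2048

# Memory: cell storage plus four 2048 bit operands.
memory_bits = BITS_PER_UNIT * UNITS + 4 * OPERAND_BITS

# Area: assumed 12 um x 20 um per bit and a 1.5 overhead factor.
area_mm2 = 12 * 20 * memory_bits * 1.5 / 1e6
side_mm = math.sqrt(area_mm2)

# Power: assumed 0.25 activity factor, 3 uW/MHz per bit, 35 MHz clock.
power_mw = 0.25 * BITS_PER_UNIT * UNITS * 3 * 35 / 1000
current_ma = power_mw / 3.6

# Decode time scales with the square of the operand length ratio.
decode_scale = (2560 / 512) ** 2

print(memory_bits, round(side_mm, 1), round(power_mw, 1), round(current_ma))
```

With a scale factor of 25, the quoted 160-180 ms decode time corresponds to a 512 bit reference time $t_{512}$ of roughly 6.4 to 7.2 ms.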
RC6 Hardware
Design Goals:
-Faster than software implementation
-Low power
-Small die area
RC6 Hardware Requirements:
-Multiplier(?) $B(2B+1) \Rightarrow 2B^2 + B$
-Variable bit shifter
-Adders, registers, and XOR blocks
-0.35 um or 0.25 um process technology

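The rewrite of $B(2B+1)$ as $2B^2 + B$ means a squarer, a one bit shift and an adder can stand in for a general multiplier. A minimal Python check of that equivalence on 32 bit words, with hypothetical function names:

```python
MASK32 = 0xFFFFFFFF


def f_multiplier(b):
    """RC6 round function input computed with a general multiply."""
    return (b * (2 * b + 1)) & MASK32


def f_squarer(b):
    """Same value from a squarer, a left shift by one, and an add."""
    square = (b * b) & MASK32          # only the low 32 bits are needed
    return ((square << 1) + b) & MASK32
```

Because everything is reduced modulo $2^{32}$, the squarer only has to produce the low 32 bits of $B^2$, which keeps its area well below that of a full 32 x 32 multiplier.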
Squaring Hardware
RC6 Hardware
RC6 Hardware Summary
Estimated Performance:
- >100 Mbps throughput @ 20 MHz
-Easily increased with pipelining or loop unrolling
- <50 us latency on first block
Estimated Power Requirements:
-RC6 cipher @ 20 MHz: 24 mW + I/O power
Estimated Die Area:
-RC6 cipher: 1 mm$^2$
-RC6 key scheduler and key storage: 1 mm$^2$

Conclusion
Performance:
-Call setup time < 200 ms
-RC6 throughput > 100 Mbps
Power Requirements:
-RC6 cipher 24 mW
Die Area:
-RSA + RC6 cipher 3.3 x 3.3 mm
Security:
-2048 bit RSA key exchange
-User authentication
-128 bit RC6 AES candidate
Work in progress:
-VHDL encoding, simulation and synthesis
-Interface specifications
Thank you for your attention.