UT840 LEON Quad Core First Silicon Results
Cobham Semiconductor Solutions
Presented at Aerospace Corporation MRQW, January 2015
Presenter: Rob Ciccariello
Thank you to our sponsors and supporters
Introduction
- Cobham Semiconductor Solutions (Cobham), formerly Aeroflex Microelectronics HiRel, developed the UT840 quad-core processor as a proof of concept for multiple new technologies:
  - Gaisler LEON 4FT SoC IP
  - IBM 9SF 90nm process technology
  - Cobham UT90nHBD library and ASIC design flow
  - Flip-chip package technology for space applications
- Cobham has completed electrical characterization, focusing on benchmark tests
- Data shows that all new technology implementations were successful and that the UT840 meets its performance goals
Background: Comparison
- UT840 is the third-generation fault-tolerant LEON processor from Cobham Semiconductor Solutions
- UT840 represents a significant improvement in performance and capability over previous designs

LEON Family Key Feature Comparison

| Feature | UT699 (LEON 3FT) | UT700 (LEON 3FT) | UT840 (LEON 4FT) |
|---|---|---|---|
| IEEE-1754 SPARC compliant cores | 1 | 1 | 4 |
| Process technology | 250nm TSMC | 130nm TSMC | 90nm IBM |
| Operating voltage (V) | 2.5/3.3 | 1.2/3.3 | 1.0/2.5 |
| L1 cache, data/instruction (KB) | 8/8 | 16/16 | 16/16 per core |
| L2 cache (KB) | NA | NA | 256 |
| IEEE 754 FPU | 1 | 1 | 1/core |
| Max clock frequency (MHz) | 66 | 166 | 266 |
| Supported interfaces | UART, 10/100 Ethernet, CAN, PCI, SpW | UART, SPI, 10/100 Ethernet, CAN, PCI, SpW, 1553 | UART, SPI, 10/100/1000 Ethernet, CAN, PCI, SpW, 1553 |
| Package technology | 1mil Al wire bond | 1mil Al wire bond | Flip chip |
Background: Block Diagram
- Seven-stage pipelined, monolithic, high-performance, fault-tolerant SPARC™ V8/LEON 4FT quad-core processor
- 256KB L2 cache
- SSRAM support
- 1000 Mbit/s Ethernet
- 1553 interface
Background: RadHard Design
UT840 designed using the UT90nHBD ASIC RadHard flow
1: Library Characterization
  - Prompt Dose/SEL
  - TID
2: UT840 Chip Level Analysis
  - Adjust drive strengths
  - SEE/SET filter additions
  - Clock/Reset tree hardening

| Error rate (errors/device-day) | Requirement | FF SEU | FF SET | SRAM | Reset SET | Clock SET | PLL | Total |
|---|---|---|---|---|---|---|---|---|
| Recoverable | 7.0E-04 | 3.2E-05 | 3.1E-05 | 1.9E-05 | 1.9E-07 | 3.4E-07 | 1.6E-05 | 9.8E-05 |
| Data integrity | 0.25 | 9.6E-05 | 9.2E-05 | 1.7E-04 | 1.9E-07 | 3.4E-07 | 1.6E-05 | 3.7E-04 |

| Design statistic | Value |
|---|---|
| SRAM bits | 4,598,272 |
| PLLs | 3 |
| Latches & FFs | 155,045 |
| DICE FFs | 126,884 |
| DICE latches | 28,161 |
| MBIST FFs | - |
| DICE FFs or latches w/o data filter | 41,100 |
| Gates | 1,157,450 |
| Gates/FF | 7.47 |
| Reset SET rate | 1.93E-07 |
| Clock SET rate | 3.44E-07 |
| Composite SRAM bit error rate | 2.06E-10 |
| Raw SRAM bit error rate | 1.84E-06 |
| FF SET bit error rate (44ps) | 1.48E-08 |
| FF SET bit error rate (280ps) | 5.83E-14 |
| FF SEU bit error rate | 4.56E-09 |

| Element | Control | Data | System Derating* | Utilization Factor** |
|---|---|---|---|---|
| SRAM | 10% | 90% | 20% | 100% |
| Logic | 25% | 75% | 20% | 100% |

| Clock RE derating: clock frequency (MHz) | Control derating | System derating |
|---|---|---|
| 266 | 25% | 50% |

| Estimate | Value |
|---|---|
| Intel estimate | 10% |
| IBM Server estimate | 40% |
| SDC estimate | 50% |

* System derating accounts for bits that are not accessed after upset or do not propagate to a detection point.
** In this case the utilization factor is 100%.
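
The Total column of the error-rate figures on this slide can be cross-checked by summing the per-source contributions and comparing the result with the requirement. The following is a minimal Python sketch using only the values printed on the slide. The names (`RECOVERABLE`, `DATA_INTEGRITY`, `budget`) are illustrative and not from the original, and because the slide values are rounded to two significant figures the sums agree with the printed totals only approximately.

```python
# Per-source error rates in errors/device-day, copied from the slide.
RECOVERABLE = {
    "FF SEU": 3.2e-05, "FF SET": 3.1e-05, "SRAM": 1.9e-05,
    "Reset SET": 1.9e-07, "Clock SET": 3.4e-07, "PLL": 1.6e-05,
}
DATA_INTEGRITY = {
    "FF SEU": 9.6e-05, "FF SET": 9.2e-05, "SRAM": 1.7e-04,
    "Reset SET": 1.9e-07, "Clock SET": 3.4e-07, "PLL": 1.6e-05,
}
# Requirement column from the same table.
REQUIREMENT = {"recoverable": 7.0e-04, "data_integrity": 0.25}


def budget(contributions, requirement):
    """Return (summed rate, requirement / summed rate).

    The second value is the margin factor: how many times the summed
    rate fits inside the requirement. Illustrative helper only.
    """
    total = sum(contributions.values())
    return total, requirement / total


if __name__ == "__main__":
    rows = (
        ("recoverable", RECOVERABLE),
        ("data_integrity", DATA_INTEGRITY),
    )
    for name, contributions in rows:
        total, margin = budget(contributions, REQUIREMENT[name])
        print(f"{name}: total={total:.2e} errors/device-day, margin={margin:.1f}x")
```

Both summed rates land within rounding of the printed totals (9.8E-05 and 3.7E-04) and sit below their respective requirements.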
Background: Device Statistics
- Die size: 135 mm² (11.6 mm x 11.6 mm)
- Macros: 220 placements
- Package pins: 445 signals; 229 power/ground; 55 spare (2083 die bumps)
- Gate count: 4.4M logic gates, 5Mbit memory (~20M transistors)
- SysClk: 300MHz (nominal)
Single Core Comparison
- Dhrystone benchmark was run at maximum frequency on a single core and compared to maximum-frequency results from the UT699 and UT700
- UT840 shows an order-of-magnitude improvement in W/MHz over the UT699, which indicates the impact of process technology
- UT840 maximum frequency exceeded simulation (300 vs. 266 MHz)
Test conditions: performed using BCC, with UT699 @ 100MHz, UT700 @ 220MHz, and UT840 @ 300MHz. All at 25°C. Unused cores clock gated.
Power Tests
- Custom tests were run to gauge UT840 power consumption with multiple cores active
- ~150mA/core at 100MHz
- ~400mA/core at 275MHz
- Static current ~250mA
Test conditions: performed using the Aeroflex-developed CPU gate test. During this test all interfaces except ETH0 are clock gated. Core at 1.0V, 25°C.
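
The per-core and static currents above can be combined into a rough core-supply power estimate. The Python sketch below rests on two assumptions that are not stated on the slide: that the per-core figures are increments on top of the static current, and that per-core current varies linearly with frequency between the two measured points. The function names are illustrative.

```python
STATIC_MA = 250.0                        # static current from the slide, mA
PER_CORE_MA = {100: 150.0, 275: 400.0}   # MHz -> mA per active core (slide values)
CORE_VOLTAGE = 1.0                       # core supply during the test, V


def per_core_current_ma(freq_mhz):
    """Per-core current at freq_mhz.

    Assumption: straight-line interpolation (or extrapolation) through
    the two measured points; the slide gives no model for this.
    """
    (f0, i0), (f1, i1) = sorted(PER_CORE_MA.items())
    slope = (i1 - i0) / (f1 - f0)
    return i0 + slope * (freq_mhz - f0)


def core_power_w(freq_mhz, active_cores):
    """Estimated core-supply power in watts.

    Assumption: total current = static + active_cores * per-core current.
    """
    total_ma = STATIC_MA + active_cores * per_core_current_ma(freq_mhz)
    return total_ma / 1000.0 * CORE_VOLTAGE


if __name__ == "__main__":
    for cores in range(1, 5):
        print(f"{cores} core(s) at 275 MHz: {core_power_w(275, cores):.2f} W")
```

Under these assumptions, four active cores at 275MHz draw about 1.85A from the 1.0V core supply, or roughly 1.85W. The estimate covers the core rail only and says nothing about I/O power.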
Quad Core Efficiency
- Dhrystone benchmark run with 1 to 4 cores enabled, under the Linux operating system at 250MHz, 25°C
- Multicore efficiency is ~85% with 4 CPUs active
Test conditions: Linux operating system, using L2 cache only. No external SDRAM.
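
The slide does not define multicore efficiency. A common definition, assumed here, is aggregate throughput divided by the number of cores times the single-core throughput. The Python sketch below applies that definition; the function names are illustrative and the scores in the usage section are hypothetical placeholders, not measured UT840 results.

```python
def multicore_efficiency(aggregate_score, single_core_score, n_cores):
    """Fraction of ideal linear scaling achieved with n_cores active.

    Assumed definition: aggregate / (n_cores * single-core score).
    """
    if n_cores < 1 or single_core_score <= 0:
        raise ValueError("need at least one core and a positive baseline score")
    return aggregate_score / (n_cores * single_core_score)


def effective_speedup(efficiency, n_cores):
    """Speedup over a single core implied by a given efficiency."""
    return efficiency * n_cores


if __name__ == "__main__":
    # Hypothetical scores chosen only to illustrate the arithmetic.
    single = 100.0
    aggregate_four = 340.0
    eff = multicore_efficiency(aggregate_four, single, 4)
    print(f"efficiency = {eff:.0%}, speedup = {effective_speedup(eff, 4):.1f}x")
```

Under this definition, the ~85% figure with four CPUs active corresponds to about 3.4 times the single-core throughput.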
SPEC CPU2000 Benchmarks
- A sample of Standard Performance Evaluation Corporation (SPEC) CPU2000 benchmarks was run to evaluate integer and floating point performance across multiple applications
- For multi-core operations, each core ran its own individual copy of the benchmark

Integer Benchmarks

| Name | Description |
|---|---|
| 175.vpr | FPGA Circuit Placement and Routing |
| 176.gcc | C Programming Language Compiler |
| 186.crafty | Game Playing: Chess |
| 197.parser | Word Processing (syntactic analysis) |
| 252.eon | Computer Visualization |
| 255.vortex | Object-oriented Database |

Floating Point Benchmarks

| Name | Description |
|---|---|
| 172.mgrid | Multi-grid Solver: 3D Potential Field |
| 177.mesa | 3-D Graphics Library |
| 179.art | Image Recognition |
| 183.equake | Seismic Wave Propagation Simulation |
| 187.facerec | Image Processing: Face Recognition |
| 188.ammp | Computational Chemistry |
| 189.lucas | Number Theory / Primality Testing |
| 200.sixtrack | Nuclear Physics Accelerator Design |
SPEC CPU2000 Integer Benchmarks
- Testing performed with a 100MHz system clock and a 50MHz memory clock at nominal conditions
- Reduced efficiency compared to Dhrystone testing is expected because CPU2000 programs are more I/O and memory intensive
SPEC CPU2000 Floating Point Benchmarks
- Testing performed with a 100MHz system clock and a 50MHz memory clock at nominal conditions
- Very high efficiency (>90% with four CPUs) on some benchmarks, driven by the FPU-per-core architecture
Voltage Temperature Dependence
- UT840 performance is strongly affected by core voltage and operating temperature
- Raising the core voltage to 1.1V increased maximum frequency by 27% over nominal at 25°C
- At 125°C, the temperature penalty outweighed this improvement at 1.1V
Summary
- A next-generation LEON 4FT processor has been developed and characterized
- UT840 Quad Core LEON 4FT is functional
- Device meets 300MHz clock frequency and 2.2W at nominal conditions
- Multicore efficiency is ~85% when all cores are active for the Dhrystone benchmark, and the device has demonstrated over 90% efficiency on some floating point benchmarks
Thank You!