2. Industry Verification Flow

Verification at ARM
Alan Hunter

Overview
- The focus will be on CPU cores
- ARM then and now
- How we think about DV
- DV history
- A side note on complexity
- So we just need to boot an OS, right?
- What a real project looks like, by the numbers
- Current directions and improvements
- Future directions
ARM history
- Founded in November 1990
  - With 12 people from Acorn Computers
  - And Robin Saxby as CEO
- First office was a barn outside Cambridge, UK
- Internal simulator (asim) + full-custom design flow

ARM1 Visual Simulator
- http://visual6502.org/sim/varm/armgl.html
Verification of Digital Systems, Spring 2017

ARM today
- More than 4,600 people worldwide
- About 30 offices worldwide
- Wide range of products, from CPUs, GPUs and video processors to system IP, physical IP and software
- Very large ecosystem
- Industry-standard simulators + industry-standard ASIC design flows

Some of our partners
- [Figure: partner logos]
How we think about design verification
- Verification is partitioned into different levels:
  - System-level: realistic system implementation
  - Top-level: full CPU(s) + memory system
  - Unit-level: interesting units in isolation (L2, LS, Core, IF ...)
  - Multi-unit: interesting multiple units together (LS/L2, MP tb)
- And different techniques:
  - Simulation
  - Static methods
  - Emulation/FPGA

DV history
- It all started with:
  - Architectural Validation Suite(s) (AVS): hand-written assembler tests exercising architectural features
  - Device Validation Suite(s) (DVS): hand-written assembler tests exercising micro-architectural and implementation features
- And that was it (for the most part)
- But we realized that top-level testing alone was insufficient (although it is very much required)
DV history
- That brought in the era of unit-level testbenches
  - Initially in Vera and e, but latterly in SystemVerilog (and a small amount of SystemC)
  - Mostly home-grown base-class library, but now moving to UVM
  - Plus functional coverage, portable checkers, assertions
- It was also a time of learning what makes a good testbench
  - Trading off testbench performance and maintainability vs. detail of checking
  - Portability of checkers and code for unit → multi-unit and unit → top-level

DV history
- Emulation and FPGA prototypes required for long single-thread execution
  - OS boots and stress testing
  - Custom bare-metal stress testing that takes many hours to run
- Rise of formal methods
  - Initially expert users only
  - But designer bring-up is rapidly broadening appeal
  - Adding process here allows broader adoption
DV history
- Processor DV methodologies: no single methodology covers the entire validation space
- [Figure: map of the validation space — unit-level and top-level simulation (directed, deterministic, random, coverage-driven test code and testbenches), assertions and formal property checking, functional and RTL code coverage generation, reference model and functional debugging, emulation and FPGA prototyping (OS booting/apps, benchmarks, real time)]

Diversion on complexity
- [Figure: Cortex-A15 L2 in the abstract — 10-50 cycle latencies, 12-entry FEQ, ACP and AC interfaces, 4 instruction types, 512-bit data, 4 ports, address paths]
By the numbers
- Simplified view of L2: reachable state space ~5x10^30
- Number of seconds since the Big Bang: ~5x10^17*
- The realistic reachable state space is actually much bigger

* According to http://www.physicsoftheuniverse.com/numbers.html

So booting an OS is stressful, right?
- [Figure: cumulative bug graph — cumulative bug count over the project, with the first Linux boot marked]
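The state-space numbers above can be made concrete with a back-of-the-envelope calculation (a sketch using only the two figures quoted on the slide):

```python
# Back-of-the-envelope from the slide's numbers: a simplified L2
# has ~5x10^30 reachable states, and only ~5x10^17 seconds have
# elapsed since the Big Bang.
state_space = 5 * 10**30            # simplified L2 reachable states
seconds_since_big_bang = 5 * 10**17

# Even visiting one state per second since the Big Bang would
# cover only a vanishingly small fraction of the space.
fraction_covered = seconds_since_big_bang / state_space
print(f"fraction covered: {fraction_covered:.0e}")   # 1e-13
```

In other words, exhaustive traversal is hopeless even for a simplified model, which is why the rest of the talk leans on layered testbenches, coverage, and formal methods rather than brute force.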
Cortex-A72 (Maia) overview
- Full implementation of the ARMv8 architecture
- AMBA 4 ACE or CHI master interface
- ECC and parity protection for all SRAMs
- Aggressive CPU and L2 power-reduction capability
- Support for the 4MB L2 feature
- Support for ACP port performance improvements

Unit-level testbenches
- IF testbench
  - Delivers up to 3 instructions per cycle to decode
  - Handles program breakpoints
  - Mispredict recovery
  - SystemVerilog testbench; serves as a BFM for dispatch, decode, branch resolution and L2 cache requests
- Core testbench
  - Includes ID, DS, IX and CX
  - SystemVerilog testbench with custom checkers and ISSCompare to check results
  - Stimulus is a randomly generated stream of instructions (does not have to be real)
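The Core testbench's randomly generated instruction stream can be sketched in miniature. This is not ARM's generator (the real one is SystemVerilog constrained-random stimulus targeting the actual ISA); the opcodes, fields, and weights below are invented purely for illustration:

```python
import random

# Toy constrained-random instruction stream generator, in the
# spirit of the Core testbench stimulus. Opcodes and weights are
# illustrative only; a real generator targets the real ISA encoding
# and biases toward interesting micro-architectural corner cases.
OPCODES = ["ADD", "SUB", "LDR", "STR", "B"]
WEIGHTS = [4, 4, 3, 3, 1]       # bias toward ALU and memory ops

def gen_stream(n, seed=None):
    rng = random.Random(seed)   # seeded for reproducible regressions
    stream = []
    for _ in range(n):
        op = rng.choices(OPCODES, weights=WEIGHTS)[0]
        # As the slide notes, the stream "does not have to be real":
        # operands are just random register indices here.
        stream.append((op, rng.randrange(32), rng.randrange(32)))
    return stream

print(gen_stream(3, seed=42))
```

Seeding matters in practice: a failing stream must be reproducible from its seed so the failure can be re-run and debugged.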
Unit-level testbenches (cont.)
- L2TLB/TBW testbench
  - The LPAE page-table format is quite complex, so a TB was created to test it in isolation (although it is really part of L2)
  - SystemVerilog random page-table generator (including recursive page tables)
- LS testbench
  - One of the main (and most complex) testbenches (~100k LOC)
  - SystemVerilog testbench that drives DS and SCU request stimulus
  - Tests behaviors like flushes causing uop replay, error injection (ECC), ordering

Unit-level testbenches (cont.)
- L2 testbench
  - SystemVerilog testbench (complex, ~100k LOC)
  - Tests L2 prefetcher, ACP slave, CPU slave, SCU, TW, ACE/CHI interface
  - Plus support for async events like ECC, WFI, power-down
  - Large scoreboard to check actual against predicted; not strictly black-box
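The "scoreboard to check actual against predicted" idea is a standard verification pattern and can be sketched minimally. The class and transaction shapes below are hypothetical, not ARM's code; the real L2 scoreboard is SystemVerilog and far larger:

```python
from collections import deque

# Minimal sketch of the scoreboard pattern used by the L2 testbench:
# a predictor model pushes expected transactions, a monitor pushes
# actual DUT output, and the scoreboard compares them in order.
# Names and transaction format are illustrative only.
class Scoreboard:
    def __init__(self):
        self.expected = deque()
        self.mismatches = []

    def predict(self, txn):
        # called by the reference/predictor model
        self.expected.append(txn)

    def observe(self, txn):
        # called by the monitor watching the DUT's outputs
        if not self.expected:
            self.mismatches.append(("unexpected", txn))
        else:
            exp = self.expected.popleft()
            if exp != txn:
                self.mismatches.append((exp, txn))

    def passed(self):
        # clean run: everything predicted was seen, nothing extra
        return not self.mismatches and not self.expected

sb = Scoreboard()
sb.predict(("read", 0x1000, 0xAB))
sb.observe(("read", 0x1000, 0xAB))
print(sb.passed())   # True
```

The "not strictly black-box" remark on the slide means the predictor peeks at internal design state to form its predictions, trading purity for checking precision.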
Multi-unit level testbenches
- LS/L2
  - Combination of the LS and L2 testbenches
  - Ability to include real fabrics to test multi-cluster and big.LITTLE configurations
  - Barrier behavior and memory ordering a key focus, plus async events (like incoming snoops on AC)
  - Multi-cluster sharing and data-coherency checking, extreme cache-migration scenarios, and many more

Top-level testing
- Vehicle for:
  - Checking architectural compliance
  - Running random instruction sequence generators
  - Power-aware simulation with a UPF file
- Broader static configuration space of ~70 different configurations
- Dynamic configurations, e.g. clock ratios, page-table attributes, chicken bits
- Dynamic irritation, e.g. random bus traffic, random events (WFI, FIQ, IRQ ...)
System-level testing
- Hardware emulation
  - For OS boots, bare-metal testing, emulator-optimized RIS
- FPGA
  - For faster execution, but poorer debug, so lots of long random testing to pull out any potential system issues
  - Memory-ordering litmus testing (very long running): ~2 weeks @ 10MHz

Formal verification
- Targeted at very specific areas:
  - Sequential equivalence check between 2 64-bit FP multiplier implementations
  - C-to-RTL compare between specific math functions
  - Sequential equivalence between clock-gated and non-clock-gated versions
- This was a very quick project (9 months start to finish), so very limited scope
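The equivalence checks above are done with formal tools; as a toy illustration of the intent only, the sketch below compares two different multiplier implementations (an operator and a shift-and-add loop, both invented for this example) exhaustively over a 4-bit operand space. Real tools prove equivalence symbolically (SAT/BDDs) rather than by enumeration, which is why they scale to 64-bit FP designs where enumeration cannot:

```python
# Two independent implementations of the same function...
def mul_ref(a, b):
    return a * b

def mul_shift_add(a, b, width=4):
    # classic shift-and-add multiplier structure
    acc = 0
    for i in range(width):
        if (b >> i) & 1:
            acc += a << i
    return acc

# ...checked for equivalence over the whole (tiny) input space.
def equivalent(width=4):
    return all(mul_ref(a, b) == mul_shift_add(a, b, width)
               for a in range(1 << width)
               for b in range(1 << width))

print(equivalent())   # True
```

A formal equivalence checker gives the same yes/no answer, plus a concrete counterexample input when the answer is no, without ever enumerating 2^128 operand pairs.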
Regressions

Unit level:
  Stimulus    Cycles run
  Core        1560 B
  IF          2880 B
  LS          2146 B
  L2          2975 B
  LS/L2       2006 B
  GIC         83.7 B
  ETM         128 B

Top level:
  Stimulus    Cycles run / No. tests
  Raven       1260 B cycles
  MemRIS      1150 B cycles
  AVS         148,268 tests
  DVS         3,037 tests

System validation

Emulation:
  Stimulus    Cycles
  ACE         2800 B
  CHI         2789 B

FPGA:
  Stimulus    Cycles
  HWRIS       29.3 T
  AEGIS       169 T
Team sizes
- A fairly small team:
  - Around 20 designers working on RTL development
  - Around 30 DV engineers here in Austin
  - Around 10 top-level and system-level verification engineers in BLR

Current directions and improvements
- Broad move to UVM-based testbenches
- Enhanced system-level verification
- Statistical coverage
- Enhanced formal verification
Broad move to UVM
- New projects have been moving to UVM when they can
  - But it takes a long time to deprecate old testbenches
- We had a chance for a clean slate on some of the current projects
  - So there have been some significant re-writes (not without issues)
  - Some ground-up development (using more modern software development ideals)
- Also taking the time to re-evaluate current testbench methodology
  - So we don't lose anything in the switch-over
  - So we can build more maintainable testbenches for the future
  - Optimize testbench performance out of the gate, but without optimizing too early

Enhanced system-level verification
- More configurations of our IP in different system settings
  - Try to find more bugs earlier in the process
- Additional features to aid silicon/FPGA debug
- Always adding stimulus to try to further stress integration
  - Tie back to unit testbenches to further improve stimulus
- Additional bare-metal stimulus and checking to target hard-to-hit areas
  - Areas like memory barriers and relaxed memory semantics
Statistical coverage
- Finding relationships between disparate data
  - Correlations may exist beyond a single simulation
  - Independent of pass/fail status
- Fairness
- Arbitration
- Performance

Enhanced formal verification
- Formal property verification
  - Formal testbench development (tracking code, end-to-end checkers, interface constraints, and embedded assertions)
  - Formal coverage collection and fine-tuning of the coverage model
- Automated deadlock-detection flows (different from the deadlock-detection app)
  - The idea is that whatever this flow finds is definitely a deadlock, but it will not necessarily find all possible deadlocks
- X propagation and X checking
- Sequential equivalence checking (against RTL, or a C/SystemC model)
- Bug-hunting flows, as opposed to proofs, for all of the above
  - Scales well with design size (as opposed to proofs)
  - Is not exhaustive
- Forward-progress checking through custom forward-progress checks
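The statistical-coverage slide above can be sketched concretely: mine per-simulation metrics across a whole regression for correlations that single-run functional coverage misses, regardless of pass/fail. The metric names and numbers below are invented for the example; only the technique (cross-run correlation) comes from the slide:

```python
import statistics

# Hypothetical per-run metrics harvested from a regression:
# (arbiter grants to port 0, average request latency). Values are
# illustrative only.
runs = [
    (120, 41), (95, 55), (130, 37), (80, 63), (110, 45),
]

def pearson(xs, ys):
    # Pearson correlation coefficient, computed from scratch.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

grants, latency = zip(*runs)
r = pearson(grants, latency)
# A strong negative correlation here could flag an arbitration
# fairness issue: runs where port 0 is starved see latency blow up.
print(f"r = {r:.2f}")
```

Note the check is independent of pass/fail status, exactly as the slide says: every run here "passed", yet the aggregate still exposes a fairness/performance relationship worth investigating.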
Future directions
- Looking at machine-learning techniques for testbench coverage closure
- Pushing all verification results into a data lake for future offline analysis
- Enabling ISA formal verification
  - Extracting formal ISA properties directly from ARM ARM pseudocode
  - Using those properties to bug-hunt with formal tools

Conclusions
- We have a lot of history to deal with
  - So it can sometimes be slower than we would like to move to new technologies
  - But there has to be a balance of the old and the new
- The goal of any new technique is to bring quality and efficiency to the verification process
- To do that, we have to measure everything