Effective System Design with ARM System IP
Mentor Technical Forum 2009
Serge Poublan, Product Marketing Manager, ARM
Higher level of integration
[Figure: handset feature callouts: WiFi, Platform OS, Graphic, 13 days standby, Bluetooth, MP3, Camera, Flash 9, 128 MB DDR, H.264, Skype]
Processors are evolving, e.g. MP
- World-class, market-proven technology
- 20+ processors for every application
- 200+ silicon partners
- 500+ licenses
- 15 billion units shipped
[Roadmap chart across the ARMv4, ARMv5, ARMv6 and ARMv7 (Cortex) architectures. Processors shown: ARM7TDMI(S), ARM920T, ARM922T, SC100, ARM7EJ-S, ARM926EJ-S, ARM946E-S, ARM966E-S, ARM968E-S, ARM1026EJ-S, SC200, ARM1136J(F)-S, ARM1156T2(F)-S, ARM1176JZ(F)-S, ARM11 MPCore (x1-4), Cortex-A8, Cortex-A9 (x1-4), Cortex-R4, Cortex-R4F, Cortex-M3, Cortex-M1, Cortex-M0, SC300]
ARM Mali GPU - Scalable Performance to over 1G Pixel/s
[Chart: visual complexity against screen resolution for Mali-55, Mali-200 and Mali-400 MP. Use cases shown: Web Browsing, Flash Lite, Java Gaming, Next Generation Navigation, Mobile Gaming, 3D Navigation, Flash 10, TV HD UI, Video Post Processing, HD 3D Gaming, Console 3D Gaming, 2D/3D Presentations, HD Video Post Processing]
Higher Mobile Device Resolution
Requirements of next generation Mobile platform
- Increasing bandwidth requirements simply to refresh the display
- Ignoring Fill rate, Input Vertex Data and Texture bandwidth

Display mode                  Display Refresh Bandwidth (MB/s)
1080p60, 1920x1080, 60fps     475
1080p30, 1920x1080, 30fps     237
720p, 1280x720, 30fps         105
WVGA, 800x480, 30fps          44
VGA, 640x480, 30fps           35

[Chart: display resolutions over 2007 to 2013: QVGA 320x240, VGA 640x480, WVGA 800x480, WSVGA 1024x600, WXGA 1280x800, 1080p30 1920x1080, 1080p60 1920x1080]
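The listed bandwidth figures are consistent with a plain width x height x frame rate x bytes-per-pixel calculation. The sketch below is a minimal illustration, assuming 4 bytes per pixel and 1 MB = 2^20 bytes; neither assumption is stated on the slide, but together they reproduce the listed values after rounding. The function and table names are hypothetical.

```python
# Display refresh bandwidth: bytes read per second just to scan out the frame buffer.
# Assumptions (not stated on the slide): 4 bytes per pixel, 1 MB = 2**20 bytes.
BYTES_PER_PIXEL = 4
MB = 2 ** 20


def refresh_bandwidth_mb_s(width, height, fps, bytes_per_pixel=BYTES_PER_PIXEL):
    """Return the bandwidth in MB/s needed to refresh a display of the given size."""
    return width * height * bytes_per_pixel * fps / MB


# Display modes taken from the slide: name -> (width, height, frames per second).
MODES = {
    "1080p60": (1920, 1080, 60),
    "1080p30": (1920, 1080, 30),
    "720p": (1280, 720, 30),
    "WVGA": (800, 480, 30),
    "VGA": (640, 480, 30),
}

if __name__ == "__main__":
    for name, (width, height, fps) in MODES.items():
        bandwidth = refresh_bandwidth_mb_s(width, height, fps)
        print(f"{name:8s} {bandwidth:6.0f} MB/s")
```

Fill rate, vertex data and texture traffic come on top of this figure, as the slide notes.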
Example SoC Mobile Platform
[Block diagram. Blocks shown: CPU, L2 cache controller (L2CC), Media, Graphic, Video, DMA, LCD, AMBA Interconnect (labelled 64 or 128), Dynamic Memory Controller (SDRAM, LPDDR2), Static Memory Controller (NAND Flash), Interrupt Controller. Peripherals: UART0, UART1, SPI, WDT, Timer0, Timer1, RTC, GPIO. Annotations: Bandwidth requirement, Latency requirement]
Example SoC Mobile Platform
[Block diagram. Blocks shown: CPU, L2 cache controller (L2CC), Media, Graphic, Video, DMA, LCD, AMBA Interconnect (labelled 64 or 128), Dynamic Memory Controller (SDRAM, LPDDR2), Static Memory Controller (NAND Flash), Interrupt Controller. Label: Digital Highway]
ARM Design Flow for Digital Highway
Design Your Intelligent Digital Highway
- Configure and connect your RTL: AMBA Designer
- Verification & performance exploration in simulation: AVIP
- Improve your software: CoreSight
AMBA Ecosystem
The on-chip infrastructure is critical to system performance
- Increased focus on processor memory performance
- Different types of processors have different requirements
ARM has grown the AMBA architecture eco-system to help accelerate SoC design:
- 70+ Connected Community partners have AMBA compatible products
- 10+ AMBA specification downloads a day
"the de facto standard is of course the ARM bus architecture, AMBA." (Ron Wilson, EETimes)
Design to Minimise Latency
- Each path must be designed to minimise the inherent pipeline latency
- Round trip memory latency: Processor sub-system, AXI Interconnect, Dynamic Mem, DDR2 PHY, DDR2 SDRAM
  - Address format and arbitration
  - DDR2 SDRAM CAS latency
  - De-skew and capture
  - Data FIFO and bus interface
- Next generation AXI Interconnect halves the interconnect latency
- Masters which issue multiple AXI requests effectively hide latency
- PrimeCell Cache Controllers
  - Trade an increase in minimum latency for dramatically reduced average latency
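Two of the latency claims can be put into numbers with standard textbook relations: the average access time of a cache, and Little's law for a master that keeps several requests in flight. The sketch below is illustrative only; every cycle count and hit rate in it is a hypothetical value chosen for the example, not a figure from ARM.

```python
def average_latency(hit_rate, hit_cycles, miss_cycles):
    """Average access latency of a cache: hits are cheap, misses pay the full trip."""
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


def sustained_bytes_per_cycle(outstanding, bytes_per_request, round_trip_cycles):
    """Little's law: throughput of a master with `outstanding` requests in flight."""
    return outstanding * bytes_per_request / round_trip_cycles


# Hypothetical numbers for illustration only.
DRAM_ROUND_TRIP = 60     # cycles for an access that goes straight to DRAM
L2_HIT = 10              # cycles for an access that hits in the L2 cache
L2_LOOKUP_PENALTY = 8    # extra cycles a miss spends looking up the L2 first

if __name__ == "__main__":
    # The cache raises the worst case (60 -> 68 cycles) but lowers the average.
    with_l2 = average_latency(0.8, L2_HIT, DRAM_ROUND_TRIP + L2_LOOKUP_PENALTY)
    print(f"no L2: {DRAM_ROUND_TRIP} cycles, with L2 at 80% hits: {with_l2:.1f} cycles")

    # Issuing more requests does not shorten the round trip; it overlaps the waits.
    for outstanding in (1, 2, 4, 8):
        rate = sustained_bytes_per_cycle(outstanding, 32, DRAM_ROUND_TRIP)
        print(f"{outstanding} outstanding: {rate:.2f} bytes/cycle")
```

The second loop shows why a master with multiple outstanding requests "hides" latency: each request still takes the full round trip, but throughput scales with the number in flight until some other limit, such as memory bandwidth, is reached.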
Design to Maximise Throughput
Effective on-chip Quality of Service depends on the cooperation of the interconnect and memory controller
- Support for multiple outstanding requests
- The best use of memory pages by scanning the list of requests
- Controlling the order of queued transactions to
  - Meet maximum latency targets
  - Ensure throughput-dependent processors are well serviced
  - Provide low latency paths
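The reordering ideas above can be sketched as a small request picker. This is an illustrative policy only, not the arbitration scheme of any ARM memory controller: the `Request` fields, the priority order and the latency targets are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    master: str        # issuing master, e.g. "cpu" or "lcd" (hypothetical names)
    bank: int          # DRAM bank addressed by the request
    row: int           # DRAM row (page) addressed by the request
    arrival: int       # cycle at which the request was queued
    max_latency: int   # latency target in cycles (hypothetical QoS setting)


def pick_next(queue, open_rows, now):
    """Choose the next queued request to issue, or None if the queue is empty.

    Priority order (an illustrative policy):
    1. a request that has reached its latency target, earliest deadline first
    2. a request that hits an already open row, oldest first
    3. the oldest request in the queue
    """
    if not queue:
        return None
    urgent = [r for r in queue if now - r.arrival >= r.max_latency]
    if urgent:
        return min(urgent, key=lambda r: r.arrival + r.max_latency)
    hits = [r for r in queue if open_rows.get(r.bank) == r.row]
    if hits:
        return min(hits, key=lambda r: r.arrival)
    return min(queue, key=lambda r: r.arrival)


if __name__ == "__main__":
    queue = [
        Request("video", bank=0, row=5, arrival=0, max_latency=100),
        Request("cpu", bank=1, row=7, arrival=2, max_latency=20),
        Request("lcd", bank=0, row=9, arrival=4, max_latency=12),
    ]
    open_rows = {1: 7}  # bank 1 currently has row 7 open
    print(pick_next(queue, open_rows, now=10).master)  # open-row hit wins
    print(pick_next(queue, open_rows, now=16).master)  # latency target reached
```

Scanning the whole queue, rather than serving strictly in arrival order, is what lets the controller exploit open pages while still bounding the wait of latency-critical masters.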
ARM Level 2 Cache Controllers
[Block diagram. Blocks shown: CPU, L2 Cache, Media, Graphic, Video, DMA, LCD, AMBA Interconnect (labelled 64 or 128), Dynamic Memory Controller (SDRAM, LPDDR2), Static Memory Controller (NAND Flash), Interrupt Controller. Label: Digital Highway]
L2CC Increases Processor Performance
[Chart 1: MPEG4 Decode on ARM1136J-S, relative performance. No L2: baseline; 128K L2: +74%; 256K L2: +102%; 512K L2: +104%]
- Benchmark: MPEG4 decode
- System: ARM PrimeXsys Platform for ARM1136J-S
- CPU: 400MHz ARM1136J-S, 16K I & D caches
- Memory: 100MHz 32 bit SDRAM
- L2 cache: L210 128K unified L2 cache
[Chart 2: Web Page Render Time as a function of L2 Cache Size. Axes: L2 Cache Size (KB): 0, 128, 256, 512; Speed Up Compared to 0K L2: 0.0 to 4.0. Series: First Time, Subsequent]
- Benchmark: Linux + Mozilla (5 html pages from I-Bench looped 4 times)
- CPU: Cortex-A8 (speed, L1 cache), L2 part of Cortex-A8
- Results may vary for system configuration and web content
L2CC Increases System Performance
Reduced System Power Consumption
- An external memory access takes ~10x more energy than an on-chip access
- External memory accesses are reduced with an L2 cache
- Enables use of a lower-power and lower-cost memory sub-system
  - E.g. 16-bit instead of 32-bit external interface
  - Or LPDDR instead of DDR2
Reduced On-Chip traffic & contention
- Only cache misses are propagated to the interconnect
- Improves overall system performance
- Provides more bandwidth to other SoC components
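The power argument can be made concrete with a deliberately simple energy model. The sketch below takes the slide's approximate 10x ratio as given; the model itself (every access pays one on-chip unit for the L2 lookup, and misses additionally pay the external cost) and the hit rates are assumptions for illustration.

```python
# Approximate ratio from the slide: an external access costs ~10x an on-chip one.
EXTERNAL_TO_ON_CHIP = 10.0


def energy_without_l2(ratio=EXTERNAL_TO_ON_CHIP):
    """Energy per access, in on-chip units, when every access goes off-chip."""
    return ratio


def energy_with_l2(l2_hit_rate, ratio=EXTERNAL_TO_ON_CHIP):
    """Energy per access with an L2 cache (assumed model).

    Every access pays 1 unit for the on-chip L2 lookup; only the misses
    additionally pay the external access cost.
    """
    return 1.0 + (1.0 - l2_hit_rate) * ratio


if __name__ == "__main__":
    baseline = energy_without_l2()
    for hit_rate in (0.5, 0.75, 0.9):  # hypothetical hit rates
        energy = energy_with_l2(hit_rate)
        saving = 100.0 * (1.0 - energy / baseline)
        print(f"hit rate {hit_rate:.2f}: {energy:.2f} units, {saving:.0f}% saving")
```

Under this model the cache only pays off once the hit rate is high enough to outweigh the lookup cost, which is why the benefit depends on workload and cache size.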
ARM AMBA Interconnect
[Block diagram. Blocks shown: Cortex-A8, L2CC, Media, Graphic, Video, DMA, LCD, NIC-301 interconnect (labelled 64 or 128), Dynamic Memory Controller (SDRAM, LPDDR2), Static Memory Controller (NAND Flash), Interrupt Controller. Label: Digital Highway]
AMBA Interconnect (NIC-301)
- Low latency communication for ARM CPUs
- High bandwidth for ARM Graphics and Video
- Supporting:
  - AXI, AHB & APB
  - Data widths from 32- to 128-bit
- Supporting both synchronous & GALS implementations
- Quality of service
- Configurable through AMBA Designer
  - For minimum area & maximum frequency
Optimise your Interconnect Topology
- High connectivity and increasing numbers of IP cores do not scale with a single interconnect
- Use properties of the traffic to influence the topology
[Diagrams: a single-interconnect topology and a partitioned topology. Blocks shown: Cortex-A9, real-time masters, RAM, SMC, DMC, low bandwidth peripherals. Frequency labels: F and Fx2.5]
Topology Optimisation with ARM Interconnect
[Block diagram. Blocks shown: Cortex, L2CC, Neon, Graphic, Video, DMA, LCD (labelled 64 or 128), Dynamic Memory Controller (SDRAM, LPDDR2), Static Memory Controller (NAND Flash), Interrupt Controller. Interconnect labels: NIC-301 400MHz, Low Latency Interconnect, NIC-301 200MHz]
ARM Memory Controllers
[Block diagram. Blocks shown: Cortex, L2CC, Neon, Graphic, Video, DMA, LCD (labelled 64 or 128), Low Latency Interconnect, DMC-34x (SDRAM, LPDDR2), SMC-35x (NAND Flash), Interrupt Controller]
ARM Memory Controllers
- Synthesizable, configurable soft cores
- Wide range of memory types, silicon processes & target applications
- AXI Dynamic Memory Controllers for SDR, DDR, LPDDR, DDR2 and LPDDR2 (DMC-34x)
  - Over 20 licensees to date
- AXI Static Memory Controllers for NOR Flash, NAND Flash and SRAM (SMC-35x)
  - Over 40 licensees to date
- AHB Memory Controllers for Dynamic and Static Memories (PL24x)
  - Over 60 licensees to date
What is AMBA Designer?
Steps: Topology, Configure, Cross-configure, Stitch & Check
Interface checking on:
- Signal widths
- Signal direction
- Interface properties
- Valid response types
- Interleave depth
(Export as individual signals)
AVIP Features for RTL Simulation
Functional Verification
- For Verification Engineers, AVIP is a set of SystemVerilog modules that enable faster and higher quality verification of AXI based IP.
Performance Exploration
- For SoC architects, HW and Verification Engineers. AXI based SoC performance can be explored and verified.
[Diagram: IEEE 1800 SystemVerilog Testbench. Blocks shown: Directed Vectors, Prof. Data, AXI Master, User VIP, UUT (User Block or Sub-system) with AXI Slave Interface and AXI Master Interface, AXI Slave, AXI Monitor, User IP]
AVIP Features for RTL Simulation
Protocol Checkers
- OVL and SVA assertion libraries provided for AXI protocol checking.
AXI Protocol Coverage
- Channel level, transaction level and sequence level predefined coverage points for AXI protocol coverage.
[Diagram: IEEE 1800 SystemVerilog Testbench. Blocks shown: AXI Master, User VIP, UUT (User Block or Sub-system) with AXI Slave Interface and AXI Master Interface, AXI Slave, AXI Monitor, User IP]
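To give a feel for what a protocol checker enforces, the sketch below models one well-known AXI handshake rule in Python: once a source asserts VALID, it must keep VALID asserted and the payload stable until the handshake completes with READY. This is not AVIP code; the real checkers are OVL and SVA assertions, and the function name and trace format here are hypothetical.

```python
def check_valid_ready(trace):
    """Return a list of (cycle, message) violations for one AXI channel.

    `trace` holds one (valid, ready, payload) sample per clock cycle.
    Rule modelled: after VALID is asserted, VALID and the payload must be
    held until a cycle in which VALID and READY are both high.
    """
    violations = []
    waiting = False   # True if the previous cycle had VALID high and READY low
    held = None       # payload that must stay stable while waiting
    for cycle, (valid, ready, payload) in enumerate(trace):
        if waiting:
            if not valid:
                violations.append((cycle, "VALID deasserted before handshake"))
            elif payload != held:
                violations.append((cycle, "payload changed before handshake"))
        waiting = bool(valid and not ready)
        held = payload if waiting else None
    return violations


if __name__ == "__main__":
    good = [(1, 0, "A"), (1, 0, "A"), (1, 1, "A"), (0, 0, None)]
    bad = [(1, 0, "A"), (0, 0, None), (1, 0, "B"), (1, 1, "C")]
    print(check_valid_ready(good))
    print(check_valid_ready(bad))
```

An assertion library applies many rules of this kind on every channel during simulation, so that a violation is flagged at the cycle where it occurs rather than surfacing later as a data mismatch.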
AMBA Designer + AVIP: RTL Design Flow
To optimise interconnect and memory architecture ARM recommends the following flow:
- Configuration: Set the correct parameters and check the components
- Integration: Assemble the sub-system and statically check the design
- Simulation: Run test scenarios to check usage modes
- Analysis: Check results and loop back
[Diagram: Configuration, Integration, Simulation and Analysis arranged as a loop]
Fabric Design Tools: What is AVIP?
[Diagram: IEEE 1800 SystemVerilog Testbench. Blocks shown: AXI Master, User VIP, UUT (User Block or Sub-system) with AXI Slave Interface and AXI Master Interface, AXI Slave, AXI Monitor, User IP]
Fabric Design Tools: What is AVIP?
It enables System Exploration at RTL level
- TTT = Time to tweak = 20s
- TTS = Time to simulate = 5 mins
System Exploration Methods
- SoC, static: Spreadsheet Analysis
- Block-level, Internal bus, RTL simulation: RTL simulation, AVIP, User VIP, Industry standards VIP
- SoC, Real Stimulus, external I/F: Acceleration/Emulation, VIP, Logic Tiles, SW
- Real-time Behavior: Silicon/Applications
Iteration time vs Realism
[Chart: cycle time (LOW: mins/hrs, through days/wks, to HIGH: mths/yrs) against realistic behaviour (LOW to HIGH)]
- Spreadsheet, static analysis: mathematical formula, not dynamic
- AVIP, internal bus simulation: statistical or recorded traffic profiles
- SoC + s/w, emulation/proto: adding S/W, external I/F with realistic scenarios
- Silicon + Appl, CoreSight: observe actual behaviour
AVIP: the iteration time of a spreadsheet with the accuracy approaching RTL simulation
Improve the Performance of Your SoC
- Analyzing real silicon performance enables you to confidently improve the next design
  - If you want to find out how a car really performs, drive it
- CoreSight Design Kit & Performance Profiling
  - Provide accurate, real-time telemetry from your system
  - Essential tools for delivering system performance improvements
- Your SoC may be optimized, but is the software?
  - ARM Profiler analyzes system performance, enabling optimization via Profile Driven Compilation
CoreSight Debug & Trace
The Debug & Trace Architecture for the Digital World
- Open Standard available on www.arm.com
- Optimise software productivity on your multi-core SoC
  - SW Debug
  - SW Performance Optimisation
  - SoC Performance optimisation
- Visibility and trace of the whole SoC
  - ARM trace and performance sources (ETM, PTM, Interconnect)
  - Leverage CoreSight architecture for YOUR IP
ARM Digital Highway
ARM Digital Highway technology delivers to YOU
- Key Soft IP and Physical IP elements
- The de-facto communication standard
- Tools to analyze and optimize your system design before committing to silicon: AVIP
- Solution to debug and optimise once your silicon has been manufactured
- Faster time to revenue through reducing design effort and ensuring quality of results