Ncore Cache Interconnect Technology Overview
24 May 2016
Craig Forrest, Chief Technology Officer
David Kruckemyer, Chief Hardware Architect
Copyright 2016 Arteris
Contents
- About Arteris
- Caches, Cache Coherency and Challenges
- Introducing Ncore Cache Interconnect
- Summary
Arteris: The on-chip interconnect leader

Arteris Product Milestones
- Founded in 2003 to pioneer network-on-chip (NoC) interconnect
- NoC Solution: first released NoC implementation in 2005
- FlexNoC: second-generation Arteris NoC in 2009/2010
- FlexPSI: die-to-die or chip-to-chip parallel interface in 2013
- FlexNoC Resilience Package: functional safety option in 2014
- FlexNoC Physical: physically aware IP with FlexNoC Version 3 in 2015
- Ncore Cache Interconnect: heterogeneous cache coherency in 2016

Company
- Headquarters and engineering development in Campbell, USA
- Worldwide support offices (USA, France, China, Korea, India, Japan)

[Charts: awards and cumulative customer adoption, 2006-2016; customer data current as of 1 May 2016]
Arteris has become the standard for complex and low-power SoCs
- Customers shipped > 1B SoCs as of 2015
- 240 design starts, 146 tape-outs, and 108 chips produced (cumulative through 2016)

[Charts: cumulative design starts (2006-2016), tape-outs (2007-2016), and chips produced (2008-2016). Data is cumulative; design data is customer-reported and subject to change. Data is current as of 1 May 2016.]
Arteris Customers: Arteris technology is becoming a standard (current as of 1 May 2016)

Mobility
- Very large SoC maker

Automotive, IoT (Internet of Things), Camera & CE (Consumer Electronics)
- Major automotive OEM
- Major auto & CE SoC maker
- Toshiba
- Japan system OEM
- Automotive SoC maker
- Japan Tier 1 SoC maker
- Large drone maker

SSD (Solid State Drive), Networking & Automation
- Major SSD vendor
- Major SSD vendor
- Defense contractor
- Defense contractor
- Defense contractor
- Silicon foundry
- Major IP provider
Arteris interconnect IP now covers coherent and non-coherent use cases

[Diagram: example SoC built from Arteris interconnect IP products. Ncore Cache Interconnect connects the coherent CPU subsystem (A57/A53 clusters with L2 caches) and the GPU subsystem; FlexNoC Interconnect ties in the DSP (A/V), application, wireless (WiFi/GSM/LTE/LTE Adv.), security, and I/O peripheral subsystems, plus memory controllers and PHYs (Wide IO, LPDDR, DDR3, USB 2/3, PCIe, Ethernet, HDMI, MIPI); FlexWay Interconnect serves local IP subsystems; InterChip Links provide chip-to-chip connectivity.]
Modern SoC Design Challenges
- SCALABILITY: How to scale systems up as the number of coherent agents increases?
- HETEROGENEITY: How to integrate coherent processing elements that use different protocols, different semantics, or have different cache characteristics?
- SYSTEM INTEGRATION: How to integrate IP that is not cache coherent and achieve better performance?
- PHYSICAL DESIGN: How to create a cache coherent system that is easily placed on chip?
- POWER MANAGEMENT: How to optimize power consumption of complex systems?
Why Caches?
- Caches are small, fast memories tightly coupled to processing elements
- Reduced average memory latency means higher performance
  - Temporal locality
  - Spatial locality
- High bandwidth due to high frequency and wide interfaces
- Fewer off-chip DRAM accesses, resulting in lower power consumption
Why Cache Coherency?
- Caches create multiple copies of data, and managing these copies in software is difficult
- Hardware cache coherency creates the illusion of a flat, shared memory
  - Caches are invisible to software
  - Multiple copies are kept consistent
- But managing copies in hardware requires a lot of communication
  - Every place that may hold a valid copy must be checked → snoop filters reduce communication by tracking cache contents
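The filtering idea above can be sketched in a few lines. This is an illustrative model, not Arteris code: a directory-style snoop filter remembers which caches may hold each line, so a read only snoops caches that can actually have a copy instead of broadcasting to all of them.

```python
# Illustrative sketch (not Ncore's implementation): a snoop filter that tracks
# possible sharers per cache line to avoid snooping every cache on every access.

class SnoopFilter:
    def __init__(self):
        self.sharers = {}  # line address -> set of cache IDs that may hold a copy

    def record_fill(self, addr, cache_id):
        """A cache fetched the line; remember it as a possible sharer."""
        self.sharers.setdefault(addr, set()).add(cache_id)

    def record_evict(self, addr, cache_id):
        """A cache dropped the line; stop snooping it for this address."""
        holders = self.sharers.get(addr, set())
        holders.discard(cache_id)
        if not holders:
            self.sharers.pop(addr, None)

    def caches_to_snoop(self, addr, requester):
        """Only caches the filter lists need a snoop; all others are skipped."""
        return self.sharers.get(addr, set()) - {requester}

sf = SnoopFilter()
sf.record_fill(0x1000, cache_id=0)
sf.record_fill(0x1000, cache_id=2)
print(sf.caches_to_snoop(0x1000, requester=2))  # {0}: snoop one cache, not all
print(sf.caches_to_snoop(0x2000, requester=1))  # set(): untracked line, no snoops
```

The design trade-off is precision versus area: the filter may over-report sharers (a listed cache might have silently dropped the line), which costs an unnecessary snoop but never correctness.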
Ncore Cache Interconnect IP

[Diagram: Ncore Cache Interconnect connecting coherent agents (CPU cluster and GPU, each with a cache), non-coherent agents (image processing, display processing, subsystems, peripherals), and memory agents (DRAM, SRAM).]
Ncore Interconnect Architecture

[Diagram: coherent agents with caches attach directly; a directory with snoop filters sits at the center; non-coherent agents and a subsystem attach through proxy caches and bridges over CCTI links.]
Read Example: Cache Hit

[Diagram: (1) a consumer agent issues a coherent read; (2) the directory's snoop filter lookup identifies a producer cache holding the line; (3) the producer's cache supplies the data.]
Read Example: Cache Misses

[Diagram: (1) a consumer agent issues a coherent read; (2) the directory lookup finds no cached copy; (3)-(4) the request is forwarded through the bridge to memory, which supplies the data.]
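The two read flows above differ only in where the directory lookup sends the request. The sketch below is an illustrative model (not Ncore code): `directory`, `caches`, and `memory` are simple dictionaries standing in for the hardware structures, and the function returns the data plus where it came from.

```python
# Illustrative flow model of a coherent read: a directory (snoop filter) lookup
# decides whether data is supplied by a peer cache (hit) or fetched from
# memory behind the bridge (miss).

def coherent_read(addr, directory, caches, memory):
    """Return (data, source) for a coherent read by a consumer agent."""
    holders = directory.get(addr, set())        # step 2: snoop-filter lookup
    for producer in holders:
        if addr in caches[producer]:
            # step 3 (hit case): the producer's cache supplies the data
            return caches[producer][addr], f"cache:{producer}"
    # steps 3-4 (miss case): no cached copy, so go to DRAM through the bridge
    return memory[addr], "memory"

directory = {0x40: {"cpu"}}                     # the CPU may hold line 0x40
caches = {"cpu": {0x40: "hot-data"}, "gpu": {}}
memory = {0x40: "stale", 0x80: "cold-data"}

print(coherent_read(0x40, directory, caches, memory))  # ('hot-data', 'cache:cpu')
print(coherent_read(0x80, directory, caches, memory))  # ('cold-data', 'memory')
```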
Ncore Benefits
1. True heterogeneous coherency
2. Highly scalable systems
3. Higher performance with non-coherent IP
4. Lower power consumption
5. Easier chip floorplanning
Benefit #1: True heterogeneous coherency

Two features are primarily responsible for enabling Ncore's unique heterogeneous cache coherency capabilities:
1. Support for multiple coherence models
2. Use of multiple configurable snoop filters to accommodate different cache organizations
Benefit #1: True heterogeneous coherency
Support for heterogeneous coherent agents
- Cache coherent agents can differ greatly, which increases the difficulty of integrating them into a system-on-chip
  - Logical coherence models
  - Physical cache organization, transaction table sizes
- Ncore adapts to each coherent agent's behavior and characteristics: agent interfaces adapt individual coherence models to a generic model using a lightweight messaging layer
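The adaptation idea can be sketched as a per-protocol translation table. Everything here is hypothetical: the protocol names, opcodes, and generic message names are illustrative stand-ins, not Ncore's actual messaging layer.

```python
# Illustrative sketch of adapting heterogeneous coherence vocabularies to one
# generic message set, so the interconnect core sees a single coherence model.

GENERIC_READ_SHARED = "ReadShared"
GENERIC_READ_UNIQUE = "ReadUnique"

# Per-protocol translation tables (opcodes are invented for illustration).
ADAPTERS = {
    "protocol_a": {"RdShr": GENERIC_READ_SHARED, "RdOwn": GENERIC_READ_UNIQUE},
    "protocol_b": {"LOAD_S": GENERIC_READ_SHARED, "LOAD_X": GENERIC_READ_UNIQUE},
}

def to_generic(protocol, opcode, addr):
    """Bridge an agent-native transaction into a generic coherence message."""
    return {"msg": ADAPTERS[protocol][opcode], "addr": addr}

# Two agents speaking different native protocols produce the same generic message.
print(to_generic("protocol_a", "RdShr", 0x100))   # {'msg': 'ReadShared', 'addr': 256}
print(to_generic("protocol_b", "LOAD_S", 0x100))  # {'msg': 'ReadShared', 'addr': 256}
```

The payoff of this structure is that adding a new agent type only means writing a new adapter table, not changing the core.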
Benefit #1: True heterogeneous coherency

[Diagram: agent interfaces adapt individual coherence models to a generic model; coherent agents with caches, the directory, and the proxy caches with bridges (CCTI) all communicate through these interfaces.]
Benefit #1: True heterogeneous coherency
With multiple configurable snoop filters
- Cache coherent agents can have very different behaviors
  - Cache organization
  - Coherency models
  - Workloads
- Associating caching agents that share common properties with individual domain snoop filters can consume less die area than a monolithic snoop filter
Benefit #1: True heterogeneous coherency
Multiple snoop filters are more area-efficient than one

[Diagram: four caching agents A, B, C, D. Traditional approach: one monolithic snoop filter (X) tracks all four. Ncore approach: snoop filter #1 (Y) tracks A and B, snoop filter #2 (Z) tracks C and D.]

Multiple snoop filters are smaller: area(Y) + area(Z) < area(X)
Benefit #2: Highly scalable systems
With a configurable, modular approach
- Transaction processing and data bandwidth scaling
  - Each component can be scaled individually (add or subtract components)
  - Ports per component can be scaled individually (add or remove ports)
- Why is a configurable interconnect superior to fixed-function, centralized controllers?
  - Meet performance goals without wasted resources
  - Easily adjust system design as requirements evolve
  - Build derivative chips based on the same platform
Benefit #2: Highly scalable systems
Add more components or ports to scale bandwidth

[Diagram: bandwidth scales either by adding more components (e.g., a second proxy cache and bridge) or by adding more ports to existing components such as the directory.]
Benefit #3: Higher performance with non-coherent IP
Using configurable proxy caches
- Advantages (new and novel)
  1. Better sharing of data between non-coherent agents and coherent agents
  2. Better sharing of data between non-coherent agents
- Using a proxy cache minimizes communication through DRAM
- Additional system benefits
  - Pre-fetch effect: fetch whole cache lines vs. individual data
  - Write-gathering benefit: writes accumulate in the cache
  - Optimizes coherent memory accesses
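The write-gathering benefit can be sketched concretely. This is an illustrative model, not Ncore's proxy cache: narrow writes from a non-coherent agent accumulate in a cache line, and one coherent write-back replaces many small DRAM writes.

```python
# Illustrative sketch of write-gathering in a proxy cache: a non-coherent
# agent's partial writes are merged per cache line and written back once.

LINE_SIZE = 64  # bytes per cache line (illustrative)

class ProxyCache:
    def __init__(self):
        self.lines = {}       # line base address -> bytearray being gathered
        self.writebacks = 0   # coherent write-backs actually issued

    def write(self, addr, data):
        """Gather a narrow write into its cache line instead of forwarding it."""
        base, offset = addr - addr % LINE_SIZE, addr % LINE_SIZE
        line = self.lines.setdefault(base, bytearray(LINE_SIZE))
        line[offset:offset + len(data)] = data

    def flush(self):
        """Write each dirty line back once, coherently."""
        self.writebacks += len(self.lines)
        self.lines.clear()

proxy = ProxyCache()
for i in range(16):                    # 16 narrow 4-byte writes...
    proxy.write(0x1000 + 4 * i, b"\xab" * 4)
proxy.flush()
print(proxy.writebacks)                # ...become 1 line-sized write-back
```

The pre-fetch effect mentioned above is the read-side mirror of this: fetching a whole line into the proxy cache serves the agent's subsequent narrow reads without further memory traffic.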
Benefit #3: Higher performance with non-coherent IP
Sharing between non-coherent and coherent agents, using configurable proxy caches

[Diagram: five-step flow in which data shared between a coherent agent and a non-coherent agent passes through the directory and a proxy cache rather than making a round trip through DRAM.]
Benefit #3: Higher performance with non-coherent IP
Sharing between non-coherent agents, using configurable proxy caches

[Diagram: four-step flow in which a non-coherent producer writes through its bridge into the proxy cache and a non-coherent consumer reads the data back from the proxy cache, avoiding DRAM.]
Benefit #4: Lower power consumption
With multiple clock and voltage domains

[Diagram: Ncore components (caches, directory, proxy caches, bridges, CCTI) partitioned into separate clock and voltage domains, so each region of the interconnect can be clocked and powered independently.]
Benefit #5: Easier chip floorplanning
With a highly distributed architecture
- Hub- and crossbar-based coherent interconnects require significant contiguous reserved die area
- Ncore reserves less area for the cache coherent interconnect
  - Place it in existing white space and routing channels: easier place-and-route
  - Locate modular Ncore components closer to critical IP: better timing
  - Minimize wiring congestion

Source: Andrei Frumusanu, AnandTech
Summary
Ncore Cache Interconnect IP is targeted at heterogeneous SoCs.

Benefits
- Scalability
- Configurability
- Area efficiency
- High performance
- Optimal power consumption

Major Unique Features
- Multiple configurable snoop filters
- Multiple configurable proxy caches
- Modular distributed architecture

RESULT: Custom-configured interconnect IP that meets exact system requirements
To request more information, visit us at http://www.arteris.com/contact