Ncore Cache Coherent Interconnect


Ncore Cache Interconnect Technology Overview
24 May 2016
Craig Forrest, Chief Technology Officer
David Kruckemyer, Chief Hardware Architect
Copyright 2016 Arteris

Contents
- About Arteris
- Caches, Cache Coherency and Challenges
- Introducing Ncore Cache Interconnect
- Summary

Arteris: The on-chip interconnect leader

Arteris Product Milestones
- Founded in 2003 to pioneer network-on-chip (NoC) interconnect
- NoC Solution: first released NoC implementation, 2005
- FlexNoC: second-generation Arteris NoC, 2009/2010
- FlexPSI: die-to-die or chip-to-chip parallel interface, 2013
- FlexNoC Resilience Package: functional safety option, 2014
- FlexNoC Physical: physically aware IP with FlexNoC Version 3, 2015
- Ncore Cache Interconnect: heterogeneous cache coherency, 2016

Company headquarters and engineering development in Campbell, USA
Worldwide support offices (USA, France, China, Korea, India, Japan)

[Images: awards]

Customer Adoption (customer data current as of 1 May 2016)
Year       2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016
Customers     1     6     9    13    20    41    52    58    67    76    79

Arteris has become the standard for complex and low-power SoCs
Customers shipped > 1B SoCs as of 2015

Year   Design Starts   Tape-Outs   Chips Produced
2006         1             -             -
2007         5             1             -
2008        13             5             1
2009        26            11             4
2010        41            19            11
2011        85            32            20
2012       128            55            33
2013       159            99            51
2014       190           119            79
2015       229           140           104
2016       240           146           108

Data is cumulative. Design data is customer-reported and subject to change. Data is current as of 1 May 2016.

Arteris Customers: Arteris technology is becoming a standard
Current as of 1 May 2016

Mobility
- Very Large SoC Maker

Automotive, IoT (Internet of Things), Camera & CE (Consumer Electronics)
- Major Automotive OEM
- Major Auto & CE SoC Maker
- Toshiba
- Japan System OEM
- Automotive SoC Maker
- Japan Tier 1 SoC Maker
- Large Drone Maker

SSD (Solid State Drive), Networking & Automation
- Major SSD Vendor
- Major SSD Vendor
- Defense Contractor
- Defense Contractor
- Defense Contractor
- Silicon Foundry
- Major IP Provider

Arteris interconnect IP now covers coherent and non-coherent use cases

[Block diagram of an example SoC. Arteris Interconnect IP Products shown: Ncore Cache Interconnect, FlexNoC Interconnect, FlexWay Interconnect. Labeled blocks: CPU Subsystem (A57 x4, A53 x4, L2 caches); Design-Specific Subsystems (GPU Subsystem with 3D Graphics, DSP Subsystem (A/V), Application IP Subsystem with AES, 2D graphics, MPEG); InterChip Links; Memory Scheduler and Controller (Wide IO PHY, LPDDR, DDR3 PHY); Subsystem Interconnect; High Speed Wired Peripherals (USB 3.0/2.0 PHY, PCIe PHY, Ethernet PHY); Wireless Subsystem (WiFi, GSM, LTE, LTE Adv.); Security Subsystem (CRI Crypto Firewall (PCF+), RSA-PSS Cert. Engine); I/O Peripherals Subsystem (HDMI, MIPI, Display, PMU, JTAG).]

Modern SoC Design Challenges
- SCALABILITY: How to scale systems up as the number of coherent agents increases?
- HETEROGENEITY: How to integrate coherent processing elements using different protocols, different semantics, or having different cache characteristics?
- SYSTEM INTEGRATION: How to integrate IP that is not cache coherent and achieve better performance?
- PHYSICAL DESIGN: How to create a cache coherent system that is easily placed on chip?
- POWER MANAGEMENT: How to optimize power consumption of complex systems?

Why Caches?
Caches are small, fast memories tightly coupled to processing elements
- Reduced average memory latency means higher performance
  - Temporal locality
  - Spatial locality
- High bandwidth due to high frequency and wide interfaces
- Fewer off-chip DRAM accesses, resulting in lower power consumption
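The "reduced average memory latency" point can be made concrete with the standard average memory access time relation (hit time plus miss rate times miss penalty). The sketch below is only an illustration; the latencies and miss rate are hypothetical numbers, not figures from the presentation.

```python
def average_memory_latency(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average access time: hit time plus the expected cost of a miss."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical values, chosen only for illustration.
# Without a cache, every access pays the full DRAM latency.
no_cache = average_memory_latency(hit_time_ns=100.0, miss_rate=0.0, miss_penalty_ns=0.0)
# With a cache, most accesses hit in 2 ns and 5% also pay the DRAM latency.
with_cache = average_memory_latency(hit_time_ns=2.0, miss_rate=0.05, miss_penalty_ns=100.0)

print(f"no cache: {no_cache:.1f} ns, with cache: {with_cache:.1f} ns")
```

Under these assumed numbers the average latency falls from 100 ns to about 7 ns, which is the effect that temporal and spatial locality make possible.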

Why Cache Coherency?
Caches create multiple copies of data
- Managing these copies in software is difficult
Hardware cache coherency creates the illusion of a flat, shared memory
- Caches are invisible to software
- Multiple copies are kept consistent
But managing copies in hardware requires a lot of communication
- Must check every place there may be a valid copy -> filters reduce communication by tracking cache contents
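To illustrate how a filter that tracks cache contents reduces communication, here is a minimal toy model of a snoop filter. It is a sketch under simplifying assumptions, not a description of the Ncore implementation: the class name, method names, and agent names are all hypothetical.

```python
from collections import defaultdict


class SnoopFilter:
    """Toy model: remembers which caches may hold each line address."""

    def __init__(self, cache_ids):
        self.cache_ids = list(cache_ids)
        self.holders = defaultdict(set)  # line address -> set of cache ids

    def record_fill(self, cache_id, addr):
        """Note that cache_id now holds a copy of addr."""
        self.holders[addr].add(cache_id)

    def record_evict(self, cache_id, addr):
        """Note that cache_id no longer holds addr."""
        self.holders[addr].discard(cache_id)

    def snoop_targets(self, requester, addr):
        """With a filter: only caches that may hold the line are checked."""
        return sorted(self.holders.get(addr, set()) - {requester})

    def broadcast_targets(self, requester):
        """Without a filter: every other cache has to be checked."""
        return sorted(c for c in self.cache_ids if c != requester)


# Hypothetical agents, for illustration only.
sf = SnoopFilter(["cpu0", "cpu1", "gpu", "dsp"])
sf.record_fill("cpu1", 0x1000)

filtered = sf.snoop_targets("cpu0", 0x1000)
unfiltered = sf.broadcast_targets("cpu0")
print(len(filtered), "snoop with filter vs", len(unfiltered), "without")
```

In this example a read from `cpu0` needs one snoop message instead of three, and a line that no cache holds needs none at all.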

Ncore Cache Interconnect IP
[Diagram labels: Agents (CPU Cluster with Cache ($), GPU with Cache ($)); Agents (Image Processing, Display Processing, Subsystems, Peripherals); Agents (DRAM, SRAM)]

Ncore Interconnect Architecture
[Diagram labels: Cache ($) x2, Directory, Proxy Cache ($) x2, Bridge x2, CCTI, Subsystem]

Read Example: Cache Hit
[Diagram with numbered steps 1-3. Labels: Consumer, Producer, Cache ($) x3, Directory, Proxy Cache ($), Bridge x2, CCTI, Subsystem]

Read Example: Cache Misses
[Diagram with numbered steps 1-4. Labels: Consumer, Cache ($) x3, Directory, Proxy Cache ($), Bridge x2, CCTI, Subsystem]
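The individual steps of the two read examples are not recoverable from the diagrams in this transcript. As a rough aid, the sketch below shows how a generic directory-based read might resolve: the directory is consulted first, a holding cache supplies the data on a hit, and memory supplies it on a miss. This is an assumption about directory protocols in general, not the documented Ncore sequence; the function and agent names are hypothetical.

```python
def coherent_read(addr, requester, directory, caches, memory):
    """Return (data, source) for a read in a toy directory protocol."""
    # Ask the directory which other caches may hold the line.
    holders = directory.get(addr, set()) - {requester}
    for holder in sorted(holders):
        if addr in caches[holder]:
            # Cache hit: another cache supplies the data.
            data, source = caches[holder][addr], holder
            break
    else:
        # Cache miss everywhere: fall back to memory.
        data, source = memory[addr], "memory"
    # The requester now holds a copy, so the directory is updated.
    caches[requester][addr] = data
    directory.setdefault(addr, set()).add(requester)
    return data, source


# Hypothetical system state, for illustration only.
memory = {0x40: "stale", 0x80: "from-dram"}
caches = {"producer": {0x40: "fresh"}, "consumer": {}}
directory = {0x40: {"producer"}}

hit = coherent_read(0x40, "consumer", directory, caches, memory)
miss = coherent_read(0x80, "consumer", directory, caches, memory)
print(hit, miss)
```

The hit case returns the producer's up-to-date copy rather than the stale value in memory, which is the consistency property the hardware is there to provide.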

Ncore Benefits
1. True heterogeneous coherency
2. Highly scalable systems
3. Higher performance with non-coherent IP
4. Lower power consumption
5. Easier chip floorplanning

Benefit #1: True heterogeneous coherency
Two features are primarily responsible for enabling Ncore's unique heterogeneous cache coherency capabilities:
1. Support for multiple coherence models
2. Use of multiple configurable snoop filters to accommodate different cache organizations

Benefit #1: True heterogeneous coherency
Support for heterogeneous coherent agents
Cache coherent agents can differ greatly, which increases the difficulty of integrating them into a system-on-chip
- Logical coherence models
- Physical cache organization, transaction table sizes
Ncore adapts to each coherent agent's behavior and characteristics
- Agent interfaces adapt individual coherence models to a generic model using a lightweight messaging layer

Benefit #1: True heterogeneous coherency
Agent interfaces adapt individual coherence models to a generic model
[Diagram labels: Cache ($) x2, Directory, Proxy Cache ($) x2, Bridge x2, CCTI, Subsystem]

Benefit #1: True heterogeneous coherency
With multiple configurable snoop filters
Cache coherent agents can have very different behaviors
- Cache organization
- Coherency models
- Workloads
Associating caching agents that share common properties with individual snoop filters can consume less die area than a monolithic snoop filter
[Diagram labels: Directory, Cache ($) x2, Proxy Cache ($), Bridge(s), CCTI, Domain]

Benefit #1: True heterogeneous coherency
Multiple snoop filters are more area-efficient than one
[Diagram: caches ($) A, B, C and D. Traditional Approach: requests (REQ) go to one monolithic snoop filter (X) tracking A, B, C, D. Ncore Approach: requests (REQ) go to snoop filter #1 (Y) and snoop filter #2 (Z), which together track A, B, C, D.]
Multiple snoop filters are smaller: area(Y + Z) < area(X)
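The slide does not say why area(Y + Z) can be smaller than area(X). One plausible contribution is sketched below with a deliberately simple storage model: each filter entry holds an address tag plus one presence bit per tracked cache, so a filter that tracks fewer caches has narrower entries. The model, the entry counts, and the tag width are all assumptions made for illustration; real area also depends on associativity, sizing margins, and how often the groups share lines.

```python
def filter_bits(entries, tag_bits, tracked_caches):
    """Storage for one filter: each entry is a tag plus one presence bit per cache."""
    return entries * (tag_bits + tracked_caches)


TAG_BITS = 30  # hypothetical tag width

# Monolithic filter X: one structure whose entries carry a bit for each of A, B, C, D.
area_x = filter_bits(entries=8192, tag_bits=TAG_BITS, tracked_caches=4)

# Split filters: Y sized for two larger caches, Z sized for two smaller caches.
# The same total number of entries is assumed, but each entry tracks only two caches.
area_y = filter_bits(entries=6144, tag_bits=TAG_BITS, tracked_caches=2)
area_z = filter_bits(entries=2048, tag_bits=TAG_BITS, tracked_caches=2)

print(area_y + area_z, "<", area_x, "is", area_y + area_z < area_x)
```

In this toy model the saving comes only from narrower presence vectors; matching each filter's organization to its group of caches would be a separate source of savings that the model does not capture.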

Benefit #2: Highly scalable systems
With a configurable, modular approach
Transaction processing and data bandwidth scaling
- Each component can be scaled individually (add or subtract components)
- Ports per component can be scaled individually (add or remove ports)
Why is configurable interconnect superior to fixed-function, centralized controllers?
- Meet performance goals without wasted resources
- Easily adjust system design as requirements evolve
- Build derivative chips based on the same platform

Benefit #2: Highly scalable systems
Add more components or ports to scale bandwidth
[Diagram annotations: "Add more components" and "or add more ports". Labels: Cache ($) x3, Directory, CCTI, Proxy Cache ($) x2, Bridge x2, Subsystem]

Benefit #3: Higher performance with non-coherent IP
Using configurable proxy caches
Advantages (new and novel)
1. Better for sharing data between non-coherent agents and coherent agents
2. Better for sharing data between non-coherent agents
Using a proxy cache minimizes communication through DRAM
Additional system benefits
- Pre-fetch effect: fetch cache lines vs. individual data
- Write-gathering benefit: writes accumulated in cache
- Optimizes coherent memory accesses
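The write-gathering benefit can be sketched as follows: small writes from a non-coherent agent accumulate in a cache line, and the line reaches DRAM as a single transfer. This is a minimal toy model, not the Ncore proxy cache design; the class name, the 64-byte line size, and the explicit flush are assumptions for illustration.

```python
LINE_BYTES = 64  # hypothetical cache line size


class WriteGatheringCache:
    """Toy model: byte writes accumulate per line and are written back together."""

    def __init__(self):
        self.lines = {}       # line base address -> bytearray of line contents
        self.dram_writes = 0  # number of line-sized transfers sent to DRAM

    def write(self, addr, value):
        """Store one byte into the cached line instead of going to DRAM."""
        base = addr - (addr % LINE_BYTES)
        line = self.lines.setdefault(base, bytearray(LINE_BYTES))
        line[addr - base] = value

    def flush(self):
        """Write back every gathered line, one DRAM transfer per line."""
        self.dram_writes += len(self.lines)
        self.lines.clear()


cache = WriteGatheringCache()
for offset in range(16):  # sixteen single-byte writes into the same line
    cache.write(0x2000 + offset, offset)
cache.flush()
print("byte writes: 16, DRAM transfers:", cache.dram_writes)
```

Sixteen individual writes that would each have gone to memory become one line-sized transfer in this model, which is the sense in which writes are "accumulated in cache".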

Benefit #3: Higher performance with non-coherent IP
Sharing between non-coherent & coherent agents
Using configurable proxy caches
[Diagram with numbered steps 1-5. Labels: Consumer, Producer, Cache ($) x2, Directory, Proxy Cache ($) x2, Bridge x2, CCTI, Subsystem]

Benefit #3: Higher performance with non-coherent IP
Sharing between non-coherent agents
Using configurable proxy caches
[Diagram with numbered steps 1-4. Labels: Producer, Consumer, Cache ($) x2, Directory, Proxy Cache ($) x2, Bridge x2, CCTI, Subsystem]

Benefit #4: Lower power consumption
With multiple clock and voltage domains
[Diagram labels: Cache ($) x2, Directory, Proxy Cache ($) x2, Bridge x2, CCTI, Subsystem]

Benefit #5: Easier chip floorplanning
With a highly distributed architecture
Hub- and crossbar-based coherent interconnects require significant contiguous reserved die area
- Reserve less area for cache coherent interconnect
- Place it in existing white space and routing channels -> easier P&R
- Locate modular Ncore components closer to critical IP -> better timing
- Minimize wiring congestion
[Image source: Andrei Frumusanu, AnandTech]

Summary
Ncore Cache Interconnect IP is targeted at heterogeneous SoCs.
Benefits
- Scalability
- Configurability
- Area efficiency
- High performance
- Optimal power consumption
Major Unique Features
- Multiple configurable snoop filters
- Multiple configurable proxy caches
- Modular distributed architecture
RESULT: Custom-configured interconnect IP that meets exact system requirements

To request more information, visit us at http://www.arteris.com/contact