KeyStone C66x Multicore SoC Overview. Dec, 2011

Similar documents
KeyStone C665x Multicore SoC

SoC Overview. Multicore Applications Team

KeyStone Training. Turbo Encoder Coprocessor (TCP3E)

KeyStone Training. Multicore Navigator Overview

Introduction to AM5K2Ex/66AK2Ex Processors

KeyStone Training. Power Management

KeyStone Training. Network Coprocessor (NETCP) Packet Accelerator (PA)

C66x KeyStone Training HyperLink

C66x KeyStone Training HyperLink

A Next Generation Home Access Point and Router

High Performance Compute Platform Based on multi-core DSP for Seismic Modeling and Imaging

Embedded Processing Portfolio for Ultrasound

Keystone Architecture Inter-core Data Exchange

KeyStone Training. Network Coprocessor (NETCP) Packet Accelerator (PA)

KeyStone Training. Bootloader

Leveraging Data Plane Acceleration Techniques on the QorIQ P4080 Processor

AN133: SEC 4.0: Datapath Security Accelerator on the QorIQ P4080 Processor

INTERNET PROTOCOL SECURITY (IPSEC) GUIDE.

Security IP-Cores. AES Encryption & decryption RSA Public Key Crypto System H-MAC SHA1 Authentication & Hashing. l e a d i n g t h e w a y

PE310G4SPI9 Quad Port Fiber 10 Gigabit Ethernet PCI Express Server Adapter Intel 82599ES Based

Doing more with multicore! Utilizing the power-efficient, high-performance KeyStone multicore DSPs. November 2012

FCQ2 - P2020 QorIQ implementation

PRODUCT PREVIEW TNETV1050 IP PHONE PROCESSOR. description

6WINDGate. White Paper. Packet Processing Software for Wireless Infrastructure

The S6000 Family of Processors

PE310G4SPI9LA Quad Port Fiber 10 Gigabit Ethernet PCI Express Server Adapter Intel 82599ES Based

Cisco ASR 1000 Series Routers Embedded Services Processors

Copyright 2016 Xilinx

Introduction to Sitara AM437x Processors

Programmable Logic Design Grzegorz Budzyń Lecture. 15: Advanced hardware in FPGA structures

An Introduction to the QorIQ Data Path Acceleration Architecture (DPAA) AN129

Building heterogeneous networks of green base stations on TI s KeyStone II architecture

A design of real-time image processing platform based on TMS320C6678

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info.

S2C K7 Prodigy Logic Module Series

IBM "Broadway" 512Mb GDDR3 Qimonda

The RM9150 and the Fast Device Bus High Speed Interconnect

Keystone ROM Boot Loader (RBL)

Multicore DSP+ARM KeyStone II System-on-Chip (SoC)

Anand Raghunathan

An Intelligent NIC Design Xin Song

Zynq-7000 All Programmable SoC Product Overview

AIR-WLC K9 Datasheet. Overview. Check its price: Click Here. Quick Specs

D Demonstration of disturbance recording functions for PQ monitoring

Achieving UFS Host Throughput For System Performance

Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses

PE2G6I35 Six Port Copper Gigabit Ethernet PCI Express Server Adapter Intel i350am2 Based

An Experimental Analysis on Iterative Block Ciphers and Their Effects on VoIP under Different Coding Schemes

Commercial Network Processors

1 TMS320C6678 Features and Description

QorIQ T4 Family of Processors. Our highest performance processor family. freescale.com

HyperLink Programming and Performance consideration

OE2G2I35 Dual Port Copper Gigabit Ethernet OCP Mezzanine Adapter Intel I350BT2 Based

AT-501 Cortex-A5 System On Module Product Brief

KeyStone Training. Network Coprocessor (NETCP)

On-Chip Debugging of Multicore Systems

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki

Concurrent High Performance Processor design: From Logic to PD in Parallel

C900 PowerPC G4+ Rugged 3U CompactPCI SBC

TMS320C6678 Memory Access Performance

DSP Solutions For High Quality Video Systems. Todd Hiers Texas Instruments

Designing with STM32F2x & STM32F4

MICROPROCESSOR SYSTEM FOR VISUAL BAKED PRODUCTS CONTROL

C901 PowerPC MPC7448 3U CompactPCI SBC

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration

RK3036 Kylin Board Hardware Manual V0.1

High-Performance, Highly Secure Networking for Industrial and IoT Applications

PE3G4TSFI35P Quad Port Fiber Gigabit Ethernet PCI Express Time Stamp Server Adapter Intel Based

Scaling Acceleration Capacity from 5 to 50 Gbps and Beyond with Intel QuickAssist Technology

Developing and Integrating FPGA Co-processors with the Tic6x Family of DSP Processors

The Challenges of System Design. Raising Performance and Reducing Power Consumption

Fast packet processing in the cloud. Dániel Géhberger Ericsson Research

LTE and IEEE802.p for vehicular networking: a performance evaluation

Each Milliwatt Matters

OpenMP Accelerator Model for TI s Keystone DSP+ARM Devices. SC13, Denver, CO Eric Stotzer Ajay Jayaraj

Representation of the interested Bidders / vendors

Classification of Semiconductor LSI

C6100 Ruggedized PowerPC VME SBC

SoC Platforms and CPU Cores

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

White paper FUJITSU Supercomputer PRIMEHPC FX100 Evolution to the Next Generation

Frame Manager (FMan) Internals

PE2G6BPi35 Six Port Copper Gigabit Ethernet PCI Express Bypass Server Adapter Intel based

Netronome NFP: Theory of Operation

Introducing the AM57x Sitara Processors from Texas Instruments

Quality-of-Service for a High-Radix Switch

Comparing TCP performance of tunneled and non-tunneled traffic using OpenVPN. Berry Hoekstra Damir Musulin OS3 Supervisor: Jan Just Keijser Nikhef

Sophon SC1 White Paper

Ron Emerick, Oracle Corporation

Performance Enhancement for IPsec Processing on Multi-Core Systems

Multicore ARM KeyStone II System-on-Chip (SoC)

SoC Communication Complexity Problem

Digital Signal Processor 2010/1/4

Nutaq μdigitizer FPGA-based, multichannel, high-speed DAQ solutions PRODUCT SHEET

PowerQUICC Integrated Communications Processors. freescale.com/powerarchitecture

Application Brief. QorIQ P2040/P2041, P3 and P5 Series. High-performance multicore processors. freescale.com/qoriq

Wireless systems overview

PE2G6SFPI35 Six Port SFP Gigabit Ethernet PCI Express Server Adapter Intel i350am4 Based

PacketShader: A GPU-Accelerated Software Router

PE2G4SFPI35L Quad Port SFP Gigabit Ethernet PCI Express Server Adapter Intel i350am4 Based

Transcription:

KeyStone C66x Multicore SoC Overview Dec, 011

Outline Multicore Challenge KeyStone Architecture Reminder About KeyStone Solution

Challenge Before KeyStone Multicore performance degradation Lack of efficient multicore communication Not easy to program the high speed IO Relatively high loading for EMAC driver Lack of efficient multicore debug tool Any other

Outline Multicore Challenge KeyStone Architecture Reminder About KeyStone Solution

KeyStone Device Features C66x 1 to 8 C66x Fixed/Floating-Point DSP Cores at up to 1.5 GHz Backward-compatible with C64x+ and C67x+ cores Fixed and Floating Point Operations RSA instruction set extensions Chip-rate processing (downlink & uplink) Reed-Muller decoding ( 1 and only) C Application-Specific Up to 1 MB Local L memory per core Up to 4 MB Multicore Shared Memory () Multicore Shared Memory Controller (C), DDR3-1600 MHz (64-bit) Application-Specific x TCP3d: Turbo Decoder TCP3e: Turbo Encoder x FFT (FFT/IFFT and DFT/IDFT) Coprocessor 4x VCP for voice channel decoding L Cache 1 to 8 Cores @ up to 1.5 GHz Interfaces High-speed Hyperlink bus 4x Serial RapidIO Rev.1 x 10/100/1000 ports w/ embedded switch x PCIe Generation II Six-lane Antenna Interface (AIF) for Wireless Applications o WC, WiMAX, LTE, GSM, TD-SC, TD-LTE o Up to 6.144-Gbps Additional Serials: IC,, GPIO, PCIe Application- Specific I/O Embedded Trace Buffer (ETB) & System Trace Buffer (STB) 40 nm High-Performance Process

Comparing C64x+ and C66x TCI64x+ DSP C66x DSP Three 64x+ Cores 1GHz 1 to 8 New C66x Cores @ up to 1.5Ghz Improved memory performance, Improved infrastructure: CP (CPPI) Improved cache performance memory 1 x VCP, 1 x TCP 4 x VCP, x TCP3d, TCP3e, x FFT/iFFT/DFT/iDFT, Crypto, DDR-667Mhz (3 bit) DDR3-1600Mhz (64 bit) x RapdiIO Lanes @ 3.15 Gbps, 4 RapidIO lanes @ 5 Gbps, 6 AIF lanes @ 3.07 Gbps, McBSP 6 AIF lanes @ 6.144 Gbps, PCIe, Embedded Trace Buffer (ETB) 65nm Process Embedded Trace Buffer (ETB), System Monitor 40nm Process

& C 1 to 8 Cores @ up to 1.5 GHz PCIe L Memory (cache/ram) Application- Specific I/ O Application-Specific & 1 to 8 C66x DSP Cores operating at up to 1.5 GHz Fixed and Floating Point Operations Code compatible with other C64x+ and C67x+ devices Memory can be partitioned as cache or 3KB P per core 3KB D per core Error Detection for P Memory Protection Dedicated and Shared L Memory 51 KB to 1 MB Local L per core to 4 MB Multicore Shared Memory () Multicore Shared Memory Controller (C) Error detection and correction for all L memory available to all cores and can be either program or data

Memory Expansion PCIe C 1 to 8 Cores @ up to 1.5 GHz L Application- Specific I/O Application-Specific & Memory Expansion Multicore Shared Memory Controller (C) Arbitrates and SoC master access to shared memory Provides a direct connection to the Provides access to coprocessors and IO peripherals Memory protection and address extension to 64 GB (36 bits) Provides multi-stream pre-fetching capability DDR3 External Memory Interface (EMIF) Support for 1x 16-bit, 1x 3-bit, and 1x 64-bit modes Supports up to 1600 MHz Supports power down of unused pins when using 16-bit or 3-bit width Support for 8 GB memory address Error detection and correction EMIF-16 (Media Applications Only) Three modes: Synchronized NAND flash NOR flash Can be used to connect asynchronous memory (e.g., NAND flash) up to 56 MB.

C Application-Specific & Memory Expansion PCIe 1 to 8 Cores @ up to 1.5 GHz L Application- Specific I/O and Low-overhead processing and routing of packet traffic Simplified resource management Effective inter-processor communications Abstracts physical implementation from application host software Virtualization to enable dynamic load balancing and provide seamless access to resources on different cores 8K hardware queues and 16K descriptors More descriptors can reside in any shared memory 10 Gbps pre-fetching capability

PCIe C 1 to 8 Cores @ up to 1.5 GHz L Application- Specific I/O Application-Specific & Memory Expansion (PA) Support for single or multiple IP addresses 1 Gbps wire-speed throughput at 1.5 Mpps UDP Checksum processing IPSec ESP and AH tunnels with fast path fully offloaded L support:, Ethertype, and VLAN L3/L4 Support: IPv4/IPv6 and UDP port-based raw or IPv4/6 and SCTP port-based routing Multicast to multiple queues QoS capability: Per channel/flow to individual queue towards DSP cores and support for TX traffic shaping per device (SA) Support for IPSec, SRTP, 3GPP and WiMAX Air Interface, and SSL/TLS security Support for simultaneous wire-speed security processing on 1 Gbps transmit and receive traffic. Encryption Modes: ECB, CBC, CTR, F8, A5/3, CCM, GCM, HMAC, CMAC, and GMAC Encryption Algorithms: AES, DES, 3DES, Kasumi, SNOW 3g, SHA-1, SHA-, and MD5

External Interfaces PCIe C 1 to 8 Cores @ up to 1.5 GHz L Application- Specific I/O Application-Specific & Memory Expansion External Interfaces allows two 10/100/1000 interfaces Four high-bandwidth Serial RapidIO (SRIO) lanes for inter- DSP applications for boot operations for development/testing Two PCIe at 5 Gbps I C for EPROM at 400 Kbps Application-specific Interfaces: Antenna Interface (AIF) for wireless applications Telecommunications Serial Port (TSIP) for media applications

Fabric C 1 to 8 Cores @ up to 1.5 GHz PCIe L Cache Application- Specific I/O Application-Specific & Memory Expansion External Interfaces Fabric is a process controller Channel Controller Transfer Controller provides a configured way within hardware to manage traffic queues and ensure priority jobs are getting accomplished while minimizing the involvement of the DSP cores. facilitates high-bandwidth communications between cores, subsystems, peripherals, and memory.

Diagnostic Enhancements PCIe C 1 to 8 Cores @ up to 1.5 GHz L Cache Application- Specific I/O Application-Specific & Memory Expansion External Interfaces Fabric Diagnostic Enhancements Embedded Trace Buffers (ETB) enhance the diagnostic capabilities of the. CP Monitor enables diagnostic capabilities on data traffic through the switch fabric. Automatic statistics collection and exporting (non-intrusive) Monitor individual events for better debugging Monitor transactions to both memory end point and MMRs (memory mapped Regi) Configurable monitor filtering capability based on address and transaction type

Bus PCIe C 1 to 8 Cores @ up to 1.5 GHz L Cache Application- Specific I/O Application-Specific & Memory Expansion External Interfaces Fabric Diagnostic Enhancements Bus Provides the capability to expand the C66x to include hardware acceleration or other auxiliary processors Four lanes with up to 1.5 Gbps per lane

Miscellaneous Elements PCIe C 1 to 8 Cores @ up to 1.5 GHz L Cache Application- Specific I/O Application-Specific & Memory Expansion External Interfaces Fabric Diagnostic Enhancements Bus Miscellaneous provides atomic access to shared chip-level resources. Eight 64-bit timers Three on-chip s: PL for s for DDR3 3 for Acceleration Three

Device-Specific: Wireless Applications PCIe MB C 3KB 4 Cores @ 1.0 GHz / 1. GHz 3KB 104KB L Cache AIF x6 RSA KeyStone Device Architecture for Wireless Applications RSA VCP TCP3d TCP3e FFTC BCP x4 & Memory Expansion External Interfaces Fabric Diagnostic Enhancements Bus Miscellaneous Application-Specific Device-Specific (Wireless Apps) Wireless-specific FFTC TCP3 Decoder/Encoder VCP BCP Wireless-specific Interfaces: AIF x6 Characteristics Package Size: 44 Process Node: 40nm Pin Count: 841 Core Voltage: 0.9-1.1 V x Rake Search (RSA)

Device-Specific: Media Applications EMIF 16 GPIO PCIe 1 to 8 Cores @ up to 1.5 GHz 4MB C 3KB TSIP 3KB 51KB L Cache KeyStone Device Architecture for Media Applications & Memory Expansion External Interfaces Fabric Diagnostic Enhancements Bus Miscellaneous Application-Specific Device-Specific (Media Apps) Media-specific Interfaces TSIP EMIF 16 (EMIF-A) Characteristics Package Size: 44 Process Node: 40nm Pin Count: 841 Core Voltage: 0.9-1.1 V

Outline Multicore Challenge KeyStone Architecture Reminder About KeyStone Solution

Reminder About KeyStone Solution 1 st Gen Multicore Challenge Keystone Solution Multicore performance degradation Lack of efficient multicore communication Not easy to program packet based high speed IO Relatively high loading for EMAC driver Lack of device level debug capabilities Data highway from cores to shared memory and DDR3 Navigator used for packet based inter core communication and memory pipeline Navigator (CPPI/QMSS) handle packet based high speed IO easily PA introduced to offload packet processing Multicore debug Introduced (System Trace Module)

For More Information For more information, refer to the C66x Getting Started page to locate the data manual for your KeyStone device. View the complete C66x Multicore SOC Online Training for KeyStone Devices, including details on the individual modules. For questions regarding topics covered in this training, visit the support forums at the TI EE Community website.