SoC Overview. Multicore Applications Team

Similar documents
KeyStone C66x Multicore SoC Overview. Dec, 2011

KeyStone C665x Multicore SoC

KeyStone Training. Turbo Encoder Coprocessor (TCP3E)

KeyStone Training. Multicore Navigator Overview

Introduction to AM5K2Ex/66AK2Ex Processors

KeyStone Training. Power Management

C66x KeyStone Training HyperLink

C66x KeyStone Training HyperLink

Keystone Architecture Inter-core Data Exchange

The S6000 Family of Processors

KeyStone Training. Bootloader

Embedded Processing Portfolio for Ultrasound

Copyright 2016 Xilinx

FPGA Adaptive Software Debug and Performance Analysis

Keystone ROM Boot Loader (RBL)

High Performance Compute Platform Based on multi-core DSP for Seismic Modeling and Imaging

High-Performance, Highly Secure Networking for Industrial and IoT Applications

Zynq-7000 All Programmable SoC Product Overview

Multicore DSP+ARM KeyStone II System-on-Chip (SoC)

A Next Generation Home Access Point and Router

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info.

Doing more with multicore! Utilizing the power-efficient, high-performance KeyStone multicore DSPs. November 2012

KeyStone Training. Network Coprocessor (NETCP) Packet Accelerator (PA)

PRU Hardware Overview. Building Blocks for PRU Development: Module 1

KeyStone Training. Network Coprocessor (NETCP) Packet Accelerator (PA)

D Demonstration of disturbance recording functions for PQ monitoring

S2C K7 Prodigy Logic Module Series

Chapter 7. Hardware Implementation Tools

1 TMS320C6678 Features and Description

SoC Platforms and CPU Cores

DSP Solutions For High Quality Video Systems. Todd Hiers Texas Instruments

The Challenges of System Design. Raising Performance and Reducing Power Consumption

Introduction to Sitara AM437x Processors

An Introduction to the QorIQ Data Path Acceleration Architecture (DPAA) AN129

RK3036 Kylin Board Hardware Manual V0.1

A design of real-time image processing platform based on TMS320C6678

1 66AK2H14/12/06 Features and Description

KeyStone Training. Network Coprocessor (NETCP)

On-Chip Debugging of Multicore Systems

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

FCQ2 - P2020 QorIQ implementation

MICROPROCESSOR SYSTEM FOR VISUAL BAKED PRODUCTS CONTROL

Multicore ARM KeyStone II System-on-Chip (SoC)

Design Choices for FPGA-based SoCs When Adding a SATA Storage }

Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses

Sophon SC1 White Paper

Technical Note on NGMP Verification. Next Generation Multipurpose Microprocessor. Contract: 22279/09/NL/JK

LinkSprite Technologies,.Inc. pcduino V2

Zynq Architecture, PS (ARM) and PL

PRODUCT PREVIEW TNETV1050 IP PHONE PROCESSOR. description

«Real Time Embedded systems» Cyclone V SOC - FPGA

WS_CCESSH5-OUT-v1.01.doc Page 1 of 7

DaVinci. DaVinci Processor CPU MHz

INTERNET PROTOCOL SECURITY (IPSEC) GUIDE.

Classification of Semiconductor LSI

OCP Engineering Workshop - Telco

Developing and Integrating FPGA Co-processors with the Tic6x Family of DSP Processors

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02

FPQ6 - MPC8313E implementation

Leveraging Data Plane Acceleration Techniques on the QorIQ P4080 Processor

Advanced Embedded Systems

Product Technical Brief S3C2440X Series Rev 2.0, Oct. 2003

2008/12/23. System Arch 2008 (Fire Tom Wada) 1

LEON4: Fourth Generation of the LEON Processor

Programmable Logic Design Grzegorz Budzyń Lecture. 15: Advanced hardware in FPGA structures

TMS320C6678 Memory Access Performance

Designing Embedded Processors in FPGAs

AT-501 Cortex-A5 System On Module Product Brief

pcduino V3B XC4350 User Manual

Digital Signal Processor 2010/1/4

Gedae cwcembedded.com. The CHAMP-AV6 VPX-REDI. Digital Signal Processing Card. Maximizing Performance with Minimal Porting Effort

Blackfin ADSP-BF533 External Bus Interface Unit (EBIU)

HyperLink Programming and Performance consideration

Place Your Logo Here. K. Charles Janac

Tutorial Introduction

M7: Next Generation SPARC. Hotchips 26 August 12, Stephen Phillips Senior Director, SPARC Architecture Oracle

参考資料. LinkSprite.com. pcduino V2

Sparrowhawk FX FPGA Video Processing Board

Product Technical Brief S3C2416 May 2008

Building heterogeneous networks of green base stations on TI s KeyStone II architecture

OpenMP Accelerator Model for TI s Keystone DSP+ARM Devices. SC13, Denver, CO Eric Stotzer Ajay Jayaraj

1. Overview for Cyclone V Device Family

FPQ9 - MPC8360E implementation

Cyclone V Device Overview

AVR XMEGA Product Line Introduction AVR XMEGA TM. Product Introduction.

CONTACT: ,

Lesson 6 Intel Galileo and Edison Prototype Development Platforms. Chapter-8 L06: "Internet of Things ", Raj Kamal, Publs.: McGraw-Hill Education

Speeding AM335x Programmable Realtime Unit (PRU) Application Development Through Improved Debug Tools

Interconnects, Memory, GPIO

Freescale, the Freescale logo, AltiVec, C-5, CodeTEST, CodeWarrior, ColdFire, ColdFire+, C- Ware, the Energy Efficient Solutions logo, Kinetis,

Introduction to Pre-Boot Loader Supported by QorIQ Processors

Integrating DMA capabilities into BLIS for on-chip data movement. Devangi Parikh Ilya Polkovnichenko Francisco Igual Peña Murtaza Ali

Hercules ARM Cortex -R4 System Architecture. Processor Overview

Copyright 2017 Xilinx.

KeyStone II. CorePac Overview

QorIQ P4080 Processor Pre-Boot Loader Image Tool

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan

Cyclone V Device Overview

Security IP-Cores. AES Encryption & decryption RSA Public Key Crypto System H-MAC SHA1 Authentication & Hashing. l e a d i n g t h e w a y

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki

Transcription:

KeyStone C66x ulticore SoC Overview ulticore Applications Team

KeyStone Overview KeyStone Architecture & Internal Communications and Transport External Interfaces and s Debug iscellaneous Application and Device specific

Enhanced DSP Core C66x ISA Perform mance im mprovem ment C67x Native instructions for IEEE 754, SP&DP Advanced VLIW architecture C67x+ 2x registers Enhanced floatingpoint add capabilities 100% upward object code compatible 4x performance improvement for multiply operation 32 16-bit ACs Improved support for complex arithmetic and matrix computation C674x 100% upward object code compatible with C64x, C64x+, C67xand c67x+ Best of fixed-point and floating-point architecture for better system performance and faster time-to-market. C64x+ SPLOOP and 16-bit instructions for smaller code size Flexible level one memory architecture ida for rapid data transfers between local memories C64x Advanced fixed- point instructions Four 16-bit or eight 8-bit ACs Two-level cache FLOATING-POINT VALUE FIXED-POINT VALUE Preliminary Information under NDA - subject to change

KeyStone Device Architecture Application-Specific iscellaneous External Interfaces

iscellaneous L1P L1D L2 emory Cache/RA 1 to 8 Cores @ up to 1.25 GHz External Interfaces Application-Specific 1 to 8 C66x DSP Cores operating at up to 1.25 GHz Fixed and floating point operations Code compatible with other C64x+ and C67x+ devices L1 emory Can be partitioned as cache and/or RA 32KB L1P per core 32KB L1D per core Error dt detection ti for L1P emory protection Dedicated L2 emory Can be partitioned as cache and/or RA 512 KB to 1 B Local L2 per core Error detection and correction for all L2 memory Direct connection to memory subsystem

iscellaneous S SRA L1P L1D L2 emory Cache/RA 1 to 8 Cores @ up to 1.25 GHz External Interfaces Application-Specific ulticore Shared emory (S SRA) 1 to 4 B Available to all cores Can contain program and data All devices except C6654 ulticore Shared emory Controller () Arbitrates access of and SoC masters to shared memory Provides a connection to the Provides access to coprocessors and IO peripherals Provides error detection and correction for all shared memory emory protection and address extension to 64 GB (36 bits) Provides multi stream pre fetching capability DDR3 External emory Interface (EIF) Support for 16 bit, 32 bit, and (for C667x devices) 64 bit modes Specified at up to 1600 T/s Supports power down of unused pins when using 16 bit or32 bit width Support for 8 GB memory address Error detection and correction

iscellaneous S SRA L1P L1D L2 emory Cache/RA 1 to 8 Cores @ up to 1.25 GHz External Interfaces Application-Specific Queue anager DA Provides seamless inter core communications (messages and data exchanges) between cores, IP, and peripherals. Fire and forget Low overhead processing and routing of packet traffic to and from peripherals and cores Supports dynamic load optimization Data transfer architecture designed to minimize host interaction while maximizing memory and bus efficiency Consists of a Queue anager Subsystem (QSS) and multiple, dedicated DA engines

Architecture

(C667x) iscellaneous S SRA L1P L1D L2 emory Cache/RA 1 to 8 Cores @ up to 1.25 GHz External Interfaces Application-Specific Provides hardware accelerators to perform L2, L3, and L4 processing and encryption that was previously done in software (PA) 8K multiple in, multiple out HW queues Single IP address option UDP (and TCP) checksum and selected CRCs L2/L3/L4 support Quality of Service (QoS) Queue anager DA ulticast to multiple queues Timestamps Security Security (SA) Hardware encryption, decryption, and Sauthentication Supports IPsec ESP, IPsec AH, SRTP, and 3GPP protocols Ethernet t SGII

External Interfaces iscellaneous Application Specific I/O GPIO I C 2 S SRA L1P L1D L2 emory Cache/RA 1 to 8 Cores @ up to 1.25 GHz PCIe UART SPI Application Specific I/O SRIO x4 Ethernet t Application-Specific Queue anager DA Security 2x SGII ports support 10/100/1000 Ethernet 4x high bandwidth Serial RapidIO (SRIO) lanes for inter DSP applications SPI for boot operations UART for development/testing 2x PCIe at 5 Gbps I2C for EPRO at 400 Kbps GPIO Application specific Interfaces SGII

Fabric iscellaneous Application Specific I/O GPIO I C 2 S SRA L1P L1D L2 emory Cache/RA 1 to 8 Cores @ up to 1.25 GHz PCIe UART SPI Application Specific I/O SRIO x4 Ethernet t SGII Application-Specific Queue anager DA Security A non blocking switch fabric that enables fast and contention free internal data movement Provides a configured way within hardware to manage traffic queues and ensure priority jobs are getting accomplished while minimizing the involvement of the cores Facilitates high bandwidth communications between cores, subsystems, peripherals, and memory

Data Connections S TPCC 16ch QDA EDA_0 SRIO TC0 TC1 Network Coprocessor TPCC TC2 64ch TPCC TC3 TC6 TC4 TC7 QDA 64ch TC5 TC8 QDA TC9 EDA_1,2 TAC_FE RAC_BE0,1 RAC_BE0,1 FFTC / PktDA FFTC / PktDA AIF / PktDA CPUCLK K/2 256bit Ter ranet CPUCLK/ /3 12 28bit Tera anet S DDR3 S Shared L2 S S S S XC S L2 Core Core Core 0 3 S SRIO S TCP3e 3e_ W/R S S TCP3d TCP3d S TAC_BE S RAC_FE S RAC_FE S S VCP2 VCP2 (x4) S VCP2 (x4) S VCP2 (x4) (x4) DDR3 Facilitates high bandwidth communication links between DSP cores, subsystems, peripherals, and memories. Supports parallel orthogonal communication links QSS PCIe DebugSS S S QSS PCIe

Diagnostic Enhancements Debug/Trace iscellaneous Application Specific I/O GPIO I C 2 S SRA L1P L1D L2 emory Cache/RA 1 to 8 Cores @ up to 1.25 GHz PCIe UART SPI Application Specific I/O SRIO x4 Ethernet t Application-Specific Queue anager DA Security Embedded Trace Buffers (ETB) enhance the diagnostic capabilities of the. CP onitor enables diagnostic capabilities on data traffic through the switch fabric. Automatic statistics collection and exporting (non intrusive) onitor individual events for better debugging onitor transactions to both memory end point and emory apped Registers (R) Configurable monitor filtering capability based on address and transaction type SGII

Bus Debug/Trace S SRA Application-Specific Provides the capability to expand the device to include hardware acceleration or other auxiliary processors Supports four lanes with up to 12.5 Gbaud per lane L1P L1D L2 emory Cache/RA iscellaneous 1 to 8 Cores @ up to 1.25 GHz Queue anager DA Application Specific I/O GPIO I C 2 PCIe UART SPI Application Specific I/O SRIO x4 Ethernet t Security SGII

iscellaneous Elements Debug/Trace Boot RO Semaphore Power anagement PLL EDA x3 x3 S SRA L1P L1D L2 emory Cache/RA 1 to 8 Cores @ up to 1.25 GHz Application-Specific Queue anager DA Boot RO Semaphore module provides atomic access to shared chiplevel resources. Power anagement Three on chip PLLs: PLL1 for s, except PLL2 for DDR3 PLL3 for Acceleration Three EDA controllers Eight 64 bit timers Inter Processor Communication (IPC) Registers Application Specific I/O GPIO I C 2 PCIe UART SPI Application Specific I/O SRIO x4 Ethernet t Security SGII

App Specific: Wireless Applications 64-Bit Debug/Trace Boot RO Semaphore Power anagement PLL EDA x3 x3 GPIO IC 2 2B S SRA RSA 32KB L1P 32KB L1D 1024KB L2 Cache/RA RSA 4 Cores @ 1.0 GHz / 1.2 GHz PCIe UART SPI AIF2 x6 SRIO x4 t C6670 Ethernet VCP2 TCP3d TCP3e FFTC BCP x4 Queue anager DA Security Wireless specific coprocessors: 2x FFT Coprocessor (FFTC) Turbo Decoder/Encoder Coprocessor (TCP3D/3E) 4x Viterbi Coprocessor (VCP2) Bit rate Coprocessor (BCP) 2x Rake Search (RSA) Wireless specific Interfaces: 6x Antenna Interface 2 (AIF2) SGII

App Specific: General Purpose 64-Bit Debug/Trace Boot RO Semaphore Power anagement PLL EDA x3 x3 4B S SRA 32KB L1P 32KB L1D 512KB L2 Cache/RA 1 to 8 Cores @ up to 1.25 GHz C6671/C6672 C6674/C6678 Queue anager DA General Purpose Application Interfaces: 2x Telecommunications Serial Port (TSIP) EIF 16 (EIF A) : Connects memory up to 256 B Three modes: Synchronized SRA NAND flash NOR flash EIF 16 GPIO IC 2 PCIe UART SPI TSIP SRIO x4 t Ethernet Security SGII

Low Power Low Cost KeyStone C665x Sub family

KeyStone C6655/57: Device Features C66x C6655: One C66x DSP Core at 1.0 or 1.25 GHz C6657: Two C66x DSP Cores at 0.85, 1.0, or 1.25 GHz Fixed and Floating Point Operations Backward-compatible with C64x+ and C67x+ cores 1 B Local L2 memory per core ulticore Shared emory Controller () 32-bit DDR3 Interface Hardware Turbo Coprocessor Decoder (TCP3d) 2x Viterbi (VCP2) Interfaces High-speed Hyperlink bus One 10/100/1000 Ethernet SGII port 4x Serial RapidIO (SRIO) Rev 2.1 2x PCIe Gen2 2x ultichannel Buffered Serial Ports (cbsp) One Asynchronous emory Interface (EIF16) Additional Serials: SPI, I 2 C, UPP, GPIO, UART 32-Bit Debug/Trace Boot RO Semaphore Timers Security / Key anager Power anagement PLL EDA EIF16 GPIO UPP 1B S SRA 2nd core, C6657 only 32KB L1P 32KB L1D 1024KB L2 Cache 1 or 2 Cores @ up to 1.25 GHz I 2 C UART SPI cbsp PCIe SRIO x4 Ethernet AC SGII C6655/57 TCP3d VCP2 Queue anager DA

KeyStone C6654: Power Optimized C66x C6654: One DSP Core at 850 Hz Fixed and Floating Point Operations Backward compatible with C64x+ and C67x+ cores 1 B Local L2 memory ulticore Shared emory Controller () 32-bit DDR3 Interface Interfaces One 10/100/1000 Ethernet SGII port 2x PCIe Gen2 2x ultichannel Buffered Serial Ports (cbsp) One Asynchronous emory Interface (EIF16) Additional Serials: SPI, I 2 C, UPP, GPIO, UART 32-Bit Debug/Trace Boot RO Semaphore Timers Security / Key anager Power anagement PLL EDA EIF16 GPIO 32KB L1P 32KB L1D 1024KB L2 Cache UPP 1 Core @ 850 Hz I 2 C UART SPI cbsp PCIe Ethernet AC C6654 Queue anager DA SGII

KeyStone C665x: Key HW Variations HW Feature C6654 C6655 C6657 Frequency (GHz) 0.85 1 @ 1.0, 1.25 2 @ 0.85, 1.0, 1.25 ulticore Shared emory (S) No 1024KB SRA DDR3 aximum Data Rate 1066 1333 Serial Rapid I/O Lanes No 4x No Yes Viterbi Coprocessor (VCP) No 2x Turbo Coprocessor Decoder (TCP3d) No Yes Network CoProcessor (NETCP) No No

For ore Information For more information, refer to the C66x Getting Started page pg to locate the data manual for your KeyStone device. View the complete C66x ulticore SOC Online Training for KeyStone Devices, including details on theindividual modules. For questions regarding topics covered in this training, visit the support forums at the TI E2E Community website.