SPEAr: an HW/SW reconfigurable multi processor architecture

Similar documents
New System Solutions for Laser Printer Applications by Oreste Emanuele Zagano STMicroelectronics

Product Technical Brief S3C2416 May 2008

HotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla.

Effective System Design with ARM System IP

Interconnects, Memory, GPIO

Fujitsu SOC Fujitsu Microelectronics America, Inc.

Zynq-7000 All Programmable SoC Product Overview

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info.

Product Technical Brief S3C2412 Rev 2.2, Apr. 2006

Test and Verification Solutions. ARM Based SOC Design and Verification

Product Technical Brief S3C2413 Rev 2.2, Apr. 2006

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02

Copyright 2016 Xilinx

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006

Techniques for Optimizing Performance and Energy Consumption: Results of a Case Study on an ARM9 Platform

STM32F429 Overview. Steve Miller STMicroelectronics, MMS Applications Team October 26 th 2015

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

AT-501 Cortex-A5 System On Module Product Brief

Cannon Mountain Dr Longmont, CO LS6410 Hardware Design Perspective

DaVinci. DaVinci Processor CPU MHz

Course Introduction. Purpose: Objectives: Content: Learning Time:

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

Agilent N2533A RMP 4.0 Remote Management Processor Data Sheet

S2C K7 Prodigy Logic Module Series

New STM32 F7 Series. World s 1 st to market, ARM Cortex -M7 based 32-bit MCU

Designing Embedded Processors in FPGAs

Platform-based Design

Software Defined Modem A commercial platform for wireless handsets

SoC Platforms and CPU Cores

Zatara Series ARM ASSP High-Performance 32-bit Solution for Secure Transactions

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

SAM A5 ARM Cortex - A5 MPUs

The World Leader in High Performance Signal Processing Solutions. DSP Processors

System-on-a-Programmable-Chip (SOPC) Development Board

Systems in Silicon. Converting Élan SC400/410 Design to Élan SC520

ARMed for Automotive. Table of Contents. SHARP and ARM Automotive Segments SHARP Target Applications SHARP Devices SHARP Support Network Summary

MYD-IMX28X Development Board

LPC4370FET256. Features and benefits

RZ Embedded Microprocessors

SoC FPGAs. Your User-Customizable System on Chip Altera Corporation Public

Product Series SoC Solutions Product Series 2016

STM32F7 series ARM Cortex -M7 powered Releasing your creativity

Military Grade SmartFusion Customizable System-on-Chip (csoc)

ARM Processors for Embedded Applications

Processor and Peripheral IP Cores for Microcontrollers in Embedded Space Applications

KeyStone C665x Multicore SoC

Silicon Motion s Graphics Display SoCs

Single-chip PLC for STEP7from Siemens

MYD-IMX28X Development Board

Introduction to ARM LPC2148 Microcontroller

Copyright 2014 Xilinx

An H.264/AVC Main Profile Video Decoder Accelerator in a Multimedia SOC Platform

A software platform to support dynamically reconfigurable Systems-on-Chip under the GNU/Linux operating system

UM LPC3180 User Manual. Document information. LPC3180; ARM9; 16/32-bit ARM microcontroller User manual for LPC3180

Product Technical Brief S3C2440X Series Rev 2.0, Oct. 2003

MYC-C7Z010/20 CPU Module

SoC Design Lecture 9: Platform Based Design. Shaahin Hessabi Department of Computer Engineering

Zynq Architecture, PS (ARM) and PL

ECE 111 ECE 111. Advanced Digital Design. Advanced Digital Design Winter, Sujit Dey. Sujit Dey. ECE Department UC San Diego

Hardware Software Bring-Up Solutions for ARM v7/v8-based Designs. August 2015

SH-MobileG1: A Single-Chip Application and Dual-mode Baseband Processor

Hi3536 H.265 Decoder Processor. Brief Data Sheet. Issue 03. Date

«Real Time Embedded systems» Cyclone V SOC - FPGA

STM32F7 series ARM Cortex -M7 powered Releasing your creativity

DSP Solutions For High Quality Video Systems. Todd Hiers Texas Instruments

Introducing the Spartan-6 & Virtex-6 FPGA Embedded Kits

Contents of this presentation: Some words about the ARM company

Design Techniques for Implementing an 800MHz ARM v5 Core for Foundry-Based SoC Integration. Faraday Technology Corp.

STM32 F-2 series High-performance Cortex-M3 MCUs

FPGA Adaptive Software Debug and Performance Analysis

Oberon M2M IoT Platform. JAN 2016

ARM-Based Embedded Processor Device Overview

Embedded MPU with dual ARM926 core, flexible memory support, powerful connectivity features and programmable LCD interface.

MYD-C7Z010/20 Development Board

T he key to building a presence in a new market

100M Gate Designs in FPGAs

SPEAr: A Customozable MPSoC for Multiprocessor Applications

L2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA

The S6000 Family of Processors

PRODUCT PREVIEW TNETV1050 IP PHONE PROCESSOR. description

Growth outside Cell Phone Applications

Excalibur Device Overview

LEON4: Fourth Generation of the LEON Processor

Next Generation Multi-Purpose Microprocessor

The Challenges of System Design. Raising Performance and Reducing Power Consumption

Does FPGA-based prototyping really have to be this difficult?

Veloce2 the Enterprise Verification Platform. Simon Chen Emulation Business Development Director Mentor Graphics

Software Driven Verification at SoC Level. Perspec System Verifier Overview

NS9210/NS9215. Overview. Block Diagram. NS µ CMOS, 265-pin BGA. Features/Benefits. Platforms and Services.

Embedded Systems: Architecture

DevKit7000 Evaluation Kit

Development of Low Power and High Performance Application Processor (T6G) for Multimedia Mobile Applications

PRU Hardware Overview. Building Blocks for PRU Development: Module 1

Introduction to Sitara AM437x Processors

A ONE CHIP HARDENED SOLUTION FOR HIGH SPEED SPACEWIRE SYSTEM IMPLEMENTATIONS

Introduction to ASIC Design

AVR XMEGA Product Line Introduction AVR XMEGA TM. Product Introduction.

Introduction to LEON3, GRLIB

Getting the Most out of Advanced ARM IP. ARM Technology Symposia November 2013

ID 730L: Getting Started with Multimedia Programming on Linux on SH7724

Transcription:

Welcome to the «SPEAr Age» Structured Processor Enhanced Architecture SPEAr: an HW/SW reconfigurable multi processor architecture COMPUTER PERIPHERAL GROUP

Outline Economics of Moore s law and market view Spear TM architecture From single core to dual core Development flow Application examples Conclusions

Moore s s laws Moore s laws: every 18 months chip complexity doubles cost of fabs will double from one generation to the next 100,000,000 10,000,000 1,000,000 100,000 10,000 1,000 100 10 1 transistors/chip M$ GOOD, BUT... 1970 1980 1990 2000 2010 Complexity Costs

The reality Die size almost constant with new technologies more functions integrated NREs more than double from one generation to the next Cost of masks double from one generation to the next Development costs increase exponentially Revenues for each ASIC is not growing like investments 61% of designs requires at least one respin

The Complexity is increasing Once upon a time there was the Single Function printer, only capable to print a document from a PC It then became a 3-1 Multifuction, capable of printing, scanning and faxing documents It further evolved into a 4-1 Multifuction, capable of printing high quality pictures, working as a stand alone photocopier and with networking capabilities It finally became an Easy-to-use stand alone printing tool for your pictures and small businesses are proliferating

The economics are changing HW cost is predominant SW cost is predominant Few new models per / year Market Trend Many new models per / year 8 7.5 7 Asic Cost / Socket 6.5 6 5.5 5 Need lower NRE! ASIC Cost 1 2 3 4 5 6 7 8 9 10 Yearly MU / Socket ASIC Model works well but proprietary technologies continue to be a differentiation factor Is the Full Asic a sustainable model?

What the market is looking for A good, standard and powerful architecture based on standard IPs easy to reuse easy to migrate to different technology and different silicon vendors Easy access to leading edge technologies lower power consumption higher speeds more functions Fast turn around time market is volatile, pervasion of new technologies is very rapid Simple and early system development the ASIC or the IC is just a small part of the system SW is playing a key role

SPEAr concept PreFabricated Top Layers Metal Info Via Info Metal Info PreFrabricated Base Layers Configurable SoC LB FF With Customization SPEAr change includes of few is set of done mask logic through layers, functions, selective additional which are mask Customer arranged layers IPs which can as are be an different added array to the for base each SPEAr design LB LB device FF Pre-Fabricated set of logic functions Customized selected few routing tracks FF via

SPEAr TM : The architecture

Setting the stage SPEArHead200: single processor SPEArHead200 200kgate ARM926@266MHz BGA420 Introducing SPEArHead200 - the first Configurable SOC Based on a robust state of the art architecture, IP selection and core (ARM9 266Mhz supporting MMU) Including 200 Kgates of configurable logic Introducing a new development methodology Dramatically reducing cycle times 6 weeks vs. 6 months for silicon re-spins Dramatically reducing NRE Typically 1/5 of a full asic NRE First Silicon since Q4/2005

State of the Art Architecture & IPs Vectorized Interrupt MII 10/100 DMA MII USB PHY USB PHY UDC 2.0 UHOSTC 2.0 USB Plug Detect UDC AHB UHOSTC AHB USB PHY UHOSTC 2.0 UHOSTC AHB 32 bit AHB Bus AHB 1 ARM926EJ-S -I Mcache 32K -Dcache 16K -ETM9 -Coproc. Timer GPIO Uart APB Int ctr AHB 2 AHB 3 AHB 4 AHB 5 AHB 6 MPMC SDRAM/ DDR EBI 200Kgates AHB Bus Matrix Serial Flash APB bridge 32 bit APB Bus UART0 (IrDA) Misc Reg UART RTC GPT GPIOs WDT I2C PLL2 cfg PLL1 cfg ADC

Configurable logic Vectorized Interrupt MII 10/100 DMA MII USB PHY USB PHY UDC 2.0 UHOSTC 2.0 USB Plug Detect UDC AHB UHOSTC AHB USB PHY UHOSTC 2.0 UHOSTC AHB 32 bit AHB Bus AHB 1 ARM926EJ-S -I cache 32K -Dcache 16K -ETM9 -Coproc. Timer GPIO Uart APB Int ctr AHB 2 AHB 3 AHB 4 AHB 5 AHB 6 MPMC SDRAM/ DDR EBI Additional 4 KB SRAM Programmable Logic 200Kgates Additional 4KB SRAM Additional 4 KB SRAM Additional 4 KB SRAM gdma Master Slave #1 Slave #2 M AHB Bus Matrix Serial Flash APB bridge 32 bit APB Bus UART0 (IrDA) Misc Reg UART RTC GPT GPIOs WDT I2C PLL2 cfg PLL1 cfg ADC

Support for System Development Vectorized Interrupt MII 10/100 DMA MII USB PHY USB PHY UDC 2.0 UHOSTC 2.0 USB Plug Detect UDC AHB UHOSTC AHB USB PHY UHOSTC 2.0 UHOSTC AHB 32 bit AHB Bus AHB 1 AHB SubSystem M + S M ARM926EJ-S -I Mcache 32K -Dcache 16K -ETM9 -Coproc. Timer GPIO Uart APB Int ctr AHB 2 AHB 3 AHB 4 AHB 5 AHB 6 MPMC SDRAM/ DDR EBI GP I/O Additional 4 KB SRAM Programmable Logic 200Kgates Additional 4KB SRAM Additional 4 KB SRAM Additional 4 KB SRAM gdma Master Slave #1 Slave #2 M AHB Bus Matrix Serial Flash APB bridge 32 bit APB Bus UART0 (IrDA) Misc Reg UART RTC GPT GPIOs WDT I2C PLL2 cfg PLL1 cfg ADC

Innovative System Development Fpga Top Level (ST) Configuration pad set to development mode M S AHB DMA IT C FPGA M S I/O Custo mer Logic M S DMA Port Ctrl GP I/O Port Ctrl S DMA M INT Config. Logic INT RTL Design (Customer) Once the application is working the RTL is extracted from the FPGA and it is converted into config. logics

SPEAr Development Board FPGA Xilinx XC3S4000 Flash For FPGA Serial Flashes Up to 32 MB SPEARHEAD 32 Mbyte DDR Ethernet PHY It comes equipped with a suite of drivers for all the peripherals Supporting Linux and WxWorks Operating Systems Supporting simple JTAg based emulators

Supporting Multiple Interfaces Expansion Lines (120 I/Os) 3 Serial I/F FPGA JTAG I/F 2 USB 2.0 Host Ports HS Capable 1 USB 2.0 Device Ethernet 10/100 ETM9 I/F 24 bit JTAG I/F 16 ADC Channels 6 GPI/O Lines

FW/SW Partitioning ARM core, equipped with Linux (core 2.6.10 Montavista) or VxWorks (extension to Windows CE in evaluation) OS Connectivity Application layers Real time operations Customizable logic for hardware accelerators & specific IPs Network driver USB dev driver IrDA driver UART driver UI driver Video driver ARM μp Linux embedded kernel Job Manager/ Linux Applications USB host driver Scanner driver FAX Print driver driver Image pipeline driver Application Kernel Driver Customizable logic Hardware AFE interface CUSTOMIZABLE LOGIC FAX interface Glue logic Print Head driver Print Engine Step & DC motors driver Image Pipeline support

Moving to dual processor: SPEArPlus600 SPEArHead200 200kgate ARM926@266MHz BGA420 SPEArPlus600 600kgate Dual Arm926 (333MH MHz) XGA Res. BGA420 SPEArHead600 600kgate ARM926@333MHz XGA Res. BGA420 Addressing market segmentation Higher Computation Power with lower power consumption Second Arm to boost performance Easier to use than a dedicated DSP Backward compatible SPEArPlus silicon in Q1/2007 Full reference platform for Printers Expanding into side applications (NAS) SPEArBasic 300kgate ARM926@333MHz BGA196

SPEArPlus600 32KB ROM JPEG accelerator XGADisplay ctrl Vectorized Interrupt MII 10/100/1000 DMA MII USB PHY USB PHY UDC 2.0 UHOSTC 2.0 USB Plug Detect UDC AHB UHOSTC AHB USB PHY UHOSTC 2.0 UHOSTC AHB 32 bit AHB Bus AHB 1 AHB SubSystem M + S M ARM926EJ-S -I Mcache 16K -Dcache 16K -ETM9 -Coproc. Timer GPIO Uart APB Int ctr ARM926EJ-S -I Mcache 16K -Dcache 16K -ETM9 -Coproc. Timer GPIO Uart APB Int ctr AHB 2 AHB 3 AHB 4 AHB 5 AHB 6 MPMC DDR1/2 EBI Up to 120 GP I/O Additional 32 KB SRAM Programmable Logic 600Kgate Additional 32KB SRAM Additional 32 KB SRAM Additional 32 KB SRAM gdma Master Slave #1 Slave #2 M AHB Bus Matrix NAND Flash Serial Flash APB bridge 32 bit APB Bus UART0 (IrDA) SPI UART IrDA UART RTC GPT GPIOs WDT I2C PLL2 cfg PLL1 cfg LVDS ADC

SPEAr PLUS 90nm technology OSCI 32KHz Main highlights: 600 Kgates progr logic array 2 ARM926EJS @333MHz Dithered Pll 1 USB2.0 device 2 USB2.0 host DDR1+DDR2 COMBO PHY CMOS SRC 3.3V, LVDS 2.5V 2 Oscillators ADC 8 channels, 10bits Package: PBGA 420 23x23 OSCI 24MHz 600 Kgates lightspeed programm. logic array 3 USB2 PHYs USB PLL DITH PLL ARM 926 ARM 926 DDR PLL

Technical Characteristics Dual 333Mhz ARM926 16k+16k Architecture Potential TCM extension (up to 32k+32k) 720 (2 x 360) Mips Drystone 2.1 600kgate of configurable logic From 1 to 4 Metals needed to be specialized Ethernet GMAC + USB2.0 Triphy, Timers, RTC, 2 UARTs, 3SPI, IRDA,I 2 C USB: 1 Device port + 2 Host ports Jpeg accelerator, Display controller XGA 1024x768, color TFT panel 4 High Performance Memory controller SDRAM, DDR1/2, Serial Flash, Nand Flash 32bit bus @ 166MHz, total 664 MBytes/Sec 136 KB SRAM & 32KB ROM High Performance System Architecture High speed I/Os overall throughput: ~ 305MByte/sec Four channel DMA 32bit @ 166Mhz: ~ 664MByte/sec Package PBGA 420 balls, 23x23mm, 1 mm pitch

128KB+8KB Internal SRAM memory 32KB 96KB Single Port 32KB 8KB 4 blocks 8 blocks 4KB 32KB 32KB Dual Port 16 blocks 2KB v 8 blocks 4 blocks 2KB 4KB Dual Port memory can be increased using Single Memory Port in arbitrate mode. Maximum flexibility in grouping memory blocks e.g. 10KB= 1 block 8KB+1block 2KB

Multi Core FW/SW Partitioning ARM 1, equipped with Linux (core 2.6.10 Montavista), is used to run: Job Manager & Connectivity ARM2, equipped with RTOS, is used to run: Image pipeline & Real time operations Network driver ARM μp USB host driver USB dev driver Linux embedded kernel IrDA driver Scanner driver UART driver UI driver Job Manager/ Linux Applications FAX driver Video driver Inter Processor Communication ARM μp Print driver RTOS Image Algorithms Library Image Pipeline Real Time Application Image pipeline driver Application Kernel Driver Customizable logic Hardware AFE interface CUSTOMIZABLE LOGIC FAX interface Glue logic Print Head driver Print Engine Step & DC motors driver Image Pipeline support Image Pipeline support Printing Preparation support Customizable logic for hardware accelerators

Application examples: MPEG4 decoding

Case study: MPEG4 decoder Full SW solution Maximum performances on one ARM926 18fps if ARM running at 266MHz 22fps if ARM running at 333Mhz CPU load for a MPEG4 stream simple profile, lev.0-3

MPEG4 decoder HW/SW solution HW accelerator implemented in programmable logic to accelerate the more complex functions of the SW decoder size of the accelerator is 150kgates + 15kB SRAM running at 66MHz load of CPU is <10MHz VGA resolution 30 fps at 333MHz ARM926 Programmable logic

Possible SoC resources usage MPEG4 dec. 18fps MPEG4 dec. 30fps Free for any usage 100% Non additional logic needed Free for any usage 5% 150 kgate ARM1 ARM2 User logic ARM1 ARM2 user logic Full SW solution (ARM running at 266Mhz) HW/SW solution

Application examples: FAX function

FAX Soft Modem Linux Driver Class 1 FAX Software Modem Class 1 Both running on ARM926 DTE DCE FAX APPLICATION Image conversion, compression - ITU-T T.4/T.6 MH, MR (G3), MMR (G4), JBIG (ITU-T T.82), JPEG (ITU-T T.81) User interface Scanning and printing Manage FAX session (ITU-T.30) Exchange and agrees with remote FAX the printing capabilities Exchange and agrees with remote FAX the session and transmission capabilities (protocol, speed, ECM,...) Send / Receive data file in blocks / packets Implement retransmission of bad packets (ECM option) Interface between Application and Driver with AT CMD (Class 1) ITU-T T.31 FAX (Soft) Modem DRIVER - (Class 1) Linux serial driver interface AT CMD interpreter (ITU-T T.31) Modem state machine Connection protocol (ITU-T V.21) with remote FAX Data Modulation (DATA PUMP) V27ter, V.29, V.17, V.34 H/D Interface with HW driver (STLC7550)

Possible SoC resources usage Free for any usage 15% FAX 20kg 5% MPEG4 30fps kgate 150 ARM1 ARM2 user logic (ARM running at 266Mhz)

Conclusions SpearPlus architecture opens the door to unprecedented flexibility HW configurability dual core SW configurability Symmetrical dual core architecture for maximum flexibility Possibility to map different OS Possibility to access all the peripherals from the 2 cores Possibility to access programmable logic from the 2 cores HW flexibility configurable logic on board possibility to use embedded SRAm as TCM of ARM cores

Moving to dual processor Support for System Development Standard IPs & Special IOs: connectivity and interactions ARM926EJ-S M Timer -I cache 16KGPIO -Dcache 16K Uart APB -ETM9 Int ctr -Coproc. Programmable Logic ARM926EJ-S M Timer -I cache 16KGPIO -Dcache 16KUart APB -ETM9 Int ctr -Coproc. State of Art Architecture Addressing market segmentation Higher Computation Power with lower power consumption Second Arm to boost performance Easier to use than a dedicated DSP Backward compatible Full reference platform for Printers Expanding into side applications