Putting MPSOC to Work in Multimedia

Size: px
Start display at page:

Download "Putting MPSOC to Work in Multimedia"

Transcription

1 Putting MPSOC to Work in Multimedia Six billion people want live multimedia entertainment and information anywhere and anytime at the lowest cost 1 1. Multimedia subsystems appear everywhere big market 2. Technical demands require new tools, architectures and business models 3. Diversity within multimedia mirrors diversity of entire MPSOC challenge

2 MPSOC Becomes Mainstream Style The Transformation of the Modern Mobile Device Key complementary processing functions of mobile devices covered by Tensilica configurable processors Mobile WiFi Camera Image Processing User Authentication HD Radio Satellite Radio Bluetooth Audio Codec Video Codec Noise Cancellation DTV Reception Baseband DSP GPS 2

3 Software Rules Network of Partners and Customers Defines Strength Audio Complete The designer never has Multimedia to manually add these instructio Video Imaging Solutions 3 Concept Embedded Systems PnpNetwork Technologies Mobile Media Receiver AFA TECHNOLOGY NUFRONT

4 Design Teams Need Different Entry Points Key Question: Who Specifies Processor? Data Processing Needs Hard-wired logic HiFi2 Audio DSP μcontroller 108 Mini Low-power Controller 385VDO Video Decode 381 VDO Video Decode 545CK High-end DSP 330HiFi Audio Xtensa LX 388VDO Video Codec 383VDO Video Codec 212GP Mid-range Controller/ DSP 570T High-end CPU Xtensa 232L Linux CPU RISC CPU Xtensa configurable processors widely used in multimedia Standard cores introduced February 2006: Broad line of compatible controller, CPU and DSP cores High performance generalpurpose DSP Complete low-power HiFi2 audio solution Video spans wide range of mobile media needs Complex Computing Needs 4

5 5 Capability gcc GNU tools Automation Enables Broad Product Line xcc C/C++ Code Development XTMP System Simulation API Xtensa Xtensa Xtensa Xtensa II III IV V Xtensa Xtensa Xplorer Development Environment Seamless Xtensa LX XPRES Compiler CoWare Xtensa 6 Xtensa Xplorer Development Environment XTSC SystemC Simulation TurboXim Xtensa LX2 Xtensa 7 Energy Modeling Video DSP Audio Ctrlrs Next Xtensa Xplorer Next Modeling MP SW tools Processor Synthesis - next Xtensa LX-next next Video next next Audio DSP Xtensa next Ctrlr - next Software development environment Processor and system modeling Automatic processor synthesis High-performance extensible processors Applicationspecific DSPs Mainstream extensible processors Standard controllers time

6 Multimedia Design is Energy-Based Design Example: Xenergy Energy Explorer Processor Description Base CPU Apps Datapaths Cache Extended Registers OCD Timer FPU Application Code int main( ) { int i; i; short c[100]; for (i=0;i<n/2;i++) { Xtensa xcc C/C++ compiler Xenergy Energy Estimator 6 Tune processor instruction set and memories Complete Dynamic Energy/Power Profile Processor + Memory Tune software algorithms and coding

7 Application-Specific Processors Example: HiFi 2 Audio Engine Pre-configured VLIW audio DSP with dual 24-bit MACs for higher quality audio Already in use in dozens of chip designs, licensed by many top semiconductor suppliers Available 2 ways: Xtensa HiFi 2 Audio Engine for further customization Standard 330HiFi audio core Delivers both low power and scalability <75K gates for full-feature processor Example: MP3 decode at 5.7 MHz (core + memory <0.75mW in 65nm) Emerging trend: multiple instances of HiFi2 on same chip, tuned for different mw vs. MHz Extended Fetch/ Decode SLOT 1 SLOT 0 I-RAM I-Cache Queue I/F Audio Reg File Audio ALU Audio MAC 24b Misc. Audio Functions Reg File ALU Branch Unit VLC Accel. External I/F Bridge (opt.) Load/Store D-CACHE D-RAM 1 D-RAM 2 7

8 Sample of HiFi2 Audio Software 8 Currently available: MP3 decode MP3 encode WMA decode WMA encode AAC-LC decode AAC-LC encode aacplus v2 decode aacplus v2 encode aacplus v1 decode aacplus v1 encode Ogg Vorbis Decoder Dolby AC ch decode Dolby AC-3 2 ch encode Dolby Digital Plus 5.1 ch decode Dolby Digital Plus 7.1 ch decode AM3D Diesel 3D AM3D Zirene effects QSound microq MIDI, 3D, effects SONiVOX AudioiNSIDE MIDI SRS WOW XT audio expansion SRS Xspace 3D SRS TruSurround HD AMR NB codec AMR WB codec Schedule for release: BSAC decoder Dolby TrueHD Dolby Digital Prologic II / IIx Dolby Digital Consumer Encoder 2 channel encoder Dolby Digital Compatible Output 5.1 channel encoder aacplus 5.1/7.1 channel decode SBC Bluetooth stereo codec AMR-WB+G.711 codec G codec G.726 codec DTMF aacplus 5.1/7.1 channel decode Dolby TrueHD Dolby Digital Prologic II / IIx G.167 (AEC) G.168 (LEC) WMA 10 Pro Decoder

9 Growth in Video Resolution and Complexity Computational Complexity of Digital Video 1000 HD 1080p HD 720p D1 (576i) CIF Relative Compute Cycles (MPEG-2 CIF = 1) MPEG-2 MPEG-4 Video Standard H Source: A Performance Characterization of High Definition Digital Video Decoding using H.264/AVC, scaled for CIF

10 VDO Design Goals Video encode & decode up to 720 x fps (NTSC) 720 x fps (PAL) Support multiple coding main profile H.264 main L3 decoder MPEG-4 ASP [no L5 decoder MPEG-2 ML decoder WMV9/VC1 main profile decoder MPEG-4 ASP encoder Programmable solution Less than 100MB/s memory decode bandwidth requirement Operating frequency <200Mhz Can be implemented in 130nm G technology Compatible subsets for decode only and Base Profile only 10

11 Top Level Block Diagram 11

12 VDO Product Family Tensilica Engine Decoders Encoder 381VDO 383VDO Baseline Profiles D1 H.264 Baseline Profile MPEG-4 Simple Profile VC-1/WMV9 Simple Profile MPEG-2 Main Profile H.264 Baseline Profile MPEG-4 Simple Profile VC-1/WMV9 Simple Profile MPEG-2 Main Profile None Main / Advanced Profiles D1 MPEG-4 Simple Profile Standard H.264 Main Profile Decode MPEG-4 Advanced Simple Profile Decode VC-1/WMV9 Main Profile Decode Resolution D1 D1 D1 Frames -per- Second Max. Required Clock Rate 172 MHz 156 MHz 189 MHz VDO 388VDO H.264 Main Profile MPEG-4 Advanced Simple Profile VC-1/WMV9 Main Profile MPEG-2 Main Profile H.264 Main Profile MPEG-4 Advanced Simple Profile VC-1/WMV9 Main Profile MPEG-2 Main Profile None MPEG-4 Advanced Simple Profile MPEG-2 Main Profile Decode MPEG-4 Advanced Simple Profile Encode D1 D MHz 185 MHz

13 Optimizing Energy ISA Extension Example: CABAC Context-adaptive binary arithmetic coding (CABAC) achieves higher compression in H.264 Required for Main Profile H.264 decoding Our solution: Xtensa ISA extensions Peak decode performance is one bin per cycle CABAC decode cycles Millions of cycles per second in unaugmented core 710 Millions of cycles per second in ISA extended core 13 Area cost: 20K gates for CABAC ISA extensions Energy for 1 second 164mJ 5mJ Unaugmented core assumes base core plus 16b MUL, LOOP ops, debug, PIF at max MHz 13

14 Optimizing Energy Example: Motion Estimation for MPEG4 Encode 16x16 SAD Half Pel 8pt SAD Control Motion Estimation Millions of cycles required on unaugmented core Millions of cycles required on ISA extended core Area Cost: Less than 40Kgates for all Motion Estimation extensions Energy for 1 second 988mJ 29mJ Unaugmented core assumes base core plus 16b MUL, LOOP ops, debug, PIF at max MHz 14

15 System Issues in Mobile Video Example: Decode Memory Bandwidth Drives Power 15 Off-chip power is a major 100 component of solution power 80 Early, accurate modeling gives 60 insight and enables novel 40 optimization 20 Non-trivial interactions: 0 Core power increases with bit stream data rate Memory power decreases with bit stream data rate Design of video architecture, DMA 120 and software reduces bandwidth 100 by about one-third (saves 30mW) Additional power savings in onchip interconnect and memory controller Memory Bandwidth (MB/s) Mobile DDR power (mw) 120 Memory Bandwidth Needs H.264 MP bitstream (Mbps) Memory Bandwidth Power Cost Delivered MB/s Estimated power for Micron MT46H16M16LF Mobile DDR at 133MHz: 50% 16B block read, 50% 16B writes

16 Move Up to Fully Programmable HD Approach: five small video task engines + audio ~32B bitstream data 16 Bitstream GP scalar with CABAC/CAVLC accel Inst Cache Inst Cache Processor InterFace (PIF) connect Data RAM Hifi2 Audio Engine Data Cache Data RAM Optional Audio motion vectors Bitstream addr Block data Residual data on 6-8 macroblocks 2-4KB Motion Vector GP FLIX Inst Cache Data Cache RAM Data 1K motion vectors Boundary strength data on 5-7 macroblocks Pixel positions: 3-5 macroblocks Intraframe addr 3KB Intraframe data Off-chip DRAM controller Off-chip DDR Prediction Pixel SIMD Inst Cache Data RAM 3-4 computed macroblocks: <1.5KB Residual Small SIMD Inst Cache computed residuals <100B Data Cache Bridge to legacy system bus Deblocking Pixel SIMD Inst Cache Data RAM Result address & data AHB, AXI, OCP, CoreConnect etc.

17 Baseband DSP Needs Extensible Processors 17 Optimized baseband architecture with DSP-optimized task-engines Explosion of standards and multiple radios dictates agile-but-efficient baseband design Each processor extensible to handle specific algorithm kernels with RTL-like power and throughput Easy composition of engines into efficient pipeline Unified hardware and software development model Baseband Processing Engine MAC processor Encrypt encode processor Modulation Demodulation processor Analog I/F Analog I/F Analog I/F Domain Modulation (e.g. OFDM) Error Coding LDPC Security 3-DES AES SSL Upper layer processing Streaming I/O Interfaces Kernel Accel Package Function FIR FFT Viterbi Turbo decoding Kernel Accel Package Special Function Memories Kernel Accel Package Kernel Accel Package Xtensa DSP Foundation Versus TI C64x TI C64x TI C64x TI C62x TI C64x RISC RISC RISC RISC Efficiency (cycle reduction) 4x 3x 3x 18x ~100x 43x 300x 4x 2x

18 System Level Design Matters Example: Audio System Optimization System-level design tools make a big difference Tensilica supports wealth of system modeling: ISS, TurboXim, XTMP C modeling, XTSC System C modeling, CoWare, Seamless Tensilica Xenergy Energy Estimator provides rapid, accurate configuration-specific application power analysis Example: Audio system organization analysis: Relative power PLLs & clock trees Inst cache Other subsystems HiFi2 core PIF Data cache On-chip SRAM DDR/Flash controller Mobile DDR (32M x 16b) Audio DACs System Organization 90G 90G high-vt Power includes dynamic and static at 4K/8K DM + DDR minimum WMA decode MHz Power includes HiFi2 core, I cache, D 4K/8K DM + 192K on-chip cache, On-chip SRAM, DDR (no 16K/32K DM + DDR refresh) I/O through queues 16K/32K DM + 192K on-chip K/32K + 128K on-chip + DDR

19 MPSOC System Design is Collaborative Example: Xtensa in CoWare Platform Architect Configured subsystems with rich interfaces: AMBA bus Local memories Caches Special lookup memories Input queues Output queues Input wires Exported state wires Interrupts 19 Wrapper automatically generated from processor configuration

20 SLOT 1 SLOT 0 Putting It Together The principles Multimedia SOC is not rocket science : hundreds of designs underway around the world Pay attention to best practices: 1.Leverage existing design standards, tools, interconnect and subsystem IP 2.Model early and often. Cost, performance, power largely set in architectural phase 3.Application standards explode. Build in programmability everywhere 4.Look at total bandwidth and total power, especially in offchip communications Differentiation, time-to-market and high efficiency are compatible! Memory controller I/O interfaces Video On-chip interconnect System CPU Fetch/Decode Reg File ALU Branch Unit MAC 16 DSP Load/Store MMU I-Cache External I/F 32 PIF D-Cache MMU Bridge (opt.) 32 AMBA AHB lite MAC processor Queue I/F DSP Encrypt/encode processor Audio Exten ded Fetch/ Decode Modulation processor I-RAM I-Cache Audio Reg Reg File File Audio AL ALU U Audio Bra MAC nch Misc. 24b Unit VL Audio C Functio Acc ns el. D-CACHE D-RAM 1 Load/Store D-RAM 2 External I/F Bridge (opt.) Audio DAC RF 20

21 21 Thank you!

The S6000 Family of Processors

The S6000 Family of Processors The S6000 Family of Processors Today s Design Challenges The advent of software configurable processors In recent years, the widespread adoption of digital technologies has revolutionized the way in which

More information

ConnX D2 DSP Engine. A Flexible 2-MAC DSP. Dual-MAC, 16-bit Fixed-Point Communications DSP PRODUCT BRIEF FEATURES BENEFITS. ConnX D2 DSP Engine

ConnX D2 DSP Engine. A Flexible 2-MAC DSP. Dual-MAC, 16-bit Fixed-Point Communications DSP PRODUCT BRIEF FEATURES BENEFITS. ConnX D2 DSP Engine PRODUCT BRIEF ConnX D2 DSP Engine Dual-MAC, 16-bit Fixed-Point Communications DSP FEATURES BENEFITS Both SIMD and 2-way FLIX (parallel VLIW) operations Optimized, vectorizing XCC Compiler High-performance

More information

SA-1500: A 300 MHz RISC CPU with Attached Media Processor*

SA-1500: A 300 MHz RISC CPU with Attached Media Processor* and Bridges Division SA-1500: A 300 MHz RISC CPU with Attached Media Processor* Prashant P. Gandhi, Ph.D. and Bridges Division Computing Enhancement Group Intel Corporation Santa Clara, CA 95052 Prashant.Gandhi@intel.com

More information

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 The Use Of Virtual Platforms In MP-SoC Design Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 1 MPSoC Is MP SoC design happening? Why? Consumer Electronics Complexity Cost of ASIC Increased SW Content

More information

Ten Reasons to Optimize a Processor

Ten Reasons to Optimize a Processor By Neil Robinson SoC designs today require application-specific logic that meets exacting design requirements, yet is flexible enough to adjust to evolving industry standards. Optimizing your processor

More information

Xtensa 7 Configurable Processor Core

Xtensa 7 Configurable Processor Core FEATURES 32-bit synthesizable RISC architecture with 5-stage pipeline, 16/24-bit instruction encoding with modeless switching Designer-configurable processor options (MMU/MPU, local memory types and sizes,

More information

Configurable Processors for SOC Design. Contents crafted by Technology Evangelist Steve Leibson Tensilica, Inc.

Configurable Processors for SOC Design. Contents crafted by Technology Evangelist Steve Leibson Tensilica, Inc. Configurable s for SOC Design Contents crafted by Technology Evangelist Steve Leibson Tensilica, Inc. Why Listen to This Presentation? Understand how SOC design techniques, now nearly 20 years old, are

More information

Venezia: a Scalable Multicore Subsystem for Multimedia Applications

Venezia: a Scalable Multicore Subsystem for Multimedia Applications Venezia: a Scalable Multicore Subsystem for Multimedia Applications Takashi Miyamori Toshiba Corporation Outline Background Venezia Hardware Architecture Venezia Software Architecture Evaluation Chip and

More information

HotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla.

HotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla. HotChips 2007 An innovative HD video and digital image processor for low-cost digital entertainment products Deepu Talla Texas Instruments 1 Salient features of the SoC HD video encode and decode using

More information

Emerging Architectures for HD Video Transcoding. Jeremiah Golston CTO, Digital Entertainment Products Texas Instruments

Emerging Architectures for HD Video Transcoding. Jeremiah Golston CTO, Digital Entertainment Products Texas Instruments Emerging Architectures for HD Video Transcoding Jeremiah Golston CTO, Digital Entertainment Products Texas Instruments Overview The Need for Transcoding System Challenges Transcoding Approaches and Issues

More information

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki An Ultra High Performance Scalable DSP Family for Multimedia Hot Chips 17 August 2005 Stanford, CA Erik Machnicki Media Processing Challenges Increasing performance requirements Need for flexibility &

More information

Emerging Architectures for HD Video Transcoding. Leon Adams Worldwide Manager Catalog DSP Marketing Texas Instruments

Emerging Architectures for HD Video Transcoding. Leon Adams Worldwide Manager Catalog DSP Marketing Texas Instruments Emerging Architectures for HD Video Transcoding Leon Adams Worldwide Manager Catalog DSP Marketing Texas Instruments Overview The Need for Transcoding System Challenges Transcoding Approaches and Issues

More information

Adding C Programmability to Data Path Design

Adding C Programmability to Data Path Design Adding C Programmability to Data Path Design Gert Goossens Sr. Director R&D, Synopsys May 6, 2015 1 Smart Products Drive SoC Developments Feature-Rich Multi-Sensing Multi-Output Wirelessly Connected Always-On

More information

Classification of Semiconductor LSI

Classification of Semiconductor LSI Classification of Semiconductor LSI 1. Logic LSI: ASIC: Application Specific LSI (you have to develop. HIGH COST!) For only mass production. ASSP: Application Specific Standard Product (you can buy. Low

More information

Software Defined Modem A commercial platform for wireless handsets

Software Defined Modem A commercial platform for wireless handsets Software Defined Modem A commercial platform for wireless handsets Charles F Sturman VP Marketing June 22 nd ~ 24 th Brussels charles.stuman@cognovo.com www.cognovo.com Agenda SDM Separating hardware from

More information

Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor. NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis

Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor. NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis University of Thessaly Greece 1 Outline Introduction to AVS

More information

Long Words and Wide Ports: Reinventing the Configurable Processor

Long Words and Wide Ports: Reinventing the Configurable Processor The Configurable Processor Company Long Words and Wide Ports: Reinventing the Configurable Processor Dhanendra Jani Gulbin Ezer James Kim SOC Design Challenges 1. Ever more complex requirements Media-Centric

More information

Low-Power Processor Solutions for Always-on Devices

Low-Power Processor Solutions for Always-on Devices Low-Power Processor Solutions for Always-on Devices Pieter van der Wolf MPSoC 2014 July 7 11, 2014 2014 Synopsys, Inc. All rights reserved. 1 Always-on Mobile Devices Mobile devices on the move Mobile

More information

Anand Raghunathan

Anand Raghunathan ECE 695R: SYSTEM-ON-CHIP DESIGN Module 2: HW/SW Partitioning Lecture 2.15: ASIP: Approaches to Design Anand Raghunathan raghunathan@purdue.edu ECE 695R: System-on-Chip Design, Fall 2014 Fall 2014, ME 1052,

More information

Platform-based Design

Platform-based Design Platform-based Design The New System Design Paradigm IEEE1394 Software Content CPU Core DSP Core Glue Logic Memory Hardware BlueTooth I/O Block-Based Design Memory Orthogonalization of concerns: the separation

More information

Fujitsu SOC Fujitsu Microelectronics America, Inc.

Fujitsu SOC Fujitsu Microelectronics America, Inc. Fujitsu SOC 1 Overview Fujitsu SOC The Fujitsu Advantage Fujitsu Solution Platform IPWare Library Example of SOC Engagement Model Methodology and Tools 2 SDRAM Raptor AHB IP Controller Flas h DM A Controller

More information

Each Milliwatt Matters

Each Milliwatt Matters Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets

More information

Understanding Sources of Inefficiency in General-Purpose Chips

Understanding Sources of Inefficiency in General-Purpose Chips Understanding Sources of Inefficiency in General-Purpose Chips Rehan Hameed Wajahat Qadeer Megan Wachs Omid Azizi Alex Solomatnikov Benjamin Lee Stephen Richardson Christos Kozyrakis Mark Horowitz GP Processors

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 22 Title: and Extended

More information

An H.264/AVC Main Profile Video Decoder Accelerator in a Multimedia SOC Platform

An H.264/AVC Main Profile Video Decoder Accelerator in a Multimedia SOC Platform An H.264/AVC Main Profile Video Decoder Accelerator in a Multimedia SOC Platform Youn-Long Lin Department of Computer Science National Tsing Hua University Hsin-Chu, TAIWAN 300 ylin@cs.nthu.edu.tw 2006/08/16

More information

Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder. Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India

Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder. Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India Outline of Presentation MPEG-2 to H.264 Transcoding Need for a multiprocessor

More information

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

DSP Solutions For High Quality Video Systems. Todd Hiers Texas Instruments

DSP Solutions For High Quality Video Systems. Todd Hiers Texas Instruments DSP Solutions For High Quality Video Systems Todd Hiers Texas Instruments TI Video Expertise Enables Faster And Easier Product Innovation TI has a long history covering the video market from end to end

More information

Original PlayStation: no vector processing or floating point support. Photorealism at the core of design strategy

Original PlayStation: no vector processing or floating point support. Photorealism at the core of design strategy Competitors using generic parts Performance benefits to be had for custom design Original PlayStation: no vector processing or floating point support Geometry issues Photorealism at the core of design

More information

Universität Dortmund. ARM Architecture

Universität Dortmund. ARM Architecture ARM Architecture The RISC Philosophy Original RISC design (e.g. MIPS) aims for high performance through o reduced number of instruction classes o large general-purpose register set o load-store architecture

More information

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous

More information

Configurable and Extensible Processors Change System Design. Ricardo E. Gonzalez Tensilica, Inc.

Configurable and Extensible Processors Change System Design. Ricardo E. Gonzalez Tensilica, Inc. Configurable and Extensible Processors Change System Design Ricardo E. Gonzalez Tensilica, Inc. Presentation Overview Yet Another Processor? No, a new way of building systems Puts system designers in the

More information

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd Optimizing ARM SoC s with Carbon Performance Analysis Kits ARM Technical Symposia, Fall 2014 Andy Ladd Evolving System Requirements Processor Advances big.little Multicore Unicore DSP Cortex -R7 Block

More information

Development of Low Power and High Performance Application Processor (T6G) for Multimedia Mobile Applications

Development of Low Power and High Performance Application Processor (T6G) for Multimedia Mobile Applications Session 8D-2 Development of Low Power and High Performance Application Processor (T6G) for Multimedia Mobile Applications Yoshiyuki Kitasho, Yu Kikuchi, Takayoshi Shimazawa, Yasuo Ohara, Masafumi Takahashi,

More information

Flash Memory Summit 2011

Flash Memory Summit 2011 1 Billion cores Memory Summit 2011 Session 302: Nonvolatile Design Challenges and Methodologies The Processor s role in maximizing performance and reducing energy consumption Neil Robinson Tensilica At

More information

Cannon Mountain Dr Longmont, CO LS6410 Hardware Design Perspective

Cannon Mountain Dr Longmont, CO LS6410 Hardware Design Perspective LS6410 Hardware Design Perspective 1. S3C6410 Introduction The S3C6410X is a 16/32-bit RISC microprocessor, which is designed to provide a cost-effective, lowpower capabilities, high performance Application

More information

Intel released new technology call P6P

Intel released new technology call P6P P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new

More information

PACE: Power-Aware Computing Engines

PACE: Power-Aware Computing Engines PACE: Power-Aware Computing Engines Krste Asanovic Saman Amarasinghe Martin Rinard Computer Architecture Group MIT Laboratory for Computer Science http://www.cag.lcs.mit.edu/ PACE Approach Energy- Conscious

More information

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02 Fujitsu System Applications Support 1 Overview System Applications Support SOC Application Development Lab Multimedia VoIP Wireless Bluetooth Processors, DSP and Peripherals ARM Reference Platform 2 SOC

More information

Video Encoding with. Multicore Processors. March 29, 2007 REAL TIME HD

Video Encoding with. Multicore Processors. March 29, 2007 REAL TIME HD Video Encoding with Multicore Processors March 29, 2007 Video is Ubiquitous... Demand for Any Content Any Time Any Where Resolution ranges from 128x96 pixels for mobile to 1920x1080 pixels for full HD

More information

Flexible wireless communication architectures

Flexible wireless communication architectures Flexible wireless communication architectures Sridhar Rajagopal Department of Electrical and Computer Engineering Rice University, Houston TX Faculty Candidate Seminar Southern Methodist University April

More information

ID 730L: Getting Started with Multimedia Programming on Linux on SH7724

ID 730L: Getting Started with Multimedia Programming on Linux on SH7724 ID 730L: Getting Started with Multimedia Programming on Linux on SH7724 Global Edge Ian Carvalho Architect 14 October 2010 Version 1.0 Mr. Ian Carvalho System Architect, Global Edge Software Ltd. Responsible

More information

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications N.C. Paver PhD Architect Intel Corporation Hot Chips 16 August 2004 Age nda Overview of the Intel PXA27X processor

More information

Product Technical Brief S3C2416 May 2008

Product Technical Brief S3C2416 May 2008 Product Technical Brief S3C2416 May 2008 Overview SAMSUNG's S3C2416 is a 32/16-bit RISC cost-effective, low power, high performance micro-processor solution for general applications including the GPS Navigation

More information

The ARM10 Family of Advanced Microprocessor Cores

The ARM10 Family of Advanced Microprocessor Cores The ARM10 Family of Advanced Microprocessor Cores Stephen Hill ARM Austin Design Center 1 Agenda Design overview Microarchitecture ARM10 o o Memory System Interrupt response 3. Power o o 4. VFP10 ETM10

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

Sounding Better Than Ever: High Quality Audio. Simon Forrest Connected Home Marketing

Sounding Better Than Ever: High Quality Audio. Simon Forrest Connected Home Marketing Sounding Better Than Ever: High Quality Audio Simon Forrest Connected Home Marketing www.imgtec.com A brief look at the numbers Market trends Worldwide audio market 2014 67.9m units shipped 16% increase

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Digital Signal Processor 2010/1/4

Digital Signal Processor 2010/1/4 Digital Signal Processor 1 Analog to Digital Shift 2 Digital Signal Processing Applications FAX Phone Personal Computer Medical Instruments DVD player Air conditioner (controller) Digital Camera MP3 audio

More information

Versal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm

Versal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm Engineering Director, Xilinx Silicon Architecture Group Versal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm Presented By Kees Vissers Fellow February 25, FPGA 2019 Technology scaling

More information

Embedded Computation

Embedded Computation Embedded Computation What is an Embedded Processor? Any device that includes a programmable computer, but is not itself a general-purpose computer [W. Wolf, 2000]. Commonly found in cell phones, automobiles,

More information

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Christos Kozyrakis Stanford University David Patterson U.C. Berkeley http://csl.stanford.edu/~christos Motivation Ideal processor

More information

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400

More information

Copyright 2016 Xilinx

Copyright 2016 Xilinx Zynq Architecture Zynq Vivado 2015.4 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Identify the basic building

More information

Building blocks for 64-bit Systems Development of System IP in ARM

Building blocks for 64-bit Systems Development of System IP in ARM Building blocks for 64-bit Systems Development of System IP in ARM Research seminar @ University of York January 2015 Stuart Kenny stuart.kenny@arm.com 1 2 64-bit Mobile Devices The Mobile Consumer Expects

More information

Specializing Hardware for Image Processing

Specializing Hardware for Image Processing Lecture 6: Specializing Hardware for Image Processing Visual Computing Systems So far, the discussion in this class has focused on generating efficient code for multi-core processors such as CPUs and GPUs.

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Understanding Sources of Inefficiency in General-Purpose Chips. Hameed, Rehan, et al. PRESENTED BY: XIAOMING GUO SIJIA HE

Understanding Sources of Inefficiency in General-Purpose Chips. Hameed, Rehan, et al. PRESENTED BY: XIAOMING GUO SIJIA HE Understanding Sources of Inefficiency in General-Purpose Chips Hameed, Rehan, et al. PRESENTED BY: XIAOMING GUO SIJIA HE 1 Outline Motivation H.264 Basics Key ideas Implementation & Evaluation Summary

More information

Contents of this presentation: Some words about the ARM company

Contents of this presentation: Some words about the ARM company The architecture of the ARM cores Contents of this presentation: Some words about the ARM company The ARM's Core Families and their benefits Explanation of the ARM architecture Architecture details, features

More information

3D Graphics in Future Mobile Devices. Steve Steele, ARM

3D Graphics in Future Mobile Devices. Steve Steele, ARM 3D Graphics in Future Mobile Devices Steve Steele, ARM Market Trends Mobile Computing Market Growth Volume in millions Mobile Computing Market Trends 1600 Smart Mobile Device Shipments (Smartphones and

More information

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan Processors Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan chanhl@maili.cgu.edu.twcgu General-purpose p processor Control unit Controllerr Control/ status Datapath ALU

More information

2008/12/23. System Arch 2008 (Fire Tom Wada) 1

2008/12/23. System Arch 2008 (Fire Tom Wada) 1 Digital it Signal Processor System Arch 2008 (Fire Tom Wada) 1 Analog to Digital Shift System Arch 2008 (Fire Tom Wada) 2 Digital Signal Processing Applications FAX Phone Personal Computer Medical Instruments

More information

CEVA-X1 Lightweight Multi-Purpose Processor for IoT

CEVA-X1 Lightweight Multi-Purpose Processor for IoT CEVA-X1 Lightweight Multi-Purpose Processor for IoT 1 Cellular IoT for The Massive Internet of Things Narrowband LTE Technologies Days Battery Life Years LTE-Advanced LTE Cat-1 Cat-M1 Cat-NB1 >10Mbps Up

More information

IMPROVES. Initial Investment is Low Compared to SoC Performance and Cost Benefits

IMPROVES. Initial Investment is Low Compared to SoC Performance and Cost Benefits NOC INTERCONNECT IMPROVES SOC ECONO CONOMICS Initial Investment is Low Compared to SoC Performance and Cost Benefits A s systems on chip (SoCs) have interconnect, along with its configuration, verification,

More information

EEM870 Embedded System and Experiment Lecture 4: SoC Design Flow and Tools

EEM870 Embedded System and Experiment Lecture 4: SoC Design Flow and Tools EEM870 Embedded System and Experiment Lecture 4: SoC Design Flow and Tools Wen-Yen Lin, Ph.D. Department of Electrical Engineering Chang Gung University Email: wylin@mail.cgu.edu.tw March 2013 Agenda Introduction

More information

Multicore SoC is coming. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems. Source: 2007 ISSCC and IDF.

Multicore SoC is coming. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems. Source: 2007 ISSCC and IDF. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems Liang-Gee Chen Distinguished Professor General Director, SOC Center National Taiwan University DSP/IC Design Lab, GIEE, NTU 1

More information

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Shanghai: William Orme, Strategic Marketing Manager of SSG Beijing & Shenzhen: Mayank Sharma, Product Manager of SSG ARM Tech

More information

Computer Architecture Dr. Charles Kim Howard University

Computer Architecture Dr. Charles Kim Howard University EECE416 Microcomputer Fundamentals Computer Architecture Dr. Charles Kim Howard University 1 Computer Architecture Computer Architecture Art of selecting and interconnecting hardware components to create

More information

Tile Processor (TILEPro64)

Tile Processor (TILEPro64) Tile Processor Case Study of Contemporary Multicore Fall 2010 Agarwal 6.173 1 Tile Processor (TILEPro64) Performance # of cores On-chip cache (MB) Cache coherency Operations (16/32-bit BOPS) On chip bandwidth

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

A 1-GHz Configurable Processor Core MeP-h1

A 1-GHz Configurable Processor Core MeP-h1 A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface

More information

Enabling the design of multicore SoCs with ARM cores and programmable accelerators

Enabling the design of multicore SoCs with ARM cores and programmable accelerators Enabling the design of multicore SoCs with ARM cores and programmable accelerators Target Compiler Technologies www.retarget.com Sol Bergen-Bartel China Business Development 03 Target Compiler Technologies

More information

A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing

A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing Dean Truong, Wayne Cheng, Tinoosh Mohsenin, Zhiyi Yu, Toney Jacobson, Gouri Landge, Michael Meeuwsen, Christine

More information

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System

More information

VLIW DSP Processor Design for Mobile Communication Applications. Contents crafted by Dr. Christian Panis Catena Radio Design

VLIW DSP Processor Design for Mobile Communication Applications. Contents crafted by Dr. Christian Panis Catena Radio Design VLIW DSP Processor Design for Mobile Communication Applications Contents crafted by Dr. Christian Panis Catena Radio Design Agenda Trends in mobile communication Architectural core features with significant

More information

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola 1. Microprocessor Architectures 1.1 Intel 1.2 Motorola 1.1 Intel The Early Intel Microprocessors The first microprocessor to appear in the market was the Intel 4004, a 4-bit data bus device. This device

More information

EEM870 Embedded System and Experiment Lecture 3: ARM Processor Architecture

EEM870 Embedded System and Experiment Lecture 3: ARM Processor Architecture EEM870 Embedded System and Experiment Lecture 3: ARM Processor Architecture Wen-Yen Lin, Ph.D. Department of Electrical Engineering Chang Gung University Email: wylin@mail.cgu.edu.tw March 2014 Agenda

More information

SoC Platforms and CPU Cores

SoC Platforms and CPU Cores SoC Platforms and CPU Cores COE838: Systems on Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University

More information

Markets Demanding More Performance

Markets Demanding More Performance The Tile Processor Architecture: Embedded Multicore for Networking and Digital Multimedia Tilera Corporation August 20 th 2007 Hotchips 2007 Markets Demanding More Performance Networking market - Demand

More information

Effective System Design with ARM System IP

Effective System Design with ARM System IP Effective System Design with ARM System IP Mentor Technical Forum 2009 Serge Poublan Product Marketing Manager ARM 1 Higher level of integration WiFi Platform OS Graphic 13 days standby Bluetooth MP3 Camera

More information

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution

More information

Long Term Trends for Embedded System Design

Long Term Trends for Embedded System Design Long Term Trends for Embedded System Design DSD 2004 A. A. Jerraya TIMA Laboratory 46 Avenue Felix Viallet 38031 Grenoble Cedex France Tel: +33 476 57 47 59 Fax: +33 476 47 38 14 Email: Ahmed.Jerraya@imag.fr

More information

Computers and Microprocessors. Lecture 34 PHYS3360/AEP3630

Computers and Microprocessors. Lecture 34 PHYS3360/AEP3630 Computers and Microprocessors Lecture 34 PHYS3360/AEP3630 1 Contents Computer architecture / experiment control Microprocessor organization Basic computer components Memory modes for x86 series of microprocessors

More information

A Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design

A Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design A Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design Ahmed Amine JERRAYA EPFL November 2005 TIMA Laboratory 46 Avenue Felix Viallet 38031 Grenoble CEDEX, France Email: Ahmed.Jerraya@imag.fr

More information

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT THE FREE AND OPEN RISC INSTRUCTION SET ARCHITECTURE Codasip is the leading provider of RISC-V processor IP Codasip Bk: A portfolio of RISC-V processors Uniquely

More information

VLSI Signal Processing

VLSI Signal Processing VLSI Signal Processing Programmable DSP Architectures Chih-Wei Liu VLSI Signal Processing Lab Department of Electronics Engineering National Chiao Tung University Outline DSP Arithmetic Stream Interface

More information

Automotive Challenges Addressed by Standard and Non-Standard Based IP D&R April 2018 Meredith Lucky VP of Sales, CAST, Inc.

Automotive Challenges Addressed by Standard and Non-Standard Based IP D&R April 2018 Meredith Lucky VP of Sales, CAST, Inc. Automotive Challenges Addressed by Standard and Non-Standard Based IP D&R April 2018 Meredith Lucky VP of Sales, CAST, Inc. Automotive Interface Controller Cores 1 Increasing Needs/New Challenges Outlook

More information

Ascenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005

Ascenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005 Ascenium: A Continuously Reconfigurable Architecture Robert Mykland Founder/CTO robert@ascenium.com August, 2005 Ascenium: A Continuously Reconfigurable Processor Continuously reconfigurable approach provides:

More information

04 - DSP Architecture and Microarchitecture

04 - DSP Architecture and Microarchitecture September 11, 2015 Memory indirect addressing (continued from last lecture) ; Reality check: Data hazards! ; Assembler code v3: repeat 256,endloop load r0,dm1[dm0[ptr0++]] store DM0[ptr1++],r0 endloop:

More information

Digital Signal Processor Core Technology

Digital Signal Processor Core Technology The World Leader in High Performance Signal Processing Solutions Digital Signal Processor Core Technology Abhijit Giri Satya Simha November 4th 2009 Outline Introduction to SHARC DSP ADSP21469 ADSP2146x

More information

Cut DSP Development Time Use C for High Performance, No Assembly Required

Cut DSP Development Time Use C for High Performance, No Assembly Required Cut DSP Development Time Use C for High Performance, No Assembly Required Digital signal processing (DSP) IP is increasingly required to take on complex processing tasks in signal processing-intensive

More information

MICROPROCESSOR SYSTEM FOR VISUAL BAKED PRODUCTS CONTROL

MICROPROCESSOR SYSTEM FOR VISUAL BAKED PRODUCTS CONTROL MICROPROCESSOR SYSTEM FOR VISUAL BAKED PRODUCTS CONTROL Krassimir Kolev, PhD University of Food Technologies - Plovdiv, Bulgaria Abstract The paper reports an authentic solution of a microprocessor system

More information

MPSoC Design Space Exploration Framework

MPSoC Design Space Exploration Framework MPSoC Design Space Exploration Framework Gerd Ascheid RWTH Aachen University, Germany Outline Motivation: MPSoC requirements in wireless and multimedia MPSoC design space exploration framework Summary

More information

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal

More information

IP Device Integration Notes

IP Device Integration Notes IP Device Integration Notes Article ID: V1-13-11-06-t Release Date: 11/06/2013 Applied to GV-System version 8.5.8 Naming and Definition GV-System GeoVision Analog and Digital Video Recording Software.

More information

Next Generation Technology from Intel Intel Pentium 4 Processor

Next Generation Technology from Intel Intel Pentium 4 Processor Next Generation Technology from Intel Intel Pentium 4 Processor 1 The Intel Pentium 4 Processor Platform Intel s highest performance processor for desktop PCs Targeted at consumer enthusiasts and business

More information

XT Node Architecture

XT Node Architecture XT Node Architecture Let s Review: Dual Core v. Quad Core Core Dual Core 2.6Ghz clock frequency SSE SIMD FPU (2flops/cycle = 5.2GF peak) Cache Hierarchy L1 Dcache/Icache: 64k/core L2 D/I cache: 1M/core

More information

Advanced processor designs

Advanced processor designs Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

Adaptable Intelligence The Next Computing Era

Adaptable Intelligence The Next Computing Era Adaptable Intelligence The Next Computing Era Hot Chips, August 21, 2018 Victor Peng, CEO, Xilinx Pervasive Intelligence from Cloud to Edge to Endpoints >> 1 Exponential Growth and Opportunities Data Explosion

More information