Unifying Microarchitecture for Embedded Media Processing

Similar documents
THE ROLE OF PROGRAMMABLE DIGITAL SIGNAL PROCESSORS (DSP) FOR 3G MOBILE COMMUNICATION SYSTEMS

The S6000 Family of Processors

XMEGA Series Of AVR Processor. Presented by: Manisha Biyani ( ) Shashank Bolia (

Graduate Institute of Electronics Engineering, NTU 9/16/2004

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki

EMBEDDED SYSTEM BASICS AND APPLICATION

PRODUCT PREVIEW TNETV1050 IP PHONE PROCESSOR. description

Embedded Systems. 7. System Components

Chapter 5. Introduction ARM Cortex series

ARM ARCHITECTURE. Contents at a glance:

ARM Processors for Embedded Applications

IAR Embedded Workbench

Overview of Microcontroller and Embedded Systems

Module Introduction! PURPOSE: The intent of this module, 68K to ColdFire Transition, is to explain the changes to the programming model and architectu

Five Ways to Build Flexibility into Industrial Applications with FPGAs

EMBEDDED SYSTEM FOR VIDEO AND SIGNAL PROCESSING

Embedded Systems Lab Lab 1 Introduction to Microcontrollers Eng. Dalia A. Awad

LX4180. LMI: Local Memory Interface CI: Coprocessor Interface CEI: Custom Engine Interface LBC: Lexra Bus Controller

Tutorial Introduction

Advance Information 24-BIT GENERAL PURPOSE DIGITAL SIGNAL PROCESSOR

,1752'8&7,21. Figure 1-0. Table 1-0. Listing 1-0.

Product Technical Brief S3C2440X Series Rev 2.0, Oct. 2003

WS_CCESBF7-OUT-v1.00.doc Page 1 of 8

The World Leader in High Performance Signal Processing Solutions. DSP Processors

Choosing a Micro for an Embedded System Application

HIGH SPEED SIGNAL CHAIN SELECTION GUIDE

Optimal Porting of Embedded Software on DSPs

4. Hardware Platform: Real-Time Requirements

Embedded Systems: Architecture

CROSSWARE 7 V8051NT Virtual Workshop for Windows. q Significantly reduces software development timescales

Understanding the basic building blocks of a microcontroller device in general. Knows the terminologies like embedded and external memory devices,

A236 Video DSP Chip Evaluation Board #2 with FingerPrint Option

Section 6 Blackfin ADSP-BF533 Memory

Copyright 2016 Xilinx

EE 354 Fall 2015 Lecture 1 Architecture and Introduction

V8-uRISC 8-bit RISC Microprocessor AllianceCORE Facts Core Specifics VAutomation, Inc. Supported Devices/Resources Remaining I/O CLBs

3.1 Description of Microprocessor. 3.2 History of Microprocessor

Speeding AM335x Programmable Realtime Unit (PRU) Application Development Through Improved Debug Tools

Course Introduction. Purpose: Objectives: Content: Learning Time:

SRAM SRAM SRAM. Data Bus EXTAL ESSI KHz MHz. In Headphone CS MHz. Figure 1 DSP56302EVM Functional Block Diagram

WS_CCESSH5-OUT-v1.01.doc Page 1 of 7

Computer Organization and Microprocessors SYLLABUS CHAPTER - 1 : BASIC STRUCTURE OF COMPUTERS CHAPTER - 3 : THE MEMORY SYSTEM

UNIT II PROCESSOR AND MEMORY ORGANIZATION

The MAXQ TM Family of High Performance Microcontrollers

ECE 471 Embedded Systems Lecture 2

Chapter 4. Enhancing ARM7 architecture by embedding RTOS

WS_CCESSH-OUT-v1.00.doc Page 1 of 8

Memory Systems IRAM. Principle of IRAM

ATmega128. Introduction

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info.

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

With Fixed Point or Floating Point Processors!!

Unlocking the Potential of Your Microcontroller

Classification of Semiconductor LSI

SRAM SRAM SRAM SCLK khz

Microcontroller Not just a case of you say tomarto and I say tomayto

Babu Madhav Institute of Information Technology, UTU

Low-Power Processor Solutions for Always-on Devices

MICROPROCESSOR BASED SYSTEM DESIGN

EC EMBEDDED AND REAL TIME SYSTEMS

VIA ProSavageDDR KM266 Chipset

Contents of this presentation: Some words about the ARM company

MARIE: An Introduction to a Simple Computer

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications

Implementation of DSP Algorithms

Embedded Computation

ELC4438: Embedded System Design Embedded Processor

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design

D Demonstration of disturbance recording functions for PQ monitoring

STM32 F0 Value Line. Entry-level MCUs

Programming in the MAXQ environment

CEIBO FE-51RD2 Development System

Military Grade SmartFusion Customizable System-on-Chip (csoc)

突破 8-/16-/32- 位和 DSP 界限的 ARM MCU 解决方案

HotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla.

8051 Microcontroller

Chapter 4. MARIE: An Introduction to a Simple Computer

High-Performance 32-bit

Microcontroller: CPU and Memory

Real-time for Windows NT

ZLF645 Crimzon Flash Microcontroller with ZBase Database Industry Leading Universal Infrared Remote Control (UIR) Solution

Project Title: Design and Implementation of IPTV System Project Supervisor: Dr. Khaled Fouad Elsayed

Shared Address Space I/O: A Novel I/O Approach for System-on-a-Chip Networking

Microcontroller basics

Computer Hardware Requirements for ERTSs: Microprocessors & Microcontrollers

CS370 Operating Systems

Introduction to Embedded Systems

Microprocessor or Microcontroller Not just a case of you say tomarto and I say tomayto

RM3 - Cortex-M4 / Cortex-M4F implementation

CS370 Operating Systems

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi. Lecture - 10 System on Chip (SOC)

Outline: System Development and Programming with the ADSP-TS101 (TigerSHARC)

Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication

Blackfin Optimizations for Performance and Power Consumption

Emergence of Segment-Specific DDRn Memory Controller and PHY IP Solution. By Eric Esteve (PhD) Analyst. July IPnest.

Department of Computer Science, Institute for System Architecture, Operating Systems Group. Real-Time Systems '08 / '09. Hardware.

Microprocessor or Microcontroller Not just a case of you say tomarto and I say tomayto

CISC RISC. Compiler. Compiler. Processor. Processor

Introduction CHAPTER IN THIS CHAPTER

AVR Microcontrollers Architecture

Transcription:

Unifying Microarchitecture for Embedded Media Processing

Unifying Microarchitecture for Embedded Media Processing As more and more applications require audio, video and communications processing capabilities, the requirements placed on embedded processors used in portable devices and edge-client devices have become more computationally and bandwidth intensive. Both RISC microcontrollers (MCU) and DSPs have served these applications. While RISC processors are traditionally architected to enable efficient asynchronous control flow, DSPs are architected to perform well for synchronous, constant-rate data flow (for example, audio or voice-band applications). Because so many embedded applications have intense requirements for both control and media processing, engineers have typically turned to inelegantly grafting DSPs and MCUs together in one form or another, either at the board level or in system-on-chip (SoC) integration. Together, the respective functional aspects of RISC processors and DSPs unite as the perfect processing engine for a wide variety of multimedia applications and products, such as cellular telephones, digital cameras, portable networked audio/video devices, and so on. Various MCU makers have incorporated some signal processing functionality, such as instruction-set extensions (e.g., MMX from Intel, 3dNow! from AMD, VIS Instruction set from Sun, etc.) and MAC units, but even this approach lacks the essential architectural basis required for advanced media processing applications. It is important to note that engineers are attracted to these kinds of multiprocessor or multicore design techniques for one reason only: No single processor has the processing power or instruction characteristics to meet the requirements of their applications. This especially rings true for high performance, media-rich embedded applications. Fundamentally, multiprocessing is only employed in the absence of a viable single-processor alternative. Despite the seemingly simple solution offered by multiprocessing, the reality is that no one would suffer the complexities of a multiprocessing environment except as an absolute necessity to meet the needs of a particular application. Thus, the need for a powerful unified microprocessor for embedded media applications has long been evident. However, it was not until Analog Devices and Intel jointly developed the high performance Micro Signal Architecture (on which all Blackfin devices are based) that a single architecture was powerful enough, inexpensive enough, and truly optimized both for the complex, real-time world of media data flow and for the control-oriented tasks typically handled by RISC processors. This unified approach to the increasingly media-rich embedded application space is clearly an ideal replacement for previous heterogeneous DSP/MCU integration techniques. Blackfin takes the unique step of architecturally combining media processing attributes like dual MACs (multiply-accumulate engines, commonly used for high performance DSP applications) and classic RISC characteristics like a memory management capability that facilitates simplified, enterprise-level programming modes and styles. The device has DSP features not found on any RISC microcontroller and important microcontroller characteristics not typically on DSPs. This paper provides an overview of the architectural and functional features that make Blackfin an ideal embedded media processor. Unifying Microarchitecture for Embedded Media Processing Analog Devices, Inc. 1

Functional focus DSPs and microcontrollers have classically remained separate because they are each architecturally optimized for very different kinds of tasks. DSP applications usually focus on performing as many arithmetic computations (such as MAC operations) as possible in the fewest number of core clock cycles. To accomplish this, DSPs often use esoteric VLIW (Very Long Instruction Word) instructions, leaving code density as a secondary concern. Also, DSP applications are data-bandwidth intensive; thus, they often have large memory databuses and DMA engines to reduce the data movement load on the core processor. For media-processing applications and formats, DSPs require ancillary processors to make up for a lack of flexibility. However, in addition to native support for 8-bit data (the word size common to read red-green-blue pixel-processing algorithms), Blackfin s enhanced video instructions and video ALUs process rich-media bit streams at up to 10 times the performance of DSP-only implementations. The unified architecture is designed to support software that can efficiently execute video compression, motion estimation, and Huffman coding algorithms used by video and image-processing standards such as MPEG2, MPEG4, and JPEG. Implementing video algorithms in software allows OEMs to adapt to evolving standards and new functional requirements without hardware changes. The core architecture allows for support of algorithms such as MPEG2, MPEG4, and JPEG compression. The integrated video instructions also eliminate complex and confusing communications between the main processor and a separate video CODEC. These features help lower overall system cost while improving the time to market for the end application. The control functions that are typically the focus of microcontrollers, on the other hand, involve many conditional operations, with frequent changes in program flow. These programs are most often written in C or C++, and code density is vital, making variable-length instructions a crucial architectural feature. The Blackfin architecture is optimized for both media and microcontroller functions. System Control Blocks Emulation and Test Control Voltage Regulation Event Controller Dynamic Power Management Controller Memory DMA Watchdog Timer Real-Time Clock Blackfin Processor Core L1 Instruction /Cache L1 Data /Cache L1 Scratchpad RAM L2 Memory External Memory Interface (ASYNC, SDRAM) System Interface Unit and DMA Controller High Speed I/O Low Speed Peripherals Parallel Peripheral Interface/GPIO Unifying Microarchitecture for Embedded Media Processing Analog Devices, Inc. 2

The unified Blackfin architecture realizes the benefits of both approaches. Its variable-length instruction set extends all the way up to 64-bit opcodes used in DSP inner loops, but the Blackfin instruction set is optimized so that 16-bit opcodes represent the most frequently used instructions. Thus, compiled Blackfin code density figures are competitive with those of popular microcontrollers, and like virtually all microcontrollers, the Blackfin architecture is optimized for use with a C/C++ compiler. Its fully interlocked pipeline and algebraicsyntax assembly language make it easy for developers to write in either high level language or assembly with equal ease. Thus, Blackfin s architectural nature is not optimized for either media or microcontroller functions at the expense of the other rather, it is truly optimized for both, and this is the crux of the breakthrough nature of the Blackfin devices. Enterprise-class features Because they have system-level responsibilities, true embedded media processors cannot simply operate in the same egocentric fashion as a focused, number-crunching DSP. They must instead have a full set of enterprise-level security characteristics like memory-management capabilities that define separate, freely accessible application-development spaces while keeping distinct code sections safe from being overwritten and the overall system intact. Blackfin, like MCUs that support full-featured embedded operating systems, has both protected and unprotected operating modes. Respectively, Blackfin calls these User mode and Supervisor mode. These crucial protection features prevent users from unknowingly or maliciously accessing or affecting shared parts of a Blackfin-based system. User mode prevents users or code from doing anything that affects system-level resources. These kinds of security and system integrity issues are simply not needed by nor architected into traditional DSPs. Like a microcontroller, Blackfin also allows both asynchronous interrupts and synchronous exceptions. Both kinds of events cause pipelined instructions to suspend execution in favor of servicing the triggering event. Blackfin s mappable interrupt priorities are a feature common to microcontrollers, but not usually found in DSPs. The chip s exception-handling capabilities are enterprise-class features that protect a Blackfin-based embedded system from invalid or illegal programming. In today s security-focused environment, Blackfin s exceptions are an important means of ensuring that no one succeeds in executing an instruction that doesn t exist, or accesses memory that s off limits or not defined (DSPs, on the other hand, allow much easier raw access to the metal with few protections). Like other embedded processors, Blackfin can generate interrupts and exceptions in software, facilitating an interface from User mode into the unprotected domain of Supervisor mode. System-class peripherals with media-class data flow Blackfin s rich set of on-chip peripherals makes the devices ideal for the control side of embedded applications. Blackfin devices integrate a real-time clock, a watchdog timer, general-purpose timers, bidirectional flag/interrupt pins, SPI-compatible ports and UARTs. Some members of the family include PCI and USB connectivity. These peripherals are typical of control-oriented RISC processors. Blackfin devices also provide streaming-media-oriented peripherals, such as a parallel peripheral interface (PPI) for connecting to high speed video and data converters, and synchronous serial ports for connecting to high resolution digital audio devices and high speed telecom interfaces. One of the problems many developers encounter when trying to apply MCUs to media-processing applications is supporting memory bandwidth fast enough for streaming data. Embedded media processors unequivocally must have wide open DMA capabilities for getting data blocks on and off the chip. With the large amounts of data movement required by most media-processing applications, data movement cannot be permitted to cause processor interrupts that would affect real-time performance, and it is inefficient for the core processor to be Unifying Microarchitecture for Embedded Media Processing Analog Devices, Inc. 3

involved in data movement. The DMA channels are independent of the processor core, allowing it to operate deterministically and allowing the DSP data flow in any embedded application to be completely independent of the control processes. In typical applications, raw data gets DMA d into the media processor from the chip s peripheral(s), then gets DMA d to and from external memory during media processing, and then the processed data gets DMA d back out to the peripheral(s) or to system memory. Blackfin has 16 high speed DMA channels to support bidirectional streaming-media channels between peripherals and memory, virtually eliminating data-movement responsibility from the processor. A unified programming model Blackfin is not a DSP with an enhanced instruction set, nor is it a microcontroller with DSP extensions. The device is an equally high performance media processor and compiler-friendly processor that will be familiar and satisfying to both classes of developers. It allows simple bit-level manipulation, the use of algebraic assembly syntax, and an memory model for low latency access to assembly modules. Dedicated L1 memory for stack and heap pace, dedicated stack and frame pointers, enhanced address generators, and a very large linear address map make Blackfin as good a compiler target as any RISC processor on the market. The Blackfin architecture allows L1 fast system memory to be defined as either cache or. This is another example of how Blackfin allows programmers to flexibly tune and trade off performance versus power consumption through the ability to turn on and off unused chip resources. The Blackfin MMU allows developers to protect selected areas of memory and manage system resources (cache and other memory subsystems) in an enterprise-class embedded environment where not everyone has access to all sections of memory and system resources. Like most microcontrollers, Blackfin has on-chip hardware support for software exceptions, hardware breakpoints, performance counters, and execution trace, plus complete control of the target hardware through a single JTAG port. Blackfin s memory architecture allows programmers to flexibly tune and trade off performance versus power consumption. Data Memory Interface Blackfin Processor Core Memory Management Unit (MMU) Instruction Memory Interface L1 Data Memory L1 Instruction Memory 4k Byte Cache or Cache or Cache or ScratchPad Bank A Bank B Bank C Bank B Bank A To L2 and L3 Memories Unifying Microarchitecture for Embedded Media Processing Analog Devices, Inc. 4

The Blackfin software development model enables high performance DSP functionality within a framework matching that of typical RISC devices. System-level and product-level application code can be written in C/C++ and layered on top of a standard real-time operating system (RTOS). Lower-level code, such as media operations, can be optimized in mixed assembly code and C/C++ code, utilizing hand-tuned, assembly libraries of high performance media functions. Blackfin can be just as easily programmed in assembler, compiled C/C++ code, or a mixture of both. Dynamic power management Control of power dissipation has long been a major concern for embedded applications that are frequently designed for portable and power-constrained environments. And when embedded systems have needed DSP functionality, choices have been few. When separate MCU and DSP cores are combined on one chip, power control becomes even more difficult and complex. As an embedded media processor, Blackfin s integrated dynamic power management (DPM) controller can optimally address the needs of a given embedded application. Multiple power modes support a range of system performance levels, and the device is designed to allow selective disabling of clocks to unused peripherals and L2 memory. The PLL frequency can be adjusted over a wide range to save power based on various processing needs, and Blackfin s voltage can be adjusted to enable exponential savings in power consumption. The fast clock rates needed to meet the computational complexity and performance requirements of an embedded media processor make it difficult to apply tactical power-saving design schemes. Blackfin s dynamic power management capabilities optimize performance versus power for specific tasks, supporting a multitiered approach to power management that adjusts performance based on system needs. Going one step further, Blackfin also allows the core voltage to be changed in concert with frequency changes, so less power is consumed when running a section of code at a lower frequency and a lower voltage, even if execution time is extended. Blackfin processors optimize power consumption. 1.5V, 300MHz DSP Operation PLL Setting Regulator Transition 1.0V, 100MHz DSP Operation 1.3V, 225MHz DSP Operation Power Consumption Just vary the frequency V dd Dynamic Power Management Time Vary the voltage and the frequency Unifying Microarchitecture for Embedded Media Processing Analog Devices, Inc. 5

Advantages The advantages of using a single, unified core to replace a separate microcontroller and DSP are many. But the key benefits include reduced cost of ownership and faster time to market. Both these benefits are driven by the ability to use a single tool chain to develop code on a single, unified platform. Instead of having to learn two instruction sets and tool chains, developers can learn one instruction set and maintain a single code base running on a single operating system (where applicable). In fact, because they are architected from the ground up for true embedded media processing, these devices actually define a new domain of media instruction set computing. A unified core has one set of software application programming interfaces (APIs) and drivers, one debugger, one loader, one linker, one language, and so forth. A single learning curve means dramatic improvement in development productivity the savings on debug time alone is enormous. However, the biggest advantage to using a unified processor that delivers the best of both worlds is the applications it can enable at performance and price points previously unattainable. Unifying Microarchitecture for Embedded Media Processing Analog Devices, Inc. 6

Embedded Processing Support www.analog.com/processors Email (in the U.S.A.): embedded.support@analog.com Email (in Europe): embedded.europe@analog.com Fax (in the U.S.A.): 781-461-3010 Fax (in Europe): 49-89-76903-557 Worldwide Headquarters Analog Devices One Technology Way P.O. Box 9106 Norwood, MA 02062-9106 U.S.A. Telephone: 781-329-4700 Fax: 781-326-8703 Toll-free: 800-262-5643 (U.S.A. only) Japan Headquarters New Pier Takeshiba South Tower Building 1-16-1 Kaigan, Minato-ku, Tokyo 105-6891, Japan Telephone: 813-5402-8210 Fax: 813-5402-1063 Southeast Asian Headquarters 4501 Nat West Tower Times Square One Matheson Street Causeway Bay Hong Kong, PRC Telephone: 852-2-506-9336 Fax: 852-2-506-4755 Printed in the U.S.A. 2003 Analog Devices, Inc. All rights reserved. Trademarks and registered trademarks are the property of their respective companies.