NEW APPROACHES TO HARDWARE ACCELERATION USING ULTRA LOW DENSITY FPGAs

Similar documents
Solving Today s Interface Challenge With Ultra-Low-Density FPGA Bridging Solutions

USING LOW COST, NON-VOLATILE PLDs IN SYSTEM APPLICATIONS

THE INDUSTRY CASE FOR DISTRIBUTED HETEROGENEOUS PROCESSING

IoT Sensor Connectivity and Processing with Ultra-Low Power, Small Form-Factor FPGAs

THE GROWING USE OF PROGRAMMABLE LOGIC DEVICES IN MOBILE HANDSETS

Accelerating Implementation of Low Power Artificial Intelligence at the Edge

The S6000 Family of Processors

LOW-COST FPGA CONFIGURATION VIA INDUSTRY-STANDARD SPI SERIAL FLASH & LatticeECP/EC FPGAs

Programmable Logic Devices

VIDEO BRIDGING SOLUTION PROMISES NEW LEVEL OF DESIGN FLEXIBILITY AND INNOVATION

MANAGING IMAGE DATA IN AUTOMOTIVE INFOTAINMENT APPLICATIONS USING LOW COST PLDS

Five Ways to Build Flexibility into Industrial Applications with FPGAs

Minimizing System Interruption During Configuration Using TransFR Technology

Overview of Microcontroller and Embedded Systems

APPLICATIONS FOR EMBEDDED COMESH William Bricken January 2003 ADVANTAGES

Implementing the Top Five Control-Path Applications with Low-Cost, Low-Power CPLDs

FPGA Co-Processing Architectures for Video Compression

Designing for Low Power with Programmable System Solutions Dr. Yankin Tanurhan, Vice President, System Solutions and Advanced Applications

Reconfigurable Computing. Introduction

Memory Systems IRAM. Principle of IRAM

ENABLING MOBILE INTERFACE BRIDGING IN ADAS AND INFOTAINMENT APPLICATIONS

Hardware-Software Codesign. 1. Introduction

Microcontroller Systems. ELET 3232 Topic 11: General Memory Interfacing

Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications

FPGA for Dummies. Introduc)on to Programmable Logic

New Architecture for Code-Shadowing Applications. Anil Gupta Technical Executive, Winbond Electronics

CMPE 415 Programmable Logic Devices FPGA Technology I

3D systems-on-chip. A clever partitioning of circuits to improve area, cost, power and performance. The 3D technology landscape

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center

Abbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University

ARM Processors for Embedded Applications

fleximac A MICROSEQUENCER FOR FLEXIBLE PROTOCOL PROCESSING

NEW USE CASES HIGHLIGHT CROSSLINK S BROAD APPLICABILITY

k -bit address bus n-bit data bus Control lines ( R W, MFC, etc.)

ProASIC PLUS FPGA Family

Single Pass Connected Components Analysis

Mark Sandstrom ThroughPuter, Inc.

A Reconfigurable Multifunction Computing Cache Architecture

Welcome. Altera Technology Roadshow 2013

HP Advanced Memory Protection technologies

OVERVIEW: NETWORK ON CHIP 3D ARCHITECTURE

Enabling Intelligent Digital Power IC Solutions with Anti-Fuse-Based 1T-OTP

Implementing RapidIO. Travis Scheckel and Sandeep Kumar. Communications Infrastructure Group, Texas Instruments

William Stallings Computer Organization and Architecture 8th Edition. Chapter 5 Internal Memory

Mixed-Signal. From ICs to Systems. Mixed-Signal solutions from Aeroflex Colorado Springs. Standard products. Custom ASICs. Mixed-Signal modules

MAX 10 FPGA Device Overview

Efficient Data Movement in Modern SoC Designs Why It Matters

Intel s s Memory Strategy for the Wireless Phone

ispxpld TM 5000MX Family White Paper

DQ8051. Revolutionary Quad-Pipelined Ultra High performance 8051 Microcontroller Core

Address connections Data connections Selection connections

ADVANCED FPGA BASED SYSTEM DESIGN. Dr. Tayab Din Memon Lecture 3 & 4

Programmable Logic Devices FPGA Architectures II CMPE 415. Overview This set of notes introduces many of the features available in the FPGAs of today.

Semiconductor Memories: RAMs and ROMs

White Paper The Need for a High-Bandwidth Memory Architecture in Programmable Logic Devices

Symantec NetBackup 7 for VMware

White Paper. The advantages of using a combination of DSP s and FPGA s. Version: 1.0. Author: Louis N. Bélanger. Date: May, 2004.

Outline. Field Programmable Gate Arrays. Programming Technologies Architectures. Programming Interfaces. Historical perspective

Different network topologies

Vertex Detector Electronics: ODE to ECS Interface

LatticeSCM SPI4.2 Interoperability with PMC-Sierra PM3388

machine cycle, the CPU: (a) Fetches an instruction, (b) Decodes the instruction, (c) Executes the instruction, and (d) Stores the result.

Introduction to Partial Reconfiguration Methodology

FPGA Adaptive Software Debug and Performance Analysis

The Xilinx XC6200 chip, the software tools and the board development tools

CS310 Embedded Computer Systems. Maeng

ECE 448 Lecture 15. Overview of Embedded SoC Systems

EE380K: Computing In Transition

IoT Market: Three Classes of Devices

CMPE 415 Programmable Logic Devices Introduction

Approximately half the power consumption of earlier Renesas Technology products and multiple functions in a 14-pin package

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.

CS 261 Fall Mike Lam, Professor. Memory

Higher Level Programming Abstractions for FPGAs using OpenCL

If It s Electronic, It Needs a Clock

Wireless Infrastructure Solutions to Enhance the Digital Media Experience

Custom computing systems

Chapter 5: ASICs Vs. PLDs

Global Semiconductor Market Outlook to 2017

Whitepaper: FPGA-Controlled Test (FCT): What it is and why is it needed?

Allmost all systems contain two main types of memory :

How FPGAs Enable Automotive Systems

VICP Signal Processing Library. Further extending the performance and ease of use for VICP enabled devices

MAX II CPLD Applications Brochure

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup

Moore s Law: Alive and Well. Mark Bohr Intel Senior Fellow

LatticeXP2 Hardware Checklist

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses

ASYNC Rik van de Wiel COO Handshake Solutions

Sistemas Digitais I LESI - 2º ano

Ultra-low power, Single-chip SRAM FPGA Targets Handheld Consumer Applications

Multimedia in Mobile Phones. Architectures and Trends Lund

DesignWare IP for IoT SoC Designs

INTELLIGENT LOAD SWITCHING HELPS IMPLEMENT RELIABLE HOT SWAP SYSTEMS

4. Hardware Platform: Real-Time Requirements

Addressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers

The Next Steps in the Evolution of Embedded Processors

Creating hybrid FPGA/virtual platform prototypes

EMBEDDED SYSTEM FOR VIDEO AND SIGNAL PROCESSING

New STM32 F7 Series. World s 1 st to market, ARM Cortex -M7 based 32-bit MCU

Transcription:

NEW APPROACHES TO HARDWARE ACCELERATION USING ULTRA LOW DENSITY FPGAs August 2013 Lattice Semiconductor 5555 Northeast Moore Ct. Hillsboro, Oregon 97124 USA Telephone: (503) 268-8000 www.latticesemi.com 1 New Approaches to Hardware Acceleration Using Ultra Low Density FPGAs

Introduction Ask system designers to list the problems they face it doesn t matter whether they re building mobile consumer, automotive, industrial, medical or scientific applications and inevitably they ll mention optimizing host processor performance. It s hardly surprising. The event-driven architecture of these MPUs allows them to multitask and address new priorities as they occur. But as the number of I/O continues to rise, it also places escalating demand on bandwidth. Tasked with managing a wider array of I/O as well as other system-wide command and control functions, today s host MPUs must remain operational for longer periods of time, thereby consuming precious power and compute resources. Complicating this problem, many of the tasks host MPUs must address today are timing critical; they must be performed in a contiguous manner. Once the processing commences, it cannot be interrupted. Each task must be performed as the data reaches the MPU or the integrity of the data is compromised. Given that many of today s handsets or instruments may feature up to a half dozen or more sensors delivering data continuously over a SPI or I2C interface, performance requirements can climb dramatically. Historically, designers have addressed these performance bottlenecks and allowed the host processor to devote more attention to command and control functions, by offloading math functions, particularly those that are predictable and repetitive, to general purpose devices such as DSPs or ASSPs. This partitioning of the design reduces processor overhead, but designers typically pay a price in some combination of increased board space, power consumption, cost, and development time. Furthermore, most of these general purpose solutions are far too large and generic to address the specific needs of today s timing critical applications. Ideally designers require a programmable solution that can be dedicated to a specific task and delivers silicon optimized to perform that task with maximum efficiency in realtime logic. 2 New Approaches to Hardware Acceleration Using Ultra Low Density FPGAs

Highly Targeted Solutions A new class of Ultra Low Density (ULD) FPGAs, developed by Lattice Semiconductor, offers designers the opportunity to develop highly targeted solutions that can be uniquely optimized for these specific timing critical and coprocessor-type applications. In effect, these new devices allow designers to develop very small, low power solutions aimed at accelerating very specific functions with very specific silicon. Unlike traditional general purpose coprocessors, these extremely small programmable devices offer the ability to implement algorithms that are highly optimized for a particular application and supported by interfaces targeted to that particular function. For instance, these applications might support timing critical functions for motor control or communications, or very specific hardware acceleration functions for sensor preprocessing or image rotation. While these highly targeted solutions offer more efficient bandwidth utilization, they also bring very attractive benefits in terms of lower power consumption, smaller product footprint and lower system cost. At the same time ULD FPGAs give designers unparalleled design flexibility. By capturing these highly targeted functions in a very small, low power FPGA, designers can reduce core processor overhead opening up new options for system design. In some cases it allows them to use a lower performance and lower cost processor and keep the overall solution budget competitive. In others, it allows them to add new functionality into the system without disturbing the complex coding associated with the legacy core processor. Finally, unlike traditional large FPGAs, these ULD devices offer designers the ability to deliver highly tailored programmable solutions occupying minimal board space while keeping system power demands and cost as low as possible. Utilizing a unique fabric that can perform functions in parallel, ULD FPGAs deliver excellent performance in devices as small as 1.71-mm x 1.71-mm while consuming an order of magnitude less power than a traditional applications processor. The following design examples give insight into how ULD FPGAs might be used to address a wide variety of timing critical and hardware acceleration applications. 3 New Approaches to Hardware Acceleration Using Ultra Low Density FPGAs

Image Scaling In a growing number of mobile applications today image outputs must be resized, either scaled down or up, or presented as picture-in-picture to support new applications or the use of multiple displays. Many embedded, medical and automotive applications, for instance, feature a display of one size on the front of the device and a second, differentsized display in another location on the product. By offloading this image scaling function from the core system processor, a circuit implemented in a ULD FPGA allows designers to use a lower cost core processor and reduce power consumption by minimizing overhead. Alternately, it can permit designers to extend the life of an existing design by adding new functionality without swapping out the core processor or taking on a major re-coding effort. The block diagram below depicts a typical image scaling circuit. Fig. 1: Image scaling circuit for a mobile device. WDR Preprocessing Wide Dynamic Range (WDR) preprocessing is a common technique used to improve image clarity. By lighting up dark areas of an image and diminishing the light in bright areas, this process improves the visibility of objects in an image. But this function can be highly compute intensive. The processor must combine short exposure frames and long exposure frames and develop a tone map to modify and improve the image. 4 New Approaches to Hardware Acceleration Using Ultra Low Density FPGAs

Typically this process would require the significant logic resources of a high end Image Signal Processor (ISP). The block diagram below illustrates a co-processing solution that drives down host ISP performance requirements by offloading the WDR function using an X02-4000 FPGA from Lattice Semiconductor. This highly targeted implementation takes image data from an image sensor with WDR capabilities over a sub-lvds interface and performs the WDR function in a 4K LUT X02 FPGA. The HD video frame is stored in a low power SDRAM. Once the pre-processing is complete, the image is sent via parallel bus to the ISP. By performing this function before the data reaches the ISP, this highly optimized ULD FPGA allows designers to use a lower performance, more affordable ISP for the rest of the image pipeline. Fig. 2: WDR coprocessor using a Lattice Semiconductor X02-4000 ULD FPGA. JTAG Extender As communications system functionality continues to grow by leaps and bounds, board design has become increasingly complex. Some communication boards today feature hundreds of components in densely packed layouts. Given this design trend, it has become extremely difficult to establish and use a single JTAG chain for an entire board. In this example designers simplify the task of developing a JTAG chain for board initialization and test by breaking it into multiple smaller, more easily addressable pieces using an ULD FPGA. This multiple boundary scan Test Access Port (TAP) addressable 5 New Approaches to Hardware Acceleration Using Ultra Low Density FPGAs

buffer is accessed through a standard 1149.1 interface. With three Local Scan Ports (LSPs), the function can be configured as hierarchical ports with the ability to add or remove local scan chains to streamline test flow. Each LSP can also be accessed individually or in combinations of two or three. This JTAG extender can be implemented in a ULD FPGA with less than 200 LUTs. Fig. 3: JTAG Extender using a Lattice Semiconductor ULD FPGA. Image Rotation Some lower cost processors used in a wide variety of industrial, medical, scientific and automotive applications are designed to only output an image in either portrait or landscape format. Designers can retain their cost-effective core processor and add image rotation capability by implementing the function in a ULD FPGA from Lattice Semiconductor with as little as 1K LUTs. In the design below the ice40 FPGA loads an image in landscape format (400 x 240) and a counter and buffers into SDRAM. Once the full image is loaded, the FPGA reads out the frame in rotated portrait format for display on the screen. The functions across the top of the FPGA manage setup and control. Larger screens would require FPGAs with densities of approximately 4K LUTs. 6 New Approaches to Hardware Acceleration Using Ultra Low Density FPGAs

Fig. 4: Image rotation circuit using ice40 FPGA. Live System Updating Lattice Semiconductor s ULD FPGAs also bring a unique capability to time-critical applications where systems must be updated on-the-fly while the system continues to operate. Historically, one of the more attractive benefits of FPGAs has always been the ability to reconfigure device functionality without removing the device from the system. However, typically most of the methods used to reprogram FPGAs require a major disruption to system operation. How do users of high reliability systems that must be operational 24/7 update the FPGA without interrupting the operation of the rest of the board? Lattice X02 ULD FPGAs feature a technology called Transparent Field Reconfiguration (TransFR) to help minimize system interruption. A function provided in Lattice s ispvm system software, TransFR makes use of the two sets of configuration storage built into Lattice s ULD FPGAs. Each FPGA maintains a working configuration of the FPGA in its on-chip SRAM while a copy of the configuration is retained in on-chip flash memory. Since the contents of the flash memory can be loaded into SRAM automatically at power up or any other time, there is the no need for external boot memory. FPGAs without internal non-volatile storage configure the SRAM from an external device such as a serial flash, microprocessor or EEPROM. Whether the SRAM is programmed by an 7 New Approaches to Hardware Acceleration Using Ultra Low Density FPGAs

external device or on-chip Flash, the non-volatile storage can be programmed independently of the SRAM and each memory can be modified while the other remains unchanged. In a process called background programming, engineers program the nonvolatile configuration memory while the SRAM continues to operate uninterrupted. Lattice s TransFR technology combines this background programming capability with the ULD FPGA s powerful boundary scan functionality to support system updates with minimal disruption. First, the non-volatile memory is updated while the SRAM runs undisrupted. Next, I/O states are captured and driven to a user-defined level using JTAG commands. While the I/O states are under control, JTAG commands are used to initiate the transfer of new technology of new functionality from the non-volatile memory to the SRAM. Once the SRAM is reconfigured, I/Os are released from boundary scan control and returned to their specified functions. Fig. 5: Live system updating solution. 8 New Approaches to Hardware Acceleration Using Ultra Low Density FPGAs

Conclusion The traditional notion of coprocessing requirements has arguably been summed by the concept, the more computational power the better. But many systems built today do not call for a monster coprocessor capable of delivering mountains of performance. More often than not what designers really need are solutions that are expressly targeted at their hardware acceleration needs. By using precisely targeted silicon optimized for a specific function, ULD FPGAs offer designers a new option the ability to resolve their performance bottlenecks with minimal impact on system footprint, power consumption and cost. 9 New Approaches to Hardware Acceleration Using Ultra Low Density FPGAs