Ascenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005

Size: px
Start display at page:

Download "Ascenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005"

Transcription

1 Ascenium: A Continuously Reconfigurable Architecture Robert Mykland Founder/CTO robert@ascenium.com August, 2005

2 Ascenium: A Continuously Reconfigurable Processor Continuously reconfigurable approach provides: The computational efficiency of direct logic implementations in ASICs All the flexibility of microprocessors (pure software) Unique architecture enables the array to be effectively targeted by an ANSI Standard C compiler CPU logic is continuously redefined to match the compiler s needs for each instruction ( 20 cycles) Massive parallelism even in programs with little or no inherent instruction-level parallelism Dozens of lines of sequential C code are executed every instruction

3 Ascenium Block Diagram No instruction set RLU is combinatorial logic that settles Memory is accessed in statically scheduled block operations Two instruction streams 1) Memory 2) Configuration Whole loops are often executed as one RLU instruction Reconfigurable Logic Unit (RLU) Config Control In Out Memory Ascenium Memory Control

4 Ascenium Basic Operation 12 RISC Instructions ALU Register File Control Reconfigurable Logic Unit (RLU) Config Control In Out Memory Control Memory Conventional Clock = Memory Ascenium

5 A128 Rough Layout Pad Ring RLU Inst. Pipe Data Pipe Onchip Memory 130 nm process 128 -bit logic elements 64 MACs/clock cycle 200 MHz DDR II = 3.2 GB/s external memory bandwidth 256KB onchip memory 19.2 GB/s peak onchip memory bandwidth MHz 52 mm 2

6 Logic Element Input A ( bits) Input B ( bits) Input C ( bits) ½ -Bit Full Multiplier - Accumulator Fast Carry -Bit Add LUT ( bits) LUT Y ( bits) Wide Logic Iterator ( bits) Iterator Y ( bits) Iterator Z ( bits) Output ( bits) Output Y ( bits)

7 Array Connectivity

8 Compiler Block Diagram Library (open source) Bytecode files Source code C, C, Java GCC Based Compiler (open source) Bytecode Bytecode Linker (open source) Bytecode Proprietary Ascenium Code Generator (develop) Ascenium Executable

9 Code Generator Diagram LLVM PARSER GENERIC INT. REP. (GIR) GLOBAL THREADING GIR OPTS. ASCENIUM INT. REP. (AIR) FUNC. ASSIGNMENT & SCORING ARRAY INST. PACKING & COMP. DATA DIRECTIVES FETCH ORDER ASSY. AIR OPTS. OUTPUT indicates where optimizations occur

10 Generic Intermediate Representation (GIR) PARAM A PARAM B PARAM A LOADI PARAM A CALL RET * LOADI STORE RET RET

11 FFT Inner Loop C Code i1 = i0 n2; i2 = i1 n2; i3 = i2 n2; r1 = x[2 * i0] x[2 * i2]; r2 = x[2 * i0] - x[2 * i2]; t = x[2 * i1] x[2 * i3]; x[2 * i0] = r1 t; r1 = r1 - t; s1 = x[2 * i0 1] x[2 * i2 1]; s2 = x[2 * i0 1] - x[2 * i2 1]; t = x[2 * i1 1] x[2 * i3 1]; x[2 * i0 1] = s1 t; s1 = s1 - t; x[2 * i2] = (r1 * co2 s1 * si2) >>15; x[2 * i2 1] = (s1 * co2-r1*si2)>>15; t = x[2 * i1 1] - x[2 * i3 1]; r1 = r2 t; r2 = r2 - t; t = x[2 * i1] - x[2 * i3]; s1 = s2 - t; s2 = s2 t; x[2 * i1] = (r1 * co1 s1 * si1)>>15; x[2 * i1 1] = (s1 * co1-r1 * si1)>>15; x[2 * i3] = (r2 * co3 s2 * si3)>>15; x[2 * i3 1] = (s2 * co3-r2 *si3)>>15;

12 FFT Inner Loop Virtual Circuit Y0 Y2 Y0 Y2 Y1 Y3 Y1 Y3 1 3 A r1 - B C r2 t1 s1 s2 D - E F - G - H t2 t3 t4 - I - J K C2 S2 C2 S2 C1 S1 C1 S1 C3 S3 C3 r3 s3 r4 r5 s4 s5 - L - M N S3 O P Q R S T U V W Y Z 32 AA 32 BB 32 CC 32 DD 32 EE 32 FF GG HH >>15 II >>15 JJ >>15 KK >>15 LL >>15 MM >>15 NN 0o Y0o 2o Y2o 1o Y1o 3o Y3o

13 FFT Loop Array Instruction AA BB CC DD O P R Q S T V U B A E D 0 Y0 2 Y2 C2 S2 W H C EE G F 1 Y1 3 Y3 C3 S3 NN Z Y K I C1 Y3 o MM KK FF M J S1 1 3 o o LL L N GG HH 0 Y1 o o JJ II Y0 2 o Y2 o o A 0 Y0 2 Y2 C2 S2 D C F I J GG HH B E H G K M L O 1 Y1 3 Y3 C3 S3 C1 S1 Y3 1 o 3 o o 0 Y1 o o N P Q T U AA R BB S CC V DD W Y LL II Z EE FF JJ NN KK MM Y0 o2 oy2 o

14 FFT Loop Data Directives 1. Fetch the array instruction and data directives 2. Load the array instruction 3. Read x[t] in four 64-bit wide banks 32 times 4. Write x[t] in one 256-bit wide bank 32 times, changing banks every 8 operations 5. Repeat steps 3 and 4 three more times 6. At the same time as 3 5, read in w coefficients 7. Pass 1: w every clock; 2: w every 4 clocks, then every, then every Stop

15 DSP Benchmark Results (Simulated, Hand-Coded) Benchmark 256-Point Radix-4 FFT Real Block FIR Filter Complex Block FIR Filter DCT x 24 IIR Filter NR = 2,048 A128 core, 130nm, 250MHz, 250mW 512 ns 512 ns 512 ns 512 ns 292 ns TMS320C64x core, 130nm, 300MHz*, 250mW 8,920 ns 6,827 ns 6,827 ns 6,080 ns 6,827 ns * The 130nm C64x core will operate up to 720 MHz, but burns more power at higher frequencies.

16 Ascenium Status Patents filed Chip architecture complete, polishing ongoing Prototype compiler complete & demo-able Development plan in place Raising seed round

17 Summary Great performance and performance/mw: >10x programmable DSPs >4x performance/mw of FPGAs Great time-to-market: -- Normal GPP software development cycle Great ease-of-use: -- Programmers writing standard, non-optimized code in C/C get all the performance

Digital Signal Processor Core Technology

Digital Signal Processor Core Technology The World Leader in High Performance Signal Processing Solutions Digital Signal Processor Core Technology Abhijit Giri Satya Simha November 4th 2009 Outline Introduction to SHARC DSP ADSP21469 ADSP2146x

More information

C-Based Hardware Design Platform for Dynamically Reconfigurable Processor

C-Based Hardware Design Platform for Dynamically Reconfigurable Processor C-Based Hardware Design Platform for Dynamically Reconfigurable Processor September 22 nd, 2005 IPFlex Inc. Agenda Merits of C-Based hardware design Hardware enabling C-Based hardware design DAPDNA-FW

More information

Math 96--Radicals #1-- Simplify; Combine--page 1

Math 96--Radicals #1-- Simplify; Combine--page 1 Simplify; Combine--page 1 Part A Number Systems a. Whole Numbers = {0, 1, 2, 3,...} b. Integers = whole numbers and their opposites = {..., 3, 2, 1, 0, 1, 2, 3,...} c. Rational Numbers = quotient of integers

More information

FABRICATION TECHNOLOGIES

FABRICATION TECHNOLOGIES FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general

More information

XPU A Programmable FPGA Accelerator for Diverse Workloads

XPU A Programmable FPGA Accelerator for Diverse Workloads XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

More information

MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of

MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of An Independent Analysis of the: MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of Berkeley Design Technology, Inc. OVERVIEW MIPS Technologies, Inc. is an Intellectual Property (IP)

More information

System-on Solution from Altera and Xilinx

System-on Solution from Altera and Xilinx System-on on-a-programmable-chip Solution from Altera and Xilinx Xun Yang VLSI CAD Lab, Computer Science Department, UCLA FPGAs with Embedded Microprocessors Combination of embedded processors and programmable

More information

An introduction to Digital Signal Processors (DSP) Using the C55xx family

An introduction to Digital Signal Processors (DSP) Using the C55xx family An introduction to Digital Signal Processors (DSP) Using the C55xx family Group status (~2 minutes each) 5 groups stand up What processor(s) you are using Wireless? If so, what technologies/chips are you

More information

D. Richard Brown III Associate Professor Worcester Polytechnic Institute Electrical and Computer Engineering Department

D. Richard Brown III Associate Professor Worcester Polytechnic Institute Electrical and Computer Engineering Department D. Richard Brown III Associate Professor Worcester Polytechnic Institute Electrical and Computer Engineering Department drb@ece.wpi.edu 12-November-2012 Efficient Real-Time DSP Data types Memory usage

More information

FPGA Polyphase Filter Bank Study & Implementation

FPGA Polyphase Filter Bank Study & Implementation FPGA Polyphase Filter Bank Study & Implementation Raghu Rao Matthieu Tisserand Mike Severa Prof. John Villasenor Image Communications/. Electrical Engineering Dept. UCLA 1 Introduction This document describes

More information

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design Lecture Objectives Background Need for Accelerator Accelerators and different type of parallelizm

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

High performance, power-efficient DSPs based on the TI C64x

High performance, power-efficient DSPs based on the TI C64x High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research

More information

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes B-KD rees for Hardware Accelerated Ray racing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek Saarland University, Germany Outline Previous Work B-KD ree as new Spatial Index Structure DynR

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING UTN - FRBA 2011 www.electron.frba.utn.edu.ar/dplab Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.

More information

Jason Manley. Internal presentation: Operation overview and drill-down October 2007

Jason Manley. Internal presentation: Operation overview and drill-down October 2007 Jason Manley Internal presentation: Operation overview and drill-down October 2007 System overview Achievements to date ibob F Engine in detail BEE2 X Engine in detail Backend System in detail Future developments

More information

The S6000 Family of Processors

The S6000 Family of Processors The S6000 Family of Processors Today s Design Challenges The advent of software configurable processors In recent years, the widespread adoption of digital technologies has revolutionized the way in which

More information

FAST FIR FILTERS FOR SIMD PROCESSORS WITH LIMITED MEMORY BANDWIDTH

FAST FIR FILTERS FOR SIMD PROCESSORS WITH LIMITED MEMORY BANDWIDTH Key words: Digital Signal Processing, FIR filters, SIMD processors, AltiVec. Grzegorz KRASZEWSKI Białystok Technical University Department of Electrical Engineering Wiejska

More information

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING 1 DSP applications DSP platforms The synthesis problem Models of computation OUTLINE 2 DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: Time-discrete representation

More information

THE NVIDIA DEEP LEARNING ACCELERATOR

THE NVIDIA DEEP LEARNING ACCELERATOR THE NVIDIA DEEP LEARNING ACCELERATOR INTRODUCTION NVDLA NVIDIA Deep Learning Accelerator Developed as part of Xavier NVIDIA s SOC for autonomous driving applications Optimized for Convolutional Neural

More information

PACE: Power-Aware Computing Engines

PACE: Power-Aware Computing Engines PACE: Power-Aware Computing Engines Krste Asanovic Saman Amarasinghe Martin Rinard Computer Architecture Group MIT Laboratory for Computer Science http://www.cag.lcs.mit.edu/ PACE Approach Energy- Conscious

More information

The Xilinx XC6200 chip, the software tools and the board development tools

The Xilinx XC6200 chip, the software tools and the board development tools The Xilinx XC6200 chip, the software tools and the board development tools What is an FPGA? Field Programmable Gate Array Fully programmable alternative to a customized chip Used to implement functions

More information

The extreme Adaptive DSP Solution to Sensor Data Processing

The extreme Adaptive DSP Solution to Sensor Data Processing The extreme Adaptive DSP Solution to Sensor Data Processing Abstract Martin Vorbach PACT XPP Technologies Leo Mirkin Sky Computers, Inc. The new ISR mobile autonomous sensor platforms present a difficult

More information

Graduate Institute of Electronics Engineering, NTU 9/16/2004

Graduate Institute of Electronics Engineering, NTU 9/16/2004 / 9/16/2004 ACCESS IC LAB Overview of DSP Processor Current Status of NTU DSP Laboratory (E1-304) Course outline of Programmable DSP Lab Lab handout and final project DSP processor is a specially designed

More information

CNP: An FPGA-based Processor for Convolutional Networks

CNP: An FPGA-based Processor for Convolutional Networks Clément Farabet clement.farabet@gmail.com Computational & Biological Learning Laboratory Courant Institute, NYU Joint work with: Yann LeCun, Cyril Poulet, Jefferson Y. Han Now collaborating with Eugenio

More information

Cache Justification for Digital Signal Processors

Cache Justification for Digital Signal Processors Cache Justification for Digital Signal Processors by Michael J. Lee December 3, 1999 Cache Justification for Digital Signal Processors By Michael J. Lee Abstract Caches are commonly used on general-purpose

More information

Better sharc data such as vliw format, number of kind of functional units

Better sharc data such as vliw format, number of kind of functional units Better sharc data such as vliw format, number of kind of functional units Pictures of pipe would help Build up zero overhead loop example better FIR inner loop in coldfire Mine more material from bsdi.com

More information

High Performance Implementation of Microtubule Modeling on FPGA using Vivado HLS

High Performance Implementation of Microtubule Modeling on FPGA using Vivado HLS High Performance Implementation of Microtubule Modeling on FPGA using Vivado HLS Yury Rumyantsev rumyantsev@rosta.ru 1 Agenda 1. Intoducing Rosta and Hardware overview 2. Microtubule Modeling Problem 3.

More information

Embedded Target for TI C6000 DSP 2.0 Release Notes

Embedded Target for TI C6000 DSP 2.0 Release Notes 1 Embedded Target for TI C6000 DSP 2.0 Release Notes New Features................... 1-2 Two Virtual Targets Added.............. 1-2 Added C62x DSP Library............... 1-2 Fixed-Point Code Generation

More information

Understanding Peak Floating-Point Performance Claims

Understanding Peak Floating-Point Performance Claims white paper FPGA Understanding Peak ing-point Performance Claims Learn how to calculate and compare the peak floating-point capabilities of digital signal processors (DSPs), graphics processing units (GPUs),

More information

asoc: : A Scalable On-Chip Communication Architecture

asoc: : A Scalable On-Chip Communication Architecture asoc: : A Scalable On-Chip Communication Architecture Russell Tessier, Jian Liang,, Andrew Laffely,, and Wayne Burleson University of Massachusetts, Amherst Reconfigurable Computing Group Supported by

More information

USING C-TO-HARDWARE ACCELERATION IN FPGAS FOR WAVEFORM BASEBAND PROCESSING

USING C-TO-HARDWARE ACCELERATION IN FPGAS FOR WAVEFORM BASEBAND PROCESSING USING C-TO-HARDWARE ACCELERATION IN FPGAS FOR WAVEFORM BASEBAND PROCESSING David Lau (Altera Corporation, San Jose, CA, dlau@alteracom) Jarrod Blackburn, (Altera Corporation, San Jose, CA, jblackbu@alteracom)

More information

FlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com

FlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com FlexRIO FPGAs Bringing Custom Functionality to Instruments Ravichandran Raghavan Technical Marketing Engineer Electrical Test Today Acquire, Transfer, Post-Process Paradigm Fixed- Functionality Triggers

More information

Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension

Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension Hamid Noori, Farhad Mehdipour, Koji Inoue, and Kazuaki Murakami Institute of Systems, Information

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Embedded Controller Design. CompE 270 Digital Systems - 5. Objective. Application Specific Chips. User Programmable Logic. Copyright 1998 Ken Arnold 1

Embedded Controller Design. CompE 270 Digital Systems - 5. Objective. Application Specific Chips. User Programmable Logic. Copyright 1998 Ken Arnold 1 CompE 270 Digital Systems - 5 Programmable Logic Ken Arnold Objective Application Specific ICs Introduce User Programmable Logic Common Architectures Programmable Array Logic Address Decoding Example Development

More information

Industrial Control. 50 to 5,000 VA. Applications. Specifications. Standards. Options and Accessories. Features, Functions, Benefits.

Industrial Control. 50 to 5,000 VA. Applications. Specifications. Standards. Options and Accessories. Features, Functions, Benefits. Industrial Control 6 50 to 5,000 VA Applications n For commercial and industrial control applications including; control panels, conveyor systems, machine tool equipment, pump systems, and commercial air

More information

Team 1. Common Questions to all Teams. Team 2. Team 3. CO200-Computer Organization and Architecture - Assignment One

Team 1. Common Questions to all Teams. Team 2. Team 3. CO200-Computer Organization and Architecture - Assignment One CO200-Computer Organization and Architecture - Assignment One Note: A team may contain not more than 2 members. Format the assignment solutions in a L A TEX document. E-mail the assignment solutions PDF

More information

Industrial Control. 50 to 5,000 VA. Applications. Specifications. Standards. Options and Accessories. Features, Functions, Benefits.

Industrial Control. 50 to 5,000 VA. Applications. Specifications. Standards. Options and Accessories. Features, Functions, Benefits. Industrial Control 6 50 to 5,000 VA Applications n For commercial and industrial control applications including; control panels, conveyor systems, machine tool equipment, pump systems, and commercial air

More information

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

Introduction to FPGA Design with Vivado High-Level Synthesis. UG998 (v1.0) July 2, 2013

Introduction to FPGA Design with Vivado High-Level Synthesis. UG998 (v1.0) July 2, 2013 Introduction to FPGA Design with Vivado High-Level Synthesis Notice of Disclaimer The information disclosed to you hereunder (the Materials ) is provided solely for the selection and use of Xilinx products.

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

Design Space Exploration for Memory Subsystems of VLIW Architectures

Design Space Exploration for Memory Subsystems of VLIW Architectures E University of Paderborn Dr.-Ing. Mario Porrmann Design Space Exploration for Memory Subsystems of VLIW Architectures Thorsten Jungeblut 1, Gregor Sievers, Mario Porrmann 1, Ulrich Rückert 2 1 System

More information

Classification of Semiconductor LSI

Classification of Semiconductor LSI Classification of Semiconductor LSI 1. Logic LSI: ASIC: Application Specific LSI (you have to develop. HIGH COST!) For only mass production. ASSP: Application Specific Standard Product (you can buy. Low

More information

Agenda. How can we improve productivity? C++ Bit-accurate datatypes and modeling Using C++ for hardware design

Agenda. How can we improve productivity? C++ Bit-accurate datatypes and modeling Using C++ for hardware design Catapult C Synthesis High Level Synthesis Webinar Stuart Clubb Technical Marketing Engineer April 2009 Agenda How can we improve productivity? C++ Bit-accurate datatypes and modeling Using C++ for hardware

More information

14K Mixable Rings 1/ 2 ctw, $1499. Silver Diamond Pendant 1/6 ctw, $299

14K Mixable Rings 1/ 2 ctw, $1499. Silver Diamond Pendant 1/6 ctw, $299 1k 14K D.B.T.Y. 1/4 ctw, 1/2 ctw, $499 3/4 ctw, 9 1 ctw, 9 1m 1a 14K Mixable Rings 1/ 2 ctw, $1499 1h 14K Two Tone 1/5 ctw, $549 1/2 ctw, $1299 Triple Strand 1/4 ctw 14K Triple Strand 3/4 ctw 9 1c 14K

More information

Stacks & Queues. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST

Stacks & Queues. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST Stacks & Queues Kuan-Yu Chen ( 陳冠宇 ) 2018/10/01 @ TR-212, NTUST Review Stack Stack Permutation Expression Infix Prefix Postfix 2 Stacks. A stack is an ordered list in which insertions and deletions are

More information

Memory Systems and Performance Engineering

Memory Systems and Performance Engineering SPEED LIMIT PER ORDER OF 6.172 Memory Systems and Performance Engineering Fall 2010 Basic Caching Idea A. Smaller memory faster to access B. Use smaller memory to cache contents of larger memory C. Provide

More information

A Reconfigurable Multifunction Computing Cache Architecture

A Reconfigurable Multifunction Computing Cache Architecture IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 4, AUGUST 2001 509 A Reconfigurable Multifunction Computing Cache Architecture Huesung Kim, Student Member, IEEE, Arun K. Somani,

More information

Spiral 2-8. Cell Layout

Spiral 2-8. Cell Layout 2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric

More information

SoC Basics Avnet Silica & Enclustra Seminar Getting started with Xilinx Zynq SoC Fribourg, April 26, 2017

SoC Basics Avnet Silica & Enclustra Seminar Getting started with Xilinx Zynq SoC Fribourg, April 26, 2017 1 2 3 4 Introduction - Cool new Stuff Everybody knows, that new technologies are usually driven by application requirements. A nice example for this is, that we developed portable super-computers with

More information

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,

More information

DSP Co-Processing in FPGAs: Embedding High-Performance, Low-Cost DSP Functions

DSP Co-Processing in FPGAs: Embedding High-Performance, Low-Cost DSP Functions White Paper: Spartan-3 FPGAs WP212 (v1.0) March 18, 2004 DSP Co-Processing in FPGAs: Embedding High-Performance, Low-Cost DSP Functions By: Steve Zack, Signal Processing Engineer Suhel Dhanani, Senior

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING SASE 2010 Universidad Tecnológica Nacional - FRBA Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.

More information

Representation of Numbers and Arithmetic in Signal Processors

Representation of Numbers and Arithmetic in Signal Processors Representation of Numbers and Arithmetic in Signal Processors 1. General facts Without having any information regarding the used consensus for representing binary numbers in a computer, no exact value

More information

Introduction to the MMAGIX Multithreading Supercomputer

Introduction to the MMAGIX Multithreading Supercomputer Introduction to the MMAGIX Multithreading Supercomputer A supercomputer is defined as a computer that can run at over a billion instructions per second (BIPS) sustained while executing over a billion floating

More information

EECS150 - Digital Design Lecture 09 - Parallelism

EECS150 - Digital Design Lecture 09 - Parallelism EECS150 - Digital Design Lecture 09 - Parallelism Feb 19, 2013 John Wawrzynek Spring 2013 EECS150 - Lec09-parallel Page 1 Parallelism Parallelism is the act of doing more than one thing at a time. Optimization

More information

Queues. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST

Queues. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST Queues Kuan-Yu Chen ( 陳冠宇 ) 2018/10/03 @ TR-212, NTUST Review Stack FILO Algorithms to covert Infix to Postfix Infix to Prefix Queue FIFO 2 Homeork1: Expression Convertor. Given a infix expression, please

More information

Computed Tomography (CT) Scan Image Reconstruction on the SRC-7 David Pointer SRC Computers, Inc.

Computed Tomography (CT) Scan Image Reconstruction on the SRC-7 David Pointer SRC Computers, Inc. Computed Tomography (CT) Scan Image Reconstruction on the SRC-7 David Pointer SRC Computers, Inc. CT Image Reconstruction Herman Head Sinogram Herman Head Reconstruction CT Image Reconstruction for all

More information

Simplifying FPGA Design for SDR with a Network on Chip Architecture

Simplifying FPGA Design for SDR with a Network on Chip Architecture Simplifying FPGA Design for SDR with a Network on Chip Architecture Matt Ettus Ettus Research GRCon13 Outline 1 Introduction 2 RF NoC 3 Status and Conclusions USRP FPGA Capability Gen

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING UTN-FRBA 2010 Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable. Reproducibility. Don t depend on components

More information

Memory systems. Memory technology. Memory technology Memory hierarchy Virtual memory

Memory systems. Memory technology. Memory technology Memory hierarchy Virtual memory Memory systems Memory technology Memory hierarchy Virtual memory Memory technology DRAM Dynamic Random Access Memory bits are represented by an electric charge in a small capacitor charge leaks away, need

More information

Course Overview Revisited

Course Overview Revisited Course Overview Revisited void blur_filter_3x3( Image &in, Image &blur) { // allocate blur array Image blur(in.width(), in.height()); // blur in the x dimension for (int y = ; y < in.height(); y++) for

More information

VICP Signal Processing Library. Further extending the performance and ease of use for VICP enabled devices

VICP Signal Processing Library. Further extending the performance and ease of use for VICP enabled devices Signal Processing Library Further extending the performance and ease of use for enabled devices Why is library effective for customer application? Get to market faster with ready-to-use signal processing

More information

RISC IMPLEMENTATION OF OPTIMAL PROGRAMMABLE DIGITAL IIR FILTER

RISC IMPLEMENTATION OF OPTIMAL PROGRAMMABLE DIGITAL IIR FILTER RISC IMPLEMENTATION OF OPTIMAL PROGRAMMABLE DIGITAL IIR FILTER Miss. Sushma kumari IES COLLEGE OF ENGINEERING, BHOPAL MADHYA PRADESH Mr. Ashish Raghuwanshi(Assist. Prof.) IES COLLEGE OF ENGINEERING, BHOPAL

More information

ADVANCES IN PROCESSOR DESIGN AND THE EFFECTS OF MOORES LAW AND AMDAHLS LAW IN RELATION TO THROUGHPUT MEMORY CAPACITY AND PARALLEL PROCESSING

ADVANCES IN PROCESSOR DESIGN AND THE EFFECTS OF MOORES LAW AND AMDAHLS LAW IN RELATION TO THROUGHPUT MEMORY CAPACITY AND PARALLEL PROCESSING ADVANCES IN PROCESSOR DESIGN AND THE EFFECTS OF MOORES LAW AND AMDAHLS LAW IN RELATION TO THROUGHPUT MEMORY CAPACITY AND PARALLEL PROCESSING Evan Baytan Department of Electrical Engineering and Computer

More information

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013 A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company

More information

An Optimizing Compiler for the TMS320C25 DSP Chip

An Optimizing Compiler for the TMS320C25 DSP Chip An Optimizing Compiler for the TMS320C25 DSP Chip Wen-Yen Lin, Corinna G Lee, and Paul Chow Published in Proceedings of the 5th International Conference on Signal Processing Applications and Technology,

More information

Lecture 41: Introduction to Reconfigurable Computing

Lecture 41: Introduction to Reconfigurable Computing inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following

More information

Developing and Integrating FPGA Co-processors with the Tic6x Family of DSP Processors

Developing and Integrating FPGA Co-processors with the Tic6x Family of DSP Processors Developing and Integrating FPGA Co-processors with the Tic6x Family of DSP Processors Paul Ekas, DSP Engineering, Altera Corp. pekas@altera.com, Tel: (408) 544-8388, Fax: (408) 544-6424 Altera Corp., 101

More information

Specializing Hardware for Image Processing

Specializing Hardware for Image Processing Lecture 6: Specializing Hardware for Image Processing Visual Computing Systems So far, the discussion in this class has focused on generating efficient code for multi-core processors such as CPUs and GPUs.

More information

Microprocessors vs. DSPs (ESC-223)

Microprocessors vs. DSPs (ESC-223) Insight, Analysis, and Advice on Signal Processing Technology Microprocessors vs. DSPs (ESC-223) Kenton Williston Berkeley Design Technology, Inc. Berkeley, California USA +1 (510) 665-1600 info@bdti.com

More information

Two-level Reconfigurable Architecture for High-Performance Signal Processing

Two-level Reconfigurable Architecture for High-Performance Signal Processing International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA 04, pp. 177 183, Las Vegas, Nevada, June 2004. Two-level Reconfigurable Architecture for High-Performance Signal Processing

More information

Dependable VLSI Platform using Robust Fabrics

Dependable VLSI Platform using Robust Fabrics Dependable VLSI Platform using Robust Fabrics Director H. Onodera, Kyoto Univ. Principal Researchers T. Onoye, Y. Mitsuyama, K. Kobayashi, H. Shimada, H. Kanbara, K. Wakabayasi Background: Overall Design

More information

Evaluating MMX Technology Using DSP and Multimedia Applications

Evaluating MMX Technology Using DSP and Multimedia Applications Evaluating MMX Technology Using DSP and Multimedia Applications Ravi Bhargava * Lizy K. John * Brian L. Evans Ramesh Radhakrishnan * November 22, 1999 The University of Texas at Austin Department of Electrical

More information

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal

More information

Qsys and IP Core Integration

Qsys and IP Core Integration Qsys and IP Core Integration Stephen A. Edwards (after David Lariviere) Columbia University Spring 2016 IP Cores Altera s IP Core Integration Tools Connecting IP Cores IP Cores Cyclone V SoC: A Mix of

More information

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006 The Next Generation 65-nm FPGA Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006 Hot Chips, 2006 Structure of the talk 65nm technology going towards 32nm Virtex-5 family Improved I/O Benchmarking

More information

IMPLICIT+EXPLICIT Architecture

IMPLICIT+EXPLICIT Architecture IMPLICIT+EXPLICIT Architecture Fortran Carte Programming Environment C Implicitly Controlled Device Dense logic device Typically fixed logic µp, DSP, ASIC, etc. Implicit Device Explicit Device Explicitly

More information

High-Level Synthesis Optimization for Blocked Floating-Point Matrix Multiplication

High-Level Synthesis Optimization for Blocked Floating-Point Matrix Multiplication High-Level Synthesis Optimization for Blocked Floating-Point Matrix Multiplication Erik H. D Hollander Electronics and Information Systems Department Ghent University, Ghent, Belgium Erik.DHollander@ugent.be

More information

ECE 471 Embedded Systems Lecture 2

ECE 471 Embedded Systems Lecture 2 ECE 471 Embedded Systems Lecture 2 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 7 September 2018 Announcements Reminder: The class notes are posted to the website. HW#1 will

More information

FPGA Provides Speedy Data Compression for Hyperspectral Imagery

FPGA Provides Speedy Data Compression for Hyperspectral Imagery FPGA Provides Speedy Data Compression for Hyperspectral Imagery Engineers implement the Fast Lossless compression algorithm on a Virtex-5 FPGA; this implementation provides the ability to keep up with

More information

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki An Ultra High Performance Scalable DSP Family for Multimedia Hot Chips 17 August 2005 Stanford, CA Erik Machnicki Media Processing Challenges Increasing performance requirements Need for flexibility &

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms http://sudalab.is.s.u-tokyo.ac.jp/~reiji/pna16/ [ 9 ] Shared Memory Performance Parallel Numerical Algorithms / IST / UTokyo 1 PNA16 Lecture Plan General Topics 1. Architecture

More information

Unifi 45 Projector Retrofit Kit for SMART Board 580 and 560 Interactive Whiteboards

Unifi 45 Projector Retrofit Kit for SMART Board 580 and 560 Interactive Whiteboards Unifi 45 Projector Retrofit Kit for SMRT oard 580 and 560 Interactive Whiteboards 72 (182.9 cm) 60 (152.4 cm) S580 S560 Cautions, warnings and other important product information are contained in document

More information

Benchmarking Multithreaded, Multicore and Reconfigurable Processors

Benchmarking Multithreaded, Multicore and Reconfigurable Processors Insight, Analysis, and Advice on Signal Processing Technology Benchmarking Multithreaded, Multicore and Reconfigurable Processors Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley,

More information

Memory Systems and Performance Engineering. Fall 2009

Memory Systems and Performance Engineering. Fall 2009 Memory Systems and Performance Engineering Fall 2009 Basic Caching Idea A. Smaller memory faster to access B. Use smaller memory to cache contents of larger memory C. Provide illusion of fast larger memory

More information

Computer Architecture s Changing Definition

Computer Architecture s Changing Definition Computer Architecture s Changing Definition 1950s Computer Architecture Computer Arithmetic 1960s Operating system support, especially memory management 1970s to mid 1980s Computer Architecture Instruction

More information

Design of Reusable Context Pipelining for Coarse Grained Reconfigurable Architecture

Design of Reusable Context Pipelining for Coarse Grained Reconfigurable Architecture Design of Reusable Context Pipelining for Coarse Grained Reconfigurable Architecture P. Murali 1 (M. Tech), Dr. S. Tamilselvan 2, S. Yazhinian (Research Scholar) 3 1, 2, 3 Dept of Electronics and Communication

More information

Teacher Assignment and Transfer Program (TATP) On-Line Application Quick Sheets

Teacher Assignment and Transfer Program (TATP) On-Line Application Quick Sheets Teacher Assignment and Transfer Program (TATP) On-Line Application Quick Sheets On-Line Application The on-line application process is being piloted for teachers applying for advertised teaching positions

More information

Compilers and Code Optimization EDOARDO FUSELLA

Compilers and Code Optimization EDOARDO FUSELLA Compilers and Code Optimization EDOARDO FUSELLA Contents LLVM The nu+ architecture and toolchain LLVM 3 What is LLVM? LLVM is a compiler infrastructure designed as a set of reusable libraries with well

More information

ECE 5775 (Fall 17) High-Level Digital Design Automation. Specialized Computing

ECE 5775 (Fall 17) High-Level Digital Design Automation. Specialized Computing ECE 5775 (Fall 7) High-Level Digital Design Automation Specialized Computing Announcements All students enrolled in CMS & Piazza Vivado HLS tutorial on Tuesday 8/29 Install an SSH client (mobaxterm or

More information

KiloCore: A 32 nm 1000-Processor Array

KiloCore: A 32 nm 1000-Processor Array KiloCore: A 32 nm 1000-Processor Array Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, Bevan Baas University of California, Davis VLSI Computation

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Implementation of DSP Algorithms

Implementation of DSP Algorithms Implementation of DSP Algorithms Main frame computers Dedicated (application specific) architectures Programmable digital signal processors voice band data modem speech codec 1 PDSP and General-Purpose

More information

This calculation converts 3562 from base 10 to base 8 octal. Digits are produced right to left, so the final answer is 6752.

This calculation converts 3562 from base 10 to base 8 octal. Digits are produced right to left, so the final answer is 6752. COMP 222 Spring 2016 Midterm #1 Solutions Average = 83, Median = 87 Range # of Papers 100 2 90s 13 80s 8 70s 3 60s 4

More information

TMS320C6000 Programmer s Guide

TMS320C6000 Programmer s Guide TMS320C6000 Programmer s Guide Literature Number: SPRU198E October 2000 Printed on Recycled Paper IMPORTANT NOTICE Texas Instruments (TI) reserves the right to make changes to its products or to discontinue

More information

CMSC 2833 Lecture Memory Organization and Addressing

CMSC 2833 Lecture Memory Organization and Addressing Computer memory consists of a linear array of addressable storage cells that are similar to registers. Memory can be byte-addressable, or word-addressable, where a word typically consists of two or more

More information

Section 8: Monomials and Radicals

Section 8: Monomials and Radicals In this section, we are going to learn skills for: NGSS Standards MA.912.A.4.1 Simplify monomials and monomial expressions using the laws of integral exponents. MA.912.A.6.1 Simplify radical expressions.

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 14 EE141 Outline Parallelism EE141 2 Parallelism Parallelism is the act of doing more

More information