Ascenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005
|
|
- Janel Amy Jacobs
- 5 years ago
- Views:
Transcription
1 Ascenium: A Continuously Reconfigurable Architecture Robert Mykland Founder/CTO robert@ascenium.com August, 2005
2 Ascenium: A Continuously Reconfigurable Processor Continuously reconfigurable approach provides: The computational efficiency of direct logic implementations in ASICs All the flexibility of microprocessors (pure software) Unique architecture enables the array to be effectively targeted by an ANSI Standard C compiler CPU logic is continuously redefined to match the compiler s needs for each instruction ( 20 cycles) Massive parallelism even in programs with little or no inherent instruction-level parallelism Dozens of lines of sequential C code are executed every instruction
3 Ascenium Block Diagram No instruction set RLU is combinatorial logic that settles Memory is accessed in statically scheduled block operations Two instruction streams 1) Memory 2) Configuration Whole loops are often executed as one RLU instruction Reconfigurable Logic Unit (RLU) Config Control In Out Memory Ascenium Memory Control
4 Ascenium Basic Operation 12 RISC Instructions ALU Register File Control Reconfigurable Logic Unit (RLU) Config Control In Out Memory Control Memory Conventional Clock = Memory Ascenium
5 A128 Rough Layout Pad Ring RLU Inst. Pipe Data Pipe Onchip Memory 130 nm process 128 -bit logic elements 64 MACs/clock cycle 200 MHz DDR II = 3.2 GB/s external memory bandwidth 256KB onchip memory 19.2 GB/s peak onchip memory bandwidth MHz 52 mm 2
6 Logic Element Input A ( bits) Input B ( bits) Input C ( bits) ½ -Bit Full Multiplier - Accumulator Fast Carry -Bit Add LUT ( bits) LUT Y ( bits) Wide Logic Iterator ( bits) Iterator Y ( bits) Iterator Z ( bits) Output ( bits) Output Y ( bits)
7 Array Connectivity
8 Compiler Block Diagram Library (open source) Bytecode files Source code C, C, Java GCC Based Compiler (open source) Bytecode Bytecode Linker (open source) Bytecode Proprietary Ascenium Code Generator (develop) Ascenium Executable
9 Code Generator Diagram LLVM PARSER GENERIC INT. REP. (GIR) GLOBAL THREADING GIR OPTS. ASCENIUM INT. REP. (AIR) FUNC. ASSIGNMENT & SCORING ARRAY INST. PACKING & COMP. DATA DIRECTIVES FETCH ORDER ASSY. AIR OPTS. OUTPUT indicates where optimizations occur
10 Generic Intermediate Representation (GIR) PARAM A PARAM B PARAM A LOADI PARAM A CALL RET * LOADI STORE RET RET
11 FFT Inner Loop C Code i1 = i0 n2; i2 = i1 n2; i3 = i2 n2; r1 = x[2 * i0] x[2 * i2]; r2 = x[2 * i0] - x[2 * i2]; t = x[2 * i1] x[2 * i3]; x[2 * i0] = r1 t; r1 = r1 - t; s1 = x[2 * i0 1] x[2 * i2 1]; s2 = x[2 * i0 1] - x[2 * i2 1]; t = x[2 * i1 1] x[2 * i3 1]; x[2 * i0 1] = s1 t; s1 = s1 - t; x[2 * i2] = (r1 * co2 s1 * si2) >>15; x[2 * i2 1] = (s1 * co2-r1*si2)>>15; t = x[2 * i1 1] - x[2 * i3 1]; r1 = r2 t; r2 = r2 - t; t = x[2 * i1] - x[2 * i3]; s1 = s2 - t; s2 = s2 t; x[2 * i1] = (r1 * co1 s1 * si1)>>15; x[2 * i1 1] = (s1 * co1-r1 * si1)>>15; x[2 * i3] = (r2 * co3 s2 * si3)>>15; x[2 * i3 1] = (s2 * co3-r2 *si3)>>15;
12 FFT Inner Loop Virtual Circuit Y0 Y2 Y0 Y2 Y1 Y3 Y1 Y3 1 3 A r1 - B C r2 t1 s1 s2 D - E F - G - H t2 t3 t4 - I - J K C2 S2 C2 S2 C1 S1 C1 S1 C3 S3 C3 r3 s3 r4 r5 s4 s5 - L - M N S3 O P Q R S T U V W Y Z 32 AA 32 BB 32 CC 32 DD 32 EE 32 FF GG HH >>15 II >>15 JJ >>15 KK >>15 LL >>15 MM >>15 NN 0o Y0o 2o Y2o 1o Y1o 3o Y3o
13 FFT Loop Array Instruction AA BB CC DD O P R Q S T V U B A E D 0 Y0 2 Y2 C2 S2 W H C EE G F 1 Y1 3 Y3 C3 S3 NN Z Y K I C1 Y3 o MM KK FF M J S1 1 3 o o LL L N GG HH 0 Y1 o o JJ II Y0 2 o Y2 o o A 0 Y0 2 Y2 C2 S2 D C F I J GG HH B E H G K M L O 1 Y1 3 Y3 C3 S3 C1 S1 Y3 1 o 3 o o 0 Y1 o o N P Q T U AA R BB S CC V DD W Y LL II Z EE FF JJ NN KK MM Y0 o2 oy2 o
14 FFT Loop Data Directives 1. Fetch the array instruction and data directives 2. Load the array instruction 3. Read x[t] in four 64-bit wide banks 32 times 4. Write x[t] in one 256-bit wide bank 32 times, changing banks every 8 operations 5. Repeat steps 3 and 4 three more times 6. At the same time as 3 5, read in w coefficients 7. Pass 1: w every clock; 2: w every 4 clocks, then every, then every Stop
15 DSP Benchmark Results (Simulated, Hand-Coded) Benchmark 256-Point Radix-4 FFT Real Block FIR Filter Complex Block FIR Filter DCT x 24 IIR Filter NR = 2,048 A128 core, 130nm, 250MHz, 250mW 512 ns 512 ns 512 ns 512 ns 292 ns TMS320C64x core, 130nm, 300MHz*, 250mW 8,920 ns 6,827 ns 6,827 ns 6,080 ns 6,827 ns * The 130nm C64x core will operate up to 720 MHz, but burns more power at higher frequencies.
16 Ascenium Status Patents filed Chip architecture complete, polishing ongoing Prototype compiler complete & demo-able Development plan in place Raising seed round
17 Summary Great performance and performance/mw: >10x programmable DSPs >4x performance/mw of FPGAs Great time-to-market: -- Normal GPP software development cycle Great ease-of-use: -- Programmers writing standard, non-optimized code in C/C get all the performance
Digital Signal Processor Core Technology
The World Leader in High Performance Signal Processing Solutions Digital Signal Processor Core Technology Abhijit Giri Satya Simha November 4th 2009 Outline Introduction to SHARC DSP ADSP21469 ADSP2146x
More informationC-Based Hardware Design Platform for Dynamically Reconfigurable Processor
C-Based Hardware Design Platform for Dynamically Reconfigurable Processor September 22 nd, 2005 IPFlex Inc. Agenda Merits of C-Based hardware design Hardware enabling C-Based hardware design DAPDNA-FW
More informationMath 96--Radicals #1-- Simplify; Combine--page 1
Simplify; Combine--page 1 Part A Number Systems a. Whole Numbers = {0, 1, 2, 3,...} b. Integers = whole numbers and their opposites = {..., 3, 2, 1, 0, 1, 2, 3,...} c. Rational Numbers = quotient of integers
More informationFABRICATION TECHNOLOGIES
FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general
More informationXPU A Programmable FPGA Accelerator for Diverse Workloads
XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for
More informationMIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of
An Independent Analysis of the: MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of Berkeley Design Technology, Inc. OVERVIEW MIPS Technologies, Inc. is an Intellectual Property (IP)
More informationSystem-on Solution from Altera and Xilinx
System-on on-a-programmable-chip Solution from Altera and Xilinx Xun Yang VLSI CAD Lab, Computer Science Department, UCLA FPGAs with Embedded Microprocessors Combination of embedded processors and programmable
More informationAn introduction to Digital Signal Processors (DSP) Using the C55xx family
An introduction to Digital Signal Processors (DSP) Using the C55xx family Group status (~2 minutes each) 5 groups stand up What processor(s) you are using Wireless? If so, what technologies/chips are you
More informationD. Richard Brown III Associate Professor Worcester Polytechnic Institute Electrical and Computer Engineering Department
D. Richard Brown III Associate Professor Worcester Polytechnic Institute Electrical and Computer Engineering Department drb@ece.wpi.edu 12-November-2012 Efficient Real-Time DSP Data types Memory usage
More informationFPGA Polyphase Filter Bank Study & Implementation
FPGA Polyphase Filter Bank Study & Implementation Raghu Rao Matthieu Tisserand Mike Severa Prof. John Villasenor Image Communications/. Electrical Engineering Dept. UCLA 1 Introduction This document describes
More informationCOPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design
COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design Lecture Objectives Background Need for Accelerator Accelerators and different type of parallelizm
More informationGeneral Purpose Signal Processors
General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:
More informationHigh performance, power-efficient DSPs based on the TI C64x
High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research
More informationB-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes
B-KD rees for Hardware Accelerated Ray racing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek Saarland University, Germany Outline Previous Work B-KD ree as new Spatial Index Structure DynR
More informationREAL TIME DIGITAL SIGNAL PROCESSING
REAL TIME DIGITAL SIGNAL PROCESSING UTN - FRBA 2011 www.electron.frba.utn.edu.ar/dplab Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.
More informationJason Manley. Internal presentation: Operation overview and drill-down October 2007
Jason Manley Internal presentation: Operation overview and drill-down October 2007 System overview Achievements to date ibob F Engine in detail BEE2 X Engine in detail Backend System in detail Future developments
More informationThe S6000 Family of Processors
The S6000 Family of Processors Today s Design Challenges The advent of software configurable processors In recent years, the widespread adoption of digital technologies has revolutionized the way in which
More informationFAST FIR FILTERS FOR SIMD PROCESSORS WITH LIMITED MEMORY BANDWIDTH
Key words: Digital Signal Processing, FIR filters, SIMD processors, AltiVec. Grzegorz KRASZEWSKI Białystok Technical University Department of Electrical Engineering Wiejska
More informationDIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING
1 DSP applications DSP platforms The synthesis problem Models of computation OUTLINE 2 DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: Time-discrete representation
More informationTHE NVIDIA DEEP LEARNING ACCELERATOR
THE NVIDIA DEEP LEARNING ACCELERATOR INTRODUCTION NVDLA NVIDIA Deep Learning Accelerator Developed as part of Xavier NVIDIA s SOC for autonomous driving applications Optimized for Convolutional Neural
More informationPACE: Power-Aware Computing Engines
PACE: Power-Aware Computing Engines Krste Asanovic Saman Amarasinghe Martin Rinard Computer Architecture Group MIT Laboratory for Computer Science http://www.cag.lcs.mit.edu/ PACE Approach Energy- Conscious
More informationThe Xilinx XC6200 chip, the software tools and the board development tools
The Xilinx XC6200 chip, the software tools and the board development tools What is an FPGA? Field Programmable Gate Array Fully programmable alternative to a customized chip Used to implement functions
More informationThe extreme Adaptive DSP Solution to Sensor Data Processing
The extreme Adaptive DSP Solution to Sensor Data Processing Abstract Martin Vorbach PACT XPP Technologies Leo Mirkin Sky Computers, Inc. The new ISR mobile autonomous sensor platforms present a difficult
More informationGraduate Institute of Electronics Engineering, NTU 9/16/2004
/ 9/16/2004 ACCESS IC LAB Overview of DSP Processor Current Status of NTU DSP Laboratory (E1-304) Course outline of Programmable DSP Lab Lab handout and final project DSP processor is a specially designed
More informationCNP: An FPGA-based Processor for Convolutional Networks
Clément Farabet clement.farabet@gmail.com Computational & Biological Learning Laboratory Courant Institute, NYU Joint work with: Yann LeCun, Cyril Poulet, Jefferson Y. Han Now collaborating with Eugenio
More informationCache Justification for Digital Signal Processors
Cache Justification for Digital Signal Processors by Michael J. Lee December 3, 1999 Cache Justification for Digital Signal Processors By Michael J. Lee Abstract Caches are commonly used on general-purpose
More informationBetter sharc data such as vliw format, number of kind of functional units
Better sharc data such as vliw format, number of kind of functional units Pictures of pipe would help Build up zero overhead loop example better FIR inner loop in coldfire Mine more material from bsdi.com
More informationHigh Performance Implementation of Microtubule Modeling on FPGA using Vivado HLS
High Performance Implementation of Microtubule Modeling on FPGA using Vivado HLS Yury Rumyantsev rumyantsev@rosta.ru 1 Agenda 1. Intoducing Rosta and Hardware overview 2. Microtubule Modeling Problem 3.
More informationEmbedded Target for TI C6000 DSP 2.0 Release Notes
1 Embedded Target for TI C6000 DSP 2.0 Release Notes New Features................... 1-2 Two Virtual Targets Added.............. 1-2 Added C62x DSP Library............... 1-2 Fixed-Point Code Generation
More informationUnderstanding Peak Floating-Point Performance Claims
white paper FPGA Understanding Peak ing-point Performance Claims Learn how to calculate and compare the peak floating-point capabilities of digital signal processors (DSPs), graphics processing units (GPUs),
More informationasoc: : A Scalable On-Chip Communication Architecture
asoc: : A Scalable On-Chip Communication Architecture Russell Tessier, Jian Liang,, Andrew Laffely,, and Wayne Burleson University of Massachusetts, Amherst Reconfigurable Computing Group Supported by
More informationUSING C-TO-HARDWARE ACCELERATION IN FPGAS FOR WAVEFORM BASEBAND PROCESSING
USING C-TO-HARDWARE ACCELERATION IN FPGAS FOR WAVEFORM BASEBAND PROCESSING David Lau (Altera Corporation, San Jose, CA, dlau@alteracom) Jarrod Blackburn, (Altera Corporation, San Jose, CA, jblackbu@alteracom)
More informationFlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com
FlexRIO FPGAs Bringing Custom Functionality to Instruments Ravichandran Raghavan Technical Marketing Engineer Electrical Test Today Acquire, Transfer, Post-Process Paradigm Fixed- Functionality Triggers
More informationEnhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension
Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension Hamid Noori, Farhad Mehdipour, Koji Inoue, and Kazuaki Murakami Institute of Systems, Information
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationEmbedded Controller Design. CompE 270 Digital Systems - 5. Objective. Application Specific Chips. User Programmable Logic. Copyright 1998 Ken Arnold 1
CompE 270 Digital Systems - 5 Programmable Logic Ken Arnold Objective Application Specific ICs Introduce User Programmable Logic Common Architectures Programmable Array Logic Address Decoding Example Development
More informationIndustrial Control. 50 to 5,000 VA. Applications. Specifications. Standards. Options and Accessories. Features, Functions, Benefits.
Industrial Control 6 50 to 5,000 VA Applications n For commercial and industrial control applications including; control panels, conveyor systems, machine tool equipment, pump systems, and commercial air
More informationTeam 1. Common Questions to all Teams. Team 2. Team 3. CO200-Computer Organization and Architecture - Assignment One
CO200-Computer Organization and Architecture - Assignment One Note: A team may contain not more than 2 members. Format the assignment solutions in a L A TEX document. E-mail the assignment solutions PDF
More informationIndustrial Control. 50 to 5,000 VA. Applications. Specifications. Standards. Options and Accessories. Features, Functions, Benefits.
Industrial Control 6 50 to 5,000 VA Applications n For commercial and industrial control applications including; control panels, conveyor systems, machine tool equipment, pump systems, and commercial air
More informationVersal: AI Engine & Programming Environment
Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY
More informationIntroduction to FPGA Design with Vivado High-Level Synthesis. UG998 (v1.0) July 2, 2013
Introduction to FPGA Design with Vivado High-Level Synthesis Notice of Disclaimer The information disclosed to you hereunder (the Materials ) is provided solely for the selection and use of Xilinx products.
More informationEmbedded Systems. 7. System Components
Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic
More informationDesign Space Exploration for Memory Subsystems of VLIW Architectures
E University of Paderborn Dr.-Ing. Mario Porrmann Design Space Exploration for Memory Subsystems of VLIW Architectures Thorsten Jungeblut 1, Gregor Sievers, Mario Porrmann 1, Ulrich Rückert 2 1 System
More informationClassification of Semiconductor LSI
Classification of Semiconductor LSI 1. Logic LSI: ASIC: Application Specific LSI (you have to develop. HIGH COST!) For only mass production. ASSP: Application Specific Standard Product (you can buy. Low
More informationAgenda. How can we improve productivity? C++ Bit-accurate datatypes and modeling Using C++ for hardware design
Catapult C Synthesis High Level Synthesis Webinar Stuart Clubb Technical Marketing Engineer April 2009 Agenda How can we improve productivity? C++ Bit-accurate datatypes and modeling Using C++ for hardware
More information14K Mixable Rings 1/ 2 ctw, $1499. Silver Diamond Pendant 1/6 ctw, $299
1k 14K D.B.T.Y. 1/4 ctw, 1/2 ctw, $499 3/4 ctw, 9 1 ctw, 9 1m 1a 14K Mixable Rings 1/ 2 ctw, $1499 1h 14K Two Tone 1/5 ctw, $549 1/2 ctw, $1299 Triple Strand 1/4 ctw 14K Triple Strand 3/4 ctw 9 1c 14K
More informationStacks & Queues. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST
Stacks & Queues Kuan-Yu Chen ( 陳冠宇 ) 2018/10/01 @ TR-212, NTUST Review Stack Stack Permutation Expression Infix Prefix Postfix 2 Stacks. A stack is an ordered list in which insertions and deletions are
More informationMemory Systems and Performance Engineering
SPEED LIMIT PER ORDER OF 6.172 Memory Systems and Performance Engineering Fall 2010 Basic Caching Idea A. Smaller memory faster to access B. Use smaller memory to cache contents of larger memory C. Provide
More informationA Reconfigurable Multifunction Computing Cache Architecture
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 4, AUGUST 2001 509 A Reconfigurable Multifunction Computing Cache Architecture Huesung Kim, Student Member, IEEE, Arun K. Somani,
More informationSpiral 2-8. Cell Layout
2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric
More informationSoC Basics Avnet Silica & Enclustra Seminar Getting started with Xilinx Zynq SoC Fribourg, April 26, 2017
1 2 3 4 Introduction - Cool new Stuff Everybody knows, that new technologies are usually driven by application requirements. A nice example for this is, that we developed portable super-computers with
More informationRUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch
RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,
More informationDSP Co-Processing in FPGAs: Embedding High-Performance, Low-Cost DSP Functions
White Paper: Spartan-3 FPGAs WP212 (v1.0) March 18, 2004 DSP Co-Processing in FPGAs: Embedding High-Performance, Low-Cost DSP Functions By: Steve Zack, Signal Processing Engineer Suhel Dhanani, Senior
More informationREAL TIME DIGITAL SIGNAL PROCESSING
REAL TIME DIGITAL SIGNAL PROCESSING SASE 2010 Universidad Tecnológica Nacional - FRBA Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.
More informationRepresentation of Numbers and Arithmetic in Signal Processors
Representation of Numbers and Arithmetic in Signal Processors 1. General facts Without having any information regarding the used consensus for representing binary numbers in a computer, no exact value
More informationIntroduction to the MMAGIX Multithreading Supercomputer
Introduction to the MMAGIX Multithreading Supercomputer A supercomputer is defined as a computer that can run at over a billion instructions per second (BIPS) sustained while executing over a billion floating
More informationEECS150 - Digital Design Lecture 09 - Parallelism
EECS150 - Digital Design Lecture 09 - Parallelism Feb 19, 2013 John Wawrzynek Spring 2013 EECS150 - Lec09-parallel Page 1 Parallelism Parallelism is the act of doing more than one thing at a time. Optimization
More informationQueues. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST
Queues Kuan-Yu Chen ( 陳冠宇 ) 2018/10/03 @ TR-212, NTUST Review Stack FILO Algorithms to covert Infix to Postfix Infix to Prefix Queue FIFO 2 Homeork1: Expression Convertor. Given a infix expression, please
More informationComputed Tomography (CT) Scan Image Reconstruction on the SRC-7 David Pointer SRC Computers, Inc.
Computed Tomography (CT) Scan Image Reconstruction on the SRC-7 David Pointer SRC Computers, Inc. CT Image Reconstruction Herman Head Sinogram Herman Head Reconstruction CT Image Reconstruction for all
More informationSimplifying FPGA Design for SDR with a Network on Chip Architecture
Simplifying FPGA Design for SDR with a Network on Chip Architecture Matt Ettus Ettus Research GRCon13 Outline 1 Introduction 2 RF NoC 3 Status and Conclusions USRP FPGA Capability Gen
More informationREAL TIME DIGITAL SIGNAL PROCESSING
REAL TIME DIGITAL SIGNAL PROCESSING UTN-FRBA 2010 Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable. Reproducibility. Don t depend on components
More informationMemory systems. Memory technology. Memory technology Memory hierarchy Virtual memory
Memory systems Memory technology Memory hierarchy Virtual memory Memory technology DRAM Dynamic Random Access Memory bits are represented by an electric charge in a small capacitor charge leaks away, need
More informationCourse Overview Revisited
Course Overview Revisited void blur_filter_3x3( Image &in, Image &blur) { // allocate blur array Image blur(in.width(), in.height()); // blur in the x dimension for (int y = ; y < in.height(); y++) for
More informationVICP Signal Processing Library. Further extending the performance and ease of use for VICP enabled devices
Signal Processing Library Further extending the performance and ease of use for enabled devices Why is library effective for customer application? Get to market faster with ready-to-use signal processing
More informationRISC IMPLEMENTATION OF OPTIMAL PROGRAMMABLE DIGITAL IIR FILTER
RISC IMPLEMENTATION OF OPTIMAL PROGRAMMABLE DIGITAL IIR FILTER Miss. Sushma kumari IES COLLEGE OF ENGINEERING, BHOPAL MADHYA PRADESH Mr. Ashish Raghuwanshi(Assist. Prof.) IES COLLEGE OF ENGINEERING, BHOPAL
More informationADVANCES IN PROCESSOR DESIGN AND THE EFFECTS OF MOORES LAW AND AMDAHLS LAW IN RELATION TO THROUGHPUT MEMORY CAPACITY AND PARALLEL PROCESSING
ADVANCES IN PROCESSOR DESIGN AND THE EFFECTS OF MOORES LAW AND AMDAHLS LAW IN RELATION TO THROUGHPUT MEMORY CAPACITY AND PARALLEL PROCESSING Evan Baytan Department of Electrical Engineering and Computer
More informationA Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013
A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company
More informationAn Optimizing Compiler for the TMS320C25 DSP Chip
An Optimizing Compiler for the TMS320C25 DSP Chip Wen-Yen Lin, Corinna G Lee, and Paul Chow Published in Proceedings of the 5th International Conference on Signal Processing Applications and Technology,
More informationLecture 41: Introduction to Reconfigurable Computing
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following
More informationDeveloping and Integrating FPGA Co-processors with the Tic6x Family of DSP Processors
Developing and Integrating FPGA Co-processors with the Tic6x Family of DSP Processors Paul Ekas, DSP Engineering, Altera Corp. pekas@altera.com, Tel: (408) 544-8388, Fax: (408) 544-6424 Altera Corp., 101
More informationSpecializing Hardware for Image Processing
Lecture 6: Specializing Hardware for Image Processing Visual Computing Systems So far, the discussion in this class has focused on generating efficient code for multi-core processors such as CPUs and GPUs.
More informationMicroprocessors vs. DSPs (ESC-223)
Insight, Analysis, and Advice on Signal Processing Technology Microprocessors vs. DSPs (ESC-223) Kenton Williston Berkeley Design Technology, Inc. Berkeley, California USA +1 (510) 665-1600 info@bdti.com
More informationTwo-level Reconfigurable Architecture for High-Performance Signal Processing
International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA 04, pp. 177 183, Las Vegas, Nevada, June 2004. Two-level Reconfigurable Architecture for High-Performance Signal Processing
More informationDependable VLSI Platform using Robust Fabrics
Dependable VLSI Platform using Robust Fabrics Director H. Onodera, Kyoto Univ. Principal Researchers T. Onoye, Y. Mitsuyama, K. Kobayashi, H. Shimada, H. Kanbara, K. Wakabayasi Background: Overall Design
More informationEvaluating MMX Technology Using DSP and Multimedia Applications
Evaluating MMX Technology Using DSP and Multimedia Applications Ravi Bhargava * Lizy K. John * Brian L. Evans Ramesh Radhakrishnan * November 22, 1999 The University of Texas at Austin Department of Electrical
More informationStorage I/O Summary. Lecture 16: Multimedia and DSP Architectures
Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal
More informationQsys and IP Core Integration
Qsys and IP Core Integration Stephen A. Edwards (after David Lariviere) Columbia University Spring 2016 IP Cores Altera s IP Core Integration Tools Connecting IP Cores IP Cores Cyclone V SoC: A Mix of
More informationThe Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006
The Next Generation 65-nm FPGA Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006 Hot Chips, 2006 Structure of the talk 65nm technology going towards 32nm Virtex-5 family Improved I/O Benchmarking
More informationIMPLICIT+EXPLICIT Architecture
IMPLICIT+EXPLICIT Architecture Fortran Carte Programming Environment C Implicitly Controlled Device Dense logic device Typically fixed logic µp, DSP, ASIC, etc. Implicit Device Explicit Device Explicitly
More informationHigh-Level Synthesis Optimization for Blocked Floating-Point Matrix Multiplication
High-Level Synthesis Optimization for Blocked Floating-Point Matrix Multiplication Erik H. D Hollander Electronics and Information Systems Department Ghent University, Ghent, Belgium Erik.DHollander@ugent.be
More informationECE 471 Embedded Systems Lecture 2
ECE 471 Embedded Systems Lecture 2 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 7 September 2018 Announcements Reminder: The class notes are posted to the website. HW#1 will
More informationFPGA Provides Speedy Data Compression for Hyperspectral Imagery
FPGA Provides Speedy Data Compression for Hyperspectral Imagery Engineers implement the Fast Lossless compression algorithm on a Virtex-5 FPGA; this implementation provides the ability to keep up with
More informationAn Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki
An Ultra High Performance Scalable DSP Family for Multimedia Hot Chips 17 August 2005 Stanford, CA Erik Machnicki Media Processing Challenges Increasing performance requirements Need for flexibility &
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms http://sudalab.is.s.u-tokyo.ac.jp/~reiji/pna16/ [ 9 ] Shared Memory Performance Parallel Numerical Algorithms / IST / UTokyo 1 PNA16 Lecture Plan General Topics 1. Architecture
More informationUnifi 45 Projector Retrofit Kit for SMART Board 580 and 560 Interactive Whiteboards
Unifi 45 Projector Retrofit Kit for SMRT oard 580 and 560 Interactive Whiteboards 72 (182.9 cm) 60 (152.4 cm) S580 S560 Cautions, warnings and other important product information are contained in document
More informationBenchmarking Multithreaded, Multicore and Reconfigurable Processors
Insight, Analysis, and Advice on Signal Processing Technology Benchmarking Multithreaded, Multicore and Reconfigurable Processors Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley,
More informationMemory Systems and Performance Engineering. Fall 2009
Memory Systems and Performance Engineering Fall 2009 Basic Caching Idea A. Smaller memory faster to access B. Use smaller memory to cache contents of larger memory C. Provide illusion of fast larger memory
More informationComputer Architecture s Changing Definition
Computer Architecture s Changing Definition 1950s Computer Architecture Computer Arithmetic 1960s Operating system support, especially memory management 1970s to mid 1980s Computer Architecture Instruction
More informationDesign of Reusable Context Pipelining for Coarse Grained Reconfigurable Architecture
Design of Reusable Context Pipelining for Coarse Grained Reconfigurable Architecture P. Murali 1 (M. Tech), Dr. S. Tamilselvan 2, S. Yazhinian (Research Scholar) 3 1, 2, 3 Dept of Electronics and Communication
More informationTeacher Assignment and Transfer Program (TATP) On-Line Application Quick Sheets
Teacher Assignment and Transfer Program (TATP) On-Line Application Quick Sheets On-Line Application The on-line application process is being piloted for teachers applying for advertised teaching positions
More informationCompilers and Code Optimization EDOARDO FUSELLA
Compilers and Code Optimization EDOARDO FUSELLA Contents LLVM The nu+ architecture and toolchain LLVM 3 What is LLVM? LLVM is a compiler infrastructure designed as a set of reusable libraries with well
More informationECE 5775 (Fall 17) High-Level Digital Design Automation. Specialized Computing
ECE 5775 (Fall 7) High-Level Digital Design Automation Specialized Computing Announcements All students enrolled in CMS & Piazza Vivado HLS tutorial on Tuesday 8/29 Install an SSH client (mobaxterm or
More informationKiloCore: A 32 nm 1000-Processor Array
KiloCore: A 32 nm 1000-Processor Array Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, Bevan Baas University of California, Davis VLSI Computation
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationImplementation of DSP Algorithms
Implementation of DSP Algorithms Main frame computers Dedicated (application specific) architectures Programmable digital signal processors voice band data modem speech codec 1 PDSP and General-Purpose
More informationThis calculation converts 3562 from base 10 to base 8 octal. Digits are produced right to left, so the final answer is 6752.
COMP 222 Spring 2016 Midterm #1 Solutions Average = 83, Median = 87 Range # of Papers 100 2 90s 13 80s 8 70s 3 60s 4
More informationTMS320C6000 Programmer s Guide
TMS320C6000 Programmer s Guide Literature Number: SPRU198E October 2000 Printed on Recycled Paper IMPORTANT NOTICE Texas Instruments (TI) reserves the right to make changes to its products or to discontinue
More informationCMSC 2833 Lecture Memory Organization and Addressing
Computer memory consists of a linear array of addressable storage cells that are similar to registers. Memory can be byte-addressable, or word-addressable, where a word typically consists of two or more
More informationSection 8: Monomials and Radicals
In this section, we are going to learn skills for: NGSS Standards MA.912.A.4.1 Simplify monomials and monomial expressions using the laws of integral exponents. MA.912.A.6.1 Simplify radical expressions.
More informationEECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141
EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 14 EE141 Outline Parallelism EE141 2 Parallelism Parallelism is the act of doing more
More information