HW/SW Co-Design Lab. Seminar 2, WS 2018/2019. Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. Dr. h.c. G. Fettweis
1 Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. Dr. h.c. G. Fettweis. HW/SW Co-Design Lab, Seminar 2, WS 2018/2019, TU Dresden
2 CORE FEATURES
3 corelx_hwswcd
[Block diagram: Xtensa LX core with ALU, MUL, two Load/Store Units, Basic Instr. Set, Extended Instr. Set, Memory Management Unit (MMU), Processor Interface (PIF) and Instr. Fetch Unit; attached memories: System RAM, System ROM, two DMEMs, IMEM, ICache]
4 corelx_hwswcd Memories
Memory | Size | Start address
Instruction RAM (IMEM) | kB | xfff
Data RAM (DMEM) | kB | xffe
Data RAM (DMEM) | kB | xffe8
Instruction Cache (ICache) | kB | --
System RAM | MB | x
System ROM | 8 kB | x
For parallel access, place important global variables/arrays in different DMEMs, e.g.:
    int data_array[8] __attribute__((section(".dram.data")));
or
    int *data_array = (int *)xffe8;
Otherwise the compiler automatically places the variables in System RAM.
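The placement rule above can be sketched in C as follows. This is only an illustration: the section names `.dram0.data` and `.dram1.data` and the macro names `IN_DMEM_A` / `IN_DMEM_B` are assumptions, not taken from the slides, and the real names depend on the linker configuration of the core. On a non-Xtensa host the macros expand to nothing, so the sketch still compiles.

```c
#include <stdint.h>

/* Assumed section names; check the linker map of the actual core. */
#ifdef __XTENSA__
#define IN_DMEM_A __attribute__((section(".dram0.data")))
#define IN_DMEM_B __attribute__((section(".dram1.data")))
#else
#define IN_DMEM_A /* host build: default placement */
#define IN_DMEM_B
#endif

#define TAPS 8

/* Two arrays that are read in the same loop iteration go to
   different data RAMs so both loads can be issued in parallel. */
int16_t samples[TAPS] IN_DMEM_A = {1, 2, 3, 4, 5, 6, 7, 8};
int16_t coeffs[TAPS]  IN_DMEM_B = {1, 1, 1, 1, 1, 1, 1, 1};

/* Reads one element from each array per iteration. */
int32_t dot_product(void)
{
    int32_t acc = 0;
    for (int i = 0; i < TAPS; i++)
        acc += (int32_t)samples[i] * (int32_t)coeffs[i];
    return acc;
}
```

Keeping the placement behind macros means the same source can be checked on a workstation before it is built for the target.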
5 TIE
Basic and most used TIE language constructs:
- Immediate Range
- Table
- State
- Register File
- Operation
- Schedule
- Instruction Format
- Slot Opcodes
- TIE Built-in Modules
- Function
Documentation: TIE-Doc.pdf
6 FLIX
FLIX: Flexible Length Instruction Extension
- Combines operations into one instruction word of flexible width
VLIW: Very Long Instruction Word
- Allows the processor to execute several operations simultaneously
TIE code for the format flix_x:
    format flix_x {slot_, slot_, slot}
    slot_opcodes slot_ {vsalu}
    slot_opcodes slot_ {vsmac}
    slot_opcodes slot_ {vsldst}
Requirements:
- At least opt. level O
- Enable optimization alias restrict: OPT:alias=restrict
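The alias requirement is easier to see in source form. The sketch below states the same no-overlap promise per pointer with the C99 `restrict` qualifier. The function name `vec_mul16` is hypothetical, and whether the lab's compiler bundles these operations into one FLIX word is not something this example can show. It only illustrates what the compiler is allowed to assume.

```c
#include <stddef.h>
#include <stdint.h>

/* out, a and b are promised not to overlap (C99 restrict), so a
   store to out[i] can never change a later a[i + 1] or b[i + 1].
   That freedom is what lets a compiler reorder or bundle the
   loads, the multiply and the store. */
void vec_mul16(int32_t *restrict out, const int16_t *restrict a,
               const int16_t *restrict b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (int32_t)a[i] * (int32_t)b[i];
}
```

Calling this function with overlapping buffers would break the promise and is undefined behaviour, which is the same risk a global alias option carries.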
7 TASK: FIR FILTER
8 FIR Filter
Hot-spots of the FIR filter:
- Multiply/accumulate operation (MAC)
- Load input values/coefficients
- Store output values
    void fir_c(short *in, int *out, int len)
    {
        int n, i;
        for (n = 0; n < len+…; n++) {
            for (i = 0; i < 8; i++)
                out[n] += in[n+…-i]*coeff[i];
        }
    }
Different approaches:
1) Performing MAC on one input element with one coefficient
2) Performing MAC on multiple elements in parallel (SIMD)
3) Separated instructions for Load, MAC and Store
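As a reference point for the three approaches, here is a minimal plain-C sketch of an 8-tap FIR filter. Several details are assumptions, because the loop bounds in the slide did not survive transcription: the input offset of `TAPS - 1`, the requirement that `in` holds `len + TAPS - 1` samples, and passing `coeff` as a parameter. The name `fir_ref` is hypothetical.

```c
#include <stdint.h>

#define TAPS 8

/* out[n] = sum over i of in[n + TAPS - 1 - i] * coeff[i],
   for n = 0 .. len - 1.
   Assumed layout: `in` holds len + TAPS - 1 samples. */
void fir_ref(const int16_t *in, int32_t *out, int len,
             const int16_t coeff[TAPS])
{
    for (int n = 0; n < len; n++) {
        int32_t acc = 0;                      /* MAC accumulator */
        for (int i = 0; i < TAPS; i++)
            acc += (int32_t)in[n + TAPS - 1 - i] * (int32_t)coeff[i];
        out[n] = acc;                         /* one store per output */
    }
}
```

Feeding a unit impulse at index `TAPS - 1` returns the coefficients in order, which makes a convenient check for any accelerated version.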
9 Approach 1
Performing MAC on one input element with one coefficient by applying fusion.
C code:
    for (n = 0; n < len+…; n++) {
        for (i = 0; i < 8; i++)
            FIR_MAC(out[n], in[n+…-i], coeff[i]);
    }
TIE code:
    operation FIR_MAC {inout AR acc, in AR a, in AR b} {} {
        wire [31:0] product = TIEmul(a[15:0], b[15:0], 1'b1);
        assign acc = acc + product;
    }
FIR_MAC produces the output in one clock cycle (fused operations).
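A small C model can help when checking the fused instruction against the plain loop. The sketch below assumes that the operation multiplies the signed low 16 bits of both operands and adds the 32-bit product to the accumulator with wrap-around. The function name `fir_mac` and the value-returning interface are choices made for this example only.

```c
#include <stdint.h>

/* Software model of a fused multiply-accumulate:
   acc + (signed low 16 bits of a) * (signed low 16 bits of b). */
int32_t fir_mac(int32_t acc, int32_t a, int32_t b)
{
    /* Keep only the low 16 bits and reinterpret them as signed.
       The narrowing conversion is implementation-defined in C but
       yields two's complement on common compilers. */
    int16_t a16 = (int16_t)(uint16_t)a;
    int16_t b16 = (int16_t)(uint16_t)b;
    int32_t product = (int32_t)a16 * (int32_t)b16;

    /* Add in unsigned arithmetic so the model wraps like hardware. */
    return (int32_t)((uint32_t)acc + (uint32_t)product);
}
```

For example, `fir_mac(10, 3, -4)` gives `-2`, and the upper 16 bits of either operand are ignored.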
10 Approach 2
Performing MAC on multiple elements in parallel (SIMD).
- FIR_MAC_SIMD produces the output of the second for-loop in one clock cycle (fused and parallel operations)
- Recommended: a schedule to reduce the critical path (c.p.); the addition is done … cycle after the multiplications
TIE code:
    regfile VR_fir 128 8 vrf
    operation FIR_MAC_SIMD {out AR acc, in VR_fir a, in VR_fir b} {} {
        // 8x signed multiplications
        wire [31:0] product0 = TIEmul(a[127:112], b[15:0], 1'b1);
        wire [31:0] product1 = TIEmul(a[111:96], b[31:16], 1'b1);
        wire [31:0] product2 = TIEmul(a[95:80], b[47:32], 1'b1);
        wire [31:0] product3 = TIEmul(a[79:64], b[63:48], 1'b1);
        wire [31:0] product4 = TIEmul(a[63:48], b[79:64], 1'b1);
        wire [31:0] product5 = TIEmul(a[47:32], b[95:80], 1'b1);
        wire [31:0] product6 = TIEmul(a[31:16], b[111:96], 1'b1);
        wire [31:0] product7 = TIEmul(a[15:0], b[127:112], 1'b1);
        // addition of the 8 products
        assign acc = TIEaddn(product0, product1, product2, product3, product4, product5, product6, product7);
    }
    schedule FIR_MAC_SIMD_SCHED {FIR_MAC_SIMD} { def acc …; }
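The lane pairing of the SIMD operation can be modelled in a few lines of C. The sketch assumes eight signed 16-bit lanes per register, with lane k holding bits [16k+15 : 16k], and it assumes that operand `a` is read from the top lane down while `b` is read from the bottom lane up, which matches the bit ranges that survive in the slide. The name `fir_mac_simd` is hypothetical.

```c
#include <stdint.h>

#define LANES 8

/* Model of an 8-lane multiply followed by an adder tree:
   acc = sum over k of a[LANES - 1 - k] * b[k].
   With a = in[n .. n+7] and b = coeff[0 .. 7] this is exactly the
   inner FIR loop for one output sample. */
int32_t fir_mac_simd(const int16_t a[LANES], const int16_t b[LANES])
{
    int32_t acc = 0;
    for (int k = 0; k < LANES; k++)
        acc += (int32_t)a[LANES - 1 - k] * (int32_t)b[k];
    return acc;
}
```

In hardware the eight products are formed in parallel; the loop here only reproduces the arithmetic, not the timing.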
11 Approach 3
Separate Load, MAC and Store operations into different TIE instructions:
- Load 8 elements: FIR_LD
- Perform MAC on 8 elements in parallel and store the result in every clock cycle: FIR_CALC_ST
- Hold coefficients in a look-up table (access within … clock cycle)
- Pipelining by using a schedule or additional registers
TIE code:
    operation FIR_LD {} {out VAddr, in MemDataIn128, inout ptr_in, inout shad_in8, out in8_} {
        assign VAddr = ptr_in;
        assign ptr_in = ptr_in + …;
        assign shad_in8 = MemDataIn128;
        assign in8_ = shad_in8;
    }
C code:
    FIR_LD();
    for (n = …; n < ((len+…)>>…)-…; n++) {
        FIR_LD();
    }
12 Approach 3 (continued)
TIE code:
    operation FIR_CALC_ST {} {out VAddr, out MemDataOut, out StoreByteDisable, inout ptr_out, inout in8_, inout in8_, inout st_cnt, inout product0, inout product1, inout product2, inout product3, inout product4, inout product5, inout product6, inout product7} {
        // 8-fold SIMD MULs
        assign product0 = TIEmul(in8_[127:112], coeff[0], 1'b1);
        assign product1 = TIEmul(in8_[111:96], coeff[1], 1'b1);
        assign product2 = TIEmul(in8_[95:80], coeff[2], 1'b1);
        assign product3 = TIEmul(in8_[79:64], coeff[3], 1'b1);
        assign product4 = TIEmul(in8_[63:48], coeff[4], 1'b1);
        assign product5 = TIEmul(in8_[47:32], coeff[5], 1'b1);
        assign product6 = TIEmul(in8_[31:16], coeff[6], 1'b1);
        assign product7 = TIEmul(in8_[15:0], coeff[7], 1'b1);
        // Shift/prepare next values
        assign in8_ = {in8_[…:…], in8_[…:…]};
        assign in8_ = {…'h…, in8_[…:…]};
        // Interface/pointer increment
        assign VAddr = ptr_out;
        assign ptr_out = ptr_out + …;
        assign ptr_out_kill = st_cnt;
        assign StoreByteDisable = {…{st_cnt}};
        assign st_cnt = …'b…;
        // Write to memory: addition of the 8 products
        assign MemDataOut = TIEaddn(product0, product1, product2, product3, product4, product5, product6, product7);
    }
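The load / compute-and-store split can be pictured as a sliding window over two 8-element registers. The C sketch below models only that dataflow. It is not a translation of the TIE code: the register names `cur` and `next`, the shift direction, the refill every eight outputs, and the required input length are all assumptions made for this example.

```c
#include <stdint.h>
#include <string.h>

#define LANES 8

/* Streaming FIR model.
   cur  holds the 8 samples used for the current output,
   next holds the following 8 samples (refilled by a block load).
   Assumed layout: `in` holds at least 8 * (ceil(len / 8) + 1) samples. */
void fir_stream(const int16_t *in, int32_t *out, int len,
                const int16_t coeff[LANES])
{
    int16_t cur[LANES], next[LANES];

    memcpy(cur, in, sizeof cur);                  /* initial block load */
    for (int n = 0; n < len; n++) {
        if (n % LANES == 0)                       /* block load every 8 outputs */
            memcpy(next, in + n + LANES, sizeof next);

        int32_t acc = 0;                          /* 8 MACs, then one store */
        for (int k = 0; k < LANES; k++)
            acc += (int32_t)cur[LANES - 1 - k] * (int32_t)coeff[k];
        out[n] = acc;

        /* Shift the window by one sample, pulling from `next`. */
        memmove(cur, cur + 1, (LANES - 1) * sizeof cur[0]);
        cur[LANES - 1] = next[0];
        memmove(next, next + 1, (LANES - 1) * sizeof next[0]);
        next[LANES - 1] = 0;
    }
}
```

Because memory is touched only once per eight outputs for loading, the per-output work reduces to the multiply, the addition and the store, which is the point of separating the instructions.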
13 TASK: FFT/IFFT
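14 Discrete Fourier Transformation (DFT)
FIR filter: $y[n] = \sum_{i} b_i \, x[n-i]$
DFT: $X[k] = \sum_{m=0}^{N-1} x[m] \, e^{-j 2\pi k m / N}$
The DFT is comparable with an FIR filter: the input values are $x$, and the coefficients become complex: $b_i \rightarrow e^{-j 2\pi k m / N}$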
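To make the comparison with the FIR filter concrete, the following sketch computes a 4-point DFT as a multiply-accumulate loop with complex coefficients. The size N = 4 is chosen only because its twiddle factors are exact in integers. The type `cplx`, the table `W4` and the function names are hypothetical and not part of the lab code.

```c
#include <stdint.h>

typedef struct { int32_t re, im; } cplx;

/* e^(-j*2*pi*t/4) for t = 0..3: 1, -j, -1, +j (exact in integers). */
static const cplx W4[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };

/* Complex multiplication. */
static cplx cmul(cplx a, cplx b)
{
    cplx r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}

/* Direct DFT: X[k] = sum over m of x[m] * e^(-j*2*pi*k*m/4).
   Same loop shape as the FIR filter, with complex coefficients. */
void dft4(const cplx x[4], cplx X[4])
{
    for (int k = 0; k < 4; k++) {
        cplx acc = {0, 0};
        for (int m = 0; m < 4; m++) {
            cplx t = cmul(x[m], W4[(k * m) % 4]);
            acc.re += t.re;
            acc.im += t.im;
        }
        X[k] = acc;
    }
}
```

For the real input 1, 2, 3, 4 this yields 10, -2+2j, -2 and -2-2j. The direct form needs N squared complex MACs, which is the cost the FFT reduces.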
15 DFT → FFT/DIT
Cooley-Tukey algorithm for a decimation-in-time (DIT) radix-2 FFT
N = 2n, N is a power of 2
Twiddle factor: $W_n^m = e^{-j\pi m / n}$
16 Butterfly Network, Decimation in Time (DIT)
[Figure: DIT butterfly network with the input in bit-reversed order, grouped into sets of butterflies per stage, with twiddle factors W on the odd branches]
[Figure: FFT compute node with outputs even + w × odd and even − w × odd, where w is the twiddle factor]
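The compute node is small enough to write out directly. The sketch below is a minimal integer version of a radix-2 DIT butterfly, operating in place on one even/odd pair. The type `cplx` and the name `butterfly_dit` are hypothetical, and no fixed-point scaling is applied here.

```c
#include <stdint.h>

typedef struct { int32_t re, im; } cplx;

/* Radix-2 DIT butterfly, in place:
     even' = even + w * odd
     odd'  = even - w * odd                                   */
void butterfly_dit(cplx *even, cplx *odd, cplx w)
{
    /* t = w * odd (one complex multiplication) */
    cplx t = { odd->re * w.re - odd->im * w.im,
               odd->re * w.im + odd->im * w.re };
    cplx e = *even;

    even->re = e.re + t.re;
    even->im = e.im + t.im;
    odd->re  = e.re - t.re;
    odd->im  = e.im - t.im;
}
```

One complex multiplication and two complex additions per node is the unit of work that a fused TIE instruction would target.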
17 Calculating Twiddle Factors
$W_n^m = e^{-j\pi m / n}$, with n the number of interleaved butterflies
n = 1: $W^0 = 1$
n = 2: $W^0 = 1$, $W^1 = -j$
n = 4: $W^0 = 1$, $W^1 = \frac{1}{\sqrt{2}}(1 - j)$, $W^2 = -j$, $W^3 = -\frac{1}{\sqrt{2}}(1 + j)$
18 DFT → FFT/DIF
Sande-Tukey algorithm for a decimation-in-frequency (DIF) radix-2 FFT
19 FFT DIF
Even and odd indexed frequency values
N = 2n, N is a power of 2
Twiddle factor: $W_n^k = e^{-j\pi k / n}$
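Splitting the output into even- and odd-indexed frequency values leads to a compute node in which the twiddle factor is applied after the subtraction rather than before the addition. A minimal integer sketch, with the hypothetical names `cplx` and `butterfly_dif` and without fixed-point scaling:

```c
#include <stdint.h>

typedef struct { int32_t re, im; } cplx;

/* Radix-2 DIF butterfly, in place:
     upper' = upper + lower           (feeds the even-indexed outputs)
     lower' = (upper - lower) * w     (feeds the odd-indexed outputs)  */
void butterfly_dif(cplx *upper, cplx *lower, cplx w)
{
    cplx sum  = { upper->re + lower->re, upper->im + lower->im };
    cplx diff = { upper->re - lower->re, upper->im - lower->im };

    *upper = sum;
    lower->re = diff.re * w.re - diff.im * w.im;
    lower->im = diff.re * w.im + diff.im * w.re;
}
```

The operation count is the same as for the DIT node; only the position of the multiplication differs, which matters when comparing how the two variants map onto a fused instruction.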
20 DIT, DIF
[Figure: butterfly networks for decimation in time (bit-reversed order at the input) and decimation in frequency (bit-reversed order at the output), each grouped into sets of butterflies per stage]
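The bit-reversed ordering that both networks rely on can be produced by a simple in-place permutation. The following sketch is one common way to do it; the function names are hypothetical and the data type is chosen only for illustration.

```c
#include <stdint.h>

/* Reverse the lowest `bits` bits of idx, e.g. 001 -> 100 for bits = 3. */
static unsigned bit_reverse(unsigned idx, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (idx & 1u);
        idx >>= 1;
    }
    return r;
}

/* Reorder data[0 .. 2^log2n - 1] into bit-reversed order, in place. */
void bit_reverse_reorder(int32_t *data, unsigned log2n)
{
    unsigned n = 1u << log2n;
    for (unsigned i = 0; i < n; i++) {
        unsigned r = bit_reverse(i, log2n);
        if (i < r) {                 /* swap each pair only once */
            int32_t tmp = data[i];
            data[i] = data[r];
            data[r] = tmp;
        }
    }
}
```

For N = 8 the index order becomes 0, 4, 2, 6, 1, 5, 3, 7. For DIT the reordering is applied to the input, for DIF to the output.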
21 FFT: C code
Given C code:
- Fixed integer arithmetic (… bit real, … bit imaginary part)
- Code for FFT/IFFT with any N that is a power of 2
- Data reordering
- Variable scaling to prevent overflow
[Flow chart: Initialization, Data Reordering, Variable Scaling, then for each stage i: Calculate Twiddle Factor, Compute Result of Butterfly, decision "Multiple Twiddle Factors?" (yes/no), continue with Stage i+1]
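How scaling keeps a fixed-point butterfly within range can be sketched as follows. This is not the given lab code: it assumes 16-bit real and imaginary parts, Q15 twiddle factors, and an unconditional halving per butterfly instead of the variable scaling named above. The names `cplx16` and `butterfly_scaled` are hypothetical.

```c
#include <stdint.h>

typedef struct { int16_t re, im; } cplx16;

/* Radix-2 DIT butterfly in 16-bit fixed point.
   w is a Q15 twiddle factor (32768 represents 1.0).
   Both outputs are halved: if the complex magnitudes of the inputs
   are at most M and |w| <= 1, the outputs stay within M as well.
   Divisions truncate toward zero. */
void butterfly_scaled(cplx16 *even, cplx16 *odd, cplx16 w)
{
    /* t = w * odd, computed in 32 bits and rescaled from Q15 */
    int32_t tr = ((int32_t)odd->re * w.re - (int32_t)odd->im * w.im) / 32768;
    int32_t ti = ((int32_t)odd->re * w.im + (int32_t)odd->im * w.re) / 32768;
    int32_t er = even->re;
    int32_t ei = even->im;

    even->re = (int16_t)((er + tr) / 2);
    even->im = (int16_t)((ei + ti) / 2);
    odd->re  = (int16_t)((er - tr) / 2);
    odd->im  = (int16_t)((ei - ti) / 2);
}
```

Halving in every stage costs one bit of precision per stage, which is why a variable scheme that scales only when the data is close to overflow can give better accuracy.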
22 SUMMARY
23 Summary
Tasks:
- Introduce TIE instructions in hot-spots of FFT/IFFT
- Hold constants in TIE states or tables
- Compare DIT/DIF
- In the end, the FFT should work with a selected N that is a power of 2
Parallelization:
- MAC operation: butterfly compute node
- SIMD: butterflies of one stage can be computed independently
- FLIX: load input data simultaneously by using the two available DMEMs of corelx_hwswcd
Submit report + exported workspace by …st March 2019 via e-mail to sebastian.haas@tu-dresden.de
More informationComputer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley
Computer Systems Architecture I CSE 560M Lecture 10 Prof. Patrick Crowley Plan for Today Questions Dynamic Execution III discussion Multiple Issue Static multiple issue (+ examples) Dynamic multiple issue
More informationISSN Vol.02, Issue.11, December-2014, Pages:
ISSN 2322-0929 Vol.02, Issue.11, December-2014, Pages:1119-1123 www.ijvdcs.org High Speed and Area Efficient Radix-2 2 Feed Forward FFT Architecture ARRA ASHOK 1, S.N.CHANDRASHEKHAR 2 1 PG Scholar, Dept
More informationEfficient FFT Algorithm and Programming Tricks
Connexions module: m12021 1 Efficient FFT Algorithm and Programming Tricks Douglas L. Jones This work is produced by The Connexions Project and licensed under the Creative Commons Attribution License Abstract
More informationIMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL
IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL Tharanidevi.B 1, Jayaprakash.R 2 Assistant Professor, Dept. of ECE, Bharathiyar Institute of Engineering for Woman, Salem, TamilNadu,
More informationFPGA Implementation of IP-core of FFT Block for DSP Applications
IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 1 Issue 10, December 2014. FPGA Implementation of IP-core of FFT Block for DSP Applications Bhawesh Sahu 1,Anil Sahu
More informationAccelerating Nios II Systems with the C2H Compiler Tutorial
Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008, Version 8.0 Tutorial Introduction The Nios II C2H Compiler is a powerful tool that generates hardware accelerators for software
More informationMultiple Instruction Issue. Superscalars
Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths
More information