EECS Dept., University of California at Berkeley. Berkeley Wireless Research Center Tel: (510)
A 1-V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications

Hui Zhang, Vandana Prabhu, Varghese George, Marlene Wan, Martin Benes, Arthur Abnous, and Jan M. Rabaey

EECS Dept., University of California at Berkeley
Broadcom Corp., Irvine, CA

Berkeley Wireless Research Center
Allston Way, Suite 200, Berkeley, CA
Tel: (510)  Fax: (510)
hui@eecs.berkeley.edu

ISSCC Subject Area: Signal Processing

Abstract

Heterogeneous reconfiguration enables the flexible implementation of baseband wireless functions at energy efficiencies between 50 and 00 MIPS/mW, an energy cost 8 times lower than that of traditional DSP processors. A mm² prototype processor, targeted at voice compression, is implemented in a 0.25 µm 6-metal CMOS process and consumes .8 mW at an average operation rate of 40 MHz. It combines an embedded microprocessor with an array of computational units of different granularities, connected by a hierarchical configurable interconnect network.
Introduction

The advent of the third generation of wireless applications creates a need for processing modules that simultaneously offer high computational performance, ultra-low energy consumption, and a high degree of flexibility and adaptability. Flexibility and adaptability are a necessity in the presence of multiple and evolving standards, and help to increase quality of service under dynamically changing conditions. (Re)configurable processors combine flexibility and low energy by providing a direct spatial mapping from algorithm to architecture, hence reducing the control overhead typically associated with instruction-set processors.

General Concept

The Pleiades processor approach [1] combines an on-chip microprocessor with an array of heterogeneous programmable computational units of different granularities (called satellite processors) connected by a reconfigurable interconnect network (Figure 1). The
microprocessor supports the control-intensive components of the applications as well as the reconfiguration, while repetitive and regular data-intensive loops (henceforth referred to as kernels) are mapped directly onto the array of satellites by configuring the satellite parameters and the interconnections between them (Figure 2). Synchronization between the satellite processors is accomplished by a data-driven communication protocol, in accordance with the data-flow nature of the computations performed in the kernels. A generalized interface wrapper is placed around each satellite processor to comply with the communication protocol. By exploiting the locality of the computations and the correlations within data streams, and by distributing the control, this spatial programming approach results in energy dissipation levels of MIPS/mW, at least an order of magnitude better than what can be accomplished in comparable DSP processors.

Processor Architecture

A prototype processor has been implemented targeting voice processing (and related applications) in wireless systems. The Maia processor (Figure 3) combines an ARM8 core with 21 satellite processors: two MACs, two ALUs, eight address generators, eight embedded memories ( bit, 4 K 6bit), and an embedded low-energy FPGA array [3]. Through an interface control unit, the ARM8 configures the memory-mapped satellites using a configuration bus, and exchanges data with the satellites using 2 pairs of I/O interface ports and direct memory reads and writes. Connections between satellite modules are made through a 2-level hierarchical mesh-structured reconfigurable interconnect network. The 20-pin chip contains .2
million transistors and measures mm² in 0.25 µm 6-metal CMOS technology (Figure 4).

The embedded ARM8 core is optimized for low-energy operation and can operate under variable supply voltages [2]. Both the dual-stage pipelined MAC (including shift/round/saturate functions) and the ALU can be configured to handle a range of operations. The address generators and embedded memories are distributed so as to supply multiple parallel data streams to the computational elements. Each address generator features a small local instruction memory and can be programmed to support various addressing patterns and nested loops, using loop counters and stride counters. It acts as the local controller of a data-flow kernel by initiating the data-flow threads and by signaling their end to the ARM8.

The embedded FPGA provides a 4 × 8 array of 5-input 3-output CLBs, optimized for arithmetic operations and data-flow control functions. It contains 3 levels of interconnect hierarchy, superimposing nearest-neighbor, mesh, and tree architectures. Its energy efficiency has been measured to be 70 times higher than that of equivalent industrial solutions [3].

The interface control unit coordinates synchronization and communication between the synchronous ARM8 core and the asynchronous reconfigurable data paths. Most importantly, it lets the core reconfigure the satellites by mapping all configuration memories into the ARM8 memory space.

Communication Network

The data-driven synchronization between the processing elements employs a 2-phase self-timed handshaking scheme with REQUEST and ACKNOWLEDGE signals (Figure
5a), realized in a globally-asynchronous locally-synchronous fashion. This approach not only reduces power consumption by ensuring that a module is activated only when data is ready, but also allows the modules to operate at different and dynamically varying rates. Each module includes a network interface controller to coordinate communication and synchronization. Data links combine 6-bit fixed-width data words with 2-bit control tokens that tag the different data structures (scalar, vector, or matrix) supported by the network (Figure 5b).

Keeping the energy of the reconfigurable communication network as low as possible is crucial to the success of the approach. This is achieved by a combination of architecture and circuit optimizations. The network itself is implemented as a 2-level hierarchical mesh. Clusters of tightly connected modules are formed according to communication locality. Each cluster has a local mesh with 2 buses per channel and a universal switchbox at every intersection point (Figure 6a). Global interconnections are supported by a second-level, larger-granularity mesh (implemented on the higher metal layers) with 2 buses per channel and hierarchical switchboxes located at the key connection points. The hierarchical switchbox (Figure 6b) contains a universal switchbox for each mesh level, as well as a number of cross-level interconnect switches. This hierarchical network architecture requires only a limited number of buses to achieve sufficient connection flexibility for our target applications, and cuts the interconnect energy cost by a factor of 7 compared to a straightforward crossbar network implementation.

Communication energy is further reduced by employing a low-swing (0.4 V) pseudo-differential signaling scheme (Figure 7a). The capacitance loads are also reduced by
simplifying the switch network with NMOS-only switches. The circuit uses a single wire for each data bit while still retaining most advantages of differential signaling, such as high common-mode noise rejection, low input offset, and good sensitivity. It employs an NMOS-only push-pull driver with a very low supply voltage. The receiver is a clocked sense amplifier followed by a static flip-flop. It contains a double pair of input transistors: the gates of P1 and P3 are connected to d, while the gates of P4 and P2 are biased at GND and REF, respectively.

Figure 7b shows the signaling waveforms. Initially, A and B are discharged to GND, and n1 and n2 are equalized. The receiver is enabled by a negative pulse, which is generated from the handshaking signals. If d is low, the current drive of P3 is the same as that of P4, while the current drive of P1 is larger than that of P2. Consequently, B and A are pulled high and low, respectively, by the cross-coupled inverter pair. The opposite transition is triggered if d is high. The static flip-flop that follows retains the data value even after the sense amplifier is reinitialized. The low-swing signaling reduces the interconnect energy by a factor of 3.4 compared to a full-swing CMOS implementation.

Results and Data Measurements

The overall chip characteristics are summarized in Table 1. Table 2 shows the performance of the different chip components (based on a per-block analysis). The energy dissipation of the processor when programmed for a VCELP voice coder (with .8 mW total power consumption) is presented in Table 3, including a breakdown of the energy over the major functions. Dominant kernels are mapped directly onto hardware satellites, and their run-time reconfiguration is performed by the ARM core. Therefore, the kernel energies presented in the table incorporate contributions from both the satellites and ARM8
configuration. The program control part of the algorithm is mapped entirely to software. The total measured energy efficiency is a factor of 8 better than the best reported in the literature [4].

Acknowledgments

The research was funded by DARPA ACS and the California MICRO program. The support from Philips, Atmel, and Conexant is greatly appreciated. The authors also wish to thank SGS-Thomson for providing fabrication facilities for the integrated circuits.

References

[1] Arthur Abnous and Jan Rabaey, Ultra-Low-Power Domain-Specific Multimedia Processors, IEEE VLSI Signal Processing Workshop, October 1996.
[2] Tom Burd et al., A Dynamic Voltage Scaled Microprocessor System, submitted to ISSCC
[3] Varghese George et al., The Design of a Low-Energy FPGA, Proceedings of ISLPED99, Aug
[4] Wai Lee et al., A V DSP for Wireless Communication, Digest of Technical Papers of ISSCC 97.
Table 1: Chip Characteristics

Technology                   0.25 µm 6-level metal CMOS
Main Supply Voltage          V
Additional Voltages          0.4 V, .5 V
Die Size                     5.2 mm x 6.7 mm
Transistor Count             .2 Million transistors
Average Cycle Speed          40 MHz
Average Power Dissipation    .5-2 mW

Table 2: Performances of hardware modules

Hardware module        Pipeline speed (ns)   Energy consumption per operation (pJ)   Area (mm²)
MAC
ALU
Memory (K x 6)
Memory (52 x 6)
Address generator
Interconnect network   0                     *                                       NA
FPGA                   25                    8**                                     2.76

*This number is the average energy consumption per connection
**This number is the average energy consumption across various arithmetic functions

Table 3: VCELP energy consumption breakdown among dominant kernels and program control

Functionality                        Energy consumption (mJ) for sec of VCELP speech processing
Kernels
  Dot product
  FIR filter                         0.3
  IIR filter                         0.02
  Vector sum with scalar multiply
  Compute code                       0.0
  Covariance matrix compute
Program control
Total                                .787
Figure 1: Heterogeneous Reconfigurable Processor Architecture (blocks shown: Micro-Processor, Configuration Bus, Reconfigurable Interconnect, and Satellite Processors comprising Arithmetic Co-Processors, Configurable Logic, Embedded Memory, and Address Generator)

Figure 2: Mapping a computational kernel on an array of satellite processors. (The kernel below is mapped onto address generators (AddrGen) for in and phi, two multipliers (MPY), an add/subtract unit (+/-), and execution control.)

    for (i = 1; i <= length; i++) {
      for (k = i; k <= length; k++) {
        phi[i][k] = phi[i-1][k-1] + in[np-i]*in[np-k] - in[na-1-i]*in[na-1-k];
      }
    }
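The recurrence in the Figure 2 kernel can be checked against a direct sum. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the summation window `n = np_ .. na-2` is chosen so that the recurrence holds exactly, and row 0 is seeded by a direct sum because the figure does not show how `phi[0][k]` is initialized.

```python
def covariance_direct(x, np_, na, length):
    """Reference version: phi[i][k] = sum of x[n-i]*x[n-k] for n in [np_, na-1).

    The window bounds are an assumption made for this sketch.
    """
    phi = [[0] * (length + 1) for _ in range(length + 1)]
    for i in range(length + 1):
        for k in range(i, length + 1):
            phi[i][k] = sum(x[n - i] * x[n - k] for n in range(np_, na - 1))
    return phi


def covariance_recursive(x, np_, na, length):
    """Row 0 by direct sum (assumed), remaining entries by the Figure 2 recurrence."""
    phi = [[0] * (length + 1) for _ in range(length + 1)]
    for k in range(length + 1):
        phi[0][k] = sum(x[n] * x[n - k] for n in range(np_, na - 1))
    for i in range(1, length + 1):
        for k in range(i, length + 1):
            # One new product enters the window and one old product leaves it,
            # which is the multiply / multiply / add-subtract pattern of Figure 2.
            phi[i][k] = (phi[i - 1][k - 1]
                         + x[np_ - i] * x[np_ - k]
                         - x[na - 1 - i] * x[na - 1 - k])
    return phi


# Arbitrary integer samples; np_ must be at least `length` so indices stay non-negative.
samples = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
assert covariance_recursive(samples, 3, 9, 3) == covariance_direct(samples, 3, 9, 3)
```

Each recursive update needs only two multiplications and one add/subtract per entry, which is why the kernel fits a small set of satellites fed by address generators.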
Figure 3: Floorplan of Prototype Processor (blocks shown: embedded memories, address generators (AG), MACs, ALUs, FPGA, Interface, ARM; Level-1 Mesh with Universal Switchboxes and Level-2 Mesh with Hierarchical Switchboxes)

Figure 5: Data-driven globally-asynchronous locally-synchronous inter-processor communication.
(a) Globally asynchronous, locally synchronous signaling (signals shown between the Reconfigurable Network and a Processor Module: In, Req in, Clk, Clk Enable, Clk delay, Done, Out, Req out)
(b) Control tokens differentiate and delineate data streams and data structures (scalar, vector, matrix) (shown: MPY feeding MAC, with regular data and data associated with an end-of-vector token)
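The role of the control tokens in Figure 5b can be illustrated with a small data-driven model. This is a minimal sketch, not the chip's protocol: the `Token` codes, `tag_vector`, and `mac` are hypothetical names, and the 2-bit encoding is an assumption since the paper does not list the actual token values.

```python
from enum import IntEnum


class Token(IntEnum):
    # Hypothetical 2-bit tag values; the actual encoding is not given in the paper.
    DATA = 0
    END_OF_VECTOR = 1
    END_OF_MATRIX = 2


def tag_vector(values):
    """Tag each word as DATA, except the last one, which carries END_OF_VECTOR."""
    last = len(values) - 1
    return [(v, Token.END_OF_VECTOR if i == last else Token.DATA)
            for i, v in enumerate(values)]


def mac(stream_a, stream_b):
    """Data-driven multiply-accumulate.

    Consumes one tagged word from each input per step and emits the
    accumulated sum whenever an end-of-vector token arrives, so the unit
    needs no loop counter of its own.
    """
    acc = 0
    for (a, tok_a), (b, tok_b) in zip(stream_a, stream_b):
        acc += a * b
        if Token.END_OF_VECTOR in (tok_a, tok_b):
            yield acc
            acc = 0


# Two back-to-back vectors on each input stream give two dot products.
a = tag_vector([1, 2, 3]) + tag_vector([4, 5])
b = tag_vector([4, 5, 6]) + tag_vector([1, 1])
results = list(mac(a, b))
assert results == [32, 9]
```

Because the vector boundary travels with the data, the consumer stays stateless with respect to vector length. This matches the text's description of address generators initiating and terminating the data-flow threads.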
Figure 4: Heterogeneous Reconfigurable Processor Chip Microphotograph (labeled blocks: AGU, FPGA, MAC, ALU, Interconnect Network, Interface, ARM8 Core)
Figure 6: Hierarchical Mesh Network and Switch Matrices
(a) Level 1 Mesh: clusters with Universal Switchboxes
(b) Level 2 Mesh: Hierarchical Switchbox (only cross-mesh connections are shown)

Figure 7: Pseudo-differential low-swing interconnect circuitry
(a) Circuit diagram (labels shown: in, d, REF, clk, VDD, GND, n1, n2, A, B, out, PMOS devices P1 to P7, NMOS devices N1 to N4)
(b) Circuit waveforms (signals shown: clk, in, d with 0.4 V swing, A, B, out)
More informationVLSI System Implementation of 200 MHz, 8-bit, 90nm CMOS Arithmetic and Logic Unit (ALU) Processor Controller
VLSI System Implementation of 200 MHz, 8-bit, 90nm CMOS Arithmetic and Logic Unit (ALU) Processor Controller Department Electronics and Communication Engineering, KL University, Vaddeswaram, Guntur (Dist.),
More informationHardware Design with VHDL PLDs IV ECE 443
Embedded Processor Cores (Hard and Soft) Electronic design can be realized in hardware (logic gates/registers) or software (instructions executed on a microprocessor). The trade-off is determined by how
More informationField Programmable Gate Array (FPGA)
Field Programmable Gate Array (FPGA) Lecturer: Krébesz, Tamas 1 FPGA in general Reprogrammable Si chip Invented in 1985 by Ross Freeman (Xilinx inc.) Combines the advantages of ASIC and uc-based systems
More informationProblem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets.
Clock Routing Problem Formulation Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Better to develop specialized routers for these nets.
More informationPower dissipation! The VLSI Interconnect Challenge. Interconnect is the crux of the problem. Interconnect is the crux of the problem.
The VLSI Interconnect Challenge Avinoam Kolodny Electrical Engineering Department Technion Israel Institute of Technology VLSI Challenges System complexity Performance Tolerance to digital noise and faults
More informationOrganic Computing. Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design
Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design 1 Reconfigurable Computing Platforms 2 The Von Neumann Computer Principle In 1945, the
More informationLow-Power SRAM and ROM Memories
Low-Power SRAM and ROM Memories Jean-Marc Masgonty 1, Stefan Cserveny 1, Christian Piguet 1,2 1 CSEM, Neuchâtel, Switzerland 2 LAP-EPFL Lausanne, Switzerland Abstract. Memories are a main concern in low-power
More informationGRE Architecture Session
GRE Architecture Session Session 2: Saturday 23, 1995 Young H. Cho e-mail: youngc@cs.berkeley.edu www: http://http.cs.berkeley/~youngc Y. H. Cho Page 1 Review n Homework n Basic Gate Arithmetics n Bubble
More informationNoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods
1 NoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods V. Venkatraman, A. Laffely, J. Jang, H. Kukkamalla, Z. Zhu & W. Burleson Interconnect Circuit
More informationDesign and Simulation of Low Power 6TSRAM and Control its Leakage Current Using Sleepy Keeper Approach in different Topology
Vol. 3, Issue. 3, May.-June. 2013 pp-1475-1481 ISSN: 2249-6645 Design and Simulation of Low Power 6TSRAM and Control its Leakage Current Using Sleepy Keeper Approach in different Topology Bikash Khandal,
More informationThe Design of the KiloCore Chip
The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory
More informationEECS 150 Homework 7 Solutions Fall (a) 4.3 The functions for the 7 segment display decoder given in Section 4.3 are:
Problem 1: CLD2 Problems. (a) 4.3 The functions for the 7 segment display decoder given in Section 4.3 are: C 0 = A + BD + C + BD C 1 = A + CD + CD + B C 2 = A + B + C + D C 3 = BD + CD + BCD + BC C 4
More informationEECS 151/251A: SPRING 17 MIDTERM 2 SOLUTIONS
University of California College of Engineering Department of Electrical Engineering and Computer Sciences J. Rabaey G. Alexandrov, N. Narevsky, V. Iyer MoWe 4-5:30pm Mo, Oct. 2, 6:00-7:30pm EECS 151/251A:
More informationEE 466/586 VLSI Design. Partha Pande School of EECS Washington State University
EE 466/586 VLSI Design Partha Pande School of EECS Washington State University pande@eecs.wsu.edu Lecture 18 Implementation Methods The Design Productivity Challenge Logic Transistors per Chip (K) 10,000,000.10m
More informationDYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)
DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) K.Prasad Babu 2 M.tech (Ph.d) hanumanthurao19@gmail.com 1 kprasadbabuece433@gmail.com 2 1 PG scholar, VLSI, St.JOHNS
More informationECE 747 Digital Signal Processing Architecture. DSP Implementation Architectures
ECE 747 Digital Signal Processing Architecture DSP Implementation Architectures Spring 2006 W. Rhett Davis NC State University W. Rhett Davis NC State University ECE 406 Spring 2006 Slide 1 My Goal Challenge
More informationDesign & Implementation of 64 bit ALU for Instruction Set Architecture & Comparison between Speed/Power Consumption on FPGA.
Design & Implementation of 64 bit ALU for Instruction Set Architecture & Comparison between Speed/Power Consumption on FPGA 1 Rajeev Kumar Coordinator M.Tech ECE, Deptt of ECE, IITT College, Punjab rajeevpundir@hotmail.com
More informationMemory in Digital Systems
MEMORIES Memory in Digital Systems Three primary components of digital systems Datapath (does the work) Control (manager) Memory (storage) Single bit ( foround ) Clockless latches e.g., SR latch Clocked
More informationHigh Performance Interconnect and NoC Router Design
High Performance Interconnect and NoC Router Design Brinda M M.E Student, Dept. of ECE (VLSI Design) K.Ramakrishnan College of Technology Samayapuram, Trichy 621 112 brinda18th@gmail.com Devipoonguzhali
More informationImplementation of ALU Using Asynchronous Design
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) ISSN: 2278-2834, ISBN: 2278-8735. Volume 3, Issue 6 (Nov. - Dec. 2012), PP 07-12 Implementation of ALU Using Asynchronous Design P.
More informationA Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique
A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique P. Durga Prasad, M. Tech Scholar, C. Ravi Shankar Reddy, Lecturer, V. Sumalatha, Associate Professor Department
More informationAbbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University
Abbas El Gamal Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program Stanford University Chip stacking Vertical interconnect density < 20/mm Wafer Stacking
More information1. Designing a 64-word Content Addressable Memory Background
UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Project Phase I Specification NTU IC541CA (Spring 2004) 1. Designing a 64-word Content Addressable
More informationA Comparative Study of Power Efficient SRAM Designs
A Comparative tudy of Power Efficient RAM Designs Jeyran Hezavei, N. Vijaykrishnan, M. J. Irwin Pond Laboratory, Department of Computer cience & Engineering, Pennsylvania tate University {hezavei, vijay,
More informationFPGA Power Management and Modeling Techniques
FPGA Power Management and Modeling Techniques WP-01044-2.0 White Paper This white paper discusses the major challenges associated with accurately predicting power consumption in FPGAs, namely, obtaining
More information