Microprocessor and DSP Technologies for the Nanoscale Era
1 Microprocessor and DSP Technologies for the Nanoscale Era. Seminar 1. Ram Kumar Krishnamurthy, Microprocessor Research Labs, Intel Corporation, Hillsboro, OR (ram.krishnamurthy@intel.com). July 5, 2005. Intel Labs.
2 About Circuits Research Lab. Established 1996. Belongs under Microprocessor Technology Labs. Located in Hillsboro, Oregon, USA (primary) and Bangalore, India. 75 researchers. Charter: high-performance and low-power digital circuits; off-chip I/O signaling circuits; power delivery circuits. >50 patents, >25 papers per year.
3 Motivation: higher performance at lower power and cost. [Chart: MIPS over time for the Pentium, Pentium Pro, and Pentium 4 architectures.] Strong demand for >1 TIPS performance beyond this decade. How do you get there?
4 Our Research Agenda: Outlook. [Table by technology node (nm); per-node column values were lost in extraction.] Integration capacity (BT). Delay = CV/I scaling: ~0.7, then >0.7 (delay scaling will slow down). Energy/logic-op scaling: >0.35, >0.5, >0.5 (energy scaling will slow down). Bulk planar CMOS: high probability, then low probability. Alternate devices, 3G etc.: low probability, then high probability. Variability: medium, high, very high. ILD (K): ~3, then <3, reducing slowly. RC delay. Metal layers: up to 1 layer added per generation. Research sources: internal, university, FCRP (MARCO).
5 Intel's Research Focus. Technology leadership. [Chart: gate length (nm, from 1000 down to 0.1 on the axis), industry vs. Intel.] Complete solution stack: technology, architecture and design, platforms, software.
6 Architectures & Designs
Platform: Back-end server | Server | Desktop | Mobile | Handheld
Family: Itanium | Itanium, Xeon | Pentium, Celeron | Centrino, Pentium | XScale
Architecture: IA64, VLIW | IA64/IA32 | IA32 | IA32 | ARM
Word: 64 bit | 64 bit (Itanium), 32 bit (Xeon) | 32 bit | 32 bit | 32 bit
Address space: Huge | Huge/4 GB | 4 GB | 4 GB | 4 GB
Cache: 6 MB | 6 MB, 2 MB | 1 MB | 1 MB | 512 KB
Performance: High | High | High | Medium | Low
Power: ~130 W | ~100 W | <100 W | ~25 W | <1 W
Power metric: Watts/sq ft | Watts/cu ft | Watts | Watt-hours | Battery life
Cost: High | High | Med | Med | Low
Our research agenda addresses all these platforms.
7 Is Transistor a Good Switch? [Diagram: ideal switch vs. real transistor.] On: ideal I = ∞, real I = 1 mA/µ. Off: ideal I = 0, real I ≠ 0 (sub-threshold leakage).
8 Sub-threshold Leakage. [Figures: MOS transistor characteristics, log Ids vs. Vgs, showing the exponential increase in Ioff around Vt; Ioff (nA/µ) vs. temperature (°C); source-drain leakage (W) by technology from 0.25µ to 45nm for two transistor-growth scenarios, one of them 1.5X.] Transistors will not be switches, but dimmers.
9 Leakage Power. [Chart: leakage power (% of total, 0% to 50%) vs. technology (µ).] Leakage power limits Vt scaling. Must stop at 50% (A. Grove, IEDM).
10 High Leakage Impacts Functionality. [Circuit: clocked dynamic gate with keeper, pulldown transistors M11 to M1K and M21 to M2K, leakage current I_leak, nodes Dyn_out and INV_out. Chart: keeper/pulldown ratio vs. subthreshold + gate leakage (up to 3X, 5X, 10X, 20X), sub-70nm.] Source: M. Anders, R. Krishnamurthy et al., 2001 Symp. VLSI Circuits. Sub-65nm dynamic circuit active leakage tolerance: cache, RF, arrays, and bitlines are most affected; keeper sizes exceed 50% of pulldown strength; high contention degrades performance; slow keeper shutoff causes high short-circuit power.
11 Power Will be the Limiter. [Charts for the 386, Pentium, and Pentium 4 processors: transistor count (millions, heading to 1 billion), frequency (MHz to GHz), performance (MIPS to TIPS), and power (watts, heading to 1000s of watts?).] Billion-transistor integration capacity will exist. Applications will demand TIPS performance. But the power challenge remains: highest performance within the power envelope.
12 Power Trend. [Chart: power (W) for the 386, 486, Pentium, Pentium II, and Pentium 4 processors against the cooling capacity of a conventional system.] Business as usual is not an option. C scales by 30% per generation, but Vcc scales by only 10-15%. Must maintain or reduce power in future.
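The scaling mismatch on this slide can be made concrete with a short calculation. The sketch below assumes the usual switching-energy relation E ~ C * Vcc^2 (not stated on the slide) and reads "scales by 30%" as a 30% reduction per generation. The function name is illustrative only.

```python
def switching_energy_factor(c_scale: float, vcc_scale: float) -> float:
    """Relative switching energy per transition, assuming E ~ C * Vcc^2."""
    return c_scale * vcc_scale ** 2


c_scale = 0.70  # slide: C shrinks by 30% per generation
for vcc_reduction in (0.10, 0.15):  # slide: Vcc shrinks by only 10-15%
    e = switching_energy_factor(c_scale, 1.0 - vcc_reduction)
    # 1/e is how much (transistor count x frequency) may grow before power rises
    print(f"Vcc -{vcc_reduction:.0%}: energy x{e:.3f}, break-even growth x{1 / e:.2f}")
```

Under these assumptions, energy per transition only falls to roughly 0.5-0.57 of its previous value, so any generation that more than about doubles the product of transistor count and frequency increases total switching power.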
13 Gate Oxide is Near Limit. [Micrograph: 130nm transistor with 70 nm gate, CoSi2, Si3N4, poly-Si gate electrode, 1.5 nm gate oxide, Si substrate. Chart: gate leakage (W, log scale) by technology from 0.25µ to 45nm, including the burn-in case at 1.4X Vdd.] Intel's high-K leadership is crucial for the industry.
14 Power Density Will Get Even Worse. [Chart: power density (W/cm2, log scale up to 10,000) for Pentium processors; Pat Gelsinger, ISSCC 2001.] Need to keep the junctions cool, for: performance (higher frequency), lower leakage (exponential), better reliability (exponential).
15 Active Power Reduction. Multiple supply voltages: low supply voltage for slow paths, high supply voltage for fast paths. Replicated designs: one logic block at Vdd gives Freq = 1, Vdd = 1, Throughput = 1, Power = 1, Area = 1, Pwr Den = 1; two logic blocks at Vdd/2 give Freq = 0.5, Vdd = 0.5, Throughput = 1, Power = 0.25, Area = 2, Pwr Den = [value lost]. Need high-speed multi-supply level converter techniques.
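The replicated-design numbers can be reproduced with a small normalized model. This is a sketch assuming switching power per block scales as Vdd^2 * f with fixed capacitance, and that frequency tracks Vdd linearly as the slide's figures imply. The function name and dictionary keys are illustrative. The power-density figure for the replicated case is missing from the transcription; under this model it would follow as power divided by area.

```python
def replicated_design(n_blocks: int, vdd: float, freq: float) -> dict:
    """Normalized metrics for n identical blocks; baseline is 1 block at Vdd = 1, f = 1."""
    power_per_block = vdd ** 2 * freq  # assumes P ~ C * Vdd^2 * f, C fixed per block
    power = n_blocks * power_per_block
    area = float(n_blocks)
    return {
        "throughput": n_blocks * freq,
        "power": power,
        "area": area,
        "power_density": power / area,
    }


print(replicated_design(1, 1.0, 1.0))  # baseline: everything equals 1
print(replicated_design(2, 0.5, 0.5))  # two blocks at Vdd/2 and half frequency
```

The second call reproduces the slide's Throughput = 1, Power = 0.25, and Area = 2, and yields a power density of 0.125 under the stated assumptions.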
16 Leakage Control. Body bias: Vbp (+Ve) and Vbn (-Ve) applied to the logic block; 2-10X reduction. Stack effect: equal loading; [factor lost]X reduction. Sleep transistor: [factor lost]X reduction. Need low-leakage and leakage-tolerant techniques.
17 Dual Vt Design for Active Leakage Reduction. Technology provides two Vt: high Vt with nominal Ioff (lower performance) and low Vt with ~10X higher Ioff (higher performance). Employing high Vt everywhere yields lower performance and lower leakage (1X). Employing low Vt everywhere yields higher performance but higher leakage (10X). Selective usage of low and high Vt on logic paths between latch boundaries yields higher performance yet low leakage, between 1X and <<10X. [Histograms: number of paths vs. delay for the high-Vt, low-Vt, and dual-Vt designs.]
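The selective-Vt idea can be illustrated with a toy assignment. This is a minimal sketch, not the design flow used in the talk: it assumes independent paths of equal leakage weight, a hypothetical 0.8x delay factor for low Vt, and made-up path delays. Only the ~10X leakage ratio comes from the slide.

```python
LOW_VT_LEAKAGE = 10.0  # slide: low Vt has ~10X higher Ioff than high Vt
LOW_VT_SPEEDUP = 0.8   # hypothetical: low-Vt path delay = 0.8 x high-Vt delay


def assign_vt(path_delays, target):
    """Give low Vt only to paths that miss the target delay at high Vt."""
    choice = ["low" if d > target else "high" for d in path_delays]
    delays = [d * LOW_VT_SPEEDUP if c == "low" else d
              for d, c in zip(path_delays, choice)]
    # Leakage relative to an all-high-Vt design (1X), equal weight per path
    leakage = sum(LOW_VT_LEAKAGE if c == "low" else 1.0 for c in choice) / len(choice)
    return choice, max(delays), leakage


paths = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]  # hypothetical high-Vt path delays
choice, worst_delay, leakage = assign_vt(paths, target=1.0)
print(choice, worst_delay, round(leakage, 2))
```

With these illustrative inputs only the two slowest paths receive low Vt, the worst-case delay meets the target, and leakage lands between the 1X and 10X extremes.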
18 Chip Multi-Processing. [Diagram: cores C1 to C4 around a shared cache. Chart: relative performance of CMP vs. ST against die area and power.] Multi-core, each core multi-threaded. Shared cache and front-side bus. Each core has a different Vdd and frequency. Core hopping to spread hot spots. Lower junction temperature.
19 Memory Latency. [Diagram: CPU, a small cache at a few clocks, a large memory with latency in ns. Chart: memory latency (clocks) vs. frequency (MHz), assuming 50 ns memory latency.] A cache miss hurts performance, and it gets worse at higher frequency. Need power-efficient high-speed I/O techniques.
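The chart's relationship is simple arithmetic: a fixed latency in nanoseconds costs more clock cycles as frequency rises. The sketch below uses the slide's 50 ns assumption; the sample frequencies are illustrative.

```python
def miss_latency_clocks(latency_ns: float, freq_mhz: float) -> float:
    """Clock cycles spent waiting on memory: ns x (cycles per ns)."""
    return latency_ns * freq_mhz / 1000.0


for freq_mhz in (100, 1000, 3000):  # illustrative frequencies
    clocks = miss_latency_clocks(50.0, freq_mhz)
    print(f"{freq_mhz} MHz: {clocks:.0f} clocks per miss")
```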
20 Increase on-die Memory. [Chart: power density (W/cm2) of logic vs. memory from 0.25µ to 0.1µ. Chart: cache as % of full chip area (0% to 100%) for the Pentium through Pentium 4 processors, from 0.7µ to 0.10µ.] Large on-die memory provides: 1. Increased data bandwidth and reduced latency. 2. Hence, higher performance for much lower power.
21 Special Purpose Hardware Acceleration. Example: TCP Offload Engine (TOE). [Die photo: PLL, OOO ROB, send buffer, TCB, exec core, ROM, input seq, CAM1, CLB; [dimension lost] x 3.54 mm, 260K transistors. Chart: MIPS (log scale) for general-purpose (GP) vs. TOE.] Opportunities for acceleration: network processing engines, MPEG encode/decode engines, speech engines, wireless communication/baseband. Special-purpose HW gives the best MIPS/Watt.
22 Energy-efficient Data-path Circuits. [Processor thermal map, temperature in °C: cache vs. execution core.] Execution core: integer and FP ALUs and MACs. ALUs are performance and peak-current limiters. High activity creates thermal hotspots. Goal: high-performance, energy-efficient design.
23 130nm 9GHz 32-bit Integer ALU (ISSCC 02). [Die photo: 32-bit integer exec core with input FIFO, output FIFO, ALU, RF, clock, scan control, BB control, misc.] Die size: 1.61 x 1.44 mm. Process: 130nm CMOS. Interconnect: 1 poly, 6 metal. Transistors: 160K. Maximum VCC: 1.5 V. [Charts: Fmax (GHz) and power (mW) vs. supply voltage (V); leakage power (mW) vs. supply voltage (V).] Design target: 6.5GHz at 120mW. Source: M. Anders, R. Krishnamurthy et al., Intl. Solid-State Circuits Conf. and IEEE Journal of Solid-State Circuits 11/02.
24 90nm 7GHz 64-bit Integer ALU (ISSCC 04). [Die microphotograph: clock generator and drivers, lower-order 32-bit ALU, upper-order 32-bit ALU, I/O circuits.] Measured performance summary. Process: 90nm dual-Vt CMOS, 7 metal. Die area and 64-bit ALU layout area: 0.474mm [remaining figures lost]. Total transistor count: [lost]. 64-bit ALU average switching power (α=0.3): 89mW at 4GHz, 1.3V, 25°C. 64-bit ALU active leakage power: 9.6mW at 1.3V, 25°C. 64-bit ALU maximum frequency: 7GHz at 2.1V, 25°C. 32-bit ALU average switching power (α=0.3): 71mW at 7GHz, 1.3V, 25°C. 32-bit ALU active leakage power: 4.4mW at 1.3V, 25°C. Highlights: 7GHz single-cycle 64-bit integer ALU (measured in 90nm CMOS); simultaneous 9GHz single-cycle 32-bit integer ALU mode; fastest reported single-cycle 64-bit integer ALU performance. Source: S. Mathew, R. Krishnamurthy et al., Intl. Solid-State Circuits Conf. and IEEE Journal of Solid-State Circuits 01/05.
25 90nm 1GHz 9mW 16*16b Multiplier (ISSCC 05). [Die microphotograph: clock generator and drivers, R-PLA, registers, 16x16b multiplier, I/O circuits.] Measured performance summary. Process: 90nm dual-Vt CMOS. Die area and 16b multiplier and PLA layout area: 0.474mm [remaining figures lost]. 16b multiplier worst-case power: 9mW at 1GHz, 1.3V, 50°C (nominal). 16b multiplier active leakage power: 540µW at 1.3V, 50°C (nominal). 16b multiplier peak performance: 1.5GHz, 32mW at 1.95V, 50°C. 16b multiplier low-voltage mode performance: 50MHz, 79µW at 0.57V, 50°C. Reconfigurable PLA peak performance: 2.3GHz, 4.2mW at 1.3V, 50°C. Reconfigurable PLA worst-case power: 2mW at 1GHz, 1.3V, 50°C (nominal). Stand-by mode power: 75µW (7X reduction vs. active leakage). Highlights: 1GHz single-cycle 16*16-bit DSP multiplier (measured in 90nm CMOS); reconfigurable PLA control engine; 9pJ/Op or 110GOPS/Watt; highest reported GOPS/Watt for single-cycle 16-bit multiply. Source: S. Hsu, R. Krishnamurthy et al., Intl. Solid-State Circuits Conf.
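The headline efficiency figures follow from the measured power and frequency. The sketch below assumes one multiply per clock cycle (the slide describes a single-cycle multiplier); the helper names are illustrative, and the inputs are the slide's own measurements.

```python
def energy_per_op_pj(power_mw: float, freq_ghz: float) -> float:
    """Energy per operation in pJ, assuming one op per cycle (mW / GHz = pJ)."""
    return power_mw / freq_ghz


def gops_per_watt(power_mw: float, freq_ghz: float) -> float:
    """Throughput per watt, assuming one op per cycle."""
    return freq_ghz / (power_mw / 1000.0)


# Nominal point from the slide: 9 mW at 1 GHz, 1.3 V
print(energy_per_op_pj(9.0, 1.0), gops_per_watt(9.0, 1.0))
# Low-voltage mode from the slide: 79 uW at 50 MHz, 0.57 V
print(energy_per_op_pj(0.079, 0.05), gops_per_watt(0.079, 0.05))
# Stand-by vs. active leakage from the slide: 540 uW / 75 uW
print(540.0 / 75.0)
```

The nominal point gives 9 pJ per operation and about 111 GOPS/Watt, consistent with the slide's rounded 110 GOPS/Watt, and the stand-by ratio of 7.2 matches the quoted 7X reduction.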
26 32-bit ALU Architecture. [Block diagram: external operands enter two 6:1 muxes (mux control); 5:1 and 2:1 muxes (shift control); adder core; output mux producing Sum (mux control, sign control); loopback bus.] Multiple ALUs are clustered together in the execution core: high power density.
27 Full-Adder Intro. [Block: full adder with inputs A, B, Cin and outputs Sum, Cout.]
28 The Binary Adder. [Block: full adder with inputs A, B, Cin and outputs Sum, Cout.] $S = A \oplus B \oplus C_i = A\bar{B}\bar{C_i} + \bar{A}B\bar{C_i} + \bar{A}\bar{B}C_i + ABC_i$; $C_o = AB + BC_i + AC_i$.
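The two full-adder equations can be checked against ordinary arithmetic over all eight input combinations. This is a minimal sketch; the function names are illustrative.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int):
    """Return (sum, carry_out) using S = A xor B xor Ci, Co = AB + BCi + ACi."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


def sum_of_products(a: int, b: int, cin: int) -> int:
    """Sum bit written as the four minterms with an odd number of ones."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin
    return (a & nb & nc) | (na & b & nc) | (na & nb & cin) | (a & b & cin)


for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert s == sum_of_products(a, b, cin)  # XOR form equals minterm form
    assert a + b + cin == 2 * cout + s      # matches integer addition
print("full adder equations verified for all 8 input combinations")
```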
29 The Ripple-Carry Adder. [Diagram: four full adders (FA) in a chain with inputs $A_0..A_3$, $B_0..B_3$, carry-in $C_{i,0}$, carry-outs $C_{o,0}$ (= $C_{i,1}$) through $C_{o,3}$, and sums $S_0..S_3$.] Worst-case delay is linear in the number of bits: $t_d = O(N)$, $t_{adder} = (N-1)\,t_{carry} + t_{sum}$. Goal: make the fastest possible carry-path circuit.
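A bit-level model shows why the delay is linear: each stage must wait for the carry from the stage before it. The sketch below is illustrative (bit lists are least-significant-bit first, and the helper names and unit delays are assumptions), not a timing model of the actual circuit.

```python
def full_adder(a: int, b: int, cin: int):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); the carry ripples stage to stage."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(x: int, n: int):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def adder_delay(n_bits: int, t_carry: float, t_sum: float) -> float:
    """Worst-case delay from the slide: (N - 1) * t_carry + t_sum."""
    return (n_bits - 1) * t_carry + t_sum


sums, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sums), cout)       # 11 + 6 = 17 -> 4-bit sum 1, carry-out 1
print(adder_delay(32, 1.0, 1.0))   # unit delays chosen for illustration
```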
30 Static CMOS Full Adder. [Schematic: static CMOS full adder with inputs A, B, Ci, internal node X, and outputs S and Co.] 28 transistors.
31 Carry Look-ahead. $Sum_i = A_i \oplus B_i \oplus Carry_{i-1}$; $Carry_i = A_i B_i + (A_i + B_i)\,Carry_{i-1}$.
32-34 Partial Sum (the same equations, built up over three slides). Generate: $G_i = A_i B_i$. Propagate: $P_i = A_i + B_i$. Hence $Carry_i = G_i + P_i\,Carry_{i-1}$.
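Unrolling the recurrence Carry_i = G_i + P_i Carry_{i-1} expresses every carry directly in terms of the inputs, which is what lets the carries be computed without rippling. The sketch below uses the OR form of propagate from the slides; function names are illustrative.

```python
def lookahead_carries(a_bits, b_bits, cin=0):
    """Each carry from the unrolled form: c_i = g_i + p_i g_(i-1) + ... + p_i...p_0 cin."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a | b for a, b in zip(a_bits, b_bits)]  # propagate (OR form, as on the slide)
    carries = []
    for i in range(len(g)):
        c, prod = g[i], p[i]
        for j in range(i - 1, -1, -1):
            c |= prod & g[j]   # term p_i ... p_(j+1) g_j
            prod &= p[j]
        c |= prod & cin        # term p_i ... p_0 cin
        carries.append(c)
    return carries


def lookahead_add(x: int, y: int, n: int, cin: int = 0):
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    carries = lookahead_carries(a, b, cin)
    carry_in = [cin] + carries[:-1]  # Carry_(i-1) feeding bit i
    total = sum((a[i] ^ b[i] ^ carry_in[i]) << i for i in range(n))
    return total, carries[-1]


print(lookahead_add(11, 6, 4))  # (1, 1): 4-bit sum 1 with carry-out 1
```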
35 High-performance Adders: Kogge-Stone. [Diagram: even input bits pass through PG generation, carry-merge stages CM1 to CM5, and an XOR to produce the even sums; odd input bits likewise produce the odd sums.] Carry-merge: $GG = G_i + P_i G_{i-1}$, $GP = P_i P_{i-1}$. Generating all 32 carries with a full-blown binary tree is energy-inefficient. Number of carry-merge stages = $\log_2(32) = 5$.
36 Kogge-Stone Adder. [Diagram: PG stage, carry-merge gates, XOR stage.] Critical path = PG + 5 + XOR = 7 gate stages. Generate/propagate fanout of 2 to 3. Maximum interconnect spans 16b. Energy-inefficient.
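A functional model of the Kogge-Stone prefix network makes the stage count visible. The sketch below applies the carry-merge operator GG = G_i + P_i G_(i-dist), GP = P_i P_(i-dist) with the merge distance doubling each stage; it models logic only (no timing, fanout, or wiring), and the function name is illustrative.

```python
def kogge_stone_add(a: int, b: int, width: int = 32):
    """Return (sum mod 2**width, carry_out, number of carry-merge stages)."""
    bits = range(width)
    g = [(a >> i) & (b >> i) & 1 for i in bits]    # generate
    p = [((a >> i) | (b >> i)) & 1 for i in bits]  # propagate (OR form)
    x = [((a >> i) ^ (b >> i)) & 1 for i in bits]  # half-sum for the final XOR
    G, P = g[:], p[:]
    dist, stages = 1, 0
    while dist < width:
        new_g, new_p = G[:], P[:]
        for i in range(dist, width):
            new_g[i] = G[i] | (P[i] & G[i - dist])  # carry-merge gate
            new_p[i] = P[i] & P[i - dist]
        G, P = new_g, new_p
        dist *= 2
        stages += 1
    # After the last stage G[i] is the carry out of bit i (carry-in assumed 0)
    total = x[0]
    for i in range(1, width):
        total |= (x[i] ^ G[i - 1]) << i
    return total, G[width - 1], stages


print(kogge_stone_add(0xFFFFFFFF, 1))
```

For 32 bits the loop runs five times, matching the five carry-merge stages on the slide; adding the PG stage and the final XOR gives the seven-stage critical path.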
37 Sparse-tree Adder Architecture. Generate every 4th carry in parallel (C3, C7, C11, C15, C19, C23, C27). Side-path: 4-bit conditional sum generator. 73% fewer carry-merge gates, hence energy-efficient.
38 Non-critical Sum Generator. [Circuit: inputs $P_{i+3},G_{i+3}$ down to $P_i$; carry-merge (CM) gates ripple the carry across four bits for assumed carry-in 0 and 1; XOR gates form the conditional sums $Sum_{i,0}$ and $Sum_{i,1}$; 2:1 muxes driven by the incoming carry output $Sum_i$ to $Sum_{i+3}$.] Non-critical path: ripple carry chain. Reduced area, energy consumption, and leakage. Generate conditional sums for each bit. The sparse-tree carry selects the appropriate sum.
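The split between the sparse carry tree and the conditional sum generator can be modelled functionally. In the sketch below each 4-bit group ripples two candidate sums (carry-in 0 and carry-in 1) and the every-4th carry picks one. It is a logic model under stated simplifications: the group carries are computed here with a plain loop rather than a parallel carry-merge tree, and all names are illustrative.

```python
def ripple4(a4: int, b4: int, cin: int) -> int:
    """4-bit conditional sum for an assumed carry-in (the non-critical ripple chain)."""
    s, c = 0, cin
    for i in range(4):
        ai, bi = (a4 >> i) & 1, (b4 >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | ((ai | bi) & c)
    return s


def group_gp(a4: int, b4: int):
    """Group generate/propagate of a 4-bit slice."""
    G, P = 0, 1
    for i in range(4):
        g = (a4 >> i) & (b4 >> i) & 1
        p = ((a4 >> i) | (b4 >> i)) & 1
        G, P = g | (p & G), p & P
    return G, P


def sparse_tree_add(a: int, b: int, width: int = 32):
    """Return (sum mod 2**width, list of every-4th carries C3, C7, ...)."""
    result, carry, sparse_carries = 0, 0, []
    for k in range(0, width, 4):
        a4, b4 = (a >> k) & 0xF, (b >> k) & 0xF
        sum0, sum1 = ripple4(a4, b4, 0), ripple4(a4, b4, 1)
        result |= (sum1 if carry else sum0) << k  # 2:1 mux on the incoming carry
        G, P = group_gp(a4, b4)
        carry = G | (P & carry)                   # carry out of this group
        sparse_carries.append(carry)
    return result, sparse_carries


print(sparse_tree_add(0xFFFFFFFF, 1))
```

Only one carry per four bits has to come from the fast path; the per-bit work happens in the slower ripple chains, which is where the area and energy savings on the slide originate.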
39 Adder Core Critical Path. [Circuit: adder inputs feed a clocked PG stage, then carry-merge stages GG1, GG3, GG7, GG15, GG27 on clk, clk2, clk3 along the single-rail dynamic sparse-tree path, producing C27; a static sum generator with CM0, CM1, latch, and XOR produces Sum31_0 and Sum31_1, which C27 selects to give Sum31.] Critical path: 7 gate stages, same as Kogge-Stone. Sparse-tree: single-rail dynamic. Exploit the non-criticality of the sum generator by converting it to static logic. Semi-dynamic design.
40 Sparse-tree Architecture. Performance impact (20% speedup): 33-50% reduced G/P fanouts; 80% reduced wiring complexity; 30% reduction in maximum interconnect. Power impact (56% reduction): 73% fewer carry-merge gates; 50% reduction in average transistor size.
41 Energy-delay Space. [Chart: worst-case energy (pJ) vs. delay (ps) for the dynamic Kogge-Stone and semi-dynamic sparse-tree designs; 130nm CMOS, 1.2V, 110°C; 4GHz design point marked.] 20% speedup over Kogge-Stone. 56% worst-case energy reduction. Scales with activity factor.
42 Semi-dynamic Design. [Chart: average energy (pJ) vs. activity factor for the dynamic Kogge-Stone and semi-dynamic sparse-tree designs.] Static sum generators: low switching activity. 71% lower average energy at 10% activity.
43 So, How Do We Get There? [Chart: MIPS over time across successive eras: pipelined architecture; super scalar; speculative, OOO (era of instruction-level parallelism); multi-threaded; multi-threaded, multi-core (era of thread and processor level parallelism); era of special-purpose HW.] Significant challenges ahead. They can only be solved with joint industry-university collaboration.
44 Thank You for Your Attention. Q&A. Our publications can be found in: IEEE Intl. Solid-State Circuits Conference, IEEE Journal of Solid-State Circuits, Symposium on VLSI Circuits, Intl. Symposium on Low-power Design, Custom Integrated Circuits Conference, SOCC, etc.
45 Backup
46 Optimized First-level Carry-merge: CM0. Conditional carry for Cin = 0. [Circuit: carry-merge gate with inputs $P_i$, $G_i$ and Cin = 0, output C#_0.] The carry-merge stage reduces to an inverter: conditional carry_0 = $G_i$#.
47 Optimized First-level Carry-merge: CM1. Conditional carry for Cin = 1. [Circuit: carry-merge gate with inputs $P_i$, $G_i$ (derived from $A_i$, $B_i$) and Cin = 1, output C#_1.] $P_i$ and $G_i$ are correlated, so conditional carry_1 = $P_i$#.
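Both first-level simplifications can be confirmed by enumeration. The sketch below assumes the OR form of propagate used on the carry look-ahead slides, and reads the "#" suffix as marking a complemented (active-low) signal, which is an interpretation rather than something the transcription states. The function name is illustrative.

```python
from itertools import product


def first_level_carry(a: int, b: int, cin: int) -> int:
    """Carry out of one bit: G + P * Cin, with G = A and B, P = A or B."""
    g, p = a & b, a | b
    return g | (p & cin)


for a, b in product((0, 1), repeat=2):
    g, p = a & b, a | b
    assert first_level_carry(a, b, 0) == g  # Cin = 0: the carry is just G
    assert first_level_carry(a, b, 1) == p  # Cin = 1: G + P collapses to P
print("conditional carry_0 = G and conditional carry_1 = P for all inputs")
```

The second identity holds because G = 1 implies P = 1, which is the correlation the slide refers to; each conditional carry therefore needs only an inversion of an existing signal rather than a full carry-merge gate.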
48 Optimized Sum Generator. [Circuit: inputs $P_{i+3},G_{i+3}$ down to $P_i$; optimized 1st-level carry-merge followed by CM and XOR gates forming the conditional sums $Sum_{i,1}$ and $Sum_{i,0}$; 2:1 muxes selected by Carry output $Sum_{i+3}$ to $Sum_i$.] Optimized non-critical path: 4 stages.