Microprocessor and DSP Technologies for the Nanoscale Era


1 Microprocessor and DSP Technologies for the Nanoscale Era
Seminar 1
Ram Kumar Krishnamurthy, Microprocessor Research Labs, Intel Corporation, Hillsboro, OR
ram.krishnamurthy@intel.com
July 5, 2005, Intel Labs

2 About Circuits Research Lab
Established 1996; part of Microprocessor Technology Labs.
Located in Hillsboro, Oregon, USA (primary) and Bangalore, India; 75 researchers.
Charter: high-performance and low-power digital circuits, off-chip I/O signaling circuits, power delivery circuits.
>50 patents, >25 papers per year.

3 Motivation: Higher performance at lower power and cost
[Figure: MIPS for the Pentium, Pentium Pro, and Pentium 4 architectures]
Strong demand for > 1 TIPS performance beyond this decade. How do you get there?

4 Our Research Agenda: Outlook
[Table by technology node (nm) and integration capacity (BT)]
Delay = CV/I scaling: ~0.7, then >0.7. Delay scaling will slow down.
Energy/logic op scaling: >0.35, then >0.5, >0.5. Energy scaling will slow down.
Bulk planar CMOS: high probability, then low probability.
Alternate, 3G etc: low probability, then high probability.
Variability: medium, then high, then very high.
ILD (K): ~3, then <3, reducing slowly.
RC delay and metal layers: … to 1 layer per generation.
Research channels: internal, university, FCRP (MARCO).

5 Intel's Research Focus
Technology leadership. [Figure: gate length (nm), industry vs. Intel]
Complete solution stack: technology, architecture and design, platforms, software.

6 Architectures & Designs
| | Back End Server | Server | Desktop | Mobile | Handheld |
| Family | Itanium | Itanium, Xeon | Pentium, Celeron | Centrino, Pentium | XScale |
| Architecture | IA64, VLIW | IA64/IA32 | IA32 | IA32 | ARM |
| Word | 64 bit | 64 bit (Itanium), 32 bit (Xeon) | 32 bit | 32 bit | 32 bit |
| Address space | Huge | Huge/4 GB | 4 GB | 4 GB | 4 GB |
| Cache | 6 MB | 6 MB, 2 MB | 1 MB | 1 MB | 512 KB |
| Performance | High | High | High | Medium | Low |
| Power | ~130 W | ~100 W | < 100 W | ~25 W | < 1 W |
| Power metric | Watts/sq ft | Watts/cu ft | Watts | Watt-hours | Battery life |
| Cost | High | High | Med | Med | Low |
Our research agenda addresses all these platforms.

7 Is Transistor a Good Switch?
[Diagram: transistor in the on and off states]
On: I = 1 mA/µ.
Off: ideally I = 0, but in practice I ≠ 0 because of sub-threshold leakage.

8 Sub-threshold Leakage
[Figure: MOS transistor characteristics, Ids (log) vs. Vgs, showing Vt and the exponential increase in Ioff]
[Figure: Ioff (nA/µ) vs. temperature (°C)]
[Figure: SD leakage (Watts) by technology, 0.25u, 0.18u, 0.13u, 90nm, 65nm, 45nm, for two transistor-growth rates, one of them 1.5X]
Transistors will not be switches, but dimmers.

9 Leakage Power
[Figure: leakage power (% of total, 0% to 50%) vs. technology (µ)]
Leakage power limits Vt scaling. Must stop at 50%.
A. Grove, IEDM

10 High Leakage Impacts Functionality
[Schematic: dynamic gate with clock, keeper, pulldown devices M11 to M1K and M21 to M2K, leakage current I_leak, and outputs Dyn_out and INV_out]
[Figure: keeper/pulldown ratio in sub-70nm technology as subthreshold + gate leakage grows by 3X, 5X, 10X, 20X]
Sub-65nm dynamic circuit active leakage tolerance:
Cache, RF, arrays, and bitlines are most affected.
Keeper sizes > 50% of pulldown strength.
High contention: degraded performance.
Slow keeper shutoff: high short-circuit power.
M. Anders, R. Krishnamurthy et al, 2001 Symp. VLSI Circuits

11 Power Will be the Limiter
[Figures: transistor count (millions), frequency (MHz), performance (MIPS), and power (Watts) for the 386, Pentium, and Pentium 4 processors, extrapolated towards 1 billion transistors, GHz clocks, TIPS, and possibly 1000's of Watts]
Billion-transistor integration capacity will exist.
Applications will demand TIPS performance.
But the power challenge: highest performance in the power envelope.

12 Power Trend
[Figure: power (W) for the 386, 486, Pentium, Pentium II, and Pentium 4 processors against the cooling capacity of a conventional system]
Business as usual is not an option.
C scales by 30% per generation, but Vcc scales by only 10-15%!
Must maintain or reduce power in future.

13 Gate Oxide is Near Limit
[Micrograph: 130nm transistor; labels: 70 nm, CoSi2, Si3N4, poly-Si gate electrode, 1.5 nm gate oxide, Si substrate]
[Figure: gate leakage (Watts), log scale, by technology, 0.25u, 0.18u, 0.13u, 90nm, 65nm, 45nm; annotations: during burn-in, 1.4X Vdd]
Intel's High K leadership is crucial for the industry.

14 Power Density Will Get Even Worse
[Figure: power density (W/cm2), log scale up to 10,000, for Pentium processors]
Need to keep the junctions cool:
Performance (higher frequency)
Lower leakage (exponential)
Better reliability (exponential)
Pat Gelsinger, ISSCC 2001

15 Active Power Reduction
Multiple supply voltages: low supply voltage for slow blocks, high supply voltage for fast blocks.
Replicated designs:
One logic block at Vdd: Freq = 1, Vdd = 1, Throughput = 1, Power = 1, Area = 1, Pwr Den = 1.
Two logic blocks at Vdd/2: Freq = 0.5, Vdd = 0.5, Throughput = 1, Power = 0.25, Area = 2, Pwr Den = 0.125 (Power/Area).
Need high-speed multi-supply level converter techniques.
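The replicated-design numbers on this slide can be reproduced with a small model. This is a sketch, assuming dynamic power per block scales as C * Vdd^2 * f with unchanged capacitance and ignoring leakage; the function name `scaled_design` and its parameters are illustrative, not from the talk.

```python
def scaled_design(n_blocks: int, vdd: float, freq: float) -> dict:
    """Relative metrics for n_blocks copies of a logic block.

    Everything is normalised to a single block at Vdd = 1 and Freq = 1.
    Assumption: per-block dynamic power ~ Vdd^2 * freq (same capacitance),
    and leakage is ignored.
    """
    power = n_blocks * vdd ** 2 * freq
    area = float(n_blocks)
    return {
        "throughput": n_blocks * freq,   # operations per unit time
        "power": power,
        "area": area,
        "power_density": power / area,
    }


baseline = scaled_design(n_blocks=1, vdd=1.0, freq=1.0)
replicated = scaled_design(n_blocks=2, vdd=0.5, freq=0.5)
print(baseline)
print(replicated)
```

Under these assumptions, two half-speed blocks at half the supply keep throughput at 1 while total power drops to a quarter, at the cost of twice the area.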

16 Leakage Control
Three techniques: body bias, stack effect, sleep transistor.
[Schematics: labels Vdd, +Ve, Vbp, -Ve, Vbn, equal loading, logic block]
Leakage reduction per technique: 2-10X, …X, …X.
Need low-leakage and leakage-tolerant techniques.

17 Dual Vt Design for Active Leakage Reduction
Technology provides two Vt: high Vt with nominal I_off (lower performance), and low Vt with ~10X higher I_off (higher performance).
[Histograms: number of paths vs. delay for high Vt, low Vt, and dual Vt; a path is the logic between latch boundaries]
High Vt everywhere yields lower performance and lower leakage (1X).
Low Vt everywhere yields higher performance but higher leakage (10X).
Selective use of low and high Vt yields higher performance with leakage between 1X and <<10X.

18 Chip Multi-Processing
[Diagram: cores C1, C2, C3, C4 with a shared cache]
[Figure: relative performance of CMP vs. ST against die area and power]
Multi-core, each core multi-threaded.
Shared cache and front side bus.
Each core has a different Vdd and frequency.
Core hopping to spread hot spots.
Lower junction temperature.

19 Memory Latency
[Diagram: CPU, small cache (~few clocks), large memory (… ns)]
[Figure: memory latency (clocks) vs. frequency (MHz), assuming 50 ns memory latency]
A cache miss hurts performance, and it is worse at higher frequency.
Need power-efficient high-speed I/O techniques.
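The relationship plotted on this slide is a unit conversion: a fixed memory latency in nanoseconds costs more clocks as frequency rises. A minimal sketch follows; the function name is illustrative, and the 50 ns default is the slide's stated assumption.

```python
def miss_latency_clocks(freq_mhz: float, memory_latency_ns: float = 50.0) -> float:
    """Clocks spent waiting on one memory access.

    One clock lasts 1000 / freq_mhz nanoseconds, so the latency in clocks is
    memory_latency_ns * freq_mhz / 1000.
    """
    return memory_latency_ns * freq_mhz / 1000.0


for freq in (100, 1000, 3000):
    print(freq, "MHz:", miss_latency_clocks(freq), "clocks")
```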

20 Increase on-die Memory
[Figure: power density (Watts/cm2) of logic vs. memory, 0.25µ to 0.1µ]
[Figure: cache % of full chip area (0% to 100%) for Pentium, Pentium Pro, Pentium II, Pentium III, and Pentium 4, 0.7µ to 0.10µ]
Large on-die memory provides:
1. Increased data bandwidth and reduced latency
2. Hence, higher performance for much lower power

21 Special Purpose Hardware Acceleration
[Figure: MIPS, log scale, general-purpose (GP) vs. TCP offload engine (TOE)]
[Die photo: TCP Offload Engine with PLL, OOO ROB, send buffer, TCB, exec core, ROM, input seq, CAM1, CLB; … x 3.54 mm, 260K transistors]
Opportunities for acceleration:
Network processing engines
MPEG encode/decode engines
Speech engines
Wireless communication/baseband
Special-purpose HW: best MIPS/Watt.

22 Energy-efficient Data-path Circuits
[Figure: processor thermal map, temperature (°C), showing the cache and the execution core]
Execution core: integer and FP ALUs and MACs.
ALUs are performance and peak-current limiters.
High activity: thermal hotspots.
Goal: high-performance energy-efficient design.

23 130nm 9GHz 32-bit Integer ALU (ISSCC 02)
[Die photo: 32-bit integer exec core with input FIFO, output FIFO, ALU, RF, clock, scan ctl, BB ctl, misc]
Die size: 1.61 x 1.44 mm
Process: 130nm CMOS
Interconnect: 1 poly, 6 metal
Transistors: 160K
Maximum V_CC: 1.5 V
[Figures: Fmax (GHz) and power (mW) vs. supply voltage (V); leakage power (mW) vs. supply voltage (V)]
Design target: 6.5GHz at 120mW.
M. Anders, R. Krishnamurthy et al, Intl. Solid-State Circuits Conf. & IEEE Journal of Solid-State Circuits 11/02

24 90nm 7GHz 64-bit Integer ALU (ISSCC 04)
[Die microphotograph: clock generator and drivers, lower-order 32-bit ALU, upper-order 32-bit ALU, I/O circuits]
64-bit ALU die microphotograph and measured performance summary:
Process: 90nm dual-Vt CMOS, 7 metal
Die area, 64-bit ALU layout area: 0.474mm … mm2
Total transistor count: …
64-bit ALU average switching power (α=0.3): 89mW at 4GHz, 1.3V, 25°C
64-bit ALU active leakage power: 9.6mW at 1.3V, 25°C
64-bit ALU maximum frequency: 7GHz at 2.1V, 25°C
32-bit ALU average switching power (α=0.3): 71mW at 7GHz, 1.3V, 25°C
32-bit ALU active leakage power: 4.4mW at 1.3V, 25°C
7GHz single-cycle 64-bit integer ALU (measured in 90nm CMOS).
Simultaneous 9GHz single-cycle 32-bit integer ALU mode.
Fastest reported single-cycle 64-bit integer ALU performance.
S. Mathew, R. Krishnamurthy et al, Intl. Solid-State Circuits Conf. & IEEE Journal of Solid-State Circuits 01/05

25 90nm 1GHz 9mW 16*16b Multiplier (ISSCC 05)
[Die microphotograph: clock generator and drivers, R-PLA, registers, 16x16b multiplier, I/O circuits]
16*16-bit multiplier die microphotograph and measured performance summary:
Process: 90nm dual-Vt CMOS
Die area, 16b multiplier and PLA layout area: 0.474mm … mm2
16b multiplier worst-case power: 9mW at 1GHz, 1.3V, 50°C (nominal)
16b multiplier active leakage power: 540µW at 1.3V, 50°C (nominal)
16b multiplier peak performance: 1.5GHz, 32mW at 1.95V, 50°C
16b multiplier low-voltage mode performance: 50MHz, 79µW at 0.57V, 50°C
Reconfigurable PLA peak performance: 2.3GHz, 4.2mW at 1.3V, 50°C
Reconfigurable PLA worst-case power: 2mW at 1GHz, 1.3V, 50°C (nominal)
Stand-by mode power: 75µW (7X reduction vs. active leakage)
1GHz single-cycle 16*16-bit DSP multiplier (measured in 90nm CMOS).
Reconfigurable PLA control engine.
9pJ/Op or 110GOPS/Watt.
Highest reported GOPS/Watt for single-cycle 16-bit multiply.
S. Hsu, R. Krishnamurthy et al, Intl. Solid-State Circuits Conf
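The 9pJ/Op and 110GOPS/Watt figures follow from the 9mW at 1GHz operating point if one operation completes per clock, which is what "single-cycle" implies. The sketch below makes that arithmetic explicit; the function names are illustrative, and the one-operation-per-cycle reading is an assumption.

```python
def energy_per_op_pj(power_mw: float, freq_ghz: float) -> float:
    """Energy per operation in pJ, assuming one operation per clock cycle.

    mW / GHz = (1e-3 J/s) / (1e9 op/s) = 1e-12 J/op = pJ/op.
    """
    return power_mw / freq_ghz


def gops_per_watt(power_mw: float, freq_ghz: float) -> float:
    """Throughput per watt in GOPS/W, assuming one operation per clock cycle."""
    return freq_ghz / (power_mw / 1000.0)


nominal_energy = energy_per_op_pj(power_mw=9.0, freq_ghz=1.0)
nominal_efficiency = gops_per_watt(power_mw=9.0, freq_ghz=1.0)
print(nominal_energy, "pJ/op")
print(round(nominal_efficiency, 1), "GOPS/W")
```

The exact quotient is slightly above 111 GOPS/W; the slide's 110 is presumably that value rounded.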

26 32-bit ALU architecture
[Block diagram: external operands, two 6:1 muxes with mux control, 5:1 mux with shift control, 2:1 mux, sign control, adder core, output mux with mux control, Sum, loopback bus]
Multiple ALUs are clustered together in the execution core.
High power density.

27 Full-Adder Intro
[Symbol: full adder with inputs A, B, Cin and outputs Sum, Cout]

28 The Binary Adder
[Symbol: full adder with inputs A, B, Cin and outputs Sum, Cout]
$S = A \oplus B \oplus C_i = A\bar{B}\bar{C_i} + \bar{A}B\bar{C_i} + \bar{A}\bar{B}C_i + ABC_i$
$C_o = AB + BC_i + AC_i$
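The two full-adder equations can be checked exhaustively in a few lines. This is a minimal sketch; the function names are illustrative, and the product-term form uses the usual complemented literals, which the transcription does not show.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one-bit inputs: S = A xor B xor Ci, Co = AB + BCi + ACi."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


def sum_as_minterms(a: int, b: int, cin: int) -> int:
    """Sum written as four product terms: true when an odd number of inputs is 1."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin
    return (a & nb & nc) | (na & b & nc) | (na & nb & cin) | (a & b & cin)


# Exhaustive check over the eight input combinations.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert a + b + cin == s + 2 * cout
    assert sum_as_minterms(a, b, cin) == s
```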

29 The Ripple-Carry Adder
[Diagram: four full adders (FA) in a chain with inputs $A_0..A_3$, $B_0..B_3$, carry-in $C_{i,0}$, carries $C_{o,0}..C_{o,3}$ (where $C_{o,0} = C_{i,1}$), and sums $S_0..S_3$]
Worst-case delay is linear in the number of bits: $t_d = O(N)$
$t_{adder} = (N-1)\,t_{carry} + t_{sum}$
Goal: make the fastest possible carry path circuit.
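A behavioural sketch of the ripple-carry chain and its delay formula follows. Bit lists are least-significant-bit first; the helper names and the unit delays are illustrative assumptions, not values from the talk.

```python
def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, carry_out)."""
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        # The carry out of this stage is the carry in of the next stage.
        carry = (a & b) | (b & carry) | (a & carry)
    return sum_bits, carry


def ripple_delay(n_bits, t_carry, t_sum):
    """Worst-case delay: the carry ripples through N-1 stages, then one sum."""
    return (n_bits - 1) * t_carry + t_sum


def to_bits(value, n_bits):
    return [(value >> i) & 1 for i in range(n_bits)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, carry_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sum_bits), carry_out)
print(ripple_delay(n_bits=32, t_carry=1.0, t_sum=1.0))
```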

30 Static CMOS Full Adder
[Schematic: static CMOS full adder between V_DD and ground with inputs A, B, C_i, internal node X, and outputs S and C_o]
28 transistors.

31 Carry Look-ahead
$\mathrm{Sum}_i = A_i \oplus B_i \oplus \mathrm{Carry}_{i-1}$
$\mathrm{Carry}_i = A_i B_i + (A_i + B_i)\,\mathrm{Carry}_{i-1}$

32 Partial Sum
$\mathrm{Sum}_i = A_i \oplus B_i \oplus \mathrm{Carry}_{i-1}$
$\mathrm{Carry}_i = A_i B_i + (A_i + B_i)\,\mathrm{Carry}_{i-1}$

33 Partial Sum
$\mathrm{Sum}_i = A_i \oplus B_i \oplus \mathrm{Carry}_{i-1}$
$\mathrm{Carry}_i = A_i B_i + (A_i + B_i)\,\mathrm{Carry}_{i-1}$
Generate: $G_i = A_i B_i$. Propagate: $P_i = A_i + B_i$.

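34 Partial Sum
$\mathrm{Sum}_i = A_i \oplus B_i \oplus \mathrm{Carry}_{i-1}$
$\mathrm{Carry}_i = A_i B_i + (A_i + B_i)\,\mathrm{Carry}_{i-1}$
Generate: $G_i = A_i B_i$. Propagate: $P_i = A_i + B_i$.
$\mathrm{Carry}_i = G_i + P_i\,\mathrm{Carry}_{i-1}$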
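The generate/propagate recurrence can be unrolled so that each carry depends only on the G and P signals and the carry-in, which is the look-ahead idea. The sketch below compares the recurrence with its unrolled form; function names are illustrative, and indexing the carry-in as the carry below bit 0 is an assumption about the slide's notation.

```python
def generate_propagate(a, b, n_bits):
    """Per-bit G_i = A_i B_i and P_i = A_i + B_i, returned LSB first."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n_bits)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(n_bits)]
    return g, p


def carries_recurrence(g, p, cin=0):
    """Carry_i = G_i + P_i * Carry_{i-1}, evaluated bit by bit."""
    carries = []
    c = cin
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        carries.append(c)
    return carries


def carry_lookahead(g, p, i, cin=0):
    """Carry_i unrolled: G_i + P_i G_{i-1} + ... + P_i ... P_0 Cin."""
    carry = 0
    chain = 1  # running product P_i * P_{i-1} * ... down to the current bit
    for j in range(i, -1, -1):
        carry |= chain & g[j]
        chain &= p[j]
    return carry | (chain & cin)


g, p = generate_propagate(0b1011, 0b0110, 4)
print(carries_recurrence(g, p))
print([carry_lookahead(g, p, i) for i in range(4)])
```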

35 High-performance Adders: Kogge-Stone
[Block diagram: even and odd input bits each pass through PG generation, carry-merge stages CM1 to CM5, and an XOR to produce the even and odd sums]
Carry-merge: $GG = G_i + P_i G_{i-1}$, $GP = P_i P_{i-1}$
Generate all 32 carries: a full-blown binary tree is energy-inefficient.
Number of carry-merge stages = $\log_2(32)$ = 5 stages.

36 Kogge-Stone Adder
[Diagram: PG stage, carry-merge gates, XOR stage]
Critical path = PG + 5 + XOR = 7 gate stages.
Generate, propagate fanout of 2, 3.
Maximum interconnect spans 16b.
Energy inefficient.
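A bit-parallel behavioural model shows why a 32-bit Kogge-Stone tree needs five carry-merge stages: the merge distance doubles each stage (1, 2, 4, 8, 16). This is a functional sketch only, with an illustrative function name and no carry-in; it says nothing about the circuit-level fanout or wiring.

```python
def kogge_stone_add(a, b, width=32):
    """Add two unsigned integers with a Kogge-Stone prefix network.

    Returns (sum modulo 2**width, carry_out, number_of_carry_merge_stages).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b          # per-bit generate
    p = a | b          # per-bit propagate (OR form)
    half_sum = a ^ b   # partial sum before carries are applied
    stages = 0
    distance = 1
    while distance < width:
        # Carry-merge: GG = G_i + P_i * G_{i-distance}, GP = P_i * P_{i-distance}
        g = (g | (p & (g << distance))) & mask
        p = (p & (p << distance)) & mask
        distance *= 2
        stages += 1
    carries_in = (g << 1) & mask            # carry into bit i is the group generate of bits 0..i-1
    total = half_sum ^ carries_in
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out, stages


print(kogge_stone_add(0xFFFFFFFF, 1))
print(kogge_stone_add(123456789, 987654321))
```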

37 Sparse-tree Adder Architecture
Generate every 4th carry in parallel: C3, C7, C11, C15, C19, C23, C27.
Side-path: 4-bit conditional sum generator.
73% fewer carry-merge gates: energy-efficient.

38 Non-critical Sum Generator
[Schematic: inputs (P_{i+3}, G_{i+3}), (P_{i+2}, G_{i+2}), (P_{i+1}, G_{i+1}), P_i; carry-merge (CM) gates for carry-in 0 and 1; XORs producing the conditional sums Sum_{i,0} and Sum_{i,1}; 2:1 muxes driven by the carry, producing Sum_i to Sum_{i+3}]
Non-critical path: ripple carry chain.
Reduced area, energy consumption, leakage.
Generate conditional sums for each bit.
Sparse-tree carry selects the appropriate sum.
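The sparse-tree idea can be modelled functionally: each 4-bit group precomputes two conditional sums (carry-in 0 and carry-in 1), a prefix tree over group-level generate/propagate produces only every 4th carry, and that carry picks one sum through a 2:1 mux. The sketch below is a behavioural model under those assumptions; the function name is illustrative, the 4-bit ripple chain is modelled with integer addition, and no gate counts are implied.

```python
def sparse_tree_add(a, b, width=32, group=4):
    """Return (sum modulo 2**width, carries out of every group except the last)."""
    n_groups = width // group
    gmask = (1 << group) - 1
    sum0, sum1, gen, prop = [], [], [], []
    for k in range(n_groups):
        ga = (a >> (k * group)) & gmask
        gb = (b >> (k * group)) & gmask
        s0 = ga + gb          # conditional sum assuming carry-in 0
        s1 = ga + gb + 1      # conditional sum assuming carry-in 1
        sum0.append(s0 & gmask)
        sum1.append(s1 & gmask)
        gen.append(s0 >> group)    # group generates a carry on its own
        prop.append(s1 >> group)   # group produces a carry when given one
    # Parallel-prefix carry-merge across groups: distances 1, 2, 4, ...
    d = 1
    while d < n_groups:
        gen = [gen[k] | (prop[k] & gen[k - d]) if k >= d else gen[k] for k in range(n_groups)]
        prop = [prop[k] & prop[k - d] if k >= d else prop[k] for k in range(n_groups)]
        d *= 2
    total = 0
    for k in range(n_groups):
        carry_in = gen[k - 1] if k > 0 else 0
        chosen = sum1[k] if carry_in else sum0[k]   # 2:1 mux driven by the sparse carry
        total |= chosen << (k * group)
    return total, gen[:-1]


print(sparse_tree_add(0xFFFFFFFF, 1))
```

For a 32-bit operand and 4-bit groups the returned list holds seven carries, which correspond to C3 through C27 on the previous slide.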

39 Adder Core Critical Path
[Schematic: adder inputs; clocked PG stage; gates GG1, GG3, GG7, GG15, GG27 on clk, clk2, clk3 forming the single-rail dynamic sparse-tree path; static sum generator with CM0, latch, CM1, XOR; C27 selects between Sum31_0 and Sum31_1 to give Sum31]
Critical path: 7 gate stages, same as Kogge-Stone.
Sparse-tree: single-rail dynamic.
Exploit the non-criticality of the sum generator: convert it to static logic.
Semi-dynamic design.

40 Sparse-tree Architecture
Performance impact (20% speedup):
33-50% reduced G/P fanouts
80% reduced wiring complexity
30% reduction in maximum interconnect
Power impact (56% reduction):
73% fewer carry-merge gates
50% reduction in average transistor size

41 Energy-delay Space
[Figure: worst-case energy (pJ) vs. delay (ps), 130nm CMOS, 1.2V, 110°C, dynamic Kogge-Stone vs. semi-dynamic sparse-tree, with the 4GHz design point marked]
20% speedup over Kogge-Stone.
56% worst-case energy reduction.
Scales with activity factor.

42 Semi-dynamic Design
[Figure: average energy (pJ) vs. activity factor, dynamic Kogge-Stone vs. semi-dynamic sparse-tree]
Static sum generators: low switching activity.
71% lower average energy at 10% activity.

43 So, How Do We Get There?
[Figure: MIPS across eras. Pipelined architecture; era of instruction level parallelism (super scalar; speculative, OOO); era of thread and processor level parallelism (multi-threaded; multi-threaded, multi-core); era of special purpose HW]
Significant challenges ahead. They can only be solved with joint industry-university collaboration.

44 Thank You for Your Attention
Q&A
Our publications can be found in: IEEE Intl. Solid-State Circuits Conference, IEEE Journal of Solid-State Circuits, Symposium on VLSI Circuits, Intl. Symposium on Low-power Design, Custom Integrated Circuits Conference, SOCC, etc.

45 Backup

46 Optimized First-level Carry-merge: CM0
Conditional carry for Cin = 0.
[Schematic: carry-merge gate with inputs P_i, G_i and Cin = 0, output C#_0]
The carry-merge stage reduces to an inverter.
Conditional carry_0 = G_i#

47 Optimized First-level Carry-merge: CM1
Conditional carry for Cin = 1.
[Schematic: carry-merge gate with inputs A_i, B_i, P_i, G_i and Cin = 1, output C#_1]
P_i and G_i are correlated.
Conditional carry_1 = P_i#
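Both backup simplifications can be confirmed by enumeration. With G = A·B and P = A+B (the definitions used earlier in the talk), G = 1 implies P = 1, so the first-level merge G + P·Cin collapses to G when Cin = 0 and to P when Cin = 1. The sketch below works in true polarity, whereas the slides show the complemented (#) outputs; the function name is illustrative.

```python
from itertools import product


def carry_merge(g: int, p: int, cin: int) -> int:
    """First-level carry-merge in true polarity: carry = G + P * Cin."""
    return g | (p & cin)


for a, b in product((0, 1), repeat=2):
    g = a & b   # generate
    p = a | b   # propagate (OR form), so g == 1 implies p == 1
    assert carry_merge(g, p, 0) == g   # Cin = 0: conditional carry is just G
    assert carry_merge(g, p, 1) == p   # Cin = 1: conditional carry is just P
```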

48 Optimized Sum Generator
[Schematic: inputs (P_{i+3}, G_{i+3}), (P_{i+2}, G_{i+2}), (P_{i+1}, G_{i+1}), P_i; optimized 1st-level carry-merge; CM gates and XORs producing Sum_{i,0} and Sum_{i,1}; 2:1 muxes driven by the carry, producing Sum_i to Sum_{i+3}]
Optimized non-critical path: 4 stages.


Introduction to Microprocessor Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device

More information

DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES

DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES Volume 120 No. 6 2018, 4453-4466 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ http://www.acadpubl.eu/hub/ DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR

More information

Comparative Analysis of Contemporary Cache Power Reduction Techniques

Comparative Analysis of Contemporary Cache Power Reduction Techniques Comparative Analysis of Contemporary Cache Power Reduction Techniques Ph.D. Dissertation Proposal Samuel V. Rodriguez Motivation Power dissipation is important across the board, not just portable devices!!

More information

Power dissipation! The VLSI Interconnect Challenge. Interconnect is the crux of the problem. Interconnect is the crux of the problem.

Power dissipation! The VLSI Interconnect Challenge. Interconnect is the crux of the problem. Interconnect is the crux of the problem. The VLSI Interconnect Challenge Avinoam Kolodny Electrical Engineering Department Technion Israel Institute of Technology VLSI Challenges System complexity Performance Tolerance to digital noise and faults

More information

Spiral 2-8. Cell Layout

Spiral 2-8. Cell Layout 2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric

More information

Adaptive Voltage Scaling (AVS) Alex Vainberg October 13, 2010

Adaptive Voltage Scaling (AVS) Alex Vainberg   October 13, 2010 Adaptive Voltage Scaling (AVS) Alex Vainberg Email: alex.vainberg@nsc.com October 13, 2010 Agenda AVS Introduction, Technology and Architecture Design Implementation Hardware Performance Monitors Overview

More information

+1 (479)

+1 (479) Memory Courtesy of Dr. Daehyun Lim@WSU, Dr. Harris@HMC, Dr. Shmuel Wimer@BIU and Dr. Choi@PSU http://csce.uark.edu +1 (479) 575-6043 yrpeng@uark.edu Memory Arrays Memory Arrays Random Access Memory Serial

More information

Mobile Processors. Jose R. Ortiz Ubarri

Mobile Processors. Jose R. Ortiz Ubarri Mobile Processors Jose R. Ortiz Ubarri Electrical and Computer Engineering Department University of Puerto Rico, Mayagüez Campus Mayagüez, Puerto Rico 00681 5000 Jose.Ortiz@hpcf.upr.edu Introduction While

More information

Microprocessor Trends and Implications for the Future

Microprocessor Trends and Implications for the Future Microprocessor Trends and Implications for the Future John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 4 1 September 2016 Context Last two classes: from

More information

EITF35: Introduction to Structured VLSI Design

EITF35: Introduction to Structured VLSI Design EITF35: Introduction to Structured VLSI Design Part 1.1.2: Introduction (Digital VLSI Systems) Liang Liu liang.liu@eit.lth.se 1 Outline Why Digital? History & Roadmap Device Technology & Platforms System

More information

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation.

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. July 14) (June 2013) (June 2015)(Jan 2016)(June 2016) H/W Support : Conditional Execution Also known

More information

An Overview of Standard Cell Based Digital VLSI Design

An Overview of Standard Cell Based Digital VLSI Design An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,

More information

ECE520 VLSI Design. Lecture 1: Introduction to VLSI Technology. Payman Zarkesh-Ha

ECE520 VLSI Design. Lecture 1: Introduction to VLSI Technology. Payman Zarkesh-Ha ECE520 VLSI Design Lecture 1: Introduction to VLSI Technology Payman Zarkesh-Ha Office: ECE Bldg. 230B Office hours: Wednesday 2:00-3:00PM or by appointment E-mail: pzarkesh@unm.edu Slide: 1 Course Objectives

More information

Announcements. Advanced Digital Integrated Circuits. No office hour next Monday. Lecture 2: Scaling Trends

Announcements. Advanced Digital Integrated Circuits. No office hour next Monday. Lecture 2: Scaling Trends EE24 - Spring 2008 Advanced Digital Integrated Circuits Lecture 2: Scaling Trends Announcements No office hour next Monday Extra office hours Tuesday and Thursday 2-3pm 2 CMOS Scaling Rules Voltage, V

More information

Intel released new technology call P6P

Intel released new technology call P6P P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new

More information

Trends in the Infrastructure of Computing

Trends in the Infrastructure of Computing Trends in the Infrastructure of Computing CSCE 9: Computing in the Modern World Dr. Jason D. Bakos My Questions How do computer processors work? Why do computer processors get faster over time? How much

More information

VLSI Design Automation. Maurizio Palesi

VLSI Design Automation. Maurizio Palesi VLSI Design Automation 1 Outline Technology trends VLSI Design flow (an overview) 2 Outline Technology trends VLSI Design flow (an overview) 3 IC Products Processors CPU, DSP, Controllers Memory chips

More information

Memory Design I. Array-Structured Memory Architecture. Professor Chris H. Kim. Dept. of ECE.

Memory Design I. Array-Structured Memory Architecture. Professor Chris H. Kim. Dept. of ECE. Memory Design I Professor Chris H. Kim University of Minnesota Dept. of ECE chriskim@ece.umn.edu Array-Structured Memory Architecture 2 1 Semiconductor Memory Classification Read-Write Wi Memory Non-Volatile

More information

Part 1 of 3 -Understand the hardware components of computer systems

Part 1 of 3 -Understand the hardware components of computer systems Part 1 of 3 -Understand the hardware components of computer systems The main circuit board, the motherboard provides the base to which a number of other hardware devices are connected. Devices that connect

More information

EE219A Spring 2008 Special Topics in Circuits and Signal Processing. Lecture 9. FPGA Architecture. Ranier Yap, Mohamed Ali.

EE219A Spring 2008 Special Topics in Circuits and Signal Processing. Lecture 9. FPGA Architecture. Ranier Yap, Mohamed Ali. EE219A Spring 2008 Special Topics in Circuits and Signal Processing Lecture 9 FPGA Architecture Ranier Yap, Mohamed Ali Annoucements Homework 2 posted Due Wed, May 7 Now is the time to turn-in your Hw

More information

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI Contents 1.7 - End of Chapter 1 Power wall The multicore era

More information

A 1-GHz Configurable Processor Core MeP-h1

A 1-GHz Configurable Processor Core MeP-h1 A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface

More information

ECE 4514 Digital Design II. Spring Lecture 22: Design Economics: FPGAs, ASICs, Full Custom

ECE 4514 Digital Design II. Spring Lecture 22: Design Economics: FPGAs, ASICs, Full Custom ECE 4514 Digital Design II Lecture 22: Design Economics: FPGAs, ASICs, Full Custom A Tools/Methods Lecture Overview Wows and Woes of scaling The case of the Microprocessor How efficiently does a microprocessor

More information

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes. Topics! SRAM-based FPGA fabrics:! Xilinx.! Altera. SRAM-based FPGAs! Program logic functions, using SRAM.! Advantages:! Re-programmable;! dynamically reconfigurable;! uses standard processes.! isadvantages:!

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

VLSI Design Automation. Calcolatori Elettronici Ing. Informatica

VLSI Design Automation. Calcolatori Elettronici Ing. Informatica VLSI Design Automation 1 Outline Technology trends VLSI Design flow (an overview) 2 IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing

More information

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements EE241 - Spring 2007 Advanced Digital Integrated Circuits Lecture 22: SRAM Announcements Homework #4 due today Final exam on May 8 in class Project presentations on May 3, 1-5pm 2 1 Class Material Last

More information

High-Performance Microarchitecture Techniques John Paul Shen Director of Microarchitecture Research Intel Labs

High-Performance Microarchitecture Techniques John Paul Shen Director of Microarchitecture Research Intel Labs High-Performance Microarchitecture Techniques John Paul Shen Director of Microarchitecture Research Intel Labs October 29, 2002 Microprocessor Research Forum Intel s Microarchitecture Research Labs! USA:

More information

VLSI Design Automation

VLSI Design Automation VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,

More information

Lab. Course Goals. Topics. What is VLSI design? What is an integrated circuit? VLSI Design Cycle. VLSI Design Automation

Lab. Course Goals. Topics. What is VLSI design? What is an integrated circuit? VLSI Design Cycle. VLSI Design Automation Course Goals Lab Understand key components in VLSI designs Become familiar with design tools (Cadence) Understand design flows Understand behavioral, structural, and physical specifications Be able to

More information

Stacked Silicon Interconnect Technology (SSIT)

Stacked Silicon Interconnect Technology (SSIT) Stacked Silicon Interconnect Technology (SSIT) Suresh Ramalingam Xilinx Inc. MEPTEC, January 12, 2011 Agenda Background and Motivation Stacked Silicon Interconnect Technology Summary Background and Motivation

More information

A 50% Lower Power ARM Cortex CPU using DDC Technology with Body Bias. David Kidd August 26, 2013

A 50% Lower Power ARM Cortex CPU using DDC Technology with Body Bias. David Kidd August 26, 2013 A 50% Lower Power ARM Cortex CPU using DDC Technology with Body Bias David Kidd August 26, 2013 1 HOTCHIPS 2013 Copyright 2013 SuVolta, Inc. All rights reserved. Agenda DDC transistor and PowerShrink platform

More information

2D/3D Graphics Accelerator for Mobile Multimedia Applications. Ramchan Woo, Sohn, Seong-Jun Song, Young-Don

2D/3D Graphics Accelerator for Mobile Multimedia Applications. Ramchan Woo, Sohn, Seong-Jun Song, Young-Don RAMP-IV: A Low-Power and High-Performance 2D/3D Graphics Accelerator for Mobile Multimedia Applications Woo, Sungdae Choi, Ju-Ho Sohn, Seong-Jun Song, Young-Don Bae,, and Hoi-Jun Yoo oratory Dept. of EECS,

More information

KiloCore: A 32 nm 1000-Processor Array

KiloCore: A 32 nm 1000-Processor Array KiloCore: A 32 nm 1000-Processor Array Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, Bevan Baas University of California, Davis VLSI Computation

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

High Speed Han Carlson Adder Using Modified SQRT CSLA

High Speed Han Carlson Adder Using Modified SQRT CSLA I J C T A, 9(16), 2016, pp. 7843-7849 International Science Press High Speed Han Carlson Adder Using Modified SQRT CSLA D. Vamshi Krishna*, P. Radhika** and T. Vigneswaran*** ABSTRACT Binary addition is

More information

Transistors and Wires

Transistors and Wires Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis Part II These slides are based on the slides provided by the publisher. The slides

More information

The Impact of Wave Pipelining on Future Interconnect Technologies

The Impact of Wave Pipelining on Future Interconnect Technologies The Impact of Wave Pipelining on Future Interconnect Technologies Jeff Davis, Vinita Deodhar, and Ajay Joshi School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA 30332-0250

More information

Introduction. Summary. Why computer architecture? Technology trends Cost issues

Introduction. Summary. Why computer architecture? Technology trends Cost issues Introduction 1 Summary Why computer architecture? Technology trends Cost issues 2 1 Computer architecture? Computer Architecture refers to the attributes of a system visible to a programmer (that have

More information

ECE484 VLSI Digital Circuits Fall Lecture 01: Introduction

ECE484 VLSI Digital Circuits Fall Lecture 01: Introduction ECE484 VLSI Digital Circuits Fall 2017 Lecture 01: Introduction Adapted from slides provided by Mary Jane Irwin. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] CSE477 L01 Introduction.1

More information

VLSI Chip Design Project TSEK06

VLSI Chip Design Project TSEK06 VLSI Chip Design Project TSEK06 Project Description and Requirement Specification Version 1.0 Project: A -Bit Kogge-Stone Adder Project number: 1 Project Group: Name Project members Telephone E-mail Project

More information

Jim Keller. Digital Equipment Corp. Hudson MA

Jim Keller. Digital Equipment Corp. Hudson MA Jim Keller Digital Equipment Corp. Hudson MA ! Performance - SPECint95 100 50 21264 30 21164 10 1995 1996 1997 1998 1999 2000 2001 CMOS 5 0.5um CMOS 6 0.35um CMOS 7 0.25um "## Continued Performance Leadership

More information

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders Vol. 3, Issue. 4, July-august. 2013 pp-2266-2270 ISSN: 2249-6645 Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders V.Krishna Kumari (1), Y.Sri Chakrapani

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection Hiroyuki Usui, Jun Tanabe, Toru Sano, Hui Xu, and Takashi Miyamori Toshiba Corporation, Kawasaki, Japan Copyright 2013,

More information