MOS High Performance Arithmetic


1 MOS High Performance Arithmetic. Mark Horowitz, Stanford University

2 Arithmetic Is Important: Then and Now (Tegra K1)

3 What Is Hard?

4 Proof: 1st Gen, 2nd Gen

5 And Getting the Data You Need. But we didn't notice this until much later: to notice this problem, we needed many advances in technology.

6 3rd Gen: Relays (Z3, 1941), a few adds/sec

7 4th Gen: Tubes (ENIAC, 1945), 5000 adds/sec

8 5th Gen: Transistors (TX-0)

9 Modern Era: IC, 1961 (image from State of the Art, Stan Augarten)

10 Moore's Law: from Electronics, Volume 38, Number 8, April 19, 1965. The number of components on an IC doubles every year; this was later modified to doubling every 18 to 24 months.

11 ECL Computers

12 Microprocessor MOS Processor (1974)

13 NMOS

14 CMOS 1985 To Present

15 CMOS (Arithmetic) Design: What makes a good design?

16 Try to Balance 4 Parameters: area, performance, power, and design time

17 The Good News: by the time CMOS came along, there had been a lot of work on arithmetic: Booth coding, Wallace trees, Ling coding, tree adders, Manchester carry chains, SRT division

18 The Bad News: the best logical design depends on technology. Remember the carry dependency? For relays, a Manchester carry chain is best, because all the delay is in changing the relay state. (Figure: relay carry chain with propagate contacts P0, P1, P2 and generate contacts G0, G1, G2.)
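To make the carry dependency concrete, here is a minimal Python sketch of a Manchester-style carry chain as a bit-level switch model. It assumes LSB-first bit lists, and the helper names (`manchester_carry_chain`, `to_bits`, `from_bits`) are hypothetical. Each stage decides on its own whether it generates, kills, or propagates; only the carry itself then has to travel down the chain, which is why the structure suits relays, where the slow part is setting the switches.

```python
def manchester_carry_chain(a_bits, b_bits, cin=0):
    """Add two equal-length LSB-first bit lists; return (sum_bits, carry_out).

    Each stage is modeled as a switch: generate (a AND b) drives the carry
    node to 1, kill (neither input set) drives it to 0, and propagate
    (a XOR b) simply passes the incoming carry through.
    """
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        g = a & b
        p = a ^ b
        sum_bits.append(p ^ carry)  # sum uses the carry entering this stage
        if g:
            carry = 1               # generate: drive the node high
        elif not p:
            carry = 0               # kill: drive the node low
        # propagate: the carry passes through unchanged
    return sum_bits, carry


def to_bits(x, n):
    """LSB-first list of the low n bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


s, cout = manchester_carry_chain(to_bits(0b1011, 4), to_bits(0b0110, 4))
print(from_bits(s), cout)
```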

19 More Bad News: the metrics you are optimizing work in opposition (energy vs. performance)

20 Just to Make Life More Complicated: your metrics change with technology scaling

21 Must Use Technology-Independent Metrics. Performance: measure it in FO4 delays (the delay of an inverter driving a fanout of 4), at TT, 90% Vdd, 125C; FO4 delay scales roughly with Ldrawn. (Plot: gate delay (ps) vs. technology Ldrawn (um).)
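A minimal sketch of the normalization, assuming you already know (or have simulated) the FO4 delay of each process; the design names and numbers below are hypothetical.

```python
def to_fo4(delay_ps: float, fo4_ps: float) -> float:
    """Convert an absolute delay into FO4 units for the process it was measured in."""
    if fo4_ps <= 0:
        raise ValueError("fo4_ps must be positive")
    return delay_ps / fo4_ps


# Hypothetical designs: (name, measured delay in ps, FO4 delay of its process in ps).
designs = [
    ("adder_old_process", 700.0, 100.0),
    ("adder_new_process", 300.0, 50.0),
]
for name, delay_ps, fo4_ps in designs:
    print(f"{name}: {to_fo4(delay_ps, fo4_ps):.1f} FO4")
```

In this made-up case the second design is more than twice as fast in picoseconds but only one FO4 shorter, so most of its advantage comes from the technology rather than the logic design.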

22 Area and Energy. Area: measure linear dimensions in features. Energy: CMOS energy is proportional to CV²; normalize by …

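23 Dennard's Scaling: the triple play (scaling dimensions and voltage by $1/\kappa$). Get more gates: area per gate $\propto L^2$, which scales as $1/\kappa^2$. Gates get faster: delay $\propto CV/i$, which scales as $1/\kappa$. Energy per switch $\propto CV^2$, which scales as $1/\kappa^3$. (Dennard, JSSC, Oct.)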
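The scaling factors can be checked with a few lines of Python. This is a sketch of ideal constant-field scaling by a factor `kappa` (the slide's own symbol was lost in transcription, so the name is an assumption); it also combines the three factors to show that, under these ideal assumptions, power density stays constant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DennardScaling:
    """Ideal constant-field scaling: L, W, t_ox and V all shrink by 1/kappa."""
    kappa: float

    @property
    def gate_area(self) -> float:
        # Area per gate ~ L^2.
        return 1 / self.kappa ** 2

    @property
    def gate_delay(self) -> float:
        # Delay ~ C*V/i; C, V and i each scale by 1/kappa.
        return 1 / self.kappa

    @property
    def energy_per_switch(self) -> float:
        # Energy ~ C*V^2.
        return 1 / self.kappa ** 3

    @property
    def power_density(self) -> float:
        # Energy per switch * switching rate (1/delay) * gates per unit area.
        return self.energy_per_switch / self.gate_delay / self.gate_area


s = DennardScaling(kappa=2.0)
print(s.gate_area, s.gate_delay, s.energy_per_switch, s.power_density)
```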

24 Three Eras of CMOS Arithmetic Design: Getting going (area constrained); Party time! (performance constrained); The hangover (power constrained)

25 GETTING STARTED

26 Life in the '80s: we were just learning how to design complex chips. Chips had 100K transistors and there were almost no CAD tools. We worried about getting the design done and getting all the functions to fit on chip. Getting the design to fit was job one; getting it to go fast was job two.

27 Main Effects on Arithmetic Circuits: merged function blocks (figure: ALU with inputs A and B, a lookup table, and outputs P and F)

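28 Precharge Logic (figure comparing pseudo-nMOS, CMOS, and precharge gates, with device widths W and 2W: pseudo-nMOS draws static current; CMOS needs a dual pMOS network; precharge gates alternate precharge and evaluate phases, ideally with non-overlapping clocks, which is good but not always possible)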

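29 Carry Chains and Carry-Skip Adders (figure: a 4-bit carry chain C0, C1, C2, C3 from Cin to Cout, with a skip path enabled by P0·P1·P2·P3; carry-select sum logic computing the XOR for Cin = 0 and Cin = 1 and choosing between them with a mux)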
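A bit-level Python sketch of the carry-skip idea, assuming unsigned operands and fixed-size blocks (a block size of 4 matches the figure but is otherwise an assumption). In software both paths give the same carry; the point is which signal a hardware mux would select. The carry-select variant in the figure is not modeled.

```python
def carry_skip_add(a, b, n=16, block=4, cin=0):
    """Add two n-bit unsigned integers; return (sum, carry_out).

    Inside each block the carry ripples. If every bit in the block propagates
    (P0*P1*...*P(k-1) = 1), the block's carry-out equals its carry-in, so
    hardware can route the carry around the block through a mux.
    """
    if n % block:
        raise ValueError("n must be a multiple of block")
    result = 0
    carry = cin
    for base in range(0, n, block):
        block_cin = carry
        ripple = block_cin
        block_propagate = 1
        for i in range(base, base + block):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p, g = ai ^ bi, ai & bi
            result |= (p ^ ripple) << i
            ripple = g | (p & ripple)
            block_propagate &= p
        # Skip path: when the whole block propagates, use block_cin directly;
        # otherwise the block's own chain has produced (or killed) the carry.
        carry = block_cin if block_propagate else ripple
    return result, carry


print(carry_skip_add(0x00F0, 0x0010, n=16))
```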

30 Iterating Structures. Main processor: just use instructions (microcode) and the ALU. Co-processor: also used iterating structures, but built these structures for multiply or divide; they were often asynchronous.

31 MIPS R3010 Multiplier: clocked by an internal oscillator, not the external clock (figure: CSA array)
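The key property of a carry-save adder (CSA) is that it reduces three numbers to two without propagating carries, so only the final addition needs a carry chain. The Python sketch below assumes unsigned operands and accumulates one partial product per step, like an iterative array; it is a generic illustration, not the R3010's actual multiplier organization.

```python
def csa(x, y, z):
    """3:2 carry-save adder on whole words: returns (s, c) with x + y + z == s + c."""
    s = x ^ y ^ z                                 # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1        # per-bit majority, moved to the next column
    return s, c


def csa_multiply(a, b, n=16):
    """Unsigned n-bit multiply: reduce partial products with CSAs, then one final add."""
    s, c = 0, 0
    for i in range(n):
        partial_product = a << i if (b >> i) & 1 else 0
        s, c = csa(s, c, partial_product)         # s + c always equals the running sum
    return s + c                                  # the only carry-propagate addition


print(csa_multiply(12345, 6789))
```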

32 A Self-Timed Pipeline (figure: four precharged logic stages, each with a C-element and clock, passing dual-rail data from in+/in- to out+/out-; all stages precharging)

33 A Self-Timed Pipeline: data enters at the far left and the NOR gate flips; this activates the C-element. (Figure: all four stages still precharging.)

34 A Self-Timed Pipeline: the first logic block goes into evaluate. (Figure: stage 1 evaluating; stages 2 to 4 precharging.)

35 A Self-Timed Pipeline (figure: stage 1 evaluating; stages 2 to 4 precharging)

36 A Self-Timed Pipeline: the second block goes into evaluate, and the primary inputs are deasserted, flipping the first NOR gate. (Figure: stages 1 and 2 evaluating; stages 3 and 4 precharging.)
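The two primitives these slides rely on can be modeled behaviorally. This Python sketch shows a Muller C-element (its output changes only when both inputs agree) and the NOR of a dual-rail pair, which reads 1 while a stage is precharged and flips to 0 once either rail goes high. The names are hypothetical, and the code does not reproduce how the figure wires the stages together.

```python
class CElement:
    """Muller C-element: the output copies the inputs when they agree, else holds."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


def dual_rail_empty(plus, minus):
    """NOR of a dual-rail pair: 1 while both rails are low (no data), 0 once data arrives."""
    return int(not (plus or minus))


c = CElement()
print([c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]])
print(dual_rail_empty(0, 0), dual_rail_empty(1, 0), dual_rail_empty(0, 1))
```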

37 Division: SRT (Ted Williams), completely self-timed
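For reference, here is a plain sequential radix-2 SRT recurrence in Python, using exact `Fraction` arithmetic. It assumes the usual normalization 1/2 <= d < 1 and |x| < d, and it illustrates only the digit-selection idea; it is not a model of the self-timed divider. Real SRT hardware also keeps the partial remainder in redundant (carry-save) form, which this sketch omits.

```python
from fractions import Fraction


def srt_divide(x, d, n_digits=16):
    """Radix-2 SRT division; returns (q, r) with x == q*d + r / 2**n_digits and 0 <= r < d.

    Assumes 1/2 <= d < 1 and |x| < d. Quotient digits come from {-1, 0, +1} and
    are chosen by comparing the shifted remainder only against the constants
    +-1/2, which in hardware needs a few leading bits rather than a full compare.
    """
    x, d = Fraction(x), Fraction(d)
    half = Fraction(1, 2)
    if not (half <= d < 1 and abs(x) < d):
        raise ValueError("expected 1/2 <= d < 1 and |x| < d")
    r = x
    q = Fraction(0)
    for j in range(1, n_digits + 1):
        two_r = 2 * r
        if two_r >= half:
            digit = 1
        elif two_r < -half:
            digit = -1
        else:
            digit = 0
        r = two_r - digit * d                 # stays within (-d, d)
        q += Fraction(digit, 2 ** j)
    # The redundant digit set can leave a negative remainder: one final correction.
    if r < 0:
        q -= Fraction(1, 2 ** n_digits)
        r += d
    return q, r


q, r = srt_divide(Fraction(3, 8), Fraction(3, 4))
print(q)
```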

38 PARTY TIME!

39 Performance, Performance, Performance. Scaling provided enough transistors, low energy, and fast gates. The goal was to find the fastest structures: lots of dual-rail domino logic, and full arrays/trees started to be built. Many of the trees were regular (4:2 adders) for designer sanity.

40 Ling Adder Implementation: Sam Naffziger (HP, 1996) presented a 64b adder with 7 FO4 delay (< 1 ns, pretty darn fast) in 0.5 µm CMOS. (From VLSI lecture notes in the early 2000s.)
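Ling's trick can be checked in software. The sketch below assumes unsigned operands with no carry-in and evaluates Ling's pseudo-carry H with a simple serial recurrence rather than the parallel tree a 7 FO4 design would use; it only demonstrates that the true carries are recovered as c_(i+1) = t_i · H_i. Expanding two levels gives H_i = g_i + g_(i-1) + t_(i-1) t_(i-2) H_(i-2), which drops the t_i factor from each term of the corresponding carry expression.

```python
def ling_add(a, b, n=64):
    """n-bit unsigned addition without carry-in, via Ling's pseudo-carry H.

    With g_i = a_i AND b_i and t_i = a_i OR b_i, define H_i = g_i + c_i, where
    c_i is the ordinary carry into bit i. Then c_(i+1) = t_i * H_i and
    H_i = g_i + t_(i-1) * H_(i-1), with H_(-1) = 0.
    """
    s = 0
    h_prev, t_prev = 0, 0
    carry = 0                                   # c_0 = 0 (no carry-in)
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, t, p = ai & bi, ai | bi, ai ^ bi
        h = g | (t_prev & h_prev)               # Ling pseudo-carry H_i
        s |= (p ^ carry) << i                   # sum bit uses the true carry c_i
        carry = t & h                           # recover c_(i+1) = t_i * H_i
        h_prev, t_prev = h, t
    return s, carry


print(ling_add(0xFFFF_FFFF_FFFF_FFFF, 1))
```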

41 Kogge-Stone Adders (figure: 64-bit tree producing H64 from cin and (g0, t0) through (g62, t62), via H4/I4 and H16/I16 group terms)
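A bit-level Python sketch of Kogge-Stone parallel-prefix addition, assuming unsigned operands. It uses conventional (g, p) carries rather than the Ling (H, I) terms shown in the figure, so treat it as an illustration of the prefix structure only: each level combines pairs at distance 1, 2, 4, ..., so all carries are known after log2(n) levels.

```python
def kogge_stone_add(a, b, n=64, cin=0):
    """Parallel-prefix (Kogge-Stone) addition of two n-bit unsigned ints; returns (sum, cout).

    Combining rule: (G, P) o (G', P') = (G | P & G', P & P').
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)          # fold carry-in so G[i] means "carry out of bit i"
    dist = 1
    while dist < n:                     # log2(n) prefix levels
        new_G, new_P = G[:], P[:]
        for i in range(dist, n):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
    carries = [cin] + G[:-1]            # carry into bit i
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, G[n - 1]


print(kogge_stone_add(0xFFFF_FFFF_FFFF_FFFF, 1))
```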

42 Alignment Shifter: build a full shifter
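In a floating-point adder the alignment shifter moves the smaller significand right by the exponent difference. Below is a sketch of a logarithmic shifter with a sticky bit, assuming unsigned significands of a given width; guard and round bits are left out, and the function name is hypothetical.

```python
def align_shift_right(sig, shift, width):
    """Shift a width-bit significand right by `shift`; return (shifted, sticky)."""
    if shift >= width:
        return 0, int(sig != 0)           # everything is shifted out
    sticky = 0
    stage = 0
    while (1 << stage) < width:
        if (shift >> stage) & 1:          # stage k shifts by 2**k when bit k of shift is set
            amount = 1 << stage
            sticky |= int((sig & ((1 << amount) - 1)) != 0)  # OR in the bits dropped here
            sig >>= amount
        stage += 1
    return sig, sticky


print(align_shift_right(0b1011_0001, 4, 8))
```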

43 Even Fuse the Multiplier and Adder Together: IBM POWER6 FMA, 5 GHz, 7 stages in 65 nm. Dependent unrounded results are forwarded, making the dependent latency 6 cycles instead of 7 ((6,6,7) design).
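The numerical benefit of fusing is that a·b + c is rounded once instead of twice. The sketch below computes the exact value with Python's `fractions.Fraction` and rounds it to a double; it is a software reference for finite inputs only, not a model of the POWER6 pipeline or its forwarding of unrounded results.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Return a*b + c rounded once (finite inputs only)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no intermediate rounding
    return float(exact)                             # single round-to-nearest


a = b = 1.0 + 2.0 ** -30
c = -(1.0 + 2.0 ** -29)
print(a * b + c)                     # product rounded, then sum rounded
print(fused_multiply_add(a, b, c))   # rounded once
```

Here the exact result is 2^-60; the separate multiply rounds that tiny term away before the add, so the unfused result comes out as zero.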

44 Life Was Good, For a While

45 THE HANGOVER

46 But You Have to Pay Eventually

47 The Power Limit (Watts/mm²)

48 Clever Power Increased Because We Were Greedy (10x too large)

49 This Power Problem Is Not Going Away: $P = C \cdot V_{dd}^2 \cdot f$
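A tiny sketch evaluating the formula; the capacitance, voltage, and frequency values are hypothetical. Because Vdd enters squared, it is the strongest single lever on switching power.

```python
def dynamic_power(c_farads: float, vdd_volts: float, f_hz: float) -> float:
    """Switching power P = C * Vdd**2 * f, with C the capacitance switched per cycle."""
    return c_farads * vdd_volts ** 2 * f_hz


base = dynamic_power(1e-9, 1.0, 1e9)      # hypothetical: 1 nF, 1 V, 1 GHz
lower_v = dynamic_power(1e-9, 0.8, 1e9)   # same design at a 20% lower supply
print(base, lower_v / base)
```

With these numbers, a 20% lower supply cuts switching power by about 36% at the same frequency.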

50 Think About It

51 32-bit CMOS Adder Design Space (plot: energy in pJ vs. delay in units of 100 ps; designs include domino Sklansky, Ling with 2-bit sum select, static Sklansky, dual-rail Sklansky, and Ling)
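Plots like this are usually read through their Pareto frontier: the designs for which no other design is both faster and lower energy. A minimal Python sketch, using made-up (delay, energy) points rather than the slide's data:

```python
def pareto_front(designs):
    """Keep (name, delay, energy) points that no other point beats on both axes.

    Lower delay and lower energy are both better.
    """
    front = []
    for name, delay, energy in sorted(designs, key=lambda d: (d[1], d[2])):
        if not front or energy < front[-1][2]:
            front.append((name, delay, energy))
    return front


# Made-up points: (name, delay, energy); not the measurements on the slide.
points = [("A", 3.0, 10.0), ("B", 4.0, 6.0), ("C", 4.5, 7.0), ("D", 6.0, 3.0)]
print([name for name, _, _ in pareto_front(points)])
```

Sorting by delay first means each kept point only has to beat the energy of the previous kept point, so design C, which is slower and costlier than B, drops out.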

52 Performance Metrics: we normally think of the delay of a unit, but that only matters if there is a dependent op. Many applications have many non-dependent ops; these are throughput-based systems, where adding units improves performance.

53 The Rise of Multi-Core Processors

54 The Stagnation of Multi-Core Processors

55 Throughput-Based Designs: for applications with abundant parallelism, leveraging parallelism helps energy efficiency. But when do you stop? Lower performance is almost always lower energy, so minimum-energy designs would be a sea of very slow processors using meters of silicon area. What should you optimize?

56 Optimize Energy/Op vs. Area/Throughput

57 Floating Point Optimization (plot spanning 180nm to ITRS 10nm)

58 In This Space the Details Matter: the implementation of the Booth mux is more important than whether you use Booth 2 or Booth 3; how you wire the CSA array is more important than the type of counter; and most fancy adder tricks produce worse designs.
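For readers unfamiliar with the term, Booth 2 is radix-4 recoding: the multiplier is rewritten with digits in {-2, -1, 0, 1, 2}, so each Booth mux only has to select 0, ±A, or ±2A, and the number of partial products is halved. A Python sketch of the recoding, assuming a two's-complement multiplier of even width n; it says nothing about how the mux is implemented, which the slide argues matters more.

```python
def booth2_digits(b_bits, n):
    """Radix-4 (Booth 2) digits of an n-bit two's-complement pattern, least significant first.

    Digit i = -2*b[2i+1] + b[2i] + b[2i-1], with b[-1] = 0, so every digit is in
    {-2, -1, 0, 1, 2}.
    """
    if n % 2:
        raise ValueError("n must be even")

    def bit(i):
        return (b_bits >> i) & 1 if i >= 0 else 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def booth2_multiply(a, b, n):
    """Signed a*b, where b must fit in n-bit two's complement."""
    b_bits = b & ((1 << n) - 1)          # two's-complement bit pattern of b
    # Each digit selects 0, +-a or +-2a (the Booth mux), weighted by 4**i.
    return sum(d * a * 4 ** i for i, d in enumerate(booth2_digits(b_bits, n)))


print(booth2_digits(0b0111_0110, 8), booth2_multiply(-13, -42, 8))
```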

59 Built an FP Generator in

60 FMA Output

61 CMA vs. FMA: for latency, for throughput

62 Have a Shiny Ball, Now What?

63 Today FP Units Are Not the Problem (die photo labels: 8 cores, L1/reg/TLB, L2, L3)

64 Rough Energy Numbers (45nm)
Integer: Add 8 bit 0.03 pJ, 32 bit 0.1 pJ; Mult 8 bit 0.2 pJ, 32 bit 3 pJ
FP: FAdd 16 bit 0.4 pJ, 32 bit 0.9 pJ; FMult 16 bit 1 pJ, 32 bit 4 pJ
Memory: Cache (64bit) 8KB 10 pJ, 32KB 20 pJ, 1MB 100 pJ; DRAM on the order of nJ
Instruction energy breakdown (70 pJ total): I-cache access 25 pJ, register file access 6 pJ, control, add
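A few lines of arithmetic make the point of this table explicit. The sketch stores some of the numbers above (pJ at 45 nm, as transcribed) and compares them; the dictionary keys are hypothetical names.

```python
# Per-operation energies in pJ at 45 nm, transcribed from the slide.
ENERGY_PJ = {
    "int_add_32": 0.1,
    "int_mult_32": 3.0,
    "fp_add_32": 0.9,
    "fp_mult_32": 4.0,
    "cache_read_8kb": 10.0,
    "cache_read_1mb": 100.0,
    "instruction_total": 70.0,   # approximate total per instruction from the breakdown
}


def relative_cost(op: str, baseline: str) -> float:
    """How many times more energy `op` costs than `baseline`."""
    return ENERGY_PJ[op] / ENERGY_PJ[baseline]


add_share = ENERGY_PJ["int_add_32"] / ENERGY_PJ["instruction_total"]
print(f"32-bit add is {add_share:.2%} of the instruction energy")
print(f"1 MB cache read = {relative_cost('cache_read_1mb', 'fp_mult_32'):.0f}x a 32-bit FP multiply")
```

With these numbers a 32-bit integer add is about 0.1 pJ out of roughly 70 pJ per instruction, so well over 99% of the energy goes to overhead rather than arithmetic.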

65 What Is Going On Here? (plot: energy efficiency in MOPS/mW for CPUs, CPUs+GPUs, GP DSPs, and dedicated hardware; dedicated hardware is ~1000x more efficient)

66 The Truth: It's More About the Algorithm than the Hardware (figure: all algorithms vs. GPU algorithms)

67-69 Highly Local Computation Model

70 Compose These Cores into a Pipeline: program in space, not time. This makes building programmable hardware more difficult.

71 Great, But Can a User Program It? (figure: Frankencamera; user code produces cool images)

72 Goals: have user code in an image-friendly language; the language should facilitate writing image/vision processing; analyze/compile the language for different targets (CPU / GPU / FPGA); create not just the hardware bit file, but also the hardware drivers and application-level API.

73 How: Constructors to Encode Domain Knowledge. Encapsulate domain knowledge in the system; build constructors from lower-level constructors; clean interfaces are critical; reuse both the constructor and most of the configuration file.

74 Halide Language: a language for creating fast image processing apps. It separates the algorithm from the schedule and targets CPU and GPU.

75 What Halide Does For You: tiled, fused, vectorized, multithreaded code that is 11x faster, and not readable

76 Architecture Template: Stencil Functions and Line Buffers. Stencil functions consume sliding windows of data, so there is huge locality. To capture this locality you need to buffer a few lines; the line buffer is the hardware buffer block.
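A behavioral Python sketch of the line-buffer idea for a 3x3 stencil, assuming the image arrives one row at a time; only the two previous rows are kept rather than the whole frame. The 3x3 sum and the function name are hypothetical stand-ins for a real stencil function.

```python
from collections import deque


def stream_3x3_sum(rows, width):
    """Yield 3x3 window sums for interior pixels while reading one row at a time.

    Only the two previous rows are buffered (the line buffers) instead of the
    whole frame.
    """
    line_buffers = deque(maxlen=2)          # the oldest row drops out automatically
    for row in rows:
        if len(row) != width:
            raise ValueError("row has wrong width")
        if len(line_buffers) == 2:
            above2, above1 = line_buffers
            yield [
                sum(r[x + dx] for r in (above2, above1, row) for dx in (-1, 0, 1))
                for x in range(1, width - 1)
            ]
        line_buffers.append(row)


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(list(stream_3x3_sum(image, 4)))
```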

77 Design Flow

78 Performance Results: performance compared to Nvidia TK1

79 Energy Results

80 Conclusions: designing the best arithmetic unit depends on technology and constraints. Finding the right metrics is critical. Details matter: you must assess the performance/area/energy of your idea, and generators (procedural knowledge) are a good approach to do this. The key to performance scaling in the future is the memory: we need applications with high locality.


CAD4 The ALU Fall 2009 Assignment. Description CAD4 The ALU Fall 2009 Assignment To design a 16-bit ALU which will be used in the datapath of the microprocessor. This ALU must support two s complement arithmetic and the instructions in the baseline

More information

Computer Architecture

Computer Architecture Informatics 3 Computer Architecture Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh (thanks to Prof. Nigel Topham) General Information Instructor

More information

Introduction. Summary. Why computer architecture? Technology trends Cost issues

Introduction. Summary. Why computer architecture? Technology trends Cost issues Introduction 1 Summary Why computer architecture? Technology trends Cost issues 2 1 Computer architecture? Computer Architecture refers to the attributes of a system visible to a programmer (that have

More information

EECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007

EECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007 EECS 5 - Components and Design Techniques for Digital Systems Lec 2 RTL Design Optimization /6/27 Shauki Elassaad Electrical Engineering and Computer Sciences University of California, Berkeley Slides

More information

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1> Chapter 5 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 5 Chapter 5 :: Topics Introduction Arithmetic Circuits umber Systems Sequential Building

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

The Design of the KiloCore Chip

The Design of the KiloCore Chip The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory

More information

EE282H: Computer Architecture and Organization. EE282H: Computer Architecture and Organization -- Course Overview

EE282H: Computer Architecture and Organization. EE282H: Computer Architecture and Organization -- Course Overview : Computer Architecture and Organization Kunle Olukotun Gates 302 kunle@ogun.stanford.edu http://www-leland.stanford.edu/class/ee282h/ : Computer Architecture and Organization -- Course Overview Goals»

More information

Digital Integrated Circuits A Design Perspective. Jan M. Rabaey

Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Digital Integrated Circuits A Design Perspective Jan M. Rabaey Outline (approximate) Introduction and Motivation The VLSI Design Process Details of the MOS Transistor Device Fabrication Design Rules CMOS

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Memory in Digital Systems

Memory in Digital Systems MEMORIES Memory in Digital Systems Three primary components of digital systems Datapath (does the work) Control (manager) Memory (storage) Single bit ( foround ) Clockless latches e.g., SR latch Clocked

More information

VARUN AGGARWAL

VARUN AGGARWAL ECE 645 PROJECT SPECIFICATION -------------- Design A Microprocessor Functional Unit Able To Perform Multiplication & Division Professor: Students: KRIS GAJ LUU PHAM VARUN AGGARWAL GMU Mar. 2002 CONTENTS

More information

Overview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM

Overview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM Memories Overview Memory Classification Read-Only Memory (ROM) Types of ROM PROM, EPROM, E 2 PROM Flash ROMs (Compact Flash, Secure Digital, Memory Stick) Random Access Memory (RAM) Types of RAM Static

More information

ECE410 Design Project Spring 2013 Design and Characterization of a CMOS 8-bit pipelined Microprocessor Data Path

ECE410 Design Project Spring 2013 Design and Characterization of a CMOS 8-bit pipelined Microprocessor Data Path ECE410 Design Project Spring 2013 Design and Characterization of a CMOS 8-bit pipelined Microprocessor Data Path Project Summary This project involves the schematic and layout design of an 8-bit microprocessor

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,

More information

Arithmetic Logic Unit. Digital Computer Design

Arithmetic Logic Unit. Digital Computer Design Arithmetic Logic Unit Digital Computer Design Arithmetic Circuits Arithmetic circuits are the central building blocks of computers. Computers and digital logic perform many arithmetic functions: addition,

More information

THE DESIGN OF AN IC HALF PRECISION FLOATING POINT ARITHMETIC LOGIC UNIT

THE DESIGN OF AN IC HALF PRECISION FLOATING POINT ARITHMETIC LOGIC UNIT Clemson University TigerPrints All Theses Theses 12-2009 THE DESIGN OF AN IC HALF PRECISION FLOATING POINT ARITHMETIC LOGIC UNIT Balaji Kannan Clemson University, balaji.n.kannan@gmail.com Follow this

More information

Arithmetic Circuits. Design of Digital Circuits 2014 Srdjan Capkun Frank K. Gürkaynak.

Arithmetic Circuits. Design of Digital Circuits 2014 Srdjan Capkun Frank K. Gürkaynak. Arithmetic Circuits Design of Digital Circuits 2014 Srdjan Capkun Frank K. Gürkaynak http://www.syssec.ethz.ch/education/digitaltechnik_14 Adapted from Digital Design and Computer Architecture, David Money

More information

Combinational Circuits

Combinational Circuits Combinational Circuits Q. What is a combinational circuit? A. Digital: signals are or. A. No feedback: no loops. analog circuits: signals vary continuously sequential circuits: loops allowed (stay tuned)

More information

Learning Outcomes. Spiral 2 2. Digital System Design DATAPATH COMPONENTS

Learning Outcomes. Spiral 2 2. Digital System Design DATAPATH COMPONENTS 2-2. 2-2.2 Learning Outcomes piral 2 2 Arithmetic Components and Their Efficient Implementations I know how to combine overflow and subtraction results to determine comparison results of both signed and

More information

Overview. EECS Components and Design Techniques for Digital Systems. Lec 16 Arithmetic II (Multiplication) Computer Number Systems.

Overview. EECS Components and Design Techniques for Digital Systems. Lec 16 Arithmetic II (Multiplication) Computer Number Systems. Overview EE 15 - omponents and Design Techniques for Digital ystems Lec 16 Arithmetic II (Multiplication) Review of Addition Overflow Multiplication Further adder optimizations for multiplication LA in

More information

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager Optimizing for DirectX Graphics Richard Huddy European Developer Relations Manager Also on today from ATI... Start & End Time: 12:00pm 1:00pm Title: Precomputed Radiance Transfer and Spherical Harmonic

More information

Mark Redekopp, All rights reserved. EE 352 Unit 8. HW Constructs

Mark Redekopp, All rights reserved. EE 352 Unit 8. HW Constructs EE 352 Unit 8 HW Constructs Logic Circuits Combinational logic Perform a specific function (mapping of 2 n input combinations to desired output combinations) No internal state or feedback Given a set of

More information

ECE484 VLSI Digital Circuits Fall Lecture 01: Introduction

ECE484 VLSI Digital Circuits Fall Lecture 01: Introduction ECE484 VLSI Digital Circuits Fall 2017 Lecture 01: Introduction Adapted from slides provided by Mary Jane Irwin. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] CSE477 L01 Introduction.1

More information

Marching Memory マーチングメモリ. UCAS-6 6 > Stanford > Imperial > Verify 中村維男 Based on Patent Application by Tadao Nakamura and Michael J.

Marching Memory マーチングメモリ. UCAS-6 6 > Stanford > Imperial > Verify 中村維男 Based on Patent Application by Tadao Nakamura and Michael J. UCAS-6 6 > Stanford > Imperial > Verify 2011 Marching Memory マーチングメモリ Tadao Nakamura 中村維男 Based on Patent Application by Tadao Nakamura and Michael J. Flynn 1 Copyright 2010 Tadao Nakamura C-M-C Computer

More information

Elettronica T moduli I e II

Elettronica T moduli I e II Elettronica T moduli I e II Docenti: Massimo Lanzoni, Igor Loi Massimo.lanzoni@unibo.it igor.loi@unibo.it A.A. 2015/2016 Scheduling MOD 1 (Prof. Loi) Weeks 39,40,41,42, 43,44» MOS transistors» Digital

More information

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most

More information