MOS High Performance Arithmetic
|
|
- Hannah Montgomery
- 5 years ago
- Views:
Transcription
1 MOS High Performance Arithmetic Mark Horowitz Stanford University
2 Arithmetic Is Important Then Now (TegraK1) 2
3 What Is Hard?
4 Proof: 1 st Gen 2 nd Gen 4
5 And Getting The Data You Need But we didn t notice this until much later To notice this problem We need many advances in technology 5
6 3 rd Gen Relays (Z3 1940) A few adds/sec 6
7 4 th Gen Tubes (Eniac 1945) 5000 Adds/sec 7
8 5 th Gen Transistors (TX-0, Transistor ) TX-0 8
9 Modern Era IC, 1961 Image from State of the Art Stan Augarten 9
10 Moore s Law From Electronics, Volume 38, Number 8, April 19, 1965 Number of components on IC doubles every year Later modified to doubling every 18 to 24 months 10
11 ECL Computers 11
12 Microprocessor MOS Processor (1974)
13 nmos
14 CMOS 1985 To Present
15 CMOS (Arithmetic) Design What makes a good design? 15
16 Try to Balance 4 Parameters Area Performance Power Design Time 16
17 The Good News By the time CMOS came along There had been a lot of work on arithmetic Booth coding Wallace trees Ling coding Tree adders Manchester carry chains SRT division 17
18 The Bad News The best logical design depends on technology Remember the carry dependency? For relays a Manchester carry chain is the best All the delay is in changing the relay state P0_pv1 P1_pv1 P2_pv1 G0_pv1 G1_pv1 G2_pv1 18
19 More Bad News The metrics you are optimizing work in opposition Energy Performance 19
20 Just to Make Life More Complicated Your metrics change w/ technology scaling 20
21 Must Use Technology Independent Metrics Performance In terms of a FO4 delay FO4 Fanout=4 inverter delay at TT, 90% Vdd, 125C * Ldrawn Gate delay (ps) Technology Ldrawn (um) 21
22 Area and Energy Area Measure linear dimensions in features Energy CMOS energy is Normalize by 22
23 Dennard s Scaling The triple play: Get more gates, 1/L 2 1/ 2 Gates get faster, CV/i Energy per switch CV 2 3 Dennard, JSSC, pp , Oct
24 Three Era s of CMOS Arithmetic Design Getting Going Area constrained Party time! Performance Constrained The hangover Power Constrained 24
25 GETTING STARTED 25
26 Life in the 80 s Just learning how to design complex chips Chips had 100K transistors Almost no CAD tools Worried about it getting the design done And getting all the functions to fit on chip Getting the design to fit was job one Getting it to go fast was job two 26
27 Main Effects on Arithmetic Circuits Merged Function Blocks ALU A B Lookup Table P F 27
28 Precharge Logic static current W Dual pmos Network W precharge A non-overlapping (good, but not always possible) 2W B precharge precharge 2W evaluate evaluate evaluate Psuedo-nMOS CMOS Pre-Charge 28
29 Carry Chains, and Carry Skip Adders P0*P1*P2*P3 Cin Carry Carry Carry Carry Cout C0 C1 C2 C3 PG Cin =0 XOR Cin=1 XOR Mux 29
30 Iterating Structures Main processor Just use instructions (micro-code) and ALU Co-processor Also used iterating structures But built these structures for multiple or division Often asynchronous 30
31 MIPS R3010 Multiplier Clocked by internal oscillator, not external clock CSA CSA 31
32 A Self-Timed Pipeline C C C C prech prech prech prech clk clk clk clk out+ outout+ out- out+ out- out+ out- in+ inin+ inin+ inin+ in- 32
33 A Self-Timed Pipeline Data enters at the far left and the NOR gate flips This activates the C-element C C C C prech prech prech prech clk clk clk clk out+ outout+ out- out+ out- out+ out- in+ inin+ inin+ inin+ in- 33
34 A Self-Timed Pipeline First logic block goes into evaluate C C C C eval prech prech prech clk clk clk clk out+ outout+ out- out+ out- out+ out- in+ inin+ inin+ inin+ in- 34
35 A Self-Timed Pipeline C C C C eval prech prech prech clk clk clk clk out+ outout+ out- out+ out- out+ out- in+ inin+ inin+ inin+ in- 35
36 A Self-Timed Pipeline Second block goes into evaluate Primary inputs are deasserted, flipping the first NOR gate C C C C eval eval prech prech clk clk clk clk out+ outout+ out- out+ out- out+ out- in+ inin+ inin+ inin+ in- 36
37 Division - SRT Ted Williams Completely self-timed 37
38 PARTY TIME! 38
39 Performance, Performance, Performance Scaling provided Enough transistors Low energy, and fast gates Goal was to find the fastest structures Lots of dual rail domino logic Started to build full array/trees Many of the trees were regular (4:2 adder) for designer sanity 39
40 Ling Adder Implementation Sam Naffziger (HP, 1996) presented a 64b adder 7 FO4 delay (< 1nS): pretty darn fast 0.5 m CMOS From VLSI lecture notes in early 2000 s 40
41 Kogge Stone Adders H 64 0 H H64 H16/I16 H4/I4 cin (g 0, t 0 ) (g 62, t 62 ) 41
42 Alignment Shifter Build full shifter 42
43 Even Fuse Multiplier and Adder Together IBM Power 6 FMA 5 GHz 7-stage in 65nm Dependent unrounded results forwarded making dependent latency 6 cycles instead of 7 (6,6,7) design 43
44 Life Was Good, For a While 44
45 THE HANGOVER 45
46 But You Have to Pay Eventually 46
47 The Power Limit Watts/mm
48 Clever Power Increased Because We Were Greedy 10x too large 48
49 This Power Problem Is Not Going Away: P = C * Vdd 2 * f L
50 Think About It 50
51 32 bit CMOS Adder Design Space domino Sklansky Ling w/ 2bit sum select static Sklansky 10 Energy in pj dual rail Sklansky Ling Delay in 100ps 51
52 Performance Metrics Normally think of delay of unit But that only matter if there is a dependent op Many applications have many non-dependent ops These are throughput based systems Adding units improves performance 52
53 The Rise of Multi-Core Processors 53
54 The Stagnation of Multi-Core Processors 54
55 Throughput Based Designs For applications with abundant parallelism Leveraging parallelism helps energy efficiency But when do you stop Lower performance is almost always lower energy Minimum energy designs, Sea of very slow processors Meters of silicon area What to optimize? 55
56 56 Optimize Energy/Op vs. Area/Throughput
57 Floating Point Optimization 180nm ITRS 10nm 57
58 In This Space the Details Matter Implementation of Booth Mux More important then whether Booth 2, or Booth 3 How you wire the CSA array Is more important than the type of counter Most fancy adder tricks Produce worse designs 58
59 Built an FP Generator in
60 FMA Output 60
61 CMA vs. FMA For Latency For Throughput 61
62 Have A Shiny Ball, Now What? 62
63 Today FP Units are Not the Problem 8 cores L1/reg/TLB L2 L3 63
64 Rough Energy Numbers (45nm) Integer FP Memory Add FAdd Cache (64bit) 8 bit 0.03pJ 16 bit 0.4pJ 8KB 10pJ 32 bit 0.1pJ 32 bit 0.9pJ 32KB 20pJ Mult FMult 1MB 100pJ 8 bit 0.2pJ 16 bit 1pJ DRAM nJ 32 bit 3 pj 32 bit 4pJ Instruction Energy Breakdown 25pJ 6pJ Control 70 pj I-Cache Access Register File Access Add 64
65 What Is Going On Here? Dedicated 1000 Energy Efficiency (MOPS/mW) CPUs CPUs+GPUs GP DSPs ~1000x
66 The Truth: It s More About the Algorithm then the Hardware All Algorithms GPU Alg 66
67 Highly Local Computation Model 67
68 Highly Local Computation Model 68
69 Highly Local Computation Model 69
70 Compose These Cores into a Pipeline Program in space, not time Makes building programmable hardware more difficult 70
71 Great, But Can A User Program It? Frankencamera 4 User code Cool images 71 71
72 Goals Have user code in a image friendly language Language should facilitate writing image/vision processing Analyze/compile the language for different targets CPU / GPU / FPGA Create not just the hardware bit file But also the hardware drivers and application level API 72
73 How: Constructors to Encode Domain Knowledge Encapsulate domain knowledge in the system Build constructor from lower level constructors Clean interfaces are critical Reuse both constructor and most of the configuration file 73
74 Halide Language Language for creating fast image processing apps Separate algorithm from schedule Target CPU and GPU 74
75 What Halide Does For You Tiled Fused Vectorized Multithreaded 11x faster And not readable 75
76 Architecture Template: Stencil Functions and Line Buffers Stencil functions consume sliding windows of data Huge locality To capture this locality need to buffer a few lines Line buffer is the hardware buffer block. 76
77 Design Flow 77
78 Performance Results Performance compared to Nvidia TK1 78
79 Energy Results 79
80 Conclusions Designing the best arithmetic unit depends on: Technology and constraints Finding the right metrics is critical Details matter Must assess performance/area/energy of your idea Generators (procedural knowledge) is a good approach to do this Key to performance scaling in the future is the memory Need applications with high locality 80
Computing s Energy Problem:
Computing s Energy Problem: (and what we can do about it) Mark Horowitz Stanford University horowitz@ee.stanford.edu Everything Has A Computer Inside 2 The Reason is Simple: Moore s Law Made Gates Cheap
More informationLecture 5. Other Adder Issues
Lecture 5 Other Adder Issues Mark Horowitz Computer Systems Laboratory Stanford University horowitz@stanford.edu Copyright 24 by Mark Horowitz with information from Brucek Khailany 1 Overview Reading There
More informationMany ways to build logic out of MOSFETs
Many ways to build logic out of MOSFETs pass transistor logic (most similar to the first switch logic we saw) static CMOS logic (what we saw last time) dynamic CMOS logic Clock=0 precharges X through the
More informationLab. Course Goals. Topics. What is VLSI design? What is an integrated circuit? VLSI Design Cycle. VLSI Design Automation
Course Goals Lab Understand key components in VLSI designs Become familiar with design tools (Cadence) Understand design flows Understand behavioral, structural, and physical specifications Be able to
More informationAn Asynchronous Floating-Point Multiplier
An Asynchronous Floating-Point Multiplier Basit Riaz Sheikh and Rajit Manohar Computer Systems Lab Cornell University http://vlsi.cornell.edu/ The person that did the work! Motivation Fast floating-point
More informationLecture Topics. Announcements. Today: The MIPS ISA (P&H ) Next: continued. Milestone #1 (due 1/26) Milestone #2 (due 2/2)
Lecture Topics Today: The MIPS ISA (P&H 2.1-2.14) Next: continued 1 Announcements Milestone #1 (due 1/26) Milestone #2 (due 2/2) Milestone #3 (due 2/9) 2 1 Evolution of Computing Machinery To understand
More informationPOWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS
POWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS Radu Zlatanovici zradu@eecs.berkeley.edu http://www.eecs.berkeley.edu/~zradu Department of Electrical Engineering and Computer Sciences University
More informationComputer Architecture Review. ICS332 - Spring 2016 Operating Systems
Computer Architecture Review ICS332 - Spring 2016 Operating Systems ENIAC (1946) Electronic Numerical Integrator and Calculator Stored-Program Computer (instead of Fixed-Program) Vacuum tubes, punch cards
More informationTwo-Level CLA for 4-bit Adder. Two-Level CLA for 4-bit Adder. Two-Level CLA for 16-bit Adder. A Closer Look at CLA Delay
Two-Level CLA for 4-bit Adder Individual carry equations C 1 = g 0 +p 0, C 2 = g 1 +p 1 C 1,C 3 = g 2 +p 2 C 2, = g 3 +p 3 C 3 Fully expanded (infinite hardware) CLA equations C 1 = g 0 +p 0 C 2 = g 1
More informationMicroprocessor and DSP Technologies for the Nanoscale Era
Microprocessor and DSP Technologies for the Nanoscale Era Seminar 1 Ram Kumar Krishnamurthy Microprocessor Research Labs Intel Corporation, Hillsboro, OR ram.krishnamurthy@intel.com 1 July 5, 2005 Intel
More informationAT Arithmetic. Integer addition
AT Arithmetic Most concern has gone into creating fast implementation of (especially) FP Arith. Under the AT (area-time) rule, area is (almost) as important. So it s important to know the latency, bandwidth
More informationSingle Cycle Datapath
Single Cycle atapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili Section 4.-4.4 Appendices B.7, B.8, B.,.2 Practice Problems:, 4, 6, 9 ing (2) Introduction We will examine two MIPS implementations
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers Implementation
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 3 Arithmetic for Computers Implementation Today Review representations (252/352 recap) Floating point Addition: Ripple
More informationIntroduction to Microprocessor
Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device
More informationMicrocomputers. Outline. Number Systems and Digital Logic Review
Microcomputers Number Systems and Digital Logic Review Lecture 1-1 Outline Number systems and formats Common number systems Base Conversion Integer representation Signed integer representation Binary coded
More informationEE282 Computer Architecture. Lecture 1: What is Computer Architecture?
EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer
More informationLecture 28 Multicore, Multithread" Suggested reading:" (H&P Chapter 7.4)"
Lecture 28 Multicore, Multithread" Suggested reading:" (H&P Chapter 7.4)" 1" Processor components" Multicore processors and programming" Processor comparison" CSE 30321 - Lecture 01 - vs." Goal: Explain
More informationTailoring the 32-Bit ALU to MIPS
Tailoring the 32-Bit ALU to MIPS MIPS ALU extensions Overflow detection: Carry into MSB XOR Carry out of MSB Branch instructions Shift instructions Slt instruction Immediate instructions ALU performance
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More informationAddressing the Memory Wall
Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the
More informationComputer Architecture
Informatics 3 Computer Architecture Dr. Boris Grot and Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh General Information Instructors: Boris
More informationUnit 11: Putting it All Together: Anatomy of the XBox 360 Game Console
Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture
More informationAlternate definition: Instruction Set Architecture (ISA) What is Computer Architecture? Computer Organization. Computer structure: Von Neumann model
What is Computer Architecture? Structure: static arrangement of the parts Organization: dynamic interaction of the parts and their control Implementation: design of specific building blocks Performance:
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More informationThis Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits
More informationECE 471 Embedded Systems Lecture 2
ECE 471 Embedded Systems Lecture 2 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 7 September 2018 Announcements Reminder: The class notes are posted to the website. HW#1 will
More informationEITF35: Introduction to Structured VLSI Design
EITF35: Introduction to Structured VLSI Design Part 1.1.2: Introduction (Digital VLSI Systems) Liang Liu liang.liu@eit.lth.se 1 Outline Why Digital? History & Roadmap Device Technology & Platforms System
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations
More informationLecture Topics. Announcements. Today: Integer Arithmetic (P&H ) Next: continued. Consulting hours. Introduction to Sim. Milestone #1 (due 1/26)
Lecture Topics Today: Integer Arithmetic (P&H 3.1-3.4) Next: continued 1 Announcements Consulting hours Introduction to Sim Milestone #1 (due 1/26) 2 1 Overview: Integer Operations Internal representation
More informationEE3032 Introduction to VLSI Design
EE3032 Introduction to VLSI Design Jin-Fu Li Advanced Reliable Systems (ARES) Lab. Department of Electrical Engineering National Central University Jhongli, Taiwan Contents Syllabus Introduction to CMOS
More informationCS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 3, 2015
CS 31: Intro to Systems Digital Logic Kevin Webb Swarthmore College February 3, 2015 Reading Quiz Today Hardware basics Machine memory models Digital signals Logic gates Circuits: Borrow some paper if
More informationSpiral 2-8. Cell Layout
2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric
More informationComputer Architecture s Changing Definition
Computer Architecture s Changing Definition 1950s Computer Architecture Computer Arithmetic 1960s Operating system support, especially memory management 1970s to mid 1980s Computer Architecture Instruction
More informationCS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 2, 2016
CS 31: Intro to Systems Digital Logic Kevin Webb Swarthmore College February 2, 2016 Reading Quiz Today Hardware basics Machine memory models Digital signals Logic gates Circuits: Borrow some paper if
More informationSIDDHARTH INSTITUTE OF ENGINEERING AND TECHNOLOGY :: PUTTUR (AUTONOMOUS) Siddharth Nagar, Narayanavanam Road QUESTION BANK UNIT I
SIDDHARTH INSTITUTE OF ENGINEERING AND TECHNOLOGY :: PUTTUR (AUTONOMOUS) Siddharth Nagar, Narayanavanam Road 517583 QUESTION BANK Subject with Code : DICD (16EC5703) Year & Sem: I-M.Tech & I-Sem Course
More informationMore Course Information
More Course Information Labs and lectures are both important Labs: cover more on hands-on design/tool/flow issues Lectures: important in terms of basic concepts and fundamentals Do well in labs Do well
More informationA Data-Parallel Genealogy: The GPU Family Tree. John Owens University of California, Davis
A Data-Parallel Genealogy: The GPU Family Tree John Owens University of California, Davis Outline Moore s Law brings opportunity Gains in performance and capabilities. What has 20+ years of development
More informationECE 637 Integrated VLSI Circuits. Introduction. Introduction EE141
ECE 637 Integrated VLSI Circuits Introduction EE141 1 Introduction Course Details Instructor Mohab Anis; manis@vlsi.uwaterloo.ca Text Digital Integrated Circuits, Jan Rabaey, Prentice Hall, 2 nd edition
More informationComputer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra
Binary Representation Computer Systems Information is represented as a sequence of binary digits: Bits What the actual bits represent depends on the context: Seminar 3 Numerical value (integer, floating
More informationLet s put together a Manual Processor
Lecture 14 Let s put together a Manual Processor Hardware Lecture 14 Slide 1 The processor Inside every computer there is at least one processor which can take an instruction, some operands and produce
More informationE40M. MOS Transistors, CMOS Logic Circuits, and Cheap, Powerful Computers. M. Horowitz, J. Plummer, R. Howe 1
E40M MOS Transistors, CMOS Logic Circuits, and Cheap, Powerful Computers M. Horowitz, J. Plummer, R. Howe 1 Reading Chapter 4 in the reader For more details look at A&L 5.1 Digital Signals (goes in much
More informationFPGA architecture and design technology
CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA
More informationECE 747 Digital Signal Processing Architecture. DSP Implementation Architectures
ECE 747 Digital Signal Processing Architecture DSP Implementation Architectures Spring 2006 W. Rhett Davis NC State University W. Rhett Davis NC State University ECE 406 Spring 2006 Slide 1 My Goal Challenge
More informationGRE Architecture Session
GRE Architecture Session Session 2: Saturday 23, 1995 Young H. Cho e-mail: youngc@cs.berkeley.edu www: http://http.cs.berkeley/~youngc Y. H. Cho Page 1 Review n Homework n Basic Gate Arithmetics n Bubble
More informationComputer Architecture!
Informatics 3 Computer Architecture! Dr. Vijay Nagarajan and Prof. Nigel Topham! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors
More informationInteger Multiplication. Back to Arithmetic. Integer Multiplication. Example (Fig 4.25)
Back to Arithmetic Before, we did Representation of integers Addition/Subtraction Logical ops Forecast Integer Multiplication Integer Division Floating-point Numbers Floating-point Addition/Multiplication
More informationFPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011
FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level
More informationEE577b. Register File. By Joong-Seok Moon
EE577b Register File By Joong-Seok Moon Register File A set of registers that store data Consists of a small array of static memory cells Smallest size and fastest access time in memory hierarchy (Register
More informationFABRICATION TECHNOLOGIES
FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general
More informationSuggested Readings! Lecture 24" Parallel Processing on Multi-Core Chips! Technology Drive to Multi-core! ! Readings! ! H&P: Chapter 7! vs.! CSE 30321!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 7!! (Over next 2 weeks)! Lecture 24" Parallel Processing on Multi-Core Chips! 3! Processor components! Multicore processors and programming! Processor
More informationHuh? Lecture 01 Introduction to CSE You can learn about good routes to run if you!re visiting Chicago...
1 Huh? 2 All of the following are magazines that are regularly delivered to the Niemier household. Lecture 01 Introduction to CSE 30321 3 4 You can learn about good routes to run if you!re visiting Chicago...
More informationFundamentals of Computer Design
CS359: Computer Architecture Fundamentals of Computer Design Yanyan Shen Department of Computer Science and Engineering 1 Defining Computer Architecture Agenda Introduction Classes of Computers 1.3 Defining
More informationComputer Architecture (TT 2012)
Computer Architecture (TT 2012) The Register Transfer Level Daniel Kroening Oxford University, Computer Science Department Version 1.0, 2011 Outline Reminders Gates Implementations of Gates Latches, Flip-flops
More informationLearning Outcomes. Spiral 2-2. Digital System Design DATAPATH COMPONENTS
2-2. 2-2.2 Learning Outcomes piral 2-2 Arithmetic Components and Their Efficient Implementations I understand the control inputs to counters I can design logic to control the inputs of counters to create
More informationLecture 11: MOS Memory
Lecture 11: MOS Memory MAH, AEN EE271 Lecture 11 1 Memory Reading W&E 8.3.1-8.3.2 - Memory Design Introduction Memories are one of the most useful VLSI building blocks. One reason for their utility is
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCS152 Computer Architecture and Engineering Lecture 16: Memory System
CS152 Computer Architecture and Engineering Lecture 16: System March 15, 1995 Dave Patterson (patterson@cs) and Shing Kong (shing.kong@eng.sun.com) Slides available on http://http.cs.berkeley.edu/~patterson
More informationTechniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company
Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor David Johnson Systems Technology Division Hewlett-Packard Company Presentation Overview PA-8500 Overview uction Fetch Capabilities
More informationComputer Architecture!
Informatics 3 Computer Architecture! Dr. Boris Grot and Dr. Vijay Nagarajan!! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors
More informationAdvanced Computer Architecture
Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology The Computer Revolution Progress in computer technology Underpinned by Moore
More informationLearning Outcomes. Spiral 2 2. Digital System Design DATAPATH COMPONENTS
2-2. 2-2.2 Learning Outcomes piral 2 2 Arithmetic Components and Their Efficient Implementations I know how to combine overflow and subtraction results to determine comparison results of both signed and
More informationELCT 501: Digital System Design
ELCT 501: Digital System Lecture 1: Introduction Dr. Mohamed Abd El Ghany, Mohamed.abdel-ghany@guc.edu.eg Administrative Rules Course components: Lecture: Thursday (fourth slot), 13:15-14:45 (H8) Office
More informationA Data-Parallel Genealogy: The GPU Family Tree
A Data-Parallel Genealogy: The GPU Family Tree Department of Electrical and Computer Engineering Institute for Data Analysis and Visualization University of California, Davis Outline Moore s Law brings
More informationIntroduction to Boole algebra. Binary algebra
Introduction to Boole algebra Binary algebra Boole algebra George Boole s book released in 1847 We have only two digits: true and false We have NOT, AND, OR, XOR etc operations We have axioms and theorems
More informationThe Memory Hierarchy 1
The Memory Hierarchy 1 What is a cache? 2 What problem do caches solve? 3 Memory CPU Abstraction: Big array of bytes Memory memory 4 Performance vs 1980 Processor vs Memory Performance Memory is very slow
More informationBy, Ajinkya Karande Adarsh Yoga
By, Ajinkya Karande Adarsh Yoga Introduction Early computer designers believed saving computer time and memory were more important than programmer time. Bug in the divide algorithm used in Intel chips.
More informationdiscrete logic do not
Welcome to my second year course on Digital Electronics. You will find that the slides are supported by notes embedded with the Powerpoint presentations. All my teaching materials are also available on
More informationChapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
More informationThe Xilinx XC6200 chip, the software tools and the board development tools
The Xilinx XC6200 chip, the software tools and the board development tools What is an FPGA? Field Programmable Gate Array Fully programmable alternative to a customized chip Used to implement functions
More informationComputer Architecture!
Informatics 3 Computer Architecture! Dr. Boris Grot and Dr. Vijay Nagarajan!! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors:!
More informationVerilog for High Performance
Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes
More informationCAD4 The ALU Fall 2009 Assignment. Description
CAD4 The ALU Fall 2009 Assignment To design a 16-bit ALU which will be used in the datapath of the microprocessor. This ALU must support two s complement arithmetic and the instructions in the baseline
More informationComputer Architecture
Informatics 3 Computer Architecture Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh (thanks to Prof. Nigel Topham) General Information Instructor
More informationIntroduction. Summary. Why computer architecture? Technology trends Cost issues
Introduction 1 Summary Why computer architecture? Technology trends Cost issues 2 1 Computer architecture? Computer Architecture refers to the attributes of a system visible to a programmer (that have
More informationEECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007
EECS 5 - Components and Design Techniques for Digital Systems Lec 2 RTL Design Optimization /6/27 Shauki Elassaad Electrical Engineering and Computer Sciences University of California, Berkeley Slides
More informationChapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>
Chapter 5 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 5 Chapter 5 :: Topics Introduction Arithmetic Circuits umber Systems Sequential Building
More informationCS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS
CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight
More informationThe Design of the KiloCore Chip
The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory
More informationEE282H: Computer Architecture and Organization. EE282H: Computer Architecture and Organization -- Course Overview
: Computer Architecture and Organization Kunle Olukotun Gates 302 kunle@ogun.stanford.edu http://www-leland.stanford.edu/class/ee282h/ : Computer Architecture and Organization -- Course Overview Goals»
More informationDigital Integrated Circuits A Design Perspective. Jan M. Rabaey
Digital Integrated Circuits A Design Perspective Jan M. Rabaey Outline (approximate) Introduction and Motivation The VLSI Design Process Details of the MOS Transistor Device Fabrication Design Rules CMOS
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationMemory in Digital Systems
MEMORIES Memory in Digital Systems Three primary components of digital systems Datapath (does the work) Control (manager) Memory (storage) Single bit ( foround ) Clockless latches e.g., SR latch Clocked
More informationVARUN AGGARWAL
ECE 645 PROJECT SPECIFICATION -------------- Design A Microprocessor Functional Unit Able To Perform Multiplication & Division Professor: Students: KRIS GAJ LUU PHAM VARUN AGGARWAL GMU Mar. 2002 CONTENTS
More informationOverview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM
Memories Overview Memory Classification Read-Only Memory (ROM) Types of ROM PROM, EPROM, E 2 PROM Flash ROMs (Compact Flash, Secure Digital, Memory Stick) Random Access Memory (RAM) Types of RAM Static
More informationECE410 Design Project Spring 2013 Design and Characterization of a CMOS 8-bit pipelined Microprocessor Data Path
ECE410 Design Project Spring 2013 Design and Characterization of a CMOS 8-bit pipelined Microprocessor Data Path Project Summary This project involves the schematic and layout design of an 8-bit microprocessor
More informationHardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University
Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis
More informationDesign and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology
Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,
More informationArithmetic Logic Unit. Digital Computer Design
Arithmetic Logic Unit Digital Computer Design Arithmetic Circuits Arithmetic circuits are the central building blocks of computers. Computers and digital logic perform many arithmetic functions: addition,
More informationTHE DESIGN OF AN IC HALF PRECISION FLOATING POINT ARITHMETIC LOGIC UNIT
Clemson University TigerPrints All Theses Theses 12-2009 THE DESIGN OF AN IC HALF PRECISION FLOATING POINT ARITHMETIC LOGIC UNIT Balaji Kannan Clemson University, balaji.n.kannan@gmail.com Follow this
More informationArithmetic Circuits. Design of Digital Circuits 2014 Srdjan Capkun Frank K. Gürkaynak.
Arithmetic Circuits Design of Digital Circuits 2014 Srdjan Capkun Frank K. Gürkaynak http://www.syssec.ethz.ch/education/digitaltechnik_14 Adapted from Digital Design and Computer Architecture, David Money
More informationCombinational Circuits
Combinational Circuits Q. What is a combinational circuit? A. Digital: signals are or. A. No feedback: no loops. analog circuits: signals vary continuously sequential circuits: loops allowed (stay tuned)
More informationLearning Outcomes. Spiral 2 2. Digital System Design DATAPATH COMPONENTS
2-2. 2-2.2 Learning Outcomes piral 2 2 Arithmetic Components and Their Efficient Implementations I know how to combine overflow and subtraction results to determine comparison results of both signed and
More informationOverview. EECS Components and Design Techniques for Digital Systems. Lec 16 Arithmetic II (Multiplication) Computer Number Systems.
Overview EE 15 - omponents and Design Techniques for Digital ystems Lec 16 Arithmetic II (Multiplication) Review of Addition Overflow Multiplication Further adder optimizations for multiplication LA in
More informationOptimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager
Optimizing for DirectX Graphics Richard Huddy European Developer Relations Manager Also on today from ATI... Start & End Time: 12:00pm 1:00pm Title: Precomputed Radiance Transfer and Spherical Harmonic
More informationMark Redekopp, All rights reserved. EE 352 Unit 8. HW Constructs
EE 352 Unit 8 HW Constructs Logic Circuits Combinational logic Perform a specific function (mapping of 2 n input combinations to desired output combinations) No internal state or feedback Given a set of
More informationECE484 VLSI Digital Circuits Fall Lecture 01: Introduction
ECE484 VLSI Digital Circuits Fall 2017 Lecture 01: Introduction Adapted from slides provided by Mary Jane Irwin. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] CSE477 L01 Introduction.1
More informationMarching Memory マーチングメモリ. UCAS-6 6 > Stanford > Imperial > Verify 中村維男 Based on Patent Application by Tadao Nakamura and Michael J.
UCAS-6 6 > Stanford > Imperial > Verify 2011 Marching Memory マーチングメモリ Tadao Nakamura 中村維男 Based on Patent Application by Tadao Nakamura and Michael J. Flynn 1 Copyright 2010 Tadao Nakamura C-M-C Computer
More informationElettronica T moduli I e II
Elettronica T moduli I e II Docenti: Massimo Lanzoni, Igor Loi Massimo.lanzoni@unibo.it igor.loi@unibo.it A.A. 2015/2016 Scheduling MOD 1 (Prof. Loi) Weeks 39,40,41,42, 43,44» MOS transistors» Digital
More informationOptimizing DirectX Graphics. Richard Huddy European Developer Relations Manager
Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most
More information