Components of a Computer


Lorin Harvey
5 years ago

CS 61C: Great Ideas in Computer Architecture (Machine Structures)
Caches Part I
Instructors: Krste Asanovic & Vladimir Stojanovic
http://inst.eecs.berkeley.edu/~cs61c/


New-School Machine Structures (It's a bit more complicated!)
  Software:
    Parallel Requests: assigned to computer, e.g., search "Katz"
    Parallel Threads: assigned to core, e.g., lookup, ads
    Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
    Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words
    Hardware descriptions: all gates @ one time
    Programming Languages
  Harness Parallelism & Achieve High Performance
  Hardware: Warehouse Scale Computer; Smart Phone; Computer; Core; Memory (Cache); Input/Output; Instruction Unit(s); Functional Unit(s) (A0+B0, A1+B1, A2+B2, A3+B3); Cache Memory; Logic Gates
  How do we know?

Components of a Computer
  Processor: Control; Datapath (PC, Registers, Arithmetic & Logic Unit (ALU))
  Memory: Program, Data (Bytes)
  Input, Output
  Processor-Memory Interface: Enable?, Read/Write, Address, Write Data, Read Data
  I/O-Memory Interfaces

Processor-DRAM Gap (latency)
  Plot of performance versus time: µProc improves 60%/year and DRAM about 7%/year, so the Processor-Memory Performance Gap grows about 50%/year.
  1980 microprocessor executes ~one instruction in same time as DRAM access
  2015 microprocessor executes ~1000 instructions in same time as DRAM access

Big Idea: Memory Hierarchy
  Processor, then Level 1, Level 2, Level 3, ..., Level n (inner levels to outer levels)
  Increasing distance from processor, decreasing speed
  Size of memory grows at each level
  As we move to outer levels the latency goes up and price per bit goes down. Why?

Library Analogy
  Writing a report based on books on reserve, e.g., works of J.D. Salinger
  Go to library to get reserved book and place on desk in library
  If need more, check them out and keep on desk
  But don't return earlier books since might need them
  You hope this collection of ~10 books on desk is enough to write report, despite 10 being only 0.00001% of books in UC Berkeley libraries

Real Memory Reference Patterns
  Plot of memory address (one dot per access) versus time.
  Source: Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)

Big Idea: Locality
  Temporal Locality (locality in time)
    Go back to same book on desktop multiple times
    If a memory location is referenced, then it will tend to be referenced again soon
  Spatial Locality (locality in space)
    When go to book shelf, pick up multiple books on J.D. Salinger since library stores related books together
    If a memory location is referenced, the locations with nearby addresses will tend to be referenced soon

Memory Reference Patterns
  The same plot of memory address (one dot per access) versus time, with regions marked Spatial Locality and Temporal Locality.

Principle of Locality
  Principle of Locality: Programs access small portion of address space at any instant of time
  What program structures lead to temporal and spatial locality in instruction accesses?
  In data accesses?

Memory Reference Patterns
  Sketch of address versus time with three regions: instruction fetches, stack accesses, data accesses.
  Labeled events: n loop iterations, subroutine call, subroutine return, argument access, vector access, scalar accesses.

Cache Philosophy
  Programmer-invisible hardware mechanism to give illusion of speed of fastest memory with size of largest memory
  Works fine even if programmer has no idea what a cache is
  However, performance-oriented programmers today sometimes "reverse engineer" cache design to design data structures to match cache
  We'll do that in Project 3
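As a rough illustration of why data layout matters for spatial locality, the sketch below builds the byte-address trace for summing a 2-D array in two loop orders and measures how often consecutive accesses fall in the same block. The function names, the 64-byte block size, and the 64 x 64 array of 4-byte words are all assumptions chosen for the example, not values from the lecture.

```python
def trace(rows, cols, row_major=True, word_bytes=4):
    """Byte addresses touched when summing a rows x cols array stored row by row."""
    if row_major:
        order = [(r, c) for r in range(rows) for c in range(cols)]
    else:
        order = [(r, c) for c in range(cols) for r in range(rows)]
    return [(r * cols + c) * word_bytes for r, c in order]


def same_block_fraction(addresses, block_bytes=64):
    """Fraction of accesses that land in the same block as the previous access."""
    pairs = list(zip(addresses, addresses[1:]))
    same = sum(1 for prev, cur in pairs
               if prev // block_bytes == cur // block_bytes)
    return same / len(pairs)


# Hypothetical 64 x 64 array of 4-byte words, 64-byte blocks (assumed sizes).
row_order = same_block_fraction(trace(64, 64, row_major=True))
col_order = same_block_fraction(trace(64, 64, row_major=False))
print(f"row order: {row_order:.3f}, column order: {col_order:.3f}")
```

Walking the array in storage order keeps most consecutive accesses inside one block, while the column-order walk jumps a full row each time and never reuses the block it just touched.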

Adding Cache to Computer
  Diagram: the Components of a Computer figure (Processor with Control and Datapath, Memory, Input, Output, Processor-Memory Interface, I/O-Memory Interfaces) with a Cache placed between Processor and Memory.

Memory Access without Cache
  Load word instruction: lw $t0,0($t1)
  $t1 contains 1022ten, Memory[1022] = 99
  1. Processor issues address 1022ten to Memory
  2. Memory reads word at address 1022ten (99)
  3. Memory sends 99 to Processor
  4. Processor loads 99 into register $t0

Memory Access with Cache
  Load word instruction: lw $t0,0($t1)
  $t1 contains 1022ten, Memory[1022] = 99
  With cache (similar to a hash):
  1. Processor issues address 1022ten to Cache
  2. Cache checks to see if it has a copy of data at address 1022ten
    2a. If finds a match (Hit): cache reads 99, sends to processor
    2b. No match (Miss): cache sends address 1022 to Memory
      I. Memory reads 99 at address 1022ten
      II. Memory sends 99 to Cache
      III. Cache replaces word with new 99
      IV. Cache sends 99 to processor
  3. Processor loads 99 into register $t0

Administrivia
  Midterm results out last week
  Project - due Sunday March 5 th, :59PM
  Use pinned Piazza threads! We'll penalize those who ask, but don't search!
  Guerrilla sections starting this weekend
  Optional sections, focus on lecture/exam material, not projects
  Vote for time on Piazza poll

Midterm Score Distribution
  Mean: 56 Min: 65 Max: 900 Median: 580 Std Dev: 5

In the News: RowHammer Exploit
  Paper: "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors", Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, Onur Mutlu (Carnegie Mellon University, Intel Labs)
  CMU + Intel researchers found commercial DRAM chips susceptible to neighboring bits flipping if one row of memory accessed frequently
  Google engineers figured out how to use this to gain root access on a machine!
  Almost all laptops susceptible, but server ECC memory helps reduce impact
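The load-word walk-through with a cache (processor issues an address, cache checks for a copy, then hit or miss) can be sketched in a few lines. This is only a toy model under stated assumptions: the cache is a Python dict with unlimited capacity, and the names `memory`, `cache`, and `load_word` are invented for the example. Only the address 1022 and the value 99 come from the slides.

```python
memory = {1022: 99}   # Memory[1022] = 99, as in the walk-through
cache = {}            # address -> data; starts empty (assumed unlimited size)


def load_word(address):
    """Return (data, outcome) for one load, following the hit/miss steps."""
    if address in cache:              # 2a. match found: Hit
        return cache[address], "hit"
    data = memory[address]            # 2b.I   Memory reads the word
    cache[address] = data             # 2b.III Cache stores the new word
    return data, "miss"               # 2b.IV  Cache sends data to Processor


first = load_word(1022)    # first access has to go to memory
second = load_word(1022)   # second access is served by the cache
print(first, second)
```

The second access to the same address is a hit, which is the temporal-locality payoff the cache is built to capture.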

Cache Tags
  Need way to tell if have copy of location in memory so that can decide on hit or miss
  On cache miss, put memory address of block in "tag address" of cache block
  1022 placed in tag next to data from memory (99)
  The other tag/data entries shown come from earlier instructions

Anatomy of a 16 Byte Cache, 4 Byte Block
  Operations:
  1. Cache Hit
  2. Cache Miss
  3. Refill cache from memory
  Cache needs Address Tags to decide if Processor Address is a Hit or Miss
  Compares all 4 tags
  Diagram: Processor and Cache exchange a 32-bit Address and 32-bit Data; Cache and Memory do the same.

Cache Replacement
  Suppose processor now requests a location that is not in the cache
  Doesn't match any cache block, so must "evict" one resident block to make room
  Which block to evict?
  Replace "victim" with new memory block at the requested address

Block Must be Aligned in Memory
  Word blocks are aligned, so binary address of all words in cache always ends in 00two
  How to take advantage of this to save hardware and energy?
  Don't need to compare last 2 bits of 32-bit byte address (comparator can be narrower)
  => Don't need to store last 2 bits of 32-bit byte address in Cache Tag (Tag can be narrower)

Anatomy of a 32B Cache, 8B Block
  Blocks must be aligned in pairs, otherwise could get same word twice in cache
  Tags only have even-numbered words
  Last 3 bits of address always 000two
  Tags, comparators can be narrower
  Can get hit for either word in block

Hardware Cost of Cache
  Need to compare every tag to the Processor address
  Comparators are expensive
  Optimization: 2 sets => ½ comparators
  1 Address bit selects which set
  Diagram: cache split into Set 0 and Set 1, each holding Tag/Data entries.
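The 16-byte, 4-byte-block cache described above, where every tag is compared on each access, can be modeled roughly as follows. The slides leave "which block to evict?" open, so the first-in-first-out choice here is an assumption, as are the class name and the toy `memory` list. The tag drops the 2 low address bits, as the alignment slide suggests.

```python
from collections import deque

BLOCK_BYTES = 4   # one-word blocks
NUM_BLOCKS = 4    # 16-byte cache


class FullyAssociativeCache:
    """Toy fully associative cache; FIFO eviction is an assumed policy."""

    def __init__(self):
        self.blocks = deque()   # (tag, data) pairs, oldest first

    def access(self, address, memory):
        tag = address // BLOCK_BYTES            # low 2 bits are not stored
        for stored_tag, data in self.blocks:    # compare against every tag
            if stored_tag == tag:
                return data, True               # hit
        if len(self.blocks) == NUM_BLOCKS:
            self.blocks.popleft()               # evict the oldest block
        data = memory[tag]                      # refill from memory
        self.blocks.append((tag, data))
        return data, False                      # miss


memory = list(range(100, 164))   # hypothetical contents: word i holds 100 + i
cache = FullyAssociativeCache()
results = [cache.access(a, memory) for a in (0, 4, 8, 12, 0, 5, 16, 0)]
print([hit for _, hit in results])
```

Address 5 hits because it lies in the same word as address 4, and the final access to address 0 misses because that block was the victim when address 16 was loaded.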

Processor Address Fields used by Cache Controller
  Block Offset: Byte address within block
  Set Index: Selects which set
  Tag: Remaining portion of processor address
  Processor Address (32 bits total): Tag | Set Index | Block Offset
  Size of Index = log2(number of sets)
  Size of Tag = Address size - Size of Index - log2(number of bytes/block)

What is limit to number of sets?
  Can save more comparators if have more than 2 sets
  Limit: As Many Sets as Cache Blocks, which only needs one comparator!
  Called "Direct-Mapped" Design
  Address fields: Tag | Index | Block Offset

Mapping a 6-bit Memory Address
  Address fields, high to low: Mem Block Within $ Block (Tag) | Block Within $ (Index) | Byte Offset Within Block (e.g., Word)
  In example, block size is 4 bytes/1 word (it could be multi-word)
  Memory and cache blocks are the same size, unit of transfer between memory and cache
  # Memory blocks >> # Cache blocks
    16 Memory blocks/16 words/64 bytes/6 bits to address all bytes
    4 Cache blocks, 4 bytes (1 word) per block
    4 Memory blocks map to each cache block
  Byte within block: low order two bits, ignore! (nothing smaller than a block)
  Memory block to cache block, aka index: middle two bits
  Which memory block is in a given cache block, aka tag: top two bits

One More Detail: Valid Bit
  When start a new program, cache does not have valid information for this program
  Need an indicator whether this tag entry is valid for this program
  Add a "valid bit" to the cache tag entry
    0 => cache miss, even if by chance, address = tag
    1 => cache hit, if processor address = tag

Caching: A Simple First Example
  Cache: four entries (Index 00, 01, 10, 11), each with Valid, Tag, Data
  Main Memory: 0000xx, 0001xx, 0010xx, 0011xx, 0100xx, 0101xx, 0110xx, 0111xx, 1000xx, 1001xx, 1010xx, 1011xx, 1100xx, 1101xx, 1110xx, 1111xx
  One word blocks. Two low order bits (xx) define the byte in the block (32b words)
  Q: Is the memory block in cache?
  Compare the cache tag to the high-order 2 memory address bits to tell if the memory block is in the cache (provided valid bit is set)
  Q: Where in the cache is the mem block?
  Use next 2 low-order memory address bits, the index, to determine which cache block (i.e., modulo the number of blocks in the cache)

Direct-Mapped Cache Example
  One word blocks, cache size = 1K words (or 4KB)
  Valid bit ensures something useful in cache for this index
  Compare Tag with upper part of Address to see if a Hit
  Read data from cache instead of memory if a Hit
  What kind of locality are we taking advantage of?
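The tag/index/offset split and the valid-bit check can be sketched as below. The function and class names are invented for illustration, and the code assumes the number of sets and the block size are powers of two. The 6-bit, 4-block configuration mirrors the simple first example, and the 1K-word configuration mirrors the direct-mapped example.

```python
def split_address(address, num_sets, block_bytes, address_bits=32):
    """Return (tag, index, offset, tag_bits); sizes must be powers of two."""
    offset_bits = block_bytes.bit_length() - 1   # log2(bytes per block)
    index_bits = num_sets.bit_length() - 1       # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    offset = address & (block_bytes - 1)
    index = (address >> offset_bits) & (num_sets - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset, tag_bits


class DirectMappedCache:
    """Tracks only valid bits and tags, enough to decide hit or miss."""

    def __init__(self, num_blocks, block_bytes=4, address_bits=32):
        self.num_blocks = num_blocks
        self.block_bytes = block_bytes
        self.address_bits = address_bits
        self.valid = [False] * num_blocks
        self.tags = [0] * num_blocks

    def access(self, address):
        tag, index, _, _ = split_address(
            address, self.num_blocks, self.block_bytes, self.address_bits)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:                      # refill: record the new tag
            self.valid[index] = True
            self.tags[index] = tag
        return hit


small = split_address(0b110110, num_sets=4, block_bytes=4, address_bits=6)
big = split_address(0x12345678, num_sets=1024, block_bytes=4)
cache = DirectMappedCache(4, address_bits=6)
outcomes = [cache.access(a) for a in (0b000000, 0b000000, 0b010000, 0b000000)]
print(small, big, outcomes)
```

The very first access to address 0 misses even though the stored tag happens to be 0, because the valid bit is still clear. The last access misses because address 010000 maps to the same index and displaced the block.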

Multiword-Block Direct-Mapped Cache
  Four words/block, cache size = 1K words
  Address fields: Tag | Index | Block offset | Byte offset
  What kind of locality are we taking advantage of?

Cache Names for Each Organization
  "Fully Associative": Block can go anywhere
    First design in lecture
    Note: No Index field, but 1 comparator/block
  "Direct Mapped": Block goes one place
    Note: Only 1 comparator
    Number of sets = number of blocks
  "N-way Set Associative": N places for a block
    Number of sets = number of blocks / N
    Fully Associative: N = number of blocks
    Direct Mapped: N = 1

Range of Set-Associative Caches
  For a fixed-size cache, each increase by a factor of 2 in associativity doubles the number of blocks per set (i.e., the number of "ways") and halves the number of sets
  This decreases the size of the index by 1 bit and increases the size of the tag by 1 bit
  Address fields: Tag | Index | Block offset; more associativity (more ways) moves the Tag/Index boundary toward the Index
  Note: IBM persists in calling sets "ways" and ways "sets". They're wrong.

Clickers/Peer Instruction
  For a cache with constant total capacity, if we increase the number of ways by a factor of 2, which statement is false:
  A: The number of sets could be doubled
  B: The tag width could decrease
  C: The number of tags could stay the same
  D: The block size could be halved
  E: Tag width must increase

Typical Memory Hierarchy
  On-Chip Components: Control, Datapath, RegFile, Instr Cache, Data Cache
  Then: Second-Level Cache (SRAM), Third-Level Cache (SRAM), Main Memory (DRAM), Secondary Memory (Disk or Flash)
  Speed (cycles), inner to outer: ½'s, 1's, 10's, 100's, 1,000,000's
  Size (bytes), inner to outer: 100's, 10K's, M's, G's, T's
  Cost/bit: highest at the inner levels, lowest at the outer levels
  Principle of locality + memory hierarchy presents programmer with as much memory as is available in the cheapest technology at the speed offered by the fastest technology

Handling Stores with Write-Through
  Store instructions write to memory, changing values
  Need to make sure cache and memory have same values on writes: 2 policies
  1) Write-Through Policy: write cache and write through the cache to memory
    Every write eventually gets to memory
    Too slow, so include Write Buffer to allow processor to continue once data in Buffer
    Buffer updates memory in parallel to processor
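The rule from the set-associative slide (doubling the ways halves the sets, shrinks the index by one bit, and grows the tag by one bit) can be checked numerically. The sketch below assumes a 4KB cache with 4-byte blocks and 32-bit addresses. The function name `organization` is made up for the example.

```python
def organization(cache_bytes, block_bytes, ways, address_bits=32):
    """Return (num_sets, index_bits, tag_bits) for an N-way cache.

    Assumes all sizes are powers of two.
    """
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways                 # sets = blocks / N
    offset_bits = block_bytes.bit_length() - 1    # log2(bytes per block)
    index_bits = num_sets.bit_length() - 1        # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return num_sets, index_bits, tag_bits


# 4KB cache, one-word blocks: direct mapped, 2-way, 4-way, fully associative.
table = [(ways,) + organization(4096, 4, ways) for ways in (1, 2, 4, 1024)]
for ways, sets, index_bits, tag_bits in table:
    print(f"{ways:5d} ways: {sets:5d} sets, index {index_bits:2d} bits, tag {tag_bits:2d} bits")
```

With 1024 ways the cache has a single set, so the index field disappears entirely, which matches the "no Index field" note for the fully associative design.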

Write-Through Cache
  Write both values in cache and in memory
  Write buffer stops CPU from stalling if memory cannot keep up
  Write buffer may have multiple entries to absorb bursts of writes
  What if store misses in cache?
  Diagram: Processor, Cache, and Memory connected by 32-bit Address and 32-bit Data paths, with a Write Buffer (Addr, Data) between Cache and Memory.

Handling Stores with Write-Back
  2) Write-Back Policy: write only to cache and then write cache block back to memory when evict block from cache
    Writes collected in cache, only single write to memory per block
    Include bit to see if wrote to block or not, and then only write back if bit is set
    Called "Dirty" bit (writing makes it "dirty")

Write-Back Cache
  Store/cache hit: write data in cache only & set dirty bit
    Memory has stale value
  Store/cache miss: read data from memory, then update and set dirty bit
    "Write-allocate" policy
  Load/cache hit: use value from cache
  On any miss, write back evicted block, only if dirty. Update cache with new block and clear dirty bit.
  Diagram: as for the write-through cache, with a Dirty bit (D) beside each cache block.

Write-Through vs Write-Back
  Write-Through:
    Simpler control logic
    More predictable timing simplifies processor control logic
    Easier to make reliable, since memory always has copy of data (big idea: Redundancy!)
  Write-Back:
    More complex control logic
    More variable timing (0, 1, 2 memory accesses per cache access)
    Usually reduces write traffic
    Harder to make reliable, sometimes cache has only copy of data

And In Conclusion,
  Principle of Locality for Libraries/Computer Memory
  Hierarchy of Memories (speed/size/cost per bit) to Exploit Locality
  Cache: copy of data from a lower level in memory hierarchy
  Direct Mapped to find block in cache using Tag field and Valid bit for Hit
  Cache design choice: Write-Through vs Write-Back
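To make the "usually reduces write traffic" point concrete, the sketch below counts how many writes reach memory under each policy for the same store trace. It is a simplified model under several assumptions: a direct-mapped cache with one-word blocks, word-aligned store addresses, write-allocate on a write-back miss, and no final flush of dirty blocks. The function name and the trace are invented for the example.

```python
def memory_writes(stores, num_blocks, policy):
    """Count writes that reach memory for a trace of store addresses."""
    tags = [None] * num_blocks      # None marks an invalid (empty) block
    dirty = [False] * num_blocks
    writes = 0
    for address in stores:
        block = address // 4                        # one-word (4-byte) blocks
        index, tag = block % num_blocks, block // num_blocks
        if policy == "write-through":
            writes += 1                             # every store goes to memory
            continue
        if tags[index] != tag:                      # write-back: store miss
            if dirty[index]:
                writes += 1                         # write back evicted dirty block
            tags[index] = tag                       # write-allocate the new block
        dirty[index] = True                         # the store makes the block dirty
    return writes


# Ten stores to one word, then one store to a conflicting block (assumed trace).
stores = [0] * 10 + [16]
through = memory_writes(stores, num_blocks=4, policy="write-through")
back = memory_writes(stores, num_blocks=4, policy="write-back")
print(through, back)
```

Under write-back the ten stores to the same word collapse into one memory write at eviction time. The block written last is still dirty in the cache when the trace ends, which is the "cache has only copy of data" reliability concern.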
