Data Cache Final Project Report ECE251: VLSI Systems Design UCI Spring, 2000
June 15, 2000
Jinfeng Liu, Yi Deng
Project Summary

In this project, we designed and implemented a direct-mapped, write-back data cache in TSMC 0.18µm technology. The design was carried out in the Magic layout editor using the SCN6M_SUBM.10 technology file. The data cache has a total capacity of 512 bytes with four words per cache line. All cells in the cache share one read port and one write port through a MUX. In addition, separate read and write address decoders are used to support simultaneous read and write operations. Logic function verification is performed with IRSIM, whereas a more comprehensive simulation of the data cache is done with HSPICE. Preliminary results show that our design operates correctly with a 3ns cycle time.
1. OBJECTIVE

This project is to design and implement a data cache system using 0.18µm technology. The specifications are:

o Direct-mapped
o Total 32 cache lines
o Four words (128 bits) per cache line
o Write-back
o One read port and one write port with separate address decoders
o 32-bit data input, 32-bit data output
o Initialize valid bit to 0, dirty bit to 1

The project structure is shown in Figure 1. Two separate read and write address decoders and ports are needed in order to support simultaneous read and write operations. In addition, comparators compare the tag in the address with the tag stored in the cache and generate the HIT signal for read and write accordingly. Furthermore, the 4-1 MUX serves as a selector, which picks one of the four words read from the cells according to the word offset bits, i.e. bits 2 and 3 of the address (A2 and A3). Similarly, the 1-4 DEMUX steers the 32-bit data input to one of the four data words in each cache line according to the offset bits in the write address. The byte offset bits (bit 0 and bit 1 of the 32-bit address) are not used, since the basic unit for read and write operations is one 32-bit word.
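The address fields implied by these specifications can be sketched as follows. This is a minimal illustration, not part of the original design files: the report fixes the byte offset (bits 0-1) and word offset (bits 2-3), while placing the 5-bit line index directly above them (bits 4-8) and the 23-bit tag in the remaining bits is an assumption. The function and constant names are hypothetical.

```python
# Field widths: byte and word offsets come from the report; the index
# position is an assumption (32 lines -> 5 bits directly above the word offset).
BYTE_OFFSET_BITS = 2   # bits 0-1, unused (accesses are word-granular)
WORD_OFFSET_BITS = 2   # bits 2-3, selects one of four words in a line
INDEX_BITS = 5         # assumed bits 4-8, selects one of 32 lines
TAG_BITS = 32 - INDEX_BITS - WORD_OFFSET_BITS - BYTE_OFFSET_BITS

def split_address(addr):
    """Split a 32-bit byte address into (tag, index, word_offset)."""
    addr &= 0xFFFFFFFF
    word_offset = (addr >> BYTE_OFFSET_BITS) & ((1 << WORD_OFFSET_BITS) - 1)
    index = (addr >> (BYTE_OFFSET_BITS + WORD_OFFSET_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_OFFSET_BITS + WORD_OFFSET_BITS + INDEX_BITS)
    return tag, index, word_offset

# Total capacity: lines x words per line x bytes per word.
CAPACITY_BYTES = (1 << INDEX_BITS) * (1 << WORD_OFFSET_BITS) * (1 << BYTE_OFFSET_BITS)
```

Under these assumptions the tag is 23 bits wide and the capacity works out to 32 x 4 x 4 = 512 bytes, matching the stated total.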
Fig. 1 System overview

2. PROCEDURES

We started with the design of the individual function cells, such as the static RAM, MUX, COMPARATOR, and row decoder. After finishing the logic design, we implemented and simulated the individual function cells. We then put them together to form a single cache entry with four data words. Only when the simulation result of this one-line data cache was correct did we proceed to put thirty-two lines together to form the 512-byte, direct-mapped, write-back data cache.

2.1. STATIC RAM

Figure 2 shows the mask-level layout of our static RAM. We used 6 transistors to implement the static RAM cell. The cell stores data on the gate of the storage transistor. Separate read and write control lines are used. The bit line is used to write, while the complementary bit line (!bl) is used to read.
Fig. 2 Structure of SRAM cell

2.2. STATIC RAM WRITE AND READ

The objective of the RAM write operation is to apply voltages to the RAM cell such that it will flip state. There are two kinds of WRITE operations: memory WRITE and processor WRITE. For a memory WRITE, the write word line is asserted by the row decoder. For a processor WRITE, the write word line is asserted by the row decoder if the comparator generates a write HIT signal. The bit line is then driven to either VDD or VSS depending on the value that is to be stored in the cell. Figure 3.a shows a plot of waveforms during a WRITE operation. The state retained in the SRAM cell (q) follows the input data appearing on the write bit line during the write stage (WWL is high).

To read the RAM cell, the read word line is asserted by the read row decoder. All the bit lines are then pulled either up or down according to the values stored in the RAM cells in this row. Thirty-two MUXes are used. Each MUX selects one bit out of four, which we discuss in detail in section 2.4. The comparator generates the HIT or MISS signal by comparing the tag in the address and
the tag stored in the cache. Figure 3.b shows a plot of waveforms during a READ operation. The correct value of the SRAM cell is read during the read stage, while RWL is asserted.

Fig. 3.a SRAM write operation
Fig. 3.b SRAM read operation

We use this basic cell structure to implement all data bits. However, there are different issues for data, tag, valid and dirty bits. We will address these issues later.
2.3. ADDRESS DECODER

The simplest row decoder is an AND gate. To avoid the rapidly increasing delay of a large fan-in AND gate, we implemented several different address decoders. These decoders are illustrated in Figure 4. Although the 5-NAND decoder is expected to deliver the worst performance, we discovered that when a load is attached to the output, the delay time does not differ much among these configurations. This implies that the delay time is dominated by the load rather than by the gate delay itself. We chose the NOR-NAND decoder in our implementation.

Fig. 4.a Cascaded 2-NAND and 3-NAND decoder
Fig. 4.b 3-NOR and 3-NAND decoder
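The idea of replacing one wide AND gate with smaller gates can be illustrated with a small logic model. This is only a sketch: the exact gate arrangement of Figure 4 is not recoverable from the text, so the two-level structure below (a 2-input NAND and a 3-input NAND feeding a NOR, which together equal a 5-input AND) is an assumption, and the function names are hypothetical.

```python
def nand(*xs):
    return int(not all(xs))

def nor(*xs):
    return int(not any(xs))

def line_select(addr_bits, line):
    """One decoder row: returns 1 when the 5 index bits equal `line`."""
    # Pick the true or complemented literal for each bit of the target line.
    lits = [b if (line >> i) & 1 else 1 - b for i, b in enumerate(addr_bits)]
    # Two-level form: NAND2 and NAND3 feed a NOR, equivalent to a 5-input AND.
    return nor(nand(lits[0], lits[1]), nand(lits[2], lits[3], lits[4]))

def decode(index):
    """5-to-32 decoder: one-hot list of the 32 addr_decode signals."""
    bits = [(index >> i) & 1 for i in range(5)]
    return [line_select(bits, line) for line in range(32)]
```

Each of the 32 rows uses the same small gates with a different mix of true and complemented address bits, so exactly one addr_decode output is high for any index.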
Fig. 4.c 5-NAND decoder

2.4. MUX

To select one of the four words, we used thirty-two MUXes. Each MUX selects one bit among four according to the two word offset bits, which are bits 2 and 3 of the address (A2 and A3). Eight n-transistors are used to implement each MUX, as shown in Figure 5. Since data bits are read through !bl, the inverted data values are fed into the MUX. Based on the values of A3 and A2, which constitute the word offset field, one of the four data values is selected and then inverted to generate the data output.

Fig. 5.a 4-1 1-bit Mux schematic (inputs !Data0 to !Data3; select signals A3, !A3, A2, !A2; output Dataout)
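The selection described above can be modelled behaviourally as follows. This is a sketch at the logic level only and does not represent the pass-transistor circuit; the mapping of (A3, A2) to word number follows the Data00 to Data11 naming used in the word line table of section 2.7, and the function names are hypothetical.

```python
def mux4_inverted(nbl, a3, a2):
    """Select one of four inverted bits (!Data0..!Data3) and re-invert it."""
    # nbl[i] is the value on !bl for word i; (a3, a2) is the word offset.
    selected = nbl[(a3 << 1) | a2]
    # The output inverter restores the true data polarity.
    return 1 - selected

def read_word(line_words, a3, a2, width=32):
    """Apply 32 one-bit muxes in parallel to pick one 32-bit word from a line."""
    out = 0
    for bit in range(width):
        # The cells drive the complement of each stored bit onto !bl.
        nbl = [1 - ((w >> bit) & 1) for w in line_words]
        out |= mux4_inverted(nbl, a3, a2) << bit
    return out
```

For example, with a line holding four words, `read_word(words, 1, 0)` returns the third word (word offset 10).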
Fig. 5.b 4-1 1-bit Mux layout

2.5. COMPARATOR

We used twenty-three XOR gates, six four-input NOR gates and one six-input NAND gate to implement the comparator. To reduce the delay time, the six-input NAND is split into two levels of NAND logic. Each XOR gate takes one tag bit from the address and one from the corresponding cell as inputs. All the outputs from the XOR gates, along with the VALID signal, are routed to the six four-input NOR gates as inputs. The six outputs from the NOR gates are in turn routed to the NAND, which generates the HIT or MISS signal as output. Figure 6 shows the three basic XOR gates used in the COMPARATOR.

Fig. 6 XOR gate as comparator
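The gate counts above (23 XOR outputs plus VALID giving 24 inputs to six four-input NOR gates) can be checked with a small logic model. This is a sketch under two assumptions that the text does not state: VALID enters the NOR stage inverted, and the NAND output is treated as an active-high MISS that is inverted to obtain HIT. The function names are hypothetical.

```python
TAG_BITS = 23

def nor(*xs):
    return int(not any(xs))

def nand(*xs):
    return int(not all(xs))

def hit(addr_tag, stored_tag, valid):
    """Return 1 when the line is valid and all 23 tag bits match."""
    # 23 XOR gates: each output is 1 where the two tag bits differ.
    diffs = [((addr_tag >> i) & 1) ^ ((stored_tag >> i) & 1) for i in range(TAG_BITS)]
    # Assumption: VALID is fed in inverted, giving 24 inputs in total.
    inputs = diffs + [1 - valid]
    # Six four-input NOR gates: each is 1 when its four inputs are all 0.
    nors = [nor(*inputs[i:i + 4]) for i in range(0, 24, 4)]
    # The six-input NAND is 0 only when every group matched (assumed MISS).
    miss = nand(*nors)
    return 1 - miss
```

A single differing tag bit, or a cleared valid bit, forces one NOR output low and therefore a miss.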
2.6. TAG

Similar to the static RAM discussed in section 2.1, we used six transistors to implement the TAG cell. The distinction between a tag bit and a data bit is that during a CPU write stage, the tag bit is read instead of written. A MEM write, by contrast, updates all SRAM cells unconditionally. This difference also applies to the valid and dirty bits. The different operations are summarized in Table 1.

Table 1. Different operations on data, tag, valid and dirty bits

It can be seen from the table that the bit line of a TAG bit is shared between read (CPU write) and write (MEM write). An alternative approach would add another read port on the TAG bit dedicated to the CPU write operation. Our design shares the bit line. Therefore, the CPU write and MEM write operations must be isolated from each other on the bit line. Otherwise, during the CPU write stage, a wrong value may be put into the SRAM cell. We use a tri-state buffer as the input to
the bit line for the MEM write operation, and a standard 2-inverter output buffer is connected to the bit line for CPU write operations. The MEM write signal (mwr) is used to enable the tri-state buffer. This separates the CPU write stage from the MEM write stage and protects the correct value in the SRAM cell. The tri-state buffer structure is shown in Fig. 7. The TAG bit structure is shown in Fig. 8.

Fig. 7. Tri-state buffer (terminals: In, En, Out, Vdd, GND)
Fig. 8. TAG cell structure (signals: tagin, tagout, mwr, BL, !BL, WWL, RWL, Vdd, GND; SRAM cell of TAG)
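The bit-line sharing described above can be sketched behaviourally. This is an abstraction rather than a circuit model: the tri-state buffer is represented by returning `None` for high impedance, and the function names and return convention are hypothetical.

```python
def tristate(value, enable):
    """Tri-state buffer: drives `value` when enabled, else high impedance (None)."""
    return value if enable else None

def tag_bit_line(tagin, cell_value, mwr, wwl):
    """Resolve the shared tag bit line for one cycle.

    Returns (bit_line, new_cell_value).
    """
    driven = tristate(tagin, mwr)
    if not wwl:
        # Word line low: the cell is disconnected and keeps its value.
        return driven, cell_value
    if driven is not None:
        # MEM write: the enabled buffer overwrites the cell.
        return driven, driven
    # CPU write: buffer is off, so the cell's value appears on the bit line
    # and can be passed to the comparator through the output buffer.
    return cell_value, cell_value
```

With mwr low, the stored tag is never disturbed by tagin, which is the protection the tri-state buffer is meant to provide.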
Fig. 9 shows simulation results for a pair of tag bits in one cache line. TAG0 and TAG1 are written twice during MEM write stages (signal mwr) with different values (wtag0in and wtag1in). The MEM write operations are followed by a read (signal rwl) and a CPU write (wwl is high while mwr is low). The output signals demonstrate correct behavior at the read port (rtag0out, rtag1out) and the write port (wtag0out, wtag1out).

Fig. 9. Simulation result of a TAG cell
2.7. WORD LINE AND DEMUX LOGIC

Because different cells operate in different modes, the logic that asserts the word line is also implemented differently for each cell type. The DEMUX logic is combined into the word line logic using predecoded offset signals. One of the four data words in each cache line is selected by asserting its WWL according to the word offset field A3A2. Table 2 summarizes the word line logic combined with the predecoded DEMUX logic. mwr and pwr are the control signals for MEM write and CPU write. The address decoder generates the addr_decode signal to select one cache line.

          MEM write / WWL =                CPU write / WWL =                Read / RWL =
Valid     mwr · addr_decode                pwr · addr_decode                addr_decode
Dirty     mwr · addr_decode                pwr · addr_decode                addr_decode
Tag       mwr · addr_decode                pwr · addr_decode                addr_decode
Data00    mwr · addr_decode · !A3 · !A2    pwr · addr_decode · !A3 · !A2    addr_decode
Data01    mwr · addr_decode · !A3 · A2     pwr · addr_decode · !A3 · A2     addr_decode
Data10    mwr · addr_decode · A3 · !A2     pwr · addr_decode · A3 · !A2     addr_decode
Data11    mwr · addr_decode · A3 · A2      pwr · addr_decode · A3 · A2      addr_decode

Table 2. Word line logic on different operations

3. RESULTS

Based on the issues discussed in the previous section, the layout of one single cache line was drafted. We first completed one cache line with one port to perform either a read or a write operation. After the correct logic was verified, we added the second port and separated read and
write operations. The following simulation results are for a single cache line with one read port and one write port. A load is added to the output signals. All three speed models (FF, TT and SS) are simulated. The system clock cycle time is 3ns; in other words, the clock speed is 333 MHz.

Simulation 1: perform read and write operations; test hit, valid, dirty signals and data out

In this simulation, we simulated the conditions under which read/write hit/miss can occur on the single cache line. The read address is kept the same as the write address to check how soon a read operation can respond to a write in the same cycle. All three speed models, FF, TT and SS, are simulated. All results show that the operation can be completed within the 3ns cycle time.

Operation                                        Result                                          Rhit  Wvalidout  Rvalidout  Wdirtyout  Rdirtyout
reset                                            X                                               X     X          X          X          X
MEM write to ffff, Datain = 1; Read from ffff    Write to word3, Data = 1; Read hit*             1*    1          1*         0          0*
MEM write to ffbf, Datain = 0; Read from ffff    Not write to this line; Read hit                1     X          1          X          1
CPU write to ffff, Datain = 0; Read from ffff    Write hit to word3, Data = 0; Read hit*         1*    1          1*         1          1*
CPU write to 001f, Datain = 0; Read from ffff    Write miss by conflict; Read miss by conflict   0     X          X          X          X
CPU write to ffff, Datain = 1; Read from ffff    Write hit to word3, Data = 1; Read hit*         1*    1          1*         1          1*
MEM write to ffff, Datain = 0; Read from ffff    Write to word3, Data = 0; Read hit*             1*    1          1*         0          0*

Remaining rows: wlinesel X; rlinesel X; Whit X X X X; Dataout X X 1 0

Table 3. Test vector of simulation 1 and the truth table of some signals
* Reads are performed on the same cell while it is being written. This should not be allowed in reality, but here it is used to observe how soon the output can keep up with the input in the same cycle.

a. Use TT configuration.
All related signals are listed below. Although we perform some illegal operations (reading from the same cell that is being written in the same cycle), the output data still matches the input in the same cycle. The logic of all signals agrees with the truth table. A delay of around 1ns to 1.5ns can be seen on the read data output. The delay of the write operation also contributes to this delay, since the data is written in the same cycle. The average power consumption is 7mW, while the peak power is 44mW.
Fig. 10. Simulation 1 results of TT configuration

b. Use FF configuration.

Delay time is reduced (around 0.5ns). All logic is correct. The power consumption increases to 7.7mW average, 60mW at peak.
Fig. 11. Simulation result of FF model

c. Use SS configuration.

Severe delay can be observed. Sometimes the delay time is around 2ns in a 3ns cycle time. The logic is still correct, but it might not be safe to implement our design under slow (SS) process conditions. The average power consumption is 6mW, and the peak power is 43mW.

Fig. 12. Simulation result of SS model
Simulation 2: read and write at the same time from/to different locations

In this simulation, we only consider valid writes and reads. In 4 consecutive cycles, MEM write updates word0, word1, word2 and word3 in the same line. The read operation validates the written value in the next cycle. The scenario of this simulation is summarized in Table 4. We only use the TT model for simulation. A 2-inverter load is attached to the output.

Operation                                        Result
reset                                            X
MEM write to fffc, Datain = 1; Read from fffc    Write to word0, Data = 1; Read hit*
MEM write to fffd, Datain = 0; Read from fffc    Write to word1, Data = 0; Read hit, Data = 1
MEM write to fffe, Datain = 1; Read from fffd    Write to word2, Data = 1; Read hit, Data = 0
MEM write to ffff, Datain = 0; Read from fffe    Write to word3, Data = 0; Read hit, Data = 0

Signal rows: wlinesel X; rlinesel X 1*; Whit X X X X X; Rhit X 1*; Wvalidout X; Rvalidout X 1*; Wdirtyout X; Rdirtyout X 1*; Dataout X 1*

Table 4. Operations in simulation 2.
* read/write in the same cycle

The simulation result shows the correct logical sequence. The delay time is 0.5ns to 1ns in a 3ns cycle time. The average power is 6.4mW. The peak power is 46mW.
Fig. 13. Simulation result of simulation 2, TT model

After we validated the correct functionality of the single cache line, we completed the layout of the whole 32-line cache. However, we had trouble running the simulation. Each time, Spice stops executing after 2 or 3 hours because some nodes fail to converge. Since it takes a long time before the error appears, we were not able to continue the simulation in Spice. This problem also occurred in our simulation of a single line. We return to this issue in the discussion section.

4. DISCUSSION

In the implementation, we found that the most critical factor affecting the correctness of the logic is the difference in delay between the word line and bit line paths. In most cases, the word line signal arrives at and leaves the SRAM cell after the corresponding bit line signal that carries the data, so wrong data is stored in the SRAM cell. Fig. 14 illustrates this behavior.
Fig. 14. Different delay times affect the logic (signals shown: control signal, data, word line, bit line, clocked control signal; annotation: wrong data will be written to the SRAM cell)

We tried different approaches to solve this problem.

1. We added delay circuits to the data signals that generate the bit line signal, hoping that the data on the bit line would remain valid until after the word line signal drops. This scheme failed: since the word line signal is distributed to many SRAM cells, the required delay is long, and we could not effectively add that much delay to the data signals.
2. We used control signals to turn on and off a transmission gate that passes the data signal to the bit line, hoping that the word line signal would drop as the control signal closes the transmission gate, so that the data signal would remain longer on the bit line. This also malfunctioned due to unpredictable factors that affect the delay time.
3. We had to take a conservative approach: we clock the control signals to produce a short pulse on the word line that always turns on the SRAM cell while the data on the bit line is valid. This scheme works well in our implementation. It is also quite
tolerant to different delay paths: an error occurs only if the clocked control signal lags the data signal on the bit line by more than half a clock cycle. Note that the data signal also takes time to appear on the bit line, so this worst case should be rare. Our implementation works well with a 3ns cycle time; that is, a short pulse of around 1.5ns on the word line makes the system function correctly. However, with this approach we may not be able to achieve a higher clock rate, because clock skew may shrink the short pulse on the word line to a spike. For a 3ns cycle time, this scheme is an acceptable solution.

To avoid long delays, we used larger transistors for every signal that drives many gates over long wires. We also need such large transistors to limit clock skew, since skew would undermine the basis of our design. The simulation results indicate that our effort to reduce the delay time is quite effective.

We had trouble running the simulation for the whole cache system. Spice reports non-convergent nodes after running for 2 or 3 hours. This also happened when we simulated the single cache line. We previously had a smaller SRAM cell that was validated by simulation. When we ran Spice on the whole cache line, the circuit could not converge, so we had to give up our original SRAM cell design. The same problem occurred again when we moved from a validated single line to the whole cache system.

5. CONCLUSIONS

We tried different implementations of the proposed cache system. We found that the delay time on different paths can have a critical impact on the performance of a VLSI system, as well as on the correctness of the logic functions.
In this project, we tried two ways to reduce the undesirable delay time. First, we made some driving signals stronger by enlarging the transistor size at the output. We found this scheme quite effective when a driving signal has a moderate load, such as some internal logic operations on address bits (A3, A2, etc.), control signals (mwr, pwr, hit), and the clock. On the other hand, for signals that are broadcast over long wires, the delay time seems to be bounded by a lower limit (caused by the large load and long wire) that cannot be further reduced by enlarging transistor size. Examples of these signals are the word line and bit line signals. In our first approach to counter the different delay paths, we put more driving power on the word line and delayed the bit line signal, but the improvement was negligible. Our decision was to cut the long delay short by clocking the word line signals. This implementation functions well with a targeted 3ns clock cycle time.
The Magic file for the whole cache system is located in ~/jinfengl/ece251/project

The tasks in this project were distributed as follows:

Task                              Assigned to
System design                     both
SRAM cell design                  both
Address decoder                   Yi Deng
Word line and bit line logic      Jinfeng Liu
Tag comparator                    Yi Deng
Component simulation              Yi Deng
Read port layout                  Yi Deng
Write port layout                 Jinfeng Liu
Full layout for whole cache       both
Simulation for single line        Jinfeng Liu
Report                            both
www..org 1 Design of Low Power Wide Gates used in Register File and Tag Comparator Isac Daimary 1, Mohammed Aneesh 2 1,2 Department of Electronics Engineering, Pondicherry University Pondicherry, 605014,
More informationEECS150 - Digital Design Lecture 16 Memory 1
EECS150 - Digital Design Lecture 16 Memory 1 March 13, 2003 John Wawrzynek Spring 2003 EECS150 - Lec16-mem1 Page 1 Memory Basics Uses: Whenever a large collection of state elements is required. data &
More informationCpE 442. Memory System
CpE 442 Memory System CPE 442 memory.1 Outline of Today s Lecture Recap and Introduction (5 minutes) Memory System: the BIG Picture? (15 minutes) Memory Technology: SRAM and Register File (25 minutes)
More informationCOMP 3221: Microprocessors and Embedded Systems
COMP 3: Microprocessors and Embedded Systems Lectures 7: Cache Memory - III http://www.cse.unsw.edu.au/~cs3 Lecturer: Hui Wu Session, 5 Outline Fully Associative Cache N-Way Associative Cache Block Replacement
More information10/24/2016. Let s Name Some Groups of Bits. ECE 120: Introduction to Computing. We Just Need a Few More. You Want to Use What as Names?!
University of Illinois at Urbana-Champaign Dept. of Electrical and Computer Engineering ECE 120: Introduction to Computing Memory Let s Name Some Groups of Bits I need your help. The computer we re going
More informationChap-2 Boolean Algebra
Chap-2 Boolean Algebra Contents: My name Outline: My position, contact Basic information theorem and postulate of Boolean Algebra. or project description Boolean Algebra. Canonical and Standard form. Digital
More informationMemory. Lecture 22 CS301
Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch
More informationThe University of Adelaide, School of Computer Science 13 September 2018
Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationLecture 11: MOS Memory
Lecture 11: MOS Memory MAH, AEN EE271 Lecture 11 1 Memory Reading W&E 8.3.1-8.3.2 - Memory Design Introduction Memories are one of the most useful VLSI building blocks. One reason for their utility is
More informationIntegrated Circuits & Systems
Federal University of Santa Catarina Center for Technology Computer Science & Electronics Engineering Integrated Circuits & Systems INE 5442 Lecture 23-1 guntzel@inf.ufsc.br Semiconductor Memory Classification
More information+1 (479)
Memory Courtesy of Dr. Daehyun Lim@WSU, Dr. Harris@HMC, Dr. Shmuel Wimer@BIU and Dr. Choi@PSU http://csce.uark.edu +1 (479) 575-6043 yrpeng@uark.edu Memory Arrays Memory Arrays Random Access Memory Serial
More informationThe Memory Hierarchy. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 3, 2018 L13-1
The Memory Hierarchy Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 3, 2018 L13-1 Memory Technologies Technologies have vastly different tradeoffs between capacity, latency,
More informationCHAPTER 12 ARRAY SUBSYSTEMS [ ] MANJARI S. KULKARNI
CHAPTER 2 ARRAY SUBSYSTEMS [2.4-2.9] MANJARI S. KULKARNI OVERVIEW Array classification Non volatile memory Design and Layout Read-Only Memory (ROM) Pseudo nmos and NAND ROMs Programmable ROMS PROMS, EPROMs,
More informationECE 2300 Digital Logic & Computer Organization. Caches
ECE 23 Digital Logic & Computer Organization Spring 217 s Lecture 2: 1 Announcements HW7 will be posted tonight Lab sessions resume next week Lecture 2: 2 Course Content Binary numbers and logic gates
More informationA Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM
IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 09, 2016 ISSN (online): 2321-0613 A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM Yogit
More informationMODULE 12 APPLICATIONS OF MEMORY DEVICES:
Introduction to Digital Electronic Design, Module 12 Application of Memory Devices 1 MODULE 12 APPLICATIONS OF MEMORY DEVICES: CONCEPT 12-1: REVIEW OF HOW MEMORY DEVICES WORK Memory consists of two parts.
More informationMark Redekopp, All rights reserved. EE 352 Unit 10. Memory System Overview SRAM vs. DRAM DMA & Endian-ness
EE 352 Unit 10 Memory System Overview SRAM vs. DRAM DMA & Endian-ness The Memory Wall Problem: The Memory Wall Processor speeds have been increasing much faster than memory access speeds (Memory technology
More informationChapter 3 Semiconductor Memories. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan
Chapter 3 Semiconductor Memories Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Introduction Random Access Memories Content Addressable Memories Read
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationLearning Outcomes. Spiral 2-9. Typical Logic Gate TRI-STATE GATES
2-9.1 Learning Outcomes 2-9.2 Spiral 2-9 Tri-State Gates Memories DMA I understand how a tri-state works and the rules for using them to share a bus I understand how SRAM and DRAM cells perform reads and
More informationDESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES
Volume 120 No. 6 2018, 4453-4466 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ http://www.acadpubl.eu/hub/ DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR
More informationEE577A FINAL PROJECT REPORT Design of a General Purpose CPU
EE577A FINAL PROJECT REPORT Design of a General Purpose CPU Submitted By Youngseok Lee - 4930239194 Narayana Reddy Lekkala - 9623274062 Chirag Ahuja - 5920609598 Phase 2 Part 1 A. Introduction The core
More informationMIPS) ( MUX
Memory What do we use for accessing small amounts of data quickly? Registers (32 in MIPS) Why not store all data and instructions in registers? Too much overhead for addressing; lose speed advantage Register
More informationEECS150, Fall 2004, Midterm 1, Prof. Culler. Problem 1 (15 points) 1.a. Circle the gate-level circuits that DO NOT implement a Boolean AND function.
Problem 1 (15 points) 1.a. Circle the gate-level circuits that DO NOT implement a Boolean AND function. 1.b. Show that a 2-to-1 MUX is universal (i.e. that any Boolean expression can be implemented with
More informationMemory memories memory
Memory Organization Memory Hierarchy Memory is used for storing programs and data that are required to perform a specific task. For CPU to operate at its maximum speed, it required an uninterrupted and
More informationSlide Set 9. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng
Slide Set 9 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 369 Winter 2018 Section 01
More informationGHz Asynchronous SRAM in 65nm. Jonathan Dama, Andrew Lines Fulcrum Microsystems
GHz Asynchronous SRAM in 65nm Jonathan Dama, Andrew Lines Fulcrum Microsystems Context Three Generations in Production, including: Lowest latency 24-port 10G L2 Ethernet Switch Lowest Latency 24-port 10G
More informationAnnouncement. Computer Architecture (CSC-3501) Lecture 20 (08 April 2008) Chapter 6 Objectives. 6.1 Introduction. 6.
Announcement Computer Architecture (CSC-350) Lecture 0 (08 April 008) Seung-Jong Park (Jay) http://www.csc.lsu.edu/~sjpark Chapter 6 Objectives 6. Introduction Master the concepts of hierarchical memory
More informationSRAM. Introduction. Digital IC
SRAM Introduction Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports Serial Access Memories Memory Arrays Memory Arrays Random Access Memory Serial Access Memory
More informationMemory. Outline. ECEN454 Digital Integrated Circuit Design. Memory Arrays. SRAM Architecture DRAM. Serial Access Memories ROM
ECEN454 Digital Integrated Circuit Design Memory ECEN 454 Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports DRAM Outline Serial Access Memories ROM ECEN 454 12.2 1 Memory
More informationA Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding
A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding N.Rajagopala krishnan, k.sivasuparamanyan, G.Ramadoss Abstract Field Programmable Gate Arrays (FPGAs) are widely
More information190-MHz CMOS 4-Kbyte Pipelined Caches
90-MHz CMOS -Kbyte Pipelined Caches Apoorv Srivastava, Yong-Seon Koh, Barton Sano, and Alvin M. Despain ACAL-TR-9- November 99 ABSTRACT In this paper we describe the design and implementation of a 90-MHz
More informationLecture 21: Combinational Circuits. Integrated Circuits. Integrated Circuits, cont. Integrated Circuits Combinational Circuits
Lecture 21: Combinational Circuits Integrated Circuits Combinational Circuits Multiplexer Demultiplexer Decoder Adders ALU Integrated Circuits Circuits use modules that contain multiple gates packaged
More information12 Cache-Organization 1
12 Cache-Organization 1 Caches Memory, 64M, 500 cycles L1 cache 64K, 1 cycles 1-5% misses L2 cache 4M, 10 cycles 10-20% misses L3 cache 16M, 20 cycles Memory, 256MB, 500 cycles 2 Improving Miss Penalty
More informationDLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 3 DLD P VIDYA SAGAR
DLD UNIT III Combinational Circuits (CC), Analysis procedure, Design Procedure, Combinational circuit for different code converters and other problems, Binary Adder- Subtractor, Decimal Adder, Binary Multiplier,
More informationWhere We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture. This Unit: Main Memory. Readings
Introduction to Computer Architecture Main Memory and Virtual Memory Copyright 2012 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) Spring 2012 Where We Are in This Course
More informationCHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang
CHAPTER 6 Memory 6.1 Memory 233 6.2 Types of Memory 233 6.3 The Memory Hierarchy 235 6.3.1 Locality of Reference 237 6.4 Cache Memory 237 6.4.1 Cache Mapping Schemes 239 6.4.2 Replacement Policies 247
More informationProblem Set 10 Solutions
CSE 260 Digital Computers: Organization and Logical Design Problem Set 10 Solutions Jon Turner thru 6.20 1. The diagram below shows a memory array containing 32 words of 2 bits each. Label each memory
More informationMemory. Memory Technologies
Memory Memory technologies Memory hierarchy Cache basics Cache variations Virtual memory Synchronization Galen Sasaki EE 36 University of Hawaii Memory Technologies Read Only Memory (ROM) Static RAM (SRAM)
More informationNAND/NOR Logic Gate Equivalent Training Tool Design Document. Team 34 TA: Xinrui Zhu ECE Fall Jeremy Diamond and Matthew LaGreca
NAND/NOR Logic Gate Equivalent Training Tool Design Document Team 34 TA: Xinrui Zhu ECE 445 - Fall 2017 Jeremy Diamond and Matthew LaGreca Table of Contents 1.0 INTRODUCTION 1.1 Objective 1.2 Background
More informationConcept of Memory. The memory of computer is broadly categories into two categories:
Concept of Memory We have already mentioned that digital computer works on stored programmed concept introduced by Von Neumann. We use memory to store the information, which includes both program and data.
More informationAn Overview of Standard Cell Based Digital VLSI Design
An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,
More informationThe Memory Hierarchy Cache, Main Memory, and Virtual Memory
The Memory Hierarchy Cache, Main Memory, and Virtual Memory Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University The Simple View of Memory The simplest view
More informationSemiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy.
ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 4, 7 Memory Overview, Memory Core Cells Today! Memory " Classification " ROM Memories " RAM Memory " Architecture " Memory core " SRAM
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationChapter 7 Large and Fast: Exploiting Memory Hierarchy. Memory Hierarchy. Locality. Memories: Review
Memories: Review Chapter 7 Large and Fast: Exploiting Hierarchy DRAM (Dynamic Random Access ): value is stored as a charge on capacitor that must be periodically refreshed, which is why it is called dynamic
More informationComputer Architecture Memory hierarchies and caches
Computer Architecture Memory hierarchies and caches S Coudert and R Pacalet January 23, 2019 Outline Introduction Localities principles Direct-mapped caches Increasing block size Set-associative caches
More information