LPRAM: A Novel Low-Power High-Performance RAM Design With Testability and Scalability. Subhasis Bhattacharjee and Dhiraj K. Pradhan, Fellow, IEEE


Abstract—To date, all of the proposals for low-power designs of RAMs essentially focus on circuit-level solutions. What we propose here is a novel architecture-level (high-level) solution. Our methodology provides a systematic tradeoff between power and area. It also allows a tradeoff between test time and the power consumed in test mode. Significantly, the proposed design has the potential to achieve performance improvements while simultaneously reducing power. In this respect, it stands apart from other approaches, where power reduction results in speed reduction. The basic approach divides the RAM into modules and interconnects these modules in a binary tree, where the tree can be reconfigured dynamically during normal operation and during test mode. Furthermore, during test mode, most of the RAM can be switched off, which provides a major power reduction while test-application time is also reduced. The aspect ratio of the modules is allowed to vary as a design parameter; the chosen module aspect ratio affects the power/access-time/area tradeoffs. Such novel features give the proposed methodology potential practical significance. Also, a design tool is developed which takes as input various parameters, such as the desired power/performance, and outputs basic design parameters, such as the needed number of modules, the area overhead, and the resulting test speed-up.

Index Terms—Embedded RAM, leakage power, low power, low-power RAM (LPRAM), low-power testing, memory architecture, RAM, testable RAM.

I. INTRODUCTION

FURTHER progress in low-power very large scale integration (VLSI) technology, including low-power RAM designs, is crucial for the semiconductor industry. Additionally, the success of future system-on-a-chip (SOC) designs depends heavily on innovations in low-power embedded RAM design. All previous works on RAM focus on circuit-level solutions. There are mainly three directions in which research has targeted the design of low-power RAM [2], [3], [7], [8]: specifically, reduction in 1) charging capacitance; 2) operating voltage; and 3) static current. The methodology proposed here departs radically from all of these and provides an architectural, high-level solution. It does not preclude the additional application of circuit-level techniques for low-power design: any existing circuit-level technique can also be applied to our proposed methodology to achieve further power savings. However, a unique feature of our design that cannot be accomplished through the circuit approach is that power reduction is achieved with potential performance and test improvements.

Manuscript received December 12, 2002; revised March 28. This work was supported in part by EPSRC (U.K.) and is based on D. K. Pradhan's "A Low Power RAM Design" (patent filed). This paper was recommended by Associate Editor K. Chakrabarty. The authors are with the University of Bristol, Bristol BS8 1UB, U.K. (e-mail: pradhan@cs.bris.ac.uk).

Fig. 1. On-chip RAM.

The simultaneous reduction in delay and power is achieved by reducing the length of both the word and bit lines. Conventional wisdom dictates that any power reduction must also result in speed reduction. What we propose here stands apart in that, while the power is reduced, the speed is actually increased.
The overhead here is in terms of increased area. In particular, the proposed design has significant potential for application in the design of on-chip memories as shown in Fig. 1. Here, both power and test concerns pose major challenges. Also, our proposed design provides certain speed advantages. It has the potential to achieve higher speed and, also significant, it guarantees uniform access to all the cells. This is a byproduct of our novel layout strategy for the cell arrays. The power reduction targets normal operation of the RAM as well as during the testing of the RAM. The proposed methodology allows for systematic tradeoff between area, power, and performance. In addition, our design differs from all existing approaches in its unique ability for power reduction during both normal operation and testing. The speed of testing can also be varied, allowing varying levels of power dissipations. Another unique feature of our design methodology is that, unlike conventional design, the speed is improved while power is reduced, the tradeoff here being the area. There is an area increase over conventional RAM designs. The design methodology is recursive and has the unique feature of being scalable in that one can synthesize larger designs using smaller designs. A power estimation model for the proposed design is developed. This model demonstrates significant power savings. Also developed here is a model for area estimates. This is used to estimate the area increase for proposed design over traditional design. What is apparent is that the proposed design allows for smooth trade off between area, power, and performance. This paper is organized into Sections II X. Section II reviews the previous works on low-power RAM. The proposed architecture is discussed in Section III, followed by its design methodology, described in Section IV, with detailed discussion on its various modes of operations in Section V. Estimates of power, area, and performance, along with a comparison to the traditional RAM, are discussed in Sections VI VIII, respectively. Section X discusses the testing procedure with the test structure proposed in Section IV. A case study addressing all these issues is discussed in Section IX, showing the effect of aspect ratio on power, performance, and area /04$ IEEE

II. REVIEW

One of the popular techniques for low-power design is the reduction of the supply voltage [8], [10], [16]. However, there are limits to this approach. Decreasing the supply voltage requires a corresponding reduction of the threshold voltage. Also, noise and dc-current considerations prevent the supply voltage from being reduced arbitrarily. Another approach for decreasing power is to reduce the charging capacitance of RAMs [7], [10]. The charging capacitance can be reduced by partial activation of a multidivided data line (DDL) and/or a multidivided word line (DWL) [7]. The approach proposed here provides a systematic technique for reducing word-line capacitance in a manner that is useful in both operational and test modes. Further, our approach differs from earlier approaches [6] in that we do not share decoders between cell-array partitions, providing additional power savings. However, the word- and data-line segmentation techniques proposed earlier [2], [3], [7], [8] can still be applied to our modular partitioning technique, providing additional power savings. Our technique also allows multibanking within modules to attain further power reduction.

Other prior research on reducing power focused on techniques to reduce the charging capacitance during the data-retention period, as well as lowering the refresh frequency [12]. Keeping the refresh busy rate proportionately low for a large RAM increases the charging capacitance of a word line, along with the maximum refresh time of the cell. Doubling the maximum refresh time at each generation reduces power during refresh mode. However, this approach can be cumbersome and is limited by the cell leakage current. In [2], another scheme is proposed that utilizes a long word line for the refresh operation and a divided word line for normal operation, to attain reduced power during normal operation. The proposed technique allows a reduction in retention power as well, and differs from prior approaches in providing a higher level solution for low-power design.

III. PROPOSED ARCHITECTURE

The proposed architecture partitions the RAM into a number of modules, where each is a smaller RAM module with its own decoder and refresh circuitry. The modules are then interconnected by an H-tree [1], which provides for a planar layout as well as the incorporation of a particular built-in self-test technique. A new feature of our design is that modules are allowed to have an arbitrary aspect ratio. As demonstrated here, this allows major power/performance tradeoffs. Another new feature now proposed is the switching off of portions of the RAM during normal operation as well as during testing. Such a dynamic reconfiguration capability allows for a smooth tradeoff of test-application time and power dissipation during test. Importantly, the built-in-test structure proposed here differs significantly from earlier designs. Rather than activating all modules for parallel read and write, we allow parallel read/write to a group of only a small number of modules simultaneously. Because the rest of the RAM is switched off during testing, test power is drastically reduced. During the normal mode, major power savings are achieved because the modular design explicitly reduces the length of the word line activated.

Fig. 2. Conceptual schematic of LPRAM architecture.
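The power argument of this section can be made concrete with a small sketch: in the tree interconnect, one address bit is resolved per level, so an access exercises only the switch points on the root-to-leaf path of the selected module, while every other subtree stays idle (the switch nodes themselves are detailed in Section IV). The indexing convention and names below are ours, for illustration only.

```python
def active_switch_nodes(module_index: int, tree_depth: int):
    """Return the switch nodes that toggle when one leaf module is accessed.

    One address bit is consumed per level, so only the root-to-leaf path of the
    selected module sees any activity; all other subtrees remain quiet.  The
    root is at level 1 and the memory nodes sit at level tree_depth.
    """
    path, node = [], 0                      # node 0 is the root switch node
    for level in range(1, tree_depth):      # levels 1 .. tree_depth-1 are switch nodes
        bit = (module_index >> (tree_depth - 1 - level)) & 1
        path.append((level, node))
        node = 2 * node + 1 + bit           # descend into the chosen child
    return path

# Accessing module 5 of a 16-module LPRAM (tree depth 5) activates only 4 switch nodes.
print(active_switch_nodes(5, 5))
```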
An integral feature of the proposed methodology is the ability to tradeoff both power and performance with area, as described. Also, the ability to tradeoff test power with test-application time, as described, is of significant importance. IV. DESIGN OVERVIEW Our design for low-power RAM assumes cells divided into equal-sized modules, representing the size of the RAM in bits and, the number of address lines (assuming an bit organization). These modules appear as leaf nodes in a complete binary tree (Fig. 2). The depth of the tree and the number of modules or leaf nodes are related by. The size of each node is, where. Note that the root node is at level one. The parameters and define the properties of this architecture. A large means a higher granularity, a higher degree of power saving, speed-up, and testability, with increased chip size. (The design can also be configured using a -way tree, where is a power of two). Our low-power design relies on making modules of a different geometry than the earlier testable version. Also, major innovation in the test and refresh circuitry is proposed. The following highlights the key differences between the traditional approach to low power for a RAM that is built, using multiple cell array partitions, versus the proposed LPRAM. 1. The traditional cell array partition does not use any H-tree layout. Ours uses H-tree layout for laying out different cell arrays. This assures that, independent of the number cell arrays (modules) and independent of the size of the modules, the cells are generally equidistant from the read/write port. This will allow more predictability of the delays, because the delays are equally balanced in embedded RAM design. 2. In our approach, each cell array (module) has an independent refresh and decoder circuit. In the conventional cell-array portioning, these are shared. What we show is that this feature helps to achieve performance improvement during normal operation. 3. Also, the proposed design allows power reduction during refresh. Each module has fewer words so the

3 BHATTACHARJEE AND PRADHAN: LPRAM: NOVEL METHODOLOGY FOR LOW-POWER HIGH-PERFORMANCE RAM DESIGN 639 words can be refreshed at a slower speed. This coupled with the fact that the number of bits in each word is smaller we obtain a quadratic effect. However, since all nodes have to be refreshed in parallel, this quadratic savings reduces to a linear factor. It should be noted that the total energy required to refresh stays the same. 4. Also, independent decoding and refreshing is essential for parallel testing. This traditional cell-array partitioning can suffer from correlated failures reducing fault coverage. 5. The partitioned approach we have allows for an additional low-power mode, by being able to switch off portions of the RAM, at ease. This can be a major advantage when battery power is a concern. Although this can also be done in traditional cell-array portioning, this additional low-power mode in our LPRAM is much more flexible and versatile. 6. The H-tree layout also has the advantage of being able to pipeline multiple bits, through the H-tree, providing an additional bandwidth potential. This is not possible in traditional cell-array partitions. As shown in our paper, different kinds of address mapping is possible, because of this modular approach we have taken. 7. The H-tree circuit, itself, can be built with wider and faster buses, making the delay in the H-tree negligible. This particular H-tree has decoders which are very simple, and can be built differently than the cell arrays, for additional speed. 8. Unlike cell array partitions, we have the potential to achieve significant speed advantages, BOTH during normal operation as well as during testing. Testing can be a major concern and our low-power RAM (LPRAM) achieves higher test speed, while reducing the power consumption. 9. Although the comparisons done here are done assuming only four cell-array partitions within our module, there is no reason why more partitions cannot be used, within each module providing greater savings in power and higher speed. 10. Our design approach is RECURSIVE by nature. This has the advantage of design reuse, using a thoroughly optimized, smaller RAM design to build a larger one. As we progress through the generations of RAM design, the ability to use a recursive approach can be of significant advantage in speeding up design, and verifying the design. Our design methodology has the unique feature of being scalable as one can build larger RAMs using smaller RAMs. A Simplified Model of RAM for Comparisons: In this paper, we assume a simplified model of RAM, as shown in Fig. 3. This model is used here for both conventional RAM and for the modules used in the proposed architecture. The simplified model is used because it admits developing simple and accurate expressions for comparing power, area, and performance estimates, as shown later. Since we are using this model for the basis of comparison, this does not compromise the basic results and conclusions obtained. Fig. 3. Simplified model of RAM architecture. Based on our simplified model of RAM, we observe that a conventional RAM (Fig. 3) can be thought as a special case of low-power high-performance RAM with where cells are arranged in four quadrants, each holding cells arranged in a two-dimensional (2-D) matrix of rows and columns. The address bus is divided into two equal (near equal, when is odd) parts, one half used to decode the row, and the other to select the column. 
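A rough sketch of the sizing relations just described, with the modules as leaves of a complete binary tree and each module organized as four quadrants. The symbol names here are our own, so treat this as an illustrative reconstruction rather than the paper's design tool.

```python
def lpram_sizing(n: int, tree_depth: int, node_ari: int = 0):
    """Illustrative LPRAM sizing: a RAM of N = 2**n one-bit cells split over
    Q = 2**(tree_depth - 1) leaf modules (root at level one), each module laid
    out as four quadrants whose row/column split follows the requested
    aspect-ratio index (AR = 2**node_ari, width over height)."""
    N = 2 ** n
    Q = 2 ** (tree_depth - 1)
    cells_per_module = N // Q
    cells_per_quadrant = cells_per_module // 4
    q_bits = cells_per_quadrant.bit_length() - 1
    rows_log2 = (q_bits - node_ari) // 2          # remaining address bits go to the columns
    cols_log2 = q_bits - rows_log2
    return {"modules Q": Q,
            "cells per module": cells_per_module,
            "quadrant rows": 2 ** rows_log2,
            "quadrant cols": 2 ** cols_log2,
            "quadrant aspect ratio": 2 ** (cols_log2 - rows_log2)}

# Example: a 16M-cell RAM with 16 modules and unit-aspect quadrants.
print(lpram_sizing(n=24, tree_depth=5, node_ari=0))
```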
For the sake of comparison, we assume a four-quadrant architecture, but the architecture allows each module to be built out of more numbers of cell array partitions. Basically, two types of nodes are used in our design: memory nodes and switch nodes. Memory nodes have the cell array based on the traditional multisubarray (for example, four quadrant) organization with independent control units, refresh circuitry, and certain built-in test circuitry. Each module itself can also be designed with a larger number of subarrays, as in current designs. For the sake of modeling, we propose that each module containing cells is arranged in four quadrants, each quadrant holding cells. Each quadrant is a 2-D array of memory cells arranged in rows, each row containing cells. But, unlike conventional RAM, we divide the address bus ( address lines) into two parts and respectively, to give preferably a nonunit aspect ratio. These and address lines are separately decoded in the row and column decoders, respectively, to give rows and columns. We define, the aspect ratio of each quadrant in LPRAM. Additionally, each memory node contains some tristate switches on the runs of power line(s), to cut it off from the power source when required. The number of such switches will depend on the maximum number of elements active at any time and on the power-line layout. The control of these switches is discussed in the latter part of this section. The switch nodes are simple 1-out-of-2 decoders with buffers. As Fig. 4 shows, the memory nodes are connected hierarchically, using the switch nodes, and laid out in an H-tree layout. Let each memory node be identified by, where. Therefore, as shown in Fig. 4 (for ), the nodes are num-

4 640 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 5, MAY 2004 Fig. 4. H-tree of LPRAM architecture. bered, consecutively numbered nodes adjacent to each other in the layout. We consider the laying of the memory nodes in rows and columns in the H-tree layout. We define the aspect ratio of H-tree to be equal to. For this initial example, the aspect ratio of the memory node and the H-tree will be assumed to be 1:1, that is and. The address/data/control bus is connected to the root, a switch node. The most significant bit is decoded, generating a left subtree or a right subtree select. The other signals are buffered and propagated down the tree. This action occurs repeatedly at each level until a single memory node is selected. At this point, the remaining address bits are latched into the address buffers of the selected memory node only, and are then used to select a cell within the node. The address buffers of all other nonselected nodes remain completely unchanged, thereby nullifying any possibility of activity within them (other than normal refresh activity). Each cell is identified by the address, where (node address) and (address within a node). Aspect Ratio: As defined earlier, is the aspect ratio of each quadrant of a memory node in LPRAM and is equal to. Since all four quadrants of a memory node are identical in structure, we see that the aspect ratio of a memory node is almost the same as the aspect ratio of the quadrant. This is also shown in Fig. 5. So, we define the aspect ratio of a memory node, and only use for discussion. As both and are powers of two, so is ; i.e., for some to be referred to as the aspect ratio index (ARI) of a memory node. The aspect ratio of LPRAM depends on: 1) the aspect ratio of the individual module and 2) the aspect ratio of the H-tree layout (Fig. 5). This figure depicts an LPRAM with 16 memory nodes, shown to have a chip aspect of 2:1, where the individual memory node has an aspect ratio of 1:2, and the aspect ratio of the H-Tree layout is 4:1. We define to be the ARI of the H-tree layout, such that, the aspect ratio of the H-tree layout. The ARI of the LPRAM, and the corresponding aspect ratio of the LPRAM is. It should be noted that the aspect ratio of the RAM chip (denoted as ) is defined as the ratio of the two sides. Since we do not make any distinction between width and height at line chip level always. All other s are defined as the ratio of width divided by height. We illustrate the difference between and using Figs Fig. 6 shows the layout of a LPRAM with 16 modules, where and, producing. Whereas, Fig. 7 shows another layout of the same LPRAM with 16 modules, where and, producing. However, the chip aspect ratios of all of them are the same and equal to 2:1. It should be noted that the conventional tradeoff is the lower the power the lower the speed. However, the proposed design methodology achieves power savings, while at the same time achieving higher speed up to a certain point as shown later. The tradeoff here is with the area. The proposed design increases the area. It is important to note that the traditional relationship between power and performance does not hold here. Normally, as power reduces, speed also reduces. However, in the proposed methodology, the reduced power design has improved performance. However, there is an increase in area as the power is reduced. It will be simpler to discuss the various properties and estimates with respect to ARIs, rather than to the aspect ratio. 
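Since every aspect ratio in this scheme is a power of two, converting between an ARI and the corresponding ratio, and composing the node and H-tree ARIs into the chip-level ARI, is straightforward. A minimal sketch follows; the variable names are our own, and the additive composition of the two indices is our reading of the Fig. 5-7 examples.

```python
def ratio(ari: int) -> str:
    """Aspect ratio corresponding to an aspect-ratio index: AR = 2**ari,
    with negative indices read as 1:2**(-ari)."""
    return f"{2 ** ari}:1" if ari >= 0 else f"1:{2 ** (-ari)}"

# Fig. 5: a node ARI of -1 (1:2) combined with an H-tree ARI of +2 (4:1)
# yields a chip-level ARI of +1, i.e. the 2:1 chip.
node_ari, htree_ari = -1, 2
print(ratio(node_ari), ratio(htree_ari), ratio(node_ari + htree_ari))
```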
Therefore, from this point, we will focus only on ARIs. The corresponding aspect ratio can be easily computed from the known ARI. In summary: the aspect ratio of the chip is ; the aspect ratio of a memory node is ; the aspect ratio of the H-tree layout is ; the aspect ratio of the LPRAM layout is. For a given size of RAM, in a traditional model as well as in a LPRAM model, we cannot get any arbitrary aspect ratio. For example, it can be easily seen that if we are to design a RAM of size or 4 M, we cannot get any configuration of rows and columns, such that the resulting chip has the aspect ratio of 2:1. However, for any given specified size of the RAM, one can realize the RAM with various aspect ratios, such as 1:1, 4:1, or 16:1, etc., which correspond to ARIs 0, 2, 4, etc., respectively. So, even if a traditional RAM is designed with a nonunit aspect ratio, the ARIs (and, correspondingly, the aspect ratios) are restricted by ; i.e., ARI is even if and only if is even, and it is odd if and only if is odd. It is also easy to see from the argument above that if and only if is odd (i.e., is even and modules), then can have only even values. Similarly, can take even values if and only if is even. We know for an LPRAM, and, thus, if is odd, is even and vice versa. Lemma 1: For a given size of RAM,, the ARI of any LPRAM is odd (even) if and only if is odd (even). Proof is given in the Appendix. Lemma 2: For a given RAM of size and the given chip aspect ratio, such that either both and

5 BHATTACHARJEE AND PRADHAN: LPRAM: NOVEL METHODOLOGY FOR LOW-POWER HIGH-PERFORMANCE RAM DESIGN 641 Fig. 5. LPRAM with chip aspect ratio 2:1 and node aspect ratio is 1:2 for low power and higher speed. Fig. 6. LPRAM with chip aspect ratio = 2:1, high power, and low speed. Fig. 8. Test structure of LPRAM with four-way built-in comparison. Fig. 7. LPRAM with chip aspect ratio = 2:1, low power, and high speed. are odd numbers or both are even numbers, then there are exactly ways to construct the LPRAM to meet the given aspect ratio. Proof is given in the Appendix. As illustration, consider a 64 M DRAM. This has cells and let the of the chip be 2. This provides altogether distinct possible architectures. So, we have a large flexibility in power/performance tradeoffs. However, all these H-Tree layouts are not advantageous, with respect to wire length and, as wire length increases, performance decreases due to longer critical path length. So, only a particular range of variation in the aspect ratio of the H-Tree layout, say, is of practical importance. To keep the formula simple, we assume the ARI of the H-Tree layout and the memory nodes to be 0 (i.e., aspect ratio 1:1) while deducing power and performance estimates. It has been verified that little variation in the aspect ratio of the H-Tree layout produces little difference in the power estimates and performance. Instead, the variation of the aspect ratio of the individual module greatly affects the power, performance, and area of the module (all addressed in Sections VI X). What is key here is that an optimum aspect ratio can be used for individual memory nodes to reduce the power dissipation and increase the speed, while, at the same time, achieving the given aspect ratio of the chip by varying the aspect ratio of the H-Tree layout, as long as it is tolerable. Test Structure: Fig. 8 shows the test structure to be used for the proposed low-power LPRAM during testing. All these modules have been divided into quadrants (shown as the dotted boundary in Fig. 8), each quadrant holding (a power of two, assumed to be four in the figure) modules. So, we have. In each quadrant, comparators are placed between

6 642 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 5, MAY 2004 Fig. 9. Address mapping of LPRAM. adjacent modules, shown in Fig. 8,,,. The output of all those comparators is fed to a input OR gate, centrally located in that quadrant. The output of all these OR gates is tagged and sent as a single FAIL line, to generate an error during testing. So, in each quadrant, all the modulo adjacent nodes are compared simultaneously, eventually leading to a speed-up of fold during testing. V. MODES OF OPERATION The low-power testable RAM (LPRAM) has three modes of operation: 1) normal mode; 2) test mode; and 3) standby mode. There are two additional inputs, of which one (TEST) is used to activate the RAM in test mode and the other in low-power mode. In test mode, test data is fed into the RAM and any discrepancy in testing raised by an additional output pin FAIL. In low-power mode, the modular design allows switching off portions of the RAM. Dynamic reconfiguration based on workload can be easily accomplished in standby mode. This mode of operation can be useful when the system is not very active. Switching off portions of RAM can be done easily in our architecture by the memory management unit, to save additional power. We propose to overlay an additional switch structure for routing the. This will allow switching out half or three fourths of the RAM by disabling appropriately. Test Mode Operation: The LPRAM can be put into test mode by activating the TEST pin. Test data is fed into LPRAM, as usual, through the external tester by addressing as,,. These bits are ignored during testing, and data is written parallel into all nodes simultaneously, in the th quadrant. By Test Write, the writing up of data to all locations identified by the address is conveyed. Similarly, by Test Read, the parallel reading of all locations addressed, routing the data internally to the OR gate and finally to the FAIL line, is conveyed. Testing proceeds by activating each one of these quadrants, one at a time. The extra pins provided for the low-power reconfiguration, as described above, are used here in test mode, switching out all other quadrants. Identical data is simply written into all the modules in the quadrant; the data is then read back and compared against each other internally for test. Thus, all modules in a quadrant can be tested simultaneously. The testing time to test all the modules in any quadrant is the same as testing any single module providing considerable speed-up. Low-Power Structure and Operation: RAMs full capacity is often not fully utilized, a small fraction active most of the time. So, a technique which enables switching off of the portions of RAM not in use, but dissipating power due to leakage and refreshing (at chip and board levels) is enormously helpful. We provide here a chip-level technique. However, how much of the LPRAM can be switched off depends on the number of additional input pins (called DIV pins) allowed: with one DIV pin, either half or three-fourths of the RAM can be switched off. Switching off larger portions of the RAM can be done using additional pins. Because LPRAM is so modular, it can be accommodated by FULL and DIV lines controlling the tristate switches planted inside the memory nodes. This mode is very useful for handheld devices, particularly when battery power is below certain thresholds. 
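A minimal sketch of one way the FULL/DIV gating could be interpreted. The exact DIV-pin encoding is not specified here, so the halving scheme below (step 0 keeps half of the modules powered, step 1 keeps a quarter, and so on) is an assumption used only to illustrate switching off whole subtrees.

```python
def powered_modules(q_modules: int, full: bool, div_step: int = 0):
    """Illustrative FULL/DIV power gating over the module tree.

    full=True keeps every module powered; otherwise div_step selects how much
    of the tree stays on: step 0 keeps one half, step 1 keeps one quarter, etc.
    (assumed encoding -- the text states only that half or three-fourths of the
    RAM can be switched off with a single DIV pin).
    """
    if full:
        return list(range(q_modules))
    keep = max(1, q_modules >> (div_step + 1))    # halve the powered portion per step
    return list(range(keep))                      # keep one contiguous subtree powered

print(powered_modules(16, full=False, div_step=0))   # half of a 16-module LPRAM stays on
print(powered_modules(16, full=False, div_step=1))   # only a quarter stays on
```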
The stepped nature of this configurability provides additional flexibility to the operating system to select a range of battery power thresholds to better utilize power, rather than wasting. Address Mapping: Basically, two ways to map the addresses exist: one is to have consecutive addresses within each module, and the other addresses are interleaved across modules. In Fig. 9, we have shown two different mappings of 32 addresses (given in hexadecimal) into modules. Address bits are divided into two parts, and, here represents the module number 0 through 7 and represents the address within the module. In Fig. 9(a), the least significant two bits are changed to produce consecutive addresses within the same module. The mapping shown in Fig. 9(b) has the advantage of being able to access multiple addresses through pipelining. Buffers can be placed on the switch nodes to facilitate this. This will further impact speed.
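The two mappings of Fig. 9 differ only in which address bits select the module. A small sketch using that figure's parameters (32 addresses over 8 modules, hence 3 module bits and 2 offset bits; the bit-field widths are inferred from the example).

```python
def map_consecutive(addr: int, offset_bits: int = 2):
    """Fig. 9(a)-style mapping: high-order bits pick the module, so consecutive
    addresses stay inside the same module."""
    return addr >> offset_bits, addr & ((1 << offset_bits) - 1)   # (module, offset)

def map_interleaved(addr: int, module_bits: int = 3):
    """Fig. 9(b)-style mapping: low-order bits pick the module, so consecutive
    addresses land in different modules -- the property exploited for pipelined
    block access."""
    return addr & ((1 << module_bits) - 1), addr >> module_bits   # (module, offset)

for a in range(5):
    print(a, map_consecutive(a), map_interleaved(a))
```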

7 BHATTACHARJEE AND PRADHAN: LPRAM: NOVEL METHODOLOGY FOR LOW-POWER HIGH-PERFORMANCE RAM DESIGN 643 Fig. 10. Power dissipation within a memory node. VI. POWER-ESTIMATION MODEL AND COMPARISONS The following is the active power equation for CMOS RAM of size cells (i.e., cells arranged in rows, and each containing cells in each quadrant of a four-quadrant memory module), given by [2] Data Retention Power of Conventional DRAM: In the data-retention mode, internal data is retained and refreshed without any access from outside. The refresh operation is performed by reading data of all the cells on a single word line, and restoring them to their original values. The refreshing circuitry selects each of the word lines in order, and during the whole time (called refresh busy time), the RAM is not accessible from the outside. For high-performance RAMs, refresh busy time is expected to be as low as possible. The refresh cycle frequency equals, where is the refresh time interval of cells in the retention mode, and increases with reducing junction temperature. In general, is much smaller than the, which is provided in specification and depends on the cell technology for the trench capacitor. The power consumed for refreshing cells can be derived as (3) where is an external supply voltage, is the active current drawn by the selected cells, and is the data retention current required by any inactive or nonselected cell. is the output node capacitance of each decoder, is the internal supply voltage, is the total capacitance of the CMOS logic and driver circuits in the periphery. Let represent the total static (dc) current of the periphery, and is the operating frequency. When we need to access a cell within a RAM, all the cells along the row, containing specific cell, are selected simultaneously. As mentioned earlier, we are using a simplified model of RAM, as shown in Fig. 3. This model is used for both conventional RAM and the modules used in the proposed architecture. This helps in developing easy-to-understand expressions for power, area, and performance estimates and comparisons. Equation (1) can be simplified for high-frequency DRAM operation (Fig. 10), and by the use of the CMOS NAND decoder, as well as by elimination of very low dc components, yielding the following reasonable approximation. Data Reading Power of Conventional DRAM: The destructive readout of a DRAM cell requires successive operations of amplification and restoration for the selected cell on every data read. Here, each cell is basically a trench capacitor, requiring charging and discharging during each reading. This is accomplished by a latch-type CMOS sense amplifier on each data line. So, during the reading of a data line, the associated trench capacitor is charged and discharged with a large voltage swing of (usually V) and with charging current of, where is the data line capacitance. The active power consumption during read is given by (1) (2) From (2) and (3), it follows that the following factors are crucial to reduce the power during any read/write cycle: 1) reducing charging capacitance (, ); 2) lowering the external and internal voltages (,, ); 3) reducing static current ; and 4) reducing refresh cycle frequency. As mentioned, several techniques have been offered to reduce circuit parameters. These techniques can be used in conjunction with our proposed architectural solution to low-power design. It is interesting to note that reducing design parameters like and can also reduce power consumption. 
Therefore, for instance, if previous researchers have proposed segmenting the word line, the proposed low-power architecture allows a systematic way to reduce. It further allows reduction of the to reduce power, not previously possible by the circuit level techniques. In the LPRAM, data is read out or written into by first choosing a selected module by the tree decoder, and power is being dissipated only by the decoder (switch nodes) on its path (Fig. 11). The address is then decoded in selected modules to locate the exact cell containing data. For example, in Fig. 11, only those switch nodes that are hatched consume power while reading a data from module. No other switch node is activated at all. This observation is used for modeling power for switching nodes. However, the switch nodes consume a very small fraction of overall power. Consider the example of a 16 M DRAM, with and. The same size of RAM implemented using LPRAM architecture will have and, with 16 nodes of 1 M each. In addition, if DWL is used for 16 divisions of the word line, then for traditional RAM, and the corresponding value for LPRAM is 64. The power reduction in the proposed RAM is achieved primarily by reducing these parameters. In the following, we develop various equations for power estimates.
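As a quick numerical check of the 16M example above, the word-line figures can be reproduced by counting the cells per activated word-line segment in a square array with a 16-way divided word line; this reading of the example recovers the quoted LPRAM value of 64 and is offered only as an illustration.

```python
def wordline_segment_cells(total_cells_log2: int, dwl_divisions: int, modules: int = 1):
    """Cells driven per activated word-line segment, assuming square arrays and a
    divided word line (DWL); an illustrative reading of the 16M example."""
    cells_per_array = 2 ** total_cells_log2 // modules
    columns = int(round(cells_per_array ** 0.5))   # width of a square array
    return columns // dwl_divisions

print(wordline_segment_cells(24, 16))              # traditional 16M RAM: 256 cells/segment
print(wordline_segment_cells(24, 16, modules=16))  # one 1M LPRAM module:  64 cells/segment
```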

8 644 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 5, MAY 2004 Fig. 12. Access power reduction in LPRAM during normal mode. Fig. 11. Power dissipation model during normal mode in LPRAM. Data Reading Power of LPRAM in Normal Mode: The data read-out power for LPRAM can be formulated as where could be as small as, and is the effective capacitance seen in the tree decoder of LPRAM. Equation (5) provides the expression for the tree capacitance,. Estimation of : In the tree decoder, each switch node consists of a simple one-out-of-two decoder and buffers. The decoder is a one bit decoder consisting of one level of logic. Additionally, each decoded signal is controlled by the preceding subtree select (chip enable for the first level), and this introduces another level of logic. In each switch node, one bit of the address is decoded, and the rest of the address bits are simply transmitted. At each node, the signal has to drive a load of (two gates each offering a load of ), and the output gate has a drive capability of. The bus width is assumed to be. All bus lengths of the tree are computed with respect to, the length of the vertical side of the LPRAM (Fig. 17). The input buffer drives the bus up to the root node-length,. Let, the number of levels in the tree, be assumed odd. The length of the bus connecting level 1 to level 2 is. Thus, if is the capacitance of metal over field oxide, then the load offered by the bus, between levels 1 and 2, is. Each node is connected to two nodes at the next lower level. Therefore, a buffer at level has to drive two buffers at level, each offering a load of. Thus, this load can be modeled as. The total load that has to be driven at level 1, by the second gate, is and which is parallel to. The total capacitance seen at level 1 can, therefore, be represented as. The capacitance at level 2 is the same as level 1 because the bus lengths are the same. Further, after every two successive levels, the length of the bus to be driven decreases by half of the (4) level before. For example, levels 3 and 4 have to drive buses of lengths of ; and subsequently, levels 5 and 6 have to drive buses of length, and so on. Let. In general, the bus length to be driven by the node at level can be expressed as. A tree of depth will have decoding stages. Therefore, the total capacitance over the entire tree, from level 1 to the leaf nodes, can be modeled in parallel operation, as, which evaluates. So, the capacitance value seen from the root of the tree to the accessed node is given by Data Retention Power of LPRAM: The LPRAM achieves a corresponding reduction in retention power as well because of the reduction in both and architectural parameters. The equation for data retention power is given by Refreshing is done independently within each module. Also, we have. Let us assume is of the form as the ratio between and ; i.e.,, then. So, by an appropriate choice of, both the data read out power and the data retention power can be reduced! We have calculated the power dissipation of the proposed LPRAM for a large range of module sizes, and for four different RAM sizes, 4, 16, 64, and 256 M. The reduction in power dissipation over the traditional RAM is illustrated in Fig. 12 with a range of aspect ratios of individual memory node. These savings are shown as percentages of reduction in power dissipation over the same size of conventional RAM. For the sake of comparison, we have considered the same number of partitions (5) (6)

9 BHATTACHARJEE AND PRADHAN: LPRAM: NOVEL METHODOLOGY FOR LOW-POWER HIGH-PERFORMANCE RAM DESIGN 645 Fig. 13. Retention power reduction in LPRAM during normal mode. (four partitions or four-quadrant) in the traditional RAM, and the individual memory node of the LPRAM. From Fig. 12, we see that for the same size of RAM, we achieve greater access power savings when the aspect ratio of the individual memory node is greater. However, when aspect ratio become too large, the retention power increases. But, from Fig. 13 we also see that there is a savings of the retention power as well. (The test mode power savings is discussed later). It may be noted that the reduction in retention power is the result of potential Q-fold reduction in refresh frequency because each module has fewer words and a potential Q-fold reduction in number of bits per word. However, because the Q modules have to be refreshed in parallel, the net effect is at best linear reduction. One must also observe that while the refresh power can be reduced, the total energy needed to refresh remains the same. VII. PERFORMANCE IMPROVEMENT AND COMPARISONS In this section, we demonstrate potential performance improvements attainable by the proposed architecture. Node Delay: The primary delay in accessing data within a memory node is due to: 1) the selection of a word line (row decoding); 2) enabling the selected word line; and 3) the charge transfer between the selected cell. Column decoding is performed in parallel with the above operation, and does not appear in the critical delay, as long as that delay is less than the added delay of 1) and 2). Note that operation 3) cannot be done before column decoding; the address is buffered before it drives the decoder. Address Decoding Delay: Using the CMOS NAND decoder [7] for low-power operation, only one out of rows is charged for every addressing, and as assumed earlier, at each stage, address lines are decoded by switching a series of CMOS gates at each stage, except the first. Then, the capacitative load seen by the output of one stage to the next will be. For the CMOS NAND decoder, delay at every stage will be proportional:. The delay in decoding the address can be. Word Line Enable Delay: The most common model of word line enable delay is. Bit Line Delay: The general form of bit line delay is, where, and is the resistance of the transistor. So, we model the total delay along the critical path within a memory node of LPRAM as. Delay in Traditional RAM: We compute the delay in accessing data in traditional RAM; i.e.,, using the formula as given for, by setting and keeping all other circuit parameters unchanged. So,. Delay in LPRAM: As each module of the LPRAM follows the traditional architecture, we will compute both the additional cost of the signal propagating up and down the tree, as well as the delay within the node. The delay up the tree is less than the delay down the tree, since only one buffer delay is introduced at each node. In propagating the signals down the tree, the address bits are buffered and decoded. The data signals propagating up the tree after a read are simply buffered. However, conservatively, the delay up the tree is taken to be the same as the delay down the tree. Therefore, the total delay along the critical read access path for the LPRAM architecture can be modeled by. Tree Decoder Delay: In the tree decoder, each switch node consists of a one bit decoder, consisting of one level of logic. 
Additionally, each decoded signal is controlled by the previous subtree select (chip enable for the first level), and this introduces another level of logic. To estimate the worst case delays, we model the delay at each switch node as the sum of the signal propagation delay through two levels of logic, coupled with the delay for driving the bus structure and the gates at the next level. We have already seen in Section V that the total load offered to the bus structure of H-tree is, in parallel with. Therefore, the delay over the entire tree, from level 1 to the leaf nodes, is given by. Assuming inputs to the tree are also buffered, we have delay from the input to the root node, which is in the center of the layout, as. The total tree delay is the sum of the previous expressions, given by,. For simplicity, in the above analysis of the tree decoder delay, we have assumed or 1. In practice, we will keep the aspect ratio of the H-tree within 4:1; i.e.,, and will not produce any noticeable deviation from what is given above. The access time for the LPRAM architecture, as proposed here, is given by. This equation has been evaluated for RAM of sizes 4, 16, 64, and 256 M, with other parameters, as given in Appendix. Fig. 15 shows the percentage reduction in delay; i.e.,, of LPRAM with respect to the traditional architecture. Here, the tradeoff is between the gain in performance due to partitioning, versus the additional delay in traversing the tree. Importantly, we can see a steady improvement in the performance for higher RAMs. Like access power, performance of the LPRAM improves when the aspect ratio of individual memory node is increased. These graphs in Fig. 15 illustrate the performance improvement as the number of nodes increases for the same size of RAM. As expected, finer granularity results in shorter word length within each module resulting in faster RAMs. Furthermore, it may be observed that as the aspect ratio is increased, we obtain improvement in performance as well. This can be also explained by the fact that the higher the aspect ratio, the smaller the word line. However, the key observation to be made is that

there is a point of diminishing return: increasing the aspect ratio beyond a certain point does not improve performance.

Fig. 14. Uniform distances for various cells.

Performance Improvement in On-Chip Memory Designs: The proposed H-tree layout for the cell arrays has the following unique advantage when applied to on-chip memory designs such as caches. In these applications, not only is speed very important, but the ability to provide uniform access to different cells is also of practical significance. The H-tree layout ensures uniform path lengths from the root node to the various cells. In traditional designs, the path lengths to various cells can vary significantly; in our design, all of the cell arrays are the same distance from the root node. This is further illustrated in Fig. 14. This uniform-distance property holds for an H-tree layout of any size.

Further Speed-Up Using Pipelining Techniques: As illustrated in Fig. 9, the proposed architecture admits two different modes of address mapping: noninterleaved, as shown in Fig. 9(a), and interleaved, as shown in Fig. 9(b). The interleaved mode admits access to consecutive addresses in a single cycle. By placing data buffers in the switch nodes, one can further access a block of data simultaneously. Consider the address mapping shown in Fig. 16: any four consecutive addresses reside in four different modules in the interleaved mode. By placing buffers in the switch nodes, one can move data for consecutive addresses in and out simultaneously through pipelining, as shown in Fig. 16. This can be especially effective when data transfers are done in blocks, as in caches.

Fig. 15. Performance enhancement in LPRAM.

Fig. 16. Pipelined block transfer of consecutive addresses.

VIII. AREA ESTIMATES WITH NEW TECHNOLOGY AND COMPARISONS

As before, we use the simplified representation of RAM, shown in Fig. 3, in developing area-estimation formulas. Since these estimates are used for comparisons only, any other estimates should yield similar results.

Area of a Memory Node: As mentioned earlier, within each memory node of LPRAM, the address bus of width is divided into two parts and, and is decoded by the row and column decoders, respectively. This is done in such a way as to obtain the desired aspect ratio of. So, we get and. The row decoder, using CMOS NAND gates, selects one out of rows. Assuming a stage decoder, at each stage address lines are decoded by switching a series of CMOS gates. The area of the row decoder at the first stage [13] is given by, where is the area of a single CMOS inverter needed per output. All other stages require, as there is an extra signal line from the previous stage. The number of decoders in each stage is. The total area for the row decoder is then given by. The decoder area is a function of, the number of address lines, and. The number of address lines decoded at each stage generally varies between two and four in most practical designs. Now, considering the width of the row decoder to be, we approximate the height of the row decoder to be. We similarly define, and. We take, and the column decoder and the sense amplifier are similarly characterized by and, respectively. Because the CMOS NAND decoder takes much more space, we need to put enough space between two rows to accommodate the decoder portion.
Let the area of each cell be. Because the small trench capacitance area required for the individual cell is very small, the estimated height will be dominated by the height of the row and column decoders. We get the width of the node to be the maximum of, denoted as. Similarly, we calculate the height of the node to be. Here, we have added extra space, equivalent to, required for the sense amplifier. The timing and control in DRAM is implemented by a timing chain generated by delay elements. As in [13], this is imple-

11 BHATTACHARJEE AND PRADHAN: LPRAM: NOVEL METHODOLOGY FOR LOW-POWER HIGH-PERFORMANCE RAM DESIGN 647 Fig. 18. Area model of LPRAM. Fig. 17. Area model of a memory node. mented as. The memory node has address bits; therefore, it requires address buffers whose area is given by, and the data buffer is characterized by. Then, the area of each of the memory nodes (Fig. 17) is computed as (7) Area of Traditional RAM: Now, the area of the traditional RAM can be computed using (7), with small changes in the parameters. We set the address bus width equal to, and the aspect ratio of RAM to 1:1 (2:1), when is even (odd). So, here we use and.as will always remain greater than, the number of stages in the address decoder may need to be increased. We compute the area of the traditional RAM, using (7), with, between two and four. Area of a LPRAM: We assume the bus structure of the LPRAM is implemented such that the address bus is multiplexed (usually done with little, if any, performance penalty since the column address is required some time after the row address). The lower address bits can be multiplexed; the upper bits propagate directly for the subtree and the final node select. Therefore, the bus carries address lines, one data line, two lines (TEST & FAIL) for testing, two lines (FULL & DIV) for low-power configuration structures, and one for. The area of the LPRAM architecture (Fig. 18) can be computed by using the following parameters. Let be the width of the bus, let be the length of the horizontal side of the chip, and let be the length of the vertical side of the chip. Therefore:, where is the pitch, the difference between neighboring signals, in, and. Because the aspect ratio may be other than 1, depending on the value of, we explicitly need the height and the width of each node, rather than its area. Therefore, the area of the nodes and the bus structure is. Fig. 19. Area overhead in LPRAM. Finally, certain input buffers would be required to drive the tree, the area for which can be estimated as. The area of the LPRAM can, therefore, be expressed as. The area requirements of this architecture are analyzed and compared to the traditional architecture. Significant area increases for LPRAM architecture are seen for large numbers of nodes. However, the proposed architecture may be best-suited for defect-tolerance techniques. For example, any single defect can be tolerated by switching out half of the tree. By exploring such defect tolerance techniques, one may be able to obtain acceptable yield levels. To compute this increase, four sizes of RAMS are evaluated: 4, 16, 64, and 256 M, over a series of values of. Fig. 19 shows the percentage of increase in the area of the LPRAM over the traditional implementation. As expected, the key factor in area increase is the number of nodes. However, as shown below, the larger the number of nodes, the greater the performance improvement.
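The tradeoffs just described can also be explored mechanically, in the spirit of the design tool mentioned in the abstract. The sketch below sweeps the number of modules Q and the module ARI; the proxies are ours, not the paper's equations — the activated word-line length stands in for access power and node delay, and one duplicated decode/refresh block per module stands in for the area overhead that grows with Q.

```python
def design_sweep(n_log2: int, q_options=(4, 16, 64), ari_options=(0, -1, -2, -3)):
    """Toy design-space sweep over module count and module aspect-ratio index.
    Proxies only: shorter word lines suggest lower access power and delay, while
    more modules mean more duplicated per-module circuitry (area overhead)."""
    for q in q_options:
        for ari in ari_options:
            cells_per_quadrant = 2 ** n_log2 // (q * 4)
            addr_bits = cells_per_quadrant.bit_length() - 1
            cols = 2 ** ((addr_bits + ari) // 2)       # AR = cols/rows = 2**ari
            yield q, ari, cols, q                      # last field: duplicated decoder sets

for q, ari, wordline, dup in design_sweep(24):         # a 16M-cell RAM
    print(f"Q={q:3d}  module ARI={ari:+d}  word line ~{wordline:4d} cells  decoder sets={dup}")
```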

12 648 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 5, MAY 2004 IX. IMPACT OF ASPECT RATIO ON POWER, PERFORMANCE, AND AREA At this point, we are now ready to draw some relation between aspect ratios ( and ) and the power consumption, performance, and area overhead in LPRAM. From all the previous analysis, it is clear that dominates the power consumption and performance. So, reducing will reduce power as well as improve performance, an aspect ratio of the module with being preferable. However, we cannot reduce arbitrarily, as this will increase gradually, which may even cross, thereby significantly increasing the data retention power consumption compared to traditional RAM. We will, therefore, prefer to be 1:2, 1:4, or 1:8, depending on the size of the RAM and the number of memory modules. For example, if we need to design a RAM of size or 16 M, one of the optimum LPRAM implementations could be to divide the RAM into 16 modules (i.e., ), and use. So, we will get and. Additionally, if we need to satisfy a chip-aspect ratio of 2: 1, we have two choices, as shown in Figs. 5 and 7, either able to be used, depending on the requirements. Choosing the organization given in Figs. 5 or 7 will not give any performance penalty, either, for the following reason. While computing the delay of the H-tree layout and, for simplicity, we have assumed a unit aspect ratio of the H-tree layout. However, all of the calculations, as well as the bus length, have been measured with respect to, the length of the vertical side of LPRAM. We had it in mind that the high-performance low-power LPRAM will be one with. So, will always be greater than, and using for our analysis will yield a much more conservative result. The fact can be even verified in Figs. 5 and 7, by observing that all the wire segments in the H-tree layout are less than. It has been also found that the area of the LPRAM is minimum, when both and are equal to one. This is because the area is mainly dominated by the decoder area which changes significantly, even if a single address line moves from row decoder to the column decoder and vice versa. So, for the same RAM of size 16 M, another alternative is to have. Such an implementation, however, will consume more power with low speed than the previous one, but could be laid in a smaller chip area than the previous one. X. TESTING The testability technique used here enables the of nodes to be tested in parallel, as mentioned earlier. Depending upon the size of the RAM and the number of modules in LPRAM, we set the value of. Thus, we get a test time saving of fold, without dissipating much power as well. A test algorithm with steps now definitely requires steps only. Testing the RAM involves three sets of tests: 1) testing the tree decoder; 2) testing the built-in test structure (BITS); and 3) testing the memory nodes. We will discuss the testing of the memory nodes only, the test procedure of the other parts being the same as given in [1]. Testing of Memory Nodes: At this stage, it simply looks like a RAM of size instead of. The tester, instead of performing the usual read and write, performs Test Read and Test Write, and the FAIL line is monitored to see which pattern fails. After going to all addresses of one module (i.e., after exploring address spaces), we jump address spaces, as they have been tested in parallel, previously. We stop when all the address spaces have been tested. Any test algorithm can be modified to do this. 
For example, the MATS algorithm presented in [14] can be modified as follows.
1) Place the RAM in Test Mode.
2) For each quadrant (activating one quadrant at a time), do:
3) For every address of the reduced space, do in parallel: Test Write 0.
4) For every address, do in parallel: a) Test Read (BITS internally verifies that all cells have been set to 0); b) Test Write 1.
5) For every address, do in parallel: Test Read (BITS internally verifies that all cells have been set to 1).
6) Return the RAM to Normal Mode.

Power Reduction During Test: The following elaborates on the potential power savings during test. Consider, for example, the MATS algorithm [4]. A test cycle in MATS comprises four accesses to the RAM. Assuming the tester runs at least as fast as the memory chip, the comparison is drawn in terms of the number of test cycles, the total number of accesses, the energy dissipation per test cycle, the total energy dissipation, and the total testing time for the conventional RAM, against the corresponding quantities (including the number of cells accessed per test cycle) for the LPRAM.
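A sketch of this modified march in code form, which also makes the reduction in test cycles concrete: every Test Write/Test Read acts on all modules of the active quadrant at once, and the built-in comparators raise FAIL on any mismatch. The ram object and its method names are hypothetical; only the march order follows the listing above.

```python
def mats_on_lpram(ram, num_quadrants: int, addresses_per_module: int):
    """MATS-style march adapted to LPRAM Test Write / Test Read operations.

    ram is assumed to expose enter_test_mode(), exit_test_mode(),
    test_write(quadrant, addr, value) and test_read(quadrant, addr) -> fail,
    where each call acts on every module of the quadrant in parallel and fail
    mirrors the FAIL line driven by the built-in comparators (hypothetical API).
    """
    failures = []
    ram.enter_test_mode()
    for g in range(num_quadrants):                 # activate one quadrant at a time
        for a in range(addresses_per_module):
            ram.test_write(g, a, 0)                # background of 0s in all modules at once
        for a in range(addresses_per_module):
            if ram.test_read(g, a):                # BITS checks the modules agree on 0
                failures.append((g, a, 0))
            ram.test_write(g, a, 1)
        for a in range(addresses_per_module):
            if ram.test_read(g, a):                # BITS checks the modules agree on 1
                failures.append((g, a, 1))
    ram.exit_test_mode()
    return failures                                # cycle count scales with the reduced address space
```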

13 BHATTACHARJEE AND PRADHAN: LPRAM: NOVEL METHODOLOGY FOR LOW-POWER HIGH-PERFORMANCE RAM DESIGN 649 TABLE I PROOF PROCEDURE OF LEMMA 1 Fig. 20. Reduction in power during testing in LPRAM (q =4). Speed up in testing. So, this speed-up is irrespective of the testing algorithm being used. As all modules are tested simultaneously, the peak power consumption during testing also grows closer to folds, compared to normal operation. However, LPRAM consumes very low power for accessing, and the test data are written and read locally within the quadrant, with up to five; therefore, we still get a reduction of about 20% power in 256 M RAM, depicted in Fig. 20, compared to the traditional RAM. At the same time, we get a four-time reduction in test time. XI. CONCLUSION A novel architecture for LPRAM is proposed. The LPRAM architecture saves about 35% power during normal operation for a 256 M RAM, compared to the traditional RAM. Also, for a 256 M RAM, LPRAM provides about 20% reduction in power during testing, with a 75% saving in test time. Thus, it reduces power consumption, both during normal operation and testing. Significantly, the proposed architecture achieves a higher speed (about four times higher) than the traditional architecture because of reduced word line length. Also, the performance enhancement is achieved because a much smaller number of cycles is needed for refreshing, with reduced refresh busy time. In addition, the BITS allows significant reduction in test time. It also indicated the strategy to attain further speed-up through address mapping combined with pipelining. There is an increase in the area over traditional RAM. This increase, however, may not impact the yield because the RAM nicely allows defect tolerance through reconfiguration. For example, the LPRAM with certain defects, can be reconfigured to small RAM half the size. Also, the defects within each module can be repaired using spare rows and columns. Again, we highlight the distinctions between the approach presented and the traditional approach of multiple cell arrays. 1. Our approach used the H-tree layout to equalize delays among different cells. This has a major advantage of delay predictability, and fast access. 2. Having independent refresh and decoder circuits, the proposed LPRAM allowed reduction in normal power and test power, as well as refresh power. Also, the independence is crucial to the built-in test strategy. 3. The LPRAM has an additional mode of low power which allows, in a sleep mode, reduction of power further, by switching off modules. 4. The design, unlike the cell-array partition, is both conceptually and in terms of implementation RECURSIVE in nature. This allows for ease in implementation verification and in design, itself. 5. The traditional approach is ad hoc, and is a circuit-based approach; ours, on the other hand, is systematic and architectural. So, any circuit-based approach can be employed to further reduce power. Our approach does not preclude any circuit-based approach. 6. We also have shown different types of address-mapping, which can provide some interesting advantages in achieving multiple bit access. APPENDIX A. Proof of Lemma1 Proof: Consider that is odd, so. Then, either and are both odd or both even. The first and second rows of Table I depict the possible assignment of,, and the corresponding assignments for,, and. The case when is even follows accordingly. B. Proof of Lemma2 Proof: From Lemma 1, it is necessary that either both and have to be even, or both have to be odd. 
Let the address bits be divided into two parts, as was done for the LPRAM: one part determines the number of levels in the tree, and the other gives the number of address bits in each memory node of the LPRAM. The tree-level bits must, in turn, be divided (as a tree of a given depth provides only a fixed number of modules) into two parts for laying the modules out in an H-tree; this fixes how many modules lie along the horizontal side and how many along the vertical side of the H-tree layout. Similarly, the per-module address bits are divided into two parts, which together determine the aspect ratio of each module. The number of ways in which the aspect-ratio criterion can be met therefore depends on the number of solutions of the resulting expression. After substitution, this reduces to counting the positive integer solutions of a linear equation in integer unknowns. It is further noted that, although the chip aspect-ratio index itself is positive, the corresponding term can be either positive or negative (both signs yield the same chip aspect ratio), and both signs are counted among the possible configurations.

Process and Design Parameters: The process parameters used in the technology-dependent computations are based on the example CMOS process given in [12]. The parameters are then scaled, as detailed in [14], to the minimum feature size of the particular technology of interest using the constant-field model, with the scale factor defined relative to this base technology. The gate capacitance is approximated by the gate-oxide capacitance and is a function of the oxide thickness. The source/drain-junction capacitance is an important parameter for estimating the bit-line capacitance. It consists of two parts: the planar (junction-area) capacitance and the sidewall (junction-peripheral) capacitance. For a drain/source region of given dimensions, the resultant junction capacitance is the sum of these two contributions; this is the capacitance at 0-V bias, and the value at any other reverse bias is computed from it using the junction built-in potential. For our computation, we use the minimum drain/source dimension required for a metal contact and the typical precharge voltage as the bias. The remaining process parameters are the capacitance of metal over poly, the capacitance of metal over field, the capacitance of poly, the junction-area capacitance, the junction-sidewall capacitance, the capacitance of the memory cell, and the area of a DRAM cell.

Parameters for Power Estimation: The parameters explicitly required for power estimation in the RAM are the voltage swing in the RAM, the operating frequency of the traditional RAM (200 MHz), the internal supply voltage, and the dc static current.

Bit-Line Capacitance: The source/drain-junction capacitance is important for computing the equivalent capacitance of the bit line and is obtained as explained previously. The bit-line capacitance is then estimated from the junction capacitance contributed by the cells on the line together with the capacitance of the metal line, determined by its width and length, as sketched below.

Constants Characterizing the Other Functional Blocks: These include the row- and column-decoder pitch per bit, the depth of the sense amplifier per bit, the area of the address buffer, the area of the data buffer, the area of the timing and control unit, and the pitch of the metal in the bus structure.
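As an illustration of how these capacitance parameters combine in the bit-line estimate, here is a small Python sketch using the standard zero-bias junction-capacitance and reverse-bias scaling expressions of the kind found in [13]. The function names and every numeric value are assumptions made for the example, not the process parameters used in the paper.

```python
# Illustrative sketch of the bit-line capacitance estimate outlined above, using
# standard junction-capacitance expressions. All numeric values are assumed
# placeholders, not the process parameters used in the paper.

def junction_cap_zero_bias(area_um2, perim_um, cj_pf_per_um2, cjsw_pf_per_um):
    """Planar (area) plus sidewall (peripheral) junction capacitance at 0-V bias, in pF."""
    return cj_pf_per_um2 * area_um2 + cjsw_pf_per_um * perim_um

def junction_cap_at_bias(c_j0_pf, v_bias, phi_b=0.6, m=0.5):
    """Reverse-biased junction capacitance, C(V) = C_j0 / (1 + V/phi_b)**m."""
    return c_j0_pf / (1.0 + v_bias / phi_b) ** m

def bitline_cap(n_cells, c_drain_pf, c_metal_pf_per_um2, width_um, length_um):
    """Bit-line capacitance: one drain junction per cell plus the metal-line capacitance."""
    return n_cells * c_drain_pf + c_metal_pf_per_um2 * width_um * length_um

if __name__ == "__main__":
    # Assumed example: 1 um x 1 um drain region, 256 cells hanging off one bit line.
    c_j0 = junction_cap_zero_bias(area_um2=1.0, perim_um=4.0,
                                  cj_pf_per_um2=2e-4, cjsw_pf_per_um=4e-4)
    c_drain = junction_cap_at_bias(c_j0, v_bias=1.65)        # e.g., half-VDD precharge
    c_bl = bitline_cap(n_cells=256, c_drain_pf=c_drain,
                       c_metal_pf_per_um2=3e-5, width_um=0.5, length_um=300.0)
    print(f"drain-junction capacitance at bias: {c_drain * 1e3:.2f} fF")
    print(f"estimated bit-line capacitance:     {c_bl * 1e3:.1f} fF")
```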
REFERENCES

[1] N. T. Jarwala and D. K. Pradhan, "TRAM: A design methodology for high-performance, easily testable, multimegabit RAMs," IEEE Trans. Comput., vol. 37, Oct. 1988.
[2] K. Itoh, K. Sasaki, and Y. Nakagome, "Trends in low-power RAM circuit technologies," Proc. IEEE, vol. 83, Apr. 1995.
[3] K. Itoh, "Trends in megabit DRAM circuit design," IEEE J. Solid-State Circuits, vol. 25, June 1990.
[4] P. Mazumder and K. Chakraborty, Testing and Testable Design of Random-Access Memories. Norwell, MA: Kluwer.
[5] S. Rai and V. P. Kirpalani, "A modified TRAM architecture," IEEE Trans. Comput., vol. 45, Aug. 1996.
[6] K. Itoh et al., "An experimental 1 Mb DRAM with on-chip voltage limiter," in Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 1984.
[7] K. Kimura et al., "Power reduction in megabit DRAMs," IEEE J. Solid-State Circuits, vol. SC-21, June 1986.
[8] M. Margala and N. G. Durdle, "Noncomplementary BiCMOS logic and CMOS logic styles for low-voltage operation: A comprehensive study," IEEE J. Solid-State Circuits, vol. 33, Oct. 1998.
[9] A. Bellaouar and M. I. Elmasry, Low-Power Digital VLSI Design: Circuits and Systems. Norwell, MA: Kluwer.
[10] J. S. Caravella, "A low-voltage SRAM for embedded applications," IEEE J. Solid-State Circuits, vol. 32, Oct. 1997.
[11] A. K. Sharma, Semiconductor Memories: Technology, Testing and Reliability. Piscataway, NJ: IEEE Press.
[12] D. C. Choi et al., "Battery operated 16 M DRAM with post package programmable and variable self refresh," in Symp. VLSI Circuits Dig. Tech. Papers, May 1994.
[13] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective. Reading, MA: Addison-Wesley.
[14] R. Nair, "Comments on 'An optimal algorithm for testing stuck-at faults in random access memories'," IEEE Trans. Comput., vol. C-28, Mar. 1979.
[15] S. Ravi, G. Lakshminarayana, and N. K. Jha, "Testing of core-based systems-on-a-chip," IEEE Trans. Computer-Aided Design, vol. 20, Mar. 2001.
[16] N. C. C. Lu and H. Chao, "Half-VDD bit-line sensing scheme in CMOS DRAM," IEEE J. Solid-State Circuits, vol. SC-19, Aug. 1984.
[17] A. Chandra and K. Chakrabarty, "Low-power scan testing and test data compression for system-on-chip," IEEE Trans. Computer-Aided Design, vol. 21, May 2002.
[18] R. P. Dick, G. Lakshminarayana, A. Raghunathan, and N. K. Jha, "Power analysis of embedded operating systems," in Proc. IEEE Design Automation Conf., June 2000.
[19] S. Bhattacharjee and D. K. Pradhan, "A Low Power RAM Design," U.K. Patent filed, June 2003.

Subhasis Bhattacharjee received the B.E. degree in computer engineering from S. V. Regional College of Engineering and Technology, India, in 1996 and the M.Tech. degree in computer science from the Indian Statistical Institute (ISI), Calcutta, India.
He was a Senior Software Engineer with Wipro Ltd., Bangalore, India, and a Research Project Engineer with ISI, Calcutta, where he is now a Research Fellow. His research interests include very large scale integrated design, logic synthesis, and distributed systems.

Dhiraj K. Pradhan (S'70-M'72-SM'80-F'88) received the M.S. degree from Brown University, Providence, RI, and the Ph.D. degree from the University of Iowa, Iowa City.
He currently holds a Chair in computer science at the University of Bristol, Bristol, U.K. Recently, he was a Professor in the Electrical and Computer Engineering Department, Oregon State University, Corvallis. Previous to this, he held the COE Endowed Chair Professorship in Computer Science at Texas A&M University, College Station, while also serving as a Visiting Professor at Stanford University, Stanford, CA. Additionally, he held a Professorship at the University of Massachusetts, Amherst, where he also served as Coordinator of Computer Engineering. He has been with the University of California, Berkeley, Oakland University, Rochester, MI, and the University of Regina, Saskatchewan, Canada. He has contributed to very large scale integrated computer-aided design and test, as well as to fault-tolerant computing, computer architecture, and parallel processing research, with major publications in journals and conferences spanning 30 years. He holds two U.S. patents. He has served as coauthor and editor of various books, including Fault-Tolerant Computing: Theory and Techniques, Vols. I & II (New York: Prentice-Hall, 1986), Fault-Tolerant Computer Systems Design (New York: Prentice-Hall, 1996), and IC Manufacturability: The Art of Process and Design Integration (Piscataway, NJ: IEEE Press, 2000).
Professor Pradhan is a Fellow of the ACM. He has served as Guest Editor of special issues of prestigious journals, such as the IEEE TRANSACTIONS ON COMPUTERS. He has also served as an editor for several journals, including IEEE Transactions and JETTA. He has served as General Chair and Program Chair for various major conferences. He has received several awards, including the 1996 IEEE Transactions on Computer-Aided Design Best Paper Award, with W. Kunz, for "Recursive Learning: A New Implication Technique for Efficient Solutions to CAD Problems - Test, Verification and Optimization," and the Humboldt Prize, Germany. In 1997, he was awarded the Fulbright-Flad Chair in Computer Science.
