PRES: A Pseudo-Random Encoding Scheme to Increase Bit Flip Reduction in Memory

Seyed Mohammad Seyedzadeh, Rakan Maddah, Alex Jones, Rami Melhem
Computer Science Department, University of Pittsburgh
{seyedzadeh,maddah,jones,melhem}@cs.pitt.edu

ABSTRACT

Nonvolatile memory technologies such as Spin-Transfer Torque Random Access Memory (STT-RAM) and Phase Change Memory (PCM) are emerging as promising replacements for DRAM. Before STT-RAM and PCM can be deployed in functional systems, a number of challenges remain. Specifically, both require relatively high write energy, STT-RAM suffers from high bit error rates, and PCM suffers from low endurance. A common solution to these challenges is to minimize the number of bits changed per write. In this work, we introduce the Pseudo-Random Encoding Scheme (PRES) to minimize the number of bit changes during memory writes. PRES maps the write data vector into an intermediate, highly random set of data vectors. Subsequently, the intermediate data vector that differs least from the currently stored data is selected. Our evaluation shows that PRES reduces bit flips by up to 25% over a baseline differential writing scheme. Further, PRES reduces bit flips by 15% over the leading bit-flip minimization scheme, while decreasing encoding and decoding complexities by more than 90%.

Categories and Subject Descriptors
B.8.2 [Performance and Reliability]: Performance Analysis

General Terms
Design, Reliability

Keywords
Memory, Endurance, Pseudo-Random Encoding

1. INTRODUCTION

As the number of cores per chip continues to increase, the memory system is becoming, more than ever, a defining component of computer system performance. A large memory capacity operating under stringent quality of service requirements is needed to respond to the memory access requests of executing cores within acceptable latencies.
Unfortunately, DRAM, which currently forms the building block of the memory system, is becoming limited by power and scalability challenges, endangering the evolution of the memory system. This has turned the attention of architects and researchers to alternative memory technologies [1-4]. Among several candidates, both Phase Change Memory (PCM) and Spin-Transfer Torque Random Access Memory (STT-RAM) are receiving considerable attention as potential replacements for DRAM. Assessments and evaluations of PCM [5, 6] show that it can compete with DRAM in terms of performance while providing improved scalability and power efficiency. Multi-level cell techniques for STT-RAM [7] suggest the potential for near-DRAM densities while retaining near-SRAM performance (potentially faster than DRAM). Yet, both technologies suffer from a number of challenges that must be resolved before they become viable for high-volume manufacturing. Specifically, PCM suffers from low endurance [8] ($10^6$ to $10^8$ writes on average) and STT-RAM suffers from high write bit error rates [9, 10].

To address the challenges faced by PCM and STT-RAM, servicing write requests while minimizing the actual number of bits written to memory is a promising approach. It reduces the effective wear-out rate of PCM cells and lowers the effective bit error rate of STT-RAM. In its simplest form, minimizing the number of bit flips can be achieved through a concept called differential write [11]. A differential write compares, bit by bit, the new data to be written with the data currently stored in the memory block, and then writes only the cells whose stored values differ from the new values. Clearly, the higher the similarity between the new data and the currently stored data, the lower the number of required bit changes.
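The differential write described above can be illustrated with a minimal sketch. The function name `differential_write` and the integer representation of a memory block are assumptions made for illustration only; they do not come from the paper.

```python
def differential_write(stored: int, new: int, n_bits: int):
    """Return (value_after_write, cells_written) for an n-bit block.

    Hypothetical helper: blocks are modeled as Python integers.
    """
    mask = (1 << n_bits) - 1
    diff = (stored ^ new) & mask           # 1 wherever stored and new bits differ
    cells_written = bin(diff).count("1")   # only these cells are actually written
    return (stored ^ diff) & mask, cells_written


value, flips = differential_write(stored=0b10110010, new=0b10010011, n_bits=8)
print(bin(value), flips)
```

Only the cells marked in `diff` are touched, so the cost of a write is the Hamming distance between the stored and the new data.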
In this realm, coding techniques [12-16] have been used to encode the new data into a form that is highly similar to the currently stored data. The encoding process maps the new data vector into several candidate vectors and then picks the candidate that minimizes the bit flips. In this paper, we make the following contributions to data encoding for bit flip minimization:

  - We observe that increasing the randomness of the encoded vector candidates decreases the number of bit flips required for unbiased (i.e., apparently random) data sets.
  - We demonstrate mathematically and by simulation that random encoding outperforms the leading bit-minimization encoding approach, which is based on partially inverting bits of the current value.
  - As truly random data vector generation is a non-reversible transformation, we propose a low-overhead, reversible pseudo-random encoding scheme (PRES) for bit-flip minimization of unbiased data.
  - We demonstrate the randomness, using a battery of random code tests, of both PRES and the leading encoding for bit flip minimization, which uses Hamming dual codes.
  - PRES reduces bit flips by up to 25% over a baseline differential writing scheme. Further, PRES reduces bit flips by 15% over the leading bit-flip minimization scheme, Flip-Min, while decreasing encoding and decoding complexities by more than 90%.

The remainder of this paper is organized as follows. Section 2 presents fundamentals of bit flip reduction techniques. Section 3 explores the effect of randomization on bit flip reduction. Section 4 describes PRES in detail, including the encoding and decoding process and overheads. Section 5 presents an experimental evaluation of PRES, and conclusions are given in Section 6.

2. BACKGROUND: WRITE MINIMIZATION USING COSET THEORY

There have been several proposals to minimize the number of bit differences between two consecutive values for a variety of purposes. For example, coset theory was proposed to reduce the number of bit transitions between two successive values [17, 18]. Coset theory can also be used to reduce the number of bits written in emerging memory technologies. Recall from the previous section that differential write [11] minimizes the bits written by writing only the bits that change value during the write operation. Consider an n-bit data block B that is to be written to an n-cell memory block that already stores data D. A traditional differential write first reads D, compares it with B, and writes only the bits of B that differ from their corresponding bits in D. For entirely random data, the distribution of zeros and ones in both B and D is random, and hence, on average, n/2 bits are written to memory instead of n bits.

Coset theory attempts to increase the number of bits that are identical in B and D by exploring a number K of possible encodings of B, where $K = 2^k$. The overall number of written bits can be reduced by selecting the encoding that has the minimum Hamming distance to D (the fewest differing bits). To recover the original data B, it is necessary to have a unique decoding path.
To accomplish this, the coset approach defines a fixed set of K n-bit vectors, $C_0, \ldots, C_{K-1}$, and uses these vectors to generate K different encodings of B through bit-wise XOR. We denote these K encodings of B as $W_0, \ldots, W_{K-1}$, where each $W_i$, $0 \le i \le K-1$, is an $(n + k)$-bit vector whose first n bits equal $C_i \oplus B$ and whose last k bits record the binary representation of the index i. Any $W_i$ among the K possible encodings can be decoded back to B by recovering i from the last k bits of $W_i$, looking up $C_i$, and XORing it with the first n bits of $W_i$, i.e., $(C_i \oplus B) \oplus C_i = B$. To minimize the number of bits written, coset writes the encoding $W_i$ that has the minimum Hamming distance from D (the data already stored in memory). Note that, with coset, each $W_i$ and D contains $n + k$ bits, while the data block B contains only n bits. The additional k bits, which store the index i, represent the coding overhead needed to reduce the Hamming distance and minimize the number of bits written during a differential write. We denote these overhead bits as $h_0, \ldots, h_{k-1}$. In the rest of this paper, we refer to the set $C_0, \ldots, C_{K-1}$ as the base coset.

Flip-N-Write (FNW) reduces the bits written by a differential write by selectively inverting blocks of the data to be written [15]. In general, for any number k of overhead bits, Flip-N-Write divides B into k equal partitions and writes each partition directly or inverted, whichever minimizes the number of written bits. It uses the overhead bits $h_0, \ldots, h_{k-1}$ to track which partitions are inverted, allowing retrieval of the original data. FNW is a special case of the coset approach in which each n-bit vector $C_i$ in the base coset is constructed from k sub-vectors, $C_{i,j}$, $j = 0, \ldots, k-1$, each consisting of n/k bits.
Specifically, if the overhead bits $h_0, \ldots, h_{k-1}$ store the binary representation of the index i, then the n/k bits of $C_{i,j}$ are all zeroes if $h_j = 0$ and all ones if $h_j = 1$. For example, if k = 1 (that is, K = 2), the base coset for Flip-N-Write contains the two vectors 0...0 and 1...1, which means that the data vector B is either written as is, $B \oplus 0\ldots0$, or written inverted, $B \oplus 1\ldots1$, with one overhead bit, $h_0$, indicating which of the two options is used. For k = 2 (that is, K = 4), the base coset for Flip-N-Write contains the four vectors (0...0 0...0), (0...0 1...1), (1...1 0...0) and (1...1 1...1), which means that B is divided into two halves and each half can be either inverted or not, with one overhead bit, $h_0$, indicating whether the first half is inverted and another, $h_1$, indicating whether the second half is inverted.

Another approach to write minimization, Flip-Min, uses a special instance of coset theory: it encodes with the dual code of the Hamming (72,64) code [19] to obtain $W_0, \ldots, W_{K-1}$ from B and to decode any $W_i$ back to B [16]. Experimental studies showed that Flip-Min reduces the number of written bits further than FNW.

Thus, previous work offers two options for building the n-bit vectors of the base coset. The first is based on FNW and leverages simple, non-random patterns such as 0...0 and 1...1. The second is based on Flip-Min and utilizes the generator of a linear code. Since its n-bit vectors are derived from linear combinations of the rows of the generator matrix, they are not random. In the next section, we propose a third option for the base coset.

3. WRITE MINIMIZATION USING A RANDOM BASE COSET

In this section, we take a different approach to building the base coset: we randomly generate each vector $C_0, \ldots, C_{K-1}$. We demonstrate mathematically that a random base coset can complete write requests while inducing fewer bit flips than FNW.
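The coset encoding of Section 2, its Flip-N-Write special case, and the random base coset introduced here can be compared with a small Monte Carlo sketch. Everything below is illustrative: the helper names, the bit layout (data in the high bits, index in the low k bits), and the use of Python's `random` module as the source of random vectors are assumptions, not the authors' implementation.

```python
import random


def popcount(x: int) -> int:
    return bin(x).count("1")


def fnw_base_coset(n: int, k: int) -> list:
    """FNW base coset: bit j of index i inverts the j-th n/k-bit partition."""
    part = n // k
    ones = (1 << part) - 1
    return [sum(ones << (j * part) for j in range(k) if (i >> j) & 1)
            for i in range(1 << k)]


def random_base_coset(n: int, k: int, rng: random.Random) -> list:
    """2**k randomly drawn n-bit vectors (assumed stand-in for a random coset)."""
    return [rng.getrandbits(n) for _ in range(1 << k)]


def coset_encode(b: int, d: int, coset: list, k: int) -> int:
    """Return the W_i = (C_i xor B, i) closest to the stored (n+k)-bit word d."""
    candidates = [((c ^ b) << k) | i for i, c in enumerate(coset)]
    return min(candidates, key=lambda w: popcount(w ^ d))


def coset_decode(w: int, coset: list, k: int) -> int:
    """Recover B: read index i from the low k bits, then XOR with C_i."""
    return (w >> k) ^ coset[w & ((1 << k) - 1)]


def average_flips(coset: list, n: int, k: int, writes: int, rng: random.Random) -> float:
    """Average cells written per request for a stream of random data blocks."""
    d = rng.getrandbits(n + k)
    total = 0
    for _ in range(writes):
        b = rng.getrandbits(n)
        w = coset_encode(b, d, coset, k)
        assert coset_decode(w, coset, k) == b   # every write must be decodable
        total += popcount(w ^ d)
        d = w
    return total / writes


rng = random.Random(2015)
n, k = 64, 8
print("FNW   :", average_flips(fnw_base_coset(n, k), n, k, 500, rng))
print("random:", average_flips(random_base_coset(n, k, rng), n, k, 500, rng))
```

The counts include the k index bits, so they are not directly comparable to figures that normalize against n/2 data bits only.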
Given a memory block of size n and k auxiliary bits, the number of written bits (NWB) by FNW [15] can be expressed, with each of the k partitions holding n/k data bits plus one flag bit and being written in whichever of its two forms differs less from the stored cells, as:

\[ NWB_{FNW} = \frac{k}{2^{\frac{n}{k}+1}} \sum_{i=0}^{\frac{n}{k}+1} \binom{\frac{n}{k}+1}{i} \min\!\left(i,\ \frac{n}{k}+1-i\right) \tag{1} \]

To derive a formula for the number of written bits when the base coset consists of K randomly generated vectors, $C_0, \ldots, C_{K-1}$, we compute the Hamming distance between each encoded vector $C_i \oplus B$ and the currently stored data, D. We then estimate the expected value of the minimum of this distance over $i = 0, \ldots, K-1$. The Hamming distance between $C_i \oplus B$ and D is equal to the Hamming weight of $C_i \oplus B \oplus D$. Moreover, because B, D and $C_i$ are random, $C_i \oplus B \oplus D$ is also random. This implies that $NWB_{Random}$ is equivalent to the expected value of the minimum Hamming weight of K random vectors.

The random vector $C_i$ can be any element of the set of $2^n$ possible strings of zeroes and ones. We divide the set of $2^n$ possible elements into $2^{n-k}$ distinct groups so that each group has $2^k$ elements. The number of groups can range from one group ($k = n$) to $2^n$ groups ($k = 0$). The weight of a word is the number of ones it contains, denoted by the parameter w with $0 \le w \le n$. The total number of elements of weight w in the set of $2^n$ elements is $\binom{n}{w}$. Since each group has $2^k$ elements, the number l of elements with weight w in a group ranges over $1 \le l \le 2^k$. In the set of $2^n$ elements, there are $\binom{2^n}{2^k}$ different combinations that form a group of $2^k$ distinct elements. To obtain the average weight of the groups, we follow Steps 1-3:

Step 1. Find the number of combinations with weight w, i.e., combinations that contain at least one and at most $2^k$ elements of weight w and whose remaining elements all have weight greater than w:

\[ \sum_{l=1}^{2^k} \binom{\binom{n}{w}}{l} \binom{\sum_{j=w+1}^{n} \binom{n}{j}}{2^k - l} \tag{2} \]

Step 2. Multiply the outputs of Step 1 by the corresponding weights:

\[ \sum_{w=0}^{n} w \sum_{l=1}^{2^k} \binom{\binom{n}{w}}{l} \binom{\sum_{j=w+1}^{n} \binom{n}{j}}{2^k - l} \tag{3} \]

Step 3. Calculate the average number of bits updated per write (denoted $NWB_{Random}$):

\[ NWB_{Random} = \frac{\sum_{w=0}^{n} w \sum_{l=1}^{2^k} \binom{\binom{n}{w}}{l} \binom{\sum_{j=w+1}^{n} \binom{n}{j}}{2^k - l}}{\binom{2^n}{2^k}} \tag{4} \]

where $0 \le w \le n$, $1 \le l \le 2^k$ and $0 \le k \le n$.

Figure 1 shows the bit flip reduction over differential write (n/2 bit flips) achieved by random based cosets and FNW based cosets, derived through Eq. (4) and Eq. (1). The random based coset achieves considerably higher flip reduction rates than the FNW based coset across the block sizes considered. Accordingly, we propose in the next section a new write minimization scheme based on the idea of random cosets.

Unfortunately, it is infeasible to derive an analytical model for the number of bit flips Flip-Min requires to complete a write request. Thus, we rely on Monte Carlo simulations to compare random cosets against Flip-Min. These simulations indicate that random cosets can achieve significantly fewer bit flips than Flip-Min; we report the results in Section 5.

4. WRITE MINIMIZATION USING PSEUDO-RANDOM ENCODING

In Section 2 we introduced cosets that can encode a value B into a value $W_i$ that reduces the bit transitions when writing to memory cells containing D using differential write. We also refer to the encoded value $W_i$ as a codeword.
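The expected minimum weight that Section 3 derives can be evaluated numerically. The sketch below assumes the counting argument described there: for each weight w, count the groups of $2^k$ distinct n-bit vectors that contain l >= 1 vectors of weight exactly w and only heavier vectors otherwise, weight the count by w, and divide by the number of possible groups. The function name `nwb_random` is hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from math import comb


def nwb_random(n: int, k: int) -> Fraction:
    """Expected minimum Hamming weight over a group of 2**k distinct n-bit vectors."""
    group = 1 << k
    all_groups = comb(1 << n, group)            # ways to pick 2**k of the 2**n vectors
    expected = Fraction(0)
    for w in range(n + 1):
        at_w = comb(n, w)                       # vectors of weight exactly w
        heavier = sum(comb(n, j) for j in range(w + 1, n + 1))
        # groups whose lightest members have weight w (l of them, l = 1..2**k)
        count = sum(comb(at_w, l) * comb(heavier, group - l)
                    for l in range(1, group + 1))
        expected += Fraction(w * count, all_groups)
    return expected


# With k = 0 there is a single candidate, so the result is the n/2 of a plain
# differential write; more candidates lower the expected minimum.
print(float(nwb_random(16, 0)), float(nwb_random(16, 4)))
```

Dividing `nwb_random(n, k)` by n/2 gives the fraction of a plain differential write's flips that remain, from which a reduction figure can be computed.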
In Section 3 we demonstrated that a randomly generated coset provides a higher reduction in bit flips than the leading existing flip minimization schemes. However, to decode a codeword encoded with a random coset, we need to know the random vector $C_i$ that was used to encode the data. Since the inverse generator matrix used in Flip-Min is not a valid option for a random coset, we are left with the option of generating the random coset elements in advance and storing them in both the encoder and the decoder. Unfortunately, this option incurs an unnecessarily large storage overhead.

Another potential solution is to use a random number generator at encoding time. Such an encoder leverages the initial value B to generate the random coset. A traditional pseudo-random number generator poses two difficulties: generating the random numbers incurs a high hardware overhead, and the generation is typically irreversible.

[Figure 1: Weight average difference of FNW and random cosets with 8 auxiliary bits (k = 8) and different block sizes. The value in parentheses is the space overhead. Y-axis: bit flip reduction, 0% to 25%. X-axis: block size, 64 (12.5%), 128 (6.25%), 256 (3.125%), 512 (1.5625%), 1024 (0.78125%). Series: Random Based Coset, FNW Based Coset.]

To address these difficulties, in this section we propose PRES, a novel tree-structured pseudo-random encoding model that generates pseudo-random vectors directly from the value B. PRES does not require additional storage for coset elements. PRES codewords are quickly recoverable using an efficient decoding mechanism that is less complex than that of Flip-Min. Further, we define conditions on the proposed tree-structured model for encoding and decoding so that the generated pseudo-random vectors are demonstrably close to truly random under several standard tests. We describe the PRES scheme in detail in the following sections.
4.1 PREM: A Pseudo-Random Encoding Model

We first define a pseudo-random encoding model (PREM) to decorrelate a data block B as:

\[ P_i = \begin{cases} P_{n-1} \oplus B_i & i = 0 \\ B_{i-1} \oplus B_i & i = 1 \\ P_{i-1} \oplus B_i & i = 2, \ldots, n-1 \end{cases} \tag{5} \]

where $B_i$ and $P_i$ represent the i-th element of the data block and of the pseudo-random vector, respectively, and n is the number of memory cells to be encoded. As Eq. (5) indicates, $P_i$ for $1 \le i \le n-1$ is generated first, followed by $P_0$, which is produced from $P_{n-1}$ of the previous step and $B_0$. Eq. (5) has been designed so that every $B_i$ has the potential to be updated by each encoding pass. Since the probability of a cell holding a 0 or a 1 is 1/2, the probability of the cell being updated is also 1/2. That is, the probability that two cells hold different values is 1/2, because the corresponding combinations are 01 and 10 out of the four possible combinations, which also include 00 and 11. Therefore, there is a high probability of producing pseudo-random vectors using Eq. (5).

[Figure 2: Overview of PREM for encoding and decoding. Encoder: inputs $B_0, B_1, \ldots, B_{n-1}$, outputs $P_0, P_1, \ldots, P_{n-1}$, with $P_{n-1}$ fed back. Decoder: inputs $P_0, P_1, \ldots, P_{n-1}$, outputs $B_0, B_1, \ldots, B_{n-1}$.]

The corresponding decoding algorithm for Eq. (5) can be expressed as:

\[ B_i = \begin{cases} P_{n-1} \oplus P_i & i = 0 \\ B_{i-1} \oplus P_i & i = 1 \\ P_{i-1} \oplus P_i & i = 2, \ldots, n-1 \end{cases} \tag{6} \]

Figure 2 shows the feedback path from the left side to the right side that causes the $P_i$ to be produced serially. As shown in Figure 2, there is no feedback path between $P_i$ and $B_i$ on the decoding side. The advantage of this configuration is that all cells in the decoder can be decoded simultaneously. Thus, read accesses, which are typically on the critical path for processor performance and require decoding, can be streamlined.

Further, several pseudo-random encodings can be obtained by applying PREM in different bit orderings to expand the number of candidate encodings. For example, if the feedback path in Figure 2 is used from the right side to the left side (i.e., reversed), a different pseudo-random vector is generated. Thus, one bit pattern can be used in two directions to build two distinct pseudo-random vectors. To produce additional pseudo-random codewords, the encoding process can be applied in different patterns, such as different block-interleaved orderings. We describe this process in the following section and use it to build an indexable set of pseudo-random vectors for write-minimized encoding.

4.2 PRES: A Pseudo-Random Encoding Scheme

[Figure 3: PRES 16-bit example: (a) the 4x4 block, (b) the generation of parents, (c) the generation of children. Panels show the dataword and each block with its index under the LR, RL, TB and BT functions.]

We can create p different patterns by subdividing B into sub-blocks, conceptually represented by the rows (or columns) of a two-dimensional matrix. PRES can then simultaneously and independently encode these sub-blocks using PREM in two opposite directions to generate two different pseudo-random codewords.
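The PREM recurrences of Eqs. (5) and (6), and the reversed direction mentioned in Section 4.1, can be sketched as follows. The function names and the list-of-bits representation are assumptions for illustration; a sub-block is assumed to hold at least two bits.

```python
def prem_encode(b: list) -> list:
    """Eq. (5): P_1 = B_0 ^ B_1, P_i = P_{i-1} ^ B_i for i >= 2, P_0 = P_{n-1} ^ B_0."""
    n = len(b)
    p = [0] * n
    p[1] = b[0] ^ b[1]
    for i in range(2, n):
        p[i] = p[i - 1] ^ b[i]       # serial chain: each P_i needs P_{i-1}
    p[0] = p[n - 1] ^ b[0]           # P_0 is produced last, from P_{n-1}
    return p


def prem_decode(p: list) -> list:
    """Eq. (6): B_0 = P_{n-1} ^ P_0, B_1 = B_0 ^ P_1, B_i = P_{i-1} ^ P_i for i >= 2."""
    n = len(p)
    b = [0] * n
    b[0] = p[n - 1] ^ p[0]
    b[1] = b[0] ^ p[1]
    for i in range(2, n):
        b[i] = p[i - 1] ^ p[i]       # depends only on P, so these can run in parallel
    return b


def prem_encode_reversed(b: list) -> list:
    """Same recurrence applied from the right side to the left side."""
    return prem_encode(b[::-1])[::-1]


def prem_decode_reversed(p: list) -> list:
    return prem_decode(p[::-1])[::-1]


block = [1, 0, 1, 1]
print(prem_encode(block), prem_encode_reversed(block))
```

In the decoder, every output bit with i >= 2 is the XOR of two input bits, which is why decoding needs no serial chain.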
By constructing p different matrices we can generate 2p codewords. For a partition-based pseudo-random generator model to be functional, there are certain requirements on how each matrix is partitioned and encoded. Each particular partitioning corresponds to a particular pattern and results in a unique pseudo-random encoding. However, partitions should group bits into sub-blocks such that each partition groups unique bits together and does not repeat the groupings found in the sub-blocks of other partitions. If overlapping occurs, the overlapping bits are guaranteed to have the same values in the two different codewords, which decreases the randomness of the encoding candidates. Ensuring that the patterns have minimal overlap makes it possible to build a partition-based pseudo-random generator model that closely mimics an ideal random encoding generator. One way to do this is to use a single matrix and to apply PREM along different dimensions using different orderings, such as along rows, then columns, diagonals, etc.

In PRES we propose a two-phase tree structure that generates 2p codewords in the first phase and then, from each of these codewords, generates 2p - 1 further codewords in the second phase. This yields 2p(2p - 1) second-phase codewords and $2p + 2p(2p - 1) = 4p^2$ codewords in total. We assume each matrix partitions the n bits of B into a square m x m matrix, such that each row (or column) of the matrix can be considered a sub-block that contains m bits (see footnote 1). The encoding process is as follows:

Step 1. The encoder uses p given patterns in the tree structure to partition B into equal sub-blocks and generate p new partitioned matrices. PREM is independently applied to the sub-blocks of the p partitioned matrices in two different directions to generate 2p pseudo-random codewords. The generated codewords differ in the encoding direction or the pattern used. Then, each codeword can be re-partitioned into p new matrices to produce 2p - 1 codewords by PREM.
Note that the direction of the original PREM used to generate the first-phase codeword is not reused, as it would essentially reproduce several bits of the original block B.

Step 2. The encoder uses k-bit indices ($k = \log_2 4p^2$) to label the codewords generated in Step 1 ($C_0, \ldots, C_{2^k-1}$) and compares each codeword, concatenated with its corresponding index (i.e., $W_i$), with D. The $W_i$ that has the minimum Hamming distance to D is selected.

Step 3. $W_i$ is written to memory.

To retrieve B, the decoder uses the index i stored in memory to find the patterns (matrix partitionings) and encoding directions that were used for the codeword written in memory. Finally, the decoder uses Eq. (6) to restore the original data block by reversing the two phases of the encoding.

To clarify the process, Figure 3 gives a detailed example that generates 16 pseudo-random codewords from a new 16-bit data block storing B = 0xBC07 and then minimizes the number of bits flipped during the write to a memory location that currently stores the value D = 0x00000. Recall that D is the concatenation of a previously stored codeword (16 bits) and index (4 bits), requiring 20 bits of storage for a 16-bit value. Assume the bits of B are arranged in a 4x4 matrix [Figure 3(a)]. In the first phase of PRES, PREM is applied to the matrix from Left-to-Right (LR), Right-to-Left (RL), Top-to-Bottom (TB), and Bottom-to-Top (BT). Note that TB is equivalent to applying LR to the transposed matrix. Each function generates one codeword that we call a parent [Figure 3(b)]. In the next phase, each parent uses the three other functions to generate three additional codewords, or children. For instance, the parent generated by LR uses the RL, BT and TB functions to create three children [Figure 3(c)]. Each parent and child is given a unique 4-bit index. PRES then compares the 16 generated codewords with their indices (i.e., $W_0, \ldots, W_{15}$) against D (i.e., 0x00000) and selects the $W_i$ with the minimum Hamming distance, which equals the minimum Hamming weight because D is all zeroes. According to Figure 3, the minimum Hamming weight is five, at index i = 4. Thus $W_4$ is written to memory.

Footnote 1: It is relatively straightforward to extend this idea to support non-square matrices through repartitioning.

It is important to note that PRES actually has 17 codeword candidates, because the original data B can itself be a codeword. If there is a desire to use the original data as one of the candidates, then rather than using an additional index bit, we can systematically replace one of the generated codewords with B. Also, in this example, only p = 2 patterns (horizontal and vertical) were used to generate 16 pseudo-random codewords. If we have additional index bits (e.g., 6 bits), PRES can utilize other patterns, such as southwest-to-northeast and southeast-to-northwest diagonal patterns, to increase the number of generated pseudo-random codewords from 16 to 64.

5. EXPERIMENTAL RESULTS

In this section, we evaluate PRES against both Flip-Min and FNW. We compare the reduction in bit flips achieved by each scheme with respect to differential write as a baseline. Moreover, we compare the encoding and decoding overhead of each scheme in terms of the number of required operations. Finally, we measure the degree of randomness of the data vectors generated by the encoder of each scheme. Our experiments process random data and are applied across various block sizes with different space overheads.
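As a companion to the 16-bit example of Section 4.2, the following is a minimal sketch of the two-phase candidate generation and the index-driven decoding. Several details are assumptions made only for illustration: the row-major, most-significant-bit-first layout of B in the 4x4 matrix, the order in which parents and children are assigned indices, and all function names. Because of these assumptions, the sketch is not expected to reproduce the specific codewords or the selected index of Figure 3.

```python
DIRS = ("LR", "RL", "TB", "BT")
# Assumed index assignment: 4 parents first, then 3 children per parent (16 total).
PATHS = [(d,) for d in DIRS] + [(d1, d2) for d1 in DIRS for d2 in DIRS if d2 != d1]
K = 4  # index bits for the 16 candidates


def prem(b):
    """Eq. (5) applied to one sub-block (list of bits)."""
    p = b[:]
    p[1] = b[0] ^ b[1]
    for i in range(2, len(b)):
        p[i] = p[i - 1] ^ b[i]
    p[0] = p[-1] ^ b[0]
    return p


def prem_inv(p):
    """Eq. (6), the inverse of prem."""
    b = p[:]
    b[0] = p[-1] ^ p[0]
    b[1] = b[0] ^ p[1]
    for i in range(2, len(p)):
        b[i] = p[i - 1] ^ p[i]
    return b


def sweep(matrix, direction, fn):
    """Apply fn to every row (LR/RL) or column (TB/BT) in the given direction."""
    rows = [list(r) for r in matrix]
    if direction in ("TB", "BT"):
        rows = [list(c) for c in zip(*rows)]     # columns become rows
    if direction in ("RL", "BT"):
        rows = [r[::-1] for r in rows]           # reverse the traversal order
    rows = [fn(r) for r in rows]
    if direction in ("RL", "BT"):
        rows = [r[::-1] for r in rows]
    if direction in ("TB", "BT"):
        rows = [list(c) for c in zip(*rows)]
    return rows


def to_matrix(value, m=4):
    bits = [(value >> (m * m - 1 - i)) & 1 for i in range(m * m)]
    return [bits[r * m:(r + 1) * m] for r in range(m)]


def from_matrix(matrix):
    value = 0
    for row in matrix:
        for bit in row:
            value = (value << 1) | bit
    return value


def pres_encode(b, d):
    """Return the 20-bit word (codeword, index) closest to the stored word d."""
    best = None
    for index, path in enumerate(PATHS):
        mat = to_matrix(b)
        for direction in path:                   # phase 1, then optional phase 2
            mat = sweep(mat, direction, prem)
        w = (from_matrix(mat) << K) | index
        cost = bin(w ^ d).count("1")
        if best is None or cost < best[0]:
            best = (cost, w)
    return best[1]


def pres_decode(w):
    """Undo the phases in reverse order, as selected by the stored index."""
    mat = to_matrix(w >> K)
    for direction in reversed(PATHS[w & ((1 << K) - 1)]):
        mat = sweep(mat, direction, prem_inv)
    return from_matrix(mat)


w = pres_encode(0xBC07, 0x00000)
print(hex(w), hex(pres_decode(w)))
```

Decoding needs no stored coset: the 4-bit index alone identifies which sweeps to invert, which is the storage advantage claimed for PRES over a precomputed random coset.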
5.1 Comparison of Bit Flip Reduction

To compare the bit flip reduction over differential write, we assign each scheme 4 auxiliary bits (k = 4), i.e., the base coset of each of the three schemes encompasses 16 different elements. In addition, we experiment with different block sizes to vary the incurred space overhead. Figure 4 shows that PRES flips fewer bits than either Flip-Min or FNW across different block sizes with varying space overhead. PRES requires up to 15% and 3% fewer bit flips than Flip-Min and FNW, respectively.

To achieve higher flip reduction rates, the number of elements in the base coset can be increased so that the currently stored data is compared against more data vectors. Accordingly, we extend the number of auxiliary bits to eight (256 elements per coset) and plot our findings in Figure 5. The results show that, for the same space overhead as in Figure 4, the flip reduction capabilities of PRES, as well as of Flip-Min and FNW, improve significantly. For a space overhead of 12.5%, the flip reduction capability of PRES increases from 12% to 25%, which amounts to a 15% improvement. This improvement comes at a higher computational overhead, as a larger base coset requires more elements to compare against. Thus, the number of elements in the base coset is a trade-off between flip reduction capability and computational complexity. We note that with a larger base coset, Flip-Min gets closer to the capability of PRES for blocks with higher space overhead. However, the performance overhead of Flip-Min is significantly larger than that of PRES, as we show next.

[Figure 4: Bit flip reduction over Differential Write with 4 auxiliary bits (k = 4) and different block sizes. The value in parentheses is the space overhead. Y-axis: bit flip reduction, 0% to 20%. X-axis: block size, 32 (12.5%), 64 (6.25%), 128 (3.125%), 256 (1.5625%), 512 (0.781%). Series: PRES, FlipMin, FNW.]
[Figure 5: Bit flip reduction over Differential Write with 8 auxiliary bits (k = 8) and different block sizes. The value in parentheses is the space overhead. Y-axis: bit flip reduction, 0% to 25%. X-axis: block size, 64 (12.5%), 128 (6.25%), 256 (3.125%), 512 (1.5625%), 1024 (0.781%). Series: PRES, FlipMin, FNW.]

5.2 Comparison of Coding Complexity

In Table 1, we report the number of operations required to encode n-bit datawords and decode (n + k)-bit codewords. While the encoding and decoding process of PRES is merely a series of simple XOR operations, Flip-Min requires a series of AND operations on top of the XOR operations. Furthermore, the overall number of operations required by Flip-Min is significantly larger than that of PRES. For instance, PRES requires less than 10% of the operations required by Flip-Min to encode 64-bit datawords with a base coset of 256 elements. Moreover, PRES requires less than 1.5% of the operations required by Flip-Min to decode the data. Although Flip-Min can flip as few bits as PRES, it does so at an excessively higher computational overhead.

In comparison to FNW, PRES requires more operations to encode and decode data. However, PRES significantly outperforms FNW in flip reduction capability, as illustrated in Figures 4 and 5. Moreover, Table 1 reveals that the PRES overhead is comparable to that of FNW for decoding. As for encoding, the operations required by PRES are executed off the critical path, and their latency can be masked through buffering techniques.

5.3 Randomness Measurement

Figures 4 and 5 showed that PRES outperforms both Flip-Min and FNW in bit flip reduction capability. We attribute those results to the fact that PRES generates a base coset whose elements are more random than the elements of the base cosets of Flip-Min and FNW. The NIST SP 800-22 test suite [20] is a well-known set of tests for measuring the randomness of data vectors. Each test reports whether an input data vector is random or not.
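To give a sense of what one such test looks like, the sketch below implements the frequency (monobit) test, the first test in NIST SP 800-22. It is only an illustration of the pass/fail style of the suite: the paper does not state which tests or parameters were used beyond citing the suite, and the function names here are hypothetical. The 0.01 significance level is the default suggested by the NIST document.

```python
import math


def monobit_p_value(bits) -> float:
    """NIST SP 800-22 frequency test: P-value for the balance of ones and zeroes."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)        # +1 per one, -1 per zero
    return math.erfc(abs(s) / math.sqrt(2 * n))


def looks_random(bits, alpha: float = 0.01) -> bool:
    """A sequence passes the test when its P-value is at least alpha."""
    return monobit_p_value(bits) >= alpha


sample = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]          # small worked example
print(round(monobit_p_value(sample), 6), looks_random(sample))
print(looks_random([1] * 100))                   # heavily biased input
```

A vector such as the all-ones pattern of an FNW base coset fails this test immediately, which is consistent with the intuition that simple inversion patterns are far from random.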
Accordingly, we have used those tests to measure the randomness of the codewords produced by the encoders of PRES, Flip-Min and FNW. Our measurements reveal that the data vectors generated by PRES, Flip-Min and FNW pass 98.94%, 95.61% and 88.67% of the tests, respectively. Those findings back our rationale that the higher the randomness of the base coset elements, the higher the rate of bit flip reduction that can be achieved.

[Table 1: The number of operations used in PRES, FlipMin and FNW. Columns: XOR and AND operation counts for each of PRES, FlipMin and FNW. Rows: encoder and decoder, each with the general expression in n and k and the values for n = 32 and n = 64.]

6. CONCLUSION

In this paper, we have shown that the effectiveness of coset-based write minimization techniques is directly correlated with the randomness of the elements that form the base coset. We have proposed PRES as a new scheme to minimize the number of bits flipped to service write requests. PRES encoding is characterized by its capability to generate highly random coset elements. Our analyses and experimental results showed that the pseudo-random codewords generated by PRES can lead to fewer bit flips than non-random codewords generated from linear generator matrices, while incurring low performance overhead. Overall, PRES represents a solid foundation to help overcome the challenges of emerging non-volatile memories, paving the way for their deployment in functional systems.

References

[1] H.-S. Wong, S. Raoux, S. Kim, J. Liang, J. P. Reifenberg, B. Rajendran, M. Asheghi, and K. E. Goodson, "Phase change memory," Proceedings of the IEEE, vol. 98, no. 12, pp. 2201-2227, 2010.
[2] E. Chen, D. Apalkov, Z. Diao, A. Driskill-Smith, D. Druist, D. Lottis, V. Nikitin, X. Tang, S. Watts, S. Wang et al., "Advances and future prospects of spin-transfer torque random access memory," Magnetics, IEEE Transactions on, vol. 46, no. 6, 2010.
[3] M. Rasquinha, D. Choudhary, S. Chatterjee, S. Mukhopadhyay, and S. Yalamanchili, "An energy efficient cache design using spin torque transfer (STT) RAM," in Proceedings of the 16th ACM/IEEE International Symposium on Low Power Electronics and Design. ACM, 2010.
[4] X. Guo, E. Ipek, and T. Soyata, "Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing," in ACM SIGARCH Computer Architecture News, vol. 38, no. 3. ACM, 2010.
[5] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, "Architecting phase change memory as a scalable DRAM alternative," ACM SIGARCH Computer Architecture News, vol. 37, no. 3, pp. 2-13, 2009.
[6] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, "Scalable high performance main memory system using phase-change memory technology," ACM SIGARCH Computer Architecture News, vol. 37, no. 3, pp. 24-33, 2009.
[7] Y. Zhang, L. Zhang, W. Wen, G. Sun, and Y. Chen, "Multi-level cell STT-RAM: Is it realistic or just a dream?" in Computer-Aided Design (ICCAD), 2012 IEEE/ACM International Conference on, Nov 2012.
[8] S. Raoux, G. W. Burr, M. J. Breitwisch, C. T. Rettner, Y.-C. Chen, R. M. Shelby, M. Salinga, D. Krebs, S.-H. Chen, H.-L. Lung et al., "Phase-change random access memory: A scalable technology," IBM Journal of Research and Development, vol. 52, no. 4.5, 2008.
[9] W. Wen, Y. Zhang, Y. Chen, Y. Wang, and Y. Xie, "PS3-RAM: A fast portable and scalable statistical STT-RAM reliability analysis method," in Proceedings of the 49th Annual Design Automation Conference. ACM, 2012.
[10] Y. Zhang, W. Wen, and Y. Chen, "The prospect of STT-RAM scaling from readability perspective," Magnetics, IEEE Transactions on, vol. 48, no. 11, 2012.
[11] B. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, and D. Burger, "Phase-change technology and the future of main memory," Micro, IEEE, 2010.
[12] B.-D. Yang, J.-E. Lee, J.-S. Kim, J. Cho, S.-Y. Lee, and B.-G. Yu, "A low power phase-change random access memory using a data-comparison write scheme," in Circuits and Systems, 2007. ISCAS 2007. IEEE International Symposium on. IEEE, 2007.
[13] A. N. Jacobvitz, R. Calderbank, and D. J. Sorin, "Writing cosets of a convolutional code to increase the lifetime of flash memory," in Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on. IEEE, 2012.
[14] J. Li and K. Mohanram, "Write-once-memory-code phase change memory," in Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014. IEEE, 2014.
[15] S. Cho and H. Lee, "Flip-N-Write: a simple deterministic technique to improve PRAM write performance, energy and endurance," in Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International Symposium on. IEEE, 2009.
[16] A. N. Jacobvitz, R. Calderbank, and D. J. Sorin, "Coset coding to extend the lifetime of memory," in High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on. IEEE, 2013, pp. 222-233.
[17] G. D. Forney Jr, "Coset codes. I. Introduction and geometrical classification," Information Theory, IEEE Transactions on, vol. 34, no. 5.
[18] G. D. Forney Jr, "Coset codes. II. Binary lattices and related codes," Information Theory, IEEE Transactions on, vol. 34, no. 5.
[19] I. Reed, "A class of multiple-error-correcting codes and the decoding scheme," Information Theory, Transactions of the IRE Professional Group on, vol. 4, no. 4.
[20] Rukhin et al., "NIST Special Publication 800-22: A statistical test suite for random and pseudorandom number generators for cryptographic applications," 2001.


More information

An Enhanced Mixed-Scaling-Rotation CORDIC algorithm with Weighted Amplifying Factor

An Enhanced Mixed-Scaling-Rotation CORDIC algorithm with Weighted Amplifying Factor SEAS-WP-2016-10-001 An Enhanced Mixed-Scaling-Rotation CORDIC algorithm with Weighted Amplifying Factor Jaina Mehta jaina.mehta@ahduni.edu.in Pratik Trivedi pratik.trivedi@ahduni.edu.in Serial: SEAS-WP-2016-10-001

More information

Cache Memory Introduction and Analysis of Performance Amongst SRAM and STT-RAM from The Past Decade

Cache Memory Introduction and Analysis of Performance Amongst SRAM and STT-RAM from The Past Decade Cache Memory Introduction and Analysis of Performance Amongst S and from The Past Decade Carlos Blandon Department of Electrical and Computer Engineering University of Central Florida Orlando, FL 386-36

More information

Performance Optimization of HVD: An Error Detection and Correction Code

Performance Optimization of HVD: An Error Detection and Correction Code Abstract Research Journal of Engineering Sciences ISSN 2278 9472 Performance Optimization of HVD: An Error Detection and Correction Code Fadnavis Shubham Department of Electronics and Communication, Acropolis

More information

DETECTION AND CORRECTION OF CELL UPSETS USING MODIFIED DECIMAL MATRIX

DETECTION AND CORRECTION OF CELL UPSETS USING MODIFIED DECIMAL MATRIX DETECTION AND CORRECTION OF CELL UPSETS USING MODIFIED DECIMAL MATRIX ENDREDDY PRAVEENA 1 M.T.ech Scholar ( VLSID), Universal College Of Engineering & Technology, Guntur, A.P M. VENKATA SREERAJ 2 Associate

More information

Chord-based Key Establishment Schemes for Sensor Networks

Chord-based Key Establishment Schemes for Sensor Networks Chord-based Key Establishment Schemes for Sensor Networks Fan Zhang, Zhijie Jerry Shi, Bing Wang Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269 Abstract Because

More information

Emerging NVM Enabled Storage Architecture:

Emerging NVM Enabled Storage Architecture: Emerging NVM Enabled Storage Architecture: From Evolution to Revolution. Yiran Chen Electrical and Computer Engineering University of Pittsburgh Sponsors: NSF, DARPA, AFRL, and HP Labs 1 Outline Introduction

More information

SF-LRU Cache Replacement Algorithm

SF-LRU Cache Replacement Algorithm SF-LRU Cache Replacement Algorithm Jaafar Alghazo, Adil Akaaboune, Nazeih Botros Southern Illinois University at Carbondale Department of Electrical and Computer Engineering Carbondale, IL 6291 alghazo@siu.edu,

More information

J. Manikandan Research scholar, St. Peter s University, Chennai, Tamilnadu, India.

J. Manikandan Research scholar, St. Peter s University, Chennai, Tamilnadu, India. Design of Single Correction-Double -Triple -Tetra (Sec-Daed-Taed- Tetra Aed) Codes J. Manikandan Research scholar, St. Peter s University, Chennai, Tamilnadu, India. Dr. M. Manikandan Associate Professor,

More information

Memory technology and optimizations ( 2.3) Main Memory

Memory technology and optimizations ( 2.3) Main Memory Memory technology and optimizations ( 2.3) 47 Main Memory Performance of Main Memory: Latency: affects Cache Miss Penalty» Access Time: time between request and word arrival» Cycle Time: minimum time between

More information

SAS Meets Big Iron: High Performance Computing in SAS Analytic Procedures

SAS Meets Big Iron: High Performance Computing in SAS Analytic Procedures SAS Meets Big Iron: High Performance Computing in SAS Analytic Procedures Robert A. Cohen SAS Institute Inc. Cary, North Carolina, USA Abstract Version 9targets the heavy-duty analytic procedures in SAS

More information

Low Cost Convolutional Code Based Concurrent Error Detection in FSMs

Low Cost Convolutional Code Based Concurrent Error Detection in FSMs Low Cost Convolutional Code Based Concurrent Error Detection in FSMs Konstantinos Rokas & Yiorgos Makris Electrical Engineering Department Yale University {konstantinos.rokas, yiorgos.makris}@yale.edu

More information

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck

More information

The p-sized partitioning algorithm for fast computation of factorials of numbers

The p-sized partitioning algorithm for fast computation of factorials of numbers J Supercomput (2006) 38:73 82 DOI 10.1007/s11227-006-7285-5 The p-sized partitioning algorithm for fast computation of factorials of numbers Ahmet Ugur Henry Thompson C Science + Business Media, LLC 2006

More information