
Title: A 19.4 nJ/decision 364K decisions/s in-memory random forest classifier in 6T SRAM array
Archived version: Accepted manuscript; the content is identical to the published paper, but without the final typesetting by the publisher
Published version DOI: 10.1109/ESSCIRC
Authors (contact): Mingu Kang (mkang17@illinois.edu), Sujan K. Gonugondla (gonugon2@illinois.edu), Naresh R. Shanbhag (shanbhag@illinois.edu)
Affiliation: University of Illinois at Urbana-Champaign

A 19.4 nJ/decision 364K decisions/s In-memory Random Forest Classifier in 6T SRAM Array

Mingu Kang, Sujan K. Gonugondla, Naresh R. Shanbhag
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA.

Abstract: This paper presents an IC realization of a random forest (RF) machine learning classifier. The algorithm, architecture, and circuits are co-optimized to minimize the energy-delay product (EDP). Deterministic subsampling (DSS) and balanced trees reduce interconnect complexity and avoid irregular memory accesses. Low-swing analog in-memory computations embedded in a standard 6T SRAM enable massively parallel processing, thereby minimizing memory fetches and reducing the EDP further. The 65nm CMOS prototype achieves a 6.8× lower EDP than a conventional design at the same accuracy (94%) for an 8-class traffic sign recognition problem.

Keywords: machine learning; random forest; in-memory computing; pattern recognition; traffic sign recognition

I. INTRODUCTION

The random forest (RF) classifier [1] is attractive due to its high accuracy, simple operations (comparisons), applicability to multi-class problems, and robustness to non-ideal computations owing to its majority-voting-based decision [1]. However, an energy-efficient implementation of the RF algorithm is challenging because of its high data access rate combined with a highly irregular data access pattern. This paper presents an energy-efficient, high-throughput RF classifier IC that employs: 1) deterministic subsampling (DSS) to reduce interconnect complexity, 2) balanced trees to regularize the memory access pattern, and 3) analog computations deeply embedded [3,4,5] in the periphery of an SRAM bitcell array (BCA) to exploit the inherent algorithmic error tolerance. To the best of our knowledge, this is the first IC implementation of the RF algorithm: only FPGA, GPU, and multi-core processor implementations of the RF algorithm [2] exist today.
These fail to take advantage of the opportunities afforded by analog computation.

II. THE RF ALGORITHM

This section explains the RF algorithm and its implementation challenges.

A. RF Algorithm

[Fig. 1. Random forest algorithm: (a) conventional, (b) proposed with deterministic subsampling (DSS), and (c) number of required operations per tree (bit precision, size, and 8-byte SRAM accesses for $p_{m,n}$, $c_{m,l}$, and $\tau_{m,n}$; crossbar mux ratio 64:1 proposed vs. 256:1 conventional).]

Notation: $\tau_{m,n}$ is the threshold of the n-th node in the m-th tree; $p_{m,n}$ is the pixel index of the n-th node in the m-th tree; $P_m = [p_{m,1}, p_{m,2}, \ldots, p_{m,N}]$; RSS denotes random subsampling by the sample pattern $P_m$, i.e., selecting the $p_{m,n}$-th pixel of input image X; $c_{m,l}$ is the label of the l-th leaf node in the m-th tree ($m = 1, \ldots, M$; $n = 1, \ldots, N$; $l = 1, \ldots, N+1$).

The RF algorithm (Fig. 1(a)) consists of M trees. The m-th tree processes data obtained by randomly subsampling (RSS) the input image X using a pseudo-random pattern vector $P_m$. The n-th node in the m-th tree compares $x(p_{m,n})$, the pixel (or feature) indexed by $p_{m,n}$, with a threshold $\tau_{m,n}$ to obtain a node-level binary decision $q_{m,n}$. Either the left or the right branch is taken based on $q_{m,n}$, and this process is repeated until a leaf node is reached. The label $c_{m,l}$ corresponding to the l-th leaf node is the tree-level decision. The final decision is obtained by majority voting over the M tree-level decisions.

B. Implementation Challenges

Two architectures can be considered to implement the RF algorithm: serial and parallel. A serial architecture processes nodes sequentially, resulting in a large delay, and must read two 11-b child-node addresses per node (for a 16 KB array), which take up a sizable fraction of the storage space. A fully parallel architecture, on the other hand, computes all $q_{m,n}$ in parallel and uses them to address a look-up table (LUT) to obtain $c_{m,l}$. Doing so requires a large number of memory accesses, e.g., 78 bytes per tree (Fig. 1(c)), which in turn limits the achievable throughput and energy efficiency. Additionally, a complex crossbar (256:1 for a 256-pixel image X) is needed to route the pixel indexed by $p_{m,n}$ from X to the comparator.

III. THE PROPOSED RF ALGORITHM AND ARCHITECTURE

This paper co-optimizes the algorithm and architecture to achieve energy and throughput benefits.

A. The Proposed RF Algorithm

The modified RF algorithm (Fig. 1(b)) applies a fixed-pattern deterministic subsampling (DSS) step prior to RSS to solve the crossbar problem described above. A 4:1 DSS factor is chosen to balance the loss in classification accuracy against crossbar complexity. When the input X is a 256-pixel image, the RSS crossbar complexity drops from 256:1 to 64:1, and the precision of $p_{m,n}$ correspondingly drops from 8 b to 6 b.
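The inference flow described above, combined with the proposed 4:1 DSS and balanced trees, can be sketched as a small behavioral model. This is a minimal sketch under stated assumptions and is not the chip's implementation. The names `Tree`, `dss`, `node_decisions`, `leaf_address`, and `classify` are hypothetical. Each balanced tree is assumed to have depth 5, which gives 31 nodes and 32 leaves and matches $p_{m,1\sim31}$ and $c_{m,1\sim32}$. Nodes are stored in heap order, and $q_{m,n} = 1$ is assumed to select the right branch. The DSS phase is a free parameter. All node decisions are computed up front, mirroring a parallel comparison, and the leaf address is then derived from the q bits. This plays a role similar to the class address generator (CAG) of the proposed architecture.

```python
import random
from collections import Counter
from dataclasses import dataclass

DEPTH = 5                  # assumed balanced-tree depth
N_NODES = 2**DEPTH - 1     # N = 31 node-level comparisons per tree
N_LEAVES = 2**DEPTH        # N + 1 = 32 leaf labels per tree
DSS_FACTOR = 4             # 4:1 deterministic subsampling


@dataclass
class Tree:
    pixel_idx: list[int]   # p_{m,n}: 6-b indices into the 64-pixel DSS output
    thresholds: list[int]  # tau_{m,n}: 8-b thresholds
    labels: list[int]      # c_{m,l}: class label of each leaf


def dss(x, phase=0):
    """Fixed 4:1 subsampling: keep pixels phase, phase+4, ... (256 -> 64 pixels)."""
    return x[phase::DSS_FACTOR]


def node_decisions(tree, xs):
    """All node-level decisions q_{m,n} = [x(p_{m,n}) > tau_{m,n}] computed at once."""
    return [int(xs[p] > t) for p, t in zip(tree.pixel_idx, tree.thresholds)]


def leaf_address(q):
    """Follow the root-to-leaf path using precomputed q bits (heap order, q=1 -> right)."""
    n = 0
    for _ in range(DEPTH):
        n = 2 * n + 1 + q[n]
    return n - N_NODES     # leaf index l in [0, N_LEAVES)


def classify(trees, x, phase=0):
    """Tree-level decisions c_{m,l}, followed by a majority vote over the M trees."""
    xs = dss(x, phase)
    votes = [t.labels[leaf_address(node_decisions(t, xs))] for t in trees]
    return Counter(votes).most_common(1)[0][0]


# Usage with random (untrained) parameters: 64 trees, 8 classes, 256-pixel input.
rng = random.Random(0)
trees = [
    Tree([rng.randrange(64) for _ in range(N_NODES)],
         [rng.randrange(256) for _ in range(N_NODES)],
         [rng.randrange(8) for _ in range(N_LEAVES)])
    for _ in range(64)
]
image = [rng.randrange(256) for _ in range(256)]
print(classify(trees, image))
```

With random parameters the printed value is simply one of the eight class labels. The point of the sketch is the data flow (DSS, 6-b RSS indices, node decisions, leaf address, vote), not classification accuracy.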
Additionally, the trees are balanced (Fig. 1(b)) by filling some empty nodes in order to regularize the memory access pattern. The memory access problem is addressed by reducing the number of memory accesses. In-memory comparison (Fig. 3) eliminates the need to fetch $\tau_{m,n}$, and the class address generator (CAG) generates the address of the chosen $c_{m,l}$ from the $q_{m,n}$s, eliminating the need to fetch all the $c_{m,l}$s. Only 24.5 bytes of data need to be fetched per tree, compared to 78 bytes per tree in the parallel architecture.

B. Proposed Architecture and Operations

The proposed RF architecture (Fig. 2(a)) includes an SRAM BCA, a multi-row wordline (WL) driver, a 64-b I/O with a 4:1 column mux, a DSS input buffer to store the streamed X, RSS crossbars, the CAG, a label finder, a majority voter, and the peripherals for standard read/write operations. A group of four trees is processed in parallel, and 16 such groups are processed sequentially, for a total of M = 64 trees. The classifier: 1) writes the pixel index register, 2) enables the crossbar, 3) performs the in-memory comparison using the multi-row WL driver and analog comparators, 4) sequentially fetches the four tree-level labels using the addresses generated by the CAG, and 5) performs the majority vote after the final tree.

[Fig. 2. Proposed RF: (a) architecture, with the DSS input buffer holding the interleaved pixel sets $X_{1,5,\ldots,253}$, $X_{2,6,\ldots,254}$, $X_{3,7,\ldots,255}$, $X_{4,8,\ldots,256}$ and each group of four trees storing $p_{m,1\sim31}$, $\tau_{m,1\sim31}$, and $c_{m,1\sim32}$; (b) timing diagram: 12 reads of pixel indices, one multi-row (MR) read for comparison, then label reads.]

C. In-memory Comparison

In-memory comparison requires the 8-b thresholds $\tau_{m,n}$ (T in Fig. 3(a)) and the indexed pixels $x(p_{m,n})$ (X in Fig. 3(a)) to be stored in a column-major pattern, i.e., the bits of a word are stored in a column. The comparison begins with the simultaneous application of WL access pulses with binary-weighted pulse widths to all the rows storing T and $X_B$. That is, the pulse width applied to each row is proportional to the binary weight of the bit it stores. Doing so creates a bitline (BL) voltage swing $\Delta V_{BLB}$ ($\Delta V_{BL}$) proportional to T − X (X − T) [3,4]. The linearity of this multi-row read is improved by reading the 4-b MSBs and LSBs separately from adjacent columns, followed by capacitively weighted charge sharing that assigns a 16× greater weight to the MSBs. The WL voltage is also reduced (e.g., 0.65 V) to prevent destructive reads and to improve linearity further. Storing $X_B$ in the replica bit-cell array allows fast writes through a separate write BL (WBL) and write wordline (WWL), eliminating the overhead of slow write operations into the normal BCA. The BL and BLB voltages feed analog comparators that generate the node-level decisions q. In-memory comparison is an intrinsically and massively parallel operation, since it processes all the stored words from 256 columns in parallel. A conventional memory, by contrast, fetches only 64 bits (= 8 words) per read access when the sense amplifier is shared across four columns. In addition, the multi-row read saves energy by accessing 4 bits per precharge.

[Fig. 3. In-memory comparison: (a) bit-cell column for in-memory comparison of T and X ($V_{WL} < V_{DD}$; q = 1 if X > T, i.e., $V_{BL} < V_{BLB}$, and q = 0 otherwise), and (b) measured comparison error rate (%) vs. $\Delta V_{BL}$ per LSB (mV), marking the minimum $\Delta V_{BL}$ that achieves 93% classification accuracy with 64 trees and with 4 trees.]

IV. CHIP MEASURED RESULTS

The in-memory RF classifier is implemented in a 65nm CMOS process (chip micrograph in Fig. 5; summary in Table I) to demonstrate its application-level benefits.

A. Component-level Accuracy Characterization

Measured in-memory comparison results (Fig. 3(b)) show the comparator error rate increasing from 1.6% to 14.5% as $\Delta V_{BL}$ is reduced from 25 mV to 5 mV. The RF algorithm with 64 trees needs an error rate below 9.5% at the comparator output q to avoid a discernible loss in 8-class classification accuracy. Four trees tolerate only a 4% error rate, which restricts further reduction of $\Delta V_{BL}$.

B. Application-level Accuracy, Energy, and Throughput

Measured results (Fig. 4) of the energy vs. accuracy trade-off for binary classification (face detection) with 64 trees show that the proposed IC achieves a 3.1× energy savings over the conventional architecture (SRAM + digital processor). The energy of the conventional architecture is obtained from post-layout simulations of the digital blocks and from the read access energy measured on the prototype IC. These energy savings come from the multi-row read, the in-memory comparison, and the low-complexity crossbar. Fewer memory accesses also reduce the delay by 2.2× relative to the conventional architecture. Together these provide a 6.8× lower energy-delay product (EDP) at the same accuracy (> 93%) as the conventional architecture. The prototype IC achieves a throughput of 364K decisions/s and an energy efficiency of 19.4 nJ/decision. Its EDP is at least 5.6× smaller than those of the prior multi-class classifier ICs listed in Table II.

[Fig. 4. Energy vs. error rate w.r.t. $\Delta V_{BL}$ with 64 trees (binary classification), where $\Delta V_{BL}$ of Conv. = 8 × $\Delta V_{BL}$ per LSB. Axes: core energy per decision (nJ) and classification error rate $(1 - P_{DET})$ (%) vs. $\Delta V_{BL}$ per LSB for the proposed design (mV). Curves: proposed energy, proposed accuracy, conventional energy, conventional accuracy.]
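The data-movement figures quoted in Sections II-B and III can be cross-checked with simple arithmetic. The script below is a hypothetical back-of-the-envelope check and is not part of the paper's design flow. It makes two assumptions. First, it assumes 4-b leaf labels (enough for 8 classes) for the conventional parallel architecture. Second, it assumes that each 8-b word in the in-memory comparison occupies two adjacent columns (4-b MSBs and 4-b LSBs). Under these assumptions it reproduces the 78 bytes per tree of the parallel architecture. It also shows that the 6-b pixel indices of one tree fit in three 64-b reads, and that one group's 4 × 31 comparisons fit within the 256 columns.

```python
import math

IO_BITS = 64            # 64-b I/O per read access
N_NODES = 31            # nodes of a balanced tree
N_LEAVES = N_NODES + 1  # 32 leaf labels per tree
TREES_PER_GROUP = 4     # trees processed in parallel
N_GROUPS = 16           # groups processed sequentially
N_COLUMNS = 256         # bitline columns available to a multi-row access


def conventional_bytes_per_tree(p_bits=8, tau_bits=8, label_bits=4):
    """Parallel architecture: fetch every p, every tau and every leaf label.
    label_bits=4 is an assumption."""
    bits = N_NODES * p_bits + N_NODES * tau_bits + N_LEAVES * label_bits
    return bits / 8


def pixel_index_reads_per_group(p_bits=6):
    """64-b reads needed to load the 6-b DSS pixel indices of one group."""
    return TREES_PER_GROUP * math.ceil(N_NODES * p_bits / IO_BITS)


def columns_for_comparisons(columns_per_word=2):
    """Columns used if each 8-b word is split into MSB/LSB halves (assumed)."""
    return TREES_PER_GROUP * N_NODES * columns_per_word


print("conventional bytes per tree:", conventional_bytes_per_tree())
print("pixel-index reads per group:", pixel_index_reads_per_group())
print("columns used:", columns_for_comparisons(), "of", N_COLUMNS)
print("total trees:", TREES_PER_GROUP * N_GROUPS)
```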
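The in-memory comparison of Section III-C can be illustrated with an idealized behavioral model. This is not a circuit simulation. The following assumptions are all hypothetical:

- $X_B$ is taken to be the bitwise complement of X, so the cells that discharge BL encode $\bar{T}$ and X. This gives $\Delta V_{BL} \propto X - T$ plus a constant, matching the proportionality stated above.
- Discharge is perfectly linear in WL pulse width.
- The 16:1 capacitive charge sharing is ideal.
- All BL and comparator non-idealities are lumped into one Gaussian noise term with an arbitrary `sigma_mv`.

The function names and the `mv_per_lsb` parameter are illustrative. The error rates the model produces are not the measured values of Fig. 3(b). It only reproduces the qualitative trend that a larger swing per LSB lowers the comparison error rate.

```python
import random


def weighted_discharge(nibble):
    """Discharge of a 4-b nibble in unit pulse widths: bit i is pulsed for 2**i
    units, so the total equals the nibble's integer value."""
    return sum(((nibble >> i) & 1) << i for i in range(4))


def line_swing(a, b):
    """Swing of one line (in LSB units) when the cells discharging it encode the
    8-b values a and b. MSB and LSB nibbles sit on adjacent columns and are merged
    by ideal charge sharing with a 16:1 weight (normalized by 16 + 1)."""
    msb = weighted_discharge(a >> 4) + weighted_discharge(b >> 4)
    lsb = weighted_discharge(a & 0xF) + weighted_discharge(b & 0xF)
    return (16 * msb + lsb) / 17


def in_memory_compare(x, t, mv_per_lsb, sigma_mv, rng):
    """q = 1 if X > T (larger BL discharge, i.e. V_BL < V_BLB), else 0."""
    x_b = 0xFF - x                                   # assumed: X_B is the complement of X
    dv_bl = mv_per_lsb * line_swing(0xFF - t, x)     # cells storing 0: ~T and ~X_B = X
    dv_blb = mv_per_lsb * line_swing(t, x_b)         # cells storing 1: T and X_B
    return int(dv_bl - dv_blb + rng.gauss(0.0, sigma_mv) > 0)


def error_rate(mv_per_lsb, sigma_mv=1.0, trials=10000, seed=0):
    """Fraction of random (X, T) pairs whose analog decision differs from X > T."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        x, t = rng.randrange(256), rng.randrange(256)
        errors += in_memory_compare(x, t, mv_per_lsb, sigma_mv, rng) != int(x > t)
    return errors / trials


for mv in (5, 10, 25):
    print(f"{mv} mV per LSB -> error rate {error_rate(mv):.4f}")
```

A fixed seed keeps the noise samples identical across swing settings. Differences between the printed rates therefore come only from the signal amplitude.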

Table I: Chip summary.
Technology: 65 nm CMOS
SRAM capacity: 16 KB
CTRL CLK freq.: 1 GHz
Supply voltage (V): CORE 1.0; CTRL 0.75
Energy per decision (nJ), (4 trees, 64 trees): CORE (0.9, 14.4); CTRL (0.3, 5.0)
Decision throughput (decisions/s), (4 trees, 64 trees): (5.6M, 364K)

[Fig. 5. Chip micrograph (1.2 mm × 1.2 mm), showing the bitcell array, replica bitcell array, R/W circuitry, 64-b bus, input buffer, pixel index register and crossbar, analog comparators, digital CTRL, test block, and decision output.]

V. CONCLUSION

This paper has presented an IC realization of the random forest (RF) algorithm that achieves high energy efficiency and throughput by co-optimizing the algorithm, architecture, and circuit design. The prototype IC achieves a 3.1× energy savings and a 2.2× speed-up, together providing a 6.8× lower energy-delay product (EDP) at the same accuracy (> 93%) as a conventional digital architecture. It achieves a throughput of 364K decisions/s and an energy efficiency of 19.4 nJ/decision. To the best of our knowledge, this is the first IC realization of the RF algorithm. The benefits of the proposed architecture are expected to increase with image resolution and data size. This is because the subsampling ratio can be increased without losing classification accuracy, and because the random noise components in the low-swing analog in-memory comparison are averaged out more effectively as the data size grows.

ACKNOWLEDGMENT

This work was supported by Systems on Nanoscale Information fabrics (SONIC), one of the six SRC STARnet Centers, sponsored by SRC and DARPA. The authors would like to acknowledge constructive discussions with S. Eilert, K. Curewitz, N. Verma, B. Murmann, and P. Hanumolu.

REFERENCES

[1] L. Breiman, "Random forests," Machine Learning, vol. 45, 2001.
[2] B. Van Essen, C. Macaraeg, M. Gokhale, and R. Prenger, "Accelerating a random forest classifier: Multi-core, GP-GPU, or FPGA?," IEEE FCCM, 2012.
[3] M. Kang, M. S. Keel, N. R. Shanbhag, S. Eilert, and K. Curewitz, "An energy-efficient VLSI architecture for pattern recognition via deep embedding of computation in SRAM," IEEE ICASSP, 2014.
[4] M. Kang, S. Gonugondla, A. Patil, and N. Shanbhag, "A 481pJ/decision 3.4M decision/s multifunctional deep in-memory inference processor using standard 6T SRAM array," arXiv preprint, 2016.
[5] J. Zhang, Z. Wang, and N. Verma, "In-memory computation of a machine-learning classifier in a standard 6T SRAM array," IEEE JSSC, 2017.
[6] J. Park et al., "A 92-mW real-time traffic sign recognition system with robust illumination adaptation and support vector machine," IEEE JSSC, 2012.
[7] H. Kaul et al., "A 21.5M-query-vectors/s 3.37nJ/vector reconfigurable k-nearest-neighbor accelerator with adaptive precision in 14nm tri-gate CMOS," ISSCC Dig. Tech. Papers, 2016.

Table II: Comparison with prior art.
[6]: 130nm CMOS; support vector machine; traffic sign video; energy 1.5M [125]* nJ/decision; EDP 45G [3125]* fJ·s/decision; accuracy 90%.
[7]: 14nm tri-gate; k-nearest neighbor; dataset not reported; throughput 21.5M [498.8K]* decisions/s; energy 3.4 [145.3]* nJ/decision; EDP 0.2 [292.3]* fJ·s/decision; accuracy not reported.
Ours (M = 64): 65nm CMOS; random forest; KUL traffic signs; throughput 364.4K decisions/s; energy 19.4 nJ/decision (w/ CTRL).
*Throughput and energy scaled to a 65nm process and a common input size in pixels; SRAM memory access cost not included.
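The energy, throughput, and EDP numbers in Table I, Section IV-B, and the conclusion are related by simple arithmetic. The short script below is an illustrative check that rests on one assumption: the delay per decision is taken as the reciprocal of the decision throughput. This is a common convention, but the paper does not state it explicitly. `edp_fj_s` is a hypothetical helper. The script sums the CORE and CTRL energies of Table I, which gives the 19.4 nJ/decision total for 64 trees. It also multiplies the 3.1× energy savings by the 2.2× delay reduction, which gives the roughly 6.8× EDP improvement.

```python
def edp_fj_s(energy_nj, throughput_per_s):
    """Energy-delay product per decision in fJ*s, assuming delay = 1 / throughput."""
    energy_j = energy_nj * 1e-9
    delay_s = 1.0 / throughput_per_s
    return energy_j * delay_s / 1e-15


# Table I values: energy per decision split into CORE and CTRL (nJ)
table_i = {
    4: {"core_nj": 0.9, "ctrl_nj": 0.3, "throughput": 5.6e6},
    64: {"core_nj": 14.4, "ctrl_nj": 5.0, "throughput": 364e3},
}

for trees, row in table_i.items():
    total_nj = row["core_nj"] + row["ctrl_nj"]
    edp = edp_fj_s(total_nj, row["throughput"])
    print(f"M = {trees}: {total_nj:.1f} nJ/decision, EDP = {edp:.3g} fJ*s")

# Section IV-B: EDP gain = energy savings x delay reduction
print(f"EDP improvement over conventional: {3.1 * 2.2:.1f}x")
```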


More information

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering IP-SRAM ARCHITECTURE AT DEEP SUBMICRON CMOS TECHNOLOGY A LOW POWER DESIGN D. Harihara Santosh 1, Lagudu Ramesh Naidu 2 Assistant professor, Dept. of ECE, MVGR College of Engineering, Andhra Pradesh, India

More information

Lecture 11 SRAM Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

Lecture 11 SRAM Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010 EE4800 CMOS Digital IC Design & Analysis Lecture 11 SRAM Zhuo Feng 11.1 Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitryit Multiple Ports Outline Serial Access Memories 11.2 Memory Arrays

More information

A novel DRAM architecture as a low leakage alternative for SRAM caches in a 3D interconnect context.

A novel DRAM architecture as a low leakage alternative for SRAM caches in a 3D interconnect context. A novel DRAM architecture as a low leakage alternative for SRAM caches in a 3D interconnect context. Anselme Vignon, Stefan Cosemans, Wim Dehaene K.U. Leuven ESAT - MICAS Laboratory Kasteelpark Arenberg

More information

Design and Implementation of Low Leakage Power SRAM System Using Full Stack Asymmetric SRAM

Design and Implementation of Low Leakage Power SRAM System Using Full Stack Asymmetric SRAM Design and Implementation of Low Leakage Power SRAM System Using Full Stack Asymmetric SRAM Rajlaxmi Belavadi 1, Pramod Kumar.T 1, Obaleppa. R. Dasar 2, Narmada. S 2, Rajani. H. P 3 PG Student, Department

More information

Improved Initial Overdrive Sense-Amplifier. For Low-Voltage DRAMS. Analog CMOS IC Design. Esayas Naizghi April 30, 2004

Improved Initial Overdrive Sense-Amplifier. For Low-Voltage DRAMS. Analog CMOS IC Design. Esayas Naizghi April 30, 2004 Analog CMOS IC Design Improved Initial Overdrive Sense-Amplifier For Low-Voltage DRAMS Esayas Naizghi April 30, 2004 Overview 1. Introduction 2. Goals and Objectives 3. Gate Sizing Theory 4. DRAM Introduction

More information

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed. Lecture 13: SRAM Slides courtesy of Deming Chen Slides based on the initial set from David Harris CMOS VLSI Design Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports

More information

Low Power SRAM Design with Reduced Read/Write Time

Low Power SRAM Design with Reduced Read/Write Time International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 3 (2013), pp. 195-200 International Research Publications House http://www. irphouse.com /ijict.htm Low

More information

A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM

A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 09, 2016 ISSN (online): 2321-0613 A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM Yogit

More information

A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS

A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS ABSTRACT We describe L1 cache designed for digital signal processor (DSP) core. The cache is 32KB with variable associativity (4 to 16 ways) and is pseudo-dual-ported.

More information

Introduction to Semiconductor Memory Dr. Lynn Fuller Webpage:

Introduction to Semiconductor Memory Dr. Lynn Fuller Webpage: ROCHESTER INSTITUTE OF TECHNOLOGY MICROELECTRONIC ENGINEERING Introduction to Semiconductor Memory Webpage: http://people.rit.edu/lffeee 82 Lomb Memorial Drive Rochester, NY 14623-5604 Tel (585) 475-2035

More information

Structured Datapaths. Preclass 1. Throughput Yield. Preclass 1

Structured Datapaths. Preclass 1. Throughput Yield. Preclass 1 ESE534: Computer Organization Day 23: November 21, 2016 Time Multiplexing Tabula March 1, 2010 Announced new architecture We would say w=1, c=8 arch. March, 2015 Tabula closed doors 1 [src: www.tabula.com]

More information

Memory in Digital Systems

Memory in Digital Systems MEMORIES Memory in Digital Systems Three primary components of digital systems Datapath (does the work) Control (manager) Memory (storage) Single bit ( foround ) Clockless latches e.g., SR latch Clocked

More information

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based

More information

VLSID KOLKATA, INDIA January 4-8, 2016

VLSID KOLKATA, INDIA January 4-8, 2016 VLSID 2016 KOLKATA, INDIA January 4-8, 2016 Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in Hybrid Memory Cube Architectures Ishan Thakkar, Sudeep Pasricha Department of Electrical

More information

EECS Dept., University of California at Berkeley. Berkeley Wireless Research Center Tel: (510)

EECS Dept., University of California at Berkeley. Berkeley Wireless Research Center Tel: (510) A V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications Hui Zhang, Vandana Prabhu, Varghese George, Marlene Wan, Martin Benes, Arthur Abnous, and Jan M. Rabaey EECS Dept., University

More information

Analysis of 8T SRAM with Read and Write Assist Schemes (UDVS) In 45nm CMOS Technology

Analysis of 8T SRAM with Read and Write Assist Schemes (UDVS) In 45nm CMOS Technology Analysis of 8T SRAM with Read and Write Assist Schemes (UDVS) In 45nm CMOS Technology Srikanth Lade 1, Pradeep Kumar Urity 2 Abstract : UDVS techniques are presented in this paper to minimize the power

More information

Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques

Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques Yu Cai, Saugata Ghose, Yixin Luo, Ken Mai, Onur Mutlu, Erich F. Haratsch February 6, 2017

More information

A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit

A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit International Journal of Electrical and Computer Engineering (IJECE) Vol. 3, No. 4, August 2013, pp. 509~515 ISSN: 2088-8708 509 A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit Sidhant Kukrety*,

More information

International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February ISSN

International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February ISSN International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February-2014 938 LOW POWER SRAM ARCHITECTURE AT DEEP SUBMICRON CMOS TECHNOLOGY T.SANKARARAO STUDENT OF GITAS, S.SEKHAR DILEEP

More information

ESE534: Computer Organization. Tabula. Previously. Today. How often is reuse of the same operation applicable?

ESE534: Computer Organization. Tabula. Previously. Today. How often is reuse of the same operation applicable? ESE534: Computer Organization Day 22: April 9, 2012 Time Multiplexing Tabula March 1, 2010 Announced new architecture We would say w=1, c=8 arch. 1 [src: www.tabula.com] 2 Previously Today Saw how to pipeline

More information

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System 1 1 1 Centip3De: A 64-Core, 3D Stacked, Near-Threshold System Ronald G. Dreslinski David Fick, Bharan Giridhar, Gyouho Kim, Sangwon Seo, Matthew Fojtik, Sudhir Satpathy, Yoonmyung Lee, Daeyeon Kim, Nurrachman

More information

Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns

Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns March 12, 2018 Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns Wen Wen Lei Zhao, Youtao Zhang, Jun Yang Executive Summary Problems: performance and reliability of write operations

More information

FABRICATION TECHNOLOGIES

FABRICATION TECHNOLOGIES FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general

More information

Design and Analysis of 32 bit SRAM architecture in 90nm CMOS Technology

Design and Analysis of 32 bit SRAM architecture in 90nm CMOS Technology Design and Analysis of 32 bit SRAM architecture in 90nm CMOS Technology Jesal P. Gajjar 1, Aesha S. Zala 2, Sandeep K. Aggarwal 3 1Research intern, GTU-CDAC, Pune, India 2 Research intern, GTU-CDAC, Pune,

More information

A Low Power SRAM Base on Novel Word-Line Decoding

A Low Power SRAM Base on Novel Word-Line Decoding Vol:, No:3, 008 A Low Power SRAM Base on Novel Word-Line Decoding Arash Azizi Mazreah, Mohammad T. Manzuri Shalmani, Hamid Barati, Ali Barati, and Ali Sarchami International Science Index, Computer and

More information

FeRAM Circuit Technology for System on a Chip

FeRAM Circuit Technology for System on a Chip FeRAM Circuit Technology for System on a Chip K. Asari 1,2,4, Y. Mitsuyama 2, T. Onoye 2, I. Shirakawa 2, H. Hirano 1, T. Honda 1, T. Otsuki 1, T. Baba 3, T. Meng 4 1 Matsushita Electronics Corp., Osaka,

More information

Self-Time Tracking Circuit to Improve Access Time of SRAM

Self-Time Tracking Circuit to Improve Access Time of SRAM Self-Time Tracking Circuit to Improve Access Time of SRAM Pullareddy. A Research Scholar, Department of Electronics and Communication Engineering, Sri Venkateswara University College of Engineering, Sri

More information

Survey on Stability of Low Power SRAM Bit Cells

Survey on Stability of Low Power SRAM Bit Cells International Journal of Electronics Engineering Research. ISSN 0975-6450 Volume 9, Number 3 (2017) pp. 441-447 Research India Publications http://www.ripublication.com Survey on Stability of Low Power

More information

Memory Design I. Semiconductor Memory Classification. Read-Write Memories (RWM) Memory Scaling Trend. Memory Scaling Trend

Memory Design I. Semiconductor Memory Classification. Read-Write Memories (RWM) Memory Scaling Trend. Memory Scaling Trend Array-Structured Memory Architecture Memory Design I Professor hris H. Kim University of Minnesota Dept. of EE chriskim@ece.umn.edu 2 Semiconductor Memory lassification Read-Write Memory Non-Volatile Read-Write

More information

DIRECT Rambus DRAM has a high-speed interface of

DIRECT Rambus DRAM has a high-speed interface of 1600 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 11, NOVEMBER 1999 A 1.6-GByte/s DRAM with Flexible Mapping Redundancy Technique and Additional Refresh Scheme Satoru Takase and Natsuki Kushiyama

More information

Design and Implementation of High Performance Application Specific Memory

Design and Implementation of High Performance Application Specific Memory Design and Implementation of High Performance Application Specific Memory - 고성능 Application Specific Memory 의설계와구현 - M.S. Thesis Sungdae Choi Dec. 20th, 2002 Outline Introduction Memory for Mobile 3D Graphics

More information

Z-RAM Ultra-Dense Memory for 90nm and Below. Hot Chips David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc.

Z-RAM Ultra-Dense Memory for 90nm and Below. Hot Chips David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc. Z-RAM Ultra-Dense Memory for 90nm and Below Hot Chips 2006 David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc. Outline Device Overview Operation Architecture Features Challenges Z-RAM Performance

More information

A Comparative Study of Power Efficient SRAM Designs

A Comparative Study of Power Efficient SRAM Designs A Comparative tudy of Power Efficient RAM Designs Jeyran Hezavei, N. Vijaykrishnan, M. J. Irwin Pond Laboratory, Department of Computer cience & Engineering, Pennsylvania tate University {hezavei, vijay,

More information

Switched by Input: Power Efficient Structure for RRAMbased Convolutional Neural Network

Switched by Input: Power Efficient Structure for RRAMbased Convolutional Neural Network Switched by Input: Power Efficient Structure for RRAMbased Convolutional Neural Network Lixue Xia, Tianqi Tang, Wenqin Huangfu, Ming Cheng, Xiling Yin, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E., Tsinghua

More information

Digital Integrated Circuits (83-313) Lecture 7: SRAM. Semester B, Lecturer: Dr. Adam Teman Itamar Levi, Robert Giterman.

Digital Integrated Circuits (83-313) Lecture 7: SRAM. Semester B, Lecturer: Dr. Adam Teman Itamar Levi, Robert Giterman. Digital Integrated Circuits (83-313) Lecture 7: SRAM Semester B, 2016-17 Lecturer: Dr. Adam Teman TAs: Itamar Levi, Robert Giterman 16 May 2017 Disclaimer: This course was prepared, in its entirety, by

More information

Spiral 2-9. Tri-State Gates Memories DMA

Spiral 2-9. Tri-State Gates Memories DMA 2-9.1 Spiral 2-9 Tri-State Gates Memories DMA 2-9.2 Learning Outcomes I understand how a tri-state works and the rules for using them to share a bus I understand how SRAM and DRAM cells perform reads and

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1

ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1 ISSCC 26 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1 22.1 A 125µW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications Tsu-Ming Liu 1, Ting-An Lin 2, Sheng-Zen Wang 2, Wen-Ping Lee

More information

Optimizing Standby

Optimizing Standby Optimizing Power @ Standby Memory Benton H. Calhoun Jan M. Rabaey Chapter Outline Memory in Standby Voltage Scaling Body Biasing Periphery Memory Dominates Processor Area SRAM is a major source of static

More information

Dynamic Write Limited Minimum Operating Voltage for Nanoscale SRAMs

Dynamic Write Limited Minimum Operating Voltage for Nanoscale SRAMs Dynamic Write Limited Minimum Operating Voltage for Nanoscale SRAMs Satyanand Nalam, Vikas Chandra, Robert C. Aitken, Benton H. Calhoun Dept. of ECE, University of Virginia, Charlottesville; ARM R&D, San

More information

X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories

X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories Amogh Agrawal*, Akhilesh Jaiswal*, Chankyu Lee and Kaushik Roy, Fellow, IEEE School of Electrical and Computer Engineering,

More information

Simulation and Analysis of SRAM Cell Structures at 90nm Technology

Simulation and Analysis of SRAM Cell Structures at 90nm Technology Vol.1, Issue.2, pp-327-331 ISSN: 2249-6645 Simulation and Analysis of SRAM Cell Structures at 90nm Technology Sapna Singh 1, Neha Arora 2, Prof. B.P. Singh 3 (Faculty of Engineering and Technology, Mody

More information

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain Massively Parallel Computing on Silicon: SIMD Implementations V.M.. Brea Univ. of Santiago de Compostela Spain GOAL Give an overview on the state-of of-the- art of Digital on-chip CMOS SIMD Solutions,

More information

Implementation of DRAM Cell Using Transmission Gate

Implementation of DRAM Cell Using Transmission Gate Implementation of DRAM Cell Using Transmission Gate Pranita J. Giri 1, Sunanda K. Kapde 2 PG Student, Department of E&TC, Deogiri Institute of Engineering & Management Studies, Aurangabad (MS), India 1

More information

Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology

Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.15, NO.1, FEBRUARY, 2015 http://dx.doi.org/10.5573/jsts.2015.15.1.077 Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network

More information

Calibrating Achievable Design GSRC Annual Review June 9, 2002

Calibrating Achievable Design GSRC Annual Review June 9, 2002 Calibrating Achievable Design GSRC Annual Review June 9, 2002 Wayne Dai, Andrew Kahng, Tsu-Jae King, Wojciech Maly,, Igor Markov, Herman Schmit, Dennis Sylvester DUSD(Labs) Calibrating Achievable Design

More information

Based on slides/material by. Topic 7-4. Memory and Array Circuits. Outline. Semiconductor Memory Classification

Based on slides/material by. Topic 7-4. Memory and Array Circuits. Outline. Semiconductor Memory Classification Based on slides/material by Topic 7 Memory and Array Circuits K. Masselos http://cas.ee.ic.ac.uk/~kostas J. Rabaey http://bwrc.eecs.berkeley.edu/classes/icbook/instructors.html Digital Integrated Circuits:

More information

Research Scholar, Chandigarh Engineering College, Landran (Mohali), 2

Research Scholar, Chandigarh Engineering College, Landran (Mohali), 2 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Optimize Parity Encoding for Power Reduction in Content Addressable Memory Nisha Sharma, Manmeet Kaur 1 Research Scholar, Chandigarh

More information

Don t Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration

Don t Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration Don t Forget the : Automatic Block RAM Modelling, Optimization, and Architecture Exploration S. Yazdanshenas, K. Tatsumura *, and V. Betz University of Toronto, Canada * Toshiba Corporation, Japan : An

More information