Title: A 19.4 nJ/decision 364K decisions/s in-memory random forest classifier in 6T SRAM array
Archived version: Accepted manuscript. The content is identical to the published paper, but without the final typesetting by the publisher.
Published version DOI: 10.1109/ESSCIRC
Authors (contact): Mingu Kang (mkang17@illinois.edu), Sujan K. Gonugondla (gonugon2@illinois.edu), Naresh R. Shanbhag (shanbhag@illinois.edu)
Affiliation: University of Illinois at Urbana Champaign

A 19.4 nJ/decision 364K decisions/s In-memory Random Forest Classifier in 6T SRAM Array
Mingu Kang, Sujan K. Gonugondla, Naresh R. Shanbhag
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA.

Abstract
This paper presents an IC realization of a random forest (RF) machine learning classifier. The algorithm, architecture, and circuit are co-optimized to minimize the energy-delay product (EDP). Deterministic subsampling (DSS) and balanced trees reduce interconnect complexity and avoid irregular memory accesses. Low-swing analog in-memory computations embedded in a standard 6T SRAM enable massively parallel processing, thereby minimizing memory fetches and reducing the EDP further. The 65nm CMOS prototype achieves a 6.8× lower EDP compared to a conventional design at the same accuracy (94%) for an 8-class traffic sign recognition problem.

Keywords: machine learning; random forest; in-memory computing; pattern recognition; traffic sign recognition

I. INTRODUCTION
The random forest (RF) classifier [1] is attractive due to its high accuracy, simple operations (comparisons), applicability to multi-class problems, and robustness to non-ideal computations thanks to its majority-voting-based decision [1]. However, an energy-efficient implementation of the RF algorithm is challenging because of its high data access rate combined with a highly irregular data access pattern. This paper presents an energy-efficient, high-throughput RF classifier IC that employs: 1) deterministic subsampling (DSS) to reduce interconnect complexity, 2) balanced trees to regularize the memory access pattern, and 3) deeply embedded analog computations [3,4,5] in the periphery of an SRAM bitcell array (BCA) to exploit the inherent algorithmic error tolerance. To the best of our knowledge, this is the first IC implementation of the RF algorithm, as only FPGA, GPU, and multi-core processor implementations of the RF algorithm [2] exist today.
These fail to take advantage of the opportunities afforded by analog computations.

II. THE RF ALGORITHM
This section explains the RF algorithm and its implementation challenges.

A. RF Algorithm
Fig. 1. Random forest algorithm: (a) conventional, (b) proposed w/ deterministic subsampling (DSS), and (c) number of required operations.
Notation in Fig. 1: τ_{m,n}: threshold level of the n-th node in the m-th tree; p_{m,n}: pixel index of the n-th node in the m-th tree; P_m: [p_{m,1}, p_{m,2}, ..., p_{m,N}]; RSS: random subsampling by sample pattern P_m; x(p_{m,n}): the p_{m,n}-th pixel of input image X; c_{m,l}: label corresponding to the l-th leaf node in the m-th tree (m: 1~M, n: 1~N, l: 1~N+1).
Panel (c) lists bit precision, size in bytes, and number of operations per tree as proposed / conventional, assuming 8 bytes per SRAM access; the crossbar mux ratio is 64:1 / 256:1.

The RF algorithm (Fig. 1(a)) consists of M trees. The m-th tree processes data obtained by random subsampling (RSS) of the input image X using a pseudo-random pattern vector P_m. The n-th node in the m-th tree compares x(p_{m,n}), the pixel (or feature) indexed by p_{m,n}, with a threshold τ_{m,n} to obtain a node-level binary decision q_{m,n}. Either the left or the right branch is taken based on q_{m,n}. This process is repeated until a leaf node is reached. The label c_{m,l} corresponding to the l-th leaf node is the tree-level decision. The final decision is obtained by majority voting over the M tree-level decisions.

B. Implementation Challenges
Two architectures can be considered for implementing the RF algorithm: serial and parallel. A serial architecture must process nodes sequentially, resulting in a large delay, and requires reading two 11-b (for a 16 KB array) child node addresses per node, which takes roughly of the storage space. A fully parallel architecture, on the other hand, computes all q_{m,n} in parallel and uses them to address a look-up table (LUT) to obtain c_{m,l}. Doing so requires a large number of memory accesses, e.g., 78 8-b bytes per tree (Fig. 1(c)), which in turn limits the achievable throughput and energy efficiency. Additionally, a complex crossbar (i.e., 256:1 for a 256-pixel image X) is needed to route the pixel indexed by p_{m,n} from X for comparison.

III. THE PROPOSED RF ALGORITHM AND ARCHITECTURE
This paper co-optimizes the algorithm and architecture to achieve energy and throughput benefits.

A. The Proposed RF Algorithm
The modified RF algorithm (Fig. 1(b)) employs a fixed-pattern deterministic subsampling (DSS) step prior to RSS to solve the crossbar problem mentioned above. A 4:1 DSS factor is chosen to balance the loss in classification accuracy against crossbar complexity. The complexity of the RSS crossbar is reduced from 256:1 to 64:1 when the input X is a 256-pixel image. Thus, the precision of p_{m,n} is also reduced from 8-b to 6-b.
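
The subsampling and tree-evaluation steps described so far can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the choice of keeping every fourth pixel as the fixed DSS pattern, the heap-order storage of a balanced tree, the convention that q = 1 selects the right branch, and the random test data are all introduced here for the example.

```python
import math
import random
from collections import Counter

NUM_PIXELS = 256      # pixels in input image X (matches the 256:1 crossbar)
DSS_FACTOR = 4        # 4:1 deterministic subsampling
NODES_PER_TREE = 31   # nodes per balanced tree, as in Fig. 2

def dss(image, factor=DSS_FACTOR, phase=0):
    """Deterministic subsampling: keep every `factor`-th pixel (assumed pattern)."""
    return image[phase::factor]

def rss(subsampled, pattern):
    """Random subsampling: pick the pixels indexed by the pattern vector P_m."""
    return [subsampled[p] for p in pattern]

def index_bits(num_candidates):
    """Bits needed for an index p_{m,n} that selects one of num_candidates pixels."""
    return math.ceil(math.log2(num_candidates))

def tree_decision(x_nodes, thresholds, leaf_labels):
    """Walk a balanced tree stored in heap order and return the leaf label."""
    num_nodes = len(thresholds)
    n = 0
    while n < num_nodes:
        q = 1 if x_nodes[n] > thresholds[n] else 0  # node-level decision q_{m,n}
        n = 2 * n + 1 + q                           # assumed: q=0 left, q=1 right
    return leaf_labels[n - num_nodes]

def forest_decision(tree_labels):
    """Majority vote over the tree-level decisions."""
    return Counter(tree_labels).most_common(1)[0][0]

rng = random.Random(0)
image = [rng.randrange(256) for _ in range(NUM_PIXELS)]   # 8-b pixels
x_dss = dss(image)                                         # 64 pixels remain
pattern = [rng.randrange(len(x_dss)) for _ in range(NODES_PER_TREE)]
x_nodes = rss(x_dss, pattern)                              # one pixel per node

# Crossbar inputs after DSS, index width before DSS, index width after DSS
print(len(x_dss), index_bits(NUM_PIXELS), index_bits(len(x_dss)))  # prints: 64 8 6
```

With 31 nodes per tree, the heap-order walk ends on one of 32 leaves, which matches the c_{m,1~32} labels shown in Fig. 2.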

Additionally, the trees are balanced (Fig. 1(b)) by filling some empty nodes in order to regularize the memory access pattern. The memory access problem is addressed by reducing the number of memory accesses via in-memory comparison (Fig. 3), which eliminates the need to fetch τ_{m,n}. The class address (ADD) generator (CAG) generates the address of the chosen c_{m,l} from the q_{m,n} values, eliminating the need to fetch all the c_{m,l} values. Only 24.5 bytes of data need to be fetched per tree, compared to 78 bytes/tree in the parallel architecture.

B. Proposed Architecture and Operations
The proposed RF architecture (Fig. 2(a)) includes an SRAM BCA, a multi-row wordline (WL) driver, a 64-b I/O with a 4:1 column mux, a DSS input buffer to store the streamed X, RSS crossbars, the CAG, a label finder, a majority voter, and the peripherals for standard read/write operations. A group of four trees is processed in parallel, and 16 such groups are processed sequentially for a total of M = 64 trees.

Fig. 2. Proposed RF: (a) architecture, and (b) timing diagram. For each tree m, the SRAM bit-cell array stores p_{m,1~31}, τ_{m,1~31}, and c_{m,1~32}; the input buffer holds X in four DSS phases (X_{1,5,...,253}, X_{2,6,...,254}, X_{3,7,...,255}, X_{4,8,...,256}).

The classifier first: 1) writes the pixel index register, 2) enables the crossbar, 3) performs the in-memory comparison enabled by the multi-row WL driver and analog comparators, 4) sequentially fetches four tree-level labels using the addresses generated by the CAG, and 5) majority votes
in the final tree.

C. In-memory comparison
In-memory comparison requires the 8-b thresholds τ_{m,n} (T in Fig. 3(a)) and the indexed pixels x(p_{m,n}) (X in Fig. 3(a)) to be stored in a column-major pattern, i.e., the bits of a word are stored in a column. The comparison begins with the simultaneous application of WL access pulses with binary-weighted pulse widths to all the rows storing T and X_B. Here, the pulse width is proportional to the binary weight of the bit position. Doing so creates a bitline (BL) voltage swing ΔV_BLB (ΔV_BL) proportional to T−X (X−T) [3,4]. Linearity of this multi-row read is improved by reading the 4-b MSBs and LSBs separately from adjacent columns, followed by capacitively weighted charge sharing that assigns 16× greater weight to the MSBs. The WL voltage is reduced (e.g., 0.65 V) to prevent destructive reads and improve the linearity further. Storing X_B in the replica bit-cell array allows fast writing through a separate write BL (WBL) and write wordline (WWL), eliminating the overhead of slow write operations into the normal BCA. The resulting bitline voltages feed into analog comparators to generate the node-level decisions (q). In-memory comparison is an intrinsically and massively parallel operation, as it processes all the words from 256 columns in parallel, whereas a conventional memory fetches only 64 bits (= 8 words) per read access when the sense amplifier is shared across four columns. In addition, multi-row read saves energy by accessing 4 bits per precharge.

IV. CHIP MEASURED RESULTS
The in-memory RF classifier is implemented in a 65nm CMOS process (chip micrograph in Fig. 5, summarized in Table I) to demonstrate the application-level benefits.

A. Component-level Accuracy Characterization
Measured in-memory comparison results (Fig. 3(b)) show the comparator error rate increasing from 1.6% to 14.5% as ΔV_BL reduces from 25mV to 5mV. The RF algorithm with 64 trees needs an error rate below 9.5% at the comparator output q to avoid a discernible loss in 8-class classification accuracy. Four trees tolerate only a 4% error rate, which restricts further reduction in ΔV_BL.

B. Application-level Accuracy, Energy, and Throughput
Measured results (Fig. 4) of the energy vs. accuracy trade-off for binary classification (face detection) with 64 trees show that the proposed IC achieves 3.1× energy savings over the conventional architecture (SRAM + digital processor). The energy of the conventional architecture is obtained via post-layout simulations of the digital blocks and the read access energy measured from the prototype IC. These energy savings come from the multi-row read, in-memory comparison, and low-complexity crossbar. Fewer memory accesses also reduce the delay by 2.2× over the conventional architecture, thereby providing a 6.8× lower energy-delay product (EDP) at the same accuracy of > 93% as the conventional architecture. The prototype IC achieves a throughput of 364K decisions/s and an energy efficiency of 19.4 nJ/decision, achieving at least 5.6× smaller EDP compared to prior multi-class classifier ICs, as listed in Table II.

Fig. 3. In-memory comparison: (a) bit-cell column for in-memory comparison of T and X, with V_WL < V_DD, where q = 1 if X > T (V_BL < V_BLB) and q = 0 otherwise, and (b) measured accuracy of comparison, plotted as comparison error rate (%) vs. ΔV_BL per LSB (mV), with markers for the minimum ΔV_BL that achieves 93% classification accuracy with 64 trees and with 4 trees.

Fig. 4. Energy vs. error rate w.r.t. ΔV_BL with 64 trees (binary classification), where ΔV_BL of Conv. = 8 ΔV_BL per LSB. Axes: core energy per decision (nJ) and classification error rate (1−P_DET) (%) vs. ΔV_BL per LSB for proposed (mV); curves for proposed and conventional energy and accuracy.
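
The arithmetic behind the in-memory comparison of Section III-C can be illustrated with a purely behavioral Python model. This is a sketch under stated assumptions, not a circuit simulation: it assumes an ideal, linear bitline discharge, ignores the attenuation introduced by charge sharing, and abstracts away how T and X_B are placed on BL and BLB. The function names and the per-LSB swing parameter `dv_lsb_mv` are hypothetical.

```python
PULSE_WEIGHTS = (8, 4, 2, 1)  # binary-weighted WL pulse widths for one 4-b nibble
MSB_WEIGHT = 16               # charge-sharing weight of the MSB nibble vs. the LSB nibble

def nibble_swing(nibble):
    """Ideal swing (in LSB units) from a multi-row read of one 4-b nibble."""
    bits = [(nibble >> i) & 1 for i in (3, 2, 1, 0)]   # MSB first
    return sum(w * b for w, b in zip(PULSE_WEIGHTS, bits))

def word_swing(word):
    """Combine the MSB and LSB nibble swings of an 8-b word with 16:1 weighting."""
    msb, lsb = word >> 4, word & 0xF
    return MSB_WEIGHT * nibble_swing(msb) + nibble_swing(lsb)

def compare(x, t, dv_lsb_mv=1.0):
    """Return (q, differential swing in mV); q = 1 if X > T, else 0."""
    diff_mv = dv_lsb_mv * (word_swing(x) - word_swing(t))
    return (1 if diff_mv > 0 else 0), diff_mv

# In this ideal model the weighted swing reproduces the stored 8-b value.
assert all(word_swing(w) == w for w in range(256))
print(compare(200, 100))   # X > T, so q is 1
print(compare(100, 200))   # X < T, so q is 0
```

In this idealized form the differential swing shrinks with |X−T| and with the swing per LSB, which is the reason the measured comparator error rate rises as ΔV_BL is reduced.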

Table I: Chip summary.
Technology: 65 nm CMOS
Die size (mm):
SRAM capacity: 16 KB
Bit-cell size (um^2):
CTRL CLK freq.: 1 GHz
Supply voltage (V): CORE 1.0, CTRL 0.75
Energy per decision (4 trees, 64 trees) (nJ): CORE (0.9, 14.4), CTRL (0.3, 5.0)
Decision throughput (4 trees, 64 trees) (decisions/s): (5.6M, 364K)

V. CONCLUSION
This paper has presented an IC realization of the random forest (RF) algorithm that achieves energy-efficient, high-throughput classification by co-optimizing the algorithm, architecture, and circuit design. The prototype IC achieves 3.1× energy savings and a 2.2× speed-up, together providing a 6.8× lower energy-delay product (EDP) at the same accuracy of > 93% compared to a conventional digital architecture. The proposed IC achieves a throughput of 364K decisions/s and an energy efficiency of 19.4 nJ/decision. To the best of our knowledge, this is the first IC realization of the RF algorithm. The benefits of the proposed architecture are expected to increase with image resolution and data size. This is because the subsampling ratio can be increased without losing classification accuracy, and the random noise components in the low-swing analog in-memory comparison average out better as the data size grows.

Fig. 5. Chip micrograph (1.2 mm × 1.2 mm). Labeled blocks: 64-b bus, bitcell arrays, input buffer, pixel index register, crossbar, analog comparators, replica bitcell array, R/W circuitry, digital CTRL, test block.

ACKNOWLEDGMENT
This work was supported by Systems on Nanoscale Information fabrics (SONIC), one of the six SRC STARnet Centers, sponsored by SRC and DARPA. The authors would like to acknowledge constructive discussions with S. Eilert, K. Curewitz, N. Verma, B. Murmann, and P. Hanumolu.

REFERENCES
[1] L. Breiman, "Random forests," Machine Learning, vol. 45, 2001.
[2] B. Van Essen, C. Macaraeg, M. Gokhale, and R. Prenger, "Accelerating a random forest classifier: Multi-core, GP-GPU, or FPGA?," IEEE FCCM, 2012.
[3] M. Kang, M.S. Keel, N.R. Shanbhag, S. Eilert, and K. Curewitz, "An Energy-efficient VLSI Architecture for Pattern Recognition via Deep Embedding of Computation in SRAM," IEEE ICASSP, 2014.
[4] M. Kang, S. Gonugondla, A. Patil, and N. Shanbhag, "A 481pJ/decision 3.4M decision/s multifunctional deep in-memory inference processor using standard 6T SRAM array," arXiv preprint, 2016.
[5] J. Zhang, Z. Wang, and N. Verma, "In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array," IEEE JSSC, 2017.
[6] J. Park, et al., "A 92-mW Real-Time Traffic Sign Recognition System with Robust Illumination Adaptation and Support Vector Machine," IEEE JSSC, 2012.
[7] H. Kaul, et al., "A 21.5M-Query-Vectors/s 3.37nJ/Vector Reconfigurable k-nearest-neighbor Accelerator with Adaptive Precision in 14nm Tri-Gate CMOS," ISSCC Dig. Tech. Papers, 2016.

Table II: Comparison with prior art.
Design | Process | Algorithm | Dataset | Input size (8b) | Throughput (decisions/s) | Energy (nJ/decision) | EDP (fJ·s/decision) | Accuracy
[6] | 130nm CMOS | Support Vector Machine | Traffic sign video | | [K]* | 1.5M [125]* | 45G [3125]* | 90%
[7] | 14nm tri-gate | K-nearest Neighbor | Not reported | | 21.5M [498.8K]* | 3.4 [145.3]* | 0.2 [292.3]* | Not reported
Ours (M=64) | 65nm CMOS | Random Forest | KUL traffic signs | | 364.4K | 19.4 (w/ CTRL) | | %
*throughput and energy scaled to a 65nm process w/ pixels; SRAM memory access cost not included
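
The headline energy and EDP figures follow from simple arithmetic on the numbers reported in the text and tables. The snippet below is an illustrative check only: the helper name `edp_fjs` is chosen for this example, and it assumes that the delay per decision is the reciprocal of the decision throughput.

```python
def edp_fjs(energy_nj, throughput_per_s):
    """Energy-delay product in fJ*s per decision, taking delay = 1 / throughput."""
    energy_j = energy_nj * 1e-9
    delay_s = 1.0 / throughput_per_s
    return energy_j * delay_s * 1e15

# Energy per decision with 64 trees: CORE + CTRL from Table I
total_energy_nj = 14.4 + 5.0        # about 19.4 nJ/decision

# EDP improvement over the conventional design: energy savings times speed-up
edp_gain = 3.1 * 2.2                # about 6.8x

# EDP of this work from the Table II energy and throughput
ours = edp_fjs(19.4, 364.4e3)       # about 53 fJ*s per decision

print(round(total_energy_nj, 1), round(edp_gain, 2), round(ours, 1))
```

The product of the 3.1× energy savings and the 2.2× speed-up is consistent with the quoted 6.8× EDP reduction, and the CORE and CTRL energies for 64 trees add up to the quoted 19.4 nJ/decision.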
A 19.4 nj/decision 364K Decisions/s In-Memory Random Forest Classifier in 6T SRAM Array. Mingu Kang, Sujan Gonugondla, Naresh Shanbhag
A 19.4 nj/decision 364K Decisions/s In-Memory Random Forest Classifier in 6T SRAM Array Mingu Kang, Sujan Gonugondla, Naresh Shanbhag University of Illinois at Urbana Champaign Machine Learning under Resource
More informationarxiv: v1 [cs.ar] 24 Oct 2016
A 481pJ/decision 3.4M decision/s Multifunctional Deep In-memory Inference Processor using Standard 6T SRAM Array arxiv:161.751v1 [cs.ar] 24 Oct 216 Mingu Kang, Sujan Gonugondla, Ameya Patil, and Naresh
More information6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1
6T- SRAM for Low Power Consumption Mrs. J.N.Ingole 1, Ms.P.A.Mirge 2 Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 PG Student [Digital Electronics], Dept. of ExTC, PRMIT&R,
More information/ISCAS
Title Energy-Efficient Deep In-memory Architecture for NAND Flash Memories Archived version Accepted manuscript: the content is same as the published paper but without the final typesetting by the publisher
More information! Memory. " RAM Memory. " Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips
ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 5, 8 Memory: Periphery circuits Today! Memory " RAM Memory " Architecture " Memory core " SRAM " DRAM " Periphery " Serial Access Memories
More informationMemory Design I. Array-Structured Memory Architecture. Professor Chris H. Kim. Dept. of ECE.
Memory Design I Professor Chris H. Kim University of Minnesota Dept. of ECE chriskim@ece.umn.edu Array-Structured Memory Architecture 2 1 Semiconductor Memory Classification Read-Write Wi Memory Non-Volatile
More informationDRAM with Boosted 3T Gain Cell, PVT-tracking Read Reference Bias
ASub-0 Sub-0.9V Logic-compatible Embedded DRAM with Boosted 3T Gain Cell, Regulated Bit-line Write Scheme and PVT-tracking Read Reference Bias Ki Chul Chun, Pulkit Jain, Jung Hwa Lee*, Chris H. Kim University
More informationA Write-Back-Free 2T1D Embedded. a Dual-Row-Access Low Power Mode.
A Write-Back-Free 2T1D Embedded DRAM with Local Voltage Sensing and a Dual-Row-Access Low Power Mode Wei Zhang, Ki Chul Chun, Chris H. Kim University of Minnesota, Minneapolis, MN zhang758@umn.edu Outline
More informationCS250 VLSI Systems Design Lecture 9: Memory
CS250 VLSI Systems esign Lecture 9: Memory John Wawrzynek, Jonathan Bachrach, with Krste Asanovic, John Lazzaro and Rimas Avizienis (TA) UC Berkeley Fall 2012 CMOS Bistable Flip State 1 0 0 1 Cross-coupled
More informationMinimizing Power Dissipation during. University of Southern California Los Angeles CA August 28 th, 2007
Minimizing Power Dissipation during Write Operation to Register Files Kimish Patel, Wonbok Lee, Massoud Pedram University of Southern California Los Angeles CA August 28 th, 2007 Introduction Outline Conditional
More informationSemiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy.
ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 4, 7 Memory Overview, Memory Core Cells Today! Memory " Classification " ROM Memories " RAM Memory " Architecture " Memory core " SRAM
More informationUnleashing the Power of Embedded DRAM
Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers
More informationESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems
ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Lec 26: November 9, 2018 Memory Overview Dynamic OR4! Precharge time?! Driving input " With R 0 /2 inverter! Driving inverter
More informationA 65nm 8T Sub-V t SRAM Employing Sense-Amplifier Redundancy
A 65nm Sub-V t SRAM Employing Sense-Amplifier Redundancy Naveen Verma and Anantha Chandrakasan Massachusetts Institute of Technology ISSCC 2007 Energy Minimization Minimum energy V DD for logic results
More information8Kb Logic Compatible DRAM based Memory Design for Low Power Systems
8Kb Logic Compatible DRAM based Memory Design for Low Power Systems Harshita Shrivastava 1, Rajesh Khatri 2 1,2 Department of Electronics & Instrumentation Engineering, Shree Govindram Seksaria Institute
More informationNeural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks Charles Eckert Xiaowei Wang Jingcheng Wang Arun Subramaniyan Ravi Iyer Dennis Sylvester David Blaauw Reetuparna Das M-Bits Research
More informationMTJ-Based Nonvolatile Logic-in-Memory Architecture
2011 Spintronics Workshop on LSI @ Kyoto, Japan, June 13, 2011 MTJ-Based Nonvolatile Logic-in-Memory Architecture Takahiro Hanyu Center for Spintronics Integrated Systems, Tohoku University, JAPAN Laboratory
More informationDesign of Low Power Wide Gates used in Register File and Tag Comparator
www..org 1 Design of Low Power Wide Gates used in Register File and Tag Comparator Isac Daimary 1, Mohammed Aneesh 2 1,2 Department of Electronics Engineering, Pondicherry University Pondicherry, 605014,
More informationColumn decoder using PTL for memory
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 5, Issue 4 (Mar. - Apr. 2013), PP 07-14 Column decoder using PTL for memory M.Manimaraboopathy
More informationHigh Performance Memory Read Using Cross-Coupled Pull-up Circuitry
High Performance Memory Read Using Cross-Coupled Pull-up Circuitry Katie Blomster and José G. Delgado-Frias School of Electrical Engineering and Computer Science Washington State University Pullman, WA
More informationScalable series-stacked power delivery architectures for improved efficiency and reduced supply current
Scalable series-stacked power delivery architectures for improved efficiency and reduced supply current Robert Pilawa Enver Candan, Josiah McClurg, Sai Zhang, Pradeep Shenoy* Phil Krein, Naresh Shanbhag
More information! Memory Overview. ! ROM Memories. ! RAM Memory " SRAM " DRAM. ! This is done because we can build. " large, slow memories OR
ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec 2: April 5, 26 Memory Overview, Memory Core Cells Lecture Outline! Memory Overview! ROM Memories! RAM Memory " SRAM " DRAM 2 Memory Overview
More informationModule 6 : Semiconductor Memories Lecture 30 : SRAM and DRAM Peripherals
Module 6 : Semiconductor Memories Lecture 30 : SRAM and DRAM Peripherals Objectives In this lecture you will learn the following Introduction SRAM and its Peripherals DRAM and its Peripherals 30.1 Introduction
More informationMemory Classification revisited. Slide 3
Slide 1 Topics q Introduction to memory q SRAM : Basic memory element q Operations and modes of failure q Cell optimization q SRAM peripherals q Memory architecture and folding Slide 2 Memory Classification
More informationMagnetic core memory (1951) cm 2 ( bit)
Magnetic core memory (1951) 16 16 cm 2 (128 128 bit) Semiconductor Memory Classification Read-Write Memory Non-Volatile Read-Write Memory Read-Only Memory Random Access Non-Random Access EPROM E 2 PROM
More informationECE 2300 Digital Logic & Computer Organization
ECE 2300 Digital Logic & Computer Organization Spring 201 Memories Lecture 14: 1 Announcements HW6 will be posted tonight Lab 4b next week: Debug your design before the in-lab exercise Lecture 14: 2 Review:
More informationA 32nm, 0.9V Supply-Noise Sensitivity Tracking PLL for Improved Clock Data Compensation Featuring a Deep Trench Capacitor Based Loop Filter
A 32nm, 0.9V Supply-Noise Sensitivity Tracking PLL for Improved Clock Data Compensation Featuring a Deep Trench Capacitor Based Loop Filter Bongjin Kim, Weichao Xu, and Chris H. Kim University of Minnesota,
More informationA Configurable Radiation Tolerant Dual-Ported Static RAM macro, designed in a 0.25 µm CMOS technology for applications in the LHC environment.
A Configurable Radiation Tolerant Dual-Ported Static RAM macro, designed in a 0.25 µm CMOS technology for applications in the LHC environment. 8th Workshop on Electronics for LHC Experiments 9-13 Sept.
More information! Serial Access Memories. ! Multiported SRAM ! 5T SRAM ! DRAM. ! Shift registers store and delay data. ! Simple design: cascade of registers
ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Lec 28: November 16, 2016 RAM Core Pt 2 Outline! Serial Access Memories! Multiported SRAM! 5T SRAM! DRAM Penn ESE 370 Fall 2016
More informationSemiconductor Memory Classification
ESE37: Circuit-Level Modeling, Design, and Optimization for Digital Systems Lec 6: November, 7 Memory Overview Today! Memory " Classification " Architecture " Memory core " Periphery (time permitting)!
More informationMillimeter-Scale Nearly Perpetual Sensor System with Stacked Battery and Solar Cells
1 Millimeter-Scale Nearly Perpetual Sensor System with Stacked Battery and Solar Cells Gregory Chen, Matthew Fojtik, Daeyeon Kim, David Fick, Junsun Park, Mingoo Seok, Mao-Ter Chen, Zhiyoong Foo, Dennis
More informationA Single Ended SRAM cell with reduced Average Power and Delay
A Single Ended SRAM cell with reduced Average Power and Delay Kritika Dalal 1, Rajni 2 1M.tech scholar, Electronics and Communication Department, Deen Bandhu Chhotu Ram University of Science and Technology,
More informationCENG 4480 L09 Memory 2
CENG 4480 L09 Memory 2 Bei Yu Reference: Chapter 11 Memories CMOS VLSI Design A Circuits and Systems Perspective by H.E.Weste and D.M.Harris 1 v.s. CENG3420 CENG3420: architecture perspective memory coherent
More informationEECS 427 Lecture 17: Memory Reliability and Power Readings: 12.4,12.5. EECS 427 F09 Lecture Reminders
EECS 427 Lecture 17: Memory Reliability and Power Readings: 12.4,12.5 1 Reminders Deadlines HW4 is due Tuesday 11/17 at 11:59 pm (email submission) CAD8 is due Saturday 11/21 at 11:59 pm Quiz 2 is on Wednesday
More informationMacro in a Generic Logic Process with No Boosted Supplies
A 700MHz 2T1C Embedded DRAM Macro in a Generic Logic Process with No Boosted Supplies Ki Chul Chun, Wei Zhang, Pulkit Jain, and Chris H. Kim University of Minnesota, Minneapolis, MN Outline Motivation
More informationPower Reduction Techniques in the Memory System. Typical Memory Hierarchy
Power Reduction Techniques in the Memory System Low Power Design for SoCs ASIC Tutorial Memories.1 Typical Memory Hierarchy On-Chip Components Control edram Datapath RegFile ITLB DTLB Instr Data Cache
More informationTHE latest generation of microprocessors uses a combination
1254 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 30, NO. 11, NOVEMBER 1995 A 14-Port 3.8-ns 116-Word 64-b Read-Renaming Register File Creigton Asato Abstract A 116-word by 64-b register file for a 154 MHz
More informationDesign of 6-T SRAM Cell for enhanced read/write margin
International Journal of Advances in Electrical and Electronics Engineering 317 Available online at www.ijaeee.com & www.sestindia.org ISSN: 2319-1112 Design of 6-T SRAM Cell for enhanced read/write margin
More informationThe Memory Hierarchy 1
The Memory Hierarchy 1 What is a cache? 2 What problem do caches solve? 3 Memory CPU Abstraction: Big array of bytes Memory memory 4 Performance vs 1980 Processor vs Memory Performance Memory is very slow
More informationMemory in Digital Systems
MEMORIES Memory in Digital Systems Three primary components of digital systems Datapath (does the work) Control (manager) Memory (storage) Single bit ( foround ) Clockless latches e.g., SR latch Clocked
More informationA 32 kb 10T sub-threshold sram array with bitinterleaving and differential read scheme in 90 nm CMOS
Purdue University Purdue e-pubs Department of Electrical and Computer Engineering Faculty Publications Department of Electrical and Computer Engineering January 2009 A 32 kb 10T sub-threshold sram array
More informationA Low Power SRAM Cell with High Read Stability
16 ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.9, NO.1 February 2011 A Low Power SRAM Cell with High Read Stability N.M. Sivamangai 1 and K. Gunavathi 2, Non-members ABSTRACT
More informationLecture 14. Advanced Technologies on SRAM. Fundamentals of SRAM State-of-the-Art SRAM Performance FinFET-based SRAM Issues SRAM Alternatives
Source: Intel the area ratio of SRAM over logic increases Lecture 14 Advanced Technologies on SRAM Fundamentals of SRAM State-of-the-Art SRAM Performance FinFET-based SRAM Issues SRAM Alternatives Reading:
More informationSignificance Driven Hybrid 8T-6T SRAM for Energy-Efficient Synaptic Storage in Artificial Neural Networks
Significance Driven Hybrid 8T-6T SRAM for Energy-Efficient Synaptic Storage in Artificial Neural Networks Gopalakrishnan Srinivasan, Parami Wijesinghe, Syed Shakib Sarwar, Akhilesh Jaiswal, and Kaushik
More informationAdvanced Digital Integrated Circuits. Lecture 9: SRAM. Announcements. Homework 1 due on Wednesday Quiz #1 next Monday, March 7
EE241 - Spring 2011 Advanced Digital Integrated Circuits Lecture 9: SRAM Announcements Homework 1 due on Wednesday Quiz #1 next Monday, March 7 2 1 Outline Last lecture Variability This lecture SRAM 3
More informationMEMORIES. Memories. EEC 116, B. Baas 3
MEMORIES Memories VLSI memories can be classified as belonging to one of two major categories: Individual registers, single bit, or foreground memories Clocked: Transparent latches and Flip-flops Unclocked:
More informationMarching Memory マーチングメモリ. UCAS-6 6 > Stanford > Imperial > Verify 中村維男 Based on Patent Application by Tadao Nakamura and Michael J.
UCAS-6 6 > Stanford > Imperial > Verify 2011 Marching Memory マーチングメモリ Tadao Nakamura 中村維男 Based on Patent Application by Tadao Nakamura and Michael J. Flynn 1 Copyright 2010 Tadao Nakamura C-M-C Computer
More informationComparative Analysis of Low Leakage SRAM Cell at 32nm Technology
Comparative Analysis of Low Leakage SRAM Cell at 32nm Technology Jaspreet Kaur Electronics and Communication Engg Section Yadavindra College of Engineering, Talwandi Sabo, India Candy Goyal Assistant Professor,
More informationEE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements
EE241 - Spring 2007 Advanced Digital Integrated Circuits Lecture 22: SRAM Announcements Homework #4 due today Final exam on May 8 in class Project presentations on May 3, 1-5pm 2 1 Class Material Last
More informationHighly Reliable Radiation Hardened Memory Cell for FINFET Technology
Highly Reliable Radiation Hardened Memory Cell for FINFET Technology Shantha Devi.P 1, Vennila.P 2, Ramya.M 3, Krishnakumar.S 4 1PG Scholar,Department of ECE,Theni Kammavar Sangam College of Technology,Tamilnadu,India.
More informationPrototype of SRAM by Sergey Kononov, et al.
Prototype of SRAM by Sergey Kononov, et al. 1. Project Overview The goal of the project is to create a SRAM memory layout that provides maximum utilization of the space on the 1.5 by 1.5 mm chip. Significant
More informationInternational Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering
IP-SRAM ARCHITECTURE AT DEEP SUBMICRON CMOS TECHNOLOGY A LOW POWER DESIGN D. Harihara Santosh 1, Lagudu Ramesh Naidu 2 Assistant professor, Dept. of ECE, MVGR College of Engineering, Andhra Pradesh, India
More informationLecture 11 SRAM Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010
EE4800 CMOS Digital IC Design & Analysis Lecture 11 SRAM Zhuo Feng 11.1 Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitryit Multiple Ports Outline Serial Access Memories 11.2 Memory Arrays
More informationA novel DRAM architecture as a low leakage alternative for SRAM caches in a 3D interconnect context.
A novel DRAM architecture as a low leakage alternative for SRAM caches in a 3D interconnect context. Anselme Vignon, Stefan Cosemans, Wim Dehaene K.U. Leuven ESAT - MICAS Laboratory Kasteelpark Arenberg
More informationDesign and Implementation of Low Leakage Power SRAM System Using Full Stack Asymmetric SRAM
Design and Implementation of Low Leakage Power SRAM System Using Full Stack Asymmetric SRAM Rajlaxmi Belavadi 1, Pramod Kumar.T 1, Obaleppa. R. Dasar 2, Narmada. S 2, Rajani. H. P 3 PG Student, Department
More informationImproved Initial Overdrive Sense-Amplifier. For Low-Voltage DRAMS. Analog CMOS IC Design. Esayas Naizghi April 30, 2004
Analog CMOS IC Design Improved Initial Overdrive Sense-Amplifier For Low-Voltage DRAMS Esayas Naizghi April 30, 2004 Overview 1. Introduction 2. Goals and Objectives 3. Gate Sizing Theory 4. DRAM Introduction
More informationLecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.
Lecture 13: SRAM Slides courtesy of Deming Chen Slides based on the initial set from David Harris CMOS VLSI Design Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports
Low Power SRAM Design with Reduced Read/Write Time
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 3 (2013), pp. 195-200 International Research Publications House http://www.irphouse.com/ijict.htm Low
A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM
IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 09, 2016 ISSN (online): 2321-0613 A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM Yogit
A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS
A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS ABSTRACT We describe L1 cache designed for digital signal processor (DSP) core. The cache is 32KB with variable associativity (4 to 16 ways) and is pseudo-dual-ported.
Introduction to Semiconductor Memory Dr. Lynn Fuller Webpage:
ROCHESTER INSTITUTE OF TECHNOLOGY MICROELECTRONIC ENGINEERING Introduction to Semiconductor Memory Webpage: http://people.rit.edu/lffeee 82 Lomb Memorial Drive Rochester, NY 14623-5604 Tel (585) 475-2035
Structured Datapaths. Preclass 1. Throughput Yield. Preclass 1
ESE534: Computer Organization Day 23: November 21, 2016 Time Multiplexing Tabula March 1, 2010 Announced new architecture We would say w=1, c=8 arch. March, 2015 Tabula closed doors 1 [src: www.tabula.com]
Memory in Digital Systems
MEMORIES Memory in Digital Systems Three primary components of digital systems Datapath (does the work) Control (manager) Memory (storage) Single bit ( foround ) Clockless latches e.g., SR latch Clocked
DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric
DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based
VLSID KOLKATA, INDIA January 4-8, 2016
VLSID 2016 KOLKATA, INDIA January 4-8, 2016 Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in Hybrid Memory Cube Architectures Ishan Thakkar, Sudeep Pasricha Department of Electrical
EECS Dept., University of California at Berkeley. Berkeley Wireless Research Center Tel: (510)
A V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications Hui Zhang, Vandana Prabhu, Varghese George, Marlene Wan, Martin Benes, Arthur Abnous, and Jan M. Rabaey EECS Dept., University
Analysis of 8T SRAM with Read and Write Assist Schemes (UDVS) In 45nm CMOS Technology
Analysis of 8T SRAM with Read and Write Assist Schemes (UDVS) In 45nm CMOS Technology Srikanth Lade 1, Pradeep Kumar Urity 2 Abstract : UDVS techniques are presented in this paper to minimize the power
Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques
Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques Yu Cai, Saugata Ghose, Yixin Luo, Ken Mai, Onur Mutlu, Erich F. Haratsch February 6, 2017
A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit
International Journal of Electrical and Computer Engineering (IJECE) Vol. 3, No. 4, August 2013, pp. 509~515 ISSN: 2088-8708 509 A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit Sidhant Kukrety*,
International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February ISSN
International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February-2014 938 LOW POWER SRAM ARCHITECTURE AT DEEP SUBMICRON CMOS TECHNOLOGY T.SANKARARAO STUDENT OF GITAS, S.SEKHAR DILEEP
ESE534: Computer Organization. Tabula. Previously. Today. How often is reuse of the same operation applicable?
ESE534: Computer Organization Day 22: April 9, 2012 Time Multiplexing Tabula March 1, 2010 Announced new architecture We would say w=1, c=8 arch. 1 [src: www.tabula.com] 2 Previously Today Saw how to pipeline
Centip3De: A 64-Core, 3D Stacked, Near-Threshold System
Centip3De: A 64-Core, 3D Stacked, Near-Threshold System Ronald G. Dreslinski, David Fick, Bharan Giridhar, Gyouho Kim, Sangwon Seo, Matthew Fojtik, Sudhir Satpathy, Yoonmyung Lee, Daeyeon Kim, Nurrachman
Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns
March 12, 2018 Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns Wen Wen Lei Zhao, Youtao Zhang, Jun Yang Executive Summary Problems: performance and reliability of write operations
FABRICATION TECHNOLOGIES
FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general
Design and Analysis of 32 bit SRAM architecture in 90nm CMOS Technology
Design and Analysis of 32 bit SRAM architecture in 90nm CMOS Technology Jesal P. Gajjar 1, Aesha S. Zala 2, Sandeep K. Aggarwal 3 1 Research intern, GTU-CDAC, Pune, India 2 Research intern, GTU-CDAC, Pune,
A Low Power SRAM Base on Novel Word-Line Decoding
Vol:, No:3, 008 A Low Power SRAM Base on Novel Word-Line Decoding Arash Azizi Mazreah, Mohammad T. Manzuri Shalmani, Hamid Barati, Ali Barati, and Ali Sarchami International Science Index, Computer and
FeRAM Circuit Technology for System on a Chip
FeRAM Circuit Technology for System on a Chip K. Asari 1,2,4, Y. Mitsuyama 2, T. Onoye 2, I. Shirakawa 2, H. Hirano 1, T. Honda 1, T. Otsuki 1, T. Baba 3, T. Meng 4 1 Matsushita Electronics Corp., Osaka,
Self-Time Tracking Circuit to Improve Access Time of SRAM
Self-Time Tracking Circuit to Improve Access Time of SRAM Pullareddy. A Research Scholar, Department of Electronics and Communication Engineering, Sri Venkateswara University College of Engineering, Sri
Survey on Stability of Low Power SRAM Bit Cells
International Journal of Electronics Engineering Research. ISSN 0975-6450 Volume 9, Number 3 (2017) pp. 441-447 Research India Publications http://www.ripublication.com Survey on Stability of Low Power
Memory Design I. Semiconductor Memory Classification. Read-Write Memories (RWM) Memory Scaling Trend. Memory Scaling Trend
Array-Structured Memory Architecture Memory Design I Professor Chris H. Kim University of Minnesota Dept. of EE chriskim@ece.umn.edu 2 Semiconductor Memory Classification Read-Write Memory Non-Volatile Read-Write
DIRECT Rambus DRAM has a high-speed interface of
1600 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 11, NOVEMBER 1999 A 1.6-GByte/s DRAM with Flexible Mapping Redundancy Technique and Additional Refresh Scheme Satoru Takase and Natsuki Kushiyama
Design and Implementation of High Performance Application Specific Memory
Design and Implementation of High Performance Application Specific Memory M.S. Thesis Sungdae Choi Dec. 20th, 2002 Outline Introduction Memory for Mobile 3D Graphics
Z-RAM Ultra-Dense Memory for 90nm and Below. Hot Chips David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc.
Z-RAM Ultra-Dense Memory for 90nm and Below Hot Chips 2006 David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc. Outline Device Overview Operation Architecture Features Challenges Z-RAM Performance
A Comparative Study of Power Efficient SRAM Designs
A Comparative Study of Power Efficient SRAM Designs Jeyran Hezavei, N. Vijaykrishnan, M. J. Irwin Pond Laboratory, Department of Computer Science & Engineering, Pennsylvania State University {hezavei, vijay,
Switched by Input: Power Efficient Structure for RRAM-based Convolutional Neural Network
Switched by Input: Power Efficient Structure for RRAM-based Convolutional Neural Network Lixue Xia, Tianqi Tang, Wenqin Huangfu, Ming Cheng, Xiling Yin, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E., Tsinghua
Digital Integrated Circuits (83-313) Lecture 7: SRAM. Semester B, Lecturer: Dr. Adam Teman Itamar Levi, Robert Giterman.
Digital Integrated Circuits (83-313) Lecture 7: SRAM Semester B, 2016-17 Lecturer: Dr. Adam Teman TAs: Itamar Levi, Robert Giterman 16 May 2017 Disclaimer: This course was prepared, in its entirety, by
Spiral 2-9. Tri-State Gates Memories DMA
2-9.1 Spiral 2-9 Tri-State Gates Memories DMA 2-9.2 Learning Outcomes I understand how a tri-state works and the rules for using them to share a bus I understand how SRAM and DRAM cells perform reads and
Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1
ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1 A 125µW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications Tsu-Ming Liu 1, Ting-An Lin 2, Sheng-Zen Wang 2, Wen-Ping Lee
Optimizing Standby
Optimizing Power @ Standby Memory Benton H. Calhoun Jan M. Rabaey Chapter Outline Memory in Standby Voltage Scaling Body Biasing Periphery Memory Dominates Processor Area SRAM is a major source of static
Dynamic Write Limited Minimum Operating Voltage for Nanoscale SRAMs
Dynamic Write Limited Minimum Operating Voltage for Nanoscale SRAMs Satyanand Nalam, Vikas Chandra, Robert C. Aitken, Benton H. Calhoun Dept. of ECE, University of Virginia, Charlottesville; ARM R&D, San
X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories
X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories Amogh Agrawal*, Akhilesh Jaiswal*, Chankyu Lee and Kaushik Roy, Fellow, IEEE School of Electrical and Computer Engineering,
Simulation and Analysis of SRAM Cell Structures at 90nm Technology
Vol.1, Issue.2, pp-327-331 ISSN: 2249-6645 Simulation and Analysis of SRAM Cell Structures at 90nm Technology Sapna Singh 1, Neha Arora 2, Prof. B.P. Singh 3 (Faculty of Engineering and Technology, Mody
Massively Parallel Computing on Silicon: SIMD Implementations. V.M. Brea Univ. of Santiago de Compostela Spain
Massively Parallel Computing on Silicon: SIMD Implementations V.M. Brea Univ. of Santiago de Compostela Spain GOAL Give an overview on the state-of-the-art of Digital on-chip CMOS SIMD Solutions,
Implementation of DRAM Cell Using Transmission Gate
Implementation of DRAM Cell Using Transmission Gate Pranita J. Giri 1, Sunanda K. Kapde 2 PG Student, Department of E&TC, Deogiri Institute of Engineering & Management Studies, Aurangabad (MS), India 1
Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.15, NO.1, FEBRUARY, 2015 http://dx.doi.org/10.5573/jsts.2015.15.1.077 Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network
Calibrating Achievable Design GSRC Annual Review June 9, 2002
Calibrating Achievable Design GSRC Annual Review June 9, 2002 Wayne Dai, Andrew Kahng, Tsu-Jae King, Wojciech Maly, Igor Markov, Herman Schmit, Dennis Sylvester DUSD(Labs) Calibrating Achievable Design
Based on slides/material by. Topic 7-4. Memory and Array Circuits. Outline. Semiconductor Memory Classification
Based on slides/material by Topic 7 Memory and Array Circuits K. Masselos http://cas.ee.ic.ac.uk/~kostas J. Rabaey http://bwrc.eecs.berkeley.edu/classes/icbook/instructors.html Digital Integrated Circuits:
Research Scholar, Chandigarh Engineering College, Landran (Mohali), 2
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Optimize Parity Encoding for Power Reduction in Content Addressable Memory Nisha Sharma, Manmeet Kaur 1 Research Scholar, Chandigarh
Don't Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration
Don't Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration S. Yazdanshenas, K. Tatsumura *, and V. Betz University of Toronto, Canada * Toshiba Corporation, Japan : An