A 19.4 nJ/decision 364K Decisions/s In-Memory Random Forest Classifier in 6T SRAM Array. Mingu Kang, Sujan Gonugondla, Naresh Shanbhag
1 A 19.4 nJ/decision 364K Decisions/s In-Memory Random Forest Classifier in 6T SRAM Array
Mingu Kang, Sujan Gonugondla, Naresh Shanbhag
University of Illinois at Urbana-Champaign
2 Machine Learning under Resource Constraints
- Embedded statistical inference: IoT, sensor-rich platforms
- Decision making under resource constraints
  - Limited form factor, battery-powered, real-time
3 The Random Forest (RF) Algorithm
- Random forest [1]
  - Ensemble of many (a few hundred) decision trees
  - High accuracy
  - Simple computation (only comparisons)
  - Suitable for multi-class classification
  - Inherent error resiliency (from its ensemble nature)
[Figure: RF algorithm]
[1] L. Breiman, Machine Learning, 2001
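The two properties listed above, comparison-only computation and an ensemble vote, can be illustrated with a minimal sketch. The tuple encoding of nodes, the function names, and the toy trees are assumptions made for illustration; they are not the chip's memory layout or the authors' code.

```python
from collections import Counter

# A node is either ("leaf", class_label) or
# ("split", pixel_index, threshold, left_subtree, right_subtree).
# This encoding is illustrative only (an assumption, not the chip's format).

def classify_tree(node, pixels):
    """Walk one decision tree using only threshold comparisons."""
    while node[0] == "split":
        _, index, threshold, left, right = node
        node = left if pixels[index] < threshold else right
    return node[1]

def classify_forest(trees, pixels):
    """Return the class label that receives the most votes across all trees."""
    votes = Counter(classify_tree(tree, pixels) for tree in trees)
    return votes.most_common(1)[0][0]

# Three toy single-split trees over a 4-pixel input (hypothetical values).
toy_trees = [
    ("split", 0, 128, ("leaf", "A"), ("leaf", "B")),
    ("split", 2, 64, ("leaf", "A"), ("leaf", "B")),
    ("split", 3, 200, ("leaf", "B"), ("leaf", "A")),
]
label = classify_forest(toy_trees, [10, 0, 90, 250])
```

Here two of the three trees vote "A", so the single dissenting tree does not change the forest's decision, which is the ensemble effect the slide refers to.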
4 Implementation Challenges
- Non-uniform tree structure
  - Variations in depth, # of nodes, symmetry
- Frequent memory access
  - Memory dominates the system efficiency
- Irregular data access pattern
[Figure: RF algorithm]
- Prior art: software and FPGA implementations. No ASIC.
  - Fails to take advantage of inherent error resiliency
5 Proposed Solution: Deep In-memory Architecture (DIMA) with DSS
- DIMA [2-4]: embedded analog processing
  - Storage density and normal read & write functions preserved
  - FR: functional read
  - BLP: bitline processor (subtraction, comparison)
  - CBLP: cross BLP (aggregation)
  - RDL: ADC & residual digital logic
- Deterministic sub-sampling (DSS)
  - Regularizes the memory access pattern
[2] M. Kang et al., ICASSP 2014
[3] M. Kang et al., arXiv 2016
[4] M. Kang et al., US Patent No. 9,697,877
6 RF Chip Architecture
- SRAM bitcell array
  - Stores up to 42 groups
  - Each group has 4 sub-groups (1 sub-group = 1 tree)
- Input buffer
  - Stores 4:1 sub-sampled pixels in 4 sections for DSS
- Crossbar (CB)
  - 31 CB units per sub-group enabled in parallel
- Comparator (COMP)
  - 128 analog comparators
- IREG: pixel index register, RSREG: RSS register
7 Functional READ (FR)
[Figure: conventional read vs. functional read (FR)]
- B: bit precision, L: column mux ratio
- Fetches the stored data and computes its linear combination in the analog domain
- (L × B) times more data accessed per read & precharge
- Savings in energy & delay at the cost of reduced SNR
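A behavioural sketch can make the idea of reading a whole word as one analog quantity more concrete. It assumes binary weighting of the stored bits, a per-LSB voltage step `delta_v`, and additive Gaussian noise; all three are modelling assumptions for illustration and are not taken from the slides or from measurements.

```python
import random

def functional_read(bits_msb_first, delta_v=0.01, noise_sigma=0.0, rng=None):
    """Idealized bitline voltage drop for one B-bit word read in a single access.

    Each stored bit contributes delta_v * 2**i (i = 0 for the LSB), so the drop
    is proportional to the word's integer value. The binary weighting, delta_v,
    and the Gaussian noise term are assumptions, not measured behaviour.
    """
    rng = rng or random.Random(0)
    b = len(bits_msb_first)
    ideal = sum(
        bit * delta_v * 2 ** (b - 1 - k) for k, bit in enumerate(bits_msb_first)
    )
    return ideal + rng.gauss(0.0, noise_sigma)

def conventional_read(bits_msb_first):
    """Digital reference: B separate bit reads combined into an integer."""
    value = 0
    for bit in bits_msb_first:
        value = (value << 1) | bit
    return value

word = [1, 0, 1, 1]                          # 4-bit word with value 11
drop = functional_read(word, delta_v=0.01)   # noiseless: 11 * 0.01 V
```

Raising `noise_sigma` relative to `delta_v` in this model mimics the reduced SNR that the slide names as the cost of the energy and delay savings.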
8 In-memory Bitline Processing
- Subtraction: the two operands are stored in the same column
- Comparison
- Variation due to possible combinations of (T_MSB, X_MSB) at the same T_MSB − X_MSB value
[Figure: measured subtraction in 65 nm CMOS; bitline voltage V_BL (V) of one SRAM column, with T_MSB and X_MSB each ranging from 0 to 15]
9 Deterministic Sub-sampling (DSS)
- Random sub-sampling (RSS)
  - Requires a complex crossbar (e.g., 256:1 for a 256-pixel image)
- Proposed RF algorithm: deterministic sub-sampling (DSS) before RSS
  - Sub-samples the input to generate four sub-images
  - Reduces crossbar complexity (e.g., 256:1 → 64:1)
  - More than 3× and 4× savings in energy and layout area, respectively
  - 4:1 chosen due to the accuracy vs. sub-sampling ratio trade-off
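The crossbar reduction from 256:1 to 64:1 follows directly from splitting the image into four fixed sub-images before any random pixel selection. The sketch below uses a 2x2 polyphase pattern for the 4:1 split; that pattern, the function names, and the 16x16 test image are assumptions, since the slides give only the ratio.

```python
def deterministic_subsample(image, rows, cols):
    """Split a row-major image into four sub-images by 2x2 polyphase (4:1) sampling.

    Sub-image k takes the pixel at position (k // 2, k % 2) of every 2x2 block.
    The polyphase pattern is an assumption; the slides only state a 4:1 ratio.
    """
    subs = [[], [], [], []]
    for r in range(rows):
        for c in range(cols):
            subs[(r % 2) * 2 + (c % 2)].append(image[r * cols + c])
    return subs

def crossbar_inputs(num_pixels, dss_ratio=1):
    """Number of candidate inputs each crossbar mux must select from."""
    return num_pixels // dss_ratio

image = list(range(256))                  # 16x16 image, pixel value = pixel index
sub_images = deterministic_subsample(image, 16, 16)
```

With this split, a tree assigned to one sub-image only ever selects among its 64 pixels, so `crossbar_inputs(256, 4)` gives the 64:1 figure quoted on the slide.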
10 Application & Measured Results
- KUL Belgium traffic sign dataset
- Training (off-chip)
  - 200 images per class employed for training
  - Bit precision: 8, tree depth: 6, 64 trees
- Testing
  - 200 testing images chosen randomly from the test data set
[Table: platform (65 nm CMOS); rows: Conv. Arch, Proposed Arch (entries include per-bank figures); columns: # of trees, max tree depth, classification rate (decisions/ms), energy per decision (nJ), energy-delay product (fJ·s), accuracy (%); numeric entries and the EDP reduction factor [lost]]
11 Measured Energy vs. Accuracy Trade-off
[Figure: accuracy vs. # of trees vs. Δ]
[Figure: accuracy vs. energy w.r.t. BL swing (Δ)*]
- Error resiliency allows a lower BL swing and therefore higher energy efficiency
*Δ for conv. is 10 Δ per LSB
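The trade-off between per-tree reliability and the number of trees can be sketched with a simple voting model. It assumes a two-class problem, an odd number of trees, and statistically independent per-tree errors; none of these is claimed by the slides, and the 0.7 per-tree accuracy is an arbitrary illustrative value.

```python
from math import comb

def majority_correct(num_trees, p_tree_correct):
    """Probability that more than half of num_trees independent voters are right.

    Assumes a two-class problem, an odd num_trees, and independent per-tree
    errors; real forests violate all three to some degree.
    """
    need = num_trees // 2 + 1
    return sum(
        comb(num_trees, k)
        * p_tree_correct ** k
        * (1 - p_tree_correct) ** (num_trees - k)
        for k in range(need, num_trees + 1)
    )

# Print how the ensemble accuracy changes as trees are added.
for m in (1, 5, 21, 63):
    print(m, round(majority_correct(m, 0.7), 4))
```

Under this model, a lower bitline swing would show up as a smaller `p_tree_correct`, and adding trees recovers ensemble accuracy as long as each tree remains better than chance.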
12 Chip Summary & Comparison
[Figure: chip micrograph]
Chip summary
- Technology: 65 nm CMOS
- Die size: [lost] mm
- SRAM capacity: 16 KB ([lost] bit-cells)
- Bit-cell size: [lost] µm²
- CTRL CLK freq.: 1 GHz
- Supply voltage (V): CORE 1.0, CTRL 0.75
Comparison with state-of-the-art (prior art [5], [6] vs. ours with M=64)
- Process: [5] 130 nm CMOS; [6] 14 nm tri-gate; ours 65 nm CMOS
- Algorithm: [5] support vector machine; [6] K-nearest neighbor; ours random forest
- Dataset: [5] traffic sign video; [6] not reported; ours KUL traffic signs
- Input size (8b): [lost]
- Throughput (decisions/s): [5] 33 [40K]*; [6] 21.5M [498.8K]*; ours 364.4K
- Energy (nJ/decision): [5] 1.5M [1250]*; [6] 3.4 [145.3]*; ours 19.4 (w/ CTRL)
- EDP (fJ·s/decision): [5] 45G [31250]*; [6] 0.2 [292.3]*; ours [lost]
- Accuracy: [5] 90%; [6] not reported; ours [lost] %
[5] J. Park, JSSC 2012; [6] H. Kaul, ISSCC 2016
*scaled to 65 nm CMOS
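The EDP entries in the comparison can be cross-checked from the energy and throughput rows. The sketch below assumes that delay is the reciprocal of throughput; this convention is inferred from the fact that it reproduces the values listed for [5], and the slides do not state it. The dictionary keys are labels chosen here for illustration.

```python
def edp_fj_s(energy_nj, throughput_per_s):
    """Energy-delay product in fJ*s, taking delay as 1 / throughput (assumed)."""
    energy_j = energy_nj * 1e-9
    delay_s = 1.0 / throughput_per_s
    return energy_j * delay_s / 1e-15

# (energy in nJ/decision, throughput in decisions/s) from the comparison above.
rows = {
    "[5] as reported": (1.5e6, 33),
    "[5] scaled": (1250, 40e3),
    "[6] as reported": (3.4, 21.5e6),
    "[6] scaled": (145.3, 498.8e3),
    "this work": (19.4, 364.4e3),
}
edp = {name: edp_fj_s(e, t) for name, (e, t) in rows.items()}
```

Applied to this work's 19.4 nJ/decision and 364.4K decisions/s, the same formula gives roughly 53 fJ·s per decision; this is a derived figure under the stated assumption, not a value read from the slides.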
13 Conclusions
- First ASIC implementation of the RF algorithm
  - Low-SNR processing via DIMA and DSS
- Energy & speed benefits
  - 2.2× and 3.1× smaller delay and energy, respectively
  - 6.8× smaller EDP compared to a digital ASIC
- Higher potential in large-scale applications
  - # of trees up to a few hundred in real-life applications
  - Higher error resiliency
  - More room to scale for energy efficiency
- Future work
  - On-chip training to compensate for process variations
  - Different algorithms (e.g., boosted ensemble classifier)
14 Acknowledgment
This work was supported by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by SRC and DARPA.