DRISA: A DRAM-based Reconfigurable In-Situ Accelerator
Transcription
1 DRISA: A DRAM-based Reconfigurable In-Situ Accelerator
Shuangchen Li, Dimin Niu, Krishna T. Malladi, Hongzhong Zheng, Bob Brennan, Yuan Xie
University of California, Santa Barbara; Memory Solutions Lab, Samsung Semiconductor Inc.
Scalable and Energy-efficient Architecture Lab (SEAL), SEAL@UCSB
2 Motivation and Observation
Goal: merging the computing resources and memory fabrics.
[Figure: scatter plot of normalized on-chip memory capacity per area (y-axis, 1E+00 to 1E+03) against normalized peak performance per area (x-axis, 1E+00 to 1E+04). Plotted groups: compute-capable memory (PIM): BufferedComp, NeuroCube; memory-rich processors: DaDianNao and ShiDianNao (ASICs), TITAN X (GPU); and this work.]
- Memory-rich processor: low memory capacity
- Compute-capable memory: low performance
- To have BOTH: (1) use DRAM technology, (2) remove system-memory constraints
- Approach: building an accelerator with DRAM technology
3 Key Ideas and Approaches
To have BOTH: (1) use DRAM technology, (2) remove system-memory constraints. Approach: building an accelerator with DRAM technology.
- DRAM technology is logic-incompatible, so only simple Boolean logic operations are used (NOR on the cells and bitline)
- General purpose: reconfigurable, built from NOR and SHIFT
- High perf.: improve parallelism (multi-subarray active, multi-bank active), unblock data movement, optimize activation
4 Architecture Overview
[Figure: (a) chip, organized as groups of banks; (b) bank, containing subarrays and mats plus a block labeled bctrl; (c) subarray and mat, with a block labeled sctrl, DRAM cells that support Boolean logic operations, and a shifter.]
DRAM modifications:
- Change decoders to controllers
- Change to support logic operations
- Add shifters
Others: group/bank buffers help internal data transfer, bank/subarray reorganization, split cell array regions
5 Make BL (bitline) Be Able To Compute (1/2)
Three solutions:
- 3T1C: natural NOR on BL
- 1T1C: adds gates or adopts AMBIT's methods
- 1T1C-adder: adds full adders to BL
[Figure: three bitline circuits, each operating on rows Rs, Rt, Rr. 3T1C-NOR: lines rwl, wwl, rbl, wbl. 1T1C-NOR/MIX: labels "and", "or", and "pre-load", with a logic gate and a latch. 1T1C-ADDER: latches and an n-bit adder.]
6 Make BL (bitline) Be Able To Compute (2/2)
Example: selector R = (S == 1) ? X : Y
Boolean form: R = S·X + !S·Y
NOR-only logic: !R = NOR( NOR(!S, !X), NOR(S, !Y) ), then R = NOR(0, !R)
- Step 1: !X = NOR(0, X)
- Step 2: !Y = NOR(0, Y)
- Step 3: !S = NOR(0, S)
- Step 4: tmp1 = NOR(!S, !X), stored as row !(!X+!S)
- Step 5: tmp2 = NOR(S, !Y), stored as row !(!Y+S)
- Step 6: !R = NOR(tmp1, tmp2)
- Step 7: R = NOR(0, !R)
[Figure: cell rows X, Y, S, !X, !Y, !S, !(!X+!S), !(!Y+S), !R, R filled in step by step on the bitline.]
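The seven NOR steps of the selector example can be checked with a small simulation. This is a minimal sketch, not the authors' implementation: each Python integer stands in for one DRAM row, bit i stands in for bitline i, and the row width `WIDTH` and the helper names `nor` and `select` are assumptions made for illustration.

```python
# Sketch: bitwise selector R = (S == 1) ? X : Y built only from NOR.
# Each int models one DRAM row; bit i models bitline i (an assumption).
WIDTH = 8                       # hypothetical row width, illustration only
MASK = (1 << WIDTH) - 1


def nor(a: int, b: int) -> int:
    """Bitwise NOR of two rows, truncated to WIDTH bitlines."""
    return ~(a | b) & MASK


def select(s: int, x: int, y: int) -> int:
    """Seven NOR steps, following the row labels on the slide."""
    not_x = nor(0, x)           # step 1: !X
    not_y = nor(0, y)           # step 2: !Y
    not_s = nor(0, s)           # step 3: !S
    tmp1 = nor(not_s, not_x)    # step 4: equals S AND X
    tmp2 = nor(s, not_y)        # step 5: equals !S AND Y
    not_r = nor(tmp1, tmp2)     # step 6: !R
    return nor(0, not_r)        # step 7: R


if __name__ == "__main__":
    s, x, y = 0b11110000, 0b10101010, 0b01010101
    print(format(select(s, x, y), "08b"))  # prints 10100101
```

Every bitline computes its own selection in parallel: where S holds 1 the result takes the bit from X, elsewhere from Y.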
7 Shifters (1/2)
Why include shifters: e.g., carry-in propagation.
[Figure: operand bits X0, Y0 and X1, Y1 held on neighboring bitlines. Bit 0 takes Cin0 and produces S0 and Cout0; Cout0 must then reach the neighboring bitline as Cin1.]
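The need for a shifter can be illustrated with addition built from NOR plus a one-position shift. This is a hedged sketch under the same modeling assumption as before (one integer per row, one bit per bitline); the helper names and `WIDTH` are hypothetical, and the loop is a generic shift-and-add scheme rather than the schedule used in the paper.

```python
# Sketch: addition from NOR-derived gates plus a 1-bit shift.
# The shift is what moves the carry-out of bit i to the carry-in of bit i+1.
WIDTH = 8                       # hypothetical row width, illustration only
MASK = (1 << WIDTH) - 1


def nor(a: int, b: int) -> int:
    return ~(a | b) & MASK


def not_(a: int) -> int:
    return nor(0, a)


def and_(a: int, b: int) -> int:
    return nor(not_(a), not_(b))


def or_(a: int, b: int) -> int:
    return not_(nor(a, b))


def xor_(a: int, b: int) -> int:
    return and_(or_(a, b), not_(and_(a, b)))


def add(x: int, y: int) -> int:
    """Add two WIDTH-bit rows; the result wraps modulo 2**WIDTH."""
    total, carry = x & MASK, y & MASK
    for _ in range(WIDTH):
        partial = xor_(total, carry)
        # SHIFT: without it, a carry could never leave its own bitline.
        carry = (and_(total, carry) << 1) & MASK
        total = partial
    return total


if __name__ == "__main__":
    print(add(0b0011, 0b0001))  # prints 4
```

Without the shift, every bitline could only combine values stored in its own column, so multi-bit arithmetic would not be possible.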
8 Shifters (2/2)
Multiple hierarchies:
- Intra-lane: bit shift inside an 8-bit lane
- Inter-lane: array element shift
- Forwarding: access any element in the array
[Figure: a row divided into virtual lanes (INT8).]
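The difference between the intra-lane and inter-lane shifts can be sketched on a row modeled as a list of 8-bit lanes. The function names, the shift directions, and the zero fill value are assumptions made for illustration; forwarding is not modeled here.

```python
# Sketch: two shift granularities on a row of virtual INT8 lanes.
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def intra_lane_shift(row, k):
    """Shift every lane left by k bits; bits never cross a lane boundary."""
    return [(lane << k) & LANE_MASK for lane in row]


def inter_lane_shift(row, n, fill=0):
    """Move each element n lanes toward higher indices, filling vacated lanes."""
    if n <= 0:
        return list(row)
    n = min(n, len(row))
    return [fill] * n + list(row[: len(row) - n])


if __name__ == "__main__":
    row = [0x81, 0x02, 0x7F, 0x10]
    print(intra_lane_shift(row, 1))  # prints [2, 4, 254, 32]
    print(inter_lane_shift(row, 1))  # prints [0, 129, 2, 127]
```

The intra-lane shift serves bit-level arithmetic within one element, while the inter-lane shift moves whole array elements between lanes.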
9 Putting Compute-capable BLs and Shifters Together
[Figure: cycle counts for CSA and FA against operand bit length, and cycle counts for multiplication against operand-1 bit length with operand-2 bit length = 2 bits.]
Observations:
- CSA is preferred: reduction works fine
- Affordable MUL: need to have one operand within 2 bits
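Why a reduction suits this style of adder can be sketched as follows, assuming the adder the slide prefers is a carry-save adder (CSA). A CSA compresses three operands into a sum word and a carry word using only bitwise logic and a one-bit shift, so no carry has to ripple until the very last addition. The function names are hypothetical and the operands are assumed to be non-negative integers.

```python
# Sketch: carry-save (3:2) reduction of many operands.
def csa(a: int, b: int, c: int):
    """3:2 compressor: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def reduce_sum(values):
    """Sum non-negative ints, deferring carry propagation to one final add."""
    vals = list(values)
    while len(vals) > 2:
        a, b, c = vals.pop(), vals.pop(), vals.pop()
        vals.extend(csa(a, b, c))
    return sum(vals)  # the single carry-propagating addition


if __name__ == "__main__":
    print(reduce_sum([3, 5, 7, 9, 11]))  # prints 35
```

Each `csa` call is purely bitwise, which is the kind of operation a bitline can perform on all columns at once; only the final two-operand addition needs full carry propagation.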
10 Optimizations for high performance
- DRAM technology is logic-incompatible: simple Boolean logic, run serially
- Adopting commodity DRAM: 13 cycles for an 8-bit CSA, with tRC = 46 ns
- High perf.: improve parallelism, unblock data movement, optimize activation
[Figure: the plot of normalized on-chip memory capacity per area against normalized peak performance per area, marking the un-optimized design point and the target relative to compute-capable memory (PIM) and memory-rich processors.]
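A rough sense of the un-optimized latency follows from the two numbers on the slide. The sketch below assumes, and this is only an assumption, that each of the 13 cycles costs one full row cycle time tRC of 46 ns; the function name is hypothetical.

```python
# Sketch: back-of-the-envelope latency for one 8-bit operation on
# commodity DRAM, ASSUMING one cycle costs one tRC (not stated on the slide).
T_RC_NS = 46.0      # row cycle time from the slide
CYCLES_8BIT = 13    # cycle count from the slide


def op_latency_ns(cycles: int, t_rc_ns: float) -> float:
    """Serial latency of an operation that needs `cycles` row cycles."""
    return cycles * t_rc_ns


if __name__ == "__main__":
    print(op_latency_ns(CYCLES_8BIT, T_RC_NS))  # prints 598.0
```

Under that assumption a single 8-bit operation takes roughly 0.6 microseconds, which is why the slide turns to parallelism, data movement, and activation optimizations.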
11 Experiment Setup
DRISA circuit simulator:
- Heavily modified CACTI
- Digital circuits (controller, logic gates): from Design Compiler synthesis, scaled to a DRAM process with 20% performance overhead and 80% area overhead (ISCAS 99)
DRISA performance simulator:
- A behavior-level simulator
- Includes a mapping optimization framework
[Figure: simulation flow. Inputs: NN topology, mapping scheme, design options (# mats/subarrays/banks), device parameters. The in-house performance simulator reports latency/cycles; the circuit simulator (Design Compiler + CACTI-3DD) reports power/ops, speed, power, area, and leakage.]
12 Binary weight, 8-bit activation CNN inference case study
[Figure: Perf/Area (fr./s/mm2, log scale from 1E-02 to 1E+02) on AlexNet, VGG-16, VGG-19, ResNet-152, and GM, for 3T1C, 1T1C-nor, 1T1C-mixed, 1T1C-adder, and GPU-INT.]
- 3T1C is not good: the lowest area overhead, but large memory cells
- 1T1C-adder is not the best: the best peak performance, but low effective performance
- 1T1C-mixed is the best solution
13 More in the paper
- Microarchitectures of BL-logic operations and shifter
- Interface design
- Optimizations for high performance
- Impact of variation
- CNN mapping and optimizations
- Detailed experiment setup and more results
14 Summary
In-situ computing: building an accelerator with DRAM technology
- DRAM for large memory capacity
- BL-computing logic design + shifter for general-purpose instructions
- Optimized for high computing performance
Experiments on binary CNN acceleration:
- Performance per area: 8.8x that of ASIC, 7.7x that of GPU
- Energy efficiency per area: 1.2x that of ASIC, 15x that of GPU
Questions?
More informationIntegrated Circuits & Systems
Federal University of Santa Catarina Center for Technology Computer Science & Electronics Engineering Integrated Circuits & Systems INE 5442 Lecture 23-1 guntzel@inf.ufsc.br Semiconductor Memory Classification
More informationTopic #6. Processor Design
Topic #6 Processor Design Major Goals! To present the single-cycle implementation and to develop the student's understanding of combinational and clocked sequential circuits and the relationship between
More informationDesign Methodologies and Tools. Full-Custom Design
Design Methodologies and Tools Design styles Full-custom design Standard-cell design Programmable logic Gate arrays and field-programmable gate arrays (FPGAs) Sea of gates System-on-a-chip (embedded cores)
More information! Memory. " RAM Memory. " Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips
ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 5, 8 Memory: Periphery circuits Today! Memory " RAM Memory " Architecture " Memory core " SRAM " DRAM " Periphery " Serial Access Memories
More informationAC-DIMM: Associative Computing with STT-MRAM
AC-DIMM: Associative Computing with STT-MRAM Qing Guo, Xiaochen Guo, Ravi Patel Engin Ipek, Eby G. Friedman University of Rochester Published In: ISCA-2013 Motivation Prevalent Trends in Modern Computing:
More informationECE410 Design Project Spring 2013 Design and Characterization of a CMOS 8-bit pipelined Microprocessor Data Path
ECE410 Design Project Spring 2013 Design and Characterization of a CMOS 8-bit pipelined Microprocessor Data Path Project Summary This project involves the schematic and layout design of an 8-bit microprocessor
More information+1 (479)
Memory Courtesy of Dr. Daehyun Lim@WSU, Dr. Harris@HMC, Dr. Shmuel Wimer@BIU and Dr. Choi@PSU http://csce.uark.edu +1 (479) 575-6043 yrpeng@uark.edu Memory Arrays Memory Arrays Random Access Memory Serial
More informationLecture: DRAM Main Memory. Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3)
Lecture: DRAM Main Memory Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3) 1 TLB and Cache 2 Virtually Indexed Caches 24-bit virtual address, 4KB page size 12 bits offset and 12 bits
More informationScaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research
Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)
More informationArchitectural Support for Large-Scale Visual Search. Carlo C. del Mundo Vincent Lee Armin Alaghi Luis Ceze Mark Oskin
Architectural Support for Large-Scale Visual Search Carlo C. del Mundo Vincent Lee Armin Alaghi Luis Ceze Mark Oskin Motivation: Visual Data & Their Applications Rebooting the IT Revolution, SIA, September
More informationChapter 4. The Processor Designing the datapath
Chapter 4 The Processor Designing the datapath Introduction CPU performance determined by Instruction Count Clock Cycles per Instruction (CPI) and Cycle time Determined by Instruction Set Architecure (ISA)
More informationLecture 15: DRAM Main Memory Systems. Today: DRAM basics and innovations (Section 2.3)
Lecture 15: DRAM Main Memory Systems Today: DRAM basics and innovations (Section 2.3) 1 Memory Architecture Processor Memory Controller Address/Cmd Bank Row Buffer DIMM Data DIMM: a PCB with DRAM chips
More informationDIRECT Rambus DRAM has a high-speed interface of
1600 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 11, NOVEMBER 1999 A 1.6-GByte/s DRAM with Flexible Mapping Redundancy Technique and Additional Refresh Scheme Satoru Takase and Natsuki Kushiyama
More informationNovel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" ASP-DAC 2014
Novel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" ASP-DAC 2014 Shinobu Fujita, Kumiko Nomura, Hiroki Noguchi, Susumu Takeda, Keiko Abe Toshiba Corporation, R&D Center Advanced
More informationDRAM Tutorial Lecture. Vivek Seshadri
DRAM Tutorial 18-447 Lecture Vivek Seshadri DRAM Module and Chip 2 Goals Cost Latency Bandwidth Parallelism Power Energy 3 DRAM Chip Bank I/O 4 Sense Amplifier top enable Inverter bottom 5 Sense Amplifier
More informationDesign Space Exploration of FPGA-Based Deep Convolutional Neural Networks
Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks Abstract Deep Convolutional Neural Networks (DCNN) have proven to be very effective in many pattern recognition applications, such
More informationEmbedded Memories. Advanced Digital IC Design. What is this about? Presentation Overview. Why is this important? Jingou Lai Sina Borhani
1 Advanced Digital IC Design What is this about? Embedded Memories Jingou Lai Sina Borhani Master students of SoC To introduce the motivation, background and the architecture of the embedded memories.
More informationStructure of Computer Systems
288 between this new matrix and the initial collision matrix M A, because the original forbidden latencies for functional unit A still have to be considered in later initiations. Figure 5.37. State diagram
More informationPipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning
PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning Presented by Nils Weller Hardware Acceleration for Data Processing Seminar, Fall 2017 PipeLayer: A Pipelined ReRAM-Based Accelerator for
More informationECE 2300 Digital Logic & Computer Organization
ECE 2300 Digital Logic & Computer Organization Spring 201 Memories Lecture 14: 1 Announcements HW6 will be posted tonight Lab 4b next week: Debug your design before the in-lab exercise Lecture 14: 2 Review:
More informationOCP Engineering Workshop - Telco
OCP Engineering Workshop - Telco Low Latency Mobile Edge Computing Trevor Hiatt Product Management, IDT IDT Company Overview Founded 1980 Workforce Approximately 1,800 employees Headquarters San Jose,
More informationComputer Architecture: Main Memory (Part II) Prof. Onur Mutlu Carnegie Mellon University
Computer Architecture: Main Memory (Part II) Prof. Onur Mutlu Carnegie Mellon University Main Memory Lectures These slides are from the Scalable Memory Systems course taught at ACACES 2013 (July 15-19,
More informationDesign Space Exploration of FPGA-Based Deep Convolutional Neural Networks
Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks Mohammad Motamedi, Philipp Gysel, Venkatesh Akella and Soheil Ghiasi Electrical and Computer Engineering Department, University
More informationAn Introduction to the Logic. Silicon Chips
An Introduction to the Logic of Silicon Chips Here is a photo of a typical silicon chip, taken alongside the tip of my little finger. Modern chips can be made a good deal smaller than the one shown - just
More informationPACE: Power-Aware Computing Engines
PACE: Power-Aware Computing Engines Krste Asanovic Saman Amarasinghe Martin Rinard Computer Architecture Group MIT Laboratory for Computer Science http://www.cag.lcs.mit.edu/ PACE Approach Energy- Conscious
More informationEvaluating STT-RAM as an Energy-Efficient Main Memory Alternative
Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University
More informationMulti-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture
The 51st Annual IEEE/ACM International Symposium on Microarchitecture Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture Byungchul Hong Yeonju Ro John Kim FuriosaAI Samsung
More informationThe Processor That Don't Cost a Thing
The Processor That Don't Cost a Thing Peter Hsu, Ph.D. Peter Hsu Consulting, Inc. http://cs.wisc.edu/~peterhsu DRAM+Processor Commercial demand Heat stiffling industry's growth Heat density limits small
More informationHigh Performance Computing
High Performance Computing 9th Lecture 2016/10/28 YUKI ITO 1 Selected Paper: vdnn: Virtualized Deep Neural Networks for Scalable, MemoryEfficient Neural Network Design Minsoo Rhu, Natalia Gimelshein, Jason
More informationBoolean Unit (The obvious way)
oolean Unit (The obvious way) It is simple to build up a oolean unit using primitive gates and a mux to select the function. Since there is no interconnection between bits, this unit can be simply replicated
More informationLecture-14 (Memory Hierarchy) CS422-Spring
Lecture-14 (Memory Hierarchy) CS422-Spring 2018 Biswa@CSE-IITK The Ideal World Instruction Supply Pipeline (Instruction execution) Data Supply - Zero-cycle latency - Infinite capacity - Zero cost - Perfect
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationEXPERIMENT NUMBER 11 REGISTERED ALU DESIGN
11-1 EXPERIMENT NUMBER 11 REGISTERED ALU DESIGN Purpose Extend the design of the basic four bit adder to include other arithmetic and logic functions. References Wakerly: Section 5.1 Materials Required
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction
More informationChapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
More informationBinary Convolutional Neural Network on RRAM
Binary Convolutional Neural Network on RRAM Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E, Tsinghua National Laboratory for Information Science and Technology (TNList) Tsinghua
More informationUnleashing the Power of Embedded DRAM
Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers
More informationENGIN 112 Intro to Electrical and Computer Engineering
ENGIN 112 Intro to Electrical and Computer Engineering Lecture 30 Random Access Memory (RAM) Overview Memory is a collection of storage cells with associated input and output circuitry Possible to read
More information