Don t Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration

Similar documents
Module 6 : Semiconductor Memories Lecture 30 : SRAM and DRAM Peripherals

EEM 486: Computer Architecture. Lecture 9. Memory

Couture: Tailoring STT-MRAM for Persistent Main Memory. Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung

CMPEN 411 VLSI Digital Circuits Spring Lecture 22: Memery, ROM

Z-RAM Ultra-Dense Memory for 90nm and Below. Hot Chips David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc.

The Memory Hierarchy 1

Silicon Memories. Why store things in silicon? It s fast!!! Compatible with logic devices (mostly)

Unleashing the Power of Embedded DRAM

CS250 VLSI Systems Design Lecture 9: Memory

2. Link and Memory Architectures and Technologies

Architectural Aspects in Design and Analysis of SOTbased

Mohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu

ECE 485/585 Microprocessor System Design

MTJ-Based Nonvolatile Logic-in-Memory Architecture

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative

Versatile RRAM Technology and Applications

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

! Memory. " RAM Memory. " Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips

Semiconductor Memory Classification

Silicon Memories. Why store things in silicon? It s fast!!! Compatible with logic devices (mostly) The main goal is to be cheap

COMPUTER ARCHITECTURES

The Speed of Diversity: Exploring Complex FPGA Routing Topologies for the Global Metal Layer

Mark Redekopp, All rights reserved. EE 352 Unit 10. Memory System Overview SRAM vs. DRAM DMA & Endian-ness

Lecture-14 (Memory Hierarchy) CS422-Spring

Emerging NVM Memory Technologies

Test and Reliability of Emerging Non-Volatile Memories

Field Programmable Gate Array (FPGA) Devices

ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems

Advanced 1 Transistor DRAM Cells

Semiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy.

Understanding Reduced-Voltage Operation in Modern DRAM Devices

CSE502: Computer Architecture CSE 502: Computer Architecture

2000 N + N <100N. When is: Find m to minimize: (N) m. N log 2 C 1. m + C 3 + C 2. ESE534: Computer Organization. Previously. Today.

Deep Sub-Micron Cache Design

Prototype of SRAM by Sergey Kononov, et al.

Power Solutions for Leading-Edge FPGAs. Vaughn Betz & Paul Ekas

The DRAM Cell. EEC 581 Computer Architecture. Memory Hierarchy Design (III) 1T1C DRAM cell

ECE 485/585 Microprocessor System Design

Memory System Overview. DMA & Endian-ness. Technology. Architectural. Problem: The Memory Wall

Integrated Circuits & Systems

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Topic 21: Memory Technology

Topic 21: Memory Technology

Memories: Memory Technology

Power Reduction Techniques in the Memory System. Typical Memory Hierarchy

Spring 2018 :: CSE 502. Main Memory & DRAM. Nima Honarmand

EMDBAM: A Low-Power Dual Bit Associative Memory with Match Error and Mask Control

EECS150 - Digital Design Lecture 16 Memory 1

Memory Design I. Array-Structured Memory Architecture. Professor Chris H. Kim. Dept. of ECE.

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University

Loadsa 1 : A Yield-Driven Top-Down Design Method for STT-RAM Array

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

+1 (479)

CIRCUIT AND ARCHITECTURE CO-DESIGN OF STT-RAM FOR HIGH PERFORMANCE AND LOW ENERGY. by Xiuyuan Bi Master of Science, New York University, 2010

CSE502: Computer Architecture CSE 502: Computer Architecture

Research Challenges for FPGAs

Magnetoresistive RAM (MRAM) Jacob Lauzon, Ryan McLaughlin

18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013

Memory Systems IRAM. Principle of IRAM

Very Large Scale Integration (VLSI)

PICo Embedded High Speed Cache Design Project

CS311 Lecture 21: SRAM/DRAM/FLASH

Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures

Design Methodologies. Full-Custom Design

Spiral 2-9. Tri-State Gates Memories DMA

Learning Outcomes. Spiral 2-9. Typical Logic Gate TRI-STATE GATES

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

The Engine. SRAM & DRAM Endurance and Speed with STT MRAM. Les Crudele / Andrew J. Walker PhD. Santa Clara, CA August

ECE 636. Reconfigurable Computing. Lecture 2. Field Programmable Gate Arrays I

Christophe HURIAUX. Embedded Reconfigurable Hardware Accelerators with Efficient Dynamic Reconfiguration

Basics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS

AC-DIMM: Associative Computing with STT-MRAM

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Lecture 11: MOS Memory

Chapter 8 Memory Basics

An introduction to SDRAM and memory controllers. 5kk73

arxiv: v1 [cs.ar] 13 Jul 2018

PROGRAMMABLE MODULES SPECIFICATION OF PROGRAMMABLE COMBINATIONAL AND SEQUENTIAL MODULES

Exploiting Expendable Process-Margins in DRAMs for Run-Time Performance Optimization

MNSIM: A Simulation Platform for Memristor-based Neuromorphic Computing System

ECE 2300 Digital Logic & Computer Organization

Column decoder using PTL for memory

! Memory Overview. ! ROM Memories. ! RAM Memory " SRAM " DRAM. ! This is done because we can build. " large, slow memories OR

CENG 4480 L09 Memory 2

EECS150 - Digital Design Lecture 16 - Memory

Stratix II vs. Virtex-4 Power Comparison & Estimation Accuracy

CENG 4480 L09 Memory 3

Views of Memory. Real machines have limited amounts of memory. Programmer doesn t want to be bothered. 640KB? A few GB? (This laptop = 2GB)

The Processor That Don't Cost a Thing

Developing a Prototyping Board for Emerging Memory

Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns

Calibrating Achievable Design GSRC Annual Review June 9, 2002

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool

A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS

Improved Initial Overdrive Sense-Amplifier. For Low-Voltage DRAMS. Analog CMOS IC Design. Esayas Naizghi April 30, 2004

Introduction to SRAM. Jasur Hanbaba

Memory and Programmable Logic

Couture: Tailoring STT-MRAM for Persistent Main Memory

Transcription:

Don t Forget the : Automatic Block RAM Modelling, Optimization, and Architecture Exploration S. Yazdanshenas, K. Tatsumura *, and V. Betz University of Toronto, Canada * Toshiba Corporation, Japan

: An Indispensable FPGA Component Logic Blocks BRAM DDR Controller DDR DDR DDR 2

: An Indispensable FPGA Component DDR Controller DDR DDR DDR 3

: An Indispensable FPGA Component ~1000x Lower Bandwidth ~100x Higher Access Energy ~10x Higher Latency DDR Controller DDR DDR DDR 4

: An Indispensable FPGA Component BRAM growing in importance Many applications (search engine, CNNs, ) BRAM-intensive Can t fully utilize FPGA s computation capacity without on-chip memory RAM bit/le Xilinx 200 150 100 50 4kb Virtex 220nm Richness Increase 18kb 18kb + 288kb Ultrascale+ 16nm 16nm 20nm 28nm 40nm 65nm 90nm 150nm 220nm 0 1.E+03 1.E+04 1.E+05 1.E+06 1.E+07 Number of Logic Elements (4-input-LUT equivalent) 5

BRAM s Evolution -richness growth Organization changes 6

The Key Point BRAMs can t be neglected! ~25% of area Should respond to application demands Need BRAM models How efficient is an architecture? What s the best architecture? Set of BRAM Models Set of Benchmark Circuits VTR The best Architecture 7

BRAM design is Difficult BRAM design is challenging! Analog nature of some components Sense Amplifier 8

BRAM design is Difficult BRAM design is challenging! Analog nature of some components Variability of memory cells 9

BRAM design is Difficult BRAM design is challenging! Analog nature of some components Variability of memory cells Custom layout style Significant FPGA-specific peripheral circuitry Hand design of each candidate BRAM infeasible 10

Use existing tools? 11

BRAM design is Different CACTI underestimates area ~1.9x ~2.8x 12

BRAM design is Different Also underestimates read energy ~5.5x ~8.1x 13

BRAM design is Different Overestimates operating frequency ~4.7x ~3.9x 14

Emerging Technologies Model promising emerging memory technologies Magnetic Tunnel Junction (MTJ) Phase Change (PCM) Resistive RAM (RRAM) Ideal: model any technology with SPICE support Magnetic Tunnel Junction (MTJ) Parallel Free Layer Insulator Fixed Layer Low resistance state (LRS) Anti-parallel High resistance state (HRS) 15

BRAM Design Tool 16

What do we need? Process Model Architecture Specifications Automatic Transistor Sizing Tool VTR Architecture File Detailed Sizing Solution 17

COFFE: Logic & Routing C. Chiasson et al. "COFFE: Fully-Automated Transistor Sizing for FPGAs," FPT 2013 18

COFFE BRAM Flow Process Model Spice Netlist Generation Individual Sub- Circuits Architecture Specifications Top Level Evaluation Paths Update FPGA Area, Wires, Wire RC Size All Components (Involves updating area, delay,...) No Finished Sizing Iterations? Yes Measure RAM Core Power To VTR Generate VTR Architecture File 19

2-Bank SRAM-Based BRAM Cells Bank #1 Cell Array Bank #2 Cell Array 20

2-Bank SRAM-Based BRAM Cells WL Port B Vdd WL Port B BL Port B BL Port A WL Port A SRAM Cell WL Port A BL Port B BL Port A Bank #1 Cell Array Bank #2 Cell Array 21

2-Bank SRAM-Based BRAM Cells Traditional Decoders Column Decoders Bank #1 Cell Array Row Decoders and Wordline Drivers Bank #2 Cell Array 22

2-Bank SRAM-Based BRAM Cells Traditional Decoders Read/Write Circuitry Write Drivers and Sense Amplifiers Write Drivers and Sense Amplifiers Precharge and Equalizer Column Decoders Precharge and Equalizer Bank #1 Cell Array Row Decoders and Wordline Drivers Bank #2 Cell Array 23

2-Bank SRAM-Based BRAM Cells Traditional Decoders Read/Write Circuitry Vdd Vdd Write Drivers and Sense Amplifiers Write Drivers and Sense Amplifiers Precharge and Equalizer Bank #1 Cell Array Column Decoders Row Decoders and Wordline Drivers Precharge and Equalizer Bank #2 Cell Array BL BL Precharge and equalizer 24

2-Bank SRAM-Based BRAM Cells Traditional Decoders Read/Write Circuitry Vdd Data Vdd Write Drivers and Sense Amplifiers Write Drivers and Sense Amplifiers Precharge and Equalizer Column Decoders Precharge and Equalizer Bank #1 Cell Array Row Decoders and Wordline Drivers Bank #2 Cell Array Write Driver 25

2-Bank SRAM-Based BRAM Cells Traditional Decoders Read/Write Circuitry Write Drivers and Sense Amplifiers Precharge and Equalizer Column Decoders Write Drivers and Sense Amplifiers Precharge and Equalizer Sense Amplifier Bank #1 Cell Array Row Decoders and Wordline Drivers Bank #2 Cell Array 26

2-Bank SRAM-Based BRAM Cells Traditional Decoders Read/Write Circuitry Width- Configurability Circuitry Write Drivers and Sense Amplifiers Width- Configurable Decoders Write Drivers and Sense Amplifiers Input Crossbar and Level Shifter Precharge and Equalizer Column Decoders Precharge and Equalizer Bank #1 Cell Array Row Decoders and Wordline Drivers Bank #2 Cell Array 27

2-Bank SRAM-Based BRAM Cells Traditional Decoders Read/Write Circuitry Width- Configurability Circuitry Write Drivers and Sense Amplifiers Width- Configurable Decoders Write Drivers and Sense Amplifiers To/From Routing Input Crossbar and Level Shifter Precharge and Equalizer Column Decoders Precharge and Equalizer Bank #1 Cell Array Bank #2 Cell Array 28

2-Bank SRAM-Based BRAM Cells Traditional Decoders Read/Write Circuitry Width- Configurability Circuitry Write Drivers and Sense Amplifiers Width- Configurable Decoders Write Drivers and Sense Amplifiers To/From Routing Pipeline Registers Input Crossbar and Level Shifter Precharge and Equalizer Column Decoders Precharge and Equalizer Bank #1 Cell Array Row Decoders and Wordline Drivers Bank #2 Cell Array 29

2-Bank MTJ-Based BRAM Cells MTJbased Cell Bank #1 Cell Array Bank #2 Cell Array 30

2-Bank MTJ-Based BRAM Cells Reference Cells Reference Cells Bank #1 Cell Array Bank #2 Cell Array 31

2-Bank MTJ-Based BRAM Cells Traditional Decoders Read/Write Circuitry Sense Amplifier Sense Amplifier Write Driver Write Driver Write Driver Write Driver Column Selector and Predischarger Column Decoders Column Selector and Predischarger Reference Cells Bank #1 Cell Array Row Decoders and Wordline Drivers Reference Cells Bank #2 Cell Array 32

2-Bank MTJ-Based BRAM Cells Traditional Decoders Read/Write Circuitry Sense Amplifier Sense Amplifier Write Driver Write Driver Write Driver Write Driver Column Selector and Predischarger Column Decoders Column Selector and Predischarger Reference Cells Bank #1 Cell Array Row Decoders and Wordline Drivers Reference Cells Bank #2 Cell Array 33

2-Bank MTJ-Based BRAM Cells Traditional Decoders Read/Write Circuitry Sense Amplifier Sense Amplifier Write Driver Write Driver Write Driver Write Driver Column Selector and Predischarger Column Decoders Column Selector and Predischarger Reference Cells Bank #1 Cell Array Row Decoders and Wordline Drivers Reference Cells Bank #2 Cell Array Write Driver 34

2-Bank MTJ-Based BRAM Cells Traditional Decoders Read/Write Circuitry Sense Amplifier Sense Amplifier Write Driver Write Driver Write Driver Write Driver Column Selector and Predischarger Column Decoders Column Selector and Predischarger Reference Cells Bank #1 Cell Array Row Decoders and Wordline Drivers Reference Cells Bank #2 Cell Array 35

2-Bank MTJ-Based BRAM Cells Traditional Decoders Read/Write Circuitry Width- Configurability Circuitry Input Crossbar and Level Shifter Sense Amplifier Write Driver Write Driver Column Selector and Predischarger Width-Configurable Decoders Column Decoders Write Driver Sense Amplifier Write Driver Column Selector and Predischarger Reference Cells Bank #1 Cell Array Row Decoders and Wordline Drivers Reference Cells Bank #2 Cell Array 36

2-Bank MTJ-Based BRAM Cells Traditional Decoders Read/Write Circuitry Width- Configurability Circuitry To/From Routing Pipeline Registers Input Crossbar and Level Shifter Sense Amplifier Write Driver Write Driver Column Selector and Predischarger Reference Cells Bank #1 Cell Array Width-Configurable Decoders Column Decoders Row Decoders and Wordline Drivers Write Driver Sense Amplifier Write Driver Column Selector and Predischarger Reference Cells Bank #2 Cell Array 37

Simulation In Context: SRAM Precharge Sense Amp Write Driver Vdd Vdd Vdd The Rest of SRAM Cells Go here Vdd 38

Simulation In Context: SRAM Precharge Sense Amp Write Driver Vdd Vdd Initial Voltage = Vdd Initial Voltage = 0 Vdd The Rest of SRAM Cells Go here Vdd 39

Simulation In Context: SRAM Precharge Sense Amp Write Driver Vdd Vdd Initial Voltage = Vdd Initial Voltage = 0 Vdd The Rest of SRAM Cells Go here Vdd Target Voltage = 0.99 Vdd 40

Simulation In Context: Worst-case SRAM Cell Variation changes SRAM cell read/write currents significantly 41

Simulation In Context: Worst-case SRAM Cell Using the nominal memory cell will be inaccurate BRAM energy will be underestimated BRAM frequency will be overestimated 42

Simulation In Context: Worst-case SRAM Cell Don t have the Spice model for the worst-case cell! 43

Simulation In Context: Worst-case SRAM Cell Use Monte Carlo simulation to find distribution of cell properties 44

Monte Carlo Simulation: Worst-case Cell Read speed Cumulative probability 4x10-6 50% 0 Average Cell worst Worst Cell 10 20 0I 10 I read [ua] 20 read K. Tatsumura et al."high Density, Low Energy, Magnetic Tunnel Junction Based Block RAMs for -Rich FPGAs," FPT 2016 45

Simulation In Context: Worst-Case Cell Write Driver Sense Amplifier Precharge The Rest of SRAM Cells Go here 46

Simulation In Context: Worst-Case Cell Write Driver Sense Amplifier Precharge Initial Voltage = Vdd Initial Voltage = Vdd The Rest of SRAM Cells Go here 47

Simulation In Context: Worst-Case Cell Write Driver Sense Amplifier Precharge Initial Voltage = Vdd Initial Voltage = Vdd The Rest of SRAM Cells Go here Weakest Read Current 48

Simulation In Context: Worst-Case Cell Target Voltage Difference Write Driver Sense Amplifier Precharge Initial Voltage = Vdd Initial Voltage = Vdd The Rest of SRAM Cells Go here Weakest Read Current 49

Validation and Results 50

Area Validation Area results align well with commercial data 51

SRAM-based BRAM Area Breakdown SRAM area dominates for large BRAMs Smaller BRAMs: other components relevant 52

MTJ is more area-efficient It gets increasingly more efficient with BRAM size Area: MTJ vs. SRAM ~3.1x ~1.6x 53

Reasonable alignment with commercial data Less guardband No aggressive banking Frequency validation 54

Operating Frequency: MTJ vs. SRAM SRAM is faster The gap narrows with increasing size ~3.6x ~1.6x 55

Simulation Results: Energy 56

SRAM-based BRAM Narrow Mode FPGA RAM configurable Often used in narrower modes Energy mostly unaffected 57

Energy Per Bit: MTJ vs. SRAM MTJs are more efficient with large BRAM MTJ narrow mode more efficient ~1.7x ~3.2x 58

Worst-Case Cell Modeling Crucial Use nominal memory cell? Underestimates area and energy Gets more severe with increasing memory size BRAM Capacity (Kbit) Change in Delay 8-21% -9% 16-19% -6% 32-27% -15% 64-22% -9% 128-30% -20% 256-42% -29% Change in Energy per bit 59

Architecture Exploration 60

Architecture Exploration: RAM-Mapping Flow BRAM models generated by COFFE can be used in architecture exploration Area-oriented RAM mapping 69 industrial circuits Used in development of Stratix V memory architecture We have partial data: Number of logic blocks used Number, Sizes, and types of Logical RAMs Gradually excluding less-memory-rich circuits 61

16K always the best Stratix V-like SRAM-based BRAM richness 62

MTJ always saves area The best architecture changes 32k or 64k best MTJ-based BRAM ~26% richness 63

Architecture Exploration: VTR Flow Nine VTR benchmark circuits with memory MTJ vs. SRAM Architecture Parameters 32kb BRAM, every 8 columns MTJ BRAM is smaller à get more 2.3x more RAM blocks per column Ten 6-luts per logic block 64

Architecture Exploration: VTR Changes by switching to MTJ-based BRAMs: Circuit RAM/LUT Ratio Block Area Routing Area Total Area Block Delay Routing Delay Total Delay mcml 1% -11% 0-5% 5% -19% -6% -10% LU32PEEng 7% -14% 9% -2% 9% -6% 0-1% LU8PEEng 6% -13% 4% -5% -4% 9% 3% -2% ch_intrinsics 2% -15% -2% -10% -1% 6% 3% -7% mkdelayworker32b 24% -32% -47% -41% 60% -40% 3% -39% mkpktmerge 198% -57% -48% -53% 141% -34% 12% -47% mksmadapter4b 8% -16% 0-10% 92% -32% 9% -2% or1200 2% -5% -4% 0 1% -7% -2% -3% raygentop 1% -3% 2% -1% 9% -17% -4% -5% boundtop 1% -3% 1% -1% 8% -5% 0-2% Geometric Mean 4% -19% -11% -15% 25% -16% 2% -14% Area-delay Product 65

Architecture Exploration: VTR Changes by switching to MTJ-based BRAMs: Circuit RAM/LUT Ratio Block Area Routing Area Total Area Block Delay Routing Delay Total Delay mcml 1% -11% 0-5% 5% -19% -6% -10% LU32PEEng 7% -14% 9% -2% 9% -6% 0-1% LU8PEEng 6% -13% 4% -5% -4% 9% 3% -2% ch_intrinsics 2% -15% -2% -10% -1% 6% 3% -7% mkdelayworker32b 24% -32% -47% -41% 60% -40% 3% -39% mkpktmerge 198% -57% -48% -53% 141% -34% 12% -47% mksmadapter4b 8% -16% 0-10% 92% -32% 9% -2% or1200 2% -5% -4% 0 1% -7% -2% -3% raygentop 1% -3% 2% -1% 9% -17% -4% -5% boundtop 1% -3% 1% -1% 8% -5% 0-2% Geometric Mean 4% -19% -11% -15% 25% -16% 2% -14% Area-delay Product 66

Architecture Exploration: VTR Changes by switching to MTJ-based BRAMs: Circuit RAM/LUT Ratio Block Area Routing Area Total Area Block Delay Routing Delay Total Delay mcml 1% -11% 0-5% 5% -19% -6% -10% LU32PEEng 7% -14% 9% -2% 9% -6% 0-1% LU8PEEng 6% -13% 4% -5% -4% 9% 3% -2% ch_intrinsics 2% -15% -2% -10% -1% 6% 3% -7% mkdelayworker32b 24% -32% -47% -41% 60% -40% 3% -39% mkpktmerge 198% -57% -48% -53% 141% -34% 12% -47% mksmadapter4b 8% -16% 0-10% 92% -32% 9% -2% or1200 2% -5% -4% 0 1% -7% -2% -3% raygentop 1% -3% 2% -1% 9% -17% -4% -5% boundtop 1% -3% 1% -1% 8% -5% 0-2% Geometric Mean 4% -19% -11% -15% 25% -16% 2% -14% Area-delay Product 67

Architecture Exploration: VTR q Changes by switching to MTJ-based BRAMs: Circuit RAM/LUT Ratio Block Area Routing Area Total Area Block Delay Routing Delay Total Delay mcml 1% -11% 0-5% 5% -19% -6% -10% LU32PEEng 7% -14% 9% -2% 9% -6% 0-1% LU8PEEng 6% -13% 4% -5% -4% 9% 3% -2% ch_intrinsics 2% -15% -2% -10% -1% 6% 3% -7% mkdelayworker32b 24% -32% -47% -41% 60% -40% 3% -39% mkpktmerge 198% -57% -48% -53% 141% -34% 12% -47% mksmadapter4b 8% -16% 0-10% 92% -32% 9% -2% or1200 2% -5% -4% 0 1% -7% -2% -3% raygentop 1% -3% 2% -1% 9% -17% -4% -5% boundtop 1% -3% 1% -1% 8% -5% 0-2% Geometric Mean 4% -19% -11% -15% 25% -16% 2% -14% Area-delay Product 68

Architecture Exploration: VTR Changes by switching to MTJ-based BRAMs: Circuit RAM/LUT Ratio Block Area Routing Area Total Area Block Delay Routing Delay Total Delay mcml 1% -11% 0-5% 5% -19% -6% -10% LU32PEEng 7% -14% 9% -2% 9% -6% 0-1% LU8PEEng 6% -13% 4% -5% -4% 9% 3% -2% ch_intrinsics 2% -15% -2% -10% -1% 6% 3% -7% mkdelayworker32b 24% -32% -47% -41% 60% -40% 3% -39% mkpktmerge 198% -57% -48% -53% 141% -34% 12% -47% mksmadapter4b 8% -16% 0-10% 92% -32% 9% -2% or1200 2% -5% -4% 0 1% -7% -2% -3% raygentop 1% -3% 2% -1% 9% -17% -4% -5% boundtop 1% -3% 1% -1% 8% -5% 0-2% Geometric Mean 4% -19% -11% -15% 25% -16% 2% -14% Area-delay Product 69

Architecture Exploration: VTR Changes by switching to MTJ-based BRAMs: Circuit RAM/LUT Ratio Block Area Routing Area Total Area Block Delay Routing Delay Total Delay mcml 1% -11% 0-5% 5% -19% -6% -10% LU32PEEng 7% -14% 9% -2% 9% -6% 0-1% LU8PEEng 6% -13% 4% -5% -4% 9% 3% -2% ch_intrinsics 2% -15% -2% -10% -1% 6% 3% -7% mkdelayworker32b 24% -32% -47% -41% 60% -40% 3% -39% mkpktmerge 198% -57% -48% -53% 141% -34% 12% -47% mksmadapter4b 8% -16% 0-10% 92% -32% 9% -2% or1200 2% -5% -4% 0 1% -7% -2% -3% raygentop 1% -3% 2% -1% 9% -17% -4% -5% boundtop 1% -3% 1% -1% 8% -5% 0-2% Geometric Mean 4% -19% -11% -15% 25% -16% 2% -14% Area-delay Product 70

Architecture Exploration: VTR Changes by switching to MTJ-based BRAMs: Circuit RAM/LUT Ratio Block Area Routing Area Total Area Block Delay Routing Delay Total Delay mcml 1% -11% 0-5% 5% -19% -6% -10% LU32PEEng 7% -14% 9% -2% 9% -6% 0-1% LU8PEEng 6% -13% 4% -5% -4% 9% 3% -2% ch_intrinsics 2% -15% -2% -10% -1% 6% 3% -7% mkdelayworker32b 24% -32% -47% -41% 60% -40% 3% -39% mkpktmerge 198% -57% -48% -53% 141% -34% 12% -47% mksmadapter4b 8% -16% 0-10% 92% -32% 9% -2% or1200 2% -5% -4% 0 1% -7% -2% -3% raygentop 1% -3% 2% -1% 9% -17% -4% -5% boundtop 1% -3% 1% -1% 8% -5% 0-2% Geometric Mean 4% -19% -11% -15% 25% -16% 2% -14% Area-delay Product 71

Architecture Exploration: VTR Changes by switching to MTJ-based BRAMs: Circuit RAM/LUT Ratio Block Area Routing Area Total Area Block Delay Routing Delay Total Delay mcml 1% -11% 0-5% 5% -19% -6% -10% LU32PEEng 7% -14% 9% -2% 9% -6% 0-1% LU8PEEng 6% -13% 4% -5% -4% 9% 3% -2% ch_intrinsics 2% -15% -2% -10% -1% 6% 3% -7% mkdelayworker32b 24% -32% -47% -41% 60% -40% 3% -39% mkpktmerge 198% -57% -48% -53% 141% -34% 12% -47% mksmadapter4b 8% -16% 0-10% 92% -32% 9% -2% or1200 2% -5% -4% 0 1% -7% -2% -3% raygentop 1% -3% 2% -1% 9% -17% -4% -5% boundtop 1% -3% 1% -1% 8% -5% 0-2% Geometric Mean 4% -19% -11% -15% 25% -16% 2% -14% Area-delay Product 72

Architecture Exploration: VTR Changes by switching to MTJ-based BRAMs: Circuit RAM/LUT Ratio Block Area Routing Area Total Area Block Delay Routing Delay Total Delay mcml 1% -11% 0-5% 5% -19% -6% -10% LU32PEEng 7% -14% 9% -2% 9% -6% 0-1% LU8PEEng 6% -13% 4% -5% -4% 9% 3% -2% ch_intrinsics 2% -15% -2% -10% -1% 6% 3% -7% mkdelayworker32b 24% -32% -47% -41% 60% -40% 3% -39% mkpktmerge 198% -57% -48% -53% 141% -34% 12% -47% mksmadapter4b 8% -16% 0-10% 92% -32% 9% -2% or1200 2% -5% -4% 0 1% -7% -2% -3% raygentop 1% -3% 2% -1% 9% -17% -4% -5% boundtop 1% -3% 1% -1% 8% -5% 0-2% Geometric Mean 4% -19% -11% -15% 25% -16% 2% -14% Area-delay Product 73

Conclusion First transistor sizing tool capable of BRAM modeling SRAM-based MTJ-based Simulation results align well with available commercial data COFFE now enables BRAM architecture exploration! RAM-Mapping VTR 74