ComPEND: Computation Pruning through Early Negative Detection for ReLU in a Deep Neural Network Accelerator


Slide 1: ICS 2018. ComPEND: Computation Pruning through Early Negative Detection for ReLU in a Deep Neural Network Accelerator. June 13, 2018. Dongwoo Lee, Sungbum Kang, Kiyoung Choi, Neural Processing Research Center (NPRC)

Slide 2: Outline
- Motivation
- Early Negative Detection (END)
- Computation Pruning through END (ComPEND)
- Evaluation
- Conclusion

Slides 3-5: Motivation. Perceptron: the pre-activation is $x = \sum_{i=1}^{N} A_i W_i$ and the output is $A_l = f(x)$. The rectified linear unit (ReLU, $f(x) = \max(0, x)$) is widely used as an activation function for DNNs, and it outputs exactly zero for any $x \le 0$. If we know a priori that $x \le 0$, we can skip the unnecessary computations and simply set the ReLU output to zero.

Slide 6: Motivation. Distribution of negative inputs to ReLU functions in VGG-16: more than 60% of the ReLU inputs are negative. [Chart: per-layer fraction of negative ReLU inputs.]

Slide 7: Early Negative Detection (END). Two's complement number representation (4 bits): the MSB carries a negative weight, so negative values read as, e.g., 1111 = -8+7 = -1, 1110 = -8+6 = -2, 1101 = -8+5 = -3, 1100 = -8+4 = -4, while positive values read as, e.g., 0001 = +1 and 0111 = +7. For a B-bit number $W = (w_{B-1}\, w_{B-2}\, w_{B-3} \cdots w_1\, w_0)$:
$$W = w_{B-1} \cdot (-2^{B-1}) + \sum_{k=0}^{B-2} w_k \cdot (+2^k)$$

Slide 8: Early Negative Detection (END). Inverted two's complement number representation (4 bits): the MSB carries a positive weight and the lower bits carry negative weights, so 1111 = +8-7 = +1, 1110 = +8-6 = +2, 1101 = +8-5 = +3, 1100 = +8-4 = +4, while 0001 = -1 and 0111 = -7. For a B-bit number $W = (w_{B-1}\, w_{B-2}\, w_{B-3} \cdots w_1\, w_0)$:
$$W = w_{B-1} \cdot (+2^{B-1}) + \sum_{k=0}^{B-2} w_k \cdot (-2^k)$$
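To make the two representations concrete, here is a minimal Python sketch (an illustration, not code from the talk) that decodes a bit pattern under both formulas. Note that the inverted reading of a pattern is exactly the negation of its two's complement reading, so a weight W can be stored in inverted form simply as the two's complement pattern of -W.

```python
# Minimal sketch: decode an MSB-first bit tuple under standard two's
# complement and under the inverted two's complement used by END.

def twos_complement(bits):
    """W = w_{B-1}*(-2^(B-1)) + sum_{k=0}^{B-2} w_k*(+2^k)."""
    B = len(bits)
    return bits[0] * -(2 ** (B - 1)) + sum(
        w << (B - 2 - i) for i, w in enumerate(bits[1:]))

def inverted_twos_complement(bits):
    """W = w_{B-1}*(+2^(B-1)) + sum_{k=0}^{B-2} w_k*(-2^k)."""
    B = len(bits)
    return bits[0] * (2 ** (B - 1)) - sum(
        w << (B - 2 - i) for i, w in enumerate(bits[1:]))

# 4-bit examples from the slides: 1111 is -8+7 = -1 in two's complement
# but +8-7 = +1 inverted; 0001 is +1 versus -1.
for bits in [(1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 1), (0, 1, 1, 1)]:
    print(bits, twos_complement(bits), inverted_twos_complement(bits))
```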

Slides 9-12: Early Negative Detection (END). Inverted two's complement representation for negative detection, illustrated step by step on a worked example (activation 5 multiplied by a negative weight, with the weight bits fed in bit-serially, MSB first; the tables compare the decimal value, the two's complement reading, and the inverted two's complement reading of the running result). With standard two's complement, the running partial sum can still change sign at a later bit, so ReLU must wait for the full product. With inverted two's complement, only the MSB term can add to the sum and every later bit subtracts, so as soon as the partial sum goes negative the remaining bits are skipped ('Skipped!') and the ReLU output is set to zero.

Slide 13: Early Negative Detection (END). Comparison over the bit-serial steps. Two's complement, $W = w_{B-1} \cdot (-2^{B-1}) + \sum_{k=0}^{B-2} w_k \cdot (+2^k)$: every step after the sign bit adds a positive term, so a currently negative sum can still end up positive and the sign is known only after the last step. Inverted two's complement, $W = w_{B-1} \cdot (+2^{B-1}) + \sum_{k=0}^{B-2} w_k \cdot (-2^k)$: after the MSB step the value only decreases, so once the sum turns negative the final result must be negative and we can stop ('Stop here!'). [Plot: partial-sum value versus steps for a positive and a negative sum under both representations.]

Slides 14-16: Early Negative Detection (END). For multiple inputs:
$$x = \sum_{i=1}^{N} A_i W_i = A_1 [w_{1,B-1} 2^{B-1} - w_{1,B-2} 2^{B-2} - w_{1,B-3} 2^{B-3} - \cdots] + A_2 [w_{2,B-1} 2^{B-1} - w_{2,B-2} 2^{B-2} - \cdots] + \cdots + A_N [w_{N,B-1} 2^{B-1} - w_{N,B-2} 2^{B-2} - \cdots]$$
Because the activations $A_i$ are outputs of a previous ReLU and hence non-negative, only the MSB bit-plane contributes positive terms; every following bit-plane only subtracts. Processing the whole sum of products bit-plane by bit-plane from the MSB therefore gives a monotonically non-increasing partial sum after the first step, so a negative partial sum guarantees a negative final result.
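A minimal Python sketch of END on a sum of products (illustrative only, with made-up values; relu_dot_end is my name, and the hardware computes the same bit-planes with an adder tree): weights are stored as inverted two's complement patterns, bit-planes are processed MSB-first, and the loop stops as soon as the partial sum goes negative.

```python
# Illustrative sketch of early negative detection (END) on a sum of
# products. Assumes activations are non-negative (previous-layer ReLU
# outputs), which is what makes the partial sum monotone after the MSB.

def relu_dot_end(activations, weights, B=8):
    """Return (ReLU(sum_i A_i*W_i), number of bit-planes processed)."""
    # Inverted two's complement pattern of W = two's complement of -W.
    patterns = [(-w) & ((1 << B) - 1) for w in weights]
    partial = 0
    for step, k in enumerate(range(B - 1, -1, -1), start=1):  # MSB first
        plane = sum(a for a, p in zip(activations, patterns) if (p >> k) & 1)
        # The MSB plane adds +2^k per set bit; all later planes subtract.
        partial += (plane << k) if k == B - 1 else -(plane << k)
        if partial < 0:       # END: final result is guaranteed negative
            return 0, step    # prune the remaining bit-planes
    return partial, B

# Negative result detected after 2 of 8 planes (true dot product = -569):
print(relu_dot_end([5, 3, 2, 7], [-100, -20, 6, -3]))  # (0, 2)
# Positive result needs all 8 planes (true dot product = 11):
print(relu_dot_end([1, 2], [5, 3]))                    # (11, 8)
```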

Slide 17: Computation Pruning through END (ComPEND). Bit-serial sum of products: a conventional unit consumes all B bits of every weight at once and produces the sum of products in one step, while a bit-serial unit consumes one weight bit per step and needs B steps. It takes multiple steps, but the area of a bit-serial unit is much smaller, so more units can be integrated for higher performance. Similar to Stripes (P. Judd et al., MICRO 2016). [Diagram: conventional sum of products (one step, full B-bit weights) versus bit-serial sum of products (B steps, one weight bit per step).]

Slide 18: Computation Pruning through END (ComPEND). Overall architecture of ComPEND: DRAM, on-chip STT-RAM, weight buffers (WBs), activation buffers (ABs), a memory controller, a provider network, a global controller, and a 9x16 array of processing units (PUs). Each PU takes 32 16-bit inputs, so the array consumes 9x16x32 inputs at a time (a 3x3x512 filter). [Figure: block diagram; each PU accumulates A_l * W_l terms.]

Slide 19: Computation Pruning through END (ComPEND). Data packing. Input activation block: 32 activations sharing the same (x, y) position, taken along the channel dimension; 16 bits each, so one 512-bit block. Weight bit block: the bits of 512 weights at the same bit position, i.e. one bit-plane across the filter's channel dimension; 1 bit each, so again one 512-bit block, with blocks ordered from MSB down to LSB. [Figure: packing of activations A_{1,1,1..32} and weight bits w_{1,1,1..512} over the input (I_x, I_y, I_z), output (O_x, O_y, O_z), and filter (F_x, F_y, F_z) dimensions, for the case F_z = I_z = 512.]
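A sketch of this layout in Python (illustrative; the function names and the flat integer encoding are mine, not from the talk): 16-bit weights are transposed into 16 bit-plane blocks of 512 bits each, MSB plane first, and 32 16-bit activations of one (x, y) position are packed into a 512-bit block.

```python
# Illustrative sketch of ComPEND's data packing, using Python ints as
# bit vectors. B-bit weights become B one-bit planes of FZ bits each.
B = 16    # bit width of weights and activations
FZ = 512  # channels per filter (F_z = I_z in the slide's example)

def pack_weight_bitplanes(weight_patterns):
    """FZ weight bit patterns -> B blocks; block 0 is the MSB plane."""
    blocks = []
    for k in range(B - 1, -1, -1):           # MSB plane first
        block = 0
        for c, w in enumerate(weight_patterns):
            block |= ((w >> k) & 1) << c     # one bit per channel
        blocks.append(block)
    return blocks

def pack_activation_block(acts):
    """32 16-bit activations of the same (x, y) -> one 512-bit block."""
    block = 0
    for c, a in enumerate(acts):
        block |= (a & ((1 << B) - 1)) << (B * c)
    return block

planes = pack_weight_bitplanes([c % 7 for c in range(FZ)])
assert len(planes) == B and all(p < (1 << FZ) for p in planes)
```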

Slide 20: Computation Pruning through END (ComPEND). Processing unit (PU): 32 16-bit input activation registers and a 32-bit weight-bits register feed a 32-input 16-bit adder tree. Each cycle, the weight bits select which of the 32 input activations enter the adder tree, producing the partial sum of one bit-plane.
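One PU cycle, sketched in Python under the assumptions above (my paraphrase, not RTL): the 32-bit weight word gates the 32 activation registers and the adder tree sums the survivors.

```python
# Illustrative one-cycle PU step: weight bit i gates activation i, and
# the adder tree reduces the gated activations to one partial sum.
def pu_step(acts32, weight_bits32):
    return sum(a for i, a in enumerate(acts32) if (weight_bits32 >> i) & 1)

assert pu_step([3] * 32, 0x0000000F) == 12  # only the low 4 bits selected
```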

Slide 21: Computation Pruning through END (ComPEND). Memory controller: manages all kinds of memory-involved data transfers.
- Weight blocks: off-chip memory -> STT-RAM; STT-RAM -> weight buffers (WBs); WBs -> weight registers in PUs
- Activation blocks: off-chip memory -> activation buffers (ABs); ABs -> registers in PUs (for FC layers, activation blocks are moved directly from off-chip memory to the PU registers)
- Output activation blocks: global controller -> off-chip memory
[Figure: the transfers overlaid on the block diagram of DRAM, STT-RAM, WBs, ABs, provider network, global controller, and PU array.]

Slide 22: Computation Pruning through END (ComPEND). Provider network: routes 32 x 9 x 16 bits of inputs to 32 x 9 x 16 bits of outputs. It supports activation reuse in the PUs during 2D convolution with 3x3 filters (sliding window) and can be reconfigured with 9 types of connections for shuffling the weights. [Figure: sliding-window activation reuse, with connection types 1 and 2 shown.]
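To illustrate the sliding-window reuse (a toy example of mine, not the provider-network logic): when a 3x3 window slides right by one column, two of its three columns are reused and only one new column of activations has to be provided.

```python
# Toy illustration of 3x3 sliding-window activation reuse: shifting the
# window right keeps columns 1-2 and appends one freshly fetched column.
def slide_right(window, new_column):
    """window: 3x3 activations; new_column: 3 fresh activations."""
    return [row[1:] + [a] for row, a in zip(window, new_column)]

w = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(slide_right(w, [10, 11, 12]))  # [[2, 3, 10], [5, 6, 11], [8, 9, 12]]
```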

Slide 23: Computation Pruning through END (ComPEND). Global controller: 16 decision units, a pipeline list, and an entry board feeding a MUX. A decision unit decides the final sum of products: zero if DATA is negative (early negative detection), or DATA itself if the last position has reached the LSB. Pipeline list entries hold id (filter ID) and pos (bit position within the 16-bit weights); head points at the current output of the adder tree. Entry board entries hold id (filter ID), last pos (last position in the pipeline), and DATA (the partial sum).
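The decision rule as a small Python sketch (my paraphrase of the slide, not the RTL):

```python
# Illustrative decision-unit logic: prune to zero as soon as the partial
# sum is negative; commit once the LSB has been processed; else continue.
def decide(data, last_pos, LSB=0):
    if data < 0:            # END: final result is guaranteed negative
        return ("done", 0)
    if last_pos == LSB:     # all bit positions consumed
        return ("done", data)
    return ("continue", data)
```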

Slide 24: Computation Pruning through END (ComPEND). Filling up the pipeline. Three kinds of work can enter:
P1: the next bit of a sum of products already in the bit-serial computation
P2: a new sum of products that has not yet been entered into the pipeline
P3: the next step of a sum of products whose prior step is still in the pipeline
[Figure: pipeline list, decision units, and entry board, with the bit patterns of two filters F_p: (w_{i,B-1} w_{i,B-2} w_{i,B-3} ... w_{i,1} w_{i,0}) and F_q: (w_{j,B-1} w_{j,B-2} w_{j,B-3} ... w_{j,1} w_{j,0}).]

Slide 25: Computation Pruning through END (ComPEND). Operation pipeline: (1) weight buffers -> (2) provider network -> (3) processing unit array -> (4) global controller. [Figure: the four stages overlaid on the architecture block diagram with DRAM, STT-RAM, WBs, ABs, and memory controller.]

Slide 26: Evaluation. Pre-trained weights of the VGG-16 network and images from ImageNet ILSVRC-2012. In-house cycle-accurate timing simulator written in C++, with DRAMSim2 for off-chip memory, CACTI 6.5 to model SRAM, and NVSim for the on-chip STT-RAM. Synopsys Design Compiler with a TSMC 45nm technology library at 0.9V provides the timing/power/area parameters for the PUs and the provider network.

Slide 27: Evaluation. VGG-16 network: we use 15 layers of the VGG-16 network as workloads, excluding layer F1. F1 is excluded because the total size of its input activations is too big. Inputs to C1 are raw data that can be negative, so the pruning scheme cannot be applied; C1 is therefore implemented without ComPEND.

Slide 28: Evaluation. Configuration and area. Peak throughput: 32 inputs per PU x 16 PUs per row x 9 rows x 1 GHz = 4.6 TOPS. [Table: configuration and area breakdown.]

Slide 29: Evaluation. Runtime: reduced by 16.62% on average, compared to the same design without ComPEND, across the 15 layers. Left bars: without ComPEND; right bars: with ComPEND. [Chart: runtime breakdown for the VGG-16 layers.] Legend: MEM_STT: reads/writes between off-chip memory and STT-RAM; STT_WB: reads/writes between STT-RAM and WBs; MEM_WB: reads/writes between off-chip memory and WBs; MEM_AB: reads/writes between off-chip memory and ABs; AB_PU: reads/writes between ABs and registers in PUs; RUN_PU: computation in PUs.

Slide 30: Evaluation. Energy (dynamic and static) consumption: reduced by 23.5% on average over the 15 layers. Legend: D/S_CTRL: global controller; D/S_NET: provider network; D/S_STT: STT-RAM; D/S_AB: activation buffers; D/S_WB: weight buffers; D/S_PU: processing units. Left bars: without ComPEND; right bars: with ComPEND. [Chart: energy breakdown for the VGG-16 layers.]

Slide 31: Evaluation. Power consumption, averaged over the 15 layers: 11.2 W without ComPEND versus 10.3 W with ComPEND (energy drops more than runtime, so average power is slightly lower). [Chart: power for the VGG-16 layers.]

Slide 32: Evaluation. Energy-delay product: ComPEND reduces EDP and ED^2P by 36.2% and 46.8%, respectively, for the execution of the 15 layers (consistent with the 16.62% runtime and 23.5% energy reductions). [Chart: EDP and ED^2P for the VGG-16 layers.]

Slide 33: Conclusion.
- Proposed the concept of END (early negative detection) based on inverted two's complement
- Proposed an architecture that implements ComPEND
- Achieved 16.62% higher speed and 23.5% less energy consumption for inference
Future work:
- Combining with other zero-skipping approaches
- Handling layers (say, F1 in VGG-16) exceeding the capacity of the architecture

Slide 34: THANK YOU
