Neural Computer Architectures

1 Neural Computer Architectures 5kk73 Embedded Computer Architecture By: Maurice Peemen Date:

2 Convergence of different domains 1 Neurobiology, Machine Learning, Technology Innovations, Constraints, Applications, Neuromorphic

3 Biological Neural Networks 2

4 Biological Neural Networks 3 Presynaptic neuron, postsynaptic neuron, cell body, synapses

5 Perceptron Model (1957) 4 Feed forward processing. Tuning the weights by learning. Non-linear separability (1969). Diagram: inputs x[1]..x[K] with weights w[1]..w[K] feed a summation p = b + Σ_k x[k]*w[k] with bias b, followed by an activation y = φ(p), where φ is a step or sigmoid function.
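In C, a minimal sketch of this forward pass (the function name and the option to switch between step and sigmoid activation are illustrative, not from the slides):

    #include <math.h>

    /* Perceptron: y = phi(b + sum_k x[k] * w[k]); phi is a step or sigmoid. */
    float perceptron(const float *x, const float *w, float b, int K, int use_sigmoid)
    {
        float p = b;                          /* start from the bias          */
        for (int k = 0; k < K; k++)
            p += x[k] * w[k];                 /* weighted sum of the inputs   */
        if (use_sigmoid)
            return 1.0f / (1.0f + expf(-p));  /* smooth activation            */
        return p >= 0.0f ? 1.0f : 0.0f;       /* hard threshold (step)        */
    }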

6 Convergence of different domains 5 Neurobiology, Machine Learning, Technology Innovations, Constraints, Applications, Neuromorphic

7 Multi Layer Perceptron (1979) 6 Training is done by error back-propagation. Diagram: input layer, hidden layer, output layer, with target values at the output.

8 The Hype Curve of Neural Networks 7 Level of interest over time: Perceptron (1957), non-linear separability result (1969), Multi Layer Perceptron, SVM (1998), today.

9 Deep Big Neural Networks 8 Deep big neural networks outperform SVMs; ANNs are now state-of-the-art classifiers again. Five layers, 1000s of nodes, connection constraints. H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, An empirical evaluation of deep architectures on problems with many factors of variation, ICML 2007.

10 Convergence of different domains 9 Neurobiology, Machine Learning, Technology Innovations, Constraints, Applications, Neuromorphic

11 Classification: Face detection 10

12 Intelligent Vision Applications 11 Emerging field of research Applications in many domains Examples: Security, Industrial, Medical, Automotive

14 Intelligent Vision Applications 13 Emerging field of research. Applications in many domains. Examples: Security, Industrial, Medical, Automotive. Figure labels: old man, breathing, heart beat, no action.

17 Classical recognition systems are stupid 16 Design is based on knowledge of the task. Carefully tuned pipeline of algorithms. Really complex for real-world problems. Design must be redone if the task changes. Pipeline: light correction, histogram stretch, colour thresholding, edge detection, corner detection, shape recognition, Hough transform, matching, neural networks.

18 Train a Neural Network for the task 17 Focus on data instead of algorithm complexity. Pre-process data to generate more examples. Use a test set to verify generalization. Classes in the example: 30, 50, 60, 70, 80, 90 and 100 km/h speed signs. Background images are hard to suppress, so random background image patches are used as well.

19 Biologically inspired object recognition 18 Convolutional Neural Network: a deep and big neural network. Diagram: 32 x 32 input; 5x5 convolution to C1 feature maps (28 x 28); 2x2 subsampling to S1 feature maps (14 x 14); 5x5 convolution to C2 feature maps (10 x 10); 2x2 subsampling to S2 feature maps (5 x 5); 5x5 convolution to n1 (100); 1x1 convolution to n2 and the output. The convolution and subsampling stages perform feature extraction, the final layers classification.
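The feature-map sizes in this diagram follow from the layer parameters (assuming valid convolutions and non-overlapping subsampling): a 5x5 convolution on the 32x32 input gives 32 - 5 + 1 = 28, hence 28x28 maps; 2x2 subsampling halves this to 14x14; the second 5x5 convolution gives 14 - 5 + 1 = 10, hence 10x10; the next 2x2 subsampling gives 5x5; and the final 5x5 convolution reduces each map to a single value feeding the classification layers.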

20 Detection and Recognition Application 19

24 Speed Sign Detection and Recognition 23

25 Advantage of flexibility 24 Extend existing trained network Add new road signs and restart training New weight file is new functionality Send new weight file to users (100 KB)

27 Major road detection 26

28 What more can these NNs do? 27 Classification, approximation, optimization, clustering

29 Function Approximation 28 Stock market prediction: Black-Scholes

30 Placement Optimization 29 Chip routing (canneal): minimize wire length with a Hopfield Neural Network

31 Convergence of different domains 30 Neurobiology, Machine Learning, Technology Innovations, Constraints, Applications, Neuromorphic

32 Technology Constraints 31 Dark Silicon Defect tolerance

33 Dark Silicon 32 What to do with chips that are too hot? Reduce the clock frequency or go multi-core. If the chip is still too hot? Turn parts of the chip off! This generates dark silicon.

34 Energy Efficiency 33 Supercomputer (K computer, Fujitsu): 8.2 billion Megaflops at 9.9 million watts, roughly 800 Megaflops/watt. iPad: 2.5 watts, roughly 68 Megaflops/watt. Human brain: 2.2 billion Megaops at 20 watts, roughly 110 Teraops/watt.

35 Toward Heterogeneous Systems 34 Efficient accelerators Multi-purpose ASICs ANN is a candidate Flexible functionality State-of-the-art results Parallelism

36 Developing ANN Accelerators 35
    for i = 1:N
        Y[i] = Bias[i]
        for k = 1:K
            Y[i] += X[k] * W[i][k]
        Y[i] = Sigmoid(Y[i])
This implements y_i = b_i + Σ_k x_k * w_ik, followed by the sigmoid activation.
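A compilable C version of this loop nest (a sketch; the array layout and function name are assumptions, and Sigmoid is taken to be the standard logistic function):

    #include <math.h>

    /* Fully-connected layer: Y[i] = Sigmoid(Bias[i] + sum_k X[k] * W[i][k]) */
    void fc_layer(int N, int K, const float *X, const float *W,
                  const float *Bias, float *Y)
    {
        for (int i = 0; i < N; i++) {
            float acc = Bias[i];
            for (int k = 0; k < K; k++)
                acc += X[k] * W[i * K + k];     /* multiply-accumulate (MACC) */
            Y[i] = 1.0f / (1.0f + expf(-acc));  /* sigmoid activation         */
        }
    }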

37 Time-Multiplexed Accelerator 36
    for i = 1:N
        Y[i] = Bias[i]
        for k = 1:K
            Y[i] += X[k] * W[i][k]
        Y[i] = Sigmoid(Y[i])
Implements y_i = b_i + Σ_k x_k * w_ik with sigmoid φ(v) = 1 / (1 + exp(-a*v)). Steps: load the bias (fold it into the MACC by fixing the first input to 1 and setting W[i][1] = Bias[i]), perform the MACCs, approximate the sigmoid.
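A sketch of the bias-folding step in C (hypothetical names; it only mirrors the idea that the bias becomes one more weight on a constant-1 input, so each neuron is a single uninterrupted MACC stream):

    /* Bias folded into the MACC stream: X[0] is fixed to 1 and W[i][0]
     * holds Bias[i], so each neuron is one uninterrupted MACC loop.       */
    void fc_layer_folded_bias(int N, int K,
                              const float *X,  /* length K+1, X[0] == 1    */
                              const float *W,  /* N x (K+1), W[i][0]=bias  */
                              float *Y)
    {
        for (int i = 0; i < N; i++) {
            float acc = 0.0f;
            for (int k = 0; k <= K; k++)
                acc += X[k] * W[i * (K + 1) + k];
            Y[i] = acc;             /* activation applied by the LUT stage */
        }
    }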

38 Analog: Intel ETANN, the Electrically Trainable Analog Neural Network. Analog Gilbert-multiplier circuits; differential currents from the synapses are summed and converted to a voltage. Weights are stored as electrical charge on floating gates. Analog sigmoid activation function.

39 Digital Implementation 38 Sigmoid function via a look-up table, using the piecewise linear approximation φ(x) ≈ a_i*x + b_i. Multiply-accumulate.
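A minimal sketch of such a piecewise-linear sigmoid in C. The four segments and their coefficients below are illustrative only (obtained by joining the sigmoid values at x = 0, 2, 4, 6, 8 with gain a = 1), not the table used in the actual design; negative inputs use the symmetry φ(-x) = 1 - φ(x).

    /* Piecewise-linear sigmoid: phi(x) ~= a[i]*x + b[i] on segment i.     */
    static const float seg_a[4] = { 0.1904f, 0.0506f, 0.0078f, 0.0011f };
    static const float seg_b[4] = { 0.5000f, 0.7796f, 0.9510f, 0.9910f };

    float sigmoid_pwl(float x)
    {
        float ax = x < 0.0f ? -x : x;            /* use |x|, fix sign later */
        int   i  = (int)(ax / 2.0f);             /* 2.0-wide segments       */
        if (i > 3)                               /* saturate outside [0,8)  */
            return x < 0.0f ? 0.0f : 1.0f;
        float y = seg_a[i] * ax + seg_b[i];      /* linear segment          */
        return x < 0.0f ? 1.0f - y : y;
    }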

40 SIMD design Adaptive Solutions N

41 Conversion to vector operations 40 Per element: y_i[n] = b_i + Σ_k x_k[n]*w_ik. Per vector: y[n] = b + x[n]*W. Per batch: Y = b + X*W, with b broadcast over the batch.
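Written out as a loop nest in C, the batched form is a plain matrix product plus a broadcast bias (a sketch; row-major layout and the W[i][k] indexing of the earlier slides are assumed, and the activation is left out):

    /* Batched layer: Y[n][i] = b[i] + sum_k X[n][k] * W[i][k]             */
    void fc_layer_batched(int Nbatch, int K, int Nout,
                          const float *X,  /* Nbatch x K                   */
                          const float *W,  /* Nout   x K  (w_ik layout)    */
                          const float *b,  /* Nout                         */
                          float *Y)        /* Nbatch x Nout                */
    {
        for (int n = 0; n < Nbatch; n++)
            for (int i = 0; i < Nout; i++) {
                float acc = b[i];
                for (int k = 0; k < K; k++)
                    acc += X[n * K + k] * W[i * K + k];
                Y[n * Nout + i] = acc;
            }
    }

With this layout the inner product over k can be handed to a SIMD unit or a systolic array, which is what the next slides exploit.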

42 Systolic Matrix Multiplication 41 Siemens MA16 High efficiency Low flexibility

43 An example state-of-the-art accelerator 42

44 Systolic 2D Convolution 43

45 Convolutional Neural Network 44 Data reuse. Same network diagram as before: 32 x 32 input, C1 feature maps 28 x 28, S1 feature maps 14 x 14, C2 feature maps 10 x 10, S2 feature maps 5 x 5, n1, n2, output, built from 5x5 convolutions, 2x2 subsampling and a final 1x1 convolution.

46 Reduce Memory Accesses 45 Configurable Number of Input Maps Configurable Number of Output Maps
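For reference, the computation behind a convolutional layer with several input and output feature maps can be sketched as the following C loop nest (illustrative names, a valid stride-1 convolution; this is the workload, not the accelerator's actual schedule):

    /* out[q][y][x] = bias[q] + sum_p sum_i sum_j in[p][y+i][x+j] * w[q][p][i][j]
     * Q output maps, P input maps, K x K kernels, valid convolution.      */
    void conv_layer(int Q, int P, int H, int Wd, int K,
                    const float *in,    /* P x H x Wd                      */
                    const float *w,     /* Q x P x K x K                   */
                    const float *bias,  /* Q                               */
                    float *out)         /* Q x (H-K+1) x (Wd-K+1)          */
    {
        int Ho = H - K + 1, Wo = Wd - K + 1;
        for (int q = 0; q < Q; q++)
            for (int y = 0; y < Ho; y++)
                for (int x = 0; x < Wo; x++) {
                    float acc = bias[q];
                    for (int p = 0; p < P; p++)
                        for (int i = 0; i < K; i++)
                            for (int j = 0; j < K; j++)
                                acc += in[(p * H + y + i) * Wd + (x + j)]
                                     * w[((q * P + p) * K + i) * K + j];
                    out[(q * Ho + y) * Wo + x] = acc;  /* activation omitted */
                }
    }

Every input pixel is read Q*K*K times in this form, which is why configurable input/output map counts and on-chip reuse buffers matter.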

47 Is it worth the effort? 46 More important is the energy efficiency.

48 More Flexibility and Better Memory Behaviour? 47

49 The performance bottleneck 48 Huge data transfer requirements (3.4 billion per layer). Exploit data reuse with local memories. Plot: energy for data transfer [J] versus total on-chip cache size [words], split into DRAM and cache contributions.

50 Accelerator Template 49 FPGA prototyping platform: Xilinx Virtex 6, designed with Vivado High-Level Synthesis (HLS). Block diagram: an input controller streams in_img, weight and bias over FSLs to an array of MACC units (in_img * weight accumulated onto acc, starting from the bias); an activation LUT with select/saturate logic follows, and an output controller writes out_img back to DDR.

51 Programmable Buffers 50 Diagram: image and coefficient streams from the input FSLs are demultiplexed into four image BRAM banks (X0-X3) and a weight BRAM, each with read/write address-select logic; a rotating mux feeds the datapath, and results go through the sigmoid LUT into an out-img BRAM connected to the output FSLs.

53 Programmable Buffers 52 Buffer address example for rows x00 x01 x02 ... stored across the banks: successive reads deliver the sliding windows (x00, x01, x02, x03), (x01, x02, x03, x04), (x02, x03, x04, x05), (x03, x04, x05, x06), (x04, x05, x06, x07), and then (x10, x11, x12, x13) for the next row, using the same X0-X3 BRAM, weight BRAM and sigmoid LUT datapath as on the previous slide.
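In software terms, this behaves like a small banked line buffer from which a window sliding by one pixel is read every cycle; a simplified C sketch under that assumption (sizes and names are illustrative):

    #define ROWS  4                 /* banked row buffers (X0..X3)        */
    #define WIDTH 64                /* illustrative image line width      */

    static float bank[ROWS][WIDTH]; /* one image row per BRAM-like bank   */

    /* Shift in a new image row; the oldest row is overwritten (rotate).  */
    void push_row(const float *row, int row_index)
    {
        float *dst = bank[row_index % ROWS];
        for (int x = 0; x < WIDTH; x++)
            dst[x] = row[x];
    }

    /* Read a ROWS x 4 window with its left edge at column x0; in hardware
     * each bank delivers one of these elements in parallel per cycle.
     * Caller must keep x0 + 3 < WIDTH.                                   */
    void read_window(int x0, float win[ROWS][4])
    {
        for (int r = 0; r < ROWS; r++)
            for (int c = 0; c < 4; c++)
                win[r][c] = bank[r][x0 + c];
    }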

54 Flexible Reuse Buffers Example configuration: a 720 x 1280 input and Layer 1 computed as a 6x6 convolution with 2x2 subsampling, giving Layer 1 feature maps of 6 x 358 x ...; the same demux, X0-X3 BRAMs, weight BRAMs and rotate structure is reused.

55 Flexible Reuse Buffers 54 Same diagram: image and coefficient streams from the input FSLs are demultiplexed into the X0-X3 image BRAMs and two weight BRAMs, with read/write address-select and rotate logic.

57 Flexible Reuse Buffers 56 Buffer address example for a column-wise window: successive reads deliver (x04, x14, x24, x34), (x14, x24, x34, x44), (x24, x34, x44, x54), (x34, x44, x54, x64), (x44, x54, x64, x74), producing the outputs y00, y10, y20, y30 from the same banked buffers.

58 Support for Subsampling 57 Buffer address example with subsampling: window elements are read with a stride of two rows, e.g. (x00, x20, x40, x60), (x10, x30, x50, x70), (x20, x40, x60, x80), (x30, x50, x70, x90), (x40, x60, x80, xa0), and then (x01, x21, x41, x61) for the next column.
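A tiny C sketch of such a strided window read (hypothetical helper; it only mirrors the idea that elements within a window are two rows apart, as in the address pattern above):

    /* Read a 4 x 4 window whose rows are 2 apart (subsampled access).     */
    /* Caller must keep row0 + 6 < height and col0 + 3 < width.            */
    void read_window_subsampled(const float *img, int width,
                                int row0, int col0, float win[4][4])
    {
        for (int r = 0; r < 4; r++)
            for (int c = 0; c < 4; c++)
                win[r][c] = img[(row0 + 2 * r) * width + (col0 + c)];
    }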

60 What would be the best compute order? 59 Small memories have low energy per access Area and Latency advantage Big memories can exploit more data reuse

61 Improve by locality-driven synthesis 60 Loop transformations (interchange, tiling) reduce the reuse distance. A huge design space! Use a framework with: reuse detection, cost models, a model of the utilized reuse and of the required buffer size, and optimization for buffer size.
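As a concrete illustration of what tiling does to the earlier fully-connected loop nest, a schematic C version (tile sizes Ti and Tk are free parameters chosen here only for illustration; the real framework picks them from its cost models):

    #define Ti 16   /* tile of output neurons whose partial sums stay local */
    #define Tk 64   /* tile of inputs X[kk..kk+Tk) reused across Ti neurons */

    void fc_layer_tiled(int N, int K, const float *X, const float *W,
                        const float *Bias, float *Y)
    {
        for (int i = 0; i < N; i++)
            Y[i] = Bias[i];                        /* initialise accumulators */

        for (int kk = 0; kk < K; kk += Tk)         /* tile over inputs        */
            for (int ii = 0; ii < N; ii += Ti)     /* tile over outputs       */
                for (int i = ii; i < ii + Ti && i < N; i++)
                    for (int k = kk; k < kk + Tk && k < K; k++)
                        Y[i] += X[k] * W[i * K + k];   /* MACC on tiled data  */
        /* activation pass over Y omitted */
    }

Within one (kk, ii) tile only Tk inputs and Ti partial sums are touched, so the working set fits a small on-chip buffer while each X element is still reused Ti times.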

62 Compared to manually optimized order 61 Up to 13x resource reduction Up to 11x performance increase

63 Memory bandwidth requirements? 62 Data layout transformation Bandwidth up to 150 MB/s Better than an optimized Intel implementation

64 What do we achieve? 63 Flexible architecture template, HLS vision cores, and iteration-reordering models to minimize data transfer. Small but flexible accelerators: up to 13x smaller, up to 11x faster. Power: 4.5 Watt (XPower Analyzer estimate) plus 0.5 Watt for the external RAM.

65 Beyond Energy: Defect-Tolerant Accelerators? 64 A growing number of defects affects the design of micro-architectures. Homogeneous architectures: core redundancy, switch off the defective cores. How about heterogeneous designs? A little story: defect-tolerant accelerators.

66 Defect-Tolerant ANNs 65 Memory decoder. Spatially unfolding a network versus time-multiplexing; power reduction, memory BW.

67 Hardware ANN Robustness 66 ANN 90 inputs 10 outputs Olivier Temam: A Defect-Tolerant Accelerator for Emerging High-Performance Applications, ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2012

68 Convergence of different domains 67 Neurobiology, Machine Learning, Technology Innovations, Constraints, Applications, Neuromorphic

69 Beyond ANNs: Biological NNs 68 Understand the mind by simulating the brain: model perception, model memory, etc. Understand brain diseases: Parkinson's, Alzheimer's, etc. Software simulators: Emergent, NEURON. Table columns: neurons, synapses, Hz.

70 Can computers do the same? 69 Blue Brain Project (IBM/EPFL): molecular level, 10^4 neurons on 10^3 cores. SpiNNaker: integrate & fire, 10^9 neurons on 10^4 ARM9 cores.
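For reference, the integrate & fire abstraction these platforms simulate can be written in a few lines of C; a minimal sketch with placeholder parameters and a simple Euler update (none of this is taken from the SpiNNaker or Blue Brain code):

    /* Leaky integrate-and-fire: v += (I - leak*v)*dt; spike when v >= vth. */
    typedef struct { float v; } lif_neuron;

    int lif_step(lif_neuron *n, float input_current,
                 float leak, float v_thresh, float v_reset, float dt)
    {
        n->v += (input_current - leak * n->v) * dt;  /* capacitive integration */
        if (n->v >= v_thresh) {                      /* threshold crossed      */
            n->v = v_reset;                          /* reset the membrane     */
            return 1;                                /* emit a spike event     */
        }
        return 0;                                    /* no spike this step     */
    }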

71 SpiNNaker Chip Architecture 70

72 SpiNNaker interconnect 71 Connection hierarchy: group neurons to reduce inter-chip communication. 128 MB SDRAM. Small packets; routing tables.

73 Convergence of different domains 72 Neurobiology, Machine Learning, Technology Innovations, Constraints, Applications, Neuromorphic

74 Size 73 Digital CMOS: the technology is available and useful accelerators can be implemented, but it is not dense enough for the largest bio-inspired networks. Analog: a much denser implementation; recall the biological neuron.

75 Analog Spiking Neurons 74 Kirchhoff's law, capacitive integration, leakage; roughly 14 transistors.

76 Architecture: FACETS Project 75 FACETS: integrate & fire neurons on a wafer, 60 million synapses. Most area is used for the synapses, which store the connection strengths. 2-D interconnect.

77 Convergence of different domains 76 Neurobiology, Machine Learning, Technology Innovations, Constraints, Applications, Neuromorphic

78 Synapses as Memristors, Intel (2012) 77 A memristor can be used as a switch; the memristance can also serve as analog storage.

79 Beyond Silicon 78 Infineon NeuroChip (2003) Directly uses biological networks Difficult to connect to other devices

80 Convergence of different domains 79 Neurobiology, Machine Learning, Technology Innovations, Constraints, Applications, Neuromorphic
