Introduction to Neural Networks: Structure and Training


Introduction to Neural Networks: Structure and Training Qi-Jun Zhang Department of Electronics Carleton University, Ottawa, ON, Canada

A Quick Illustration Example: Neural Network Model for Delay Estimation in a High-Speed Interconnect Network

High-Speed VLSI Interconnect Network
[Figure: an interconnect network connecting Drivers 1-3 to Receivers 1-4]

Circuit Representation of the Interconnect Network
[Figure: equivalent circuit with a source (Vp, Tr) and source resistance Rs driving transmission-line sections L1-L4 with delays τ1-τ4, each terminated by an RC load (R1 C1 through R4 C4)]

Need for a Neural Network Model
A PCB contains a large number of interconnect networks, each with different interconnect lengths, terminations, and topology, leading to the need for massive analysis of interconnect networks.
During PCB design/optimization, the interconnect networks need to be adjusted in terms of interconnect lengths, receiver-pin load characteristics, etc., leading to the need for repetitive analysis of interconnect networks.
This necessitates fast and accurate interconnect network models, and the neural network model is a good candidate.

Neural Network Model for Delay Analysis
[Figure: neural network with inputs L1-L4, R1-R4, C1-C4, Rs, Vp, Tr and outputs τ1-τ4]

Simulation Time for 20,000 Interconnect Configurations

Method                      CPU time
Circuit Simulator (NILT)    34.43 hours
AWE                         9.56 hours
Neural Network Approach     6.67 minutes

Important Features of Neural Networks
Neural networks have the ability to model multi-dimensional nonlinear relationships.
Neural models are simple and the model computation is fast.
Neural networks can learn and generalize from available data, making model development possible even when component formulae are unavailable.
The neural network approach is generic, i.e., the same modeling technique can be re-used for passive/active devices/circuits.
It is easier to update neural models whenever device or component technology changes.

Neural Network Structures

Neural Network Structures
A neural network contains:
neurons (processing elements)
connections (links between neurons)
A neural network structure defines:
how information is processed inside a neuron
how the neurons are connected
Examples of neural network structures:
multi-layer perceptrons (MLP)
radial basis function (RBF) networks
wavelet networks
recurrent neural networks
knowledge-based neural networks
MLP is the basic and most frequently used structure.

MLP Structure
[Figure: MLP with inputs x1, x2, x3, ..., xn feeding Layer 1 (input layer, N1 neurons), hidden layers 2 through L-1 (N2, ..., N_(L-1) neurons), and output Layer L (N_L neurons) producing y1, y2, ..., ym]

Information Processing in a Neuron
Neuron i in layer l receives the outputs z_0^(l-1), z_1^(l-1), ..., z_(N_(l-1))^(l-1) of the previous layer (with z_0^(l-1) = 1 as the bias input), weights them by w_i0^l, w_i1^l, ..., w_iN_(l-1)^l, and sums them to form γ_i^l; the activation function σ(.) then produces the neuron output:
γ_i^l = Σ_(j=0)^(N_(l-1)) w_ij^l z_j^(l-1)
z_i^l = σ(γ_i^l)
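
A minimal sketch of this neuron computation (assuming a sigmoid activation and NumPy arrays; the function and variable names are illustrative, not part of the original slides):

```python
import numpy as np

def sigmoid(gamma):
    """Sigmoid activation sigma(gamma) = 1 / (1 + exp(-gamma))."""
    return 1.0 / (1.0 + np.exp(-gamma))

def neuron_output(z_prev, w_i):
    """Output z_i^l of neuron i in layer l.

    z_prev : outputs of layer l-1, shape (N_{l-1},)
    w_i    : weights w_i0 ... w_iN_{l-1}, shape (N_{l-1} + 1,),
             where w_i0 multiplies the fixed bias input z_0 = 1
    """
    z_with_bias = np.concatenate(([1.0], z_prev))  # prepend bias input z_0 = 1
    gamma_i = np.dot(w_i, z_with_bias)             # weighted sum gamma_i^l
    return sigmoid(gamma_i)                        # z_i^l = sigma(gamma_i^l)
```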

Neuron Activation Functions
Input layer neurons simply relay the external inputs to the neural network.
Hidden layer neurons have smooth switch-type activation functions.
Output layer neurons can have simple linear activation functions.

Activation Functions for Hidden Neurons
Sigmoid: σ(γ) = 1 / (1 + e^(-γ))
Arc-tangent: σ(γ) = (2/π) arctan(γ)
Hyperbolic tangent: σ(γ) = (e^γ - e^(-γ)) / (e^γ + e^(-γ))
[Plots of the three activation functions versus γ]
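
The three activation functions written out in NumPy, as a direct transcription of the formulas above:

```python
import numpy as np

def sigmoid(gamma):
    """sigma(gamma) = 1 / (1 + e^-gamma); output range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-gamma))

def arctangent(gamma):
    """sigma(gamma) = (2/pi) * arctan(gamma); output range (-1, 1)."""
    return (2.0 / np.pi) * np.arctan(gamma)

def hyperbolic_tangent(gamma):
    """sigma(gamma) = (e^gamma - e^-gamma) / (e^gamma + e^-gamma); output range (-1, 1)."""
    return np.tanh(gamma)
```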

MLP Structure (layer outputs)
[Figure: the same MLP structure, with x denoting the input vector, z^(1), z^(2), ..., z^(L-1) denoting the outputs of layers 1 through L-1, z^(L) the output of Layer L, and y = (y1, y2, ..., ym) the network output]

3-Layer MLP: Feedforward Computation
Inputs: x1, x2, x3
Hidden neuron values: z_k = tanh( Σ_i w_ki x_i )
Outputs: y_j = Σ_k W_jk z_k
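
A minimal NumPy sketch of this feedforward pass (the weight matrices and the example sizes are illustrative assumptions, not values from the slides):

```python
import numpy as np

def mlp_forward(x, w_hidden, W_output):
    """Feedforward pass of a 3-layer MLP.

    x        : input vector, shape (n,)
    w_hidden : hidden-layer weights w_ki, shape (num_hidden, n)
    W_output : output-layer weights W_jk, shape (m, num_hidden)
    """
    z = np.tanh(w_hidden @ x)   # hidden neuron values z_k = tanh(sum_i w_ki x_i)
    y = W_output @ z            # outputs y_j = sum_k W_jk z_k
    return y

# Example with 3 inputs, 4 hidden neurons, 2 outputs
rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 0.3])
w_hidden = rng.normal(size=(4, 3))
W_output = rng.normal(size=(2, 4))
print(mlp_forward(x, w_hidden, W_output))
```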

How can an ANN represent an arbitrary nonlinear input-output relationship?
Universal Approximation Theorem (Cybenko, 1989; Hornik, Stinchcombe and White, 1989)
In plain words: given enough hidden layer neurons, a 3-layer MLP neural network can approximate an arbitrary continuous multidimensional function to any desired accuracy.

How many hidden neurons are needed?
The number of hidden neurons depends upon the degree of nonlinearity and the dimension of the original problem. Highly nonlinear and high-dimensional problems need more neurons, while smoother and lower-dimensional problems need fewer neurons.
Ways to determine the number of hidden neurons:
experience
empirical criteria
adaptive schemes
software tool internal estimation

Development of Neural Network Models

Notation
y = y(x, w): ANN model
x: inputs of the given modeling problem or ANN
y: outputs of the given modeling problem or ANN
w: weight parameters in the ANN
d: data of y from simulation or measurement

Define Model Input-Output and Generate Data
Define the model input-output (x, y), for example,
x: physical/geometrical parameters of the component
y: S-parameters of the component
Generate (x, y) samples (x_k, d_k), k = 1, 2, ..., P, such that the data set sufficiently represents the behavior of the given x-y problem.
Types of data generator: simulation and measurement.

Where Data Should Be Sampled
[Figure: sample point distributions in the (x1, x2, x3) input space]
Uniform grid distribution
Non-uniform grid distribution
Design of Experiments (DOE) methodology: central-composite design, 2^n factorial design
Star distribution
Random distribution
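
As a simple illustration of two of these sampling schemes (uniform grid and random), assuming inputs normalized to [0, 1]; the dimensions and sample counts below are arbitrary:

```python
import numpy as np
from itertools import product

def uniform_grid_samples(n_dims, points_per_dim):
    """All combinations of equally spaced levels in each input dimension."""
    levels = np.linspace(0.0, 1.0, points_per_dim)
    return np.array(list(product(levels, repeat=n_dims)))

def random_samples(n_dims, n_samples, seed=0):
    """Uniformly random points in the input space."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(n_samples, n_dims))

x_grid = uniform_grid_samples(n_dims=3, points_per_dim=5)  # 125 grid samples
x_rand = random_samples(n_dims=3, n_samples=125)            # 125 random samples
```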

Training and Test Data Sets
The overall data should be divided into two sets: training and test.
Notation:
Tr - index set of training data
Te - index set of test data
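
A simple way to form the two index sets (the random split and the 80/20 ratio are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def split_indices(num_samples, train_fraction=0.8, seed=0):
    """Randomly partition sample indices into training (Tr) and test (Te) sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_samples)
    n_train = int(train_fraction * num_samples)
    Tr = perm[:n_train]   # index set of training data
    Te = perm[n_train:]   # index set of test data
    return Tr, Te

Tr, Te = split_indices(num_samples=125)
```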

Error Definitions
Training error:
E_Tr(w) = Σ_(k∈Tr) Σ_(j=1)^(m) ( y_j(x_k, w) - d_jk )^2
The test error E_Te can be similarly defined.
Training objective: adjust w to minimize E_Tr. The update of w is carried out using the information E_Tr(w) and ∂E_Tr(w)/∂w.
At the end of training, the quality of the neural model can be tested using the test error E_Te.
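
A direct transcription of the training-error sum (model stands for any callable returning the ANN outputs y(x_k, w), e.g. the mlp_forward sketch above; the names are illustrative):

```python
import numpy as np

def training_error(model, w, x_data, d_data, Tr):
    """E_Tr(w) = sum over k in Tr, j = 1..m of (y_j(x_k, w) - d_jk)^2.

    model  : callable returning the ANN outputs y(x_k, w)
    x_data : input samples, shape (P, n)
    d_data : desired outputs, shape (P, m)
    Tr     : index set of training data
    """
    error = 0.0
    for k in Tr:
        y_k = model(x_data[k], w)
        error += np.sum((y_k - d_data[k]) ** 2)
    return error
```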

Neural Network Training
The error between the training data and the neural network outputs is fed back to the neural network to guide the internal weight update of the network.
[Figure: training loop in which input x drives the neural network (weights w) to produce output y, which is compared against the training data d to form the training error]

Typical Training Process
Step 1: w = initial guess, set epoch = 0
Step 2: If E_Tr(epoch) < required_accuracy, or if epoch > max_epoch, then STOP
Step 3: Compute E_Tr(w) and ∂E_Tr(w)/∂w using all samples in the training data set
Step 4: Use an optimization algorithm to find Δw and update the weights: w ← w + Δw
Step 5: Set epoch = epoch + 1 and GO TO Step 2
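
A schematic gradient-descent version of this loop; error_fn and gradient_fn stand for whatever supplies E_Tr(w) and ∂E_Tr/∂w (e.g. backpropagation), and all names and default values are illustrative assumptions:

```python
import numpy as np

def train(w, error_fn, gradient_fn, required_accuracy=1e-4,
          max_epoch=10000, eta=0.01):
    """Typical training loop: stop when accuracy is reached or the epoch limit is hit."""
    epoch = 0                                    # Step 1: initial guess w, epoch = 0
    while True:
        E_Tr = error_fn(w)                       # Step 3: training error E_Tr(w)
        if E_Tr < required_accuracy or epoch > max_epoch:
            break                                # Step 2: stopping criteria
        grad = gradient_fn(w)                    # Step 3: dE_Tr/dw over all samples
        delta_w = -eta * grad                    # Step 4: optimization step (here, steepest descent)
        w = w + delta_w                          # Step 4: w <- w + delta_w
        epoch += 1                               # Step 5: next epoch
    return w
```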

Gradient-based Training Algorithms
Δw = η h
where h is the direction of the update of w, and η is the step size.
Gradient-based methods use the information E_Tr(w) and ∂E_Tr(w)/∂w to determine the update direction h.
The step size η is determined in one of the following ways:
a small value, either fixed or adaptive during training
line minimization to find the best value of η
Examples of algorithms: backpropagation, conjugate gradient, and quasi-Newton.

Example: Backpropagation (BP) Training (Rumelhart, Hinton, Williams, 1986)
In the gradient algorithm, Δw = η h. Let the update direction h be the negative gradient direction; then
Δw = -η ∂E_Tr(w)/∂w
or, with a momentum term,
Δw = -η ∂E_Tr(w)/∂w + β Δw_(epoch-1)
where η is called the learning rate and β is called the momentum factor.
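
A sketch of this weight update with momentum (the gradient is assumed to be supplied, e.g. by backpropagation through the network; function names and default η, β values are illustrative):

```python
import numpy as np

def bp_update(w, grad_E_Tr, delta_w_prev, eta=0.01, beta=0.9):
    """One BP weight update: delta_w = -eta * dE_Tr/dw + beta * delta_w_prev."""
    delta_w = -eta * grad_E_Tr + beta * delta_w_prev
    return w + delta_w, delta_w

# Usage inside a training loop (illustrative):
# w, delta_w = bp_update(w, gradient_fn(w), delta_w)
```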

Example: FET Modeling Using Neural Networks

Neural Model for MESFET Modeling
[Figure: MESFET with source, gate, and drain; gate length L, gate width W, channel thickness a, and doping density Nd; the neural model maps the inputs L, W, a, Nd, Vgs, Vds to the drain current Id]

FET I-V curves: Neural model vs. physics-based test data
[Plot: Id (mA) versus Vd (V) for gate voltages Vg = 0 V, -1 V, -2 V, -3 V, -4 V, -5 V]