Introduction to Neural Networks: Structure and Training
Qi-Jun Zhang
Department of Electronics
Carleton University, Ottawa, ON, Canada
A Quick Illustration Example: Neural Network Model for Delay Estimation in a High-Speed Interconnect Network
High-Speed VLSI Interconnect Network
[Figure: interconnect network linking Drivers 1-3 and Receivers 1-4]
Circuit Representation of the Interconnect Network
[Figure: source (Rs, Vp, Tr) driving interconnect lines L1-L4 with delays τ1-τ4, terminated by receiver loads R1/C1 through R4/C4]
Need for a Neural Network Model
- A PCB contains a large number of interconnect networks, each with different interconnect lengths, terminations, and topology, leading to the need for massive analysis of interconnect networks
- During PCB design/optimization, the interconnect networks need to be adjusted in terms of interconnect lengths, receiver-pin load characteristics, etc., leading to the need for repetitive analysis of interconnect networks
- This necessitates fast and accurate interconnect network models, and a neural network model is a good candidate
Neural Network Model for Delay Analysis
[Figure: neural network with inputs L1-L4, R1-R4, C1-C4, Rs, Vp, Tr, hidden neurons e1-e3, and outputs τ1-τ4]
Simulation Time for 20,000 Interconnect Configurations

Method                      CPU time
Circuit simulator (NILT)    34.43 hours
AWE                         9.56 hours
Neural network approach     6.67 minutes
Important Features of Neural Networks
- Neural networks have the ability to model multi-dimensional nonlinear relationships
- Neural models are simple, and model computation is fast
- Neural networks can learn and generalize from available data, making model development possible even when component formulae are unavailable
- The neural network approach is generic, i.e., the same modeling technique can be re-used for passive/active devices/circuits
- It is easier to update neural models whenever device or component technology changes
Neural Network Structures
Neural Network Structures
A neural network contains:
- neurons (processing elements)
- connections (links between neurons)
A neural network structure defines:
- how information is processed inside a neuron
- how the neurons are connected
Examples of neural network structures:
- multi-layer perceptrons (MLP)
- radial basis function (RBF) networks
- wavelet networks
- recurrent neural networks
- knowledge-based neural networks
MLP is the basic and most frequently used structure
MLP Structure
[Figure: MLP with input layer 1 (x1, x2, x3, ..., xn), hidden layers 2 through L-1, and output layer L (y1, y2, ..., ym)]
Information Processing in a Neuron
[Figure: neuron i of layer l forms a weighted sum of the previous layer's outputs and passes it through an activation function σ(·)]
$\gamma_i^l = \sum_{j=0}^{N_{l-1}} w_{ij}^l \, z_j^{l-1}$, with $z_0^{l-1} = 1$ (bias), and $z_i^l = \sigma(\gamma_i^l)$
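As a concrete illustration, a minimal sketch of this per-neuron computation in Python with NumPy (function and variable names here are our own, not from the slides):

```python
import numpy as np

def neuron_output(z_prev, w, sigma=np.tanh):
    """Output z_i^l of one neuron: weighted sum of the previous layer's
    outputs (with a bias term z_0 = 1) passed through activation sigma."""
    z_aug = np.concatenate(([1.0], z_prev))   # prepend bias input z_0^(l-1) = 1
    gamma = np.dot(w, z_aug)                  # gamma_i^l = sum_j w_ij^l z_j^(l-1)
    return sigma(gamma)                       # z_i^l = sigma(gamma_i^l)

# Example: a neuron with 3 inputs (4 weights including the bias weight w_i0)
z_prev = np.array([0.5, -0.2, 0.8])
w = np.array([0.1, 0.4, -0.3, 0.2])
print(neuron_output(z_prev, w))
```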
Neuron Activation Functions
- Input layer neurons simply relay the external inputs to the neural network
- Hidden layer neurons have smooth switch-type activation functions
- Output layer neurons can have simple linear activation functions
Activation Functions for Hidden Neurons
- Sigmoid: $\sigma(\gamma) = 1/(1 + e^{-\gamma})$
- Arc-tangent: $\sigma(\gamma) = (2/\pi)\arctan(\gamma)$
- Hyperbolic tangent: $\sigma(\gamma) = (e^{\gamma} - e^{-\gamma})/(e^{\gamma} + e^{-\gamma})$
[Figure: plots of the three activation functions]
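These three functions translate directly into code; a minimal sketch (NumPy; the function names are our own):

```python
import numpy as np

def sigmoid(gamma):
    # sigma(gamma) = 1 / (1 + e^-gamma), output range (0, 1)
    return 1.0 / (1.0 + np.exp(-gamma))

def arctangent(gamma):
    # sigma(gamma) = (2/pi) * arctan(gamma), output range (-1, 1)
    return (2.0 / np.pi) * np.arctan(gamma)

def hyperbolic_tangent(gamma):
    # sigma(gamma) = (e^gamma - e^-gamma) / (e^gamma + e^-gamma), range (-1, 1)
    return np.tanh(gamma)

gammas = np.linspace(-5.0, 5.0, 5)
for f in (sigmoid, arctangent, hyperbolic_tangent):
    print(f.__name__, f(gammas))
```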
MLP Structure (layer outputs)
[Figure: the same MLP annotated with layer output vectors: x at input layer 1, z^(1), z^(2), ..., z^(L-1) for the hidden layers, and z^(L) = y at output layer L]
3-Layer MLP: Feedforward Computation
[Figure: three inputs x1-x3, four hidden neurons z1-z4, and outputs y1, y2]
Hidden neuron values: $z_k = \tanh\left(\sum_i w_{ki} x_i\right)$
Outputs: $y_j = \sum_k W_{jk} z_k$
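A minimal sketch of this feedforward computation (NumPy; the names `W_hidden` and `W_out` are our own, and biases are omitted as in the formulas above):

```python
import numpy as np

def mlp_forward(x, W_hidden, W_out):
    """3-layer MLP: z_k = tanh(sum_i w_ki x_i), y_j = sum_k W_jk z_k."""
    z = np.tanh(W_hidden @ x)   # hidden neuron values z_1..z_4
    y = W_out @ z               # linear output neurons y_1, y_2
    return y

# Example: 3 inputs, 4 hidden neurons, 2 outputs (as in the figure);
# random weights stand in for trained values
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))
W_out = rng.normal(size=(2, 4))
print(mlp_forward(np.array([0.1, 0.5, -0.3]), W_hidden, W_out))
```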
How can an ANN represent an arbitrary nonlinear input-output relationship?
Universal Approximation Theorem (Cybenko, 1989; Hornik, Stinchcombe and White, 1989)
In plain words: given enough hidden layer neurons, a 3-layer MLP neural network can approximate an arbitrary continuous multidimensional function to any desired accuracy
How many hidden neurons are needed?
- The number of hidden neurons depends upon the degree of nonlinearity and the dimension of the original problem
- Highly nonlinear and high-dimensional problems need more neurons, while smoother and lower-dimensional problems need fewer neurons
- Ways to determine the number of hidden neurons:
  - experience
  - empirical criteria
  - adaptive schemes
  - internal estimation by a software tool
Development of Neural Network Models
Notation
- y = y(x, w): ANN model
- x: inputs of the given modeling problem or ANN
- y: outputs of the given modeling problem or ANN
- w: weight parameters of the ANN
- d: data for y from simulation or measurement
Define Model Input-Output and Generate Data
- Define the model input-output (x, y), for example:
  - x: physical/geometrical parameters of the component
  - y: S-parameters of the component
- Generate (x, y) samples $(x_k, d_k)$, k = 1, 2, ..., P, such that the data set sufficiently represents the behavior of the given x-y problem
- Types of data generator: simulation and measurement
Where Data Should Be Sampled
- Uniform grid distribution
- Non-uniform grid distribution
- Design of Experiments (DOE) methodology:
  - central-composite design
  - 2^n factorial design
- Star distribution
- Random distribution
[Figure: sample distributions in a 3-dimensional input space (x1, x2, x3)]
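As an illustration, a sketch contrasting uniform-grid and random sampling of a 2-parameter input space (the ranges and sample counts below are arbitrary choices, not from the slides):

```python
import numpy as np

# Uniform grid distribution: a 5 x 5 grid over the (x1, x2) input space
x1 = np.linspace(0.0, 1.0, 5)
x2 = np.linspace(0.0, 1.0, 5)
grid_samples = np.array([(a, b) for a in x1 for b in x2])   # 25 samples

# Random distribution: 25 points drawn uniformly over the same space
rng = np.random.default_rng(1)
random_samples = rng.uniform(0.0, 1.0, size=(25, 2))

print(grid_samples.shape, random_samples.shape)
```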
Training and Test Data Sets
The overall data should be divided into two sets: training and test.
Notation:
- Tr: index set of training data
- Te: index set of test data
Error Definitions
Training error:
$$E_{Tr}(w) = \sum_{k \in T_r} \sum_{j=1}^{m} \left( y_j(x_k, w) - d_{jk} \right)^2$$
The test error $E_{Te}$ can be similarly defined.
Training objective: adjust w to minimize $E_{Tr}$; the update of w is carried out using the information in $E_{Tr}(w)$ and $\partial E_{Tr}(w)/\partial w$.
At the end of training, the quality of the neural model can be tested using the test error $E_{Te}$.
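A direct transcription of the training-error formula (names are our own; `model` is any y(x, w), such as `mlp_forward` above):

```python
import numpy as np

def training_error(model, x_data, d_data, train_idx):
    """E_Tr = sum over training samples k and outputs j of
    (y_j(x_k, w) - d_jk)^2."""
    err = 0.0
    for k in train_idx:            # k in the training index set Tr
        y = model(x_data[k])       # model outputs y(x_k, w)
        err += np.sum((y - d_data[k]) ** 2)
    return err

# Toy usage: a model that exactly matches the data gives E_Tr = 0
x_data = np.array([[0.0], [1.0], [2.0], [3.0]])
d_data = 2.0 * x_data
print(training_error(lambda x: 2.0 * x, x_data, d_data, train_idx=[0, 1, 2]))
```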
Neural Network Training
The error between the training data and the neural network outputs is fed back to the neural network to guide the internal weight updates of the network.
[Figure: training loop: inputs x feed the neural network (weights w); outputs y are compared with data d to form the training error, which drives the weight update]
Typical Training Process
Step 1: w = initial guess; set epoch = 0
Step 2: If $E_{Tr}$(epoch) < required_accuracy, or if epoch > max_epoch, then STOP
Step 3: Compute $E_{Tr}(w)$ and $\partial E_{Tr}(w)/\partial w$ using all samples in the training data set
Step 4: Use an optimization algorithm to find Δw and update the weights: w ← w + Δw
Step 5: Set epoch = epoch + 1 and GO TO Step 2
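These steps map onto a training-loop skeleton like the following (a sketch; `error_fn`, `grad_fn`, and `step_fn` are caller-supplied placeholders of our own naming, standing in for Steps 3 and 4):

```python
import numpy as np

def train(w, error_fn, grad_fn, step_fn, required_accuracy=1e-3, max_epoch=1000):
    """Generic training loop following Steps 1-5 of the slide."""
    epoch = 0                                   # Step 1: w is the initial guess
    while True:
        E_tr = error_fn(w)                      # Step 3: E_Tr(w) over all samples
        if E_tr < required_accuracy or epoch > max_epoch:
            return w                            # Step 2: stop on accuracy/epochs
        grad = grad_fn(w)                       # Step 3: dE_Tr/dw
        delta_w = step_fn(grad)                 # Step 4: optimization algorithm
        w = w + delta_w                         # Step 4: w <- w + delta_w
        epoch += 1                              # Step 5: next epoch

# Toy usage: minimize E_Tr(w) = (w - 3)^2 by steepest descent
w_final = train(np.array(0.0),
                error_fn=lambda w: (w - 3.0) ** 2,
                grad_fn=lambda w: 2.0 * (w - 3.0),
                step_fn=lambda g: -0.1 * g)
print(w_final)
```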
Gradient-based Training Algorithms
$\Delta w = \eta h$, where h is the direction of the update of w and η is the step size.
Gradient-based methods use the information in $E_{Tr}(w)$ and $\partial E_{Tr}(w)/\partial w$ to determine the update direction h.
The step size η is determined in one of the following ways:
- a small value, either fixed or adapted during training
- line minimization to find the best value of η
Examples of algorithms: backpropagation, conjugate gradient, and quasi-Newton
Example: Backpropagation (BP) Training (Rumelhart, Hinton, Williams, 1986)
In the gradient algorithm, $\Delta w = \eta h$. Let the update direction h be the negative gradient direction; then
$$\Delta w = -\eta \frac{\partial E_{Tr}(w)}{\partial w}$$
or, with momentum,
$$\Delta w = -\eta \frac{\partial E_{Tr}(w)}{\partial w} + \beta \, \Delta w_{epoch-1}$$
where η is called the learning rate and β is called the momentum factor.
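The BP update with momentum, in code (a minimal sketch; the η and β values below are arbitrary illustrative choices):

```python
def bp_update(w, grad, delta_w_prev, eta=0.1, beta=0.9):
    """delta_w = -eta * dE_Tr/dw + beta * delta_w_(epoch-1)."""
    delta_w = -eta * grad + beta * delta_w_prev
    return w + delta_w, delta_w   # updated weights and momentum memory

# Example: one update step on a scalar weight
w, dw = bp_update(w=0.5, grad=0.2, delta_w_prev=0.0)
print(w, dw)
```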
Example: FET Modeling Using Neural Networks
Neural Model for MESFET Modeling
[Figure: MESFET structure with gate length L, gate width W, channel thickness a, and doping density Nd; the neural model maps inputs L, W, a, Nd, Vgs, Vds to the drain current Id]
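For this example, the model inputs and output could be assembled as below (a sketch only; the hidden-layer size, weights, and input values are arbitrary stand-ins, and in practice the inputs would be scaled and the weights obtained by training):

```python
import numpy as np

# Model input x: device geometry/physics and bias; output: drain current Id
# x = [L, W, a, Nd, Vgs, Vds]  ->  y = [Id]
rng = np.random.default_rng(2)
W_hidden = rng.normal(size=(10, 6))   # 10 hidden neurons, 6 inputs
W_out = rng.normal(size=(1, 10))      # 1 output neuron: Id

x = np.array([0.5, 100.0, 0.1, 1e17, -1.0, 3.0])  # illustrative values only
Id = W_out @ np.tanh(W_hidden @ x)                # same 3-layer MLP computation
print(Id)
```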
FET I-V Curves: Neural Model vs. Physics-Based Test Data
[Figure: Id (mA) versus Vd (V) curves for Vg = 0 V, -1 V, -2 V, -3 V, -4 V, -5 V, comparing the neural model with physics-based test data]