Neural Networks

Andrew Kusiak
Intelligent Systems Laboratory
2139 Seamans Center
Iowa City, IA 52242-1527
andrew-kusiak@uiowa.edu
http://www.icaen.uiowa.edu/~ankusiak
Tel. 319-335-5934

Neural Network Input
[Figure: inputs Age = 51 and Temperature = 40 C enter Neuron 1 and Neuron 2, which connect to Neuron 3; the network output is Disease A = Yes.]

Neural Network
[Figure: normalized inputs Age = 0.70 and Temperature = 0.85 reach Neuron 3 through connections with Weight 1 = 0.6 and Weight 2 = 0.2; the output is Disease A = 0.59 (= 0.70 * 0.6 + 0.85 * 0.2).]

Building a neural network:
1. Design a neural network structure.
2. Assign weights to the connectors.
3. Train the neural network.
4. Check the stopping criterion, e.g., the training error or network cross-validation. If the stopping criterion is not met, go to Step 3; otherwise go to Step 5.
5. Use the trained neural network for decision-making.
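A minimal MATLAB sketch of the weighted-sum computation in the figure above (the variable names are illustrative, not from the slides):

x = [0.70 0.85];   % normalized inputs: Age, Temperature
w = [0.6 0.2];     % Weight 1, Weight 2
y = x * w'         % y = 0.59, the network's score for Disease A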
When Can NNs Be Used?
- Neural networks are well understood
- Historical data is available
- Applicable to a wide range of problems
- Good results even for complex domains
- Handling of categorical and continuous variables
- Off-the-shelf software is available

Limitations of Neural Networks
- Results are produced in the range 0 to 1
- Results are not explained
- May generate local-optimum solutions (solutions with errors)

Application Examples 1
- Banking: Credit Application Evaluation
- Image Signal Processing: Data Compression
- Financial: Real Estate Appraisal
- Manufacturing: Manufacturing Process Control, Chemical Product Design Analysis
- Robotics: Vision Systems
Application Examples 2
Medicine:
- Predict diagnosis
- Determine diagnostic tests
- Predict length of stay in a hospital
- Predict treatment cost

Biological Neuron vs Processing Element
[Figure: a biological neuron (dendrites, nucleus, cell body, axon, synapse) beside a processing element: inputs x weighted by W1, W2, ..., Wn enter a combination function (a sum with threshold θ) followed by an activation function that produces the output y.]

Neural Networks
- Processing elements
- Weighted connections
- Activation functions

Activation functions:
- Threshold function: f(x) = 1 if x > 0, 0 otherwise
- Sigmoid function: f(x) = 1 / (1 + exp(-x))
- Complex function: f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)), or equivalently 2 / (1 + exp(-2x)) - 1 (the hyperbolic tangent)
- Linear function: f(x) = x

Neural Network Types
Architecture:
- Feedforward
- Feedback (loops)
Learning algorithm:
- Supervised
- Unsupervised
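A minimal MATLAB sketch of these activation functions (the function-handle names are illustrative):

threshold = @(x) double(x > 0);                             % 1 if x > 0, else 0
sigmoid   = @(x) 1 ./ (1 + exp(-x));                        % output in the range 0 to 1
tanh_fn   = @(x) (exp(x) - exp(-x)) ./ (exp(x) + exp(-x));  % hyperbolic tangent
linear    = @(x) x;                                         % identity

sigmoid(0)    % 0.5
tanh_fn(1)    % 0.7616, the same as 2/(1 + exp(-2)) - 1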
Neural Network Types
Supervised feedforward:
- Perceptron
- Multilayer Perceptron
- Cascade Correlation
- ARTMAP (Adaptive Resonance Theory Map)
Unsupervised feedforward:
- ART 1 & 2 (Adaptive Resonance Theory 1 & 2)
- SOM (Self-Organizing Map)
Supervised and unsupervised feedforward:
- RBF (Radial Basis Function)
- Counterpropagation
Supervised feedback:
- Hopfield
- BAM (Bidirectional Associative Memory)
- Boltzmann Machine

Feedforward NN
[Figure, Example 1: inputs i1, i2, i3 connect directly to a single output o; this is analogous to regression analysis.]
[Figure, Example 2: inputs i1, i2, i3 connect to an intermediate layer, which connects to the output o; a more powerful network.]
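A minimal MATLAB sketch of a forward pass through a network like Example 2 (the layer sizes and random weights are illustrative assumptions):

sigmoid = @(x) 1 ./ (1 + exp(-x));
x  = [0.2; 0.5; 0.9];   % three inputs i1, i2, i3
W1 = rand(4, 3);        % weights: inputs -> 4 intermediate neurons
W2 = rand(1, 4);        % weights: intermediate layer -> output o
h  = sigmoid(W1 * x);   % intermediate-layer activations
o  = sigmoid(W2 * h)    % network output in the range 0 to 1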
[Figure, Example 3: inputs i1, i2, i3 connect to a larger intermediate layer and then to the output o. More neurons in the intermediate layer give a more powerful network, with an increased risk of overfitting.]
[Figure, Example 4: inputs i1, i2, i3 connect through an intermediate layer to multiple output values.]

Simple NN Application
[Figure: the inputs $ spent/month, Age, Gender, and Income enter a single-intermediate-layer NN whose outputs, Apparel, Furniture, and Entertainment, give the propensity of making the next purchase in each category.]

Illustrative Learning Rules
- Hebbian (Hebb 1949)
- Winner-Takes-All, i.e., competitive learning (Kohonen 1982)
- Simple error-correcting rule (Rosenblatt 1958)
- Backpropagation error-correcting rule (Werbos 1974)
- Radial Basis Function (RBF) Network
Learning: Hebbian
[Figure: inputs x1, x2, x3 with weights wij feed neuron j, which outputs yj.]
wij(t+1) = wij(t) + η yj(t) xi(t), where 0 < η < 1 is the learning rate

Learning: Winner-Takes-All (competitive learning)
[Figure: inputs x1, x2, x3 feed competing neurons with weight vectors wi.]
Only the winning neuron's weight is updated:
wi(t+1) = wi(t) + η(t) (X - wi(t)), where X is the input vector

Learning: Simple error-correcting rule
[Figure: inputs x1, x2, x3 with weights wij feed a neuron with output y and desired output d.]
wij(t+1) = wij(t) + η (d - y) xi

Learning: Backpropagation error-correcting rule
[Figure: inputs x1, x2, x3 feed a network with output y and desired output d.]
w(t+1) = w(t) + η ∇f(x(t+1), w(t)) (d(t+1) - f(x(t+1), w(t))), where ∇ denotes the gradient
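A minimal MATLAB sketch of one step of the simple error-correcting rule (all values are illustrative assumptions):

eta = 0.1;                  % learning rate, 0 < eta < 1
x = [1; 0; 1];              % inputs x1, x2, x3
w = [-0.2; 0.4; -0.1];      % current weights
d = 1;                      % desired output
y = double(w' * x > 0);     % threshold activation gives y = 0 here
w = w + eta * (d - y) * x   % w(t+1) = w(t) + eta*(d - y)*x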
Learning: Radial Basis Function (RBF) Network
Features:
- Based on approximation and regularization theory
- The global optimum is more likely to be determined
- Short training time
- Simple topology

Mathematical Model of a Neural Network
- Training patterns (x, f(x)) for some unknown function f, a space of possible functions Φ, and an error function E(f̂) defined for f̂ ∈ Φ
- Goal: the function f̂ that minimizes E(f̂)

Generalization
- A NN is to generalize from examples, not to memorize them
- Problems: overfitting and underfitting
[Figure: training and prediction error plotted against network size; the region of good generalization gives way to overfitting as the network grows.]
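A minimal MATLAB sketch of this model-fitting view, with polynomials standing in for the function space Φ (the data and degrees are illustrative assumptions):

x  = linspace(0, 1, 20);               % training inputs
y  = sin(2*pi*x) + 0.2*randn(1, 20);   % training patterns (x, f(x)) with noise
xt = linspace(0, 1, 100);              % held-out inputs for prediction error
yt = sin(2*pi*xt);
for degree = [1 3 12]                  % underfit, adequate, oversized model
    p = polyfit(x, y, degree);         % f_hat minimizing squared error on the training set
    Etrain = mean((polyval(p, x) - y).^2);
    Etest  = mean((polyval(p, xt) - yt).^2);
    fprintf('degree %2d: train %.3f, prediction %.3f\n', degree, Etrain, Etest);
end
% The degree-12 model drives the training error down, but its prediction
% error rises: memorization rather than generalization.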
Number of Training Examples
Given N nodes and W weights, the number of training examples p can be estimated by:
- The Vapnik-Chervonenkis (VC) dimension: p = O((W/ε) log(N/ε)), where ε is the tolerated error rate
- Widrow's rule of thumb: p ≥ 10W

Steps for Building a Neural Network Model
- Identify the problem
- Collect data (choose the training set)
- Preprocess the data
- Implement a neural network

Choosing the Training Set
- The number of features:
  - NN training time depends on the number of features
  - The number of examples increases with the number of NN inputs
- Coverage of the range of values of all features
- The number of outputs:
  - Sufficient coverage in the examples

NN Implementation
- What kind of architecture is best for modeling the underlying problem?
- Which learning algorithm can achieve the best generalization?
- What size network gives the best generalization?
- How many training instances are required for good generalization?
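A quick worked check of the two sizing rules for the Example 2 network sketched earlier, 3 inputs -> 4 intermediate neurons -> 1 output (the dimensions and ε are illustrative):

W = 3*4 + 4*1;                        % W = 16 weights
p_widrow = 10 * W                     % Widrow's rule: at least 160 examples
N = 8; epsilon = 0.1;                 % 8 nodes, 10% tolerated error
p_vc = (W/epsilon) * log(N/epsilon)   % ~701 examples, up to the hidden constant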
Implementation Sequence
1. Normalize data
2. Design a network, e.g., #layers, #neurons, and the transfer function for each layer
3. Set training parameters
4. Provide initial weights and save the weights if necessary
5. Train with a learning algorithm
6. Test

Matlab Implementation of NNs

Getting started and quitting:
% matlab
>> ...
>> quit

Online help:
>> help
>> help which
>> help lookfor
>> help diary

Variables:
>> a = 1
>> a = 1;
>> A = 2
>> x = 4\1          % x = 1/4
>> C = 'a string'   % Don't forget the quotes
>> b = [1 2 3]
>> d = 1:3
>> e = 3:-1:1       % e = [3 2 1]
>> g = [1 2 3 ...
4 5]                % g = [1 2 3 4 5]

Matrices:
>> B = [1 2 3; 4 5 6; 7 8 9]
>> P = [10 11 12], B = [B; P]
>> zeros(3, 2)
>> ones(2, 3)
>> B(1, 2)
>> B(:, 3)
>> B(2, :)
>> B(:, [1 3])
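A minimal MATLAB sketch of Step 1, min-max normalization to the range 0 to 1, matching the normalized Age/Temperature inputs on the first slide (the data values are illustrative):

D = [51 40; 23 37; 68 39];        % rows: patients; columns: Age, Temperature
Dmin = min(D);  Dmax = max(D);    % per-column minima and maxima
Dn = (D - Dmin) ./ (Dmax - Dmin)  % each feature scaled to [0, 1] (R2016b+ expansion)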
Matrix Operations and Manipulation
>> C = A*B
>> B = B'              % transpose
>> D = B + 2
>> E = B + B           % E = 2 * B
>> X = [1 2 3], Y = [4 5 6], Z = X.*Y
>> X = A\B             % solves A*X = B
>> X = B/A             % solves X*A = B
>> Y(:, [1 3]) = []    % delete columns 1 and 3 of Y
>> Y = []              % empty matrix

NN and Genetic Algorithm
[Figure: Input 1 and Input 2 feed Neuron 1 and Neuron 2, which connect to Neuron 3 through Weight 1 and Weight 2; the two weights are encoded as the binary strings 11011000 and 11001111.]

NN and Evolutionary Computation
[Figure: a network with an input layer, an intermediate layer, and an output layer whose weights are tuned by an evolutionary computation algorithm.]
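A minimal MATLAB sketch of the weight encoding in the figure above; the decoding scheme and the range [-1, 1] are illustrative assumptions, not specified on the slides:

genes = ['11011000'; '11001111'];   % chromosome segments for Weight 1 and Weight 2
ints  = bin2dec(genes);             % 216 and 207
w     = -1 + 2*ints/255             % map 0..255 linearly onto [-1, 1]
% A genetic algorithm would mutate and recombine the bit strings, keeping
% the networks whose decoded weights yield the smallest training error.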