CS30 - Neural Nets in Python


We will experiment with neural networks using a simple software package written in Python. You can find the package at:

    http://www.cs.pomona.edu/~dkauchak/classes/cs30/assignments/assign6/cs30neural.txt

Create a new file in Wing, copy the contents of this file, and save it as cs30neural.py.

1 Usage

It is easy to use the package. Here is a simple example of creating, training, and testing a network that has two input nodes, five hidden nodes, and one output node.

    nn = neuralnet(2, 5, 1)
    nn.train(xortrainingdata)      # notice that xortrainingdata is defined below
    print nn.test(xortestingdata)

During training, we want to give the network labeled examples. An example consists of a tuple of two things: (input, output).

All input and output values are specified as lists. For example, ([1.0, 1.0], [0.0]) represents an input-output pair mapping the input 1, 1 to the output 0. The training data is then a list of input-output pairs. For the classic XOR example, the training data would be as follows:

    xortrainingdata = [([0.0, 0.0], [0.0]),
                       ([0.0, 1.0], [1.0]),
                       ([1.0, 0.0], [1.0]),
                       ([1.0, 1.0], [0.0])]

The testing data is presented in the same format. If you apply the model to test data, you will get back a list of triples:

    (input, desired-output, actual-output)

This format is convenient for further analysis by computer, but it is difficult to read when printed directly. You might want to print it one triple to a line:

    for triple in nn.test(testingdata):
        print triple

If you unpack the triple, you can ask more interesting questions, for example to print only the differences between the desired and actual outputs:

    for triple in nn.test(testingdata):
        (input, desired, actual) = triple  # unpack the triple
        # remember outputs are lists, so we need to get the first item from the list
        print desired[0] - actual[0]

If you want to evaluate just one input, you may do it with the evaluate function. To see the output for a particular list of inputs, you would write the following.

    print nn.evaluate(inputlist)

2 The Neural Network Class

The software we're playing with represents a neural network as an object in the neuralnet class. We've already seen instances of this idea: for example, lists are a class of objects.
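Putting these pieces together, here is a sketch of a complete XOR experiment. The import line is an assumption about how you have organized your code (your program sits in its own file next to cs30neural.py), and testing on the training data is done only for illustration.

    from cs30neural import *    # assumes cs30neural.py provides the neuralnet class

    xortrainingdata = [([0.0, 0.0], [0.0]),
                       ([0.0, 1.0], [1.0]),
                       ([1.0, 0.0], [1.0]),
                       ([1.0, 1.0], [0.0])]

    nn = neuralnet(2, 5, 1)      # 2 input nodes, 5 hidden nodes, 1 output node
    nn.train(xortrainingdata)    # 1000 passes over the data by default

    # print each desired output next to the actual output of the trained network
    for triple in nn.test(xortrainingdata):
        (input, desired, actual) = triple
        print input, desired[0], actual[0]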

The main difference between a list object and our neural network object is that there is no special syntax for constructing neural network objects (to create lists we use the square brackets, []). Instead, there is a special function called a constructor that takes some number of arguments (in our case, three) and creates a neural network object. Once you have that object, you can call methods on it just like you would on any other object.

In the following summary, the neural network object is named nn. You may, of course, give your networks any name you like.

    nn = neuralnet(ninput, nhidden, noutput)

This calls the constructor and creates a network with the specified number of nodes in each layer. The initial weights are random values between -2.0 and 2.0. Notice that all of the other methods below are called on a neural network object that has already been created.

    nn.evaluate(input)

Returns the output of the neural network when it is presented with the given input. Remember that the input and output are lists.

    nn.train(trainingdata)

Carries out a training cycle. As specified earlier, the training data is a list of input-output pairs. There are four optional arguments to the train function:

    learningrate defaults to 0.5.

    momentumfactor defaults to 0.1. The idea of momentum is discussed in the next section. Set it to 0 to suppress the effect of momentum in the calculation.

    iterations defaults to 1000. It specifies the number of passes over the training data.

    printinterval defaults to 100. The value of the error is displayed after every printinterval passes over the data; we hope to see the value decreasing. Set the value to 0 if you do not want to see the error values.

You may specify some, or all, of the optional arguments by name in the following format.

    nn.train(trainingdata, learningrate=0.8, momentumfactor=0.0, iterations=100, printinterval=5)

    nn.test(testingdata)

Evaluates the network on a list of examples. As mentioned above, the testing data is presented in the same format as the training data. The result is a list of triples which can be used in further evaluation.

    nn.getihweights()

Returns a list of lists representing the weights between the input and hidden layers. If there are t input nodes and u hidden nodes, the result will be a list containing t lists of length u. (See the example following this list.)

    nn.gethoweights()

Returns a list of lists representing the weights between the hidden and output layers. If there are u hidden nodes and v output nodes, the result will be a list containing u lists of length v.

    nn.usetanh()

Changes the activation function from the sigmoid function to a commonly used alternative, the hyperbolic tangent function.
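For example, one way to peek at the weight matrices of a small, untrained network is sketched below. The exact sizes of the lists you see depend on how the package handles the bias nodes described in the next section.

    nn = neuralnet(2, 3, 1)

    # weights between the input and hidden layers, one list per input node
    for row in nn.getihweights():
        print row

    # weights between the hidden and output layers, one list per hidden node
    for row in nn.gethoweights():
        print row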

3 Theory and Implementation Details

For those who are curious, this section gives more detail on exactly how the neural network learning algorithm works. This material goes beyond what I expect you to know for the class; however, I've included it here in case you're curious and would like to learn more.

Data values. Suppose that we have a network with t input nodes, u hidden nodes, and v output nodes. The state of the network is represented by the values below.

    I_0  I_1  ...  I_{t-1}  I_t
    H_0  H_1  ...  H_{u-1}  H_u
    O_0  O_1  ...  O_{v-1}

Notice that there is an extra input node and an extra hidden node. These are the bias nodes; I_t and H_u always have the value 1.0. The bias nodes are handled automatically; you do not have to account for them in the code you write.

There are two matrices for weights. The value W^{IH}_{i,j} is the weight assigned to the link between input node i and hidden node j, and W^{HO}_{j,k} is the weight between hidden node j and output node k. The two matrices are of size (t+1) x (u+1) and (u+1) x v, respectively.

The change matrices keep track of the most recent changes to the weights. The values are C^{IH}_{i,j} and C^{HO}_{j,k}, with notation analogous to the weights.

Activation function. The most commonly used activation function is the sigmoid function, defined by σ(x) = 1/(1 + e^{-x}). Its derivative is given by the formula D_σ(x) = σ(x)(1 - σ(x)). The values of the sigmoid function are always between 0.0 and 1.0. The activation function is used to compute the value at the hidden and output nodes. Its derivative is used to compute adjustments to the weights.

An alternative to the sigmoid function is the hyperbolic tangent, given by the formula tanh(x) = (e^x - e^{-x})/(e^x + e^{-x}), and its derivative is given by the formula D_tanh(x) = 1 - tanh^2(x). The values of the hyperbolic tangent range between -1.0 and 1.0.

Computing the values. When presented with t input values, the network is evaluated by computing the hidden values and then the output values. The input values are assigned to I_0 through I_{t-1}. (Remember that I_t is always 1.0.) The jth hidden value is

    H_j = σ( I_0 W^{IH}_{0,j} + I_1 W^{IH}_{1,j} + ... + I_t W^{IH}_{t,j} )
        = σ( Σ_{i=0}^{t} I_i W^{IH}_{i,j} )
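As a purely illustrative sketch (not the package's actual code), the hidden values could be computed in plain Python as follows. The names sigmoid, hiddenvalues, and ihWeights are made up for the example, and inputs is assumed to already include the bias value 1.0 at the end.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def hiddenvalues(inputs, ihWeights, nhidden):
        # H_j = sigma( sum over i of I_i * W^IH_{i,j} )
        hidden = []
        for j in range(nhidden):
            total = 0.0
            for i in range(len(inputs)):
                total += inputs[i] * ihWeights[i][j]
            hidden.append(sigmoid(total))
        return hidden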

Once we know all the hidden values, we can similarly compute the output values. The kth output value is

    O_k = σ( H_0 W^{HO}_{0,k} + H_1 W^{HO}_{1,k} + ... + H_u W^{HO}_{u,k} )
        = σ( Σ_{j=0}^{u} H_j W^{HO}_{j,k} )

Training the network. We start with a collection of known input and output values. For a single input-output pair, we evaluate the input to obtain the actual output values O_0 through O_{v-1}. We compare these values with the desired output values Q_0 through Q_{v-1} to obtain the error, or required change, at each output node. The change at output node k is

    δ^O_k = (Q_k - O_k) D_σ(O_k)

Notice that the change is multiplied by the derivative of the activation function at the actual output value.

We apportion the responsibility for the error at an output node to the hidden nodes according to the weights. The change at hidden node j is

    δ^H_j = ( W^{HO}_{j,0} δ^O_0 + W^{HO}_{j,1} δ^O_1 + ... + W^{HO}_{j,v-1} δ^O_{v-1} ) D_σ(H_j)
          = ( Σ_{k=0}^{v-1} W^{HO}_{j,k} δ^O_k ) D_σ(H_j)

The next step is to adjust the weights between the input and hidden layer, using the errors in the hidden layer. The weight W^{IH}_{i,j} becomes

    W^{IH}_{i,j} + L δ^H_j I_i + M C^{IH}_{i,j}

The weight is adjusted by two terms. One involves the learning rate, L. It is usually a value between 0 and 1. In our software, the default value is 0.5. A higher rate may make the learning go faster, but it may also take too large steps, causing the weights to gyrate and never settle down. The other term involves the momentum factor, M. It tries to capitalize on the idea that if we moved a weight in one direction the last time, it may speed up the learning if we move it a little further in the same direction this time. The momentum factor is usually rather small; the default value in our software is 0.1. In order to use the momentum factor, we have to remember the previous change. We record it in the change matrix after updating W^{IH}_{i,j}, by setting C^{IH}_{i,j} to δ^H_j I_i.

Finally, we update the weights and changes between the hidden and output layers in the same manner. The weight W^{HO}_{j,k} becomes

    W^{HO}_{j,k} + L δ^O_k H_j + M C^{HO}_{j,k}

and the change C^{HO}_{j,k} becomes δ^O_k H_j.
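The following sketch transcribes the output deltas, hidden deltas, and input-to-hidden weight update into plain Python. Again, this is an illustration of the formulas rather than the package's implementation, and all of the names (dsigma, updateihweights, hoWeights, ihChanges, and so on) are invented for the example.

    def dsigma(value):
        # derivative of the sigmoid expressed in terms of the node's value:
        # if value = sigma(x), then D_sigma(x) = value * (1 - value)
        return value * (1.0 - value)

    def updateihweights(inputs, hidden, outputs, desired,
                        ihWeights, hoWeights, ihChanges, L=0.5, M=0.1):
        # deltas at the output nodes: (Q_k - O_k) * derivative at O_k
        outputDeltas = []
        for k in range(len(outputs)):
            outputDeltas.append((desired[k] - outputs[k]) * dsigma(outputs[k]))

        # deltas at the hidden nodes: (sum over k of W^HO_{j,k} * delta^O_k) * derivative at H_j
        hiddenDeltas = []
        for j in range(len(hidden)):
            total = 0.0
            for k in range(len(outputs)):
                total += hoWeights[j][k] * outputDeltas[k]
            hiddenDeltas.append(total * dsigma(hidden[j]))

        # W^IH_{i,j} becomes W^IH_{i,j} + L*delta^H_j*I_i + M*C^IH_{i,j},
        # and the change matrix remembers delta^H_j * I_i for the next pass
        for i in range(len(inputs)):
            for j in range(len(hidden)):
                ihWeights[i][j] = ihWeights[i][j] + L * hiddenDeltas[j] * inputs[i] + M * ihChanges[i][j]
                ihChanges[i][j] = hiddenDeltas[j] * inputs[i]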

Computing the error. The error for one evaluation is defined by

    ( (Q_0 - O_0)^2 + (Q_1 - O_1)^2 + ... + (Q_{v-1} - O_{v-1})^2 ) / 2

The whole training cycle described above (evaluate the outputs, compute the errors, and adjust the weights and changes) is executed for each element in the training data. The error for one pass over the training data is the sum of the individual errors. The formulas for adjusting the weights are designed to minimize this sum of errors. If all goes well, the error will decrease rapidly as the process is repeated, often over thousands of passes across the training data.

Variations. Our software works with only the most common type of neural network. There are many variations, including these:

Some formulations omit the bias node in the hidden layer; others omit both bias nodes.

Our software connects each node in one layer to each node in the succeeding layer; sometimes some of the connections are omitted. One can argue that it is not necessary to leave out a connection; ideally, the learning process will discover that the weight is zero. But if we know something about the network and the significance of the nodes, we may know a priori that the weight is zero, and omitting the connection will speed up the learning process.

Occasionally, there is more than one hidden layer.

Sometimes there are edges from the output layer back to the input or hidden layer. These add cycles, which give the network a form of memory in which one calculation is influenced by the previous calculation.

4 Examples

The standard XOR example.

    inputs    output
    0  0      0
    0  1      1
    1  0      1
    1  1      0
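To connect the error formula from the previous section to the package, here is a sketch that recomputes the total error on the XOR data from the triples returned by nn.test; it assumes nn has been created and trained on xortrainingdata as in Section 1.

    total = 0.0
    for (input, desired, actual) in nn.test(xortrainingdata):
        # error for one evaluation: half the sum of squared differences
        for k in range(len(desired)):
            total += (desired[k] - actual[k]) ** 2 / 2.0
    print total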

Testing for greater than. The data come in two parts: the positive facts and the negative ones.

    inputs      output
    0.2  0.0    1
    0.4  0.2    1
    0.6  0.4    1
    0.8  0.6    1
    1.0  0.8    1

    inputs      output
    0.0  0.2    0
    0.2  0.4    0
    0.4  0.6    0
    0.6  0.8    0
    0.8  1.0    0

Squaring a number.

    input   output
    0.0     0.00
    0.2     0.04
    0.3     0.09
    0.4     0.16
    0.6     0.36

Voter preferences. This is training information. Voters were asked to rate the importance of some issues and then to identify their party affiliation. The inputs are, from left to right, the importance of budget, defense, crime, environment, and social security. They are ranked on a scale of 0.0 to 1.0. The outputs are 0.0 for Democrat and 1.0 for Republican.

    inputs                     output
    0.9  0.6  0.8  0.3  0.1    1.0
    0.8  0.8  0.4  0.6  0.4    1.0
    0.7  0.2  0.4  0.6  0.3    1.0
    0.5  0.5  0.8  0.4  0.8    0.0
    0.3  0.1  0.6  0.8  0.8    0.0
    0.6  0.3  0.4  0.3  0.6    0.0
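As an illustration of turning one of these tables into the format the package expects, the voter data could be encoded as follows. The variable name votertrainingdata and the choice of four hidden nodes are arbitrary.

    votertrainingdata = [([0.9, 0.6, 0.8, 0.3, 0.1], [1.0]),
                         ([0.8, 0.8, 0.4, 0.6, 0.4], [1.0]),
                         ([0.7, 0.2, 0.4, 0.6, 0.3], [1.0]),
                         ([0.5, 0.5, 0.8, 0.4, 0.8], [0.0]),
                         ([0.3, 0.1, 0.6, 0.8, 0.8], [0.0]),
                         ([0.6, 0.3, 0.4, 0.3, 0.6], [0.0])]

    nn = neuralnet(5, 4, 1)    # 5 inputs, 4 hidden nodes, 1 output
    nn.train(votertrainingdata)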

Unidentified. Can you determine the underlying rule?

    inputs
    1 0 1 0 0 0 1 1 0 1 1 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1 1 0 0 1 1 0 0 0 1 1 1 1 0 0 0 1 0 0 0 1 1 1 0 1 1 0 1 1 0 1 1 0 0 0 0 1 1 0 0 0 1 0 1 0 1 0 0 0 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 0 0