CS 4510/9010 Applied Machine Learning


1 CS 4510/9010 Applied Machine Learning Neural Nets Paula Matuszek Spring,

2 Neural Nets, the very short version. A neural net consists of layers of nodes, or neurons, each of which has an activation level. Nodes of each layer receive inputs from previous layers; these are combined according to a set of weights. If the activation level is reached, the node fires and sends inputs to the next layer. The initial layer is data from cases; the final layer is expected outcomes. Learning is accomplished by modifying the weights to reduce the prediction error.

3 Connectionist Systems. A neural net is an example of a connectionist system; we are looking at the connections among the neurons. Neurons are also known as perceptrons; the Weka book calls these MultiLayer Perceptron systems. The origin of NN systems is modeling human neurons. A recent research topic is deep learning systems, which are layered NNs; earlier NNs are the inputs to later ones. They are being explored as an approach to modeling a richer knowledge space or model.

4 Figure 18.19: A neuron. Input links carry activations a_i; the bias input is a_0 = 1 with bias weight w_{0,j}. The input function computes in_j = Σ_i w_{i,j} a_i, and the activation function g produces the output a_j = g(in_j), which is sent along the output links. Figure 18.20: (a) A simple network. (b) A network with an input layer, a hidden layer, and an output layer. Source: Artificial Intelligence, a Modern Approach, third edition, Russell and Norvig, 2010.
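
The neuron formula on this slide can be sketched in a few lines of Python. This is only an illustration: the sigmoid activation, the function names, and the example weights are assumptions, not something the slide specifies.

```python
import math

def sigmoid(x):
    """A common choice for the activation function g (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias_weight, g=sigmoid):
    """Compute a_j = g(in_j), where in_j = w_0j * a_0 + sum_i w_ij * a_i and a_0 = 1."""
    in_j = bias_weight * 1.0 + sum(w * a for w, a in zip(weights, inputs))
    return g(in_j)

# Hypothetical weights: in_j = -0.5 + 0.5*1.0 + (-0.3)*0.0 = 0, so a_j = g(0) = 0.5
a_j = neuron_output(inputs=[1.0, 0.0], weights=[0.5, -0.3], bias_weight=-0.5)
print(a_j)
```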

5 Diagram of an NN. Fig: A simple neural network.

6 Network Layers. Input Layer: the activity of the input units represents the raw information that is fed into the network. Hidden Layer: the activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units. Output Layer: the behavior of the output units depends on the activity of the hidden units and the weights between the hidden and output units.
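
As a rough sketch of how activity flows from the input layer through the hidden layer to the output layer, the code below runs one forward pass through a tiny network. The layer sizes, the weights, and the sigmoid activation are all made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(activations, weights, biases):
    """weights[j][i] is the weight from unit i of the previous layer to unit j."""
    return [sigmoid(b + sum(w * a for w, a in zip(row, activations)))
            for row, b in zip(weights, biases)]

def network_forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical 2-input, 2-hidden, 1-output network
hidden_layer = ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
output_layer = ([[1.0, 1.0]], [-1.0])
result = network_forward([1.0, 1.0], [hidden_layer, output_layer])
print(result)  # both hidden units output 0.5, so the output unit sees 0 and returns 0.5
```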

7 Continued. This simple type of network is interesting because the hidden units are free to construct their own representations of the input. The weights between the input and hidden units determine when each hidden unit is active, and so by modifying these weights, a hidden unit can choose what it represents.

8 Network Structure. The number of layers and of neurons depends on the specific task. In practice this issue is solved by trial and error. Two types of adaptive algorithms can be used: (1) start from a large network and successively remove some neurons and links until network performance degrades; (2) begin with a small network and introduce new neurons until performance is satisfactory.

9 Training Basics. The most basic method of training a neural network is trial and error. If the network isn't behaving the way it should, change the weighting of a random link by a random amount. If the accuracy of the network declines, undo the change and make a different one. It takes time, but the trial and error method does produce results.
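
The trial-and-error idea can be sketched as follows. This is a minimal illustration, assuming a single sigmoid unit, a squared-error measure, and a small made-up data set (the OR function); none of these choices come from the slide.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, inputs):
    """Single sigmoid unit; weights[0] is the bias weight."""
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], inputs)))

def total_error(weights, examples):
    return sum((target - predict(weights, inputs)) ** 2 for inputs, target in examples)

def train_by_trial_and_error(examples, n_weights, steps=500, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * n_weights
    best = total_error(weights, examples)
    for _ in range(steps):
        i = rng.randrange(n_weights)         # pick a random link
        old = weights[i]
        weights[i] += rng.uniform(-1.0, 1.0)  # change it by a random amount
        err = total_error(weights, examples)
        if err > best:
            weights[i] = old                  # the network got worse: undo the change
        else:
            best = err                        # keep the change
    return weights, best

# Hypothetical training cases: the OR function of two binary inputs
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, best = train_by_trial_and_error(examples, n_weights=3)
print(weights, best)
```

Because a change is kept only when the error does not increase, the error can never get worse than its starting value, but nothing guides the search toward good weights; that is the gap back-propagation fills.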

10 Neural Nets, continued. The typical method of modifying the weights is back-propagation: success or failure at the output node is propagated back through the nodes which contributed to that output node.

11 Training: Backprop algorithm. The Backprop algorithm searches for weight values that minimize the total error of the network over the set of training examples (training set). Backprop consists of the repeated application of the following two passes. Forward pass: in this step the network is activated on one example and the error of (each neuron of) the output layer is computed. Backward pass: in this step the network error is used for updating the weights. Starting at the output layer, the error is propagated backwards through the network, layer by layer.

12 Back Propagation. Back-propagation training algorithm: network activation (forward step), then error propagation (backward step). Backprop adjusts the weights of the NN in order to minimize the network's total mean squared error.
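
The forward and backward passes can be sketched for a network with one hidden layer and one output unit. This is a minimal sketch, not the applet's implementation: the sigmoid activation, the learning rate, the number of epochs, the random initial weights, and the OR training data are all assumptions chosen for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Forward pass. Each weight list stores its bias weight at index 0."""
    hidden = [sigmoid(row[0] + sum(w * xi for w, xi in zip(row[1:], x)))
              for row in w_hidden]
    out = sigmoid(w_out[0] + sum(w * h for w, h in zip(w_out[1:], hidden)))
    return hidden, out

def mean_squared_error(examples, w_hidden, w_out):
    return sum((t - forward(x, w_hidden, w_out)[1]) ** 2
               for x, t in examples) / len(examples)

def init_weights(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w_hidden, w_out

def train(examples, w_hidden, w_out, epochs=2000, rate=0.5):
    for _ in range(epochs):
        for x, t in examples:
            hidden, out = forward(x, w_hidden, w_out)          # forward pass
            # Backward pass: error term at the output, then at each hidden unit
            delta_out = (t - out) * out * (1 - out)
            delta_hidden = [h * (1 - h) * w_out[j + 1] * delta_out
                            for j, h in enumerate(hidden)]
            # Weight updates (gradient descent on the squared error)
            w_out[0] += rate * delta_out
            for j, h in enumerate(hidden):
                w_out[j + 1] += rate * delta_out * h
            for j, row in enumerate(w_hidden):
                row[0] += rate * delta_hidden[j]
                for i, xi in enumerate(x):
                    row[i + 1] += rate * delta_hidden[j] * xi

# Hypothetical training set: the OR function of two binary inputs
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w_hidden, w_out = init_weights(n_in=2, n_hidden=2)
error_before = mean_squared_error(examples, w_hidden, w_out)
train(examples, w_hidden, w_out)
error_after = mean_squared_error(examples, w_hidden, w_out)
print(error_before, error_after)
```

Note that the hidden error terms are computed from the output weights before those weights are updated, so both layers are adjusted using the same forward pass.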

13 More. The number of hidden nodes and layers is a complicated decision; too many leads to overfitting. The typical approach is to try several and evaluate. The number of connections can also be tweaked; these examples show fully-connected networks. There is no really good algorithm for this either. These are feed-forward networks; there are no loops or cycles.

14 Some NN Advantages and Disadvantages. Advantages: can learn complex patterns; works well for complex problems such as pattern recognition; good for multiple classes. Disadvantages: can be very slow; needs a lot of examples to work well; very black box; a lot of heuristics, so results are not identical every time.

15 Example from AISpace. Mail Examples. Set properties. Initialize the parameters. Solve. How do we use it? Calculate output. We are using the NN applet at aispace.org.

16 Example: Which class to take? Inputs? Outputs? Sample data.

17 Some Examples. Example 1: 3 inputs, 1 output, all binary. Example 2: same inputs, output inverted.

18 Getting the right inputs. Example 3: same inputs as 1 and 2; same output as 1; outcomes reversed for half the cases.

19 Getting the right inputs. Example 3: same inputs as 1 and 2; same output as 1; outcomes reversed for half the cases. The network is not converging. The output here cannot be predicted from these inputs. Whatever is determining whether to take the class, we haven't captured it.

20 Representing non-numeric values. Example 4: Required is represented as yes or no.

21 Representing non-numeric values. Example 4: Required is represented as yes or no. The actual model still uses 1 and 0; the transformation is done by the applet.

22 More non-numeric values. Example 5: Workload is low, med, high: text values, but they can be ordered.

23 More non-numeric values. Example 5: Workload is low, med, high: text values, but they can be ordered. The applet asks us to assign values; 1, 0.5, 0 is typical.

24 Unordered values. Example 6: Input variables here include professor, which is non-numeric and can't be ordered.

25 Unordered values. Example 6: Input variables here include professor, which is non-numeric and can't be ordered. We still need numeric values. The solution is to treat n possible values as n separate binary values. Again, the applet does this for us.
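
The three encodings from Examples 4 to 6 (yes/no, ordered text, unordered text) can be sketched by hand, which is roughly what the applet does for us. The function names, the direction of the workload mapping (low = 0, high = 1), and the professor names other than Lee are assumptions for illustration.

```python
def encode_binary(value):
    """yes/no becomes 1/0."""
    return 1.0 if value == "yes" else 0.0

def encode_ordered(value, levels):
    """Ordered text values are spread evenly over [0, 1] in the order given."""
    return levels.index(value) / (len(levels) - 1)

def encode_unordered(value, categories):
    """n possible values become n separate binary inputs."""
    return [1.0 if value == c else 0.0 for c in categories]

def encode_case(case, professors):
    return ([encode_binary(case["required"]),
             encode_ordered(case["workload"], ["low", "med", "high"])]
            + encode_unordered(case["professor"], professors))

professors = ["Lee", "Smith", "Jones"]  # hypothetical list of possible values
encoded = encode_case(
    {"required": "yes", "workload": "med", "professor": "Lee"}, professors)
print(encoded)  # [1.0, 0.5, 1.0, 0.0, 0.0]
```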

26 Variables with more values. Example 7: GPA and number of classes taken are integer values. Takes considerably longer to solve. Looks for a while like it's not converging.

27 And Reals. Example 8: GPA is a real. Takes about 20,000 steps to converge.

28 And multiple outputs. Small Car database from AIspace. For any given input case, you will get a value for each possible outcome. This is typical for, for instance, character recognition.
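
With one output value per possible outcome, a common way to read the result is to pick the outcome with the largest value. The sketch below assumes that convention; the labels and output values are invented for illustration.

```python
def most_likely_outcome(outputs, labels):
    """Return the label whose output unit has the highest value, with that value."""
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[best_index], outputs[best_index]

# Hypothetical outputs of a character-recognition net for three candidate digits
labels = ["3", "5", "8"]
outputs = [0.10, 0.85, 0.30]
label, value = most_likely_outcome(outputs, labels)
print(label, value)  # 5 0.85
```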

29 Training and Test Cases. The basic training approach will fit the training data as closely as possible, but we really want something that will generalize to other cases. This is why we have test cases. The training cases are used to compute the weights. The test cases tell us how well they generalize. Both training and test cases should represent the overall population as well as possible.
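
One simple way to get training and test cases that both resemble the overall population is to shuffle the cases and hold some out at random. This is a sketch under that assumption; the function name, the 25% test fraction, and the stand-in data are illustrative choices.

```python
import random

def split_cases(cases, test_fraction=0.25, seed=0):
    """Shuffle a copy of the cases and hold out a fraction of them as test cases."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

cases = list(range(20))  # stand-ins for 20 real cases
training_cases, test_cases = split_cases(cases)
print(len(training_cases), len(test_cases))  # 15 5
```

A random split only helps if the full set of cases is itself representative; it cannot add circumstances that were never collected.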

30 Representative Training Cases. Example 9: training cases and test cases are similar.

31 Representative Training Cases. Example 9: training cases and test cases are similar (actually identical...). Training error and test error are comparable.

32 Example 10: training cases and test cases represent different circumstances. We've missed including any cases involving Lee in the training. Training error goes down, but test error goes up. In reality these are probably bad training AND test cases; neither seems representative.
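
A quick check for the problem in Example 10 is to look for values that appear in the test cases but never in the training cases. The sketch below does this for one variable; the case layout as dictionaries and the names other than Lee are assumptions.

```python
def unseen_values(training_cases, test_cases, variable):
    """Values of `variable` that occur in the test cases but not in the training cases."""
    seen = {case[variable] for case in training_cases}
    return sorted({case[variable] for case in test_cases} - seen)

# Hypothetical cases: no training case involves Lee
training = [{"professor": "Smith"}, {"professor": "Jones"}]
test = [{"professor": "Lee"}, {"professor": "Smith"}]
missing = unseen_values(training, test, "professor")
print(missing)  # ['Lee']
```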

33 So: As for any classifier, getting a good NN involves understanding your domain and capturing knowledge about it: choosing the right inputs and outputs, and choosing representative training and test sets. Beware convenient training sets. You can represent any kind of variable: numeric or not, ordered or not. Not every set of variables and training cases will produce a net that can be trained.

34 Once it's trained... When your NN is trained, you can feed it a specific set of inputs and get one or more outputs. These outputs are typically interpreted as some decision: "take the class", "this is probably a 5". The network itself is a black box. If the situation changes, the NN should be retrained: new variables, new values for some variables, new patterns of cases.
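
For a single output, turning the network's value into a decision is often just a threshold. The sketch below assumes an output in the range 0 to 1 and a cut-off of 0.5; both the cut-off and the wording of the decisions are illustrative.

```python
def interpret_output(output, threshold=0.5):
    """Map a single network output in [0, 1] to a yes/no decision."""
    return "take the class" if output >= threshold else "do not take the class"

decision = interpret_output(0.82)  # hypothetical network output
print(decision)  # take the class
```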

35 One last note. These have all been simple cases, as examples. Most of my examples could in fact be predicted much more easily and cleanly with a decision tree, or even a couple of IF statements. A more typical use for any connectionist system has many more inputs and many more training cases.

CS 4510/9010 Applied Machine Learning. Neural Nets. Paula Matuszek Fall copyright Paula Matuszek 2016
