Introduction AL Neuronale Netzwerke. VL Algorithmisches Lernen, Teil 2b. Norman Hendrich

Introduction AL 64-360 Neuronale Netzwerke VL Algorithmisches Lernen, Teil 2b Norman Hendrich University of Hamburg, Dept. of Informatics Vogt-Kölln-Str. 30, D-22527 Hamburg hendrich@informatik.uni-hamburg.de 27/05/2009 Hendrich 1

Introduction AL 64-360 Schedule: Part 2 27/05/2009: perceptron, backpropagation 28/05/2009: associative memory, self-organizing maps 03-04/06/2009: holidays 10-11/06/2009: dimensionality reduction Hendrich 2

Introduction AL 64-360 Outline Introduction Perceptron model Perceptron learning Multi-layer networks Backpropagation learning Backpropagation applications Hendrich 3

Perceptron model AL 64-360 Remember: Eight major aspects of a PDP model a set of processing units the current state of activation an output function for each unit a pattern of connectivity between units a propagation rule for propagating patterns of activities through the network of connectivities an activation rule for combining the inputs impinging on a unit a learning rule whereby patterns of connectivity are modified by experience the environment within which the system must operate (Rumelhart, McClelland, et al. 1986) Hendrich 4

Perceptron model AL 64-360 Remember: McCulloch-Pitts model binary-valued threshold neurons with unweighted connections McCulloch-Pitts networks can compute any Boolean function but very similar to standard logic-gates network must be designed for a given computation no free parameters for adaptation learning only by modification of the network topology today, we will study weighted networks Hendrich 5

Perceptron model AL 64-360 The classical Perceptron proposed by Rosenblatt (1958) threshold elements with real-valued weights originally, assuming stochastic connections and a special learning algorithm a model for pattern recognition problems and the human visual processing very popular in the 1960s comprehensive critical analysis by Minsky and Papert (1969) more details and proofs: Rojas, chapter 3 Hendrich 6

Perceptron model AL 64-360 The Perceptron a binary classifier that maps a real-valued vector x of inputs to a binary output value f(x) binary output function with threshold θ similar to the McCulloch-Pitts neuron but with weighted real-valued inputs x_i(t+1) = 1 if Σ_j w_ij·x_j(t) ≥ θ, else 0 what are the computational limits of this model? Hendrich 7
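To make the model concrete, here is a minimal Python sketch of such a threshold unit (my own illustration; the weight and threshold values are arbitrary and happen to realize the AND function):

```python
import numpy as np

def perceptron_output(w, theta, x):
    """Binary threshold unit: 1 if the weighted sum reaches theta, else 0."""
    return 1 if np.dot(w, x) >= theta else 0

# Example: weights and threshold chosen to realize the AND function
w = np.array([1.0, 1.0])
theta = 1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(w, theta, np.array(x)))
```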

Perceptron model AL 64-360 Perceptron motivation abstract model of the retina and local preprocessing the association area: feature detection (perceptrons) active outputs indicate matched patterns Hendrich 8

Perceptron model AL 64-360 Predicate functions arbitrary predicate functions P, possibly non-linear and complex but with limited receptive fields perceptron applies linear weighting and threshold θ Hendrich 9

Perceptron model AL 64-360 Example predicates (Minsky and Papert 1969) Hendrich 10

Perceptron model AL 64-360 Example task: detection of connected figures? Hendrich 11

Perceptron model AL 64-360 Example task: detection of connected figures? three receptive fields, combined by a predicate function each group's outputs are fed to a single perceptron can the perceptron detect connectedness of the line? Hendrich 12

Perceptron model AL 64-360 Can we detect connected figures? Proposition: No diameter limited perceptron can decide whether a geometric figure is connected or not. assume the perceptron can decide this construct demo patterns A B C D scale until figure fills the left/middle/right receptive fields assume that figures A, B, C are classified correctly contradiction: figure D is classified incorrectly (Minsky and Papert, 1969) Hendrich 13

Perceptron model AL 64-360 Proof (figure: the weighted sums contributed by the left and right receptive fields for the scaled patterns A-D; the inequalities required to classify A, B and C correctly imply that the connected figure D is misclassified) Hendrich 14

Perceptron model AL 64-360 Logical functions with Perceptrons single McCulloch-Pitts neurons implement monotone functions McCulloch-Pitts networks can implement any Boolean function what can a single Perceptron do? Geometric interpretation: Perceptron implements a thresholded decision it separates its input space into two half-spaces for points in one half the result is 0, and in the other half 1 Hendrich 15

Perceptron model AL 64-360 Geometric interpretation basic visualization in the 2D-case: dimensions x 1 and x 2 a line separating the upper and lower half of the plane Hendrich 16

Perceptron model AL 64-360 Example separations: OR and AND functions Perceptron computes separable functions one line/plane/hyperplane separates the 0- and the 1-values Hendrich 17

Perceptron model AL 64-360 Absolute linear separability Hendrich 18

Perceptron model AL 64-360 Open half space Hendrich 19

Perceptron model AL 64-360 Example separations: the 16 functions of 2 variables there are 2^(2^n) different Boolean functions of n variables the table shows all possible functions for n = 2 14 of them can be computed with a single Perceptron but f_6 and f_9 cannot: the XOR and XNOR Hendrich 20

Perceptron model AL 64-360 Proof for the XOR problem XOR function cannot be separated with a single Perceptron the weights and threshold would have to fulfil the following inequalities: x_1 = 0, x_2 = 0: w_1·x_1 + w_2·x_2 = 0, so 0 < θ x_1 = 1, x_2 = 0: w_1·x_1 + w_2·x_2 = w_1, so w_1 ≥ θ x_1 = 0, x_2 = 1: w_1·x_1 + w_2·x_2 = w_2, so w_2 ≥ θ x_1 = 1, x_2 = 1: w_1·x_1 + w_2·x_2 = w_1 + w_2, so w_1 + w_2 < θ contradiction: w_1 ≥ θ and w_2 ≥ θ with θ > 0, but w_1 + w_2 < θ. Hendrich 21
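The same contradiction can be checked mechanically; the following brute-force sketch (my own illustration, searching weights and threshold on a coarse grid) finds separating parameters for AND and OR but none for XOR:

```python
import itertools

def separates(fn, w1, w2, theta):
    """Does the threshold unit with (w1, w2, theta) reproduce the Boolean function fn?"""
    return all((w1 * x1 + w2 * x2 >= theta) == fn(x1, x2)
               for x1, x2 in itertools.product([0, 1], repeat=2))

grid = [i / 10 for i in range(-20, 21)]   # weights and threshold in [-2, 2]

def solvable(fn):
    return any(separates(fn, w1, w2, theta)
               for w1, w2, theta in itertools.product(grid, repeat=3))

print("AND:", solvable(lambda a, b: bool(a and b)))   # True
print("OR :", solvable(lambda a, b: bool(a or b)))    # True
print("XOR:", solvable(lambda a, b: a != b))          # False
```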

Perceptron model AL 64-360 Error function in weight space learning should automatically find the weights to separate the given input patterns with f_w(x) = 1 for class A and f_w(x) = 0 for class B. Define the error function as the number of false classifications obtained using the current weight vector w: E(w) = Σ_{x ∈ A} (1 − f_w(x)) + Σ_{x ∈ B} f_w(x) learning should minimize this error function at the global minimum, E(w) = 0, the problem is solved Hendrich 22
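The error count translates directly into code; a small sketch (illustrative only, the AND example classes are placeholders):

```python
import numpy as np

def f_w(w, theta, x):
    return 1 if np.dot(w, x) >= theta else 0

def error(w, theta, class_A, class_B):
    """E(w): number of misclassified patterns (class A should map to 1, class B to 0)."""
    return (sum(1 - f_w(w, theta, x) for x in class_A)
            + sum(f_w(w, theta, x) for x in class_B))

# Example: AND function, class A = {(1,1)}, class B = the other three inputs
A = [np.array([1, 1])]
B = [np.array(p) for p in [(0, 0), (0, 1), (1, 0)]]
print(error(np.array([1.0, 1.0]), 1.5, A, B))   # 0 -> the problem is solved
```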

Perceptron model AL 64-360 Extended weight vector proofs and implementation are often easier if the threshold is θ = 0 (so that we can omit it) introduce an extra weight w_{n+1} = −θ and an extra constant input x_{n+1} = 1 the so-called extended weight vector representation Hendrich 23
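A small sketch of the trick (assuming the convention w_{n+1} = −θ with constant input 1): the test w·x ≥ θ becomes w'·x' ≥ 0 in the extended space:

```python
import numpy as np

def extend_input(x):
    """Append the constant input x_{n+1} = 1."""
    return np.append(x, 1.0)

def extend_weights(w, theta):
    """Append w_{n+1} = -theta, so that w.x >= theta becomes w'.x' >= 0."""
    return np.append(w, -theta)

w, theta = np.array([1.0, 1.0]), 1.5
x = np.array([1, 1])
print(np.dot(w, x) >= theta)                                    # True
print(np.dot(extend_weights(w, theta), extend_input(x)) >= 0)   # True, same decision
```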

Perceptron model AL 64-360 The dual formulation the weighting calculation Σ_j w_j·x_j ≥ 0 is symmetric in w and x classification uses the inputs x as variables dual problem: use the weights w as the variables Hendrich 24

Perceptron model AL 64-360 General decision curves original Perceptron allows complex predicates what happens with non-linear decision curves/surfaces? example shows a valid solution of the XOR function we will come back to this question later (kernel-trick) Hendrich 25

Perceptron model AL 64-360 Feature detection: structure of the retina retina known to use (non-linear) pre-processing reconsider the perceptron for visual pattern detection Hendrich 26

Perceptron model AL 64-360 Low-level convolution operators compare: traditional image-processing / computer-vision basic convolution operators for low-level feature detection for example, Laplace 3x3 edge-detection operator separate neurons can compute those operators in parallel Hendrich 27
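As an illustration (my own sketch in plain NumPy, not code from the lecture), the 3×3 Laplace operator applied by explicit weighted sums; each output pixel corresponds to what one perceptron centred there would compute:

```python
import numpy as np

LAPLACE = np.array([[0,  1, 0],
                    [1, -4, 1],
                    [0,  1, 0]], dtype=float)

def convolve3x3(image, kernel):
    """Valid-mode 3x3 convolution: each output pixel is one weighted sum of a 3x3 patch."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
    return out

# Tiny test image with a vertical edge; the response is nonzero only near the edge
img = np.zeros((5, 6))
img[:, 3:] = 1.0
print(convolve3x3(img, LAPLACE))
```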

Perceptron model AL 64-360 Perceptron as an edge-detector assume binary or gray-scale images perceptron connected to input pixels works as feature detector e.g., 3x3 Laplace operator use one perceptron centered at every pixel massively parallel computation Hendrich 28

Perceptron model AL 64-360 Feature detection classification directly on the raw-pixels? not invariant against translation, rotation, etc. Hendrich 29

Perceptron model AL 64-360 Neocognitron network use pre-processing to extract low-level features run classification on the features, not the raw pixels Hendrich 30

Perceptron model AL 64-360 Perceptron summary: binary neurons, internal state x = Σ_j w_ij·x_j threshold output function o(x) = ½(sgn(x) + 1) single-layer, fully connected to input neurons continuous weights w_ij basic learning rule single Perceptron cannot solve XOR or connectedness non-linear predicate functions might help a model for low-level feature detection Hendrich 31

Perceptron learning AL 64-360 Learning algorithms for neural networks network self-organizes to implement the desired behaviour correction-steps to improve the network error Hendrich 32

Perceptron learning AL 64-360 Three models of learning supervised learning (corrective learning) learning from labelled examples provided by a knowledgeable external supervisor the correct outputs are known for all training samples the type of learning usually assumed in NN studies reinforcement learning: no labelled examples environment provides a scalar feedback signal combine exploration and exploitation to maximize the reward unsupervised learning: no external feedback at all Hendrich 33

Perceptron learning AL 64-360 Classes of learning algorithms most general is unsupervised learning e.g., classify input-space clusters but we don't know how many or even which clusters Hendrich 34

Perceptron learning AL 64-360 Perceptron learning example assume a perceptron with constant threshold θ = 1 look for weights w 1 and w 2 to realize the logic AND function calculate and plot the error-function E(w) for all weights learning algorithm should reach the solution triangle Hendrich 35

Perceptron learning AL 64-360 Error surface for the AND function Hendrich 36

Perceptron learning AL 64-360 Error surface for the AND function, from above areas of the error-function in weight-space four steps of weight adjustments during learning final state w is a valid solution with error E(w) = 0 Hendrich 37

Perceptron learning AL 64-360 Solution polytope in general, solution area is a polytope in R n at given threshold θ, a cut of the solution polytope Hendrich 38

Perceptron learning AL 64-360 Perceptron training algorithm assume a given classification problem set of m training input patterns p_i and outputs f(p_i) try to minimize the error-function E(w) in weight-space the standard algorithm starts with random weights pick a training pattern p_i randomly update the weights to improve classification for this pattern pick the next training pattern until E(w) = 0 Hendrich 39
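A compact sketch of this procedure in Python (my own implementation; it uses the extended weight vector, a unit learning rate, and random initialization, all of which are assumptions rather than the exact slide algorithm):

```python
import numpy as np

def perceptron_train(patterns, labels, eta=1.0, max_epochs=100, rng=None):
    """Perceptron learning rule on extended inputs (constant 1 appended, threshold learned)."""
    if rng is None:
        rng = np.random.default_rng(0)
    X = np.hstack([np.asarray(patterns, float),
                   np.ones((len(patterns), 1))])      # extended inputs
    w = rng.normal(size=X.shape[1])                   # random initial weights
    for _ in range(max_epochs):
        errors = 0
        for i in rng.permutation(len(X)):             # pick training patterns randomly
            out = 1 if np.dot(w, X[i]) >= 0 else 0
            if out != labels[i]:                      # misclassified: correct the weights
                w += eta * (labels[i] - out) * X[i]
                errors += 1
        if errors == 0:                               # E(w) = 0: stop
            return w
    return w

# The AND function is linearly separable, so the algorithm converges
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
print(perceptron_train(patterns, labels))
```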

Perceptron learning AL 64-360 Perceptron training algorithm Hendrich 40

Perceptron learning AL 64-360 Perceptron training algorithm algorithm can stop once a first solution is found we will study a more robust variant later (the so-called support-vector machine) algorithm converges in finite number of steps iff the patterns are linearly separable proof: Rojas 4.2.2 no convergence if problem is not solvable learning a conflicting pattern can degrade network performance Hendrich 41

Perceptron learning AL 64-360 Visualization in input and extended weight space Hendrich 42

Perceptron learning AL 64-360 Behaviour of Perceptron learning Hendrich 43

Perceptron learning AL 64-360 Worst-case behaviour search direction (in w space) according to the last incorrectly classified pattern but no information about the shape of the error function figure shows the worst case for Perceptron learning Hendrich 44

Perceptron learning AL 64-360 Pocket training algorithm how to guarantee good behaviour if the problem is not separable? we want the best solution: least total error E(w) the standard Perceptron algorithm won't converge Pocket algorithm: keep a history of the learning process over a set of patterns keep the previous best solution in your pocket restore the previous best solution if E(w) degrades Hendrich 45

Perceptron learning AL 64-360 Pocket training algorithm variant of the basic Perceptron learning keep previous best solution in your pocket restore previous solution if E(w) degrades converges to the best possible solution Hendrich 46
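A sketch of the pocket variant (my own; the helper total_error simply counts misclassifications, i.e. E(w) as defined above):

```python
import numpy as np

def total_error(w, X, labels):
    outputs = (X @ w >= 0).astype(int)
    return int(np.sum(outputs != labels))

def pocket_train(patterns, labels, eta=1.0, max_steps=1000, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    X = np.hstack([np.asarray(patterns, float), np.ones((len(patterns), 1))])
    labels = np.asarray(labels)
    w = rng.normal(size=X.shape[1])
    best_w, best_E = w.copy(), total_error(w, X, labels)
    for _ in range(max_steps):
        i = rng.integers(len(X))
        out = 1 if np.dot(w, X[i]) >= 0 else 0
        if out != labels[i]:
            w += eta * (labels[i] - out) * X[i]       # ordinary perceptron step
            E = total_error(w, X, labels)
            if E < best_E:                            # keep the best solution seen so far
                best_w, best_E = w.copy(), E
    return best_w, best_E

# XOR is not separable; the pocket returns the smallest error found (1 is the optimum here)
w, E = pocket_train([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1, 1, 0])
print(E)
```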

Perceptron learning AL 64-360 Linear programming solution area bounded by hyperplanes use linear programming (simplex-algorithm) to find optimal solution worst-case performance is exponential but usually, well-behaved Hendrich 47

Multi-layer networks AL 64-360 Multi-layer networks combine multiple threshold elements into larger networks can compute any Boolean function but we have no learning rule (yet) Network architecture? 2-layer networks multi-layer feed-forward networks multi-layer networks with shortcut connections networks with feedback connections Hendrich 48

Multi-layer networks AL 64-360 Network architecture Hendrich 49

Multi-layer networks AL 64-360 General multi-layered feed-forward network input layer (remember: inputs only, no computation here) one or more layers of hidden neurons one layer of output neurons Hendrich 50

Multi-layer networks AL 64-360 The XOR in a 2-layer network a solution of the XOR function based on (x_1 ∧ ¬x_2) ∨ (¬x_1 ∧ x_2) Hendrich 51
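One concrete weight assignment for this two-layer solution, as a sketch (the particular weights and thresholds are just one valid choice, not taken from the slide):

```python
def threshold(summed, theta):
    return 1 if summed >= theta else 0

def xor_net(x1, x2):
    # hidden unit 1: fires for x1 AND (NOT x2)
    h1 = threshold(1.0 * x1 - 1.0 * x2, 0.5)
    # hidden unit 2: fires for (NOT x1) AND x2
    h2 = threshold(-1.0 * x1 + 1.0 * x2, 0.5)
    # output unit: OR of the two hidden units
    return threshold(1.0 * h1 + 1.0 * h2, 0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), xor_net(x1, x2))
```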

Multi-layer networks AL 64-360 The 16 Boolean two-arg functions, again Hendrich 52

Multi-layer networks AL 64-360 Computation of XOR with three units Hendrich 53

Multi-layer networks AL 64-360 The XOR with two neurons network with shortcut connections only two neurons but learning is more complex Hendrich 54

Multi-layer networks AL 64-360 Geometric interpretation every neuron in the hidden layer defines one separation output neuron combines the half-spaces Hendrich 55

Multi-layer networks AL 64-360 Geometric interpretation output neuron must only learn 3 regions feature space has lower dimension than the input space Hendrich 56

Multi-layer networks AL 64-360 Polytopes multiple separations delimit a polytope Hendrich 57

Multi-layer networks AL 64-360 Counting regions in input and weight space how many regions can be delimited with n hyperplanes? Hendrich 58

Multi-layer networks AL 64-360 Visualization of the two-arg functions Hendrich 59

Multi-layer networks AL 64-360 Example error functions f 1111 (left) x 1 x 2 (right) Hendrich 60

Multi-layer networks AL 64-360 Example error functions error-function for OR Hendrich 61

Multi-layer networks AL 64-360 Shatterings What is the maximum number of elements of an input space which can be classified by our networks? single two-input Perceptron can classify three points but not four (see XOR) network with 2 hidden neurons: two dividing lines in feature space, 14 regions how many concepts can be classified by a network with n hidden units? Hendrich 62

Multi-layer networks AL 64-360 Shatterings possible colorings of the input space for two dividing lines Hendrich 63

Multi-layer networks AL 64-360 Number of regions in general How many regions are defined by m cutting hyperplanes of dimension n−1 in an n-dimensional space? consider only hyperplanes going through the origin in 2D: 2m regions in 3D: each cut increases the number of regions by a factor of 2 general case: recursive calculation Hendrich 64

Multi-layer networks AL 64-360 Recursive calculation of R(m, n) Hendrich 65
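A sketch of the recursive calculation as I understand it from Rojas (the base cases R(1, n) = 2 and R(m, 1) = 2 and the recursion R(m, n) = R(m−1, n) + R(m−1, n−1), for hyperplanes through the origin in general position, are assumptions of this sketch):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def R(m, n):
    """Regions cut out of R^n by m hyperplanes through the origin (general position)."""
    if m == 1 or n == 1:
        return 2
    return R(m - 1, n) + R(m - 1, n - 1)

print(R(2, 2))   # 4: two lines through the origin split the plane into 4 regions
print(R(3, 2))   # 6
print(R(4, 3))   # 14
```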


Multi-layer networks AL 64-360 Number of Boolean and threshold functions of n inputs Hendrich 67

Multi-layer networks AL 64-360 Number of threshold functions grows polynomially Hendrich 68

Multi-layer networks AL 64-360 Learnability problems Hendrich 69

Backpropagation learning AL 64-360 Learning in multi-layer networks we now need learning algorithms for our multi-layer networks minimization of the error-function gradient-descent algorithms look promising but the error-functions of binary threshold neurons are flat use neurons with smooth output functions instead of the step-function accept small/large values (e.g. 0.1, 0.9) for classification Hendrich 70

Backpropagation learning AL 64-360 Sigmoid function s_c(x) = 1 / (1 + e^(−c·x)) S(x) = 2·s(x) − 1 = (1 − e^(−x)) / (1 + e^(−x)) Hendrich 71
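In code, together with the derivative that gradient descent will need (using the identity s'_c(x) = c·s_c(x)·(1 − s_c(x)); a minimal sketch):

```python
import numpy as np

def sigmoid(x, c=1.0):
    """Logistic output function s_c(x) = 1 / (1 + exp(-c x))."""
    return 1.0 / (1.0 + np.exp(-c * x))

def sigmoid_prime(x, c=1.0):
    """Derivative, conveniently expressed through the function value itself."""
    s = sigmoid(x, c)
    return c * s * (1.0 - s)

def symmetric_sigmoid(x):
    """S(x) = 2 s(x) - 1 = (1 - exp(-x)) / (1 + exp(-x)), i.e. tanh(x / 2)."""
    return 2.0 * sigmoid(x) - 1.0

print(sigmoid(0.0), sigmoid_prime(0.0), symmetric_sigmoid(0.0))  # 0.5 0.25 0.0
```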

Backpropagation learning AL 64-360 Generalized error function Hendrich 72

Backpropagation learning AL 64-360 Error function with sigmoid transfer function smooth gradients instead of steps Hendrich 73

Backpropagation learning AL 64-360 Error function with local minima possibly, exponential number of local minima Hendrich 74

Backpropagation learning AL 64-360 Two sides of a computing unit our output function is differentiable calculate and use f(x) and f'(x) Hendrich 75

Backpropagation learning AL 64-360 Backpropagation idea present a training pattern to the network compare the network output to the desired output, calculate the error in each output neuron for each neuron, calculate what the output should have been. Calculate a scaling factor required to reach the desired output. Adjust the weights of each neuron to lower the local error Assign blame for the local error to neurons at the previous level, giving greater responsibility to neurons connected by stronger weights repeat for every previous layer, using the blame as the local error (Wikipedia) Hendrich 76

Backpropagation learning AL 64-360 Backpropagation pseudocode
Initialize the weights in the network (often randomly)
Do
  For each example e in the training set
    O = neural-net-output(network, e)   ; forward pass
    T = teacher output for e
    Calculate error (T − O) at the output units   ; backward pass
    Compute Δw_i for all weights from hidden layer to output layer
    Compute Δw_i for all weights from input layer to hidden layer
    Update the weights in the network
Until all examples classified correctly or stopping criterion satisfied
Return the network
(Wikipedia) Hendrich 77
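A compact NumPy sketch of this loop for a single hidden layer (my own illustration: squared error, logistic units, batch updates, and arbitrary layer sizes and learning rate), trained on XOR:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)        # teacher outputs (XOR)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hidden, n_out, eta = 2, 3, 1, 0.5
W1 = rng.normal(size=(n_in + 1, n_hidden))             # +1 row for the constant bias input
W2 = rng.normal(size=(n_hidden + 1, n_out))
X1 = np.hstack([X, np.ones((len(X), 1))])              # extended inputs

for epoch in range(10000):
    # forward pass
    H = sigmoid(X1 @ W1)
    H1 = np.hstack([H, np.ones((len(H), 1))])
    O = sigmoid(H1 @ W2)
    # backward pass: deltas = error * derivative of the sigmoid
    delta_out = (T - O) * O * (1 - O)
    delta_hidden = (delta_out @ W2[:-1].T) * H * (1 - H)
    # weight updates (gradient descent on the squared error)
    W2 += eta * H1.T @ delta_out
    W1 += eta * X1.T @ delta_hidden

print(np.round(O, 2))   # typically close to [[0], [1], [1], [0]] after training
```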

Backpropagation learning AL 64-360 Backpropagation algorithm Hendrich 78

Backpropagation learning AL 64-360 Four phases of the algorithm Hendrich 79

Backpropagation learning AL 64-360 Function composition Hendrich 80

Backpropagation learning AL 64-360 Backpropagation composition Hendrich 81

Backpropagation learning AL 64-360 Schema in both directions, values weighted with w ij Hendrich 82

Backpropagation learning AL 64-360 Connections for the 3-layer network Hendrich 83

Backpropagation learning AL 64-360 Calculation of the output layer Hendrich 84

Backpropagation learning AL 64-360 Backpropagation of the error Hendrich 85

Backpropagation learning AL 64-360 Calculation of hidden units Hendrich 86

Backpropagation learning AL 64-360 Time evolution of the error function Hendrich 87

Backpropagation learning AL 64-360 Backpropagation with momentum term Hendrich 88
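The slide itself only shows a figure; as a reminder of the usual momentum heuristic, a sketch (assuming the common update Δw(t) = α·Δw(t−1) − η·∂E/∂w, with α and η as illustrative values):

```python
import numpy as np

def momentum_step(w, grad, velocity, eta=0.1, alpha=0.9):
    """Return updated weights and the new velocity (the previous update direction)."""
    velocity = alpha * velocity - eta * grad
    return w + velocity, velocity

w = np.zeros(3)
velocity = np.zeros(3)
grad = np.array([1.0, -2.0, 0.5])          # placeholder gradient of E(w)
w, velocity = momentum_step(w, grad, velocity)
print(w)
```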

Backpropagation learning AL 64-360 Typical backpropagation experiment find an interesting problem prepare a pattern set normalize, pre-process if necessary select training patterns select validation patterns select the network architecture number of layers and neurons per layer select input and output encodings select neuron activation and output functions select the learning algorithm and parameters run the learning algorithm on the training patterns check network performance on the validation patterns repeat until convergence Hendrich 89

Backpropagation learning AL 64-360 Online vs. offline learning Online-learning: for each training pattern: apply pattern to the input layer propagate the pattern forward to the output neurons calculate error backpropagate errors to the hidden layers patterns selected randomly or in fixed order usual strategy for BP learning Offline-learning (batch-learning): apply all training patterns to the network accumulate the error over all patterns one backpropagation step per epoch very slow for large training sets Hendrich 90
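A sketch of the difference for a toy linear unit (my own illustration; the gradient function is a stand-in, not the full backpropagation gradient): online learning updates after every pattern, batch learning accumulates the gradient over all patterns first:

```python
import numpy as np

def gradient(w, x, t):
    """Gradient of the squared error of a single linear unit (stand-in example)."""
    return (np.dot(w, x) - t) * x

def online_epoch(w, X, T, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    for i in rng.permutation(len(X)):                   # one weight update per pattern
        w = w - eta * gradient(w, X[i], T[i])
    return w

def batch_epoch(w, X, T, eta=0.1):
    g = sum(gradient(w, x, t) for x, t in zip(X, T))    # accumulate over all patterns
    return w - eta * g                                  # one weight update per epoch

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 1., 1., 2.])                          # exactly representable target
w_online, w_batch = np.zeros(2), np.zeros(2)
for _ in range(200):
    w_online = online_epoch(w_online, X, T)
    w_batch = batch_epoch(w_batch, X, T)
print(np.round(w_online, 2), np.round(w_batch, 2))      # both approach [1, 1]
```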

Backpropagation applications AL 64-360 Applications of backpropagation NETtalk function approximation Hendrich 91

Backpropagation applications AL 64-360 NETtalk: speech synthesis text-to-speech synthesis? commercially available for many years most spoken languages are based on a small set of fixed phonemes so: transform a string of input characters into (a string of) phonemes usually built around hand-crafted linguistic rules and transformation rules plus dictionaries to handle exceptions from the rules can neural networks be taught to speak? Hendrich 92

Backpropagation applications AL 64-360 NETtalk English speech synthesis without linguistic knowledge or transformations 3-layer feed-forward network input is a 7-character sliding window of text 7 × 29 input sites: 1-hot encoding of the 26 letters plus punctuation 80 hidden units 26 output units: 1-hot encoding of the phonemes the network generates the phoneme corresponding to the middle one of the seven input characters (Sejnowski and Rosenberg 1987) Hendrich 93

Backpropagation applications AL 64-360 NETtalk a corpus of several hundred words together with their phonetic transcription was used as training patterns backpropagation learning used to set the network weights (7·29·80 + 80·26 = 18320 weights) the output neuron with the highest activation is used to select the output phoneme a phoneme speech-synthesizer drives the speaker (Sejnowski and Rosenberg 1987) Hendrich 94
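A sketch of the input encoding and the weight count as described (my own reconstruction; the exact 29-symbol alphabet is an assumption, the slides only give the sizes 7 × 29, 80, and 26):

```python
import numpy as np

ALPHABET = list("abcdefghijklmnopqrstuvwxyz") + [" ", ".", ","]   # 29 symbols (assumed)
assert len(ALPHABET) == 29

def encode_window(window):
    """One-hot encode a 7-character window into a 7 * 29 = 203 component input vector."""
    assert len(window) == 7
    code = np.zeros((7, 29))
    for i, ch in enumerate(window.lower()):
        code[i, ALPHABET.index(ch)] = 1.0
    return code.ravel()

x = encode_window("she cou")                 # the network predicts the phoneme for the middle character
print(x.shape, int(x.sum()))                 # (203,) 7 -> one active unit per window position

# Layer sizes from the slides: 7*29 inputs, 80 hidden, 26 output units
n_weights = 7 * 29 * 80 + 80 * 26
print(n_weights)                             # 18320
```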

Backpropagation applications AL 64-360 NETtalk architecture (figure: the 7-character sliding window "She could", positions 1-7, feeds 7×29 input units {I}, 80 hidden units {H}, and 26 output units {O}; the network outputs the phoneme [k] for the middle character "c") Hendrich 95

Backpropagation applications AL 64-360 NETtalk demo initially, network errors are similar to children's errors final network performance acceptable but not outstanding increasing the network size does not really help analysis of the final trained network is difficult evidence that some of the hidden units encode well-known linguistic rules damaging some of the weights produces specific deficiencies demo Hendrich 96

Backpropagation applications AL 64-360 Outlook We will look at several PDP/NN architectures: McCulloch-Pitts model Perceptron Multilayer perceptron with backpropagation learning Recurrent networks: Hopfield model and associative memory Self-organizing maps their units and interconnections their learning rules their environment model and possible applications Hendrich 97