A Modularization Scheme for Feedforward Networks
Arnfried Ossen
Institut für Angewandte Informatik, Technical University of Berlin, Sekretariat FR -9, Franklin Str. 28/29, Berlin W-1000/19, FR Germany <ao@coma.cs.tuberlin.de>

Abstract

This article proposes a modularization scheme for feedforward networks based on controllable internal representations. Control is achieved by replacing hidden units with pretrained modules that constrain internal patterns of activity to desired subsets. In the case of auto-associative feedforward networks these subsets can be seen as module interfaces. If enough a priori knowledge about a system is available, hierarchical systems with separately trainable and exchangeable modules can be built.

1 INTRODUCTION

Feedforward networks with backpropagation as a learning procedure have been successfully used for many problems. Generally speaking, they can approximate continuous nonlinear functions to any desired precision. But there are deficiencies: learning time is at least of polynomial order; the generalization abilities of standard networks are insufficient; and the interpretation of network behavior is difficult, if not impossible.

Much work has been done on the optimization of learning time. The majority of the techniques developed are based on second-order methods that minimize the number of steps the gradient descent procedure has to take. But even if significant progress could be achieved, it would not help improve the generalization abilities of networks. Le Cun [1989] has pointed out that the number of training examples required for good generalization scales like the logarithm of the number of functions which a specific network architecture can implement. Assuming that a small network is less general than a bigger one, i.e. that it can implement fewer functions, provided it is still able to compute the desired function, we can conclude that small networks yield better generalization than bigger networks, given the same amount of training data. Learning complexity per learning cycle decreases, then, simply because small networks contain fewer links and units. On the other hand, overall learning efficiency may deteriorate because of a decreasing convergence rate.

A couple of problem-independent strategies are known for the construction of small nets. Either an auxiliary error term is added to the cost function in order to penalize network configurations with many active links and/or units [Chauvin, 1989], or the relevance of links/units with respect to the error is determined and the least relevant links/units are deleted [Mozer and Smolensky, 1989; le Cun et al., 1990]. The problem with minimal network construction using auxiliary error terms is the difficulty of weighting the terms relative to each other, which may cause convergence problems. Also, it might still be very difficult to understand the emerging patterns of activity in the hidden layers. A more general approach is to minimize the number of free parameters in the network, e.g. by incorporating equality constraints between the weights of the network based on a priori knowledge of the problem [le Cun, 1989]. A significant reduction in learning complexity and better generalization can be obtained. On an even more general level, it is now recognized that the optimal network architecture for a given problem should be the one that can be described with the least number of bits, that is, the one with the Minimum Description Length (MDL) proposed by Rissanen. A third approach is proposed in this article.
A priori knowledge is used to select a constraint space for internal representations. The goal is to use the constrained internal representations as flexible (within bounds) interfaces between the modules of feedforward networks. In the first place, a break-down into modules would result in reduced learning complexity. In addition, if enough a priori knowledge is available, it should also be possible to tailor the interfaces in such a way that the number of free parameters is also
reduced without losing the network's ability to implement the desired function, resulting in enhanced generalization abilities.

2 CONSTRAINING INTERNAL REPRESENTATIONS

This interface can be represented as a constraint space for patterns of activation at module boundaries. The patterns have to be controllable and interpretable, but must still be general enough to capture the underlying regularities of the environment of the module during the learning procedure. On the other hand, they have to facilitate the encoding of external values and the decoding of adopted patterns of activity in terms of external concepts. A promising way of defining these patterns is the use of coding schemes that restrict patterns of activation to specific subsets with well-defined values related to them. I have chosen a coarse coding [Hinton et al., 1986] scheme for scalar values. In comparison to value-unit codings, it shows improved convergence [Hancock, 1989]. It also provides the resolution needed for the development of internal representations. The intended scalar value can simply be represented by sampling a unimodal function centered at this value [Saund, 1989]. Decoding of scalars is possible via auxiliary networks that map scalar representations to an activation value, or by a least-squares error procedure.

2.1 A CONSTRAINED NETWORK

Constraints are enforced by replacing the hidden layer of a standard feedforward network with a pretrained auto-associative network. An auto-associative network m1 trained for identity mapping of any desired scalar representations will generalize to reasonable representations at the outputs if clamped to unknown data. If backpropagated to the input units, the error can be used to modify the inputs in a way that minimizes the error at the output units. If placed in the middle of a surrounding network m2, the adopted representations at the hidden units of m1 serve as encodings of the internal representations of m2, while the backpropagated errors can be used to modify the weights between the input layer of m2 and the input layer of m1 (see figure 1). Thus two optimizations take place: (a) the input layer of m1 will eventually generate almost perfect scalar representations without any weight changes in m1 proper; (b) m2 converges to its desired input/output mapping.

[Figure 1: An auto-associative module m1 as an abstraction of the hidden layer of m2.]

2.2 A CONSTRAINED ENCODER NETWORK

The scheme was tested on an encoder whose hidden layer had been replaced by a module. Three simulation runs were carried out (see figure 2): a) a standard system; b) a system with three hidden layers; c) an 8-[4-4-4]-8 system with pretrained abstract module. The weights of the links of the three module layers are copied from a system that has learned to auto-associate arbitrarily-positioned coarse-coded scalars in a separate learning procedure. After being copied, the weights in the module are fixed at their current values. Only the weights from the input layer of the system to the input layer of the module and from the output layer of the module to the output layer of the system, together with the respective biases, are subject to change by the backpropagation procedure. The module merely serves to propagate activations and backpropagate errors, thereby constraining the internal representations of the encoder to scalar representations. The convergence of the modular system is very similar to that of the original encoder and about one order of magnitude better than that of a system without separate training. Of course, the training effort for the pretrained module has to be taken into account, too.
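Before comparing training complexities, here is a concrete illustration of the coarse coding of scalars described above. This is a minimal sketch, not taken from the original paper: the four-unit layout, the interval [0, 1], and the use of the derivative of the sigmoid 1/(1 + e^(-x/t)) at temperature t = 1/8 as the unimodal function (as stated in appendix A) are assumptions made for the example.

```python
import numpy as np

T = 1.0 / 8.0                        # temperature of the sigmoid (see appendix A)
CENTERS = np.linspace(0.0, 1.0, 4)   # assumed: four units with preferred values in [0, 1]

def bump(x):
    """Unimodal function: the derivative of 1/(1 + exp(-x/T)), up to scale.
    s*(1-s) is proportional to that derivative; the factor 4 makes the peak 1."""
    s = 1.0 / (1.0 + np.exp(-x / T))
    return 4.0 * s * (1.0 - s)

def encode(v):
    """Coarse-code the scalar v as a pattern of activation over the units."""
    return bump(CENTERS - v)

def decode(pattern, grid=np.linspace(0.0, 1.0, 1001)):
    """Least-squares decoding: return the candidate value whose code best matches."""
    errors = [np.sum((encode(g) - pattern) ** 2) for g in grid]
    return grid[int(np.argmin(errors))]

p = encode(0.37)
print(np.round(p, 3), decode(p))     # the decoded value should be 0.37
```

Any sufficiently smooth unimodal kernel would serve equally well; the decoder here simply searches a grid, corresponding to the least-squares error procedure mentioned above.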
But since backpropagation is typically of polynomial order, the additional complexity of pretraining the module (40 free weights) is low in comparison to the complexity of the training cycles for the three-layer (120 free weights) and 8-[4-4-4]-8 (80 free weights) systems.
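The training regime of section 2.2 can be summarized in code. The sketch below is an illustration under assumptions, not the original implementation: it pretrains a small auto-associative module on coarse-coded scalars, fixes its weights, and then trains only the surrounding encoder weights. It reuses the encode helper from the previous sketch; the iteration counts, weight initialization, and learning rate of 0.2 (from appendix A) are choices made for the example.

```python
import numpy as np

# encode() is assumed from the coarse-coding sketch above.
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer(n_in, n_out):
    """One weight matrix and bias vector with small random initial weights."""
    return [rng.normal(0.0, 0.3, (n_in, n_out)), np.zeros(n_out)]

def forward(layers, x):
    """Return the list of activations, input first, output last."""
    acts = [x]
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts

def backward(layers, acts, err, lr, frozen):
    """On-line backpropagation step; layers whose index is in `frozen` only
    propagate the error and are not modified (the pretrained module)."""
    for i in reversed(range(len(layers))):
        W, b = layers[i]
        delta = err * acts[i + 1] * (1.0 - acts[i + 1])  # sigmoid derivative
        err = delta @ W.T                                # error for the layer below
        if i not in frozen:
            W -= lr * np.outer(acts[i], delta)
            b -= lr * delta

# 1) Pretrain a 4-4-4 auto-associative module on coarse-coded scalars.
module = [layer(4, 4), layer(4, 4)]
for _ in range(20000):
    p = encode(rng.uniform(0.0, 1.0))
    acts = forward(module, p)
    backward(module, acts, acts[-1] - p, lr=0.2, frozen=set())

# 2) Fix the module inside an 8-[4-4-4]-8 encoder; train only the outer weights.
net = [layer(8, 4)] + module + [layer(4, 8)]
for _ in range(20000):
    x = np.eye(8)[rng.integers(8)]       # one-of-eight encoder pattern
    acts = forward(net, x)
    backward(net, acts, acts[-1] - x, lr=0.2, frozen={1, 2})

print(np.round(forward(net, np.eye(8)[3])[-1], 2))  # should resemble the input
```

Only the set of updated layers differs between the two steps; the frozen module still takes part in both the forward and backward passes, which is what constrains the encoder's internal representations to scalar codes.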
[Figure 2: Convergence of the standard encoder, the three-layer encoder, and the 8-[4-4-4]-8 encoder with pretrained abstract module; total sum of squares plotted against training epochs, both on logarithmic scales.]

3 APPLICATIONS

3.1 NONLINEAR DIMENSIONALITY REDUCTION

This system can be applied to nonlinear dimensionality reduction problems [Saund, 1989], which turn out to be the special case of equal representations at the input and output layers. The constraining module has to be pretrained to auto-associate the chosen scalar coding, while the original modules auto-associate the higher-dimensional data. The constraining module pressures internal patterns of activation into the desired coding, without need for special convergence strategies.¹ Figure 3 shows a nonlinear dimensionality reduction from two-dimensional data to a one-dimensional constraint using a 16-[8-8-8]-16 architecture.

¹ To obtain convergence, Saund [1989] has to use a simulated annealing schedule (smoothness of the scalar coding) and a method of encouraging scalarized behavior by increasing peaks and decreasing valleys in a particular trial.

[Figure 3: Nonlinear dimensionality reduction from two-dimensional curve data to one-dimensional data denoting points along the curve.]

3.2 INTERFACE DEFINITION

In truly modular systems, e.g. large computer programs, there is typically a design phase requiring the strict definition of interfaces before the construction of modules can take place. This includes a fixed input/output relation for all data to be processed. In addition, the representation of data at module boundaries is given by data types, that is, the range of possible values a parameter can adopt. Once a design is completed, modules can be constructed separately, which in general results in a significant reduction in implementation complexity. Also, modules can be exchanged, e.g. if implemented inefficiently, at any time without interfering with other parts of the system.

How might this scheme be transferred to modular neural networks? An important feature of neural nets is their ability to learn from examples, that is, to develop internal representations according to the underlying regularities of the environment. If internal representations are to be used as interfaces, all internal representations that can emerge during learning would have to be known in advance in order to define module input/output relations, making the learning phase pointless. This dilemma can partly be overcome if interfaces are defined more loosely. One way of achieving a loose coupling is to take advantage of the characteristics of scalar codings. In the case of an auto-associative feedforward network, where internal representations are forced into scalar codings and where inputs/targets are also presented in the form of scalar codings, a sufficient approximation of the targets is only possible if the scalar codings at the hidden layer evolve in an orderly fashion. In other words, the internal codings are almost equally distributed over the given interval and are in ascending or descending order with respect to the corresponding inputs/targets, as shown above for the two-to-one dimensionality-reduction problem; a way to check this ordering numerically is sketched below.
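One way to make the orderly-evolution claim concrete is to decode the internal scalar of a trained auto-associator for a sweep of inputs and test that the decoded values are monotone and spread over the interval. The sketch below is illustrative: it assumes a network `assoc` trained, as described above, on scalar-coded inputs/targets with the constrained scalar-coded layer at activation index 1, and it reuses the encode, decode, and forward helpers from the earlier sketches.

```python
import numpy as np

def internal_scalar(assoc, v):
    """Decode the scalar adopted at the constrained internal layer for input v."""
    acts = forward(assoc, encode(v))   # forward() from the earlier sketch
    return decode(acts[1])             # assumed: acts[1] is the scalar-coded layer

probes = np.linspace(0.0, 1.0, 50)
codes = np.array([internal_scalar(assoc, v) for v in probes])

diffs = np.diff(codes)
ordered = bool(np.all(diffs >= 0) or np.all(diffs <= 0))  # ascending or descending
spread = codes.max() - codes.min()                         # coverage of the interval
print("order preserved:", ordered, "interval coverage:", round(float(spread), 2))
```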
The behavior very much resembles a simple topology-preserving mapping as known from Kohonen's self-organizing feature maps [Kohonen, 1989]. Given that, an interface can be defined by a type definition, that is, the range of valid scalar values, together with the assertion that a neighborhood relation between patterns of activation at the inputs/targets and the internal patterns will be preserved. On the other hand, a learning system consisting of more loosely coupled modules will lose some of the capabilities a strictly defined system has.

3.3 EXAMPLE

Let us consider a system where some raw data is to be processed in several steps. First, the data is compressed into compact encodings; then a second step is applied, e.g. feature values derived from the encodings are displayed; then further processing steps are carried out. Such a system could be broken down into modules loosely coupled via scalar-coded interfaces. Figure 4 shows an example. Two modules (m1, m2) of auto-associative feedforward networks learn to transform the raw data into scalar-coded internal representations. Assuming that the low-level configuration is appropriate for the task (a priori knowledge), two sets of scalar-coded sequences are learned. Since the possible values are known, it is sufficient to train one display module (m3) to indicate the adopted values of both low-level modules. The display module can therefore be trained separately and may be exchanged.

[Figure 4: A modular system with scalar-coded interfaces: two low-level modules m1 and m2 process raw data; a display module m3 reads their scalar-coded interfaces.]

4 DISCUSSION

Modularization is a promising way of combating learning time complexity in backpropagation networks. On the other hand, the above modular system would be of little use if absolute feature values were important for further processing: it would then not be possible to train higher-level modules separately, and no reduction in learning complexity would be achievable. However, the topology-preserving behavior of scalar-coded interfaces might be a sufficient justification here, at least for a subset of problems. For these cases, a loosely-coupled modular system not only allows the required learning epochs to be reduced; a means of interpreting patterns of activity at module interfaces is also achieved.

A LEARNING PROCEDURE ADJUSTMENTS

All simulations were carried out using the standard backpropagation algorithm with a fixed learning rate of 0.2, a fixed momentum of 0.1, weight update after each presentation (on-line mode), and sequential presentation of patterns. To speed up learning, the gradient was normalized before updating the weights. This follows from the empirical observation that the product of the optimal learning rate and the absolute value of the gradient remains almost constant over all learning cycles; see [Salomon, 1989]. Scalar codings were created using the derivative of the sigmoidal function 1/(1 + e^(-x/t)) at a temperature of t = 1/8.
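The normalization just described can be illustrated as follows. This is a sketch under assumptions, not the paper's code: it rescales the full gradient to unit length before applying the learning rate and momentum, and treating all weight arrays as one flattened vector is a choice made for the example.

```python
import numpy as np

def normalized_update(grads, velocity, lr=0.2, momentum=0.1):
    """One on-line update with the gradient rescaled to unit length.

    grads:    list of gradient arrays, one per weight array of the network
    velocity: list of arrays of the same shapes carrying the momentum term
    Returns the increments to add to the corresponding weight arrays."""
    norm = np.sqrt(sum(np.sum(g * g) for g in grads))   # length of the full gradient
    steps = []
    for g, v in zip(grads, velocity):
        v *= momentum
        v -= lr * g / max(norm, 1e-12)                  # normalized gradient step
        steps.append(v.copy())
    return steps
```

By construction, the product of the learning rate and the length of the applied gradient is then constant across learning cycles, which matches the empirical observation cited above.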
Acknowledgement

I would like to thank Albrecht Biedl for his helpful comments on early drafts of the paper. My special thanks go to Geoffrey Hinton for his critical comments during the revision of a previously published version of this paper.

References

[Chauvin, 1989] Yves Chauvin. A back-propagation algorithm with optimal use of hidden units. In David S. Touretzky, editor, Advances in Neural Information Processing Systems I. Morgan Kaufmann Publishers, San Mateo, California, 1989.

[Hancock, 1989] Peter J. B. Hancock. Data representation in neural nets: An empirical study. In David Touretzky, Geoffrey Hinton, and Terrence Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pages 11-20, San Mateo, CA, 1989. Morgan Kaufmann Publishers.

[Hinton et al., 1986] Geoffrey E. Hinton, James L. McClelland, and David E. Rumelhart. Distributed representations. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, chapter 3. MIT Press/Bradford Books, 1986.

[Kohonen, 1989] Teuvo Kohonen. Self-Organization and Associative Memory. Springer Verlag, 1989.

[le Cun et al., 1990] Yann le Cun, John S. Denker, and Sara A. Solla. Optimal brain damage. In David S. Touretzky, editor, Advances in Neural Information Processing Systems II. Morgan Kaufmann Publishers, San Mateo, California, 1990.

[le Cun, 1989] Yann le Cun. Generalization and network design strategies. In Rolf Pfeiffer, Zoltan Schreter, Françoise Fogelman-Soulié, and Luc Steels, editors, Proceedings Connectionism in Perspective. Swiss Group for Artificial Intelligence and Cognitive Science (SGAICO), Elsevier Science Publishers B.V., 1989.

[Mozer and Smolensky, 1989] Michael C. Mozer and Paul Smolensky. Skeletonization: A technique for trimming the fat from a network via relevance assessment. Technical Report CU-CS, University of Colorado at Boulder, Boulder, January 1989.

[Salomon, 1989] Ralf Salomon. Adaptiv geregelte Lernrate bei Back-propagation. Technical Report 89-24, Technische Universität Berlin, Forschungsberichte des Fachbereichs Informatik, 1989.

[Saund, 1989] Eric Saund. Dimensionality-reduction using connectionist networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(3):304-314, March 1989.