Reservoir Computing for Neural Networks

Reservoir Computing for Neural Networks
Felix Grezes, CUNY Graduate Center
fgrezes@gc.cuny.edu
September 4, 2014

Introduction

The artificial neural network paradigm is a major area of research within A.I., with feedforward networks having seen the most recent success. Recurrent networks offer more biological plausibility and greater theoretical computing power, but they exacerbate the training difficulties of feedforward nets. Reservoir computing emerges as a solution, offering a generic paradigm for fast and efficient training of RNNs. It is used in a variety of fields: NLP, computational neuroscience, robotics, machine learning and more.

Overview
1. Introduction
2. Neural Networks: Short History, Feedforward Networks, Recurrent Neural Networks
3. The Reservoir Computing Paradigm: Models, Reservoir Computing Theory
4. How the Reservoir Computing Paradigm is used; Other Randomized Networks
5. Conclusion; Future Work using Reservoirs

Short History of Neural Networks: Some Important Dates

1890 Discovery of the biological neuron by Golgi and Ramón y Cajal.
1957 Rosenblatt's Perceptron: f(x) = 1 if w·x + b > 0, and 0 otherwise (a code sketch follows the timeline).
1969 Minsky and Papert show that the perceptron cannot learn XOR.
1975 Werbos proposes the backpropagation algorithm, training over multiple layers of perceptrons.
1989/91 Cybenko and Hornik prove that multi-layer feedforward networks are universal function approximators.
1990 Werbos proposes the backpropagation-through-time algorithm for RNNs.
2001/02 Jaeger and Maass propose the reservoir computing paradigm, under the names Echo State Networks and Liquid State Machines.
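As a concrete illustration of Rosenblatt's threshold unit, here is a minimal sketch of the decision rule above together with the classical perceptron learning rule; the AND-gate data and the learning rate are illustrative choices, not part of the slides.

```python
import numpy as np

# Perceptron decision rule: f(x) = 1 if w.x + b > 0, else 0
def predict(w, b, x):
    return 1 if np.dot(w, x) + b > 0 else 0

# Classical perceptron learning rule on a toy AND-gate dataset (illustrative).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(20):
    for xi, target in zip(X, y):
        error = target - predict(w, b, xi)
        w += lr * error * xi     # move the weights toward the target
        b += lr * error          # and the bias

print([predict(w, b, xi) for xi in X])   # expected: [0, 0, 0, 1]
```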

Feedforward Networks - The Artificial Neuron

Feedforward Networks - Architecture

Feedforward Networks - Universal Approximators

In landmark results, Cybenko (1989) and Hornik (1991) proved that feedforward networks can approximate any continuous function from R^n to R under certain conditions:
- The activation function is continuous, non-constant, bounded, and monotonically increasing.
- The network contains one or more hidden layers.
These results show that it is the layered architecture that gives the networks the power to approximate any function, legitimizing the search for the most efficient learning algorithms.

Backpropagation - Werbos (1975)

The most successful training algorithm for feedforward neural networks. It is a variant of gradient descent and requires the activation function to be differentiable. Steps (a code sketch follows the list):
1. Forward propagation of the input values to obtain the activation of each neuron.
2. Backward propagation of the error for each node, in order to calculate the delta (weight update) for each weight. The delta is computed by using the calculus chain rule to obtain the partial derivative of the error with respect to that weight.
3. Update each weight according to its gradient and the learning rate.
4. Repeat until the error over the data falls below a threshold, the gradient converges, or a fixed number of iterations is reached.
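A minimal sketch of these steps for a one-hidden-layer network in NumPy. The network size, sigmoid activation, squared-error loss, learning rate, and the XOR toy data are illustrative assumptions, not details from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy task (illustrative): learn XOR with one hidden layer of 4 sigmoid units.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)   # hidden -> output
lr = 1.0

for step in range(5000):
    # Step 1: forward propagation
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Step 2: backward propagation of the error (chain rule)
    d_out = (out - y) * out * (1 - out)          # delta at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)           # delta at the hidden layer

    # Step 3: update each weight according to gradient and learning rate
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

# Step 4: after enough iterations the outputs typically approach [0, 1, 1, 0].
print(out.round(2))
```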

Recurrent Neural Networks

Recurrent neural networks contain cycles in their connection graph. Why study recurrent neural networks?
- The human brain has a recurrent architecture, so computational neuroscientists need to understand how RNNs work.
- Because of the recurrent architecture, RNNs approximate not just functions but dynamical systems.

RNNs - Dynamical Systems

Definition: a discrete dynamical system is defined by a state x in a state space X that evolves over discrete time steps according to a fixed rule f. The rule f is a function of time and the initial state x_0, and f(x_0, t) is the state at time step t.

Dynamical systems are known for their long-term dependencies and their sensitivity to small changes in the initial conditions (chaos theory). RNNs can model complex dynamical systems, allowing them to remember input values over time by echoing them through the network nodes. This is particularly useful for tasks with temporal dependencies, whereas feedforward networks have relied on hand-crafted data representations, e.g. n-grams in NLP tasks.
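To make the sensitivity to initial conditions concrete, here is a small illustrative sketch (not from the slides) that iterates the logistic map, a textbook discrete dynamical system of the form x_{t+1} = f(x_t); the parameter r = 4.0 puts it in its chaotic regime.

```python
# Logistic map x_{t+1} = r * x_t * (1 - x_t): a classic discrete dynamical system.
def trajectory(x0, r=4.0, steps=30):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = trajectory(0.200000)
b = trajectory(0.200001)   # initial state perturbed by 1e-6
# After 30 steps the two trajectories typically differ by order 0.1 to 1.
print(abs(a[-1] - b[-1]))
```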

Training RNNs

Universal approximation - Siegelmann and Sontag (1991): similar to feedforward networks, RNNs are universal approximators of dynamical systems; the proof works by emulating a universal Turing machine.

Backpropagation Through Time - Werbos (1990): adapts the backpropagation algorithm by unfolding the RNN into a feedforward network with one layer per time step. The weights shared across time steps are tied by averaging their updates after each iteration.

The Vanishing Gradient Problem - Hochreiter (1991)

In networks with many hidden layers, the error gradient weakens as it propagates from the back of the network to the front. RNNs trained with BPTT exacerbate the problem, since the number of unfolded layers grows with the number of time steps. Schiller and Steil (2005) verified this effect experimentally, observing that the dominant changes during training appeared only on the output layer of the network. The reservoir computing paradigm emerges as a solution.

The Reservoir Computing Paradigm for RNNs

Central idea: since only the changes to the output-layer weights are significant, the treatment of the inner network weights can be completely separated from the treatment of the output-layer weights. In many practical applications, the weights of the inner network are randomly initialized and never changed; only the weights of the output layer are trained, usually by a simple linear method such as logistic regression or least squares.
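A minimal echo-state-style sketch of this recipe in NumPy: a fixed random reservoir driven by the input, with only a linear readout trained by regularized least squares. The reservoir size, tanh update, spectral-radius scaling, ridge regularization, and the sine-prediction toy task are illustrative assumptions, not details from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy task (illustrative): one-step-ahead prediction of a sine wave.
u = np.sin(0.2 * np.arange(1000))
target, u = u[1:], u[:-1]

N = 200                                      # reservoir size (assumption)
W_in = rng.uniform(-0.5, 0.5, N)             # random, untrained input weights
W = rng.uniform(-0.5, 0.5, (N, N))           # random, untrained recurrent weights
W *= 0.9 / max(abs(np.linalg.eigvals(W)))    # scale spectral radius to 0.9 (heuristic)

# Run the reservoir: x(t+1) = tanh(W x(t) + W_in u(t)). These weights are never trained.
x, states = np.zeros(N), []
for ut in u:
    x = np.tanh(W @ x + W_in * ut)
    states.append(x.copy())
S = np.array(states)

# Train ONLY the linear readout, here by ridge regression (regularized least squares).
washout, lam = 100, 1e-6
S_tr, y_tr = S[washout:], target[washout:]
W_out = np.linalg.solve(S_tr.T @ S_tr + lam * np.eye(N), S_tr.T @ y_tr)

print("train MSE:", np.mean((S_tr @ W_out - y_tr) ** 2))
```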

Reservoir Models

The idea of reservoir computing was proposed independently by multiple teams in the early 2000s. Since 2008, these lines of work have been united under the umbrella term "reservoir computing".

Echo State Networks - Jaeger (2001): pioneered by Jaeger and his team, whose work focused on the properties of reservoir networks that make them work, and applied them to signal processing tasks from a machine learning angle.

Liquid State Machines - Maass (2002): proposed by Maass, LSMs are the other pioneering model of reservoir computing. Coming from a neuroscience background, this type of reservoir has been used to understand the computing power of real neural circuits.

Other Models

The reservoir computing paradigm can be extended beyond neural networks.

Water basin - Fernando (2003): taking the idea of a reservoir and echoes literally, an experiment was set up in which the inputs were projected into a bucket of water; by recording the waves bouncing around the liquid's surface, the authors were able to train a pattern recognizer successfully.

Bacteria colony - Fernando (2007): another exotic idea for an untrained reservoir is an E. coli bacteria colony, with chemical stimuli as input and protein measurements as output.

Why do untrained reservoirs work?

All of these reservoir models rely on projecting the data into a much higher-dimensional space, in a random or untrained manner; in that space the data can then be easily classified by a simple trained readout.

Reservoir Computing Theory - The Echo State Property

Because of the network cycles, node activities can be self-reinforcing, leading to saturation of the network. To avoid this, reservoir networks should possess the echo state property.

Definition (echo state property): the influence of past inputs and past states on the network activities should gradually vanish over time.

Heuristics (see the sketch below): scale the weights of the network to a desired spectral radius of the weight matrix. A spectral radius smaller than 1 is taken in practice to guarantee the echo state property. For practical tasks with long-range time dependencies, the spectral radius should be close to 1, or even slightly larger.
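A small sketch of the scaling heuristic; the matrix size and the target radius of 0.95 are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def scale_to_spectral_radius(W, rho_target):
    """Rescale a recurrent weight matrix so its spectral radius equals rho_target."""
    rho = max(abs(np.linalg.eigvals(W)))
    return W * (rho_target / rho)

W = rng.uniform(-1, 1, (300, 300))     # random reservoir weights
W = scale_to_spectral_radius(W, 0.95)  # close to 1 for long-range time dependencies
print(max(abs(np.linalg.eigvals(W))))  # ~0.95
```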

Reservoir Computing Theory - Separability

If two different input signals produce similar reservoir states, the output-layer classifier will fail to distinguish them. To produce reservoirs with high separability power, the network should be:
- Large, allowing numerous activations.
- Sparse, making the activations decoupled.
When producing random reservoirs, their separability power can be compared by measuring the distances between the states induced by different inputs (see the sketch below).
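One illustrative way to carry out such a comparison (my sketch, not a procedure specified on the slides): drive the same fixed reservoir with two different input signals and measure the distance between the resulting states.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # echo-state scaling, as above
W_in = rng.uniform(-0.5, 0.5, N)

def final_state(u):
    """Run the reservoir over input sequence u and return its final state."""
    x = np.zeros(N)
    for ut in u:
        x = np.tanh(W @ x + W_in * ut)
    return x

t = np.arange(200)
u1 = np.sin(0.2 * t)                  # two different input signals (illustrative)
u2 = np.sign(np.sin(0.2 * t))         # a square wave at the same frequency

# A larger distance between induced states indicates higher separability for this pair.
print(np.linalg.norm(final_state(u1) - final_state(u2)))
```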

Reservoir Computing Theory - Topology

The best topology for a reservoir network remains an open question, and is certainly task dependent. A study by Liebald (2004) compared well-known topologies on dynamical system prediction tasks, including small-world, scale-free and biologically inspired networks, as well as exhaustively searching very small networks. No specific topology performed better than fully random networks.

How the Reservoir Computing Paradigm is used

Since the discovery of reservoir computing in the early 2000s, it has been applied to a number of tasks:
- Computational neuroscience
- Machine learning
- Speech processing
- Physics: hardware implementations

Research using Reservoirs

The use of reservoirs in scientific research can be broadly classified into two categories:
- As a generic and powerful machine learning tool.
- As a way to explain and simulate realistic biological processes.

Reservoirs in Neuroscience - Dominey (1995)

Dominey was the first to exhibit reservoir-like patterns in the brain, as early as 1995, before the term reservoir computing was coined. He was the first to spell out that some parts of the brain seem to be randomly connected and do not change over time, and that learning happens only at the output layer. In this early work, the activity of the pre-frontal cortex was simulated using leaky-integrator neurons and a least-mean-squares output classifier.

Reservoirs in Neuroscience - Bernacchia (2011)

The authors found that monkey brains exhibit multiple timescales when processing expected rewards, whereas classical reinforcement learning accounts for only a fixed timescale.

What was observed: the activation of different blocks of neurons correlated with when the monkey expected a reward.

How this was reproduced: a single 1000-neuron reservoir with sparse, random connections. The simulations showed a similar distribution of timescales in neural activation.

Reservoirs in Neuroscience - Hinaut (2014)

This work combines research from neuroscience, robotics and linguistics to show that an iCub robot can learn complex grammatical systems through purely associative mechanisms, by observing its environment. Using a reservoir network modeled after the human pre-frontal cortex, the robot learns that sentences like "John hit Mary" and "Mary was hit by John" have the same meaning and the same grammatical structure.
[Figure: the iCub robot learning from its environment.]

What do Reservoirs bring to Neuroscience?

Some kind of learning must be going on in the recurrent networks that are animal brains. But from a purely mechanical viewpoint, is it realistic that errors can influence every single connection between neurons? If efficient training of RNNs is possible using the reservoir computing paradigm, it may drastically simplify the chemical mechanisms needed to explain the learning capacities of the brain. Neuroscientists could confirm or refute this hypothesis with precise observations of neural activity.

Reservoirs in Machine Learning - Oubbati (2012)

Multi-objective problems: a multi-objective task seeks a solution that balances several goals at the same time (e.g. speed vs. quality, financial vs. social). The authors used a technique called Adaptive Dynamic Programming to solve such tasks, but needed a fast and powerful critic able to compute multiple expected rewards at once. Reservoir networks satisfy these criteria and proved capable of reaching different Pareto-optimal solutions, depending on the preferences of the critic.

Reservoirs in Speech Processing - Triefenbach (2010)

Can reservoir networks compete with other state-of-the-art techniques (HMMs, Deep Belief Nets) on the task of continuous phoneme recognition on the TIMIT dataset?

Results: the overall performance was comparable to more sophisticated approaches. A second reservoir layer improves the recognition rate by 3-5%, but subsequent layers provide only minimal gains.

What do Reservoirs bring to Machine Learning?

The reservoir computing paradigm provides scientists with a generic and powerful tool for tasks that involve temporal dependencies, without requiring hand-crafted data representations. With the early successes above, the path is clear for other fields to use the paradigm. Simple ideas: continuous handwritten character recognition, stock market prediction, and more.

Hardware Implementation - Appeltant (2011)

Hardware implementations are always much faster than computer simulations. However, it is both expensive to create random circuitry and inefficient, since not all random topologies possess the appropriate properties. The authors instead use a single tunable dynamical system whose values are saved over a number of time steps; these delayed values act as virtual nodes of the reservoir, whose connections to the output layer are trained in the classical reservoir fashion.

The Mackey-Glass oscillator: the state X evolves according to

    Ẋ(t) = -X(t) + η [X(t-τ) + γ J(t)] / (1 + [X(t-τ) + γ J(t)]^p)

with η, γ, p tunable parameters and J(t) the input signal.
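An illustrative Euler-integration sketch of this delayed-feedback node follows; the parameter values (η, γ, p, τ), the step size, and the random drive are assumptions for demonstration, not the values used by Appeltant et al.

```python
import numpy as np

# Euler integration of the delayed-feedback node
#   dX/dt = -X(t) + eta * D(t) / (1 + D(t)**p),  with  D(t) = X(t - tau) + gamma * J(t)
# Parameter values below are illustrative, not those of Appeltant et al. (2011).
eta, gamma, p, tau, dt = 0.4, 0.05, 1.0, 80.0, 1.0
steps = 2000
delay = int(tau / dt)

rng = np.random.default_rng(4)
J = rng.uniform(-1, 1, steps)          # input signal (illustrative random drive)
X = np.zeros(steps + 1)

for t in range(steps):
    x_delayed = X[t - delay] if t >= delay else 0.0
    D = x_delayed + gamma * J[t]
    # abs() keeps the toy integration well-defined when the drive D is negative
    X[t + 1] = X[t] + dt * (-X[t] + eta * D / (1.0 + abs(D) ** p))

# Values of X sampled within one delay interval act as the "virtual nodes" of the
# reservoir; only their linear readout weights would be trained, as usual.
print(X[-5:])
```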

Other Randomized Networks

The idea of separating the treatment of the output layer from that of the inner layers has also been applied to feedforward networks.

Extreme Learning Machines - Huang (2011): randomize the weights of the hidden layers and train only the weights of the output layer. ELMs have also been proven to be universal approximators, and have been applied to both toy tasks and real-world problems (a sketch follows).
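A minimal ELM-style sketch; the hidden-layer size, tanh activation, ridge-regularized readout, and the noisy sine regression task are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy regression task (illustrative): learn y = sin(3x) from noisy samples.
X = rng.uniform(-1, 1, (500, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=500)

# Random, untrained hidden layer ...
H = 100
W_hidden = rng.normal(0, 1, (1, H))
b_hidden = rng.normal(0, 1, H)
A = np.tanh(X @ W_hidden + b_hidden)          # hidden activations, shape (500, H)

# ... and a trained linear output layer (ridge-regularized least squares).
lam = 1e-3
W_out = np.linalg.solve(A.T @ A + lam * np.eye(H), A.T @ y)

print("train MSE:", np.mean((A @ W_out - y) ** 2))
```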

Conclusion

The reservoir computing paradigm is simple: recurrent neural networks can be efficiently trained by optimizing only the weights connected to the output neurons, leaving the weights of the internal connections unchanged. The paradigm offers a theoretically sound approach to modeling dynamical systems with temporal dependencies, and promises to be much more computationally efficient than traditional RNN approaches that aim to optimize every weight of the network. Reservoir computing techniques have been applied successfully to classical artificial intelligence problems, providing state-of-the-art performance on engineering problems and offering explanations of biological brain processes.

Future Work using Reservoirs

- Speech Lab @ Queens College: speech prosody analysis.
- NSF IGERT: multi-disciplinary work (bio-medical data).
I have already explored toy problems using reservoir networks, implemented with the OGER Python toolbox.
[Figure: input, target and output signals over roughly 70 time steps, from a simple 50-neuron random reservoir on a k-delay replication task.]

The End. Thank You.