1 Natural Actor-Critic. Authors: Jan Peters and Stefan Schaal. Neurocomputing, 2008. Cognitive robotics 2008/2009, Wouter Klijn
2 Content Content / Introduction Actor-Critic Natural gradient Applications Conclusion References
3 Actor-Critic Separate memory structures for the policy (Actor) and the value function (Critic). After each action the critic evaluates the new state and returns an error. The actor and the critic are both updated using this error. (Figure: the Actor-Critic architecture [2])
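To make this interplay concrete, here is a minimal one-step actor-critic update in Python. This is an illustrative sketch, not the paper's algorithm: the Gaussian actor, linear critic, feature map and step sizes are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.1  # assumed fixed exploration noise of the Gaussian actor

def sample_action(theta, phi_s):
    """Actor: draw a ~ N(theta . phi(s), SIGMA^2)."""
    return theta @ phi_s + SIGMA * rng.standard_normal()

def grad_log_pi(theta, phi_s, a):
    """Score function (gradient of the log-policy) of the Gaussian actor."""
    return (a - theta @ phi_s) / SIGMA**2 * phi_s

def actor_critic_update(theta, w, phi_s, a, r, phi_s_next, gamma=0.99,
                        alpha_actor=1e-3, alpha_critic=1e-2):
    """One update after observing the transition (s, a, r, s')."""
    # Critic: TD error of a linear value function V(s) = w . phi(s)
    delta = r + gamma * w @ phi_s_next - w @ phi_s
    w = w + alpha_critic * delta * phi_s                # critic learns from its error
    theta = theta + alpha_actor * delta * grad_log_pi(theta, phi_s, a)  # so does the actor
    return theta, w
```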
4 Actor-Critic: Notation The model in the article is loosely based on an MDP.
- Discrete time
- Continuous state set: states x_t ∈ X = R^n
- Continuous action set: actions u_t ∈ U = R^m
The system:
- The start state x_0 is drawn from a start-state distribution p(x_0).
- At any state x_t the actor chooses an action u_t ~ π(u_t | x_t).
- The system transfers to a new state x_{t+1} ~ p(x_{t+1} | x_t, u_t).
- The system yields a reward r_t = r(x_t, u_t) after each action.
5 Actor-Critic: Functions The goal of the system is to find a policy π_θ(u | x) = p(u | x, θ). This goal is reached by optimizing the normalized expected return as a function of the policy parameters θ:

J(θ) = ∫_X d^π(x) ∫_U π_θ(u | x) r(x, u) du dx

with the differential (the policy gradient, where b(x) is a baseline):

∇_θ J(θ) = ∫_X d^π(x) ∫_U ∇_θ π_θ(u | x) (Q^π(x, u) − b(x)) du dx

Problem: the meat and bones of the article get lost in convoluted functions. Solution: use a (presumably) known model/system that can be improved using the same method [4].
6 Actor-Critic: Simplified model
Actor:
- Universal function approximator, e.g. a Multi-Layer Perceptron (MLP).
- Gets its error from the critic.
- Gradient descent!
Critic: a baseline (based on example data, or a constant) times a function that combines the learned information with the reward.
7 Natural Gradient: Vanilla Gradient Descent The critic returns an error which, in combination with the function approximator, can be used to create an error function E(θ). The partial derivative of this error function, the gradient ∇_θ E(θ), can now be used to update the internal parameters of the function approximator (and the critic). (Figure: gradient descent [3])
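As a sketch of this update rule, θ ← θ − α ∇_θ E(θ), here is vanilla gradient descent in a few lines; the quadratic error function below is a toy example, not the paper's.

```python
import numpy as np

def gradient_descent(grad_E, theta0, alpha=0.1, steps=100):
    """Vanilla gradient descent on an error function E, given its gradient."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - alpha * grad_E(theta)  # step against the gradient
    return theta

# Toy example: E(theta) = ||theta||^2 has gradient 2 * theta; the minimum is at 0.
theta_min = gradient_descent(lambda th: 2 * th, theta0=[1.0, -2.0])
```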
8 Natural Gradient: Definition An alternative gradient to update the function approximator. Definition of the natural gradient:

∇̃_θ J(θ) = F(θ)^-1 ∇_θ J(θ)

where F(θ) = E[ ∇_θ log π_θ ∇_θ log π_θ^T ] denotes the Fisher Information Matrix (FIM). The FIM is a statistical construct that summarizes the mean and variation of the input data. Used in combination with the gradient, the FIM gives the direction of steepest descent [4].
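A sketch of how this is computed in practice: the FIM is estimated from sampled score vectors ∇_θ log π_θ(u|x) as their average outer product, and the vanilla gradient is rescaled by its inverse. The estimator and the small regularizer are common choices, assumed here rather than quoted from the paper.

```python
import numpy as np

def natural_gradient(vanilla_grad, scores, reg=1e-6):
    """Return F(theta)^-1 @ vanilla_grad.

    scores: (N, d) array of per-sample score vectors grad log pi(u|x);
    the empirical FIM is their average outer product.
    """
    n, d = scores.shape
    F = scores.T @ scores / n                 # empirical Fisher Information Matrix
    F += reg * np.eye(d)                      # small ridge term keeps F invertible
    return np.linalg.solve(F, vanilla_grad)   # solve F x = g rather than invert F
```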
9 Natural Gradient: Properties The natural gradient is a linearly weighted version of the normal (vanilla) gradient. Convergence to a local minimum is guaranteed. By choosing a more direct path to the optimal solution, faster convergence is reached while premature convergence is avoided. Covariant: independent of the coordinate frame. It averages out stochasticity, so smaller datasets suffice for estimating the gradient correctly. (Figure: gradient landscape for the vanilla and natural gradient, adapted from [1])
10 Natural Gradient: Plateaus The natural gradient is a solution for escaping from plateaus in the gradient landscape. Plateaus are regions where the gradients of a function are extremely small. It takes considerable time to traverse them, a well-known feature of gradient descent methods. (Figure: example function landscape showing multiple plateaus and the resulting error while traversing it with normal gradient steps (iterations) [5])
11 Applications: Cart-Pole Balancing Well-known benchmark for reinforcement learning [1]. Unstable non-linear system that can be simulated. State: cart position and velocity, pole angle and angular velocity (x, ẋ, φ, φ̇). Action: horizontal force F applied to the cart. Reward based on the current state, with a constant baseline. (Episodic Actor-Critic)
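For reference, the textbook cart-pole dynamics, Euler-integrated at the experiment's 60 Hz sample rate. The equations and constants below are the usual benchmark values, assumed here rather than taken from the paper.

```python
import numpy as np

G, M_CART, M_POLE, L_POLE = 9.81, 1.0, 0.1, 0.5   # standard benchmark constants
DT = 1.0 / 60.0                                   # 60 Hz sample rate

def cartpole_step(state, force):
    """One Euler step; state = (x, x_dot, phi, phi_dot), action = force on the cart."""
    x, x_dot, phi, phi_dot = state
    total_m = M_CART + M_POLE
    tmp = (force + M_POLE * L_POLE * phi_dot**2 * np.sin(phi)) / total_m
    phi_acc = (G * np.sin(phi) - np.cos(phi) * tmp) / (
        L_POLE * (4.0 / 3.0 - M_POLE * np.cos(phi)**2 / total_m))
    x_acc = tmp - M_POLE * L_POLE * phi_acc * np.cos(phi) / total_m
    return np.array([x + DT * x_dot, x_dot + DT * x_acc,
                     phi + DT * phi_dot, phi_dot + DT * phi_acc])
```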
12 Applications: Cart-Pole Balancing Simulated experiment with a sample rate of 60 Hz, comparing the natural and vanilla gradient Actor-Critic algorithms. Results: the natural gradient implementation takes on average ten minutes to find an optimal solution; the vanilla gradient takes on average two hours. (Figure: expected return and policy error, averaged over 100 simulated runs)
13 Applications: Baseball Optimizing nonlinear dynamic motor primitives for robotics. In plain English: teaching a robot to hit a ball. Shows the use of a rich baseline for the critic: a teacher manipulating the robot. (LSTD-Q(λ) Actor-Critic) State, action and reward are not explicitly given but are based on the motor primitives (and presumably a camera input). (Figure: optimal (red), POMDP (dashed) and Actor-Critic motor primitives)
14 Applications: Baseball The task of the robot is to hit the ball so that it flies as far as possible. The robot has seven degrees of freedom. Initially the robot is taught by supervised learning and fails. Subsequently the performance is improved by the Natural Actor-Critic.
15 Applications: Baseball Both learning methods eventually learn their own version of the best solution. However, the POMDP approach requires 10^6 learning steps compared to 10^3 for the Natural Actor-Critic. Remarkably, the Natural Actor-Critic subjectively finds a solution that is closer to the teacher's/optimal solution.
16 Conclusions A novel policy-gradient reinforcement learning method. Two distinct flavors:
- Episodic, with a constant as the baseline function in the critic.
- LSTD-Q(λ), with a rich baseline (teacher) function.
The improved performance can be traced back to the use of the natural gradient, which exploits statistical information about the input data to optimize the changes made to the learning functions.
17 Conclusions Preliminary versions of the method have been implemented in a wide range of real-world applications:
- Humanoid robots
- Traffic light optimization
- Multi-robot systems
- Gait optimization in robot locomotion
18 References
[1] J. Peters and S. Schaal, Natural Actor-Critic. Neurocomputing, 2008.
[2] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998. Web version:
[3]
[4] S. Amari, Natural Gradient Works Efficiently in Learning. Neural Computation 10, 1998.
[5] K. Fukumizu and S. Amari, Local Minima and Plateaus in Hierarchical Structures of Multilayer Perceptrons. Neural Networks, 2000.