SIMULATION BASED REINFORCEMENT LEARNING FOR PATH TRACKING ROBOT
Tony 1, M. Rahmat Widyanto 2

1 Department of Computer System, Faculty of Information Technology, Tarumanagara University, Jl. Letjen S. Parman No. 1, Jakarta
2 Faculty of Computer Science, University of Indonesia, Kampus Baru UI, Depok

tony.b@fti.utara.org 1, widyanto@cs.ui.ac.id 2

ABSTRACT

In this paper, we present a simulation-based reinforcement learning approach for a path tracking robot using the Q-Learning algorithm. Q-Learning is a reinforcement learning technique used mainly in robotics. The technique estimates the value of each state-action pair and chooses the action with the maximum reward value, so that the shortest path is taken. A simulation based on Q-Learning has been conducted in an environment consisting of eight rooms, one of which is the goal. Our experiment shows that the algorithm chooses the actions with maximum reward and finds the shortest path to the goal.

Keywords: path tracking, Q-Learning, reinforcement learning, shortest path

1 INTRODUCTION

Reinforcement learning is a sub-area of machine learning concerned with learning from rewards and punishments. An agent takes actions in an environment so as to maximize the reward value. A reinforcement learning algorithm finds a policy that maps states of the environment to the actions the agent should take in those states. The basic reinforcement learning model commonly consists of environment states, actions, and rewards. Figure 1 shows a typical reinforcement learning system. The agent receives descriptions of the environment, called states, and chooses actions to perform. The effect of an action on the environment is evaluated and fed back to the agent in the form of positive or negative rewards. The mission of the agent is to find the action rules that achieve maximum reward through its interaction with the environment.
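The interaction loop of Figure 1 can be sketched in a few lines of code (a toy illustration only: the two-state environment and the `Environment`/`Agent` names are our own placeholders, not part of the paper):

```python
import random

class Environment:
    """Toy environment: from state 0, action 1 reaches the goal state 1."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # Reward 100 for the action that reaches the goal, 0 otherwise.
        if self.state == 0 and action == 1:
            self.state = 1
            return self.state, 100, True      # next state, reward, episode done
        return self.state, 0, False

class Agent:
    """Agent with a trivial random policy over two possible actions."""
    def act(self, state):
        return random.choice([0, 1])

random.seed(0)
env, agent = Environment(), Agent()
state, total_reward, done = env.state, 0, False
while not done:
    action = agent.act(state)                 # agent chooses an action
    state, reward, done = env.step(action)    # environment returns state + reward
    total_reward += reward                    # agent accumulates the reward signal
print(total_reward)                           # -> 100
```

A learning agent would use the accumulated reward to improve its action choices; Q-Learning, introduced next, is one way to do so.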
One of the most important breakthroughs in reinforcement learning was the development of Q-Learning by Watkins in 1989 [1]. In this paper, we present a simulation-based reinforcement learning approach for a path tracking robot using this well-known Q-Learning algorithm. The rest of the paper is organized as follows: section 2 introduces the Q-Learning algorithm, the environment model, and the experiment; section 3 discusses the simulation results; section 4 concludes the paper.

Figure 1. A typical structure of reinforcement learning

[The 5th International Conference on Information & Communication Technology and Systems, paper C44]

2 ALGORITHM, ENVIRONMENT MODEL, AND EXPERIMENT

This section describes the Q-Learning algorithm, its environment model, and the experiment.

2.1 Q-Learning Algorithm

The task of reinforcement learning is generally stated as follows [2]. For each transition of the system from one state to another, a value called the reward is assigned. The system receives the reward after the transition is carried out. The purpose is to find a control policy that maximizes the expected discounted sum of rewards, known as the return. The value function is a prediction of the return from any state x_t:

V(x_t) = E[ Σ_{k=0}^{∞} γ^k · r_{t+k} ]

where r_t is the reward received in the transition from state x_t to x_{t+1}, and γ is the discount factor (0 ≤ γ ≤ 1). V(x_t) is the discounted sum of the rewards from time t onward. This sum depends on the sequence of actions chosen, which is determined by the control policy. The system has to find a control policy that maximizes V(x_t) for each state.

The Q-Learning algorithm does not work with the value function directly. It employs the Q-function, whose arguments are not only a state but also an action. This makes it possible to construct the Q-function by an iterative method and thus find an optimal control policy. The Q-function is expressed as:

Q(x_t, a_t) = r_t + γ · V(x_{t+1})

where a_t is the action chosen at time t out of the set of all possible actions. Because the purpose of the system is to maximize the total reward, V(x_{t+1}) is replaced by max_{a_{t+1}} Q(x_{t+1}, a_{t+1}), and the following function is obtained:

Q(x_t, a_t) = r_t + γ · max_{a_{t+1}} Q(x_{t+1}, a_{t+1})

Q values are stored in a matrix whose indices are a state and an action. In systems that employ Q-Learning, the expression above is usually combined with the method of temporal differences, TD(λ), proposed by Sutton [3]. If the temporal-difference parameter λ is equal to zero, the method is called single-step Q-Learning, because only the current and the next prediction of the Q-value participate in the update. The update rule for single-step Q-Learning is:

Q(x_t, a_t) ← Q(x_t, a_t) + α · ( r_t + γ · max_{a_{t+1}} Q(x_{t+1}, a_{t+1}) - Q(x_t, a_t) )

Single-step Q-Learning is a reinforcement learning technique that learns an action-value function giving the expected utility of taking a given action in a given state and following a fixed policy thereafter. It lets an agent acquire optimal control strategies from delayed rewards, even when it has no prior knowledge of the effects of its actions on the environment [4]. The single-step Q-Learning algorithm is described as follows [5]:

Given: a state diagram with a goal state (R matrix)
Find: the shortest path from any initial state to the goal state (Q matrix)

1. Set the parameter γ and the reward (R) matrix.
2. Initialize the Q matrix as a zero matrix.
3. For each episode:
   Select a random initial state.
   Do while the goal state is not reached:
   a. Select one among all possible actions for the current state.
   b. Using this possible action, consider going to the next state.
   c. Get the maximum Q value of this next state over all possible actions.
   d. Compute: Q(state, action) = R(state, action) + γ · max[Q(next state, all actions)]
   e. Set the next state as the current state.
   End do.
End for.

2.2 Environment Model

The environment model consists of eight rooms connected by doors, as shown in Figure 2. The rooms are labeled A to H. Room D is the target room, or goal. Notice that only one door leads to the goal room: the door from room C.

Figure 2. Environment model

The environment model in Figure 2 can be represented by a graph, with each room as a vertex and each door as an edge. The graph is shown in Figure 3.

Figure 3. Graph of the environment

The goal room is node D. Each door, or edge of the graph, has a reward value. The edge that leads immediately to the goal has an instant reward of 100 (see Figure 4); edges that have no direct connection to the target room have zero reward. Because each door is two-way (from A the agent can go to E, and from E it can go back to A), each edge of the previous graph is assigned two arrows, and each arrow carries a reward value. The graph thus becomes the state diagram shown in Figure 4. An additional loop with the highest reward (100) is given to the goal room (D back to D), so that once the agent arrives at the goal it will remain there forever. This is called an absorbing goal: when the agent reaches the goal state, it stays there.

Figure 4. State diagram

The virtual robot acts as an agent that can move from one room to another without knowledge of the environment. It does not know which sequence of doors to pass through to reach the target room; it learns through experience using Q-Learning. Suppose the agent is in room A and must learn to reach room D, the goal (see Figure 2). Each room in the environment is called a state, and the agent's movement from one room to another is called an action. A state is represented by a node in the state diagram, and an action by an arrow (see Figure 4).

Suppose the agent is in state A. From state A, the agent cannot go directly to state B, because no door or arrow connects A and B; it can only go to state E, which is connected to A. From state E, the agent can only go to state F. From state F, the agent can go to state B, to state G, or back to state E (look at the arrows out of state F). From state B, the agent can go to state C or back to state F. From state C, the agent can go to state D, to state G, or back to state B. From state G, the agent can go to state C, to state H, or back to state F. From state H, the agent can only go to state G. If the agent is in state D, it remains there, because state D is the target room.

The state diagram and reward values can be written as the following reward table, or R matrix. A negative value in the matrix means that the row state has no action leading to the column state; for example, state A cannot go to states B, C, D, F, G, or H, because there is no connecting door. Zero means that the action from one state to another does not reach the goal. The reward of 100 is given to actions leading from a state to the goal state.
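The reward table described above can be built directly from the door list of Figure 2. A short Python sketch (the -1/0/100 encoding follows the description above; the room indexing is our own):

```python
# Rooms A..H indexed 0..7; the door list follows Figure 2, the goal is room D.
A, B, C, D, E, F, G, H = range(8)
doors = [(A, E), (B, C), (B, F), (C, D), (C, G), (E, F), (F, G), (G, H)]
GOAL = D

# -1 = no door, 0 = ordinary transition, 100 = transition into the goal.
R = [[-1] * 8 for _ in range(8)]
for i, j in doors:                 # every door is two-way, so set both directions
    R[i][j] = 100 if j == GOAL else 0
    R[j][i] = 100 if i == GOAL else 0
R[GOAL][GOAL] = 100                # absorbing loop D -> D with the highest reward

print(R[C][D], R[D][C], R[A][B])   # -> 100 0 -1
```

Row C of this matrix contains the single 100-valued entry into the goal, matching the single door into room D.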
2.3 Experiment

In our experiment, we use JavaScript, PHP (PHP: Hypertext Preprocessor), and HTML (HyperText Markup Language) as programming tools. To support these, we need XAMPP (cross-platform, Apache, MySQL, PHP, and Perl) as the web server, plus an internet browser that supports JavaScript (such as Internet Explorer or Mozilla Firefox) to run the simulation. After XAMPP is installed, the folder containing the simulation program must be copied to the folder C:\ProgramFiles\XAMPP\htdocs. To run the program, open the internet browser and type the program address on the address bar.

The implementation of the Q-Learning algorithm is described as follows.

First, set the learning parameter γ = 0.8 and the reward (R) matrix. The reward value of each action is: R[A,E] = 0; R[B,C] = 0; R[B,F] = 0; R[C,B] = 0; R[C,D] = 100; R[C,G] = 0; R[D,C] = 0; R[D,D] = 100; R[E,A] = 0; R[E,F] = 0; R[F,B] = 0; R[F,E] = 0; R[F,G] = 0; R[G,C] = 0; R[G,F] = 0; R[G,H] = 0; R[H,G] = 0. The resulting R matrix (rows: current state A-H; columns: action to go to state A-H) is shown in Figure 5.

Figure 5. R matrix

Next, initialize the Q matrix as a zero matrix: the initial value of every state-action pair is zero. The Q matrix (rows: state A-H; columns: action A-H) is shown in Figure 6.

Figure 6. Q matrix

For each episode, set the initial state (s_t) by random selection and, using one possible action, go to the next state. Look at the fourth column of the R matrix (actions going to state D): there is only one state from which the agent can go to state D, namely state C; we call this action Q(C,D). Get the maximum Q value of this next state over all possible actions. Because state D is the goal state, this is max[Q(next state, all actions)] = max[Q(D,D)]. Then compute the Q value:

Q(state, action) = R(state, action) + γ · max[Q(next state, all actions)]

or Q(s_t, a_t) = R(s_t, a_t) + γ · max[Q(s_{t+1}, a_{t+1})], giving

Q(C,D) = R(C,D) + γ · max[Q(D,D)] = 100 + 0.8 · 0 = 100

Because state D is the goal state, this finishes one episode. The updated Q matrix is shown in Figure 7.

Figure 7. Updated Q matrix

Set s_t as s_{t+1}: choose an initial state that can go to state C. There are two such states, B and G, so the actions are Q(B,C) and Q(G,C). Compute the Q values again using the updated Q matrix (that is, repeat steps c, d, and e of the Q-Learning algorithm above). The agent gathers more and more experience through many episodes until the Q matrix converges. Once convergence is reached, the agent can choose the shortest path to the goal state.

3 RESULT

The simulation of the path tracking is shown in Figure 8. The Q matrix in the simulation program is the convergence value; Figure 9 shows the convergent Q matrix, which can also be represented as a graph (see Figure 10). The reward (R) matrix in the simulation program is the same as the R matrix in Figure 5. In the simulation program, we can select any state as the initial state; the goal state is state D. If we start from state D, the agent does not need to find a shortest path to the goal state: it simply remains there, because state D is the goal state.
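Repeating this update over many episodes drives the Q matrix to convergence. A minimal Python sketch of the whole procedure (the episode count, random seed, and ascending-index tie-breaking are our own illustrative choices, not taken from the paper's JavaScript/PHP program):

```python
import random

# Rooms A..H indexed 0..7; doors and rewards follow Figures 2-5, goal is D.
A, B, C, D, E, F, G, H = range(8)
doors = [(A, E), (B, C), (B, F), (C, D), (C, G), (E, F), (F, G), (G, H)]
GOAL, GAMMA = D, 0.8

R = [[-1] * 8 for _ in range(8)]          # -1 = no door between the two rooms
for i, j in doors:
    R[i][j] = 100 if j == GOAL else 0
    R[j][i] = 100 if i == GOAL else 0
R[GOAL][GOAL] = 100                       # absorbing loop D -> D

Q = [[0.0] * 8 for _ in range(8)]
random.seed(1)
for _ in range(1000):                     # episodes from random initial states
    s = random.randrange(8)
    while s != GOAL:                      # an episode ends at the absorbing goal
        a = random.choice([j for j in range(8) if R[s][j] >= 0])
        Q[s][a] = R[s][a] + GAMMA * max(Q[a])   # Q(s,a) = R(s,a) + γ·max Q(s',·)
        s = a

# Following the largest Q value from any start state traces a shortest path.
path, s = ['A'], A
while s != GOAL:
    s = max((j for j in range(8) if R[s][j] >= 0), key=lambda j: Q[s][j])
    path.append('ABCDEFGH'[s])
print('-'.join(path))                     # greedy path from room A to room D
```

With these settings the converged values match those reported in section 3: Q(A,E) ≈ 41, Q(E,F) ≈ 51, Q(F,B) = Q(F,G) = 64, Q(B,C) = Q(G,C) = 80, Q(C,D) = 100, and the greedy walk from A yields A-E-F-B-C-D (the tie at F means A-E-F-G-C-D is equally short).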
As a first example, we choose room A as the initial state. Figure 10 can be described as follows: starting from room A, the agent chooses room E with reward value 41. From room E, it chooses room F with reward value 51. From room F, it chooses room B or room G with reward value 64. From room B or room G, it chooses room C with reward value 80. From room C, it goes to room D (the goal state) with reward value 100.

Figure 8. Screen shot of the simulation program: (a) agent and environment model, (b) convergence Q matrix, (c) R matrix

Figure 9. The convergence Q matrix

Figure 10. The graph of the convergence Q matrix with room A as the initial state

Based on the reward values, the shortest paths from the initial state (room A) to the goal state (room D) are A-E-F-B-C-D and A-E-F-G-C-D. There are two options for the shortest path because the reward values from F to B and from F to G are the same.

As discussed above, we can select any room as the initial state. For example, we choose room H as the initial state. The result is shown in Figure 11.

Figure 11. The graph of the convergence Q matrix with room H as the initial state

Figure 11 can be explained as follows: starting from room H, the agent chooses room G with reward value 64. From room G, it chooses room C with reward value 80. From room C, it goes to room D (the goal state) with reward value 100. Based on the reward values, the shortest path from the initial state (room H) to the goal state (room D) is H-G-C-D.

4 CONCLUSION AND DISCUSSION

As a reinforcement learning technique, the Q-Learning algorithm developed by Watkins in 1989 can be used in many robotics applications, especially path tracking. Our experiment shows that, using the simulation program, the reinforcement learning technique with the Q-Learning algorithm can find the shortest path for a path tracking robot. The simulation shows that there were two shortest paths from initial state A to goal state D: A-E-F-B-C-D and A-E-F-G-C-D. Based on the reward values, we found the same value for some actions. This is a weakness of the Q-Learning algorithm: it considers only one step ahead and updates the Q matrix for only one action at a time. In the future, we will consider a more complex algorithm such as the Q(λ) algorithm, an online multi-step learning algorithm also developed by Watkins [6], to perform faster and more accurate actions in finding a shortest path.

REFERENCES

[1] C. J. C. H. Watkins (1989). Learning From Delayed Rewards. PhD Dissertation, Cambridge University.
[2] Valery Kuzmin (2002). Connectionist Q-Learning in Robot Control Task. Scientific Proceedings of Riga Technical University.
[3] R. S. Sutton (1988). Learning to Predict by the Methods of Temporal Differences. Machine Learning 3, Kluwer Academic Publishers, Boston.
[4] C. Clausen and H. Wechsler (2000). Quad Q-Learning. IEEE Transactions on Neural Networks 11.
[5] Kardi T. (2005). Q-Learning by Examples [Online]. Available at: forcementlearning/index.html [Accessed: March 3rd, 2009].
[6] Hyun-Chang Y., et al. (2007). Hexagon-Based Q-Learning Algorithm and Applications. International Journal of Control, Automation, and Systems 5.
Figure 12 (a) Start from A and go to E
Figure 12 (b) In E and go to F
Figure 12 (c) In F and go to B
Figure 12 (d) In B and go to C
Figure 12 (e) In C and go to D
Figure 12 (f) Reach the goal state D
Figure 13 (a) Start from H and go to G
Figure 13 (b) In G and go to C
Figure 13 (c) In C and go to D
Figure 13 (d) Reach the goal state D
More informationMULTI ORIENTATION PERFORMANCE OF FEATURE EXTRACTION FOR HUMAN HEAD RECOGNITION
MULTI ORIENTATION PERFORMANCE OF FEATURE EXTRACTION FOR HUMAN HEAD RECOGNITION Panca Mudjirahardjo, Rahmadwati, Nanang Sulistiyanto and R. Arief Setyawan Department of Electrical Engineering, Faculty of
More informationTechniques. IDSIA, Istituto Dalle Molle di Studi sull'intelligenza Articiale. Phone: Fax:
Incorporating Learning in Motion Planning Techniques Luca Maria Gambardella and Marc Haex IDSIA, Istituto Dalle Molle di Studi sull'intelligenza Articiale Corso Elvezia 36 - CH - 6900 Lugano Phone: +41
More informationA Symmetric Multiprocessor Architecture for Multi-Agent Temporal Difference Learning
A Symmetric Multiprocessor Architecture for Multi-Agent Temporal Difference Learning Scott Fields, Student Member, IEEE, Itamar Elhanany, Senior Member, IEEE Department of Electrical & Computer Engineering
More informationResidual Advantage Learning Applied to a Differential Game
Presented at the International Conference on Neural Networks (ICNN 96), Washington DC, 2-6 June 1996. Residual Advantage Learning Applied to a Differential Game Mance E. Harmon Wright Laboratory WL/AAAT
More informationA fast point-based algorithm for POMDPs
A fast point-based algorithm for POMDPs Nikos lassis Matthijs T. J. Spaan Informatics Institute, Faculty of Science, University of Amsterdam Kruislaan 43, 198 SJ Amsterdam, The Netherlands {vlassis,mtjspaan}@science.uva.nl
More informationApproximating a Policy Can be Easier Than Approximating a Value Function
Computer Science Technical Report Approximating a Policy Can be Easier Than Approximating a Value Function Charles W. Anderson www.cs.colo.edu/ anderson February, 2 Technical Report CS-- Computer Science
More informationIn Homework 1, you determined the inverse dynamics model of the spinbot robot to be
Robot Learning Winter Semester 22/3, Homework 2 Prof. Dr. J. Peters, M.Eng. O. Kroemer, M. Sc. H. van Hoof Due date: Wed 6 Jan. 23 Note: Please fill in the solution on this sheet but add sheets for the
More informationFaculty Guide to Blackboard
Faculty Guide to Blackboard August 2012 1 Table of Contents Description of Blackboard... 3 Uses of Blackboard... 3 Hardware Configurations and Web Browsers... 3 Logging Into Blackboard... 3 Customizing
More informationMarkov Decision Processes. (Slides from Mausam)
Markov Decision Processes (Slides from Mausam) Machine Learning Operations Research Graph Theory Control Theory Markov Decision Process Economics Robotics Artificial Intelligence Neuroscience /Psychology
More informationRough Sets-based Prototype Optimization in Kanerva-based Function Approximation
215 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Rough Sets-based Prototype Optimization in Kanerva-based Function Approximation Cheng Wu School of Urban Rail
More informationAIR FORCE INSTITUTE OF TECHNOLOGY
Scaling Ant Colony Optimization with Hierarchical Reinforcement Learning Partitioning THESIS Erik Dries, Captain, USAF AFIT/GCS/ENG/07-16 DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR FORCE INSTITUTE
More informationSlides credited from Dr. David Silver & Hung-Yi Lee
Slides credited from Dr. David Silver & Hung-Yi Lee Review Reinforcement Learning 2 Reinforcement Learning RL is a general purpose framework for decision making RL is for an agent with the capacity to
More informationTowards Traffic Anomaly Detection via Reinforcement Learning and Data Flow
Towards Traffic Anomaly Detection via Reinforcement Learning and Data Flow Arturo Servin Computer Science, University of York aservin@cs.york.ac.uk Abstract. Protection of computer networks against security
More informationClustering with Reinforcement Learning
Clustering with Reinforcement Learning Wesam Barbakh and Colin Fyfe, The University of Paisley, Scotland. email:wesam.barbakh,colin.fyfe@paisley.ac.uk Abstract We show how a previously derived method of
More informationCall Admission Control for Multimedia Cellular Networks Using Neuro-dynamic Programming
Call Admission Control for Multimedia Cellular Networks Using Neuro-dynamic Programming Sidi-Mohammed Senouci, André-Luc Beylot 2, and Guy Pujolle Laboratoire LIP6 Université de Paris VI 8, rue du Capitaine
More informationLocally Weighted Learning for Control. Alexander Skoglund Machine Learning Course AASS, June 2005
Locally Weighted Learning for Control Alexander Skoglund Machine Learning Course AASS, June 2005 Outline Locally Weighted Learning, Christopher G. Atkeson et. al. in Artificial Intelligence Review, 11:11-73,1997
More informationGeneralized Inverse Reinforcement Learning
Generalized Inverse Reinforcement Learning James MacGlashan Cogitai, Inc. james@cogitai.com Michael L. Littman mlittman@cs.brown.edu Nakul Gopalan ngopalan@cs.brown.edu Amy Greenwald amy@cs.brown.edu Abstract
More informationNovel Function Approximation Techniques for. Large-scale Reinforcement Learning
Novel Function Approximation Techniques for Large-scale Reinforcement Learning A Dissertation by Cheng Wu to the Graduate School of Engineering in Partial Fulfillment of the Requirements for the Degree
More informationUsing Continuous Action Spaces to Solve Discrete Problems
Using Continuous Action Spaces to Solve Discrete Problems Hado van Hasselt Marco A. Wiering Abstract Real-world control problems are often modeled as Markov Decision Processes (MDPs) with discrete action
More informationResearch Project. Reinforcement Learning Policy Gradient Algorithms for Robot Learning
University of Girona Department of Electronics, Informatics and Automation Research Project Reinforcement Learning Policy Gradient Algorithms for Robot Learning Work presented by Andres El-Fakdi Sencianes
More informationCMPSCI 250: Introduction to Computation. Lecture #22: Graphs, Paths, and Trees David Mix Barrington 12 March 2014
CMPSCI 250: Introduction to Computation Lecture #22: Graphs, Paths, and Trees David Mix Barrington 12 March 2014 Graphs, Paths, and Trees Graph Definitions Paths and the Path Predicate Cycles, Directed
More informationA Framework for A Graph- and Queuing System-Based Pedestrian Simulation
A Framework for A Graph- and Queuing System-Based Pedestrian Simulation Srihari Narasimhan IPVS Universität Stuttgart Stuttgart, Germany Hans-Joachim Bungartz Institut für Informatik Technische Universität
More informationDeep Q-Learning to play Snake
Deep Q-Learning to play Snake Daniele Grattarola August 1, 2016 Abstract This article describes the application of deep learning and Q-learning to play the famous 90s videogame Snake. I applied deep convolutional
More informationSmart Search: A Firefox Add-On to Compute a Web Traffic Ranking. A Writing Project. Presented to. The Faculty of the Department of Computer Science
Smart Search: A Firefox Add-On to Compute a Web Traffic Ranking A Writing Project Presented to The Faculty of the Department of Computer Science San José State University In Partial Fulfillment of the
More informationCompetition Between Reinforcement Learning Methods in a Predator-Prey Grid World
Competition Between Reinforcement Learning Methods in a Predator-Prey Grid World Jacob Schrum (schrum2@cs.utexas.edu) Department of Computer Sciences University of Texas at Austin Austin, TX 78712 USA
More informationPROBLEM SOLVING WITH
PROBLEM SOLVING WITH REINFORCEMENT LEARNING Gavin Adrian Rummery A Cambridge University Engineering Department Trumpington Street Cambridge CB2 1PZ England This dissertation is submitted for consideration
More informationA New Technique for Ranking Web Pages and Adwords
A New Technique for Ranking Web Pages and Adwords K. P. Shyam Sharath Jagannathan Maheswari Rajavel, Ph.D ABSTRACT Web mining is an active research area which mainly deals with the application on data
More informationA Connectionist Learning Control Architecture for Navigation
A Connectionist Learning Control Architecture for Navigation Jonathan R. Bachrach Department of Computer and Information Science University of Massachusetts Amherst, MA 01003 Abstract A novel learning
More informationHuman-level Control Through Deep Reinforcement Learning (Deep Q Network) Peidong Wang 11/13/2015
Human-level Control Through Deep Reinforcement Learning (Deep Q Network) Peidong Wang 11/13/2015 Content Demo Framework Remarks Experiment Discussion Content Demo Framework Remarks Experiment Discussion
More informationLecture 4: Linear Programming
COMP36111: Advanced Algorithms I Lecture 4: Linear Programming Ian Pratt-Hartmann Room KB2.38: email: ipratt@cs.man.ac.uk 2017 18 Outline The Linear Programming Problem Geometrical analysis The Simplex
More informationCellular Learning Automata-Based Color Image Segmentation using Adaptive Chains
Cellular Learning Automata-Based Color Image Segmentation using Adaptive Chains Ahmad Ali Abin, Mehran Fotouhi, Shohreh Kasaei, Senior Member, IEEE Sharif University of Technology, Tehran, Iran abin@ce.sharif.edu,
More informationLearning to bounce a ball with a robotic arm
Eric Wolter TU Darmstadt Thorsten Baark TU Darmstadt Abstract Bouncing a ball is a fun and challenging task for humans. It requires fine and complex motor controls and thus is an interesting problem for
More informationStandalone Mobile Application for Shipping Services Based on Geographic Information System and A-Star Algorithm
Journal of Physics: Conference Series PAPER OPEN ACCESS Standalone Mobile Application for Shipping Services Based on Geographic Information System and A-Star Algorithm To cite this article: D Gunawan et
More informationA STRUCTURAL OPTIMIZATION METHODOLOGY USING THE INDEPENDENCE AXIOM
Proceedings of ICAD Cambridge, MA June -3, ICAD A STRUCTURAL OPTIMIZATION METHODOLOGY USING THE INDEPENDENCE AXIOM Kwang Won Lee leekw3@yahoo.com Research Center Daewoo Motor Company 99 Cheongchon-Dong
More informationLearning. Learning agents Inductive learning. Neural Networks. Different Learning Scenarios Evaluation
Learning Learning agents Inductive learning Different Learning Scenarios Evaluation Slides based on Slides by Russell/Norvig, Ronald Williams, and Torsten Reil Material from Russell & Norvig, chapters
More informationProject Title REPRESENTATION OF ELECTRICAL NETWORK USING GOOGLE MAP API. Submitted by: Submitted to: SEMANTA RAJ NEUPANE, Research Assistant,
- 1 - Project Title REPRESENTATION OF ELECTRICAL NETWORK USING GOOGLE MAP API Submitted by: SEMANTA RAJ NEUPANE, Research Assistant, Department of Electrical Energy Engineering, Tampere University of Technology
More informationReinforcement Learning and Optimal Control. ASU, CSE 691, Winter 2019
Reinforcement Learning and Optimal Control ASU, CSE 691, Winter 2019 Dimitri P. Bertsekas dimitrib@mit.edu Lecture 1 Bertsekas Reinforcement Learning 1 / 21 Outline 1 Introduction, History, General Concepts
More information