SIMULATION BASED REINFORCEMENT LEARNING FOR PATH TRACKING ROBOT


Tony 1, M. Rahmat Widyanto 2
1 Department of Computer System, Faculty of Information Technology, Tarumanagara University, Jl. Letjen S. Parman No. 1, Jakarta
2 Faculty of Computer Science, University of Indonesia, Kampus Baru UI, Depok
tony.b@fti.utara.org 1, widyanto@cs.ui.ac.id 2

ABSTRACT

In this paper, we present simulation based reinforcement learning for a path tracking robot using the Q-Learning algorithm. Q-Learning is a reinforcement learning technique used mainly in robotics. It estimates the value of each state-action pair and chooses the action with the maximum reward value, so that the shortest path is taken. The Q-Learning simulation was conducted in an environment consisting of eight rooms, with one room as the goal. Our experiment shows that the algorithm chooses actions with maximum reward and finds the shortest path to the goal.

Keywords: path tracking, Q-Learning, reinforcement learning, shortest path

1 INTRODUCTION

Reinforcement learning is a sub-area of machine learning concerned with learning from rewards and punishments. An agent takes actions in an environment so as to maximize the reward value. A reinforcement learning algorithm finds a policy that maps states of the environment to the actions the agent should take in those states. The basic reinforcement learning model consists of environment states, actions, and rewards. Figure 1 shows a typical reinforcement learning system. The agent receives descriptions of the environment, called states, and chooses actions to perform. The effect of an action on the environment is evaluated and fed back to the agent in the form of positive or negative rewards. The mission of the agent is to find the action rules that achieve maximum reward through its interaction with the environment. One of the most important breakthroughs in reinforcement learning was the development of Q-Learning by Watkins in 1989 [1].

In this paper, we present simulation based reinforcement learning for a path tracking robot using the Q-Learning algorithm. The rest of the paper is organized as follows: Section 2 introduces the Q-Learning algorithm, the environment model, and the experiment; Section 3 discusses the simulation results; Section 4 concludes the paper.

Figure 1. A Typical Structure of Reinforcement Learning

2 ALGORITHM, ENVIRONMENT MODEL, AND EXPERIMENT

This section describes the Q-Learning algorithm, its environment model, and the experiment.

2.1 Q-Learning Algorithm

The task of reinforcement learning is generally stated as follows [2]: for each transition of the system from one state to another, a value called the reward is assigned. The system receives the reward after the transition is carried out. The purpose is to find a control policy that maximizes the expected discounted sum of rewards, known as the return. The value function is a prediction of the return from any state:

V(x_t) = E\left[ \sum_{k=0}^{\infty} \gamma^k r_{t+k} \right]

where r_t is the reward received in the transition from state x_t to x_{t+1} and \gamma is the discount factor (0 ≤ \gamma ≤ 1). V(x_t) is the discounted sum of rewards from time t onward. It depends on the sequence of actions chosen, which is determined by the control policy.
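As a small numeric illustration of this return (our own example, not taken from the paper), the JavaScript snippet below, in the same language as the paper's simulation, evaluates the discounted sum for a short reward sequence with γ = 0.8:

    // Discounted return: sum over k of gamma^k * r_k for a finite reward sequence.
    function discountedReturn(rewards, gamma) {
      let v = 0;
      for (let k = 0; k < rewards.length; k++) {
        v += Math.pow(gamma, k) * rewards[k];
      }
      return v;
    }

    // Two zero-reward transitions followed by the goal reward of 100:
    // 0 + 0.8*0 + 0.8^2 * 100, which is approximately 64.
    console.log(discountedReturn([0, 0, 100], 0.8));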

The system has to find a control policy that maximizes V(x_t) for each state. The Q-Learning algorithm does not work with the value function directly. It employs the Q-function, whose arguments are not only a state but also an action. This makes it possible to construct the Q-function by an iterative method and thus find an optimal control policy. The Q-function is expressed as:

Q(x_t, a_t) = r_t + \gamma V(x_{t+1})

where a_t is the action chosen at time t out of the set of all possible actions. Because the purpose of the system is to maximize the total sum of rewards, V(x_{t+1}) is replaced by \max_{a_{t+1}} Q(x_{t+1}, a_{t+1}), and as a result the following function is obtained:

Q(x_t, a_t) = r_t + \gamma \max_{a_{t+1}} Q(x_{t+1}, a_{t+1})

Q values are stored in a matrix whose indices are a state and an action. In systems that employ Q-Learning, the expression above is usually combined with the method of temporal differences, TD(\lambda), proposed by Sutton [3]. If the temporal-difference parameter \lambda is equal to zero, the method is called single-step Q-Learning, because only the current and the next prediction of the Q value participate in the update. The update rule for single-step Q-Learning is:

Q(x_t, a_t) \leftarrow Q(x_t, a_t) + \alpha \left( r_t + \gamma \max_{a_{t+1}} Q(x_{t+1}, a_{t+1}) - Q(x_t, a_t) \right)

Single-step Q-Learning is a reinforcement learning technique that learns an action-value function giving the expected utility of taking a given action in a given state and following a fixed policy thereafter. It allows an agent to acquire optimal control strategies from delayed rewards, even when there is no prior knowledge of the effects of its actions on the environment [4]. The single-step Q-Learning algorithm is described as follows [5]:

Given: a state diagram with a goal state (R matrix)
Find: the shortest path from any initial state to the goal state (Q matrix)

1. Set the parameter γ and the reward (R) matrix
2. Initialize the Q matrix as a zero matrix
3. For each episode:
   Select a random initial state
   Do while the goal state is not reached
   a. Select one among all possible actions for the current state
   b. Using this action, consider going to the next state
   c. Get the maximum Q value of this next state based on all possible actions
   d. Compute: Q(state, action) = R(state, action) + γ · max[Q(next state, all actions)]
   e. Set the next state as the current state
   End do
End for

2.2 Environment Model

The environment model consists of eight rooms connected by doors, as shown in Figure 2. The rooms are labeled A to H, and room D is the target room, or goal. Notice that only one door leads to the goal room, namely the door from room C.

Figure 2. Environment model

The environment model in Figure 2 can be represented by a graph in which each room is a vertex and each door is an edge, as shown in Figure 3.

Figure 3. Graph of environment

The goal room is node D. Each door, or edge of the graph, has a reward value. The edges that lead immediately to the goal have an instant reward of 100 (see Figure 4); the others, which have no direct connection to the target room, have zero reward. Because each door is two-way (from A the agent can go to E, and from E it can go back to A), we assign two arrows to each edge of the previous graph, each carrying a reward value. The graph then becomes the state diagram shown in Figure 4.
An additional loop with the highest reward (100) is given to the goal room (from D back to D), so that if the agent arrives at the goal it will remain there forever. This is called an absorbing goal: when the agent reaches the goal state, it stays there.
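As a concrete encoding of this environment (our illustration, not the paper's simulation code), the JavaScript sketch below builds the reward matrix R from the room graph of Figure 3; the room indices, the DOORS list, and the value -1 for "no door" are our own conventions:

    // Rooms A..H mapped to indices 0..7; room D (index 3) is the goal.
    const ROOMS = ["A", "B", "C", "D", "E", "F", "G", "H"];
    const GOAL = ROOMS.indexOf("D");

    // The two-way doors of the graph in Figure 3.
    const DOORS = [["A", "E"], ["E", "F"], ["F", "B"], ["F", "G"],
                   ["B", "C"], ["G", "C"], ["G", "H"], ["C", "D"]];

    // 8x8 reward matrix R: -1 = no door, 0 = door, 100 = door into the goal.
    const R = ROOMS.map(() => ROOMS.map(() => -1));
    for (const [a, b] of DOORS) {
      const i = ROOMS.indexOf(a), j = ROOMS.indexOf(b);
      R[i][j] = (j === GOAL) ? 100 : 0;   // e.g. R[C][D] = 100
      R[j][i] = (i === GOAL) ? 100 : 0;
    }
    R[GOAL][GOAL] = 100;                  // absorbing loop D -> D

This should reproduce the R matrix shown as Figure 5 in Section 2.3.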

Figure 4. State Diagram

The virtual robot acts as an agent that can move from one room to another without any knowledge of the environment. It does not know which sequence of doors to pass through to reach the target room; it learns this through experience using Q-Learning. Suppose the agent is in room A and must learn to reach room D, the goal (see Figure 2). Each room in the environment is called a state, and the agent's movement from one room to another is called an action. A state is represented by a node in the state diagram, while an action is represented by an arrow (see Figure 4).

Suppose the agent is in state A. From state A it cannot go directly to state B, because there is no door or arrow connecting A and B; it can only go to state E, which is connected to A. From state E, the agent can go to state F or back to state A. From state F, it can go to state B, to state G, or back to state E (look at the arrows out of state F). From state B, it can go to state C or back to state F. From state C, it can go to state D, to state G, or back to state B. From state G, it can go to state C, to state H, or back to state F. From state H, it can only go to state G. If the agent is in state D, it remains there, because state D is the target room.

The state diagram and reward values can be written as the following reward table, or R matrix. A negative value (-1) means that the row state has no action leading to the column state; for example, state A cannot go to states B, C, D, F, G, or H because there is no connecting door. Zero means that the action from one state to another does not reach the goal. A reward of 100 is given to an action that leads from a state to the goal state.

    R matrix (rows: agent now in state; columns: action to go to state)

         A    B    C    D    E    F    G    H
    A   -1   -1   -1   -1    0   -1   -1   -1
    B   -1   -1    0   -1   -1    0   -1   -1
    C   -1    0   -1  100   -1   -1    0   -1
    D   -1   -1    0  100   -1   -1   -1   -1
    E    0   -1   -1   -1   -1    0   -1   -1
    F   -1    0   -1   -1    0   -1    0   -1
    G   -1   -1    0   -1   -1    0   -1    0
    H   -1   -1   -1   -1   -1   -1    0   -1

Figure 5. R Matrix

2.3 Experiment

In our experiment, we use JavaScript, PHP, and HTML (Hypertext Markup Language) as programming tools. To run the simulation we need XAMPP (a cross-platform package bundling Apache, MySQL, PHP, and Perl) as the web server, and an internet browser that supports JavaScript (such as Internet Explorer or Mozilla Firefox). After XAMPP is installed, the folder containing the simulation program must be copied to C:\ProgramFiles\XAMPP\htdocs. To run the program, open the browser and type the address of the simulation page in the address bar.

The implementation of the Q-Learning algorithm is as follows. Set the learning parameter γ = 0.8 and the reward (R) matrix. The reward values of the possible actions are: R[A,E] = 0; R[B,C] = 0; R[B,F] = 0; R[C,B] = 0; R[C,D] = 100; R[C,G] = 0; R[D,C] = 0; R[D,D] = 100; R[E,A] = 0; R[E,F] = 0; R[F,B] = 0; R[F,E] = 0; R[F,G] = 0; R[G,C] = 0; R[G,F] = 0; R[G,H] = 0; R[H,G] = 0. The R matrix is shown in Figure 5 above.

Initialize the Q matrix as a zero matrix: the initial value of every state for every action is zero.

Figure 6. Q Matrix (initialized as an 8 x 8 zero matrix)

For each episode, select an initial state (s_t) at random, select one of the possible actions for that state, and go to the next state.
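A compact JavaScript sketch of this training loop, following the single-step algorithm of Section 2.1 with γ = 0.8 and reusing the ROOMS, GOAL, and R definitions from Section 2.2, might look as follows (our illustration, not the paper's simulation code; the episode count of 1000 is an arbitrary choice):

    const GAMMA = 0.8;
    const Q = ROOMS.map(() => ROOMS.map(() => 0));        // step 2: zero matrix

    function runEpisode() {
      let state = Math.floor(Math.random() * ROOMS.length);  // random initial state
      while (state !== GOAL) {
        // a./b. pick one of the possible actions (rooms reachable through a door)
        const actions = [];
        R[state].forEach((r, j) => { if (r >= 0) actions.push(j); });
        const next = actions[Math.floor(Math.random() * actions.length)];
        // c./d. Q(state, action) = R(state, action) + gamma * max Q(next state, all actions)
        Q[state][next] = R[state][next] + GAMMA * Math.max(...Q[next]);
        // e. the next state becomes the current state
        state = next;
      }
    }

    for (let episode = 0; episode < 1000; episode++) runEpisode();

After enough episodes the non-zero entries settle near the reward values reported in Section 3 (approximately 41, 51, 64, 80, and 100).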

As an example of a single update, look at the fourth column of the R matrix (actions that go to state D). There is only one state with an action leading to state D: state C. We call this state-action pair Q(C,D). Get the maximum Q value of the next state based on all possible actions; since the next state is D, this is max[Q(D, all actions)], or max[Q(D,D)], because state D is the goal state. Then compute the Q value:

Q(state, action) = R(state, action) + γ · max[Q(next state, all actions)]

or Q(s_t, a_t) = R(s_t, a_t) + γ · max[Q(s_{t+1}, a_{t+1})], so

Q(C,D) = R(C,D) + γ · max[Q(D,D)] = 100 + 0.8 · 0 = 100

Because state D is the goal state, this finishes one episode. In the updated Q matrix, Q(C,D) = 100 and all other entries are still zero.

Figure 7. Updated Q Matrix

Then set s_t to s_{t+1}: choose an initial state that can go to state C. There are two such states, B and G, so the next state-action pairs are Q(B,C) and Q(G,C). Compute their Q values using the updated Q matrix (that is, repeat steps c, d, and e of the Q-Learning algorithm above). The agent gains more and more experience over many episodes until the Q matrix converges. Once convergence is reached, the agent can choose the shortest path to the goal state.

3 RESULT

The path tracking simulation is shown in Figure 8. The Q matrix in the simulation program holds the converged values; Figure 9 shows the convergence Q matrix, which can also be represented as a graph (see Figure 10). The reward (R) matrix in the simulation program is the same as the R matrix in Figure 5.

Figure 8. Screen shot of the simulation program: (a) agent and environment model, (b) convergence Q matrix, (c) R matrix

Figure 9. The convergence Q matrix

In the simulation program, we can select any state as the initial state; the goal state is state D. If we start from state D, the agent does not need to find the shortest path to the goal state: it simply remains there, because state D is the goal. As a first example, we choose room A as the initial state. Figure 10 can be described as follows: starting from room A, the agent chooses room E (reward value 41). From room E, it chooses room F (reward value 51). From room F, it chooses room B or room G (reward value 64). From room B or room G, it chooses room C (reward value 80). From room C, it goes to room D, the goal state (reward value 100). Based on these reward values, the shortest paths from the initial state (room A) to the goal state (room D) are A-E-F-B-C-D and A-E-F-G-C-D; there are two options because the reward values from F to B and from F to G are equal.
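To read the shortest paths out of the converged Q matrix, one possible sketch (ours, reusing ROOMS, GOAL, R, and Q from the earlier snippets) follows, at every state, all actions whose Q value is maximal; ties such as Q(F,B) = Q(F,G) then produce several equally short paths:

    // Enumerate every greedy path from a start state to the goal by following,
    // at each state, all legal actions (R >= 0) whose Q value equals the maximum.
    function greedyPaths(R, Q, state, goal, path = [state]) {
      if (state === goal) return [path];
      const scores = Q[state].map((q, j) => (R[state][j] >= 0 ? q : -Infinity));
      const best = Math.max(...scores);
      let result = [];
      scores.forEach((q, j) => {
        if (q === best && !path.includes(j)) {
          result = result.concat(greedyPaths(R, Q, j, goal, path.concat(j)));
        }
      });
      return result;
    }

    // Starting from room A this should print the two paths
    // A-E-F-B-C-D and A-E-F-G-C-D described above.
    for (const p of greedyPaths(R, Q, ROOMS.indexOf("A"), GOAL)) {
      console.log(p.map(i => ROOMS[i]).join("-"));
    }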

As discussed above, we can choose any room as the initial state. For example, we choose room H as the initial state; the result is shown in Figure 11.

Figure 10. The graph of the convergence Q matrix with room A as the initial state

Figure 11. The graph of the convergence Q matrix with room H as the initial state

Figure 11 can be explained as follows: starting from room H, the agent chooses room G (reward value 64). From room G, it chooses room C (reward value 80). From room C, it goes to room D, the goal state (reward value 100). Based on these reward values, the shortest path from the initial state (room H) to the goal state (room D) is H-G-C-D.

4 CONCLUSION AND DISCUSSION

As a reinforcement learning technique, the Q-Learning algorithm developed by Watkins in 1989 can be used in many robotics applications, especially path tracking. Our experiment shows that, using the simulation program, reinforcement learning with the Q-Learning algorithm can find the shortest path for a path tracking robot. The simulation shows that there are two shortest paths from initial state A to goal state D: A-E-F-B-C-D and A-E-F-G-C-D. Based on the reward values, we found that some actions share the same value. This is a weakness of the Q-Learning algorithm: it only considers one step ahead and updates the Q matrix for one action at a time. In the future, we will consider a more complex algorithm such as Q(λ), an on-line multi-step learning algorithm also developed by Watkins [6], to find the shortest path faster and more accurately.

REFERENCES

[1] C. J. C. H. Watkins (1989). Learning from Delayed Rewards. PhD dissertation, Cambridge University.
[2] V. Kuzmin (2002). Connectionist Q-Learning in Robot Control Task. Scientific Proceedings of Riga Technical University.
[3] R. S. Sutton (1988). Learning to Predict by the Methods of Temporal Differences. Machine Learning 3, Kluwer Academic Publishers, Boston.
[4] C. Clausen and H. Wechsler (2000). Quad Q-Learning. IEEE Transactions on Neural Networks 11.
[5] Kardi T. (2005). Q-Learning by Examples [Online]. Available at: forcementlearning/index.html [Accessed: March 3rd, 2009].
[6] Hyun-Chang Y. et al. (2007). Hexagon-Based Q-Learning Algorithm and Applications. International Journal of Control, Automation, and Systems 5.

Figure 12 (a) Start from A and go to E
Figure 12 (b) In E and go to F
Figure 12 (c) In F and go to B
Figure 12 (d) In B and go to C
Figure 12 (e) In C and go to D
Figure 12 (f) Reach the goal state D
Figure 13 (a) Start from H and go to G
Figure 13 (b) In G and go to C
Figure 13 (c) In C and go to D
Figure 13 (d) Reach the goal state D
