CSE151 Assignment 2: Markov Decision Processes in the Grid World
Grace Lin A484
Tom Maddock A55645

Abstract

Markov decision processes exemplify sequential decision problems: problems, situated in uncertain environments, that are defined by a transition model and a reward function. The environment we will be observing is the grid world, which is analogous to the problem of solving for the best moves in a board game. Our objective is to take a grid world containing obstacles and terminals and solve for the optimal policy of that world. In the optimal policy, each state that is not an obstacle or a terminal state acquires a direction: the direction that leads to the best expected outcome when the agent is at that particular state. To solve for the optimal policy, we use two algorithms, value iteration and policy iteration. Observing the results of these two algorithms, we noticed that they produced similar policies; however, policy iteration performed much faster than value iteration.

1 Introduction

One of the most significant and popular aspects of Artificial Intelligence is solving sequential decision problems. We can view a sequential decision problem as a situation where an agent's utility depends on a sequence of decisions that the agent can make. Our focus is on sequential decision problems because they lay the groundwork for reinforcement learning. Sequential problems are intriguing because of the components that make up the problem and the way each component affects the solution. The framework we will focus on, a Markovian transition model with additive rewards, is called a Markov decision process (MDP). The model is of a grid world: an m-by-n world that contains obstacles, terminals, and an initial state, and in which each state has a reward value that can be negative or positive. The grid world also has a probabilistic move model that specifies the probability of actually moving in each of the four directions (North, South, East, and West) given that the agent has chosen to head in a particular direction.
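As a concrete reference point, the classic 4-by-3 grid world from Russell & Norvig, which MDP 1 below is modeled on, might be set up in Matlab as follows. This is a minimal sketch: the specific values (the -0.04 living reward, the terminal rewards, the obstacle at row 2, column 2) and the 0.8/0.1/0.1 move model are assumptions taken from the book, and the variable names are illustrative, since the assignment itself loads this data from rewards.txt and terminals.txt.

% A minimal sketch of the book's 4-by-3 grid world (values assumed from
% Russell & Norvig; the assignment reads rewards.txt and terminals.txt).
R = -0.04 * ones(3, 4);        % reward for every non-terminal state
R(1, 4) = +1;                  % terminal reward at the top-right corner
R(2, 4) = -1;                  % terminal penalty just below it
obstacle  = [2 2];             % (row, col) of the blocked square
terminals = [1 4; 2 4];        % (row, col) of the terminal states
p_forward = 0.8;               % chance of moving in the chosen direction
p_side    = 0.1;               % chance of slipping to either side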
Given a grid world with obstacles, terminals, a start state, rewards, and moves, we will find the optimal policy of the grid world: the policy that yields the highest expected utility. The result is a world in which each state is assigned the best direction to take when the agent is at that state, the best direction being the one that yields the highest expected utility. To solve for the optimal policy, we use two different algorithms, value iteration and policy iteration, each of which produces an optimal policy. We then observe the results of these two algorithms and compare them with each other.

2 Methods

2.1 Matlab Code Overview

For this assignment, we wrote a handful of Matlab functions that find solutions to a given MDP. The two top-level functions, value_iteration.m and policy_iteration.m, are the core result of our work and produce, respectively, a utility function and an optimal policy for a given MDP. The display_agent_choice.m and policy_evaluation.m functions complement them by computing, respectively, an optimal policy given a utility function and a utility function given a policy. Our next_state_utility.m function provides computational support to these functions by finding the expected utility of the next state, given a current state, an action, and a utility function. Finally, the iterate_rewards.m, compute_performance.m, and create_figure.m functions provide convenient methods of investigating the performance of our solution algorithms and plotting the results.

2.2 Solution Algorithms

Our top-level functions, mentioned above, implement the value iteration and policy iteration algorithms for solving MDPs. Our implementation of value iteration is relatively straightforward: it iteratively applies the Bellman update to a utility function until the resulting utilities are within a given error bound. For policy iteration, we chose to implement the policy evaluation step by solving a system of linear equations, instead of using modified policy iteration. We felt that, for the size of the MDPs given in this assignment, this was the preferred method for policy evaluation in terms of both speed and accuracy. Additionally, our policy iteration algorithm starts with a random initial policy, instead of a fixed one, which it then iteratively refines until it becomes optimal.

2.3 Measuring Performance

For this assignment, we tested our solution algorithms by measuring their performance while they solved two MDPs. The first (MDP 1) was a grid world built from the data in rewards.txt and terminals.txt; the second (MDP 2) used data from newrewards.txt and newterminals.txt. To measure the performance of these MDP solution algorithms, we added provisions that enable them to output a history of all the utility functions computed across their iterations. This allows us to observe the evolution of the utilities as they converge on their final values. From this history we then derived values for the maximum error and policy loss at each iteration. From this data we could measure how quickly and accurately each algorithm found a solution to the given MDP, and compare the algorithms on that basis.
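The value iteration loop described above amounts to repeatedly applying the Bellman update U_{i+1}(s) = R(s) + γ max_a Σ_{s'} P(s'|s,a) U_i(s') until the largest change falls below a bound that guarantees the desired accuracy. The following is a minimal sketch of this loop for the grid world set up earlier; the function names and the ε(1-γ)/γ termination bound are standard textbook choices, not the authors' actual value_iteration.m and next_state_utility.m.

function U = value_iteration_sketch(R, terminals, obstacle, gamma, epsilon)
% Repeat Bellman updates until the largest change in any utility is
% small enough to guarantee a maximum error of at most epsilon.
[m, n] = size(R);
U = zeros(m, n);
while true
    Unew = U;
    delta = 0;
    for r = 1:m
        for c = 1:n
            if isequal([r c], obstacle), continue, end
            if ismember([r c], terminals, 'rows')
                Unew(r, c) = R(r, c);              % terminals keep their reward
                continue
            end
            best = -Inf;                           % best expected next utility
            for a = 1:4                            % actions: N, S, E, W
                best = max(best, next_utility(U, r, c, a, obstacle));
            end
            Unew(r, c) = R(r, c) + gamma * best;   % Bellman update
            delta = max(delta, abs(Unew(r, c) - U(r, c)));
        end
    end
    U = Unew;
    if delta < epsilon * (1 - gamma) / gamma, break, end
end
end

function eu = next_utility(U, r, c, a, obstacle)
% Expected utility of taking action a in state (r, c) under the assumed
% 0.8/0.1/0.1 move model (in the spirit of the report's next_state_utility.m).
moves = [-1 0; 1 0; 0 1; 0 -1];                    % N, S, E, W as (dr, dc)
side  = [3 4; 3 4; 1 2; 1 2];                      % the two perpendicular slips
outcomes = [a, side(a, :)];
probs = [0.8, 0.1, 0.1];
eu = 0;
for k = 1:3
    r2 = r + moves(outcomes(k), 1);
    c2 = c + moves(outcomes(k), 2);
    % a move into a wall or the obstacle leaves the agent in place
    if r2 < 1 || r2 > size(U, 1) || c2 < 1 || c2 > size(U, 2) ...
            || isequal([r2 c2], obstacle)
        r2 = r; c2 = c;
    end
    eu = eu + probs(k) * U(r2, c2);
end
end

A call such as U = value_iteration_sketch(R, terminals, obstacle, 0.9, 1e-4) then yields utilities of the kind plotted in Section 3.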
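For policy iteration, the policy evaluation step described above solves a linear system rather than iterating: for a fixed policy π the utilities satisfy U = R + γ T_π U, so a single backslash solve of (I - γ T_π) U = R gives them exactly. A sketch of the whole algorithm, assuming the grid has been flattened into a vector of states and P(s, a, t) holds the probability of reaching state t from state s under action a (with all-zero rows for terminal states); the names and calling conventions are illustrative, not the authors' policy_iteration.m:

function [policy, U] = policy_iteration_sketch(P, R, gamma)
% Policy iteration with a random initial policy, as the report describes:
% evaluate the current policy exactly, improve it greedily, and stop when
% it no longer changes.
[nstates, nactions, ~] = size(P);
policy = randi(nactions, nstates, 1);              % random initial policy
while true
    % Policy evaluation: solve (I - gamma*Tpi) * U = R in one step.
    Tpi = zeros(nstates);
    for s = 1:nstates
        Tpi(s, :) = reshape(P(s, policy(s), :), 1, []);
    end
    U = (eye(nstates) - gamma * Tpi) \ R(:);
    % Policy improvement: one-step lookahead on the new utilities.
    newpolicy = policy;
    for s = 1:nstates
        [~, newpolicy(s)] = max(squeeze(P(s, :, :)) * U);
    end
    if isequal(newpolicy, policy), break, end      % policy is stable
    policy = newpolicy;
end
end

Because the evaluation step is exact, each iteration costs more than one Bellman sweep, but far fewer iterations are needed, which is consistent with the speed difference discussed below.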
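The maximum error and policy loss of Section 2.3 can be derived from the saved utility history roughly as follows. Here Uhist (an nstates-by-niters matrix of stored iterates), Ustar (a converged reference solution standing in for the true utilities), and the two helpers are hypothetical stand-ins built from the same P, R, and gamma machinery as the sketches above.

% For each stored iterate U_i: maximum error is the sup-norm distance to
% the reference utilities; policy loss is how much utility the greedy
% policy extracted from U_i gives up relative to the optimal policy.
niters = size(Uhist, 2);
max_err     = zeros(niters, 1);
policy_loss = zeros(niters, 1);
for i = 1:niters
    Ui = Uhist(:, i);
    max_err(i) = max(abs(Ui - Ustar));
    pol  = greedy_from(P, Ui);                     % hypothetical: argmax lookahead
    Upol = exact_value(P, pol, R, gamma);          % hypothetical: linear solve
    policy_loss(i) = max(abs(Upol - Ustar));
end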
3 Results

3.1 Investigating Value Iteration

In the first part of our assignment, we investigated the performance of our value iteration algorithm by having it solve both of the provided MDPs, with various discount factors, and watching the progress of the algorithm as it iterated. Figures 3.1 and 3.2 below show the algorithm's progress as it finds a solution for MDP 1 and MDP 2, respectively, with a discount factor of 0.9. In each figure, the utilities of the states (excluding the obstacle state) are shown in the left-hand graph, and the maximum error and policy loss of the utility function are shown in the right-hand graph. Each graph is plotted over the number of iterations of the algorithm.

Figure 3.1: Value Iteration Performance for MDP 1 (left panel: Value Iteration Utilities for γ = 0.9, per-state utilities U(x, y); right panel: Value Iteration Accuracy for γ = 0.9, maximum error and policy loss)

Figure 3.2: Value Iteration Performance for MDP 2 (same panels, γ = 0.9)

We then used our value iteration algorithm, in conjunction with our display_agent_choice.m function, to produce the optimal policies for MDPs 1 and 2 as a function of the reward given for non-terminal states. Figures 3.3 and 3.4 show the results of this experiment. Note that an X in those figures denotes that any action was optimal in that state for the given reward range.
Figure 3.3: Optimal policies for MDP 1 for different ranges of the non-terminal reward R (grids of per-state directions N/S/E/W, with O marking the obstacle and T the terminals)

Figure 3.4: Optimal policies for MDP 2 for different ranges of the non-terminal reward R

3.2 Investigating Policy Iteration

For the second part of the assignment, we investigated the performance of policy iteration by having it find solutions for MDPs 1 and 2, with various discount factors. Figures 3.5 and 3.6 below show the algorithm's progress while solving these MDPs with a discount factor of 0.9, plotted against the number of iterations taken.

Figure 3.5: Policy Iteration Performance for MDP 1 (left panel: Policy Iteration Utilities for γ = 0.9; right panel: Policy Iteration Accuracy for γ = 0.9)
Figure 3.6: Policy Iteration Performance for MDP 2 (left panel: Policy Iteration Utilities for γ = 0.9; right panel: Policy Iteration Accuracy for γ = 0.9)

4 Discussion

4.1 Value Iteration

We charted the performance of value iteration, as it solves the MDPs given in the provided text files, in Figures 3.1 and 3.2 above, and noticed a couple of interesting things about the behavior of the algorithm. First of all, some of the states' utility values converged faster than others. The states that tended to converge more slowly were often the states further towards the start of an optimal path. Since the nature of the Bellman equation is such that the utility of each state depends on the utility of its best next state, if a state has many future states that come after it on an optimal path, then it should take longer for the Bellman updates from those states to propagate down the path and reach the states nearer the beginning. Thus, this result is reasonable.

Although our graphs do not show it, we also noticed that as the discount factor for the given MDP increased, the number of iterations it took for the utility values to converge increased as well. This result, too, is to be expected. With a high discount factor, the utility of each state is more heavily influenced by the utilities of the future states that come after it. Thus, more Bellman updates are necessary to propagate the utilities of future states through the world, so that all the states that come before them may have their own utilities properly set.

In general, we can also see that the policy loss produced by the value iteration algorithm approaches zero much faster than the maximum error, demonstrating that the resulting utility values are often able to produce an optimal policy long before the utility values themselves are close enough to be considered correct. This, of course, is the motivation behind the policy iteration algorithm, which we cover in the next section. However, it is interesting to note that the policy loss for value iteration is not always monotonically decreasing, as the figures show. This is, presumably, an effect of the non-uniform convergence of the utilities during value iteration, which can cause the policy at some iteration of the algorithm to be, in fact, less optimal than the policy at the previous iteration.

In Figures 3.3 and 3.4, we indicated what the optimal policies for both of the MDPs would be, depending on the reward value for the non-terminal states. Most noticeable from these results is that some of our reward ranges in Figure 3.3 do not
exactly match the reward ranges given in the book for the same grid world (Russell & Norvig 66). We assume that this is because we do not know the exact discount factor with which the results in the book were computed, and therefore cannot expect our results to be exactly the same. However, our policies for reward ranges that are in the same neighborhood do seem to match the book's, so we feel our results are reasonable. That being said, we found reward ranges for MDP 1 where the optimal policy changes (6 more than in the book), and 8 reward ranges for MDP 2 that influence the optimal policy.

It is interesting to note that, for both MDPs, when the reward for the non-terminal states was strictly greater than zero, our algorithm could not settle on a single optimal action for the states that had no terminal neighbors, because the utilities of neighboring states would continue to increase in value, making all directions equally optimal. Apparently, then, an optimal agent would do its best to stay away from all terminal states and maximize its reward by moving around the world indefinitely. For MDP 2 in particular, it was also interesting to see that for R > .99 the optimal policy was to pass by the + reward state and instead try to reach the +4.7 reward state that lay beyond it. Even more interesting, for R > -.3 the optimal policy was to avoid the + reward state at all costs by going West in (3,) and hope for the small chance that the agent would end up going South instead, towards the +4.7 state.

4.2 Policy Iteration

When we ran policy iteration on the same MDPs, we noticed that it behaved a lot like value iteration: in both algorithms, the utility values started out relatively inaccurate and were refined with each iteration until they converged. Unlike value iteration, however, the utilities in our policy iteration algorithm always started from a random initial policy, so the number of iterations it took to converge was never fixed. This is, of course, a consequence of our policy iteration implementation. Still, we saw that policy iteration generally converged much faster than value iteration. For example, comparing Figures 3.1 and 3.5, policy iteration converges four times faster than value iteration: in Figure 3.5, all states have converged by the third iteration, while in Figure 3.1 it takes value iteration many more iterations even to get close to the correct utility values. We also found that, for policy iteration, varying the discount factor does not appear to influence the number of iterations required to find the optimal policy, unlike for the value iteration algorithm. This result seems reasonable, firstly because our policy iteration algorithm starts with a random policy for the first iteration, and secondly because the convergence of policy iteration does not depend directly on Bellman updates propagating through the state space, as it does for value iteration.

5 Conclusion

In this assignment, Tom wrote the majority of the value iteration functions in Matlab, prepared the figures and graphs, and helped edit and proofread the report. He learned more about the relationship between utility functions and policy functions in an MDP, the limits of value and policy iteration, and how to create and save plots of 3-dimensional matrices of data in Matlab.
Grace wrote the majority of the policy iteration functions in Matlab, ran the algorithms on the given MDPs with various parameters, and wrote a good share of the report. She developed a better understanding of MDPs, value iteration, and policy iteration, and learned more about the Matlab environment, including how to work with matrices and solve linear equations.