Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay


1 Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay
Haiyan (Helena) Yin, Sinno Jialin Pan
School of Computer Science and Engineering, Nanyang Technological University, Singapore
{haiyanyin,
November 15, 2017
Haiyan (Helena) Yin, Sinno Jialin Pan Multitask DRL with H-PR November 15, / 25

2 Overview
1 Motivation
2 Background
  - Deep Q-Learning
  - Multi-task Deep Reinforcement Learning
3 Methodology
  - Multi-task Architecture
  - Hierarchical Experience Replay
4 Experiments
  - Atari 2600 domain
5 Conclusion

3 Motivation
Deep reinforcement learning (DRL) enables us to derive sequential decision-making policies from low-level sensory inputs.
  - Mastering the game of Go with deep neural networks and tree search (2016)
  - Feudal networks for hierarchical reinforcement learning (2017)
  - Sim-to-Real robot learning from pixels with progressive nets (2016)
Training each DRL model requires extensive computational effort.
  - E.g., with a modern GPU, training a deep RL model for Atari takes about 1 week.
Each DRL model can only be used on a single task domain.

4 Objective
Train a multi-task DQN that can be used across multiple task domains.
  - Knowledge transfer via a student-teacher setting: policy distillation.
  - Avoid negative transfer on individual tasks: new network architecture.
  - Increase sample efficiency: hierarchical experience sampling.

5 Markov Decision Process
Definition: A Markov Decision Process is a tuple $\langle S, A, P, R, \gamma \rangle$.
  - $S$ is a set of states.
  - $A$ is a set of actions.
  - $P$ is the state transition probability, $P^{a}_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$.
  - $R$ is a reward function, $R^{a}_{s} = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$.
  - $\gamma$ is a discount factor, $\gamma \in [0, 1]$.
Definition: Optimization: learn a behavior policy $\pi$ that maximizes the expected cumulative future reward:
$$Q(s_t, a_t) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, s_t, a_t \right]$$
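
The expected cumulative future reward on this slide is an expectation over discounted sums of rewards. As a minimal sketch, the function below computes that discounted sum for one finite, already-observed reward sequence. The name `discounted_return` and the example rewards are illustrative assumptions, not part of the slides.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for a finite reward sequence.

    rewards[k] plays the role of R_{t+k+1}; the infinite sum on the slide
    is truncated to the observed sequence (an assumption of this sketch).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate backwards so that each step applies G = r + gamma * G.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    # 1.0 + 0.5 * 0.0 + 0.25 * 2.0
    print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))
```

Averaging this quantity over many trajectories that start from the same state-action pair and follow $\pi$ gives a Monte Carlo estimate of $Q(s_t, a_t)$.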

6 Multi-task DQN
Multi-task DQN adopts a student-teacher architecture for supervised policy training.
  - Policy distillation: Rusu et al., ICLR 2016.

7 Teacher: Deep Q-Networks (DQN)
DQN optimizes $Q(s, a)$ using the Q-learning algorithm:
$$Q^{*}(s, a) = \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q^{*}(s', a') \,\middle|\, s, a \right]$$
$$L(\theta_i) = \mathbb{E}_{s,a}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i) \right)^{2} \right]$$
DQN uses experience replay to sample experience for updating the network:
  - Store each transition $(s_t, a_t, r_{t+1}, s_{t+1})$ as experience in the replay memory $D$.
  - Sample a mini-batch of experience $(s, a, r, s')$ uniformly from $D$.
To reduce variance, DQN adopts reward clipping and parameter freezing.
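
The sketch below illustrates the two ingredients named on this slide: the squared TD loss with a frozen-parameter target, and a replay memory with uniform sampling. It works on plain lists of Q-values rather than a neural network, and the names `td_target`, `squared_td_loss`, and `ReplayMemory`, together with the `done` flag for terminal transitions, are assumptions introduced for illustration.

```python
import random
from collections import deque


def td_target(reward, next_q_values, gamma, done=False):
    """r + gamma * max_a' Q(s', a'; theta_{i-1}).

    next_q_values are Q-values of the next state computed with the frozen
    (older) parameters. The done flag is an assumption of this sketch: a
    terminal transition has no bootstrap term.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_loss(q_sa, reward, next_q_values, gamma, done=False):
    """Squared TD error for a single transition."""
    return (td_target(reward, next_q_values, gamma, done) - q_sa) ** 2


class ReplayMemory:
    """Fixed-capacity store of (s, a, r, s') transitions, sampled uniformly."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries are evicted first

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        # Every stored transition has probability 1/|D| of being drawn.
        return rng.sample(list(self.buffer), batch_size)


if __name__ == "__main__":
    memory = ReplayMemory(capacity=3)
    for t in range(5):
        memory.store((t, 0, 1.0, t + 1))
    print(memory.sample(2))
    print(squared_td_loss(2.0, 1.0, [0.5, 2.0], gamma=0.9))
```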

8 Transfer: Teacher(s) to Student
Multi-task DQN setting:
  - Suppose there is a set of $m$ source tasks, $S_1, \ldots, S_m$.
  - Each task $i$ has a trained teacher network, $Q_{T_i}$.
  - Each domain keeps a replay memory $D^{(i)} = \{e^{(i)}_k, q^{(i)}_k\}$.
  - Denote the output values of the multi-task student by $q^{(S)}_k$.
Loss functions for the supervised training, summed over the experiences $k$ in $D^{(i)}$:
$$L_{NLL}(D^{(i)}, \theta_S) = -\sum_{k=1}^{|D^{(i)}|} \log P(a_k = a_{k,best} \mid s_k, \theta_S)$$
$$L_{MSE}(D^{(i)}, \theta_S) = \sum_{k=1}^{|D^{(i)}|} \left\lVert q^{(i)}_k - q^{(S)}_k \right\rVert_2^2$$
$$L_{KL}(D^{(i)}, \theta_S) = \sum_{k=1}^{|D^{(i)}|} \mathrm{softmax}\!\left(\frac{q^{(i)}_k}{\tau}\right) \ln \frac{\mathrm{softmax}\!\left(q^{(i)}_k / \tau\right)}{\mathrm{softmax}\!\left(q^{(S)}_k\right)}$$
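
To make the three distillation losses concrete, here is a minimal per-sample sketch in plain Python. It assumes the teacher and student outputs for one state are given as lists of Q-values, that the "best" action is the teacher's argmax, and that the student's action distribution is a softmax over its outputs. The function names are illustrative and not taken from the slides.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax of values / temperature."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def nll_loss(teacher_q, student_q):
    """Negative log-likelihood of the teacher's best action under the student."""
    best = max(range(len(teacher_q)), key=teacher_q.__getitem__)
    return -math.log(softmax(student_q)[best])


def mse_loss(teacher_q, student_q):
    """Squared L2 distance between teacher and student Q-values."""
    return sum((t - s) ** 2 for t, s in zip(teacher_q, student_q))


def kl_loss(teacher_q, student_q, tau):
    """KL divergence from the temperature-softened teacher to the student."""
    p = softmax(teacher_q, tau)
    q = softmax(student_q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


if __name__ == "__main__":
    teacher, student = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
    print(nll_loss(teacher, student))
    print(mse_loss(teacher, student))
    print(kl_loss(teacher, student, tau=0.5))
```

Summing any one of these over the experiences in $D^{(i)}$ gives the corresponding dataset-level loss. A temperature $\tau < 1$ sharpens the teacher distribution before it is matched.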

9 Challenge for Transfer
  - Negative transfer for the multi-task network.
  - Slow convergence of learning: the number of training frames scales to 1e8.

10 Multi-task Architecture
  - AMN: Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning (ICLR 2016)
  - DIST: Policy Distillation (ICLR 2016)

11 Multi-task Architecture
A new multi-task framework with task-specific high-level features:

12 Multi-task Architecture
Motivation for using task-specific high-level features:
  - Improved performance:
    - The low-level pixel representation is quite game-specific, so tasks share very little statistical basis.
    - Sharing the convolutional filters among tasks may fail to learn important task-specific features.
  - Time efficiency:
    - Leads to a much shorter convergence time.
    - Using the task-specific convolutional filters (i.e., pre-training) does not involve additional cost.

13 Experience Replay for DQN
Uniform sampling [DQN, Double-DQN, A3C, etc.]
  - The probability of sampling each experience from $D$ is equal: $\frac{1}{|D|}$.
  - Samples follow the data distribution.
Prioritized experience replay [Prior. DQN]
  - Select experience based on the TD error: $r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i)$.
  - The TD error serves as an informative metric for prioritization.
  - Samples no longer follow the original data distribution.

14 State Visiting Distribution for Breakout
State distribution for Breakout with networks of different playing ability. Playing ability increases from Net-1 to Net-3.

15 Hierarchical Prioritized Experience Replay
Partition the state space:
  - Approximate the state distribution based on $V(s)$, where $V(s) = \max_a Q(s, a)$.
  - Observe the boundary $[V^{(i)}_{min}, V^{(i)}_{max}]$.
  - Divide the range of $V(s)$ into $p$ partitions of equal length: $\{[V^{(i)}_1, V^{(i)}_2], (V^{(i)}_2, V^{(i)}_3], \ldots, (V^{(i)}_p, V^{(i)}_{p+1}]\}$.
Hierarchical sampling:
  - Partition selection: uniform sampling.
  - Within a partition: prioritized sampling (based on distillation error).
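
The partitioning step can be sketched as follows, assuming the boundary values are already known for a task. The helper names `state_value`, `partition_edges`, and `partition_index` are hypothetical, and clamping values that fall outside the observed boundary is an assumption of this sketch rather than something the slides specify.

```python
import math


def state_value(q_values):
    """V(s) = max_a Q(s, a)."""
    return max(q_values)


def partition_edges(v_min, v_max, p):
    """The p + 1 edges V_1, ..., V_{p+1} of p equal-length partitions."""
    width = (v_max - v_min) / p
    return [v_min + j * width for j in range(p)] + [v_max]


def partition_index(v, v_min, v_max, p):
    """0-based index of the partition containing v.

    The first partition is closed, [V_1, V_2]; the others are left-open,
    (V_j, V_{j+1}]. Values outside the boundary are clamped (assumption).
    """
    if v <= v_min:
        return 0
    if v >= v_max:
        return p - 1
    width = (v_max - v_min) / p
    j = math.ceil((v - v_min) / width) - 1
    return min(max(j, 0), p - 1)


if __name__ == "__main__":
    print(partition_edges(0.0, 10.0, 5))
    v = state_value([1.5, 2.1, 0.3])
    print(v, partition_index(v, 0.0, 10.0, 5))
```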

16 Hierarchical Prioritized Experience Replay
Uniform sampling on partitions:
  - For each task $i$, keep track of the number of experience samples assigned to partition $j$ within a window: $N^{(i)}_j$.
  - The probability for partition $j$ to be selected (for task $i$) is
$$P^{(i)}_j = \frac{N^{(i)}_j}{\sum_{k=1}^{p} N^{(i)}_k}$$
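
A small sketch of the partition-selection step, under the assumption that the "window" is a sliding window over the most recent partition assignments. The class name `PartitionTracker`, the window mechanics, and the uniform fallback for an empty window are all illustrative assumptions.

```python
import random
from collections import Counter, deque


class PartitionTracker:
    """Tracks N_j, the number of recent experiences assigned to partition j."""

    def __init__(self, num_partitions, window):
        self.num_partitions = num_partitions
        self.recent = deque(maxlen=window)  # sliding window of partition ids

    def record(self, j):
        """Register that a new experience was assigned to partition j."""
        self.recent.append(j)

    def probabilities(self):
        """P_j = N_j / sum_k N_k over the current window."""
        total = len(self.recent)
        if total == 0:
            # Assumption: fall back to equal probabilities before any data.
            return [1.0 / self.num_partitions] * self.num_partitions
        counts = Counter(self.recent)
        return [counts[j] / total for j in range(self.num_partitions)]

    def sample_partition(self, rng=random):
        """Draw a partition index according to P_j."""
        weights = self.probabilities()
        return rng.choices(range(self.num_partitions), weights=weights)[0]


if __name__ == "__main__":
    tracker = PartitionTracker(num_partitions=3, window=4)
    for j in [0, 0, 1, 2]:
        tracker.record(j)
    print(tracker.probabilities())
    print(tracker.sample_partition())
```

Because $P^{(i)}_j$ is proportional to the partition's share of the data, selecting a partition this way and then picking uniformly inside it would reproduce uniform sampling over the whole window.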

17 Hierarchical Prioritized Experience Replay
Prioritization within the selected partition (a rank-based approach):
  - Prioritization:
$$\delta^{(i)}_j[k] = \frac{1}{|A_{T_i}|} \left\lVert f\!\left(\frac{q^{(i)}_j[k]}{\tau}\right) - f\!\left(q^{(S)}_j[k]\right) \right\rVert$$
  - Weight:
$$\sigma^{(i)}_j(k) = \frac{1}{\mathrm{rank}^{(i)}_j(k)}$$
  - Prioritized sampling probability (within the partition):
$$P^{(i)}_j[k] = \frac{\left(\sigma^{(i)}_j(k)\right)^{\alpha}}{\sum_{t=1}^{N^{(i)}_j} \left(\sigma^{(i)}_j(t)\right)^{\alpha}} \quad (1)$$
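
The rank-based sampling probability in Eq. (1) can be sketched as below. Two points are assumptions of this sketch and are not confirmed by the slides: $f$ is taken to be a softmax, and the distillation error is taken as the mean absolute difference between the softened teacher distribution and the student distribution. Rank 1 is given to the largest error.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax of values / temperature."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def distillation_error(teacher_q, student_q, tau):
    """Mean absolute gap between softened teacher and student distributions.

    Assumes f is a softmax and an absolute-value norm; both are guesses.
    """
    p = softmax(teacher_q, tau)
    q = softmax(student_q)
    return sum(abs(a - b) for a, b in zip(p, q)) / len(teacher_q)


def rank_based_probabilities(errors, alpha):
    """P[k] = (1 / rank(k))**alpha, normalized; rank 1 is the largest error."""
    order = sorted(range(len(errors)), key=lambda k: errors[k], reverse=True)
    rank = [0] * len(errors)
    for position, k in enumerate(order, start=1):
        rank[k] = position
    weights = [(1.0 / r) ** alpha for r in rank]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    errors = [0.1, 0.9, 0.5]
    print(rank_based_probabilities(errors, alpha=1.0))
    print(rank_based_probabilities(errors, alpha=0.0))  # uniform
```

With $\alpha = 0$ every experience in the partition is equally likely, while larger $\alpha$ concentrates sampling on the experiences with the largest distillation error.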

18 Hierarchical Prioritized Experience Replay
Definition: The overall probability for an experience $k$ of task $i$ from partition $j$ to be sampled w.r.t. the entire replay memory is
$$P^{(i)}_j(k) = P^{(i)}_j \cdot P^{(i)}_j[k]$$
Bias correction via importance sampling:
$$w^{(i)}_j(k) = \left( \frac{1}{\sum_{t=1}^{p} N^{(i)}_t} \cdot \frac{1}{P^{(i)}_j \, P^{(i)}_j[k]} \right)^{\beta} = \left( \frac{1}{N^{(i)}_j} \cdot \frac{1}{P^{(i)}_j[k]} \right)^{\beta} \quad (2)$$
The gradient used for the mini-batch update with hierarchical importance sampling is $\hat{w}^{(i)}_j(k) \, \delta^{(i)}_j[k]$.
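
The two forms of the importance weight in Eq. (2) agree because $P^{(i)}_j = N^{(i)}_j / \sum_t N^{(i)}_t$. The sketch below computes both so the equality can be checked numerically. The function names are illustrative, and normalizing by the largest weight to obtain $\hat{w}$ is an assumption borrowed from common prioritized-replay practice; the slides do not define $\hat{w}$.

```python
def overall_probability(p_partition, p_within):
    """P_j(k) = P_j * P_j[k]."""
    return p_partition * p_within


def importance_weight_full(counts, j, p_within, beta):
    """First form of Eq. (2): (1 / (sum_t N_t * P_j * P_j[k])) ** beta."""
    total = sum(counts)
    p_partition = counts[j] / total
    return (1.0 / (total * overall_probability(p_partition, p_within))) ** beta


def importance_weight(n_j, p_within, beta):
    """Simplified form of Eq. (2): (1 / (N_j * P_j[k])) ** beta."""
    return (1.0 / (n_j * p_within)) ** beta


def normalize(weights):
    """Scale weights so the largest is 1 (an assumed definition of w-hat)."""
    largest = max(weights)
    return [w / largest for w in weights]


if __name__ == "__main__":
    counts = [2, 6]  # hypothetical N_1, N_2 for one task
    print(importance_weight_full(counts, 0, 0.25, beta=0.5))
    print(importance_weight(counts[0], 0.25, beta=0.5))
    print(normalize([0.5, 2.0, 1.0]))
```

When sampling inside a partition is uniform, $P^{(i)}_j[k] = 1 / N^{(i)}_j$ and the weight is 1, so no correction is applied.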

19 Experiment with Atari 2600
Evaluation criteria:
  - Architectural efficiency: a multi-task domain with 10 Atari games: Beamrider, Breakout, Enduro, Freeway, Ms.Pacman, Pong, Q*bert, Seaquest, Space Invaders, and River Raid.
  - Sampling efficiency: a multi-task domain with 4 Atari games: Breakout, Freeway, Pong, and Q*bert.

20 Evaluation on Architecture
Baseline approaches:
  - AMN: unified CNN+FC
  - DIST: (shared CNN+FC) + (game-specific FC+output)
[Table: Teacher (score) and DIST, AMN, Proposed (% of teacher) for Beamrider, Breakout, Enduro, Freeway, Ms.Pacman, Pong, Q*bert, Seaquest, Space Invaders, River Raid, and the Geometric Mean; numeric entries are missing from the transcription.]
(Note that all approaches adopt uniform sampling.)

21 Evaluation on Architecture
Learning curves on the 4 games with the slowest convergence:
  - Tasks: Breakout, Enduro, River Raid, Space Invaders
  - AMN/DIST: > 2.5M mini-batches
  - Proposed: < 1.5M mini-batches

22 Evaluation on Sampling Efficiency
Baseline approaches:
  - Uniform
  - PR: rank-based prioritized replay
Tasks: Breakout, Freeway, Pong, Q*bert

23 Conclusion
  - Supervised training of a multi-task DQN via policy distillation.
  - Using task-specific features reduces the negative transfer effect and saves training time.
  - Hierarchical prioritized sampling accelerates learning by taking the state visiting distribution into account.

24 Hashing Over Predicted Future Frames for Informed Exploration of Deep Reinforcement Learning
Haiyan Yin, Sinno Jialin Pan
Informed Exploration with Model-based Knowledge.

25 Thank you.


Robotic Search & Rescue via Online Multi-task Reinforcement Learning Lisa Lee Department of Mathematics, Princeton University, Princeton, NJ 08544, USA Advisor: Dr. Eric Eaton Mentors: Dr. Haitham Bou Ammar, Christopher Clingerman GRASP Laboratory, University of Pennsylvania,

More information

Mobile Robot Obstacle Avoidance based on Deep Reinforcement Learning

Mobile Robot Obstacle Avoidance based on Deep Reinforcement Learning Mobile Robot Obstacle Avoidance based on Deep Reinforcement Learning by Shumin Feng Thesis submitted to the faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of

More information

Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. Presented by: Karen Lucknavalai and Alexandr Kuznetsov

Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. Presented by: Karen Lucknavalai and Alexandr Kuznetsov Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization Presented by: Karen Lucknavalai and Alexandr Kuznetsov Example Style Content Result Motivation Transforming content of an image

More information

Deep Learning Cook Book

Deep Learning Cook Book Deep Learning Cook Book Robert Haschke (CITEC) Overview Input Representation Output Layer + Cost Function Hidden Layer Units Initialization Regularization Input representation Choose an input representation

More information

CS221 Final Project: Learning Atari

CS221 Final Project: Learning Atari CS221 Final Project: Learning Atari David Hershey, Rush Moody, Blake Wulfe {dshersh, rmoody, wulfebw}@stanford December 11, 2015 1 Introduction 1.1 Task Definition and Atari Learning Environment Our goal

More information

In-Place Activated BatchNorm for Memory- Optimized Training of DNNs

In-Place Activated BatchNorm for Memory- Optimized Training of DNNs In-Place Activated BatchNorm for Memory- Optimized Training of DNNs Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder Mapillary Research Paper: https://arxiv.org/abs/1712.02616 Code: https://github.com/mapillary/inplace_abn

More information

Application of convolutional neural networks to RL control problems. by Marek Romanowicz (T) Fourth-year undergraduate project in Group F,

Application of convolutional neural networks to RL control problems. by Marek Romanowicz (T) Fourth-year undergraduate project in Group F, Application of convolutional neural networks to RL control problems by Marek Romanowicz (T) Fourth-year undergraduate project in Group F, 2014-2015 I hereby declare that, except where specifically indicated,

More information

CS885 Reinforcement Learning Lecture 9: May 30, Model-based RL [SutBar] Chap 8

CS885 Reinforcement Learning Lecture 9: May 30, Model-based RL [SutBar] Chap 8 CS885 Reinforcement Learning Lecture 9: May 30, 2018 Model-based RL [SutBar] Chap 8 CS885 Spring 2018 Pascal Poupart 1 Outline Model-based RL Dyna Monte-Carlo Tree Search CS885 Spring 2018 Pascal Poupart

More information

Optimizing Distributed MIMO Wi-Fi Networks with Deep Reinforcement Learning

Optimizing Distributed MIMO Wi-Fi Networks with Deep Reinforcement Learning Optimizing Distributed MIMO Wi-Fi Networks with Deep Reinforcement Learning Neelakantan Nurani Krishnan, Student Member, IEEE, Eric Torkildson, Member, IEEE, Narayan Mandayam, Fellow, IEEE, Dipankar Raychaudhuri,

More information

CP365 Artificial Intelligence

CP365 Artificial Intelligence CP365 Artificial Intelligence Example Problem Problem: Does a given image contain cats? Input vector: RGB/BW pixels of the image. Output: Yes or No. Example Problem Problem: What category is a news story?

More information

Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison*

Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison* Tracking Trends: Incorporating Term Volume into Temporal Topic Models Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison* Dept. of Computer Science and Engineering, Lehigh University, Bethlehem, PA,

More information

CS 687 Jana Kosecka. Reinforcement Learning Continuous State MDP s Value Function approximation

CS 687 Jana Kosecka. Reinforcement Learning Continuous State MDP s Value Function approximation CS 687 Jana Kosecka Reinforcement Learning Continuous State MDP s Value Function approximation Markov Decision Process - Review Formal definition 4-tuple (S, A, T, R) Set of states S - finite Set of actions

More information

Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning

Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning 1 Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning Yuanlong Li, Yonggang Wen, Kyle Guan, and Dacheng Tao arxiv:179.577v1 [cs.ai] 15 Sep 217 Abstract Cooling system

More information

Exploring Boosted Neural Nets for Rubik s Cube Solving

Exploring Boosted Neural Nets for Rubik s Cube Solving Exploring Boosted Neural Nets for Rubik s Cube Solving Alexander Irpan Department of Computer Science University of California, Berkeley alexirpan@berkeley.edu Abstract We explore whether neural nets can

More information

GraphGAN: Graph Representation Learning with Generative Adversarial Nets

GraphGAN: Graph Representation Learning with Generative Adversarial Nets The 32 nd AAAI Conference on Artificial Intelligence (AAAI 2018) New Orleans, Louisiana, USA GraphGAN: Graph Representation Learning with Generative Adversarial Nets Hongwei Wang 1,2, Jia Wang 3, Jialin

More information

CSE 490R P3 - Model Learning and MPPI Due date: Sunday, Feb 28-11:59 PM

CSE 490R P3 - Model Learning and MPPI Due date: Sunday, Feb 28-11:59 PM CSE 490R P3 - Model Learning and MPPI Due date: Sunday, Feb 28-11:59 PM 1 Introduction In this homework, we revisit the concept of local control for robot navigation, as well as integrate our local controller

More information

Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

More information

arxiv: v1 [cs.lg] 22 Jul 2018

arxiv: v1 [cs.lg] 22 Jul 2018 NAVREN-RL: Learning to fly in real environment via end-to-end deep reinforcement learning using monocular images Malik Aqeel Anwar 1, Arijit Raychowdhury 2 Department of Electrical and Computer Engineering

More information

A Novel Two-Layered Reinforcement Learning for Task Offloading with Tradeoff between Physical Machine Utilization Rate and Delay

A Novel Two-Layered Reinforcement Learning for Task Offloading with Tradeoff between Physical Machine Utilization Rate and Delay future internet Article A Novel Two-Layered Reinforcement Learning for Task Offloading with Tradeoff between Physical Machine Utilization Rate and Delay Li Quan 1, Zhiliang Wang 1, * and Fuji Ren 2 1 School

More information

Clustering with Reinforcement Learning

Clustering with Reinforcement Learning Clustering with Reinforcement Learning Wesam Barbakh and Colin Fyfe, The University of Paisley, Scotland. email:wesam.barbakh,colin.fyfe@paisley.ac.uk Abstract We show how a previously derived method of

More information

arxiv: v1 [cs.ai] 18 Apr 2017

arxiv: v1 [cs.ai] 18 Apr 2017 Investigating Recurrence and Eligibility Traces in Deep Q-Networks arxiv:174.5495v1 [cs.ai] 18 Apr 217 Jean Harb School of Computer Science McGill University Montreal, Canada jharb@cs.mcgill.ca Abstract

More information

WestminsterResearch

WestminsterResearch WestminsterResearch http://www.westminster.ac.uk/research/westminsterresearch Reinforcement learning in continuous state- and action-space Barry D. Nichols Faculty of Science and Technology This is an

More information

5 Learning hypothesis classes (16 points)

5 Learning hypothesis classes (16 points) 5 Learning hypothesis classes (16 points) Consider a classification problem with two real valued inputs. For each of the following algorithms, specify all of the separators below that it could have generated

More information

Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods Alessandro Lazaric Marcello Restelli Andrea Bonarini Department of Electronics and Information Politecnico di Milano

More information

Resource Provisioning and Profit Maximization for Transcoding in Clouds: A Two-Timescale Approach

Resource Provisioning and Profit Maximization for Transcoding in Clouds: A Two-Timescale Approach 1 Resource Provisioning and Profit Maximization for Transcoding in Clouds: A Two-Timescale Approach Guanyu Gao, Han Hu, Yonggang Wen, Senior Member, IEEE, Cedric Westphal, Senior Member, IEEE Abstract

More information

arxiv: v1 [cs.lg] 7 Dec 2018

arxiv: v1 [cs.lg] 7 Dec 2018 A new multilayer optical film optimal method based on deep q-learning A.Q. JIANG, 1,* OSAMU YOSHIE, 1 AND L.Y. CHEN 2 1 Graduate school of IPS, Waseda University, Fukuoka 8080135, Japan 2 Department of

More information

Generative Adversarial Text to Image Synthesis

Generative Adversarial Text to Image Synthesis Generative Adversarial Text to Image Synthesis Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee Presented by: Jingyao Zhan Contents Introduction Related Work Method

More information

Q-learning with linear function approximation

Q-learning with linear function approximation Q-learning with linear function approximation Francisco S. Melo and M. Isabel Ribeiro Institute for Systems and Robotics [fmelo,mir]@isr.ist.utl.pt Conference on Learning Theory, COLT 2007 June 14th, 2007

More information

Supervised Hashing for Image Retrieval via Image Representation Learning

Supervised Hashing for Image Retrieval via Image Representation Learning Supervised Hashing for Image Retrieval via Image Representation Learning Rongkai Xia, Yan Pan, Cong Liu (Sun Yat-Sen University) Hanjiang Lai, Shuicheng Yan (National University of Singapore) Finding Similar

More information