When Network Embedding meets Reinforcement Learning?


1 When Network Embedding meets Reinforcement Learning? ---Learning Combinatorial Optimization Problems over Graphs Changjun Fan 1

2 1. An Introduction to (Deep) Reinforcement Learning 2. How to combine NE and RL to solve combinatorial problems on graphs 2

3 1. An Introduction to (Deep) Reinforcement Learning 3

4 Supervised Learning 4

5 Unsupervised Learning 5

6 Reinforcement Learning 6

7 Reinforcement Learning 7

8 Reinforcement Learning - Example 8

9 How can we mathematically formalize the RL problem? Markov Decision Process: MDP = (S, A, P, R, γ). Markov Assumption: P(s_{t+1} | s_0, a_0, ..., s_t, a_t) = P(s_{t+1} | s_t, a_t). Reward Assumption: R(s_0, a_0, ..., s_t, a_t, s_{t+1}) = R(s_t, a_t, s_{t+1}) = r_{t+1} ∈ R. 9

10 How can we mathematically formalize the RL problem? Markov Decision Process: MDP = (S, A, P, R, γ). Follow a policy and sample trajectories (paths): s_0, a_0, r_0, s_1, a_1, r_1, ... Policy: π(s, a) = P(a_t | s_t) ∈ [0, 1], that is a_t ~ π(s_t, ·). Goal: find the policy π that maximizes the cumulative discounted reward: max_π Σ_{t≥0} γ^t r_t. 10
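To make the MDP notation concrete, here is a minimal sketch of sampling one trajectory and computing its discounted return; the `env` and `policy` objects and their methods are illustrative assumptions, not part of the slides.

```python
def rollout(env, policy, gamma=0.99, max_steps=100):
    """Sample one trajectory s0, a0, r0, s1, ... and return its discounted return."""
    s = env.reset()
    ret, discount = 0.0, 1.0
    for _ in range(max_steps):
        a = policy(s)             # a_t ~ pi(s_t, .)
        s, r, done = env.step(a)  # next state depends only on (s_t, a_t): Markov assumption
        ret += discount * r       # accumulate gamma^t * r_t
        discount *= gamma
        if done:
            break
    return ret
```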

11 Value function 11

12 State Value function : V 12

13 State Value function : V 13

14 State-Action Value function : Q 14

15 Relation between Q and V functions: 15

16 The optimal Value function and optimal Policy Optimal policy and optimal state-value function: 16

17 The optimal Value function and optimal Policy 17

18 The optimal Value function and optimal Policy Proof: Algorithms for Reinforcement Learning, Szepesvári, Csaba. 18

19 Bellman optimality equation for V* 19

20 Bellman optimality equation for V* 20

21 Bellman optimality equation for Q* 21
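For reference, the Bellman optimality equations for V* and Q* in their standard textbook form, written in the MDP notation above (this explicit form is not transcribed on the slides):

```latex
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{*}(s') \bigr]

Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \bigr]
```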

22 Solving for the optimal value function 22

23 Solving for the optimal value function: Q-learning. DQN: Deep Q-Network 23

24 Solving for the optimal value function : Q-learning 24
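A minimal tabular Q-learning sketch showing the standard update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]; the toy `env` interface (`reset`, `step`, `actions`) is an assumption for illustration, not from the slides.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    Q = defaultdict(float)                      # Q[(state, action)] -> estimated value
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda a: Q[(s, a)])
            s_next, r, done = env.step(a)
            # one-step Q-learning target
            best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in env.actions(s_next))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```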

25 Case study: Atari Game. Mnih et al., NIPS Deep Learning Workshop 2013; Nature 2015.

26 Q-network Architecture. Mnih et al., NIPS Deep Learning Workshop 2013; Nature 2015.

27 Training Techniques: Experience Replay. Experience Replay: store the agent's experiences at each time-step, e_t = (s_t, a_t, r_t, s_{t+1}), in a data set D_t = {e_1, ..., e_t} pooled over many episodes into a replay memory, and train the Q network on random mini-batches sampled from this replay memory. Advantages: Ø Each step of experience is potentially used in many weight updates, which allows for greater data efficiency; Ø Learning directly from consecutive samples is inefficient owing to the strong correlations between them; randomizing the samples breaks these correlations and therefore reduces the variance of the updates. 27
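A minimal replay-memory sketch; the class and method names are illustrative, not taken from the Nature paper's code.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity buffer of transitions e_t = (s_t, a_t, r_t, s_{t+1}, done)."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are evicted automatically

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform random mini-batch breaks correlations between consecutive samples
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```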

28 Training Techniques: target Q network. Use a separate network for generating the targets y in the Q-learning updates. More precisely, every C updates, clone the network Q to obtain a target network Q̂, and use Q̂ for generating the Q-learning targets y for the following C updates. Advantage: generating the targets using an older set of parameters adds a delay between the time an update to Q is made and the time the update affects the targets y, making divergence or oscillations much less likely. 28
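A small sketch of the target-network idea; the generic `q_net` object with `predict` and copyable parameters is an assumption for illustration.

```python
import copy

class DQNTrainer:
    """Minimal sketch: an online Q network plus a periodically cloned target network."""
    def __init__(self, q_net, sync_every=10_000, gamma=0.99):
        self.q_net = q_net
        self.q_target = copy.deepcopy(q_net)     # frozen copy used to compute targets
        self.sync_every = sync_every
        self.gamma = gamma
        self.step = 0

    def target(self, reward, next_state, done):
        # targets come from the older parameters, which stabilizes learning
        if done:
            return reward
        return reward + self.gamma * max(self.q_target.predict(next_state))

    def after_update(self):
        self.step += 1
        if self.step % self.sync_every == 0:
            self.q_target = copy.deepcopy(self.q_net)   # every C updates: Q-hat := Q
```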

29 Putting it together: Deep Q-Learning with Experience Replay. Human-level control through deep reinforcement learning, Nature, 2015.

30 Putting it together: Deep Q-Learning with Experience Replay. Initialize replay memory and Q-network. Human-level control through deep reinforcement learning, Nature, 2015.

31 Putting it together: Deep Q-Learning with Experience Replay. For each timestep t of the game: Human-level control through deep reinforcement learning, Nature, 2015.

32 Putting it together: Deep Q-Learning with Experience Replay. With small probability, select a random action (explore); otherwise select the greedy action from the current policy. Human-level control through deep reinforcement learning, Nature, 2015.

33 Putting it together: Deep Q-Learning with Experience Replay. Take the action a_t, and observe the reward r_t and next state s_{t+1}. Human-level control through deep reinforcement learning, Nature, 2015.

34 Putting it together: Deep Q-Learning with Experience Replay. Store the transition in replay memory. Human-level control through deep reinforcement learning, Nature, 2015.

35 Putting it together: Deep Q-Learning with Experience Replay. Experience replay: sample a random minibatch of transitions from replay memory and perform a gradient descent step. Human-level control through deep reinforcement learning, Nature, 2015.

36 Putting it together: Deep Q-Learning with Experience Replay Human-level control through deep reinforcement learning, Nature, 2015 Every C steps, clone the current Q network to obtain a target Q network 36
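Putting the pieces above together, a condensed sketch of the deep Q-learning loop with experience replay and a target network; the helper names and hyper-parameter values are illustrative, not the paper's exact ones.

```python
import random

def train_dqn(env, q_net, memory, num_steps=1_000_000,
              batch_size=32, eps=0.05, gamma=0.99, sync_every=10_000):
    q_target = q_net.clone()                       # target network Q-hat
    s = env.reset()
    for t in range(num_steps):
        # epsilon-greedy: explore with small probability, otherwise act greedily
        if random.random() < eps:
            a = env.sample_action()
        else:
            a = q_net.best_action(s)
        s_next, r, done = env.step(a)
        memory.store(s, a, r, s_next, done)        # store transition in replay memory
        s = env.reset() if done else s_next

        if len(memory) >= batch_size:
            batch = memory.sample(batch_size)      # random minibatch from replay memory
            targets = [r_i if d_i else r_i + gamma * max(q_target.predict(s_i_next))
                       for (_, _, r_i, s_i_next, d_i) in batch]
            q_net.gradient_step(batch, targets)    # SGD step on (y - Q(s, a))^2

        if t % sync_every == 0:
            q_target = q_net.clone()               # every C steps: Q-hat := Q
```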

37 Using DQN to play Atari Games

38 Two main methods of Reinforcement Learning Ø Value based: Q-Learning Ø Policy based: Policy Gradient 38

39 Solving for the optimal policy: Policy Gradient. What's the problem with Q-learning? The Q-function can be very complicated! For example, a robot grasping an object has a very high-dimensional state => hard to learn the exact value of every (state, action) pair. But the policy can be much simpler: just close your hand. Can we learn a policy directly, e.g. find the best policy from a collection of policies? 39

40 Solving for the optimal policy: Policy Gradient Gradient ascent on policy parameters! 40

41 Solving for the optimal policy: Policy Gradient 41
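As a concrete illustration of gradient ascent on policy parameters, here is a minimal REINFORCE-style sketch; the `policy` object and its methods (`sample`, `grad_log_prob`, `params`) are assumptions for illustration, not from the slides.

```python
def reinforce(policy, env, gamma=0.99, lr=1e-3, episodes=10):
    """Monte-Carlo policy gradient: increase the log-probability of each action
    in proportion to the discounted return that followed it."""
    for _ in range(episodes):
        # sample one trajectory under the current policy
        states, actions, rewards = [], [], []
        s, done = env.reset(), False
        while not done:
            a = policy.sample(s)
            s_next, r, done = env.step(a)
            states.append(s); actions.append(a); rewards.append(r)
            s = s_next
        # compute the discounted return G_t from each time step onward
        G, returns = 0.0, []
        for r in reversed(rewards):
            G = r + gamma * G
            returns.append(G)
        returns.reverse()
        # gradient ascent step on sum_t G_t * grad log pi(a_t | s_t)
        for s, a, G in zip(states, actions, returns):
            policy.params += lr * G * policy.grad_log_prob(s, a)
```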

42 Variance Reduction 42

43 How to choose the baseline? 43

44 Actor-Critic Algorithm 44

45 Actor-Critic Algorithm 45
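A brief sketch of the actor-critic idea, where a learned value function serves as the baseline that reduces the variance of the policy gradient; all object and method names here are illustrative assumptions.

```python
def actor_critic_step(actor, critic, transition, gamma=0.99,
                      actor_lr=1e-3, critic_lr=1e-2):
    """One online actor-critic update: the critic's value estimate is the baseline."""
    s, a, r, s_next, done = transition
    v_next = 0.0 if done else critic.value(s_next)
    td_target = r + gamma * v_next
    advantage = td_target - critic.value(s)          # A(s, a) = TD target - baseline V(s)

    # critic: move V(s) toward the TD target
    critic.params += critic_lr * advantage * critic.grad_value(s)
    # actor: policy gradient weighted by the advantage instead of the raw return
    actor.params += actor_lr * advantage * actor.grad_log_prob(s, a)
```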

46 2. How to combine NE and RL to solve combinatorial optimization problems on graphs. Q1: What is a combinatorial optimization problem? A1: It is the problem of finding an optimal object from a finite set of objects. Q2: What is a combinatorial optimization problem on a graph? A2: The objects are nodes or edges, as in MVC (minimum vertex cover), MAXCUT, etc. Q3: What are the traditional methods for graph combinatorial optimization problems? A3: Traditional methods fall into three types: Ø Exact algorithms Ø Approximate algorithms Ø Heuristic algorithms Q4: Why a new algorithm? A4: All three paradigms seldom exploit a common trait of real-world optimization problems: instances of the same type of problem are solved again and again on a regular basis, maintaining the same combinatorial structure but differing mainly in their data. The new algorithm aims to provide a general framework for these problems. 46

47 A motivational example Minimum Vertex Cover Find smallest vertex subset S s.t. each edge has at least one end in S Models advertising optimization in social networks 47

48 A motivational example Minimum Vertex Cover Find smallest vertex subset S s.t. each edge has at least one end in S Models advertising optimization in social networks 48

49 Proposal: Learning Greedy Algorithms. Minimum Vertex Cover 2-approx: greedily pick the uncovered edge with the maximum sum of degrees of its endpoints and add its two endpoints to the cover. Goal: construct a solution by sequentially adding nodes to a partial solution S, based on maximizing some evaluation function Q which measures the quality of a node in the context of the current partial solution. 49
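A short sketch of the 2-approx greedy heuristic described above; the graph is given as a plain edge list, and the function name is illustrative.

```python
def greedy_vertex_cover(edges):
    """Repeatedly take the uncovered edge whose endpoints have the largest total
    degree, and add both endpoints to the cover (2-approximation for MVC)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    cover, uncovered = set(), set(edges)
    while uncovered:
        u, v = max(uncovered, key=lambda e: degree[e[0]] + degree[e[1]])
        cover.update([u, v])
        uncovered = {e for e in uncovered if e[0] not in cover and e[1] not in cover}
    return cover

# example: a path graph 0-1-2-3 -> cover {1, 2}
print(greedy_vertex_cover([(0, 1), (1, 2), (2, 3)]))
```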

50 Problem Statement. Given a graph optimization problem G and a distribution D of problem instances, can we learn better greedy heuristics that generalize to unseen instances from D? Minimum Vertex Cover: insert nodes into the cover. Maximum Cut: insert nodes into a subset. Traveling Salesman Problem: insert nodes into a sub-tour. Learning combinatorial optimization algorithms over graphs, Hanjun Dai, et al., NIPS 2017. 50

51 Challenge #1: How to Learn. Possible approach: supervised learning. Given a partial solution, predict the next vertex to add to the solution. Data: collect (partial solution, next vertex) pairs, i.e. (features, label). Task: multi-class classification. Pointer Network [Vinyals, et al., NIPS 2015]: a smarter approach with recurrent neural networks. PROBLEM: need to compute good/optimal solutions to NP-hard problems in order to learn! 51

52 Recall: Reinforcement Learning Background. Reward r_t: score you earned at the current step. State S: current screen. Action i: move your paddle left / right. Action-value function Q̂(S, i): your predicted future total reward. Policy π(S): how to choose your action. Greedy policy: i* = argmax_i Q̂(S, i). [Mnih, et al., Nature 2015]

53 Reinforcement Learning Formulation. Minimum Vertex Cover: min Σ_{i∈V} x_i s.t. x_i + x_j ≥ 1 ∀(i, j) ∈ E, x_i ∈ {0, 1}. Repeat until all edges are covered: 1. Compute a score for each vertex 2. Select the vertex with the largest score 3. Add the best vertex to the cover and update state S. Reward: r_t = -1 per node added (so maximizing total reward minimizes cover size). State S: the currently selected nodes. SOLUTION: improve the policy by learning from experience => no need to compute optima. Action-value function: Q̂(S, v). Greedy policy: v* = argmax_v Q̂(S, v). 53
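A hedged sketch of the greedy construction loop just described, with Q̂ abstracted as a scoring function passed in as `q_hat`; in the actual method this score comes from the learned graph embedding introduced later.

```python
def greedy_with_q(nodes, edges, q_hat):
    """Build a vertex cover by repeatedly adding the node with the highest Q score."""
    S, uncovered = [], set(edges)
    while uncovered:
        candidates = [v for v in nodes if v not in S]
        v = max(candidates, key=lambda v: q_hat(S, v))    # greedy policy: argmax_v Q(S, v)
        S.append(v)                                        # update the state
        uncovered = {e for e in uncovered if v not in e}   # drop newly covered edges
    return S
```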

54 Reinforcement Learning Algorithm
Algorithm 1: Q-learning for the Greedy Algorithm
1: Initialize experience replay memory M to capacity N
2: for episode e = 1 to L do
3:   Draw graph G from distribution D  (instance generation)
4:   Initialize the state to empty: S_1 = ()
5:   for step t = 1 to T do
6:     v_t = a random node v ∉ S_t with probability ε, otherwise argmax_{v ∉ S_t} Q̂(h(S_t), v; Θ)
7:     Add v_t to the partial solution: S_{t+1} := (S_t, v_t)  (update state)
8:     if t ≥ n then
9:       Add tuple (S_{t-n}, v_{t-n}, R_{t-n,t}, S_t) to M
10:      Sample a random batch B iid. from M
11:      Update Θ by SGD over the loss (y - Q̂(h(S_t), v_t; Θ))^2 for B, where y = γ max_{v'} Q̂(h(S_{t+1}), v'; Θ) + r(S_t, v_t)  (optimize model parameters)
12:    end if
13:  end for
14: end for
15: return Θ
Θ: model parameters; Q̂ depends on the graph features h(S_t). Storing n-step tuples helps deal with the issue of delayed rewards. 54

55 Challenge #2: How to Represent. Representation of v: a feature vector that describes v in state S_t. Representation of S_t: a feature vector that describes state S_t. Possible approach: feature engineering (degree, 2-hop neighborhood size, other centrality measures). PROBLEMS: 1- Task-specific engineering needed 2- Hard to tell what is a good feature 3- Difficult to generalize across different graph sizes 55

56 structure2vec: Deep Node Representations [Dai, et al., ICML 2016]. Repeat the embedding update T times; for each node v, update its feature vector μ_v^{(t+1)} ∈ R^p as
μ_v^{(t+1)} ← relu( θ_1 x_v + θ_2 Σ_{u∈N(v)} μ_u^{(t)} + θ_3 Σ_{u∈N(v)} relu(θ_4 w(v, u)) ),
combining the node's own tag x_v, the neighbors' features μ_u^{(t)}, and the neighbors' edge weights w(v, u). Non-linearity: relu(x) = max(0, x). Θ: model parameters. 56

57 structure2vec: Deep Node Representations [Dai, et al., ICML 2016]. Non-linearity: relu(x) = max(0, x). Repeat the embedding T times, updating each node's feature vector, then compute the Q-value
Q̂(h(S), v; Θ) = θ_5^T relu([ θ_6 Σ_{u∈V} μ_u^{(T)}, θ_7 μ_v^{(T)} ]),
where the first term is a sum-pooling over all nodes. SOLUTION: 1- No feature engineering needed 2- Features and parameters are trained jointly to be good 3- Can handle different graph sizes. Θ: model parameters. 57
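A hedged numpy sketch of the embedding iteration and Q readout above. The embedding dimension p, the parameter shapes (θ_1, θ_4: vectors of size p; θ_2, θ_3, θ_6, θ_7: p×p matrices; θ_5: vector of size 2p), and the dense adjacency representation are all illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def structure2vec_q(x_tags, adj_w, theta, T=4):
    """x_tags: (n,) node tags (1 if the node is in partial solution S, else 0).
    adj_w:  (n, n) weighted adjacency matrix (0 where there is no edge).
    Returns mu (n, p) node embeddings and q (n,) scores Q(h(S), v)."""
    n = len(x_tags)
    p = theta["theta2"].shape[0]
    mu = np.zeros((n, p))

    edge_term = relu(np.outer(adj_w.reshape(-1), theta["theta4"]))   # relu(theta4 * w(v, u))
    neighbor_edge = edge_term.reshape(n, n, p).sum(axis=1)           # sum over neighbors u

    for _ in range(T):
        neighbor_feat = (adj_w > 0).astype(float) @ mu               # sum_{u in N(v)} mu_u
        mu = relu(np.outer(x_tags, theta["theta1"])                  # theta1 * x_v
                  + neighbor_feat @ theta["theta2"].T                # theta2 * sum mu_u
                  + neighbor_edge @ theta["theta3"].T)               # theta3 * sum relu(theta4 w)

    pooled = theta["theta6"] @ mu.sum(axis=0)                        # sum-pooling over all nodes
    per_node = mu @ theta["theta7"].T                                # theta7 * mu_v
    q = relu(np.concatenate([np.tile(pooled, (n, 1)), per_node], axis=1)) @ theta["theta5"]
    return mu, q
```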

58 Overall Framework 58

59 Experimental Setup Graph types Minimum Vertex Cover (MVC) Erdos-Renyi (ER) or Barabasi-Albert (BA) Maximum Cut (MAXCUT) ER or BA Traveling Salesman Problem (TSP) DIMACS generator; uniform grid or clustered Solvers ILP with CPLEX IQP with CPLEX Concorde Feature embedding size: 64 Embedding iterations T: 3 to 5 Full details in paper 59

60 Results: Solution Quality [MVC - BA]. Near optimal, barely visible (figure: approximation ratio).

61 Results: Solution Quality [MAXCUT - BA] 61

62 Results: Solution Quality [TSP - clustered] 62

63 Results: Realistic Instances
Network      Nodes   Edges   Weighted
MemeTracker                  No
Physics                      {-1, 0, 1}
TSPLIB       \       \
63

64 Results: Algorithm Behavior 64

65 Results: Algorithm Behavior (figure: snapshots of the partial solution at steps (1) through (11)). 65

66 Learning graph opt: quantitative comparison. Train on small graphs; generalize not only to graphs from the same distribution, but also to larger graphs (figure: approximation ratio, generalization to large instances).

67 Conclusion A learning framework that exploits graph structure Applies directly to many graph optimization problems Promising tool for automated algorithm design NIPS paper: Code:

68 Any Questions? 68
