DriveFaster: Optimizing a Traffic Light Grid System

CS221 Fall 2016: Final Report
Team Members: Xiaofan Li, Ahmed Jaffery

Abstract

Traffic lights are the central point of control for city traffic and can be an effective tool for keeping traffic moving. The cheapest way for a city to speed up traffic flow is to optimize its traffic light system. This project aims to optimize traffic flow through a grid of street intersections by controlling only the traffic light grid. The goal is to minimize the time it takes each car to travel from a random start point to a random end point. Modeling the problem as an MDP and applying Q-Learning allows the system to adapt to a variety of situations and traffic patterns. This paper details the MDP definition and the effect of various features on the Q-Learning simulation.

1. Introduction

This paper investigates the use of a Markov Decision Process (MDP) model to learn traffic patterns and apply Q-Learning to improve traffic flow. The model controls only the state of each traffic light (green/red) and bases its decisions on traffic information: the number of cars in the system and the wait time at each traffic light. To simplify the system and reduce the number of variables, all cars have a constant speed of 1 and each intersection has a length of 1. Keeping these values constant lets us focus on the features that manage an increasing number of cars. We first summarize a paper that provided background information, then define our problem in detail, and end by discussing our model and approach as well as analyzing the results of Q-Learning as more features are added.

2. Related Works

Traffic has increased significantly over the past few decades as people continue to migrate to a few key cities. Minimizing traffic delays and improving traffic flow is a problem that many municipal governments have worked on with varying results. A paper from Deakin University in Australia conducted a traffic simulation study comparing Q-Learning and Neural Networks. The goal was to reduce the average delay time at an intersection. In the MDP model, they chose actions that maximize short-term reward rather than distant future rewards, with a minimum time that a light would stay in the green state and a maximum of 2 extensions, each with a fixed duration of 10 seconds.

The Neural Network approach they designed used a Genetic Algorithm with Simulated Annealing, which improves an initial solution by selecting a neighboring solution, making small changes, and comparing candidates against a fitness function. It has no minimum fixed green time or fixed extension time. The advantage of this method is that it has fewer constraints than Q-Learning and can determine exact durations for the green/red light states. The downside is a much larger state space due to the additional variability.

3. Content & Simulation

a. Problem Definition

This project aims to optimize traffic flow through a grid of intersections. The goal is to minimize the time from a random start point to a random end point for every car. For the scope of this project, the model has two major constraints in order to reduce the number of variables. Firstly, the road map is an M*N grid with fixed distances between intersections. Secondly, all cars travel at the same speed and take a time score of 1 to cross a road (from one intersection to the next). Cars are randomly spawned at an entrance position and make their way to a randomly chosen exit position. All edge positions are considered spawn points, and every traffic light has only a green and a red state (go/stop). The primary variable that is adjusted is the duration of the green or red state at each traffic light. Initially the reward was calculated by summing the total wait time each car experienced. As the project progressed, the reward was changed to the total time it took for the system to solve, i.e., the time it took for all cars to reach their destinations.

b. Car and Traffic Light Generation

The MDP is initialized with a set number of cars, a set number of additional cars to be added periodically, and a grid size of M*N. The number of traffic lights is M*N, and each traffic light is 1 unit away from its neighbors. The number of cars initialized was arbitrarily set to half the number of traffic lights. After each action cycle, additional cars are generated at random start positions; adding cars simulates varying traffic trends throughout the day and creates a more realistic traffic model. These two functions comprise the setup of our simulation.
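
As a concrete illustration of this setup, the following Python sketch shows how the grid, lights, and initial cars might be generated. The function and field names (make_grid, spawn_car, and so on) are our own and are not taken from the project code:

import random

def make_grid(m, n):
    """Illustrative M*N light grid: one light per intersection, 1 unit apart."""
    # Each light starts green on the north/south axis and red east/west.
    return {(i, j): {"ns": "green", "ew": "red"} for i in range(m) for j in range(n)}

def border_cells(m, n):
    """Edge intersections serve as spawn and exit points."""
    return [(i, j) for i in range(m) for j in range(n)
            if i in (0, m - 1) or j in (0, n - 1)]

def spawn_car(m, n):
    """A car gets a random entrance and a different random exit on the border."""
    edges = border_cells(m, n)
    start = random.choice(edges)
    goal = random.choice([c for c in edges if c != start])
    return {"pos": start, "goal": goal, "wait": 0}

def init_simulation(m, n):
    """Per the report: lights = M*N, initial cars = half the number of lights."""
    lights = make_grid(m, n)
    cars = [spawn_car(m, n) for _ in range(m * n // 2)]
    return lights, cars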

c. Approaches

i. Oracle

The oracle was designed with a pre-determined traffic pattern; given this pattern, it ran a search algorithm for the optimal action at each state. A state takes into account traffic at all intersections and determines how to proceed based on the successor-state function. We used a 2x2 grid with 4 cars. The resulting reward was 0, because the system knew the position of each car and enabled the green light before each car arrived at its intersection.

ii. Baseline

The baseline algorithm takes a greedy approach and simply activates the green light for the direction with the maximum number of cars at that one intersection; this is the approach used in most intelligent intersections today. There is also an upper bound on the time a traffic light can stay green in one direction per cycle. We simulated a 2x2 grid with 4 cars. As expected, the baseline performed noticeably worse than the oracle: the reward was 6, with each of the 4 cars waiting 1.5 time units on average.

iii. MDP

a. State

The MDP state is defined as a tuple ([Car States], [Light States]). The list [Car States] contains one car state per car, each holding the car's position, direction, and place in line (if waiting at a light). The list [Light States] contains one light state per traffic light, each recording the current light color for the up/down direction and the number of cars waiting in each direction.

b. Actions

The available actions from a state are defined as an M*N vector of traffic light colors for the next iteration. This controls the next state of each traffic light, which in turn moves the cars through the streets. A minimal sketch of these data structures follows.
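
The sketch below shows one way the state and action representation could be written in Python. The class names, fields, and action encoding are our own illustrative assumptions, not the authors' code:

from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

@dataclass(frozen=True)
class CarState:
    position: Tuple[int, int]   # intersection the car is at or approaching
    direction: str              # one of "N", "S", "E", "W"
    place_in_line: int          # queue position if waiting at a light, else 0

@dataclass(frozen=True)
class LightState:
    ns_color: str               # "green" or "red" for the up/down direction
    cars_waiting_ns: int        # cars waiting in the north/south direction
    cars_waiting_ew: int        # cars waiting in the east/west direction

# The full MDP state: a tuple of ([Car States], [Light States]).
State = Tuple[Tuple[CarState, ...], Tuple[LightState, ...]]

# An action assigns a color to each of the M*N lights for the next iteration.
Action = Tuple[str, ...]

def all_actions(num_lights: int) -> List[Action]:
    """Enumerate every light-color assignment (2^(M*N); tractable only for tiny grids)."""
    return list(product(("green", "red"), repeat=num_lights))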

c. Randomness

To introduce randomness into the cars' actions and simulate different driving patterns, each car chooses whether to make a turn with the following probabilities:

P(turning) = dist(currentPos, endPos)^0.5 · 1[current direction is blocked by a light]
P(go straight) = 1 − P(turning)

d. Random Driver Decisions

The decisions made by each car also add to the randomness of the environment. Drivers follow the rules below:

1. Each driver always makes progress towards the goal by only turning towards the goal position.
2. If the goal is to the left of and in front of the current position, there is a 50% chance that the car turns left and a 50% chance that it goes straight. In this case, the driver's decision is independent of the current signal color.
3. If the goal is to the right of and in front of the current position:
   3.1. If the current light is red, the car has an 80% chance of turning right and a 20% chance of waiting for the light to turn green.
   3.2. If the current light is green, the car has a 20% chance of turning right and an 80% chance of going straight.
4. The goal can never be behind the current position, given the above rules and the fact that starting positions are always initialized on the borders of the grid.

The intuition behind these rules is to simulate realistic driver decision making.

e. State Transition Complexity

At each time step, we update the state based on the current action. The light states are updated first because they are most directly affected by the action vector; the car states are then updated based on each car's direction and position as well as the updated light states. Light states and car states can each be updated in parallel, because the lights are independent of one another, as are the cars; the only dependency is that car states must be updated after light states. If there are L lights and N cars, then at each time step:

Sequential update complexity: O(L + N)
Parallel update complexity: O(1)
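
The per-step update described above can be sketched as follows. This is illustrative Python with simplified records (not the authors' implementation), and it omits the randomized turning decisions for brevity:

MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

def next_intersection(pos, direction):
    """Neighboring intersection one unit away in the direction of travel."""
    di, dj = MOVES[direction]
    return (pos[0] + di, pos[1] + dj)

def step(lights, cars, action):
    """One time step: update the lights from the action, then advance the cars."""
    # 1. Apply the action vector: one north/south color per light; east/west is the opposite.
    for pos, ns_color in zip(sorted(lights), action):
        lights[pos]["ns"] = ns_color
        lights[pos]["ew"] = "red" if ns_color == "green" else "green"
    # 2. Advance each car against the updated lights. Each car reads only its own
    #    light, so this loop could run across cars in parallel (hence O(1) with
    #    enough processors), but it must run after the light update.
    for car in cars:
        light = lights[car["pos"]]
        axis = "ns" if car["direction"] in ("N", "S") else "ew"
        if light[axis] == "green":
            car["pos"] = next_intersection(car["pos"], car["direction"])  # 1 unit of travel
            car["wait"] = 0
            # Cars that reach their goal would be removed from the simulation here.
        else:
            car["wait"] += 1  # blocked at a red light
    return lights, cars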

iv. Learning

We used a standard Q-learning algorithm to minimize global wait time, defined as the sum of the wait times of all cars spawned:

Q̂_opt(s, a) ← (1 − η) · Q̂_opt(s, a) + η · (r + γ · V̂_opt(s'))
V̂_opt(s') = max_{a' ∈ Actions(s')} Q̂_opt(s', a')

Besides the standard Q-learning algorithm, we also explored various feature extractors:

1. Traffic Light Information Features
   - total_action: all possible actions at a given state.
   - total_red: total number of lights that are red.
   - total_green: total number of lights that are green.
   - total_cars: total number of cars currently on the grid.
   - cars_per_light_per_direction: number of cars at each traffic light per direction, stored in a hash. This helps the model learn that long queues are undesirable.
   - wait_time_per_light_per_direction: total wait time currently in each traffic light's queue. This helps track cars that have been on the grid for a long time and prioritize them.

2. Spatial Features
   - 2D_Neighborhood: extracts features from the 2D neighborhood of a given light. When all the lights surrounding a particular intersection are green/red, we can assign more/less weight accordingly. This spatial feature captures the geometric structure of the intersection grid as a heavily weighted feature, helping to optimize subsections of the grid rather than only the grid as a whole.

A sketch of the Q-learning update over such features is given below.
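
Since features are used rather than a raw state table, the update above can be read in its function-approximation form, where weights over features stand in for the Q-table. The following Python sketch is our own simplification (the feature choices and names are illustrative, not the project's code):

from collections import defaultdict

def extract_features(state, action):
    """Toy feature extractor mirroring a few of the features listed above."""
    car_states, light_states = state
    return {
        "total_cars": len(car_states),
        "total_green": sum(1 for color in action if color == "green"),
        "total_red": sum(1 for color in action if color == "red"),
    }

def q_value(weights, state, action):
    return sum(weights[name] * value
               for name, value in extract_features(state, action).items())

def q_learning_update(weights, state, action, reward, next_state, next_actions,
                      eta=0.1, gamma=1.0):
    """One Q-learning step with linear function approximation over the features."""
    v_opt = max((q_value(weights, next_state, a) for a in next_actions), default=0.0)
    prediction = q_value(weights, state, action)
    target = reward + gamma * v_opt
    # Gradient step on (prediction - target)^2 with respect to the feature weights.
    for name, value in extract_features(state, action).items():
        weights[name] -= eta * (prediction - target) * value
    return weights

# Weights are shared across states and actions, e.g. weights = defaultdict(float).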

4. Results

a. Evaluation Metric

To evaluate the effectiveness of our MDP model, we ran Q-Learning for various grid sizes with 30 cars and 1000 iterations. We also investigated the effect of each feature individually and plotted how features affect the overall reward of the system. The graph compares the number of features against total reward, showing the effectiveness of our feature extractor and how the model compares to the optimal result as features are added.

b. Analysis

To better analyze the results, we introduce the concept of Accuracy, defined as:

Accuracy = -1 * 100 / score

With this, we can directly compare the results of the learning algorithm with different features. We also compare the standard deviation over every 100 iterations to show how stable the predictions are. The three sets of features compared in the diagram are:

1. Feature Set 1: all basic features, such as the number of cars at each light, the number of current cars, and current turning directions.
2. Feature Set 2: all temporal features, such as the cumulative wait time for each car and the ticketing number.
3. Feature Set 3: all spatial features, such as neighborhood features and the light states at certain positions of the grid.

In total, we have 7 features across the three feature groups. The results are as follows:

In the above diagram, we compare prediction accuracy across grid sizes for the different feature sets. As we can see, the spatial features substantially increased prediction accuracy, with especially significant improvement on small grids. This result makes sense because in smaller grids, spatial features such as the neighborhood feature essentially encode the whole state. It is also worth noting that with only basic and temporal features (Feature Sets 1+2), we see degradation on larger grids (16 and 25 lights). This is probably because the temporal features focus on localized state such as per-car wait time, so on larger grids they actually hurt prediction accuracy. The results also show that the spatial extractor has higher deviations on larger grids. We think this is because larger grids incorporate more information into the neighborhood feature, creating more randomness and making the results less stable. Despite this stability issue, the spatial feature extractor still consistently outperforms the other feature sets. Overall, we cannot identify a clear trend in the standard deviations across grid sizes: in general, the deviation increases as the grid size increases, but it is not clear whether this would continue for larger grids.
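
For reference, the accuracy metric and the per-100-iteration stability check described above could be computed along these lines (an illustrative sketch; the report does not show its logging code):

import statistics

def accuracy(score):
    """Accuracy as defined above: Accuracy = -1 * 100 / score."""
    return -1 * 100 / score

def stability_per_block(scores, block_size=100):
    """Standard deviation of accuracy over each block of 100 iterations."""
    accs = [accuracy(s) for s in scores]
    return [statistics.stdev(accs[i:i + block_size])
            for i in range(0, len(accs), block_size)
            if len(accs[i:i + block_size]) > 1]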

c. Challenges

The primary challenges of this approach were keeping track of car wait time, taking neighboring traffic lights into account, and managing the state size. Initially, each car's wait time was included in the state; however, this hurt Q-learning because every additional second a car waited was treated as a new state, even though the system is effectively in the same state regardless of an individual car's wait time. This issue was solved by replacing per-car wait time with the overall simulation time, which also allowed the model to optimize for a global solution rather than for individual cars. For example, if all cars except one reach their goals in 5 time units and the last car takes 100 time units, the average time per car is low but the full system time is 100. Optimizing for the global solution minimizes the full system time and ensures that all cars reach their destinations faster. It also reduced the state size considerably.

One of the primary features we settled on was taking into account the local grid around each traffic light, i.e., the total number of cars in each direction at neighboring lights, rather than looking at each traffic light separately. Dividing the grid in this way, however, prioritized moving cars through certain sectors in certain directions: if more cars were traveling North/South, the cars going East/West would end up waiting until the North/South direction was cleared. This was solved by adding a maximum time that a traffic light could stay green in one direction; once this maximum was hit, the light would switch directions regardless of the number of cars, as sketched below.
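
A minimal sketch of this maximum-green-time override (our own illustrative Python; the parameter value and function name are assumptions, not taken from the report):

def choose_axis(cars_ns, cars_ew, current_axis, time_green, max_green=5):
    """Greedy direction choice with a cap on how long one axis may stay green.

    cars_ns / cars_ew: queue lengths on each axis at the intersection.
    current_axis: "ns" or "ew", the axis that is currently green.
    time_green: number of steps the current axis has already been green.
    max_green: assumed cap; the report adds such a cap but does not state its value.
    """
    # Force a switch once the cap is reached, regardless of queue lengths.
    if time_green >= max_green:
        return "ew" if current_axis == "ns" else "ns"
    # Otherwise serve the axis with the longer queue.
    return "ns" if cars_ns >= cars_ew else "ew"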

5. Conclusion

This paper studied the effectiveness of using an MDP model to improve traffic flow through traffic light control. The three main feature groups investigated were basic traffic light information, cumulative traffic wait time, and local neighborhood features. As seen in the graphs above, the local neighborhood feature proved the most crucial: it alone improved the average score by about 24%. The primary challenge we were unable to overcome was the large state size, which limited how far the simulation could scale. Future work will need to reduce the state complexity and use a sampling method that looks at every Nth traffic light rather than every single one. In conclusion, applying Q-Learning with a few critical features on an MDP model can be a viable way to approach the traffic light control problem.

6. References

1. http://swarmlab.unimaas.nl/ala2013/papers/tuesession1paper2.pdf
2. http://cs229.stanford.edu/proj2015/369_report.pdf
3. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6722370