CS 103 Path-so-logical

1 Introduction

In this programming assignment you will write a program to read a given maze (provided as an ASCII text file) and find the shortest path from start to finish.

2 Techniques Used

1. Using file streams to perform text file I/O.
2. Performing dynamic memory allocation of arrays and understanding 2D arrays.
3. Applying the backtracking Breadth-First Search algorithm.
4. Implementing and using a simple queue.

3 Background Information and Notes

The input maze file format is as follows. The first line of the file contains two integers giving the number of rows and columns in the maze. The number of rows determines how many lines of text follow (one row per line). Each line contains one character for each column, followed by a newline character. Each character is a period (.) indicating a free space in the maze, a # sign indicating a wall in the maze, an S indicating the start location for your search, or an F indicating the desired finish location. You can't go outside the grid (i.e., you may think of the maze perimeter as surrounded by walls).

Sample Input File

    4 4
    ..#.
    ..#S
    F.#.
    ....

General File Format

    rows cols
    <col characters>\n
    ...
    <col characters>\n

Character    Meaning
. (period)   Free space in the maze
#            Wall in the maze
S            Start location in the maze
F            Finish location in the maze

Your search algorithm will find a shortest path from the start cell to the finish. Indicate this path by filling in the character locations on the path with asterisks (*); then write the resulting character maze to an output text file. E.g.,

Output File

    4 4
    ..#.
    ..#S
    F*#*
    .***

Sometimes no path exists. In this case your program will just output "No path could be found!" to the screen instead.

Last Revised: 10/19/2014
Breadth-First Search (BFS): Breadth-First Search is a general technique with many uses, including flood filling, shortest paths, and meet-in-the-middle search. The idea is to explore every possible valid location, beginning at the start location, using an ordering so that we always explore ALL locations at a shorter distance from the start/source before exploring any location at a longer distance from the start/source (i.e., all locations at a distance of i are explored before any location at a distance of i+1 from the source). This property ensures that when we do find the finish cell, we've arrived there via a shortest-length path. As we search, we mark cells that we've explored so that we don't explore them again and so the search doesn't run forever.

How do we ensure the BFS property (described above)? We keep a list of locations in the maze. Initially, we only list the start location. In each iteration, we remove the list's first location from the list, and we add all of its new (not yet listed) neighbors to the end of the list. It is not too hard to prove [1] that this simple algorithm successfully implements the BFS. We keep going until either we hit the finish cell (found the shortest path) or the list becomes empty (no path exists). This behavior, of adding items only to the end of the list while deleting items only from the start of the list, means that we are implementing a data structure known as a queue.

File I/O: You will use an ifstream object to read the maze data in from the given file and an ofstream object to write your results to an output file. Remember that you can use the ifstream object like you do cin to read data into variables; and like cin, the ifstream object skips whitespace (including newline characters). By reading the number of rows and columns into two integers initially, you can determine how much input needs to be read.

2-D Array Allocation: You will not know the size of the maze until runtime, when you read the maze data. Thus we will need to dynamically allocate an array to hold the maze data. Remember that new by default can only allocate a 1D array. You will need to allocate some 2D arrays in this assignment. This can be done in two ways.

- Use new[] once to allocate a 1D array of pointers, then use a loop containing new[] to allocate many 1D arrays. See http://cs103.usc.edu/websheets/#nxmboard and http://cs103.usc.edu/websheets/#deepnames for examples of that approach. (Click on View Reference Solution.)
- If the second array dimension is a constant that is known at compile time (like 2), you can use one line:

    int (*queue)[2] = new int[max_queue_size][2];

In either case, remember to delete each array that you allocate.

[1] Each neighbor of a location at distance i from the start must be at distance i-1, i, or i+1 from the start. However, any new neighbor must be at distance i+1, since those at i-1 or i were already listed. And every location at distance i+1 has some neighbor at distance i. This is enough to finish the proof.
4 Requirements

Your program shall meet the following requirements for features and approach:

1. Break your program into three files with the given structure:

File: maze.cpp
    main(int argc, char *argv[])
        - Accepts the input filename and output filename as command line arguments
        - Calls other functions
    int maze_search(char **maze, int rows, int cols)
        - Performs the BFS search for a valid path, filling in the path with * characters if found
        - Returns 1 if a valid path was found, 0 if no path was found, and -1 if an error occurred during the search. Possible errors include inability to find the start (S) cell or finish (F) cell.

File: maze_io.h
    Prototypes for functions in maze_io.cpp

File: maze_io.cpp
    char ** read_maze(char *filename, int *rows, int *cols );
        - Reads the maze from the given filename, allocates an array for the maze and returns it. The rows and cols arguments should point to variables that can be filled in with the dimensions of the maze read in from the file.
    void print_maze(char **maze, int rows, int cols);
        - Prints the maze dimensions and maze contents to the screen in a two dimensional format
    void write_maze(char *filename, char **maze, int rows, int cols);
        - Like print_maze, but writes to the given filename.

2. A valid path consists of steps north, south, east, and west but no diagonals. Thus we only need to explore neighbors of a cell in those 4 directions, not along diagonals.

3. You must dynamically allocate arrays to hold the maze data, the BFS array/queue, and any other needed arrays.

4. A queue is a data structure with the FIFO property (First-In, First-Out): items go in the back (or tail) but are removed from the front (head). It acts like a list where we write items at the bottom and cross them off as we do them from the top. It serves to help us remember what we have to do (until it is time to actually process it) and in what order the work arrived.
To implement the queue's behavior, let us allocate a large array (of size NROWS * NCOLS, which is large enough to hold all items if needed). We will also maintain two integer indices: head (the index of the front item) and tail (the index just past the back item, i.e., where the next item will be added). Both should start at 0. When an item is added, it should be placed at the index specified by tail (i.e., 0 for the first item added) and then the tail index should be incremented (i.e., to 1). Another addition to the queue would cause the same behavior (the item is placed at the location specified by the tail index and then the tail index is incremented). When an item is to be removed and processed, we should always take it from the front (head) index and then increment the head. Equivalently, you can think of tail as counting how many items were added so far, and head as counting how many items were removed.

Note that when you delete from the queue, you do NOT move all the other items. You simply move the head counter forwards (leaving the old stuff sitting in its original location). Later in the course you'll learn how to optimize this.

For example, here is a queue where, so far, two items were inserted and no items were removed:

    Head          Tail
    0      1      2      3      4      5      6
    Item   Item   -      -      -      -      -

Now, we must be careful not to try to remove an item if the queue is empty. Notice that head and tail will be the same if everything that has been added has also been removed, and tail > head if the queue is nonempty.

5. Remember the earlier description of how to implement breadth-first search with a queue:

    BFS Algorithm
    add start location to BFSQ
    while BFSQ is not empty do
        front <- remove earliest item from BFSQ
        for each neighbor of front
            if neighbor is open square and unfound
                set predecessor of neighbor = front
                add neighbor to BFSQ

Note that each item in the queue is a pair of numbers {row, col}.

6. You need to avoid adding any location to the queue more than once. Otherwise, your search can cycle infinitely or exceed the maximum queue size. Here is a bad way to solve this problem: before inserting a location into the queue, search the whole queue to see if it's already there. This is correct but has slow performance, O(NROWS^2 * NCOLS^2) worst-case time. Don't do this! Instead, maintain a data structure that remembers, for each grid cell, whether it has already been added to the queue or not.
Therefore, this data structure should let you look up, for a given pair of indices {row, col}, whether that cell has already been visited or not yet visited. What type of structure could you use for this? In your code, allocate and initialize this structure before you start the BFS algorithm. At what step of the BFS algorithm do you have to mark a cell as visited?

7. The final step is actually locating the optimal path and marking it with * characters. This requires a little more bookkeeping, like a trail of breadcrumbs that you can follow from the finish back to the start. In the BFS search, you should utilize a predecessor array (which is of the same size as the queue, and therefore dynamically allocated) to track, for each enqueued location L, the queue location (i.e., array index) of L's neighbor that caused L to be explored (the predecessor of L). The predecessor is utilized to find the actual path when your algorithm is complete: we trace it from the finish back to the start. I.e., the predecessor of the finish cell will tell us how to go back one step, then that cell's predecessor will tell us how to go back another step, etc.:

    previous_cell = predecessor[current_cell]

At what step of the algorithm will you fill a new entry in the predecessor array?

5 Prelab

1. BEFORE beginning, consider the sample maze shown below and put yourself in place of the BFS. Our examples will explore each cell's neighbors in the order {North, West, South, East}. Show the coordinates of each cell placed in the BFSQ in the appropriate order, beginning with the Start node and stopping as soon as the Finish node is entered into the queue. Then, for each cell placed in the BFS queue, keep track of how the predecessor array would be updated. Show the predecessor array's FINAL value at the end of the BFS execution. We have started the example for you. Complete it on a separate page.

    Maze                                 Locations
                   0123  <- Col. Index
    Row index -> 0 ..#.                  0,0  0,1  0,2  0,3
                 1 ..#S                  1,0  1,1  1,2  1,3
                 2 F.#.                  2,0  2,1  2,2  2,3
                 3 ....                  3,0  3,1  3,2  3,3

    (index)   0        1        2        3
    BFSQ:     {1, 3}   {0, 3}   {2, 3}   {3, 3}
    PRED:     -1       0        0        2

2. Next, do the backtracking on this example. Which BFSQ index did the F cell (finish) correspond to? If you backtrack from there, what indices do you pass through? What path through the maze does this correspond to?

3. Fill in the prelab section of the readme.txt file, which asks you about your design of the structure that you use to remember which cells have been visited or not.

6 Procedure

Perform the following.
1. Create a maze directory in your VM (or wherever you prefer):

    $ mkdir ~/maze
    $ cd ~/maze

2. Download the sample mazes and skeleton code to your local directory:

    $ wget http://ee.usc.edu/~redekopp/cs103/maze.tar

This will download the maze_io header file, skeleton code, Makefile, and sample mazes:

    maze_io.h     Prototypes for the maze file/display I/O
    maze_io.cpp   Code for the I/O routines
    maze.cpp      Main program and maze_search() algorithm
    Makefile      Run make to compile your program
    maze1.in      Sample 5 x 5 maze
    maze2.in      Sample 10 x 12 maze
    maze3.in      Sample maze with no solution path
    maze4.in      Sample maze with an error (no S cell)

Checkpoint 1: Input/output:

3. Complete the maze_io.cpp code so that you can read/print/write mazes to and from files. In main() in maze.cpp, write a program that simply reads in the file specified on the command line, prints it to the screen, and writes it out to the specified filename (i.e., the output file should thus be a copy of the input file).

4. Compile and run your program:

    $ make
    $ ./maze maze1.in maze1.copy

5. Check that the printed output and output file look correct. It is okay if your program prints "Path successfully found!" to the screen. Did you remember to deallocate the dynamic memory? Fill in the ADD CODE HERE part.

Checkpoint 2: Core BFS algorithm:

6. Now write the search code in maze_search():
    a. Find the Start and Finish cells.
    b. Set up and initialize your BFSQ, Predecessor, and any other arrays/data structures necessary.
    c. Perform the Breadth-First Search until you find the finish cell or the BFSQ is empty.

7. Does your program successfully distinguish mazes where a path exists from ones where a path does not? Try it on this input (maze3.in):

    2 3
    F#.
    .#S

If need be, print out the location of each item you add to or remove from the queue for debugging purposes, and print out the r,c index of the Finish cell when found.

Final Product:

1. Now add the code to walk the predecessor array backwards to the start location, filling in the cells with *.
2. Return the status code: 1 = success, 0 = no path exists, -1 = error.
3. Make sure your code meets all the requirements given earlier in this handout.
4. Make sure all dynamically allocated memory is deallocated.
5. Comment your code as you write your program, documenting what abstract operation a certain for, while, or if statement is performing, as well as other decisions you make along the way that feel particular to your approach.
6. Compile your program:

    $ make

7. Run your program:

    $ ./maze maze1.in maze1.out

8. Verify your program's outputs and run the program again on the other sample cases.
9. Create your own sample input, mymaze.txt, that tests whether the algorithm actually returns a shortest path. Your test case should therefore have multiple paths from start to finish, of different lengths. Once you've verified that your program behaves as expected, upload it along with your submission.

7 Rubric

3 points: Readme completed; get it from http://cs103.usc.edu/files/pa3/readme.txt
5 points: Follows required API and division of functionality into files, has correct return values
5 points: Opens files, reads input and prints output correctly
7 points: Allocates and deallocates memory correctly
10 points: BFS functionality: finds start, uses queue insert-and-delete algorithm, checks all 4 neighbors, stays in bounds, updates predecessor array, correctness
4 points: Backtracks, draws asterisks through maze along shortest path
4 points: Style and documentation
2 points: Submitted mymaze.txt