Debugging with Visual Studio & GDB OCTOBER 31, 2015 BY CSC 342 FALL 2015 Prof. IZIDOR GERTNER
Table of contents
1. Objective
2. Overview
3. Microsoft's Visual Studio Debugger
3.1 Iterative Method
3.2 Recursive Method
4. GDB Debugger for Linux
4.1 Iterative Method
4.2 Recursive Method
5. Analysis
6. Conclusion
7. Appendix
1. Objective

The objective of this laboratory is to understand how the stack works with recursive functions. We do this by debugging a function in two different environments: Visual Studio and GDB. We use a simple function that calculates the factorial of a number. It is a useful function for this purpose because the stack behavior of its recursive form can be compared directly with that of a regular iterative form. The results of this laboratory are summarized in a plot of the running time of each method versus the number whose factorial is calculated.
2. Overview

Recursive functions are routines that call themselves. We use this type of function to solve a large problem by combining the solutions of smaller sub-problems. Because the routine works by calling itself, it must eventually reach a sub-problem that it can solve without another call. This case is called the base case of the routine. In our case the base case is reached when the number whose factorial we are calculating has been reduced to zero, where the function returns 1 because 0! = 1.

The function used to achieve the goal of this laboratory is the factorial function. The factorial of a number is denoted by the symbol !, i.e. the exclamation mark. Whenever we see a notation like x! we understand that we are talking about the factorial of the number x. The factorial of a positive integer x is the product of all the integers from x down to 1. We stop at 1 because multiplying by zero would make the product zero no matter how large x is. The factorial is not defined for negative integers, so we only consider non-negative values of x.

We use a simple property of the factorial function to put it into a recursive form. Since the factorial of x is the product of the integers from x down to 1, we can write:

$x! = x \cdot (x-1) \cdot (x-2) \cdots 1$

It follows directly from this definition that x! can be rewritten as:

$x! = x \cdot (x-1)!$ and $(x-1)! = (x-1) \cdot (x-2)!$ and so on.

Hence, we can use a recursive routine to solve sub-problems until we reach the base case. Recursive functions rely on the call stack.
A call stack is composed of stack frames, and a new frame is created each time a subroutine is called, including each time the recursive function calls itself. The stack frame stores all the variables for one invocation of a routine. For the recursive factorial function we expect a new stack frame for each call to the function. For very large numbers we expect a very large number of stack frames, and this means using a lot of memory. The following picture illustrates what was stated above:

Figure 1 Stack Frames for the Recursive Factorial Function
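The one-frame-per-call behavior described above can also be observed without a debugger. The following is a minimal sketch, not part of the lab code: `traced_factorial` and `frame_addresses` are hypothetical names introduced here for illustration. Each invocation records the address of its own copy of the parameter n, which lives in that invocation's frame.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the lab code): computes n! recursively and
// records the address of each invocation's own copy of the parameter n.
int traced_factorial(int n, std::vector<std::uintptr_t>& frame_addresses) {
    assert(n >= 0);  // the factorial is only defined for non-negative integers
    // Every active call owns a separate n, stored in its own stack frame.
    frame_addresses.push_back(reinterpret_cast<std::uintptr_t>(&n));
    if (n == 0) {
        return 1;  // base case: 0! = 1
    }
    return n * traced_factorial(n - 1, frame_addresses);
}
```

Calling `traced_factorial(5, addresses)` with an empty vector should leave six distinct addresses in it, one for each call from n = 5 down to n = 0. On x86 the addresses typically decrease with each deeper call, although the exact values and the spacing between them depend on the compiler and build settings.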
3. Microsoft's Visual Studio Debugger

3.1. Iterative Method

In this part of the project we take the iterative method of the factorial function and debug it with the Visual Studio Debugger. Create a new project in Visual Studio and set it up as an Empty Project. Next, create two new files: the first contains the main function, which calls the factorial function defined in the second. The following figure shows the two functions to be created:

Figure 2 Iterative Method Functions

We compile and link these two functions together to make sure that we do not have any errors in our code. Then we start debugging. The following figure shows the state of the stack just before the factorial function is called from the main function:
Figure 3 Stack Frame of main

We can see that the stack frame for the main function has been created and that the base pointer contains the address 0x0043F874. Let's continue our debugging session by entering the factorial function. We expect a new stack frame to be created for the called function, so the base pointer should point to the base of the new frame, which is a different address from that of the main function. Consider the following screenshot:
Figure 4 Iterative Factorial Function Stack

We can see that our expectations are met and the address in the base pointer has changed. We now have a new stack frame with the necessary variables stored. In the above figure, the red square contains the EBP register, which now has a new value of 0x004F790. The blue square shows the location where the variable i is stored, which currently has a value of 0x01, and the green circle shows the location where the variable j is stored, which also holds 0x01, its initial value. Our final result will be stored in j. Let's observe the stack after the loop has completed all five iterations.
Figure 5 End of Iterative Function

As we can see, the value of i at this point is 5, so the loop has executed its last iteration and the function will return the value of j, which is stored at memory location 0x0043F788 and is 0x78. If we read this value in little-endian byte order and convert it from hexadecimal to decimal notation, we get 120. After this the function returns to the instruction in main that follows the call to the factorial function. We have achieved our goal and have the expected result on the stack. Only one frame was created for the factorial function, and with the iterative method everything was done inside this frame. Let's see how the stack behaves when we are dealing with a recursive function.
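The conversion from the bytes shown in the memory window to the decimal value 120 can be sketched as follows. This assumes a 4-byte int, so that j would appear in memory as the bytes 78 00 00 00; `from_little_endian` is a hypothetical helper introduced here, not part of the lab code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: rebuilds a 32-bit value from four bytes listed in the
// order a memory window shows them (lowest address first). In little-endian
// order the byte at the lowest address is the least significant one.
std::uint32_t from_little_endian(const std::array<std::uint8_t, 4>& bytes) {
    std::uint32_t value = 0;
    // Walk from the highest address (most significant byte) down to the lowest.
    for (std::size_t i = bytes.size(); i-- > 0;) {
        value = (value << 8) | bytes[i];
    }
    return value;
}
```

With the bytes 78 00 00 00 the function returns 0x00000078, which is 7 × 16 + 8 = 120 in decimal.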
3.2. Recursive Method

Consider the following functions that are used in Visual Studio to run the recursive method of the factorial function:

Figure 6 Recursive Method Functions

As we can see, the factorial function is coded differently this time. We compile each of these two files separately and afterwards link them together by clicking on the Build Solution option of Visual Studio. After we make sure that our program is correctly coded we proceed to start a debugging session. We click on the Start Debugging option and step over some instructions until the stack frame for the main function has been created and we are ready to jump into the recursive factorial function. The following screenshot shows the stack frame of the main function just before calling the recfactorial function:
Figure 7 Stack Frame of main Function

We now have a stack frame for our main function. The base pointer, depicted by the red square in the figure, points to the address 0x001AFA84, which is the base of the main function frame. We can see that the variable n, shown inside the green square with the value 0x05, is stored at location 0x001AFA7C, and that the variable result, shown inside the blue square, is initialized to zero at location 0x001AFA70. The purple square contains the address of the instruction to which the recursive factorial function will return after it has completed its task. We are ready to jump into the recursive factorial function. The following picture shows the stack frame for the recursive factorial function with parameter n = 5:
Figure 8 Recursive Stack Frame 1

We notice that this frame has a new base pointer, meaning that a new stack frame was created. The new base pointer is indicated by the red square. The green square shows the location of the variable n, which lies below the base pointer in the memory window, that is, at a higher address. This is what we expected, since this variable was pushed onto the stack before the recursive factorial function was called. The purple box contains the return address into main, which was also pushed onto the stack just before entering the function and therefore also lies below the base pointer. Let's call this Recursive Stack Frame 1 since it is the first one created after the main function stack frame. Since the value of n in this call is not zero, the function will call itself again with n = 4. Let's consider the following screenshots that show the stack for each call:
Figure 9 Recursive Stack Frame 2

Figure 10 Recursive Stack Frame 3

Figure 11 Recursive Stack Frame 4

Figure 12 Recursive Stack Frame 5
In the above screenshots we can see that a different stack frame is created for each function call. This can be seen by checking EBP each time. In stack frame 2, the base pointer contains the address 0x001AF8BC. In the next frame, where the parameter n is 3, the base pointer is 0x001AF7E4. Stack frame 4 starts at address 0x001AF70C, and the last stack frame shown, where n is 1, has the base pointer pointing to address 0x001AF634. Consecutive base pointers differ by 0xD8 (216) bytes, so each call consumes the same amount of stack in this build. These different addresses show that each call to the function creates a new stack frame. This can be very costly when we want to find the factorial of a large number, so we must be careful about when to use the recursive method.

The next call to the recursive factorial function takes the parameter n equal to zero. This call also gets a frame of its own, but it goes no deeper: the comparison succeeds and the number 1 is returned to the caller, which multiplies it by its own value of n. Each frame returns its value to the frame that called it, starting from the innermost frame, until the final value is returned to the main function, which assigns it to the variable result. The following diagram shows how the stack frames work with each other:
Graph 1 Stack Frames of the Recursive Factorial

After these operations are completed and all the stack frames that are no longer needed have been popped, we return to the main function, where the value 120 is assigned to the variable result. Let's see the main stack frame after the return:
Figure 13 Stack Frame of Main Function

As we can see in the red square, the base pointer is again pointing to the starting address, 0x001AFA84. At location 0x001AFA7C, shown by the green square, we still have the value of n, which is 5. Finally, the result that we were expecting is now set to 0x78 at location 0x001AFA70, which is depicted by the blue square. We read this value in little-endian byte order and convert it from hexadecimal to decimal, which gives 120, and we can conclude that the function has worked correctly. In terms of efficiency, a separate stack frame had to be created for every call to the recursive routine, and this may become very costly when dealing with very large integers. We have finished studying the behavior of the stack when dealing with recursive functions in Visual Studio. Now we are going to do the same analysis on a Linux platform.
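The order in which the recursive calls are entered and then return, as described in this section, can be made visible with a small tracing variant of the function. This is only an illustrative sketch: `logged_recfactorial` and its `log` parameter are hypothetical names and do not appear in the lab code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tracing variant of recfactorial: appends one line to the log
// when a call is entered and one line when it returns.
int logged_recfactorial(int n, std::vector<std::string>& log) {
    assert(n >= 0);  // negative input would never reach the base case
    log.push_back("call n=" + std::to_string(n));
    // Base case returns 1; otherwise multiply n by the result of the deeper call.
    int result = (n == 0) ? 1 : n * logged_recfactorial(n - 1, log);
    log.push_back("return " + std::to_string(result) + " from n=" + std::to_string(n));
    return result;
}
```

For n = 5 the log should contain six "call" lines (n = 5 down to n = 0) followed by six "return" lines in the opposite order, ending with "return 120 from n=5". The innermost call is the first to return, which is the last-in, first-out behavior of the stack frames.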
4. GDB Debugger for Linux

4.1. Iterative Method

We repeat the same experiment on another platform. This time we analyze the stack on a Linux operating system with a 64-bit architecture. For this part we use the GCC compiler and the GDB debugger. The following figure shows the stack frame of the main function as soon as we run the program:

Figure 14 Stack Frame of Main Function

This is the disassembly of our function. The red square shows the base pointer, the stack pointer and the current frames. The current frames are listed with the GDB command info stack. This is a very useful command because it lets us see how many stack frames exist at this point of the program. The next thing to do is to step into the iterative factorial function, disassemble it and check the stack. The following screenshot shows the stack at the first iteration of the loop:

Figure 15 Stack Frame of Factorial Function (Part 1)

As we can see, the base pointer has changed and now points to a new address, 0x7FFFFFFFDF60, which is different from the base pointer of the main function. This means that a new frame has been created for the factorial function. Also, from the info stack command we can see that our program now has two frames. The frame that we are currently in is labeled #0, and the frame to which we will return after this function has completed its task is frame #1. The stack works as a LIFO (Last In First Out) data structure. This means that the last frame to be created is the first one to return. The next screenshot shows the state of the stack after the last iteration of the loop:

Figure 16 Stack Frame of Factorial Function (Part 2)

At this point of the program we have finished iterating through the loop. The value of the variable j is the final value that we are looking for. We can check the value of j in two ways: one is to look at the contents of the memory address where j is stored, and the other is simply to print the value of j using the print command of GDB. As we can see from the red squares in the screenshot, the value of j is now 120. Notice that we still have two stack frames. This is because we have not returned to the main function yet, but since we are done with the iterations we can return. The following figure shows the state of the stack for the main function when we return from the factorial function:
Figure 17 Stack Frame of Main Function after Returning

As we can see, the base pointer points to address 0x7FFFFFFFDF80 again, and the value of the result is located on the stack at memory address 0x7FFFFFFFDF78. We can say that our function has worked correctly. We now continue our study of the stack with the recursive method.
4.2. Recursive Method

For this method, we use the same function that we used in Visual Studio. The following screenshot shows the stack of the main function as soon as we enter the program:

Figure 18 Stack Frame of Main Function

In this case the base pointer points to address 0x7FFFFFFFDF40, and from the info stack command we can see that only one stack frame has been created so far. Let's step into the recursive factorial function to see how the stack behaves:
Figure 19 Recursive Factorial Stack Frame 5

We can see from the figure that the base pointer now points to another address, and therefore we know that a new stack frame has been created. We can confirm this with the info stack command, which now lists two stack frames. The innermost frame is the frame for the recfactorial function, which takes n = 5 as its argument. Let's now consider all the stack frames created for the recursive function. The following pictures show the frames created each time the recursive function calls itself:
Figure 20 Recursive Factorial Stack Frame 4

Figure 21 Recursive Factorial Stack Frame 3

Figure 22 Recursive Factorial Stack Frame 2

Figure 23 Recursive Factorial Stack Frame 1
We can see that the stack frames are different for each call of the recursive factorial. At this point we have six frames in total: five of them belong to the recursive factorial function (n = 5 down to n = 1) and the remaining one is the main function stack frame. In each of the above pictures the stack frames are inside the red square. The last recursive call takes n = 0 as its argument; it briefly adds one more frame and returns 1 to the previous stack frame, which multiplies it by its own n. The same procedure is executed as when we were working with Visual Studio. The frames are popped one by one until we get back to the main function and assign the final value to the variable result. The following two screenshots show how the frames disappear as we keep stepping through our program:

Figure 24 Returning Stack Frames (Part 1)
Figure 24 Returning Stack Frames (Part 2)

As we can see, at the last step we are left with only one stack frame, the one that we started with, i.e. the main function stack frame. At every step the current frame returns its partial product, and the calling frame multiplies that value by its own n. As we said before, the LIFO property of the stack is preserved. Let's take one more step and check the state of the main function stack frame after we return from all the recursive calls. The following screenshot shows the result:
Figure 25 Stack Frame of Main Function after Returning

As we can see, we are back where we started, at the main function frame with the base pointer at memory address 0x7FFFFFFFDF40. The value of the variable result is stored at memory location 0x7FFFFFFFDF38 and is equal to 120. This means that the recursive factorial function has finished correctly and we have the value that we were looking for.
5. Analysis

To compare the performance of the recursive method with that of the iterative method we use the QueryPerformanceCounter function to measure the running time of each function. The following table shows the running time of each function for different values of n, i.e. the number whose factorial we want to calculate:

n            Iterative (seconds)      Recursive (seconds)
10           0.000345603542265217     0.000372550353110154
100          0.000376827624672842     0.000418317158830919
1,000        0.000399924891111359     0.000477771233552287
10,000       0.000400352618267628     0.000753227522189416
100,000      0.000478626687864825     0
1,000,000    0.000602667563182786     0

Table 1 Iterative vs Recursive Running Time

As we can see, the recursive method shows a running time of zero for the two largest values of n. No time was recorded for these runs because the stack does not have enough memory for the frames needed by 100,000 or more nested calls, and the run fails with the error: Process is terminated due to StackOverflowException. To visualize the performance of the two functions better, we plot the running times for each value of n. The following graph shows the running time versus the size of n:
Graph 2 Iterative vs Recursive Running Time (x-axis: n, from 10^1 to 10^6; y-axis: time in seconds, from 0 to 0.0008; series: Iterative, Recursive)

The plot shows that the recursive method is slower than the iterative method at every value of n for which both ran, and that the gap widens as n grows: at n = 10,000 the recursive call takes about 0.00075 seconds against about 0.00040 seconds for the loop. Beyond a certain value of n we cannot use the recursive method at all, because it results in a stack overflow that terminates our program immediately.
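A measurement of this kind can be sketched portably as shown below. Note the assumptions: this sketch swaps in std::chrono::steady_clock from the C++ standard library in place of QueryPerformanceCounter, and the names `factorial_iter`, `factorial_rec` and `time_call` are hypothetical, not the code used to produce Table 1. Unsigned 64-bit arithmetic is used so that the product simply wraps around for n above 20 instead of overflowing a signed int; for large n only the timing is meaningful, not the value.

```cpp
#include <chrono>
#include <cstdint>

// Iterative product; unsigned arithmetic wraps around instead of overflowing.
std::uint64_t factorial_iter(std::uint64_t n) {
    std::uint64_t product = 1;
    for (std::uint64_t i = 1; i <= n; ++i) {
        product *= i;
    }
    return product;
}

// Recursive product; each call adds a stack frame, so very large n may
// exhaust the stack, as observed in the measurements above.
std::uint64_t factorial_rec(std::uint64_t n) {
    return n == 0 ? 1 : n * factorial_rec(n - 1);
}

// Runs f(n) once, stores its result in result_out, and returns the elapsed
// wall-clock time in seconds measured with a monotonic clock.
template <typename F>
double time_call(F f, std::uint64_t n, std::uint64_t& result_out) {
    const auto start = std::chrono::steady_clock::now();
    result_out = f(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A call such as `time_call(factorial_iter, 10000, value)` returns the elapsed seconds for one run. A single run this short is dominated by timer resolution and noise, so repeating each measurement many times and averaging would give steadier numbers.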
6. Conclusion

In this laboratory we studied the behavior of the stack when working with recursive functions and compared it to that of iterative methods. The function chosen for the analysis was the factorial. We calculated the factorial of the integer 5. The iterative method created only one stack frame for the function and worked within that frame. The variables were stored in the same frame and were overwritten with each new value until the final result was reached. The recursive method, on the other hand, requires a new frame every time the routine is called. This can become very expensive as the number whose factorial is calculated gets bigger. Therefore we need to be careful about when to use recursive methods, because we might be better off using a simple iterative loop.
7. Appendix

main.cpp

    int factorial(int n);

    int main() {
        int n = 5;
        int result = factorial(n);  // expected value: 120
        return 0;
    }

factorial.cpp

    int factorial(int n) {
        int j = 1;                  // running product
        for (int i = 1; i <= n; i++)
            j *= i;
        return j;
    }

recfactorial.cpp

    int recfactorial(int n) {
        if (n == 0)
            return 1;                    // base case: 0! = 1
        return n * recfactorial(n - 1);  // recursive case
    }