AD HOC VS. PLANNED SOFTWARE MAINTENANCE

Warren Harrison
Portland State University
Portland, OR 97207-0751
warren@cs.pdx.edu

INTRODUCTION

In a series of papers, Belady and Lehman [Belady & Lehman, 1976] pioneered the study of the evolution of software. They observed that over time, programs exhibit increasing entropy. As a program evolves, its structure degrades and its size increases, resulting in increased complexity. This increase in entropy makes program maintenance increasingly difficult, and will ultimately result in the program dying and being replaced by another program, or undergoing a major and expensive overhaul.

When making a modification to a piece of production software, a maintenance programmer must give some thought to the impact the change will have on the entropy of the module.

AD HOC PATCHES AND PLANNED, STRUCTURE PRESERVING MODIFICATIONS

When changing a piece of code, a programmer may apply a "patch" that effects the desired change in program behavior, or alternatively restructure (perhaps even rewrite) the module in order to preserve the structure and maintainability of the code.

For instance, consider the somewhat contrived and simplified example in Figure 1. A programmer writes a program to count logical source statements in C, based on a simple count of semi-colons.

    char line_char;  // a character from the program
    int in_code=0;   // 0 if not in {...}, >0 otherwise
    int lss=0;       // count of logical source statements

    if(line_char=='{') in_code++;
    if(line_char=='}') in_code--;
    if((in_code > 0)&&(line_char==';')) lss++;

Figure 1.

However, shortly after putting the program into production, the programmer discovers that strings and character constants can include semi-colons and/or brackets. Obviously, the "logical source statement count" produced by this program is inaccurate. At this point, the programmer may choose to treat this as a "special case" and "patch" the code, for example by using a series of flags to indicate whether we are inside a string or character constant when a semi-colon or bracket is encountered, as shown in Figure 2.

    char line_char;  // a character from the program
    int in_code=0;   // >0 if in {...}, 0 otherwise
    int in_string=0; // >0 if in string, 0 otherwise
    int in_char=0;   // >0 if in char const, 0 otherwise
    int lss=0;       // count of logical source statements

    if((line_char=='"')&&(in_char==0)) {
        if(in_string==0) in_string++;
        else in_string--;
    }
    if((line_char=='\'')&&(in_string==0)) {
        if(in_char==0) in_char++;
        else in_char--;
    }
    if((in_string==0)&&(in_char==0)) {
        if(line_char=='{') in_code++;
        if(line_char=='}') in_code--;
    }
    if((in_code > 0)&&(in_string==0)&&(in_char==0))
        if(line_char==';') lss++;

Figure 2.

Alternatively, the programmer could have chosen to revise the design to accommodate this situation in a more general manner, by introducing a function that returns the "next" executable token, as shown in Figure 3.

    void get_token(char *token); // retrieve "next executable token"
    char token[token_len];       // a token from the program
    int lss=0;                   // count of logical source statements

    get_token(token);
    if(strcmp(token,";")==0) lss++;
    get_token(token);

Figure 3.

We can see that the ad hoc patch illustrated in Figure 2 can be made quite easily by simply adding a couple of flag variables and a test or two inside the code. The update can be made literally within a matter of minutes, and after a modest amount of testing to verify that the new version can indeed accommodate strings containing semi-colons and brackets, the user will be able to use the new version, probably within the same day.

On the other hand, the planned, structure preserving modification illustrated in Figure 3 will likely take some additional time to implement and retest, since it radically modifies the overall approach to the problem, from a character-by-character scan to a token-by-token scan. Obviously, the ad hoc approach solves the problem at hand, but should additional "special cases" come up, the superiority of the planned, structure preserving solution can be easily seen.

Undoubtedly, an ad hoc patch can be done more quickly and with less effort than a planned, structure preserving change. However, while a single "patch" may not significantly impact future modifications, after a series of patches the code may very well become unmaintainable. For instance, in Figure 4, the programmer is now responding to the fact that comments can also contain semi-colons and brackets.
    char line_char;   // a character from the program
    char last_char;   // previous character from program
    int in_code=0;    // >0 if in {...}, 0 otherwise
    int in_string=0;  // >0 if in string, 0 otherwise
    int in_char=0;    // >0 if in char const, 0 otherwise
    int in_comment=0; // >0 if in comment, 0 otherwise
    int lss=0;        // count of logical source statements

    if((line_char=='/')&&(last_char=='/')) in_comment++;
    if(line_char=='\n') in_comment=0;
    if((line_char=='"')&&(in_char==0)&&(in_comment==0)) {
        if(in_string==0) in_string++;
        else in_string--;
    }
    if((line_char=='\'')&&(in_string==0)&&(in_comment==0)) {
        if(in_char==0) in_char++;
        else in_char--;
    }
    if((in_string==0)&&(in_char==0)&&(in_comment==0)) {
        if(line_char=='{') in_code++;
        if(line_char=='}') in_code--;
    }
    if((in_code>0)&&(in_string==0)&&
       (in_char==0)&&(in_comment==0))
        if(line_char==';') lss++;
    last_char=line_char;

Figure 4.

The planned, structure preserving modification could be easily updated to also accommodate comments by simply redefining the meaning of an "executable token" within the get_token function, without even touching the program's main function. The subsequent differences in maintainability of the two approaches become more and more obvious as additional "special cases" are identified.

Of course, we have tried to illustrate the "spirit" of an ad hoc vs. planned, structure preserving change as opposed to presenting a rigorous, objective definition. Currently we know of no clear, objective test to distinguish between the two.

CHOOSING AN APPROACH TO MODIFYING SOFTWARE

Since every hour spent working on a given maintenance request means that there is one less hour available to accommodate other maintenance requests, the programmer needs to balance the effort expended in a modification with the preservation of structure and maintainability. A programmer may commit a "Type I Error", in which the decision is made to perform a planned, structure preserving change when in fact an ad hoc patch can be comfortably accommodated, or a "Type II Error", in which an ad hoc patch is applied when a planned, structure preserving change should have been performed instead.

The programmer must consider two aspects of the problem. First, if a module will not be modified again (or at least, not modified again in the near future), then a degradation in the structure of the module is unimportant. On the other hand, if a module will be modified again in the future, then significant degradation in the structure of the module may indeed cause problems. Secondly, if a patch is applied to a heretofore well-structured module, then even if it does not preserve the structuredness of the module, it is unlikely to make the module unmaintainable.
However, if a patch is made on top of a series of prior patches, the maintainability of the module may be seriously damaged.

Therefore, in order to support the maintenance programmer's decision problem, we need to be able to make two projections: (1) is this module likely to undergo further modification in the future? and (2) will the proposed modification render the module unmaintainable in the future?

Studies have shown that in a typical maintenance scenario, only a small percentage of the modules in a system are actually modified. For example, Harrison and Cook [Harrison & Cook, 1990] found that in the embedded flight control software of the AV-8 Attack Fighter/Bomber, out of 217 modules, only 102 modules were changed during a major four-year release cycle (115 maintenance requests), with an average of 72 lines of code (standard deviation of 153) changed per module. However, what was even more interesting was that of the 102 modules changed, twelve accounted for 60% of the maintenance activity, with the remaining 90 changed modules accounting for only 40% of the activity (and of course 115 modules receiving no maintenance activity at all).

It would appear that efforts put into preserving the maintainability of every module might very well be misdirected, since many modules will either not be modified, or if modified, changed only slightly. Of course, even if we are able to project that a given module is likely to be modified in the near future, we need to formulate a decision rule for when a proposed ad hoc patch will adversely impact maintainability.

CHANGE-PRONE MODULES

We consider a module "change-prone" if it is likely to be modified in the near future. There are two basic approaches to identifying a change-prone module.

First, the analytic approach can be used to determine if the module contains commonly modified functions. For instance, in the case of an income tax program, tax laws are expected to change on an annual basis, while the user interface, file access routines, etc. can be expected to remain constant. Clearly, modules containing tax law implementations can be expected to be highly change-prone. This approach depends heavily on the maintainer's semantic knowledge of the application area and overall software design.

A second approach to identifying change-prone modules is "historical". Modules which undergo frequent modification, especially corrective maintenance, are likely to experience future maintenance activities as well. By keeping track of how often a module undergoes maintenance, we may very well be able to predict the likelihood of future maintenance activities. Unlike the analytic approach, identifying change-prone modules in this manner is only a matter of collecting and analyzing the data. Little if any knowledge of the application area and/or design is necessary.

Interestingly enough, static code characteristics, such as various metric values, appear to have very little relationship with the likelihood of a module being maintained in the future (though there may be a relationship with how difficult maintenance, if there is any, will be to perform).
Identifying change-prone modules through an analysis of historical data would appear to be a ripe field for empirical investigation. In short, can we predict future maintenance activities for a given module as a function of past maintenance activities?

IMPACTS OF AD HOC PATCHES ON MAINTAINABILITY

The question of how an ad hoc patch affects maintainability has been around for some time. Typically, it is couched in terms of the maintainability of two different programs. In our case, the two programs in question are the modified versions of the code resulting from the ad hoc patch and from the planned, structure preserving modification (or at least projections as to what these modifications would look like). This problem has been well studied, but to date no definitive results have been obtained. However, this is still an important question and one which may perhaps yield to more careful and extensive empirical studies.

COST SAVINGS FROM AN AD HOC PATCH

This discussion has revolved around an assumption that an ad hoc patch can be performed more cheaply than a planned, structure preserving modification. In fact, we don't know for certain if such a situation exists. It might very well be that ad hoc patches will usually require additional patches to work correctly, and that a planned, structure preserving modification will in fact be less expensive to apply in most cases.

SUMMARY

Determining what approach to take in making modifications to production code is a problem faced by maintenance programmers on a daily basis. Ad hoc patches can be made to an application, or special care can be taken to ensure that the structure and maintainability of the code are preserved. We assume that one approach is more expensive (at least in the short run) than the other. Under what circumstances are ad hoc patches justified? What about the more expensive planned, structure preserving changes?

We have posed three general questions relating to this problem which can be addressed through the use of empirical studies. First, can we predict the change-proneness of a module based on its prior change history? Second, can we assess the relative degradation of maintainability imposed by an ad hoc patch? Finally, is our assumption that ad hoc patches are less expensive than planned, structure preserving modifications correct? It is easy to imagine an empirical study or set of studies that could be used to answer these questions.

REFERENCES

Belady, L.A. and M.M. Lehman, "A model of large program development", IBM Systems Journal, 15, 3 (1976), pp 225-252.

Harrison, W. and C. Cook, "Insights on Improving the Maintenance Process Through Software Measurement", Proceedings of the 1990 IEEE Conference on Software Maintenance (November 1990, San Diego CA).