Transforming Legacy Code: The Pitfalls of Automation

By William Calcagni and Robert Camacho
www.languageportability.com
866.731.9977

Code Transformation

Once the decision has been made to undertake an automated migration, a variety of options are available for how the code transformation takes place. The method used by each service provider may be slightly different but, essentially, there are two basic categories of code transformation: conversion to native types and operations on those types, and conversion using classes and methods that mimic the behavior of COBOL types and verbs. Which of these code transformation techniques a service provider uses can have a profound impact on the cost, complexity, future maintainability and, ultimately, the success of the migration.

Conversion to Native Types

Under this option a service provider chooses to convert all COBOL data items to native types. Typically a table of correspondences is developed based on the closest native type in the target language, and an automated mapping is built into the migration tool. A typical example would see COBOL alphanumeric items mapped to strings, COBOL numeric items mapped to int, float, decimal, etc., and COBOL 88 level items mapped to booleans.

While this may seem a conceptually solid and appealing choice for data transformation, there are a number of pitfalls that can cause severe problems in the migrated applications. Some of these potential pitfalls include:

1. Poor correspondence between the COBOL data type and the native type
2. Lack of a native type equivalent to the COBOL data type
3. Differences in implicit operation between the COBOL data type and the native type

Each of these pitfalls must be dealt with in one way or another, and how each is dealt with can have important consequences for the overall migration. These consequences can include significant manual effort to complete the migration, as well as numerous difficult-to-find errors injected into the migrated code that cause maintenance issues long after the migration is completed.
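As a rough illustration of the table of correspondences described above, the sketch below maps COBOL item descriptions to native types. It is written in Python purely for brevity; a table aimed at C# would look much the same. The names NATIVE_TYPES, classify and native_type_for are hypothetical, and the PICTURE matching is deliberately simplistic rather than a description of any real migration tool.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical table of correspondences: COBOL item category -> closest native type.
NATIVE_TYPES = {
    "alphanumeric": str,
    "integer": int,
    "decimal": Decimal,
    "condition": bool,
}


def classify(level: int, picture: str = "") -> Optional[str]:
    """Return the category of a COBOL item, or None when no native type fits."""
    if level == 88:
        return "condition"
    pic = picture.upper()
    if re.fullmatch(r"[XA0-9()]*[XA][XA0-9()]*", pic):
        return "alphanumeric"
    if re.fullmatch(r"S?[0-9()]+V[0-9()]+", pic):
        return "decimal"
    if re.fullmatch(r"S?[0-9()]+", pic):
        return "integer"
    return None  # for example, an edited picture such as $ZZZ,ZZ9.99-


def native_type_for(level: int, picture: str = "") -> Optional[type]:
    """Look up the native type for an item, or None if it must be handled by hand."""
    category = classify(level, picture)
    return NATIVE_TYPES.get(category) if category else None
```

In this sketch PIC 999 and PIC 9(9) both land on the same native integer type, so the declared size is lost, and an edited picture has no entry at all.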
We will now take a look at each of these pitfalls.

Poor Correspondence

One of the main problems with choosing to migrate COBOL data types to native types is that there is generally a poor correspondence between the two. COBOL data types were designed at a time when the use of computers by business was in its infancy and most applications were accounting functions that relied on batch updates and printed reports. As such, COBOL data types are heavily geared towards titles, financial computations and report presentations. Over time, as COBOL applications grew in complexity and came to include online as well as batch functions, additional COBOL data types were added, but the basic nature of data in COBOL remains. By contrast, native types in a language like C# were designed in the modern era of object oriented design and GUI or web based software. Languages like C# are strongly typed and therefore data types are not as interchangeable as they are in COBOL.

An example of this type of problem is the COBOL conditional. At first look a COBOL 88 level conditional seems like a direct correspondence to a boolean item in C#. However, a COBOL 88 level can be defined on symbolic values such as "Y" or "NONE", or on ranges of values; the condition is true whenever the underlying item holds one of those values. A boolean item is simply true or false and would have to be set explicitly from the values or ranges in the condition. Because a COBOL conditional has one name for each value or range and a different name for the variable that is actually set, it does not transform directly to a boolean item. For example:

    01  COST-CATEGORY       PIC 9999.
        88  LOW-COST        VALUE 5 THRU 25.
        88  MEDIUM-COST     VALUE 26 THRU 300.
        88  HIGH-COST       VALUE 301 THRU 1500.

    MOVE 175 TO COST-CATEGORY.
    IF LOW-COST
    IF MEDIUM-COST

These operations have no direct equivalent in native C# types and require substantial code modification to replicate the functionality of COBOL. This makes the resulting C# more difficult to maintain and can lead to code bloat: the phenomenon whereby a 5,000 line COBOL program turns into a 20,000 line C# program.

Lack of Equivalents

Several COBOL data constructs lack any equivalent type in C#. For example, COBOL contains a type of data item known as numeric edited. These items format numbers using character insertion rules, substitution rules and zero or space suppression rules.
They cannot be directly represented by native types.
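To give a sense of what these editing rules involve, the sketch below hand-codes a single picture, $ZZZ,ZZ9.99-, in Python (standing in for a native-type language such as C#). The function name edit_total is hypothetical, and the sketch assumes the usual reading of that picture: a fixed currency sign, zero suppression with spaces, and a trailing minus sign for negative values. Every other picture would need its own variant of this logic.

```python
from decimal import Decimal


def edit_total(value: Decimal) -> str:
    """Hand-coded equivalent of moving a number to PIC $ZZZ,ZZ9.99-."""
    # Keep six integer and two decimal digits, truncating anything beyond.
    cents = int(abs(value) * 100) % 10**8
    negative = value < 0 and cents != 0
    whole, frac = divmod(cents, 100)
    digits = f"{whole:06d}"
    chars = list(digits[:3] + "," + digits[3:])
    # Zero suppression: blank leading zeros (and a comma that no digit
    # precedes) in every position except the last, which is a 9 position.
    for i in range(len(chars) - 1):
        if chars[i] not in "0,":
            break
        chars[i] = " "
    sign = "-" if negative else " "
    return "$" + "".join(chars) + f".{frac:02d}" + sign
```

Under these assumptions, edit_total(Decimal("1234.56")) produces a 12 character field with the digits right-aligned behind the currency sign, which a plain native decimal or string type does not do on its own.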

Consider the following data descriptions:

    01  ACCT-DATA.
        05  ACCT-NUMBER         PIC 9(9).
        05  ACCT-TOTAL          PIC S9(6)V99 COMP-3.
    01  TOTAL-LINE.
        05  TOTAL-DESCRIPTION   PIC X(27) VALUE 'TOTALS FOR ACCOUNT NUMBER:'.
        05  TOTAL-ACCT-NO       PIC 999B99B9999.
        05  FILLER              PIC X(10) VALUE SPACES.
        05  TOTAL-ACCT-TOTAL    PIC $ZZZ,ZZ9.99-.

The data elements TOTAL-ACCT-NO and TOTAL-ACCT-TOTAL are numeric edited items that cannot be translated directly into C# native types.

Likewise, COBOL pointers at first look seem equivalent to references in C#. However, because COBOL has the capability to look at the same area of memory in two different ways, it is possible to treat a pointer as a binary number, perform arithmetic on it, and then use it as a pointer again. That is not possible with references. For example:

    01  INDEX-BASE   PIC S9(9) BINARY.
    01  INDEX-PTR    USAGE POINTER REDEFINES INDEX-BASE.
    01  DATA-TABLE   PIC X OCCURS 20.

    SET INDEX-PTR TO ADDRESS OF DATA-TABLE.
    ADD 2 TO INDEX-BASE.
    IF INDEX-PTR EQUAL Q

The above cannot be represented by C# reference variables.

Differences in Implicit Operation

In COBOL, each type of data item has a specific length associated with it. Thus it is possible to describe an alphanumeric data item that will contain a maximum of 8 characters, or a decimal number field that will contain a maximum of 4 significant digits to the left of the decimal point and 2 to the right. Any actual data that exceeds those limits will be truncated according to the rules of COBOL, and any data shorter than those limits will be aligned and filled according to those same rules. Native types either have no such limitations or operate with different rules.
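The two fields just described, an 8 character alphanumeric item and an unsigned decimal item with 4 integer and 2 decimal digits, can be sketched as follows. This is a minimal Python illustration (standing in for a native-type language) of the default MOVE behavior only; the function names are hypothetical, and clauses such as JUSTIFIED or ROUNDED are not modeled.

```python
from decimal import Decimal


def move_alphanumeric(value: str, length: int) -> str:
    """Emulate a MOVE to PIC X(length): left-justify, space-fill, truncate on the right."""
    return value[:length].ljust(length)


def move_unsigned_decimal(value: Decimal, int_digits: int, frac_digits: int) -> Decimal:
    """Emulate a MOVE to an unsigned PIC 9(int_digits)V9(frac_digits) item."""
    # Align on the decimal point and drop excess low-order digits (no rounding).
    scaled = int(abs(value) * 10**frac_digits)
    # Drop excess high-order digits.
    scaled %= 10 ** (int_digits + frac_digits)
    return Decimal(scaled).scaleb(-frac_digits)
```

A native string or decimal assigned the same source values would simply keep them unchanged, so none of this truncation or filling happens unless code like the above is added at every assignment.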

Consider a numeric example:

    77  ITEM-1   PIC S9999 VALUE -1234.
    77  ITEM-2   PIC 999.

    MOVE ITEM-1 TO ITEM-2.

In COBOL, ITEM-2 will contain the value 234 after this move: the high-order digit is truncated and, because ITEM-2 is unsigned, the sign is dropped. Translating this example to native C# ints and a simple assignment would not produce the same result.

Likewise, C# has no native mechanism for replicating the implicit operation of numeric edited items, nor does it have a mechanism for implicit type conversion of COBOL types. Using the data descriptions from the previous numeric edited example, consider the following:

    MOVE ACCT-NUMBER TO TOTAL-ACCT-NO.
    MOVE ACCT-TOTAL TO TOTAL-ACCT-TOTAL.

In COBOL this will result in an implicit conversion of ACCT-TOTAL from COBOL packed decimal to display format, as well as the application of the numeric editing formats to the respective data elements. No such facility exists in C#.

Why does data type selection matter?

Experience has shown that in code transformations, the accuracy and maintainability of the migrated system are critically dependent on the way that the basic data structures and memory mapping in the source language are replicated in the target language. Because of the myriad possible interactions between data elements in a program, it is not possible to test every combination of source statements that might be used in conjunction with any given data element. Therefore it is vital that the data in the migrated system behave exactly as the data in the original system did. The further the data structures and memory mapping in the target language vary from those of the original source language, the more errors will be introduced into the converted code and the more difficult it will be to maintain.

The reason for this lies in the way that data and program logic interact. In writing program logic, a developer chooses language constructs based on how they interact with the data elements defined for the application. When the data types of those elements no longer behave in the same way as the original data types, the transformed program logic has to be adjusted to try to accommodate those differences. For example, a COBOL developer knows that if a computation produces a 4 digit number and the result is stored in a 3 digit item, truncation will occur because of the way that COBOL enforces size rules (as illustrated in a previous example). The same is not true with native types in C#. As long as the result of the computation is within the range of possible values for the data type, the computation will not truncate.

In some cases, data structures cannot be mapped at all to native types. Consider COBOL group items that are redefined with a REDEFINES clause and contain elementary items with OCCURS clauses. These items can be referenced by the group name, the occurring item name, the redefining group name, or the elementary item in the redefining group whose position corresponds to the original occurring item. This type of structure cannot be replicated accurately using native types. Therefore, some type of logic modification will be necessary to achieve the desired result. Once manual logic modification occurs, the migration moves further up the cost/risk curve, since manual coding is inherently more error prone than automated code transformation.

What are the implications of data type selection?

If a code transformation attempts to convert all COBOL items to native types, one of two things will happen: either the transformation will be only partially automated, or an attempt will be made to force the native types to behave like COBOL types through code additions or modifications.

In the case of a partially automated transformation the most obvious and direct mappings are done automatically while the more problematic mappings are flagged for manual intervention.
This can significantly increase the amount of time associated with the migration because of the additional personnel needed to carry out the manual intervention. Since these resources have a cost associated with them, this ultimately increases the cost of the migration, often by a significant amount.

In the case of forcing behavior through additional or modified code, the potential for introducing data dependent errors increases significantly. This is because it is virtually impossible to ensure that native types can be coerced into behaving like COBOL types under all possible conditions that could be encountered in the application. Because these are often data dependent errors, they may or may not be found during system testing, resulting in potential future maintenance problems.

Moreover, the additional coding is often done by a team made up of numerous different individuals. Each developer usually has his or her own coding style, slightly different from that of other developers. When numerous manual changes are required, the same logical function in COBOL is often recoded slightly differently by each member of the migration team. This makes the resulting migrated code more difficult to maintain. For example, a table lookup in COBOL that occurs in many places could end up being coded in several different ways, whereas before there was only one coding technique.

The alternative approach

The alternative approach is to recognize that the only successful way to migrate a system from one language to another is to develop a set of classes that can be extensively tested to ensure that they accurately and consistently map the critical data structures of COBOL to C#. These classes will ensure that data types behave in C# exactly as they did in COBOL. This is the key to ensuring the accuracy and reliability of an automated code transformation: the new system works exactly like the original system because the data behaves in exactly the same way. Doing this requires developing a set of classes and methods that enforce the critical rules of COBOL in the C# application environment.

Proponents of transforming all COBOL data types to native types point out that this allows the most complete transformation of a COBOL system to the closest approximation of a native object oriented system. However, the price of this is a higher cost, more manual intervention and a greater potential for maintenance problems long into the future.

By the same token, great care must be taken in designing the classes and methods that will enforce the critical rules of COBOL to ensure that they are efficient and consistent, to the maximum extent possible, with normal object oriented design standards. Otherwise, a program can suffer from performance issues or end up looking like COBOL written in C#.
The reward for success, however, is that after automated transformation the application will function exactly as the original COBOL system did while taking on the character of an object oriented application in C# that will be understandable and maintainable by C# developers without an extensive COBOL background. It is our view that this represents the most effective compromise that meets the goals of accuracy, reliability, maximum automation, minimum cost and future maintainability.
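To make the class-based idea concrete, here is a minimal sketch of two such classes, one for an unsigned numeric item and one for an 88 level condition name. It is written in Python as a stand-in for C#, the class names CobolUnsigned and ConditionName are hypothetical, and it covers only the MOVE and range-test behavior from the earlier examples. It is an illustration of the approach under those assumptions, not the class library of any particular migration product.

```python
class CobolUnsigned:
    """Hypothetical stand-in for an unsigned PIC 9(n) item."""

    def __init__(self, digits: int, value: int = 0) -> None:
        self.digits = digits
        self._value = 0
        self.move(value)

    def move(self, source) -> None:
        """Emulate MOVE source TO this item: drop the sign and excess high-order digits."""
        self._value = abs(int(source)) % 10**self.digits

    def __int__(self) -> int:
        return self._value


class ConditionName:
    """Hypothetical stand-in for an 88 level: true when the item is in any listed range."""

    def __init__(self, item: CobolUnsigned, *ranges: tuple) -> None:
        self.item = item
        self.ranges = ranges

    def __bool__(self) -> bool:
        value = int(self.item)
        return any(low <= value <= high for low, high in self.ranges)


# The earlier examples, restated against these classes.
item_2 = CobolUnsigned(3)
item_2.move(-1234)  # same effect as MOVE ITEM-1 TO ITEM-2

cost_category = CobolUnsigned(4)
low_cost = ConditionName(cost_category, (5, 25))
medium_cost = ConditionName(cost_category, (26, 300))
cost_category.move(175)  # low_cost is now false, medium_cost true
```

Because the size and sign rules live in one tested class rather than in code added at each assignment, every converted MOVE behaves the same way, which is the consistency argument made above.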