Transforming Legacy Code: The Pitfalls of Automation

By William Calcagni and Robert Camacho
www.languageportability.com 866.731.9977

Code Transformation

Once the decision has been made to undertake an automated migration, a variety of options are available for how the code transformation takes place. The method used by each service provider may differ slightly but, essentially, there are two basic categories of code transformation: conversion to native types and operations on those types, and conversion using classes and methods that mimic the behavior of COBOL types and verbs. Which of these techniques a service provider uses can have a profound impact on the cost, complexity, future maintainability and, ultimately, the success of the migration.

Conversion to Native Types

Under this option a service provider chooses to convert all COBOL data items to native types. Typically a table of correspondences is developed based on the closest native type in the target language, and an automated mapping is built into the migration tool. A typical example would see COBOL alphanumeric items mapped to strings, COBOL numeric items mapped to int, float, decimal, etc., and COBOL 88 level items mapped to booleans.

While this may seem a conceptually solid and appealing choice for data transformation, there are a number of pitfalls that can cause severe problems in the migrated applications. These include:

1. Poor correspondence between the COBOL data type and the native type
2. Lack of a native type equivalent to the COBOL data type
3. Differences in implicit operation between the COBOL data type and the native type

Each of these pitfalls must be dealt with in some manner, and how each is dealt with can have important consequences for the overall migration. These can include significant manual effort to complete the migration, as well as the injection of numerous difficult-to-find errors into the migrated code that will cause maintenance issues long after the migration is completed.
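
The table of correspondences described above can be pictured as a simple lookup. The following is a minimal sketch in Python, used here only as a neutral illustration language; the names CORRESPONDENCES, classify and native_type_for are hypothetical, and the PICTURE patterns are deliberately simplified. It is not the method of any particular migration tool. It shows that items with no entry in the table are the ones left over for manual handling.

```python
import re
from typing import Optional

# Hypothetical table of correspondences: COBOL item category -> native type name.
CORRESPONDENCES = {
    "alphanumeric": "string",
    "integer": "int",
    "fixed-point": "decimal",
}

# Simplified PICTURE patterns (assumption: real PICTURE strings allow more symbols).
_REPEAT = r"(\(\d+\))?"
_PATTERNS = [
    ("alphanumeric", re.compile(rf"([XA]{_REPEAT})+")),
    ("integer", re.compile(rf"S?(9{_REPEAT})+")),
    ("fixed-point", re.compile(rf"S?(9{_REPEAT})*V(9{_REPEAT})+")),
]


def classify(picture: str) -> str:
    """Return a coarse category for a PICTURE string, or "other" if none fits."""
    for name, pattern in _PATTERNS:
        if pattern.fullmatch(picture):
            return name
    return "other"


def native_type_for(picture: str) -> Optional[str]:
    """Look up the closest native type; None means no correspondence exists."""
    return CORRESPONDENCES.get(classify(picture))


if __name__ == "__main__":
    for pic in ["X(27)", "9(9)", "S9(6)V99", "999B99B9999", "$ZZZ,ZZ9.99-"]:
        print(pic, "->", native_type_for(pic))
```

In this sketch the two numeric edited pictures from the examples later in this paper fall through to None, which is the situation the second pitfall describes.
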
We will now take a look at each of these pitfalls.

Poor Correspondence

One of the main problems with choosing to migrate COBOL data types to native types is that there is generally a poor correspondence between the two. COBOL data types were designed at a time when the use of computers by business was in its infancy and most applications were accounting functions that relied on batch updates and printed reports. As such, COBOL data types are heavily geared towards titles, financial computations and report presentations. Over time, as COBOL applications grew in complexity and came to include online as well as batch functions, additional COBOL data types were added, but the basic nature of data in COBOL remains.

By contrast, native types in a language like C# were designed in the modern era of object oriented design and GUI or web based software. Languages like C# are strongly typed, and therefore data types are not as interchangeable as they are in COBOL.

An example of this type of problem is the COBOL conditional. At first look a COBOL 88 level conditional seems to correspond directly to a boolean item in C#. However, COBOL 88 levels can have symbolic values such as "Y" or "NONE", or ranges of values, any of which indicates a true condition. Boolean items are either true or false and must be set based on the values or ranges in a condition. Since COBOL conditionals have a name for each value or range and a different name for the variable that is set, the transformation to a boolean item is not direct. For example:

    01  COST-CATEGORY        PIC 9999.
        88  LOW-COST         VALUE 5 THRU 25.
        88  MEDIUM-COST      VALUE 26 THRU 300.
        88  HIGH-COST        VALUE 301 THRU 1500.
    MOVE 175 TO COST-CATEGORY.
    IF LOW-COST ...
    IF MEDIUM-COST ...

These operations have no direct equivalent in native C# types and require substantial code modification to replicate the functionality of COBOL. This makes the resulting C# more difficult to maintain and can lead to code bloat, the phenomenon whereby a 5,000 line COBOL program turns into a 20,000 line C# program.

Lack of Equivalents

Several COBOL data constructs lack any equivalent type in C#. For example, COBOL contains a type of data item known as numeric edited. These items are used to format numbers using character insertion rules, substitution rules and zero or space suppression rules. This cannot be directly represented by native types.

For example:

    01  ACCT-DATA.
        05  ACCT-NUMBER         PIC 9(9).
        05  ACCT-TOTAL          PIC S9(6)V99 COMP-3.
    01  TOTAL-LINE.
        05  TOTAL-DESCRIPTION   PIC X(27) VALUE "TOTALS FOR ACCOUNT NUMBER:".
        05  TOTAL-ACCT-NO       PIC 999B99B9999.
        05  FILLER              PIC X(10) VALUE SPACES.
        05  TOTAL-ACCT-TOTAL    PIC $ZZZ,ZZ9.99-.

The data elements TOTAL-ACCT-NO and TOTAL-ACCT-TOTAL are numeric edited items that cannot be translated directly into C# native types.

Likewise, COBOL pointers at first look seem equivalent to references in C#. However, because COBOL can view the same area of memory in two different ways, it is possible to treat a pointer as a binary number, perform arithmetic on it, and then use it as a pointer again. That is not possible with references. For example:

    01  INDEX-BASE    PIC S9(9) BINARY.
    01  INDEX-PTR     REDEFINES INDEX-BASE USAGE POINTER.
    01  DATA-TABLE    PIC X OCCURS 20.
    SET INDEX-PTR TO ADDRESS OF DATA-TABLE.
    ADD 2 TO INDEX-BASE.
    IF INDEX-PTR EQUAL Q

The above cannot be represented by C# reference variables.

Differences in Implicit Operation

In COBOL, each type of data item has a specific length associated with it. Thus it is possible to describe an alphanumeric data item that will contain a maximum of 8 characters, or a decimal number field that will contain a maximum of 4 significant digits to the left of the decimal point and 2 to the right. Any actual data that exceeds those limits will be truncated according to the rules of COBOL, and any data shorter than those limits will be aligned and filled according to those same rules. Native types either have no such limitations or operate with different rules.
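
The alphanumeric half of that rule can be sketched briefly. The following Python fragment is an illustration only; move_alphanumeric is a hypothetical helper, and it assumes the default COBOL behavior for an alphanumeric receiving item with no JUSTIFIED clause: data is left-aligned, padded with spaces on the right, and truncated on the right.

```python
def move_alphanumeric(value: str, length: int) -> str:
    """Mimic MOVE into a PIC X(length) item: left-align, space-fill on the
    right, and drop any characters beyond the declared length."""
    return value[:length].ljust(length)


if __name__ == "__main__":
    # A native string simply keeps whatever it is given.
    native = "ABCDEFGHIJ"
    print(len(native))

    # An 8-character field enforces its declared size in both directions.
    print(repr(move_alphanumeric("ABCDEFGHIJ", 8)))
    print(repr(move_alphanumeric("ABC", 8)))
```

A plain string assignment in the target language carries none of this behavior, so every such assignment would need equivalent logic added somewhere.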

For example, consider the following numeric items:

    77  ITEM-1    PIC S9999 VALUE -1234.
    77  ITEM-2    PIC 999.
    MOVE ITEM-1 TO ITEM-2.

In COBOL, ITEM-2 will contain the value 234 after completion of this assignment. Translating this example to native C# ints and a simple assignment would not produce the same result.

Likewise, C# has no native mechanism for replicating the implicit operation of numeric edited items, nor does it have a mechanism for implicit type conversion of COBOL types. Using the data descriptions from the previous numeric edited example, consider the following:

    MOVE ACCT-NUMBER TO TOTAL-ACCT-NO.
    MOVE ACCT-TOTAL TO TOTAL-ACCT-TOTAL.

In COBOL this will result in an implicit conversion of ACCT-TOTAL from COBOL packed decimal to display format, as well as the application of the numeric editing formats to the respective data elements. No such facility exists in C#.

Why does data type selection matter?

Experience has shown that in code transformations, the accuracy and maintainability of the migrated system is critically dependent on the way that the basic data structures and memory mapping in the source language are replicated in the target language. Because of the myriad possible interactions between data elements in a program, it is not possible to test every combination of source statements that might be used in conjunction with any given data element. Therefore it is vital that the data in the migrated system behave exactly as the data in the original system did. The further the data structures and memory mapping in the target language vary from those of the original source language, the more errors will be introduced into the converted code and the more difficult it will be to maintain.

The reason for this lies in the way that data and program logic interact. In writing program logic, a developer chooses language constructs based on how they interact with the data elements defined for the application. When the data types of those elements no longer behave in the same way as the original data types, the transformed program logic has to be adjusted to try to accommodate those differences. For example, in COBOL a developer knows that if a computation produces a 4 digit number and the result is stored in a 3 digit item, truncation will occur because of the way that COBOL enforces size rules (as illustrated in a previous example). The same is not true of native types in C#. As long as the result of the computation is within the range of possible values for the data type, the computation will not truncate.

In some cases, data structures cannot be mapped at all to native types. Consider COBOL group items that are REDEFINED and contain elementary items with OCCURS clauses. These items can be referenced by the group name, the occurring item name, the REDEFINES group name or the elementary item in the REDEFINES group whose position corresponds to the original occurring item. This type of structure cannot be replicated accurately using native types. Therefore, some type of logic modification will be necessary to achieve the desired result. Once manual logic modification occurs, the migration moves further up the cost/risk curve, since manual coding is inherently more error prone than automated code transformation.

What are the implications of data type selection?

If a code transformation attempts to convert all COBOL items to native types, one of two things will happen: either the transformation will be only partially automated, or an attempt will be made to force the native types to behave like COBOL types through code additions or modifications. In the case of a partially automated transformation, the most obvious and direct mappings are done automatically while the more problematic mappings are flagged for manual intervention.
This can significantly increase the amount of time associated with the migration due to the additional personnel needed to carry out the manual intervention. Since these resources have a cost associated with them, this ultimately increases the cost of the migration, often by a significant amount.

In the case of forcing behavior through additional or modified code, the potential for introducing data dependent errors increases significantly. This is because it is virtually impossible to ensure that native types can be coerced into behaving like COBOL types under all possible conditions that could be encountered in the application. Because these errors are often data dependent, they may not be found during system testing, resulting in potential future maintenance problems.

Moreover, the additional coding is often done by a team made up of many individuals, each with his or her own coding style. When numerous manual changes are required, the same logical function in COBOL is often recoded slightly differently by each member of the migration team, which makes the migrated code more difficult to maintain. For example, a table lookup in COBOL that occurs in many places could end up being coded in several different ways, whereas before there was only one coding technique.

The alternative approach

The alternative approach is to recognize that the only successful way to migrate a system from one language to another is to develop a set of classes that can be extensively tested to ensure that they accurately and consistently map the critical data structures of COBOL to C#. These classes will ensure that data types behave in C# exactly as they did in COBOL. This is the key to ensuring the accuracy and reliability of an automated code transformation: the new system works exactly like the original system because the data behaves in exactly the same way. Doing this requires developing a set of classes and methods that enforce the critical rules of COBOL in the C# application environment.

Proponents of transforming all COBOL data types to native types point out that this allows the most complete transformation of a COBOL system to the closest approximation of a native object oriented system. However, the price of this is a higher cost, more manual intervention and a greater potential for maintenance problems long into the future. By the same token, great care must be taken in designing the classes and methods that will enforce the critical rules of COBOL to ensure that they are efficient and consistent, to the maximum extent possible, with normal object oriented design standards. Otherwise, a program can suffer from performance issues or end up looking like COBOL written in C#.
The reward for success, however, is that after automated transformation the application will function exactly as the original COBOL system did while taking on the character of an object oriented application in C# that will be understandable and maintainable by C# developers without an extensive COBOL background. It is our view that this represents the most effective compromise that meets the goals of accuracy, reliability, maximum automation, minimum cost and future maintainability.
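
To make the idea of such classes concrete, here is a minimal sketch in Python, used only as a neutral illustration language. The class NumericField and its methods are hypothetical and are not taken from any actual migration library. It assumes an integer item stored in display format and models only two rules discussed earlier: the size and sign rules applied by MOVE, and 88 level condition names defined as value ranges.

```python
class NumericField:
    """Hypothetical stand-in for an integer COBOL item such as PIC S9999 or PIC 999."""

    def __init__(self, digits: int, signed: bool = False, value: int = 0):
        self.digits = digits
        self.signed = signed
        self.conditions = {}  # condition name -> (low, high), like 88 levels
        self.value = 0
        self.move(value)

    def move(self, source: int) -> None:
        """Store a value the way MOVE would: high-order digits that do not fit
        are dropped, and the sign is dropped if this item is unsigned."""
        magnitude = abs(source) % (10 ** self.digits)
        self.value = -magnitude if (self.signed and source < 0) else magnitude

    def add_condition(self, name: str, low: int, high: int) -> None:
        """Register a condition name that is true for low THRU high."""
        self.conditions[name] = (low, high)

    def is_condition(self, name: str) -> bool:
        """Evaluate a condition name against the current value."""
        low, high = self.conditions[name]
        return low <= self.value <= high


if __name__ == "__main__":
    # MOVE ITEM-1 TO ITEM-2 from the earlier example.
    item_1 = NumericField(4, signed=True, value=-1234)
    item_2 = NumericField(3)
    item_2.move(item_1.value)
    print(item_2.value)

    # COST-CATEGORY with its 88 level ranges.
    cost_category = NumericField(4)
    cost_category.add_condition("LOW-COST", 5, 25)
    cost_category.add_condition("MEDIUM-COST", 26, 300)
    cost_category.add_condition("HIGH-COST", 301, 1500)
    cost_category.move(175)
    print(cost_category.is_condition("MEDIUM-COST"))
```

Because the size rule lives in one tested class rather than in each assignment, every translated MOVE can call the same method. A production class would also need decimal scaling, the other USAGE formats and numeric editing, which this sketch leaves out.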