
Reverse Engineering with a CASE Tool

Bret Johnson
Research advisors: Spencer Rugaber and Rich LeBlanc
October 6, 1994

Abstract

We examine using a CASE tool, Interactive Development Environments' Software through Pictures (StP), to support reverse engineering. We generate structure charts in StP from the automated analysis of C source code. This approach has two advantages. First, the CASE tool's support for drawing, linking, and modifying pictorial notations for program design makes it easier to construct a reverse engineering tool. Second, the resulting design representations can then be used within the CASE tool to do reengineering for maintenance.

1 Introduction

Reverse engineering, in the context of this paper, is the act of taking the source code for a program and extracting from it information about the design of the program. Reverse engineering is frequently required when an existing program needs to be rewritten or modified. Unfortunately, all too often such programs have been written with little attention to the principles of software engineering, such as providing adequate internal and external documentation.

Computer Aided Software Engineering (CASE) tools are software tools that are intended to support forward engineering: the transformation from requirements to design to source code that occurs when writing new software. CASE tools frequently allow the software engineer to represent the design of his program pictorially, through notations such as structure charts.

This work examines using a CASE tool intended for forward engineering, Interactive Development Environments' Software through Pictures (StP), for reverse engineering. StP supports many different pictorial notations for representing a program's design at a level more abstract than source code.

These notations include entity relationship diagrams, structure charts, data structure diagrams, and data flow and control flow diagrams. StP is typical in this respect: most other CASE tools also support these notations.

One typically thinks of the design information in these notations as coming from the software engineer; he takes ideas in his head about how the program should be designed and from these ideas draws the diagrams that represent the program design. However, these diagrams could also be generated from source code analysis of an existing program. This analysis could be completely automated, or partially automated with some user interaction. Given the source code for some program, one could therefore construct a pictorial representation of the program's design, and that representation could be viewed and modified using StP. In this way StP could be used to support reverse engineering.

In our work we generate structure charts from analysis of C programs. The analysis is mostly automated, though the reverse engineer can control the analysis in a few different ways. Section 2 describes the details of our reverse engineering tool. Section 3 explains how to run the tool here at Georgia Tech; it tells where the required files are located and exactly how they are used. Section 4 examines possibilities for future research, and Section 5 gives concluding remarks.

2 The Reverse Engineering Tool

Our tool generates StP structure charts by analyzing C source code. The structure charts generated by our tool are basically call trees: they show which functions call which other functions. The structure chart notation supports including other information, such as whether a function is called from inside a loop or a conditional statement, or what the function's parameters are. Our tool, however, does not currently generate structure charts containing these additional notations.
Our tool works in four phases. It analyzes C source code using NewYacc, extracts the important parts of this analysis using Awk, generates a structure chart using a layout program written in C, and allows the reverse engineer to interact with the structure chart using StP. To illustrate the phases, we show the intermediate text files that each phase generates when analyzing the simple C program in Figure 1.

    int a, b;

    main()
    {
      init();
      process();
    }


    init()
    {
      init_a();
      init_b();
    }


    init_a()
    {
      a = 1;
    }


    init_b()
    {
      b = 2;
    }


    process()
    {
        int i;


      for (i = 0; i < a; ++i)
        printf("b == %d\n", b);
    }

Figure 1: A Sample C Program to be Reverse Engineered

The first phase uses NewYacc [2], an enhanced version of the parser generator Yacc [1]. NewYacc contains a number of features, detailed in [2], which make it easy to write a source code analyzer. We augmented the C grammar provided with NewYacc to output information concerning the definition of and references to functions, variables, and labels. Only about 2 lines of the C grammar file needed to be modified and a few auxiliary functions written. The output of this NewYacc analysis on our sample program is shown in Figure 2.

    DCL:example.c::<1,5>-<1,5>:a
    DCL:example.c::<1,8>-<1,8>:b
    FDC:example.c::<3,1>-<3,4>:main
    FRF:example.c:2:<5,3>-<5,6>:init
    FRF:example.c:2:<6,3>-<6,9>:process
    FDC:example.c::<10,1>-<10,4>:init
    FRF:example.c:2:<12,3>-<12,8>:init_a
    FRF:example.c:2:<13,3>-<13,8>:init_b
    FDC:example.c::<17,1>-<17,6>:init_a
    DEF:example.c:2:<19,3>-<19,3>:a
    FDC:example.c::<23,1>-<23,6>:init_b
    DEF:example.c:2:<25,3>-<25,3>:b
    FDC:example.c::<29,1>-<29,7>:process
    DCL:example.c:2:<31,9>-<31,9>:i
    DEF:example.c:2:<34,8>-<34,8>:i
    REF:example.c:2:<34,15>-<34,15>:i
    REF:example.c:2:<34,19>-<34,19>:a
    REF:example.c:2:<34,24>-<34,24>:i
    FRF:example.c:2:<35,5>-<35,10>:printf
    REF:example.c:2:<35,25>-<35,25>:b

Figure 2: Output of NewYacc Phase

In the output, "DCL" means a variable declaration, "FDC" means a function definition, "FRF" means a function reference (function call), "REF" means a variable reference, and "DEF" means a variable definition (assignment to a variable). Each record also gives the file name, the level of nesting of the identifier's scope, and the line and column numbers where the identifier begins and ends.

The second phase uses Awk [3], a programming language especially good for parsing simple text file formats and scanning for patterns. Our Awk program extracts from the output of the NewYacc phase just the information needed to generate structure charts. It is only three lines long. It simply picks out the lines of the NewYacc output representing function definitions and function references (function calls) and outputs the name of the function together with a "D" or an "R" indicating whether this is a definition or a reference. The output of this phase for our continuing example is shown in Figure 3. Note that all of the "R" functions appearing after a "D" function are functions called by that "D" function.

    D main
    R init
    R process
    D init
    R init_a
    R init_b
    D init_a
    D init_b
    D process
    R printf

Figure 3: Output of Awk Phase

The third phase is the most involved. It is a C program that reads in the output of the Awk phase and converts it to a structure chart diagram in the format required by StP. The main task of the program is to lay out the structure chart in an aesthetically pleasing way, and thus this program is named layout.

The program reads in the output of the Awk phase and builds a data structure describing the program being analyzed. This data structure is basically a list of subprogram objects, each describing one subprogram (the layout program was designed in an object-oriented fashion). Each subprogram object contains, as one of its components, a list of the subprograms that this subprogram calls. This data structure can be viewed as a call tree: the subprograms are the nodes in the tree, and the subprograms they call are the children of these nodes. It is the user's responsibility to specify which subprogram should serve as the root of the tree. Typically, the user will specify "main" as the root for a C program. Currently, the Smalltalk prototype of the layout program allows the user to specify any subprogram name he wants for the root of the tree, while the C version of the layout program always assumes that "main" is at the root.

After this data structure is built, the program proceeds to create the StP structure chart diagram file. This operation proceeds in two steps. First, a postorder traversal of the call tree is made. The root of each subtree of the call tree is assigned a number giving the number of units of width that the subtree is allotted in the StP structure chart diagram. Because this is a postorder traversal of the tree, the assignments are made bottom up. Leaves are assigned enough space for a single box in the diagram plus a little extra space so that boxes don't touch each other. A parent node's width assignment is equal to the sum of the width assignments of its children. Next, the program does a preorder traversal of the call tree, printing out in the StP structure chart file format each node in the tree and the arc between that node and its parent. This output for our continuing example is in Figure 4. A description of the StP structure chart file format begins on page 9-39 of [4].

The reverse engineer has some control over the generation of the structure chart: he can specify predefined subprograms and subprograms to be ignored. Note that these two features are present in the prototype of the layout program, written in Smalltalk, but they have not yet been implemented in the final C version of the layout program.

The reverse engineer can specify that certain subprograms are predefined. Predefined subprograms are subprograms whose implementations are not part of the current program being analyzed. For example, the standard library functions in C should probably be specified by the reverse engineer as being predefined. Structure charts represent predefined subprograms in a different way, with two lines on the sides of their boxes instead of one.

The reverse engineer can also specify that certain subprograms are to be ignored. He might choose to ignore subprograms that do an insignificant part of the computation, so that these subprograms do not appear in the structure chart and are thus not distracting. The program simply skips over ignored subprogram nodes and all of their children when traversing the call tree.

The final phase is to read the structure chart generated by the layout program into StP. Once it is in StP, the reverse engineer can pan over and zoom in on the diagram. He can also make modifications to the diagram, perhaps to improve its appearance. The structure chart for our example, as printed by StP, is shown in Figure 5.
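The paper does not reproduce the three-line Awk program, but the filtering done by the second phase is simple enough to sketch. The C function below is an illustration only, not the authors' code: the name filter_record and its signature are assumptions made here. It maps one NewYacc record to the corresponding "D" or "R" line and rejects every other record type.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the paper): convert one NewYacc record,
 * e.g. "FRF:example.c:2:<5,3>-<5,6>:init", into "D name" or "R name".
 * Returns 1 if a line was written to out, 0 if the record is not a
 * function definition (FDC) or function reference (FRF). */
int filter_record(const char *record, char *out, size_t outsize)
{
    char kind;
    const char *name;

    if (outsize == 0)
        return 0;

    if (strncmp(record, "FDC:", 4) == 0)
        kind = 'D';
    else if (strncmp(record, "FRF:", 4) == 0)
        kind = 'R';
    else
        return 0;

    /* The identifier is the last colon-separated field of the record. */
    name = strrchr(record, ':');
    if (name == NULL || name[1] == '\0')
        return 0;
    name++;

    snprintf(out, outsize, "%c %s", kind, name);
    out[strcspn(out, "\n")] = '\0';   /* drop a trailing newline, if any */
    return 1;
}
```

Given the record FRF:example.c:2:<5,3>-<5,6>:init, the function fills the buffer with "R init"; a REF, DEF, or DCL record makes it return 0.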

    scefile6 6 5 5 4 2 main.. 2 4 94 328 init.. 3 4 88 456 init_a.. 2 2 3 2 4 4 456...

Figure 4: Partial Output of Layout Phase
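The layout phase's data structure and its two traversals can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the authors' layout program: the struct, the function names, the fixed capacities, and the leaf width of 3 units are all invented here, and the printed lines are a made-up plain-text format rather than the StP structure chart file format. It also assumes the analyzed program is not recursive, since a cycle in the call structure would make the traversals loop forever.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SUBS   64   /* assumed capacity limits, chosen for this sketch */
#define MAX_CALLS  16
#define NAME_LEN   32
#define LEAF_WIDTH 3    /* assumed: one box plus padding, in arbitrary units */

/* Hypothetical subprogram object: a name plus the subprograms it calls. */
struct subprogram {
    char name[NAME_LEN];
    struct subprogram *calls[MAX_CALLS];
    int ncalls;
    int width;                       /* set by assign_widths */
};

static struct subprogram subs[MAX_SUBS];   /* zero-initialized */
static int nsubs;
static struct subprogram *current;         /* most recent "D" subprogram */

/* Find a subprogram by name, creating its object on first sight. */
struct subprogram *lookup(const char *name)
{
    int i;

    for (i = 0; i < nsubs; i++)
        if (strcmp(subs[i].name, name) == 0)
            return &subs[i];
    if (nsubs == MAX_SUBS)
        return NULL;
    strncpy(subs[nsubs].name, name, NAME_LEN - 1);
    return &subs[nsubs++];
}

/* Consume one "D name" or "R name" line from the Awk phase:
 * every "R" is recorded as a call made by the preceding "D". */
void add_line(char kind, const char *name)
{
    struct subprogram *s = lookup(name);

    if (s == NULL)
        return;
    if (kind == 'D')
        current = s;
    else if (kind == 'R' && current != NULL && current->ncalls < MAX_CALLS)
        current->calls[current->ncalls++] = s;
}

/* Postorder pass: children are sized before their parent (bottom up).
 * A leaf gets LEAF_WIDTH; a parent gets the sum of its children. */
int assign_widths(struct subprogram *node)
{
    int i, total = 0;

    if (node->ncalls == 0)
        return node->width = LEAF_WIDTH;
    for (i = 0; i < node->ncalls; i++)
        total += assign_widths(node->calls[i]);
    return node->width = total;
}

/* Preorder pass: print each node with its horizontal extent and parent,
 * then its children from left to right. */
void print_chart(const struct subprogram *node,
                 const struct subprogram *parent, int left, FILE *out)
{
    int i;

    fprintf(out, "%-8s x=%d..%d parent=%s\n", node->name, left,
            left + node->width, parent ? parent->name : "(root)");
    for (i = 0; i < node->ncalls; i++) {
        print_chart(node->calls[i], node, left, out);
        left += node->calls[i]->width;
    }
}
```

Feeding the ten lines of Figure 3 to add_line and then calling assign_widths on the object for main gives init a width of 6, process a width of 3, and main a width of 9 under the assumed leaf width. Storing the width in the node keeps the preorder pass a plain walk, at the cost of visiting a subprogram once per caller when several subprograms call it.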

[Structure chart with boxes main, init, process, init_a, init_b, and printf: main calls init and process, init calls init_a and init_b, and process calls printf.]

Figure 5: Final Structure Chart

3 Using Our Tool at Georgia Tech

For those readers at Georgia Tech who wish to run our tool, we now describe exactly what steps you must go through. All of the tools are compiled for Sparc machines (though they can be compiled for other machines), so log into a Sparc machine. Change to the CToStructureChart directory, in whoever's account is keeping these files. For simplicity, we will work entirely out of this directory. Copy all of the C source files that you wish to reverse engineer to this directory. Run the C preprocessor (/lib/cpp) on these source files and save the resulting files.

Next, type makesc followed by the names of the files preprocessed above. makesc is a four-line shell script taking as many arguments as you choose to give it, each a file name. makesc concatenates all of the argument files (concatenating several C source files gives a valid C source file) and runs the resulting large file through canal (the NewYacc source analyzer), filter.awk (the Awk program), and layout (the C layout program). The final result, an StP structure chart file, is written to standard output, so you should redirect it to a file. You should give the structure chart file a name ending in .sce, StP's standard naming convention for structure chart files.

Now all that remains is to read the structure chart diagram into StP. Start the StP Main Menu, select the SCE icon, fill in the name of the structure chart file in the "Diagram:" field, and press the "Execute" button. Now you are free to manipulate the structure chart as you please.

4 Possibilities for Future Work

In the future we wish to investigate how StP can be used interactively as a reverse engineering tool. In particular, the structure charts generated by our tool are often too big: when printed on a single sheet of paper, they are too small to read. Big structure charts need to be broken down into a group of linked smaller charts. We envision a tool where the reverse engineer can interactively break a large structure chart into linked smaller charts using the StP structure chart editor. Unfortunately, the standard StP structure chart editor does not have built-in facilities for breaking apart structure charts. We hope to find an effective way to add such functionality to the structure chart editor. We are considering a few possible methods of accomplishing this task. Perhaps one of them will prove effective and appear in a future paper.

5 Conclusion

We have successfully demonstrated that a CASE tool can be used to support reverse engineering. Our structure chart generator appears to be a genuinely useful tool for a reverse engineer. It was much easier to write this tool to work in conjunction with StP than it would have been to write a stand-alone, interactive structure chart generator and viewer. NewYacc has also proven to be a handy tool, making it relatively simple to write source code analyzers. We verified that StP implements the open architecture its makers advertise. Without its simple ASCII text file format for structure chart diagram files, a format described thoroughly in the StP User's Manual, writing our tool would have been much more difficult.
The principal disadvantage of building a reverse engineering tool on top of a CASE tool is that the CASE tool's architecture is not totally open. For example, in StP the user cannot completely redefine the user interface for the structure chart editor. If one is interested in developing a high-quality, commercial reverse engineering product, this lack of complete flexibility in changing the functionality of the CASE tool could be an argument for writing the entire reverse engineering tool from scratch. However, CASE tools, especially StP, can be customized in a fairly large number of ways. Thus, for developing in-house reverse engineering tools, using a CASE tool appears very promising.

References

[1] S. Johnson, "Yacc: Yet Another Compiler-Compiler," Bell Laboratories, 1979.
[2] E. White, J. Callahan, and J. Purtilo, "The NewYacc User's Manual," University of Maryland.
[3] A. Aho, B. Kernighan, and P. Weinberger, "Awk - A Pattern Scanning and Processing Language," Bell Laboratories.
[4] Interactive Development Environments, Software through Pictures User Manual, 1988.