Presented at the Eighth Nordic Workshop on Programming Environment Research (NWPER 98), Ronneby, Sweden, August 1998.

On the Integration of Text Editing and Version Control

Patrik Persson
Dept of Computer Science, Lund University
Box 118, S-221 00 Lund, Sweden
e-mail: Patrik.Persson@dna.lth.se

Abstract. The COOP/Orm environment is a client/server-based environment for collaborative development work, such as software development. The environment is built as a framework into which different kinds of editors can be plugged. We present an important special case, the COOP/Orm text editor. The editor has access to version information and presents it to the user directly during editing. The editor's hierarchical user interface can display version information regarding single characters, subdocuments, and the document as a whole. We also describe how the editor detects merge conflicts in our fine-grained model.

1 Introduction

In this paper we present the text editor of the COOP/Orm environment. COOP/Orm, originating from the Mjølner project [5], is a system supporting collaborative development work using fine-grained version control. Although our emphasis is on software development, our work can also be applied to other kinds of writing, such as two or more authors writing a paper.

One can divide the collaborative editing needs of an organization into two styles:

- asynchronous editing, where a document (or a version of it) can only be edited by a single user at any given time. Example: emacs.
- synchronous editing, where multiple users edit a document simultaneously and the changes made by one user become instantly visible to the other users. Example: ShrEdit [4].

The text editors of today typically support one of these editing styles, but we believe that both styles are useful at different times during development. Consequently, a text editor should support both styles, and preferably a smooth transition between them [6].
The COOP/Orm environment [2][3][6] is a framework for version control, and the text editor is an application within this framework. The COOP/Orm environment has a number of properties (applying not only to the text editor) which together define our basic approach to supporting collaborative work:

- Documents are represented hierarchically, which is in general close to our mental view of documents. An object-oriented program consists of classes, which in turn consist of methods. A paper consists of a heading and a set of sections, which in turn consist of paragraphs and subsections. In this paper, we use the term subdocument to refer to a subtree of this structure. A subdocument is either user data (such as text in a text editor) or a folder possibly containing other subdocuments.
- The environment maintains a version graph, which describes the dependencies between all versions of the document.
- The viewed version is always displayed in comparison to a compare version. The compare version is simply a reference for displaying recent changes in the viewed version, and both versions may be chosen quite freely (as long as the viewed version is later than, and developed from, the compare version).
- Since all versions are always visible to all users, any user can watch the work of another at any time (awareness). Note that this facility is similar to the synchronous editing case described earlier.
- There is no such thing as checking out or checking in a document, as in traditional version control systems. Instead, a user may create a new version at any time, and later merge that new version with another version. Consequently, no version of a document is ever locked from any user, and users may work asynchronously.
- The system has a client-server architecture, where all versions of all documents are stored in the server and no data is stored locally. This allows users in different locations to work on the same document. In this paper we place the emphasis on the client, since that is where the text editor resides.
- Instead of handling version control and editing in two independent applications (for instance, using RCS [8] and emacs), editors interact directly with the version control system. The editor can present detailed information regarding the document's history directly to the user while editing.

Fig. 1. Version control in a traditional environment (diagram elements: compiler, text editor, version control, file system). All tools work with the file system with little or no knowledge of each other.

The COOP/Orm environment integrates editing and version control of several kinds of data.
Apart from the text editor described in this paper, the environment can also accommodate other editors, for instance for more complex data structures such as abstract syntax trees. In this paper, the text editing case will be described and related to the rest of the COOP/Orm environment.

In Section 2 we describe our integration of version control and editing, followed by a brief overview of our user interface in Section 3. Our approach to merging versions is described in Section 4. In Section 5 we give an overview of the text editor's design and
in Section 6 we briefly describe the status of the implementation. Section 7 summarizes the paper.

2 Integrated version control

Traditional version control tools such as RCS [8] typically provide facilities for checking files in and out of some central repository. The development tools (text editors, compilers, and others) generally know nothing of the version history of the files they read or manipulate, as illustrated in Figure 1. One direct effect of this is that it is usually quite awkward for a developer to see, for instance, the differences between the version at hand and a preceding one. Such differences are usually displayed in terms of which lines were added or deleted, much like the output of the UNIX diff command, and must usually be requested explicitly by the user through some special tool.

In contrast, the fundamental approach of COOP/Orm is to integrate version control into the very core of the environment. The editor and other tools do not interact directly with the file system, but with the version control system of COOP/Orm. Figure 2 outlines the relations between the editor, the version control system, and the file system. One important property of this structure is that the editor (the text editor in our case) has access to all versions of the document, rather than just the version currently checked out by the user. The COOP/Orm text editor makes use of the available version information and presents it to the user. The details of this presentation are given in Section 3.

Fig. 2. Version control in the COOP/Orm environment (diagram elements: COOP/Orm text editor, client, server, version control, file system). Only the version control system interacts with the file system; editors interact with the version control system instead. The version control is divided between client and server.
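Editors talk to the version control system rather than to the file system. This means they can query the version graph directly. One example is checking the rule from Section 1 that the viewed version must be later than, and developed from, the compare version, which amounts to an ancestry query. The Python below is a minimal sketch of such a query, not the COOP/Orm implementation (which is written in Simula). The class `VersionGraph`, the function `check_version_pair`, and the version labels are all hypothetical. The sketch reads "later than" strictly, so a version is not considered developed from itself.

```python
from __future__ import annotations


class VersionGraph:
    """Toy version graph: each version records the versions it was developed from."""

    def __init__(self) -> None:
        self._parents: dict[str, tuple[str, ...]] = {}

    def add_version(self, name: str, *parents: str) -> None:
        # An initial version has no parents; a merged version has two.
        if name in self._parents:
            raise ValueError(f"version {name!r} already exists")
        for p in parents:
            if p not in self._parents:
                raise KeyError(f"unknown parent version {p!r}")
        self._parents[name] = parents

    def ancestors(self, version: str) -> set[str]:
        # Depth-first walk over parent links (acyclic by construction).
        seen: set[str] = set()
        stack = list(self._parents[version])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(self._parents[v])
        return seen

    def is_developed_from(self, later: str, earlier: str) -> bool:
        return earlier in self.ancestors(later)


def check_version_pair(graph: VersionGraph, viewed: str, compare: str) -> None:
    """Reject a (viewed, compare) pair unless viewed was developed from compare."""
    if not graph.is_developed_from(viewed, compare):
        raise ValueError(f"{viewed!r} was not developed from {compare!r}")


g = VersionGraph()
g.add_version("v1")
g.add_version("v2", "v1")
g.add_version("v3", "v2")
g.add_version("v2b", "v1")        # a variant created in parallel with v2
g.add_version("m", "v3", "v2b")   # a merge of the two lines of development

check_version_pair(g, viewed="m", compare="v2b")   # accepted
print(sorted(g.ancestors("m")))                    # ['v1', 'v2', 'v2b', 'v3']
```

A merged version simply has two parents, so a version developed along either branch counts as a valid compare version for it.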
Fine-grained incremental version control

Traditional version control systems let users edit a document in its entirety outside the version control system. When the document is checked in, the system determines the differences from the previous version and stores them. These differences are expressed in terms of added, deleted, or changed lines.

In the COOP/Orm text editor, the version history of every character is maintained during editing. Whenever a character is added or deleted, the version information of the document is updated instantly; in other words, the version information is updated incrementally during editing. Note that the version information is fine-grained in that it describes the history of individual characters rather than entire lines. One attractive property of our incremental approach is that a user's changes instantly become visible as deltas¹ to other users.

3 The presentation and user interface

Presenting hierarchical documents

COOP/Orm supports hierarchical documents as described in Section 1. This is reflected in the graphical user interface, where the document window contains subwindows which in turn may contain subwindows in a recursive manner. COOP/Orm subwindows may be of different kinds:

- Editors, such as text editor instances.
- Folders, which do not hold any explicit information of their own apart from a name. However, folders may contain additional folders and/or editors.
- Other special windows for displaying various information, such as the version graph.

A hierarchical COOP/Orm document may be (abstractly) described as a tree, where the internal nodes are folders and the leaf nodes are editors or empty folders. Figure 3 shows a COOP/Orm client with the Java source code for a sample program. A folder called "package BankSystem" holds two other folders (classes Account and Bank). These two folders in turn contain text editors for methods, constructors, and instance variables.
The method deposit in Bank is open, but the methods withdraw and getBalance are closed (i.e., iconized).

¹ Difference in data (as stored in the server) between two consecutive versions.
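The tree view of a document described above can be sketched as a small recursive data structure, with folders as internal nodes and editors or empty folders as leaves. The Python below is illustrative only. The classes `Folder` and `Editor`, the `iconized` flag, and the functions `outline` and `leaves` are hypothetical. The contents of the Account folder are left out rather than guessed from Figure 3, so it appears here as an empty folder.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Editor:
    name: str
    iconized: bool = False   # closed subwindows are shown as icons


@dataclass
class Folder:
    name: str
    children: list[Union[Folder, Editor]] = field(default_factory=list)


Node = Union[Folder, Editor]


def outline(node: Node, depth: int = 0) -> list[str]:
    """Render the subwindow hierarchy as indented lines, one per node."""
    pad = "  " * depth
    if isinstance(node, Editor):
        state = "iconized" if node.iconized else "open"
        return [f"{pad}{node.name} [{state}]"]
    lines = [f"{pad}{node.name}/"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


def leaves(node: Node) -> list[str]:
    """Leaves of the tree: editors and empty folders."""
    if isinstance(node, Editor) or not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


doc = Folder("package BankSystem", [
    Folder("class Account"),          # contents omitted in this sketch
    Folder("class Bank", [
        Editor("deposit"),
        Editor("withdraw", iconized=True),
        Editor("getBalance", iconized=True),
    ]),
])
print("\n".join(outline(doc)))
```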
Fig. 3. The COOP/Orm client. Four instances of the text editor are visible; two more are iconized.

Visualizing version history during editing

Editors have detailed information regarding the history of the edited document. The text editor uses this information and presents parts of it to the user during editing. Recall that the COOP/Orm environment always displays a comparison between two versions, the viewed version and the compare version. All displayed information describes the viewed version, and two basic markings (shown in Figure 4) are used to display the changes from the compare version:

- Text that has been added since the compare version is underlined.
- Text that has been deleted since the compare version is shown but struck out.
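The two markings can be computed per character. The sketch below assumes, for simplicity, a linear history with versions numbered 1, 2, 3, and so on. Each character records the version that created it and, optionally, the version that deleted it. With a real version graph, "present in a version" would become an ancestry query instead of a numeric comparison. Underlining and strike-out are approximated by the hypothetical text markers `[+...]` and `[-...]`, and none of the names come from COOP/Orm.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass(frozen=True)
class VChar:
    ch: str
    created: int                    # version that added the character
    deleted: Optional[int] = None   # version that deleted it, if any

    def present_in(self, version: int) -> bool:
        return self.created <= version and (self.deleted is None or version < self.deleted)


def mark(c: VChar, compare: int, viewed: int) -> Optional[str]:
    in_compare, in_viewed = c.present_in(compare), c.present_in(viewed)
    if in_viewed and not in_compare:
        return "added"      # would be underlined
    if in_compare and not in_viewed:
        return "deleted"    # would be shown struck out
    if in_viewed:
        return "plain"
    return None             # in neither version: not displayed


def render(text: list[VChar], compare: int, viewed: int) -> str:
    marked = [(mark(c, compare, viewed), c.ch) for c in text]
    out = []
    # Group runs of consecutive characters that share the same marking.
    for kind, run in groupby(marked, key=lambda pair: pair[0]):
        s = "".join(ch for _, ch in run)
        if kind == "added":
            out.append(f"[+{s}]")
        elif kind == "deleted":
            out.append(f"[-{s}]")
        elif kind == "plain":
            out.append(s)
    return "".join(out)


def chars(s: str, created: int, deleted: Optional[int] = None) -> list[VChar]:
    return [VChar(ch, created, deleted) for ch in s]


doc = chars("The ", 1) + chars("quick", 1, 2) + chars("slow", 2) + chars(" fox", 1)
print(render(doc, compare=1, viewed=2))   # The [-quick][+slow] fox
```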
Fig. 4. An example of deleted and added text.

Using color: the pen metaphor

Editing always takes place in the viewed version, and changes made while editing are special cases of the markings above. We use a pen metaphor, where all changes made during editing are displayed in a special pen color. This means that a word that is added during editing is displayed like any other added text (underlined), but both the text and its marking are drawn in the pen color.

Deleting text while editing gives one of two results. If the deleted text originates from an older version (i.e., not the viewed one), it is struck out using the pen color. If text is deleted in the same version it was added in, it is simply discarded and removed from the version control system. In other words, deleting such text is interpreted as undoing its addition.

Change propagation

The notion of added and deleted characters scales to entire editors and folders in the COOP/Orm environment. Whenever a new editor or folder is created, it is marked as added in the current version. Editors and folders can assume a third state in addition to "added" and "deleted": whenever text is added or deleted in an editor, that editor is considered changed in that version. Similarly, whenever a component (editor or folder) within a folder is added, deleted, or changed, that folder is considered changed. This implies that a simple change in a text editor, such as the addition of a word, propagates upwards through the text editor's enclosing folder(s). We call this change propagation.

4 Merging

An important aspect of the COOP/Orm approach is that a user can always create a new version or variant, which can later be merged with another version. The environment performs merging of entire hierarchical documents composed of subdocuments [1]. While merging, every subdocument has two sets of add/delete/change tags: one for each of the versions the merged document is based on.
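The add/delete/change tags used during merging come from the editing rules of Section 3. The sketch below combines two of them: the pen-metaphor deletion rule, and change propagation up through the enclosing folders. It is a simplified sketch under the same assumption as before, namely integer version numbers. The classes `Node`, `Folder`, `TextEditor`, and `VChar` are hypothetical, and so is the Java snippet used as editor content. Marking a node as changed here just records the version in a set and walks up the chain of parent folders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VChar:
    ch: str
    created: int                    # version in which the character was added
    deleted: Optional[int] = None   # version in which it was deleted, if any


@dataclass
class Node:
    name: str
    # Excluded from repr/eq to avoid infinite recursion through parent links.
    parent: Optional[Folder] = field(default=None, repr=False, compare=False)
    changed_in: set[int] = field(default_factory=set)

    def mark_changed(self, version: int) -> None:
        # Change propagation: mark this node and every enclosing folder.
        node: Optional[Node] = self
        while node is not None:
            node.changed_in.add(version)
            node = node.parent


@dataclass
class Folder(Node):
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


@dataclass
class TextEditor(Node):
    chars: list[VChar] = field(default_factory=list)

    def text(self, version: int) -> str:
        return "".join(c.ch for c in self.chars
                       if c.created <= version and (c.deleted is None or version < c.deleted))

    def insert(self, pos: int, s: str, viewed: int) -> None:
        self.chars[pos:pos] = [VChar(ch, viewed) for ch in s]
        self.mark_changed(viewed)

    def delete(self, pos: int, viewed: int) -> None:
        c = self.chars[pos]
        if c.deleted is not None:
            return                  # already struck out
        if c.created == viewed:
            del self.chars[pos]     # added in this version: deletion undoes the addition
        else:
            c.deleted = viewed      # older text: kept, but struck out in the pen color
        self.mark_changed(viewed)


root = Folder("package BankSystem")
bank = root.add(Folder("class Bank"))
deposit = bank.add(TextEditor("deposit", chars=[VChar(ch, 1) for ch in "balance += x;"]))

deposit.insert(0, "//", viewed=2)   # pen-colored addition in version 2
deposit.delete(0, viewed=2)         # removes one "/" outright
deposit.delete(1, viewed=2)         # strikes out the version-1 "b"
print(deposit.text(2))              # /alance += x;
print(root.changed_in)              # {2}
```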
Conflict handling

Traditional version control systems typically prevent different users from making conflicting changes to a file by allowing only one user at a time to check it out. This approach in effect serializes the development work, which is often not acceptable.
Many version control systems, such as Teamware [7], support a copy-merge approach, where versioned files may be copied (rather than checked out) to a local workspace and modified in parallel with the original files. The files in the local workspace can later be put back, that is, copied to and possibly merged with the original files. Such systems report a conflict when two users put back different versions of the same file and one or more lines have changed in both versions. The first user to put back his version succeeds, but the second user is informed of the conflict and prompted to resolve it.

Merge conflicts are, in general, difficult to handle since they require two or more developers to decide how their individual changes are to be combined. The intention of COOP/Orm is to minimize the number of conflicts and, when they do occur, to make them as easy to resolve as possible.

Conflicts occur when two or more users work in the same part of the document. In COOP/Orm, a user can always select a suitable viewed version to monitor the work of another user. This feature is intended to reduce the number of conflicts, since users can find out whether they are working within the same part of the document. Consequently, it should be possible to avoid conflicts rather than detect them after they occur.

To simplify conflict resolution, COOP/Orm suggests a default merge in every position where the two merged documents differ. The default merge is based on the simple rule that all changes from both branches² should be included in the merged version. For example, if a sentence is added in one of the branches and the corresponding position in the other branch is untouched, the default merge includes the added sentence. Additions and deletions originating from the merged branches are displayed using a pen metaphor similar to that used for editing changes, but in other colors. During normal editing, a single pen color is used.
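The default-merge rule ("include all changes from both branches") can be sketched for a single piece of text. The sketch rests on assumptions of its own and is not the COOP/Orm algorithm:

- Each branch is described, relative to the common base version, by the set of base characters it deleted and by the strings it inserted after a given base character. Index -1 means the start of the text.
- A base character survives only if neither branch deleted it.
- At the same insertion point, the first branch's text is arbitrarily placed first.

The names `BranchDelta`, `default_merge`, and `overlapping_changes` are hypothetical. The overlap test is only a rough stand-in for the conflict definition given in this section. The usage lines reproduce the paper's "beleeve" example. That example assumes the second user typed "think" after the deleted word.

```python
from dataclasses import dataclass, field


@dataclass
class BranchDelta:
    deleted: set = field(default_factory=set)     # indices of deleted base characters
    inserted: dict = field(default_factory=dict)  # base index k -> text inserted after char k


def default_merge(base: str, a: BranchDelta, b: BranchDelta) -> str:
    """Include every change from both branches (the default-merge rule)."""
    out = [a.inserted.get(-1, ""), b.inserted.get(-1, "")]
    for i, ch in enumerate(base):
        if i not in a.deleted and i not in b.deleted:
            out.append(ch)
        # Insertions anchored after character i; branch a's text goes first.
        out.append(a.inserted.get(i, ""))
        out.append(b.inserted.get(i, ""))
    return "".join(out)


def overlapping_changes(base: str, a: BranchDelta, b: BranchDelta) -> bool:
    """Rough conflict test (an assumption): do both branches touch a common base position?"""
    def touched(d: BranchDelta) -> set:
        t = set(d.deleted)
        for k in d.inserted:
            t.update({k, k + 1})   # an insertion touches the characters on either side
        return t & set(range(len(base)))
    return bool(touched(a) & touched(b))


base = "beleeve"
respell = BranchDelta(deleted={3}, inserted={2: "i"})                        # -> "believe"
replace = BranchDelta(deleted=set(range(len(base))), inserted={6: "think"})  # -> "think"

print(default_merge(base, respell, BranchDelta()))   # believe
print(default_merge(base, respell, replace))         # ithink
print(overlapping_changes(base, respell, replace))   # True

s = "One. Two."
add_sentence = BranchDelta(inserted={len(s) - 1: " Three."})
drop_first = BranchDelta(deleted=set(range(5)))
print(default_merge(s, add_sentence, drop_first))        # Two. Three.
print(overlapping_changes(s, add_sentence, drop_first))  # False
```

Because deletions from either branch win and all insertions are kept, the merge always produces some text. This matches the paper's point that a default merge can be suggested even when a conflict is present.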
During merge, two more colors are introduced to display the changes made in the branches. Our fine-grained model can always suggest a default merge, even if a conflict is present. Suppose a version of the document contains the word "beleeve" (sic). Two different users create new versions from that version; the first user corrects the spelling to "believe", while the other replaces the word with "think". COOP/Orm's fine-grained version control will identify the combination of the two as:

- the addition of the letter "i" (as done in the first version),
- the addition of the string "think" (second version),
- the deletion of the strings "bel" and "eve" (as done in the second version), and
- the deletion of the second "e" (both versions).

The result is the string "ithink", which is probably not what either of the users intended (see Figure 5).

We define some combinations of add/delete/change information as conflicts (requiring user intervention) and other combinations as non-conflicting. We have chosen to define a merge conflict as the case where changes from both branches appear in the same position in the text. This model covers the conflict just described.

Fig. 5. Merge example. The box around the string "belieeve" indicates a conflict.

² We use the term branches to refer to the two versions which are merged into one.

When a user performs a merge operation, any conflicts are marked in the text and can be examined one by one. The user can accept the default merge or use the changes from either of the branches. (In the example just given, using the changes from one branch or the other yields the strings "believe" and "think", respectively.) The text can also be edited directly during merge. Note that such choices can be made for entire subdocuments: if the default merge is selected for a folder, it also applies to all subdocuments within that folder.

Note that the text editor's conflict model covers syntactic conflicts rather than semantic ones: the text editor does not interpret the users' text in any way.

5 Design

The COOP/Orm environment supports version control of considerably more complex data structures than text. The text editor is thus a special case of a generic editor concept within COOP/Orm. COOP/Orm is implemented as an object-oriented framework [3] in Simula.

Relation to the COOP/Orm framework

Individual editors are installed by subclassing a Configurator class, instances of which are created whenever the user creates a new subdocument. The Configurator class declares a number of callback functions, which the subclass is responsible for implementing. The framework calls these callbacks in the following situations (among others):

- The user selects a new compare or viewed version.
- The user requests a merge.
- Other user-interface-related events occur (such as key presses, menu selections, and mouse actions).

The Configurator class also provides a number of functions to be called by the editor in various situations. Examples of such functions are:

- Mark the subdocument as added, deleted, or changed.
- Load versioned data from the server.
- Store versioned data in the server.

The editor (i.e.
the subclass of Configurator) is responsible for calling these functions when appropriate. For example, a key press may result in marking the editor as changed, and selecting a new viewed version may require data to be loaded from the server.

The editor uses version descriptors to tag its data with version information. A version descriptor encapsulates the entire history of a versioned piece of data, such as when it was created and when it was deleted. In the COOP/Orm text editor, every individual character may be perceived as having its own version descriptor. Optimizations are made, however, to combine adjacent characters with the same history into a single versioned text block.

The Configurator superclass holds a reference to the global version graph, which is responsible for managing all the system's versions and their relations. The version graph can answer straightforward questions regarding version descriptors, for example, whether the data associated with a given descriptor is present in a given version.

Editor design overview

The details of the design of COOP/Orm and its text editor are probably of little interest to most readers. Instead, we give an overview of the major design components from the text editor's point of view. A class diagram (in UML notation) is given in Figure 6. The text editor is given a window for its user interaction. The text is represented as a list of text blocks, each representing a sequence of consecutive characters with the same version history. These blocks are split whenever a new character with a different history is inserted. Similarly, when a deletion results in two adjacent blocks with the same history, they are joined. As described above, each text block is equipped with a version descriptor to encapsulate its history.

Fig. 6. Overview (simplified) of the COOP/Orm text editor design in UML notation. Most of the framework has been left out for brevity. (Diagram elements: Configurator, VersionGraph, VersionDescriptor, DocEditor, TextWindow, View, and TextBlock, grouped into the COOP/Orm framework and the text editor application.)

Views

Most traditional version control systems regard a versioned document as a set of text files, each tagged with a version number. In COOP/Orm, however, a versioned document can be regarded as a set of characters, each with its own individual history. Coordinates in a document (i.e., row and column indices) are therefore ambiguous: in general, the same coordinates refer to different positions in different versions of a document. In a sense, a COOP/Orm text document is three-dimensional: row and column indices are not enough to identify a position; version information is required as well.

To allow the text editor to manipulate the document in a uniform manner, it internally maintains a set of views (or filters). A view can be seen as a subset of all text blocks, selected according to some simple rule. For instance, one view defines which blocks are visible in the user's window. Another kind of view is used when the user selects an older compare version. Such an action results in loading a delta from the server to re-create the older version v1 from a newer version v2. In this case, a view is created in which only blocks visible in version v2 are present. (All coordinates in such a delta are relative to the coordinate system of v2.) This view is used to insert the loaded data into the document.

All insertion and deletion operations, as well as traversals (i.e., moving the cursor), are applied to some view. The view automatically ignores all text blocks not present in it. The advantage of using views to access the document is that they resemble ordinary (non-versioned) text documents: once a suitable view is selected, the document may be manipulated like any two-dimensional text document. The functionality to restrict the set of text blocks according to the view's rule is embedded within the view.

6 Implementation status

The COOP/Orm framework is implemented in Simula and supports editors for a few kinds of data. Notably, most of the text editor is implemented, though the merge conflict handling is not yet complete.

7 Conclusions

We have presented a way to integrate editing and version control. Because the editor is integrated with the version control system and is aware of the version information, selected pieces of that information can be presented to the user directly in the editor. The user can easily see what was changed since an arbitrary preceding version.
The hierarchical structure of COOP/Orm documents allows such version information to be displayed automatically for entire subdocuments. It also allows for a structured presentation of potential merge conflicts. The presentation of version information is central to our approach. By making appropriate information available to the developer, our text editor appears to reduce the number of merge conflicts and to simplify collaborative work: users should be able to keep track of each other's work more easily. In our fine-grained version model, every individual character can be perceived as having its own version history. This means that two independent changes to a single line do not necessarily result in a conflict. Instead, we define a conflict to occur when two (or more) independent changes are made at the same position in a document.
The COOP/Orm framework supports versioning of complex data structures. Our next step is to develop an editor for abstract syntax trees (ASTs), with the goal of integrating version control into the entire Mjølner/Orm system. Our text editor is one implemented example (albeit an important one) of an editor in the COOP/Orm environment. Our experience with this editor so far indicates the usefulness of our integrated approach.

Acknowledgments

I want to thank Ulf Asklund and Boris Magnusson, who developed the COOP/Orm framework and provided significant input to both this paper and the COOP/Orm text editor. I also want to thank Görel Hedin and Klas Nilsson for many helpful comments on earlier versions of this paper. The anonymous reviewers provided several helpful comments.

References

1. U. Asklund, Identifying Conflicts During Structural Merge. Proceedings of the Nordic Workshop on Programming Environment Research, Lund, Sweden, June 1-3, 1994.
2. U. Asklund, Integrated Version Control in the COOP/Orm Version Server. Proceedings of the Nordic Workshop on Programming Environment Research, Aalborg, Denmark, May 29-31, 1996.
3. U. Asklund, The COOP/Orm Framework. DRAFT, Lund University, 1997.
4. L.J. McGuffin and G.M. Olson, ShrEdit: A Shared Electronic Workspace. Cognitive Science and Machine Intelligence Laboratory, Tech. report #45, University of Michigan, Ann Arbor, 1992.
5. J.L. Knudsen, M. Löfgren, O.L. Madsen, and B. Magnusson, editors. Object-Oriented Environments: The Mjølner Approach. Prentice Hall, 1993.
6. S. Minör and B. Magnusson, A Model for Semi-(a)Synchronous Collaborative Editing. Proceedings of the Third European Conference on Computer Supported Cooperative Work, Milano, Italy, September 13-17, 1993.
7. Teamware User's Guides, Sun Microsystems, 1994.
8. W.F. Tichy, RCS - A System for Version Control. Software: Practice and Experience, 15(7):637-654, July 1985.