Knowledge-based authoring tools (KBATs) for graphics in documents

Robert P. Futrelle
Biological Knowledge Laboratory
College of Computer Science
161 Cullinane Hall
Northeastern University
Boston, MA 02115
futrelle@ccs.neu.edu

(extended abstract, submitted to a workshop to be held in conjunction with ACM Multimedia '95, November 4, 1995, San Francisco, California)

Introduction

Documents that contain both text and graphics are difficult to characterize for the purposes of retrieval and navigation. Digital Libraries of the future will be little advanced over current libraries if we cannot meet the challenge of dealing with the conceptual content of documents. There have been substantial advances over the last few decades in the parsing of text and graphics, but the methods are hardly ready for large-scale application.

We submit that large strides can be made by building systems that allow the author to contribute to the generation of conceptual information during the authoring process. The information would thus be captured at authoring time rather than being laboriously reconstructed after the fact. The tools for doing this are called Knowledge-Based Authoring Tools (KBATs). We are building such a tool for creating gene diagrams and have elsewhere discussed the issues involved in designing KBATs for text (Futrelle & Fridman, 1995). The discussion here will emphasize graphics. The syntactic and semantic strategies used in our parsing work (Futrelle & Nikolakis, 1995) are an important part of the KBAT design. We describe how graphics drawing environments can be exploited to build successful KBATs. The approach described has clear extensions to text, audio and video. One of the most difficult problems with these methods is that of minimizing the intrusiveness of concept-gathering during the authoring process.

The syntax and semantics of graphics

Graphics can illustrate many complex structures and relationships with ease and directness.
Unfortunately, these qualities exist only for the human beholder; they are unavailable for indexing or searching by computer. Although there are many standard graphics file formats, they contain only the graphics primitives used in a drawing (lines, polygons, text, etc.) and groupings of them, so they are of little use to a system that is trying to establish the content of a graphic for indexing. As with natural language, the structure of graphics can reasonably be divided into syntax and semantics. Figure 1 is a simple diagram with one principal syntactic analysis but numerous semantic interpretations.

[Figure 1 image not reproduced: labels "A" and "B" with arrows pointing to two rectangles joined by a horizontal line]
Figure 1. Two joined rectangles with arrows directed at them.

Syntactically, the arrows appear labeled at one end and contacting the rectangles at the other. Semantically, however, the arrows could attach the labels to the rectangles in a labeling relation, or they might indicate flows in a process, among many other interpretations. A syntactic analysis of the above diagram (Futrelle & Nikolakis, 1995), using a constraint grammar, discovers the following relations, among others:

    (near (text text1 "A") (tail arrow1))                      (1a)
    (near (head arrow1) rectangle1)                            (1b)
    (vertically_aligned (vertical_center rectangle1) line1)    (1c)
    (horizontal line1)                                         (1d)

where text1, arrow1, and rectangle1 are the components on the left, and line1 is the long horizontal line passing through the rectangles. The most frequent semantic interpretation of the figure is for the arrows to bind the labels "A" and "B" to rectangles 1 and 2 in a labeling relation. This could be represented by semantic frame instances such as

    labeling1                                                  (2)
        label:   (text1 "A")
        labeled: rectangle1
        coupler: arrow1

Each slot in (2) is a binary relation, e.g., the label: relation between the labeling1 frame and (text1 "A"). The semantics must go further than this to cover other interpretations, e.g., (i) it is a gene diagram; (ii) it is two buildings joined by a walkway; (iii) it is two successive stages of a process or procedure; (iv) it is two tunnels or covered bridges with a road passing through them. Our concern here is not to delve into the semantic representations of these various interpretations, but to see how we could devise a system that would allow an author to easily describe the desired interpretation, so that it would be preserved in the electronic document for later indexing, retrieval, and browsing.

KBATs for graphics 2 Futrelle, 7/95
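To make the split between the syntactic relations (1a)-(1d) and the semantic frame (2) concrete, here is a minimal Python sketch; it is not the constraint-grammar parser of Futrelle & Nikolakis (1995). It assumes simple geometric stand-ins for the primitives (an anchor point for text, head and tail points for an arrow, axis-aligned rectangles) and hypothetical tolerances NEAR_DIST and ALIGN_TOL; all class and function names are illustrative. It tests near, horizontal and vertically_aligned, builds a labeling frame when (1a) and (1b) hold, and then lists each slot as a binary relation between the frame and its filler.

```python
import math
from dataclasses import dataclass

# Hypothetical tolerances, in drawing units; a real parser would tune these.
NEAR_DIST = 10.0
ALIGN_TOL = 1.0


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Text:
    name: str
    string: str
    anchor: Point


@dataclass(frozen=True)
class Arrow:
    name: str
    tail: Point
    head: Point


@dataclass(frozen=True)
class Line:
    name: str
    start: Point
    end: Point


@dataclass(frozen=True)
class Rect:
    name: str
    left: float
    bottom: float
    right: float
    top: float

    def distance_to(self, p: Point) -> float:
        """Distance from p to the closest point of the rectangle (0 if inside)."""
        dx = max(self.left - p.x, 0.0, p.x - self.right)
        dy = max(self.bottom - p.y, 0.0, p.y - self.top)
        return math.hypot(dx, dy)


def near_points(p: Point, q: Point) -> bool:             # used for (1a)
    return math.dist((p.x, p.y), (q.x, q.y)) <= NEAR_DIST


def near_rect(p: Point, r: Rect) -> bool:                # used for (1b)
    return r.distance_to(p) <= NEAR_DIST


def horizontal(line: Line) -> bool:                      # relation (1d)
    return abs(line.start.y - line.end.y) <= ALIGN_TOL


def vertically_aligned(rect: Rect, line: Line) -> bool:  # relation (1c)
    """The horizontal line passes through the rectangle's vertical center."""
    center_y = (rect.bottom + rect.top) / 2
    return horizontal(line) and abs(line.start.y - center_y) <= ALIGN_TOL


def labeling_frame(name, text, arrow, rect):
    """Build a frame like (2) if relations (1a) and (1b) hold, else None."""
    if near_points(text.anchor, arrow.tail) and near_rect(arrow.head, rect):
        return {"frame": name, "label": text, "labeled": rect, "coupler": arrow}
    return None


def binary_relations(frame):
    """Decompose a frame into (slot, frame, filler) triples."""
    return [(slot, frame["frame"], filler.name)
            for slot, filler in frame.items() if slot != "frame"]


text1 = Text("text1", "A", Point(0, 40))
arrow1 = Arrow("arrow1", tail=Point(5, 38), head=Point(20, 22))
rectangle1 = Rect("rectangle1", left=20, bottom=0, right=60, top=20)
line1 = Line("line1", Point(-10, 10), Point(150, 10))

frame = labeling_frame("labeling1", text1, arrow1, rectangle1)
print(binary_relations(frame))
# [('label', 'labeling1', 'text1'), ('labeled', 'labeling1', 'rectangle1'),
#  ('coupler', 'labeling1', 'arrow1')]
```

Note that the geometric relations are computed bottom-up from coordinates, whereas the frame is only an interpretation layered on top of them; a different interpretation from the list above would reuse the same relations with a different frame.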

KBATs as semantic construction kits for diagrams

In a normal drawing package, predefined objects are available (lines, ovals, polygons, etc.) as well as more flexible tools (Bezier curves, free-form lines, etc.). There are parameter settings such as line widths or (scaled) dimensions. In addition, constraints can be imposed, such as positioning to a grid, or aligning or spacing items horizontally or vertically. Transformations such as translation, scaling, and rotation are available. Items can be grouped, then copied, and the copies transformed. When the resulting diagram is parsed, the parser has to go beyond the relations given by the drawing package. It has to discover relations such as near, contained, long versus short, left-of, or right-of. The resulting parse identifies the participating constituents and, through the grammar rules (productions) that lead to them, the syntactic relations between objects.

When an author is actively involved in the specification process while entering a diagram, much of the syntactic analysis can be hidden, and the semantic representation can be built directly. This is done by making the semantic primitives directly available to the author. Consider the semantic frame (2). The labeling relation is a ubiquitous one, occurring in virtually all diagrams. Usually it is left to the viewer to determine what object a label is intended to label. Such assignments are rarely ambiguous to a human, but the machine needs to be told. In a KBAT, the labeling process would be mediated by an explicit set of tools for telling the system about labels and labeled objects. When the system is in the "labeling" mode, the default would be to assume that entered text is the label and that the object nearest to it is the labeled item. While in the "labeling" mode, an arrow could be drawn which, by default, would be assumed to connect label and object.
(All the defaults could be overridden, so that text or an arrow could be labeled, or only one side of a rectangle would be the referent, for example.) If a label were copied, its semantics would normally be copied with it, so that it would retain its role as a label rather than being just text.

After semantic specification by the author, symbols on the screen acquire meanings beyond their visible form, so it is important for the author to be able to visualize these meanings. This can be done by a great variety of methods, though it is important to settle on a few standard ones to avoid confusion. The simplest semantics are the properties of a single object. For example, if the system knows that a particular string is the name of a chemical substance, it could indicate this when the user selects the string. Many of the relations of interest are binary, so they can be shown, on demand, as a directed relational arc coupling two items, with the arc labeled by the relation. This can lead to ambiguities, e.g., if the relation arc points to a rectangle, the author cannot be sure whether the reference is to one edge or to the entire rectangle. Visually highlighting the two items in the relation removes this ambiguity.

The above approach to diagram authoring could well be described as using a "semantic construction kit". All manner of relations within a diagram, between diagrams, or with text can be constructed with these kits. Conventional hypertext links are a simple, special case.

Grammars, constraints, and semantic relations

One of the obvious demands on a KBAT for diagrams is that it be easily extensible to a wide variety of domains, without extensive special-purpose coding for each new domain. This is possible. Imagine a basic KBAT to which we want to add the semantics of
labeling. We would add three grammar rules and three semantic mapping rules for the labeling semantic frame:

    Grammar: labeling
        LabelingA -> text, object;             (constraints)      (3a)
        LabelingB -> text, connector, object;  (constraints)      (3b)
        connector -> line | arrow                                  (3c)
    Frame: labeling                                                (3d)
        label:   <- text
        labeled: <- object
        coupler: <- connector

Rule (3a) is a production stating that a LabelingA object is made up of two constituents, the primitive types text and object. The constraints in this rule require that the text be near the object. Rule (3b) is similar but has more complex constraints (cf. (1a) and (1b)). Rule (3c) states that a connector is either a line or an arrow. The frame (3d), expressing the semantics, is the same as (2), but shows the mapping from the syntactic constituents to the semantic slots.

In the labeling mode, rules (3a)-(3d) would potentially be active. In this mode, any text entered would be bound to "text" and the nearest object to "object", using rule (3a). If a line or arrow were drawn, rule (3b) would be used for the match instead. As these pairings are made, they can be bound to the semantic frame slots using (3d).

The approach just described is a general one: a mode is defined for drawing; a set of grammar rules, semantic frames, and mappings is singled out as applicable to the mode; drawing is done; and the drawn items are matched to the constituents in the rules and frames. Ambiguities can occur, and additional disambiguation strategies would be needed, e.g., if two lines with different roles appeared in the same rule. Once a mode is chosen, constraints in the grammar rules can assist drawing, e.g., by enforcing equal lengths or alignment of certain items.

We have also discussed strategies for building KBATs for text (Futrelle & Fridman, 1995). These are much harder to design than KBATs for graphics. Nevertheless, it should be possible to integrate both into a full-scale authoring system that handles semantic relations between text and graphics, e.g., pairing a segment of text with an object in a graphic.
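The labeling mode described in this section can be sketched in Python with the rules held as data rather than hard-coded, which is the declarative style that rules (3a)-(3d) suggest. This is an illustrative sketch under stated assumptions, not GeneDraw code: every drawn item is reduced to a single representative point (for a connector, its head), the only constraint applied is the "nearest object" default, and LABELING_MODE, label_in_mode and the other names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    kind: str   # "text", "line", "arrow", "rectangle", ...
    x: float    # one representative point (assumption); for a connector, its head
    y: float


# Hypothetical declarative encoding of the labeling mode, after (3a)-(3d).
LABELING_MODE = {
    "productions": {
        "LabelingA": ("text", "object"),               # (3a)
        "LabelingB": ("text", "connector", "object"),  # (3b)
    },
    "connector_kinds": {"line", "arrow"},              # (3c)
    "frame": {                                         # (3d)
        "label": "text",
        "labeled": "object",
        "coupler": "connector",
    },
}


def nearest(anchor, candidates):
    """Return the candidate whose point is closest to the anchor's point."""
    return min(candidates,
               key=lambda c: math.hypot(c.x - anchor.x, c.y - anchor.y))


def label_in_mode(text, drawn, connector=None, mode=LABELING_MODE):
    """Match newly entered text (and an optional connector) against the mode.

    Returns (production name, frame slots), or None if nothing can be labeled.
    Only the 'nearest' default is applied; other constraints are omitted.
    """
    if text.kind != "text":
        raise ValueError("labeling mode expects a text item")
    candidates = [d for d in drawn if d is not text and d is not connector]
    if not candidates:
        return None

    if connector is None:
        # Rule (3a): the labeled object is the one nearest the text.
        rule = "LabelingA"
        bindings = {"text": text, "object": nearest(text, candidates)}
    else:
        if connector.kind not in mode["connector_kinds"]:
            raise ValueError(f"{connector.kind!r} cannot act as a connector")
        # Rule (3b): the labeled object is the one nearest the connector's head.
        rule = "LabelingB"
        bindings = {"text": text, "connector": connector,
                    "object": nearest(connector, candidates)}

    # Every constituent of the chosen production must be bound.
    missing = set(mode["productions"][rule]) - set(bindings)
    if missing:
        raise ValueError(f"unbound constituents: {sorted(missing)}")

    # Map constituents to frame slots as in (3d); unbound slots are skipped.
    slots = {slot: bindings[constituent].name
             for slot, constituent in mode["frame"].items()
             if constituent in bindings}
    return rule, slots


rect1 = Item("rectangle1", "rectangle", 40, 10)
rect2 = Item("rectangle2", "rectangle", 100, 10)
text1 = Item("text1", "text", 10, 40)
text2 = Item("text2", "text", 130, 40)
arrow2 = Item("arrow2", "arrow", 105, 20)

print(label_in_mode(text1, [rect1, rect2, text1]))
# ('LabelingA', {'label': 'text1', 'labeled': 'rectangle1'})
print(label_in_mode(text2, [rect1, rect2, text1, text2, arrow2], connector=arrow2))
# ('LabelingB', {'label': 'text2', 'labeled': 'rectangle2', 'coupler': 'arrow2'})
```

Keeping the productions and the frame mapping in a table means that, under these assumptions, supporting another mode would mostly be a matter of adding another table rather than new matching code.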
Digital Libraries

Ontological issues

When digital library collections are enhanced by the inclusion of semantic information for the text and graphics of documents, the strategies open to the library user will be multiplied many-fold. Retrieval by content becomes possible, e.g., the user could ask for "any documents that describe gene A as adjacent to and to the left of gene B." The system would use the labeling to identify the various occurrences of A and B, and use geometric relations to decide which, if any, satisfy the query.

Ontologies are very powerful, because once a semantic item is introduced by the author, a web of semantic relations immediately becomes available. For example, if the author designates a particular line as an ATM link, then relations such as network, connector, information-carrying element, etc., all become available. Further links to richer knowledge representations can be exploited, to the extent that such knowledge exists.
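The gene A / gene B query above can be sketched in Python as follows, assuming each document stores its labeling frames as (label string, bounding box) pairs for the labeled objects. The adjacency test (a small horizontal gap, ADJACENT_GAP, plus vertical overlap) is a hypothetical stand-in for whatever geometric relations a real system would use, and all names and data are illustrative.

```python
from dataclasses import dataclass

ADJACENT_GAP = 2.0  # hypothetical tolerance between touching genes, drawing units


@dataclass(frozen=True)
class Box:
    left: float
    right: float
    bottom: float
    top: float


def index_labels(labeling_frames):
    """Map each label string to the boxes of all objects it labels."""
    index = {}
    for label, box in labeling_frames:
        index.setdefault(label, []).append(box)
    return index


def left_adjacent(a, b, gap=ADJACENT_GAP):
    """True if box a lies just to the left of box b and they overlap vertically."""
    horizontal_gap = b.left - a.right
    overlaps_vertically = a.bottom < b.top and b.bottom < a.top
    return 0 <= horizontal_gap <= gap and overlaps_vertically


def find_documents(library, label_a, label_b):
    """IDs of documents in which some occurrence of label_a is
    left-adjacent to some occurrence of label_b."""
    hits = []
    for doc_id, frames in library.items():
        index = index_labels(frames)
        if any(left_adjacent(a, b)
               for a in index.get(label_a, [])
               for b in index.get(label_b, [])):
            hits.append(doc_id)
    return hits


library = {
    "doc1": [("A", Box(0, 40, 0, 20)), ("B", Box(41, 80, 0, 20))],
    "doc2": [("B", Box(0, 40, 0, 20)), ("A", Box(41, 80, 0, 20))],
    "doc3": [("A", Box(0, 40, 0, 20)), ("B", Box(100, 140, 0, 20))],
}
print(find_documents(library, "A", "B"))  # ['doc1']
```

Because the query runs over frames that the author supplied at drawing time, no diagram parsing is needed at retrieval time; only the geometric test is evaluated.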

One of the most difficult issues in all this is the standardization of ontologies. An ontology is a particular structuring for knowledge representation, and every KBAT implementation could take a different view of the world. For example, a road might be described in one KBAT as entering and leaving a tunnel and in another as going through a tunnel. In a gene diagram, the centered backbone line could be envisioned as a single line underlying all the segments, or as a series of short connectors. Bringing ontologies into concordance is analogous to creating data dictionaries or standardized keyword lists, but it is very difficult to do.

The approach to Knowledge-based Authoring Tools for graphics that we have described should be practical, because the paradigm is much like that of current drawing applications: a mode is selected from a menu and the user draws the components in a freely chosen order. Our current KBAT implementation, GeneDraw, is written in Dylan. It builds semantic frames and enforces constraints during drawing. The early prototype is hard-coded rather than driven by a declarative set of rules and frame bindings, but work is in progress to change that.

Acknowledgments

Thanks to Chris Hopkins and Dave Kormann for work on GeneDraw, and to Chris for work on the supporting Simple Object Store ("Sauce"). Work supported in part by grants from the National Science Foundation, IRI-9117030, and the Department of Energy, DE-FG02-93ER61718.

References

Futrelle, R. P., & Fridman, N. (1995). Principles and tools for authoring knowledge-rich documents. In DEXA '95 (Databases and Expert Systems Applications). London, England (in press).

Futrelle, R. P., & Nikolakis, N. (1995). Efficient analysis of complex diagrams using constraint-based parsing. In ICDAR-95. Montreal (in press).