Towards a Semantic Web Modeling Language

Draft

Christoph Wernhard
Persist AG
Rheinstr. 7c
14513 Teltow
Tel: 03328/3477-0
wernhard@persistag.com

May 25, 2000

1 Introduction

The Semantic Web [2] requires that reference models, i.e. schemas specifying the structuring and access methods of information, can be shared by consumers and providers of information. Every net user should be able to publish schemas: entirely new models, sophisticated refinements of given models, and also simplifications of given models that are adapted to an application field or user community. With RDF and RDF-Schema [5], the World Wide Web Consortium suggests a rudimentary logic language for that purpose. RDF breaks knowledge into its atoms, property-object-value triples. We wish to complement this bottom-up approach by starting from a higher level of abstraction, with a language for distributed knowledge and data modeling: from primitive object types, which specify single messages, complex models can be built using Web links and a few operators.

The modeling language should support a number of requirements of modeling in the Semantic Web:

- providing a human-understandable structuring of knowledge,
- defining mappings between different schemas related to a common subject,
- incorporating well-known techniques from knowledge-based modeling such as object orientation, constraints and defaults,
- incorporating more recent techniques from the area of type systems of programming languages,
- defining extensions corresponding to particular knowledge representation approaches, and
- defining mappings to different concrete representation and programming languages.

1.1 First Order Logic as Formalism

In this paper we outline an abstract syntax and semantics of a simple type system suited for distributed modeling in the Semantic Web. To provide a common basis for extensions of several kinds, it should be as general as possible. We specify the type system using first order logic, following the knowledge representation tradition of artificial intelligence, for two reasons. First, first order logic is expected to provide a single language in which everything can be expressed, thereby allowing knowledge to be combined simply by conjunction of formulas. Second, first order logic separates declarative information from processing. Specifications need not be obscured by entangled processing aspects, and processing itself can be made efficient with tools such as preprocessors and specialized theorem provers.

1.2 On Concrete Syntax

Today there is an abundance of proposals for XML-based formats for exchanging structured information on the Web. In contrast to conventional legacy document processing, we start out with a completely formalized language, which already comes with an abstract syntax and can be easily mapped into XML, ISO Prolog, Common Lisp, a variant of CORBA IDL and a meta object protocol.

1.3 Types

Different schema-level representation formalisms, such as relational database schemas, object-oriented classes and object-oriented interfaces, emphasize different aspects such as data structuring, code sharing, model-based representation, and interfaces between providers and consumers of information. Our approach starts from considering types in the sense of interfaces only. As is familiar from the Java programming language, types (interfaces) are distinguished from classes. Only method signatures are associated with types, not implementations or concrete data structuring information.
Very flexible ways of combination, such as multiple inheritance, can be specified for types in simple ways. Since agreement on supplied and required interfaces is all that is needed for the exchange of data in a distributed environment, types already provide the glue for many useful applications.

1.4 Types as Objects

The logic-of-frames approach of an earlier era of knowledge representation focused on showing how frames (types) and slots (fields, methods) can be expressed in first order logic as predicates and functions respectively. In these straightforward logic representations (see e.g. [3]) types are predicates
and therefore not objects of the first order language. The issues we are interested in, however, require that types are considered as language objects:

- It must be possible for the specification of a type to include references to other types.
- It must be possible to match types provided and required in data exchange scenarios.
- It must be possible to specify, infer and verify mediators [7], i.e. mappings between types describing a common domain in different ways.

In particular this involves characterizations of equality of types and of the subtype relationship. Some concrete type systems specify those relationships based on the names by which types are referred to. For a general system it must be possible to compare types by their extensional or structural properties. In the literature these two approaches are termed by-name equivalence and structural equivalence [4]. We adopt structural equivalence and provide a mechanism to simulate by-name equivalence.

2 Syntax

We precede the semantic description of the type system operators with an overview of the abstract syntax of Web documents containing type definitions. Some suggestions for the context in which it is used and for extensions are described below in section 5.

definition     ::= define(URI, type)
reference      ::= refer(URI)
type           ::= reference | primitive type | composed type
primitive type ::= name : msg(type, type) | name : tag | void | any
composed type  ::= type $\wedge$ type | type $\vee$ type

URI is an identifier from the Web URI namespace. A reference contains a URI referring to a type definition. ":" is a binary constructor function for primitive types. name is a string that is used as a message name or tag name. $N : msg(T_a, T_v)$ denotes the type which implements just the message N on the sequence of arguments of types $T_a$ and with value type $T_v$. Special cases are $N : msg(T_v)$, which denotes the type which implements the single message N without argument and with value type $T_v$, and $N : msg(T_a, void)$, which denotes a message that does not return a value.
$N : tag$ denotes the type that implements just the so-called tag N. Tags are used to simulate by-name type equivalence and subtyping. any is the most general type. $\wedge$ and $\vee$ can be used to form intersection and union types.

3 Axioms

We present axioms stating semantic properties of types. $in(X, T)$ denotes that object X has type (i.e. satisfies the interface) T. The role of this predicate is similar to that of $\in$ in the common axiomatizations of set operators.

3.1 Primitive Types

Axiom 1 (MSG)
$$\forall X, N, T_a, T_v:\ in(X, N : msg(T_a, T_v)) \leftrightarrow \forall A\,\bigl(in(A, T_a) \rightarrow \exists V\,(send(N, X, A, V) \wedge in(V, T_v))\bigr)$$

$send(N, X, A, V)$ states that V is the result of sending message N with argument A to object X. A message is represented by $send(N, X, A, V)$ as a partial function that (for given N and X) is defined on each argument A in its domain. An additional axiom can state the uniqueness of send:

$$\forall N, X, A, V_1, V_2:\ send(N, X, A, V_1) \wedge send(N, X, A, V_2) \rightarrow V_1 = V_2$$

Although send is used here in the axiomatization of the type system, the full semantics of send (i.e. the semantics of actual programs) is not required to prove properties of types such as the subtype relationship or the equivalence of given types. For messages with argument arity larger than one, analogous axioms can be given. For messages with arity zero, the corresponding axiom looks as follows:

Axiom 2 (MSG-NOARG)
$$\forall X, N, T_v:\ in(X, N : msg(T_v)) \leftrightarrow \exists V\,(send(N, X, V) \wedge in(V, T_v))$$

Messages that return no useful result (which are called e.g. for side effects) have the predefined type void as result type.

The tag type constructor is used to simulate by-name type equivalence. Two types that otherwise implement the same messages can be distinguished by the tags they implement. These tags are not linked with any further constraints, although a more specific type system might associate constraints with them, for example to ensure that comparison of tags is sufficient for type checking.
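The abstract syntax of section 2 and the membership axioms for primitive types can be illustrated by a minimal Python sketch. It evaluates in(X, T) over one small finite model, which is only an approximation of the first order semantics. The class names, the `Atom` base types, the `Model` container, the `holds_in` function and the single `VOID` inhabitant of void are all assumptions made for this illustration and are not part of the paper.

```python
from dataclasses import dataclass
from typing import Optional, Union

VOID = "void-value"  # assumption: void has exactly one inhabitant in this sketch


@dataclass(frozen=True)
class Atom:
    """Hypothetical base type (e.g. int), given directly by its extension."""
    name: str


@dataclass(frozen=True)
class Msg:
    """N : msg(T_a, T_v); arg is None for the no-argument form N : msg(T_v)."""
    name: str
    arg: Optional["Type"]
    value: "Type"


@dataclass(frozen=True)
class Tag:
    """N : tag."""
    name: str


@dataclass(frozen=True)
class Void:
    pass


@dataclass(frozen=True)
class Any:
    pass


Type = Union[Atom, Msg, Tag, Void, Any]


@dataclass
class Model:
    domain: set  # all objects of the finite model
    atoms: dict  # atom name -> set of objects
    send: dict   # (message name, receiver, argument or None) -> value


def holds_in(x, t, m):
    """Evaluate in(x, t) in the finite model m."""
    if isinstance(t, Any):
        return x in m.domain
    if isinstance(t, Void):
        return x == VOID
    if isinstance(t, Atom):
        return x in m.atoms[t.name]
    if isinstance(t, Tag):  # treated as a no-argument message with result void
        return m.send.get((t.name, x, None)) == VOID
    if t.arg is None:  # Axiom 2 (MSG-NOARG)
        key = (t.name, x, None)
        return key in m.send and holds_in(m.send[key], t.value, m)
    # Axiom 1 (MSG): every argument of type T_a must yield a value of type T_v
    return all(
        (t.name, x, a) in m.send and holds_in(m.send[(t.name, x, a)], t.value, m)
        for a in m.domain
        if holds_in(a, t.arg, m)
    )


INT = Atom("int")
model = Model(
    domain={"p", 1, 2, VOID},
    atoms={"int": {1, 2}},
    send={
        ("get_x", "p", None): 1,
        ("neg", "p", 1): 2,
        ("neg", "p", 2): 1,
        ("origin", "p", None): VOID,
    },
)
print(holds_in("p", Msg("get_x", None, INT), model))
print(holds_in("p", Msg("neg", INT, INT), model))
print(holds_in("p", Msg("neg", Any(), INT), model))
```

Storing send as a dictionary makes it a partial function by construction, so the uniqueness axiom holds trivially in this sketch.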
The following axiomatization of tag as a message without argument and result is somewhat arbitrary, but fits into the general scheme:

Axiom 3 (TAG)
$$\forall X, N:\ in(X, N : tag) \leftrightarrow \exists V\,(send(N, X, V) \wedge in(V, void))$$

3.2 Properties

Axiom 4 (SUBTYPE)
$$\forall T_1, T_2:\ subtype(T_1, T_2) \leftrightarrow \forall X\,(in(X, T_1) \rightarrow in(X, T_2))$$

This axiom defines the subtype relationship. The following theorem illustrates that the formalization handles types in contravariant (anti-monotonic) [4] position as expected:

Theorem 1
$$\forall N, T_{a1}, T_{v1}, T_{a2}, T_{v2}:\ subtype(T_{a2}, T_{a1}) \wedge subtype(T_{v1}, T_{v2}) \rightarrow subtype(N : msg(T_{a1}, T_{v1}),\ N : msg(T_{a2}, T_{v2}))$$

Axiom 5 (EQUIVALENCE)
$$\forall T_1, T_2:\ T_1 = T_2 \leftrightarrow \forall X\,(in(X, T_1) \leftrightarrow in(X, T_2))$$

This axiom defines type equivalence.

3.3 Intersection and Union Types

Axiom 6 (INTERSECTION)
$$\forall X, T_1, T_2:\ in(X, T_1 \wedge T_2) \leftrightarrow in(X, T_1) \wedge in(X, T_2)$$

Intersection of types is used to construct a type supporting different messages directly from primitive message types, as well as, in the manner of multiple inheritance, from given types and additional primitive message types. For example, the type point supports two messages, get_x and get_y, which return objects of type int. colored_point extends point by supporting the get_color message:

$$\texttt{point} = \texttt{get\_x:msg(int)} \wedge \texttt{get\_y:msg(int)}$$
$$\texttt{colored\_point} = \texttt{point} \wedge \texttt{get\_color:msg(color)}$$
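The point example can be checked against the subtype and intersection axioms with a small Python sketch. It evaluates membership over one hand-written finite model, so a positive subtype answer holds only relative to that model and is not a proof. The tuple encoding of types, the objects `p` and `cp`, and the extensions chosen for int and color are assumptions made for this illustration.

```python
# Hypothetical finite model: each object maps message names to results.
OBJECTS = {
    "p": {"get_x": 1, "get_y": 2},
    "cp": {"get_x": 0, "get_y": 5, "get_color": "red"},
}
# Assumed extensions of the base types int and color.
ATOMS = {"int": {0, 1, 2, 5}, "color": {"red"}}
UNIVERSE = set(OBJECTS) | ATOMS["int"] | ATOMS["color"]


def in_type(x, t):
    """in(x, t) for base types, no-argument messages and intersections."""
    if isinstance(t, str):  # base type, given by its extension
        return x in ATOMS[t]
    if t[0] == "msg":  # no-argument message: the result must have the value type
        _, name, value_type = t
        messages = OBJECTS.get(x, {})
        return name in messages and in_type(messages[name], value_type)
    _, left, right = t  # ("and", T1, T2): intersection, both conjuncts must hold
    return in_type(x, left) and in_type(x, right)


def subtype(t1, t2):
    """Subtype check relative to UNIVERSE only: every member of t1 is in t2."""
    return all(in_type(x, t2) for x in UNIVERSE if in_type(x, t1))


point = ("and", ("msg", "get_x", "int"), ("msg", "get_y", "int"))
colored_point = ("and", point, ("msg", "get_color", "color"))

print(subtype(colored_point, point))
print(subtype(point, colored_point))
```

In this model the object `p` is a point without a color, which is what separates the two types in one direction.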

Axiom 7 (UNION)
$$\forall X, T_1, T_2:\ in(X, T_1 \vee T_2) \leftrightarrow in(X, T_1) \vee in(X, T_2)$$

Consider two types rgb_color and hsv_color:

$$\texttt{rgb\_color} = \texttt{get\_color\_name:msg(string)} \wedge \texttt{get\_red:msg(int)} \wedge \texttt{get\_green:msg(int)} \wedge \texttt{get\_blue:msg(int)}$$
$$\texttt{hsv\_color} = \texttt{get\_color\_name:msg(string)} \wedge \texttt{get\_hue:msg(int)} \wedge \texttt{get\_saturation:msg(int)} \wedge \texttt{get\_value:msg(int)}$$

get_color_name:msg(string) is a supertype of their union type $(\texttt{rgb\_color} \vee \texttt{hsv\_color})$. A variant type, the disjoint union of both color types, could be defined with the help of tags:

$$(\texttt{rgb\_color\_tag:tag} \wedge \texttt{rgb\_color}) \vee (\texttt{hsv\_color\_tag:tag} \wedge \texttt{hsv\_color})$$

The following equations hold for unary methods; analogous equations hold for methods of other arities:

Theorem 2
$$\forall N, T, T_1, T_2:\ N : msg(T, T_1 \wedge T_2) = N : msg(T, T_1) \wedge N : msg(T, T_2)$$
$$\forall N, T, T_1, T_2:\ N : msg(T, T_1 \vee T_2) = N : msg(T, T_1) \vee N : msg(T, T_2)$$
$$\forall N, T_1, T_2, T:\ N : msg(T_1 \wedge T_2, T) = N : msg(T_1, T) \vee N : msg(T_2, T)$$
$$\forall N, T_1, T_2, T:\ N : msg(T_1 \vee T_2, T) = N : msg(T_1, T) \wedge N : msg(T_2, T)$$

Since, for example, $\texttt{n:msg}(\texttt{int} \vee \texttt{string}, \texttt{void}) = \texttt{n:msg(int, void)} \wedge \texttt{n:msg(string, void)}$, implementing message n by a single method accepting $(\texttt{int} \vee \texttt{string})$ and implementing it using overloading by two methods accepting int and string respectively are equivalent. However, not every composed type expression containing primitive messages with the same name is equivalent to a single primitive message type. For example, $(\texttt{n:msg(int, int)} \wedge \texttt{n:msg(real, real)})$¹ is a supertype of n:msg(real, int), but not equivalent to it.

¹ Assuming int is defined as a subtype of real (e.g. by defining $\texttt{real} = \texttt{real\_tag:tag}$ and $\texttt{int} = \texttt{real} \wedge \texttt{int\_tag:tag}$).
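The overloading example and the int/real example can be explored with a brute-force Python sketch that enumerates every behavior of a single message n over a tiny value space. Agreement over such a finite space illustrates the statements but does not prove them. The tuple encoding of types, the value sets chosen for int, real, string and void, and the helper names are assumptions made for this illustration.

```python
from itertools import product

# Assumed extensions of the base types; int is a subset of real.
ATOMS = {"int": {1}, "real": {1, 1.5}, "string": {"a"}, "void": {"VOID"}}
ARGS = [1, 1.5, "a"]
RESULTS = [None, 1, 1.5, "VOID"]  # None: send is undefined for that argument


def value_in(v, t):
    """Membership of a plain value in a base type or a composed value type."""
    if isinstance(t, str):
        return v in ATOMS[t]
    op, left, right = t
    if op == "and":
        return value_in(v, left) and value_in(v, right)
    return value_in(v, left) or value_in(v, right)


def receiver_in(f, t):
    """Does a receiver whose message n behaves like the partial map f have type t?"""
    if t[0] == "msg":  # ("msg", T_a, T_v), following the MSG axiom
        _, t_arg, t_val = t
        return all(
            f[a] is not None and value_in(f[a], t_val)
            for a in ARGS
            if value_in(a, t_arg)
        )
    op, left, right = t
    if op == "and":
        return receiver_in(f, left) and receiver_in(f, right)
    return receiver_in(f, left) or receiver_in(f, right)


def extension(t):
    """All behaviors of n (one per candidate receiver) that satisfy type t."""
    receivers = (dict(zip(ARGS, rs)) for rs in product(RESULTS, repeat=len(ARGS)))
    return [f for f in receivers if receiver_in(f, t)]


single = ("msg", ("or", "int", "string"), "void")
overloaded = ("and", ("msg", "int", "void"), ("msg", "string", "void"))
narrow = ("msg", "real", "int")
wide = ("and", ("msg", "int", "int"), ("msg", "real", "real"))

print(extension(single) == extension(overloaded))
print(all(f in extension(wide) for f in extension(narrow)))
print(extension(wide) == extension(narrow))
```

The receiver that maps 1.5 to 1.5 belongs to `wide` but not to `narrow`, which is the kind of witness that separates the two types in the last example.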

The following illustrates the subtype relationship of fields, considered as types composed from types implementing getter and setter methods.² The types on the third line implement a readable and writable field of type real and int respectively. Their supertype in the center implements a field that can be written only with an int but might return a real when read. Their subtype on the bottom line implements a field that can be written with a real but returns an int.

[Figure: subtype lattice of field types, supertypes at the top, subtypes at the bottom]
Line 1: $\texttt{set\_n:msg(int,void)}$ ; $\texttt{get\_n:msg(real)}$
Line 2: $\texttt{set\_n:msg(real,void)}$ ; $\texttt{set\_n:msg(int,void)} \wedge \texttt{get\_n:msg(real)}$ ; $\texttt{get\_n:msg(int)}$
Line 3: $\texttt{set\_n:msg(real,void)} \wedge \texttt{get\_n:msg(real)}$ ; $\texttt{get\_n:msg(int)} \wedge \texttt{set\_n:msg(int,void)}$
Line 4: $\texttt{set\_n:msg(real,void)} \wedge \texttt{get\_n:msg(int)}$

² This is similar to the relationship of expression, variable and acceptor types in the programming language Forsythe [6].

4 Recursive and Polymorphic Types

4.1 Recursive Types

Our axiomatization so far is too weak to provide notions of equality and subtype relationship between recursively defined types. Amadio and Cardelli [1] give algorithmic characterizations of the subtype relationship between recursive types. In our type specification language, recursion is expressed via references by globally scoped names (URIs of type definitions) instead of an explicit µ operator. It should be possible to map these global definitions to µ expressions in the same way as mutually recursive function definitions can be mapped to letrec expressions.

4.2 Polymorphic Types

An unbounded polymorphic type can be specified by using a function instead of a constant as type name in the defining equation. The polymorphic type is then not available as an object (within first order logic), but a statement about all instances of the polymorphic type can be made. A value for the
type parameter has to be supplied within a URI referring to an instantiation of a polymorphic type.

$$\forall T:\ list(T) = \texttt{car:msg}(T) \wedge \texttt{cdr:msg}(list(T))$$

5 Around the Types

We outline some of the envisioned contexts of our type system for use as a Semantic Web modeling language.

5.1 Documents: Defining Equations

For the distributed modeling language, the primary unit which can be referred to by a URI is a type definition. It is an equation with a type name on its left side and a type expression on its right. Logically this can be viewed as a proposition stating that a reference expression containing the URI can be rewritten to the semantically equivalent right side. A set of such propositions can be grouped into a single document, thereby inheriting common properties such as author, date or the prefix of the URI.

5.2 Functional Meta Language

The constructor terms listed in section 2 can also be constructor terms of a functional programming or query language. This meta language can be used to implement specialized theorem provers (e.g. for evaluating the subtype predicate) and arbitrary syntax-related operations like validating sets of type specifications or performing mappings between them. The URI namespace can be extended to include not only names of defined types but also expressions of the meta language.

5.3 References and Distributed Processing

Since a URI contains the name of a server, it includes information about the location at which an expression (i.e. the rest of the URI after the server name) is evaluated. Together with the functional view of URI dereferencing, this suggests that optimization of a distributed functional query language can be expressed by rewriting refer(URI) expressions appropriately.

5.4 Derivations: Proofs

The basic units of our language are defining equations, i.e. propositions published as Web documents. Usually meta information such as author and publication date is associated with a Web document. The view of a Web document as a
proposition suggests considering this meta information as part of a proof of that proposition. Such a proof can include subproofs: the proofs of the defining equations to which it (transitively) refers via URIs. Besides Web meta information, proofs will include records of the operations involved in constructing the type. Since different versions of types will be identified by different URIs³, such a derivation provides a version history of a type.

The inclusion of derivations extends the extensional view of a Web reference (dereferencing as a semantics-preserving rewriting step) to an extension-intension pair that also contains a representation of intensional aspects (a proof or derivation, including epistemic information such as authorship). The following example sketch illustrates this. # is the extension-derivation pairing operator; the displayed right sides of # can be considered as pretty-printed derivation terms:

$$\texttt{point} = \texttt{get\_x:msg(int)} \wedge \texttt{get\_y:msg(int)}\ \ \#\ \ \texttt{author=john}$$
$$\texttt{colored\_point} = \texttt{point} \wedge \texttt{get\_color:msg(color)}\ \ \#\ \ \texttt{author=mary}$$

The query get(colored_point) to a Web server might then return the following expression-derivation pair:

$$\texttt{get\_x:msg(int)} \wedge \texttt{get\_y:msg(int)} \wedge \texttt{get\_color:msg(color)}$$
# mary says: "$\texttt{colored\_point} = (\texttt{point} \wedge \texttt{get\_color:msg(color)})$"
  john says: "$\texttt{point} = (\texttt{get\_x:msg(int)} \wedge \texttt{get\_y:msg(int)})$"
  hence colored_point = result

5.5 Extensions

The type system as outlined above is very general. It should provide a basis for extensions such as more restrictive type systems. This includes data modeling approaches, such as entity-relationship modeling, and concrete systems of modeling and programming languages, such as the Java or the ODMG type system. Constraints expressing the properties of these systems have to be specified along with mappings from arbitrarily structured groups of types.
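The extension-derivation pairing of section 5.4 can be sketched in Python as follows. The sketch dereferences a type name by rewriting references to their right sides and collects one "says" record per definition used. The `DEFINITIONS` table, the representation of a right side as a list of intersected conjuncts, the `&` separator standing for intersection, and the `get` function are all assumptions made for this illustration; definitions are assumed to be non-recursive.

```python
# Hypothetical store of defining equations: name -> (intersected conjuncts, author).
# A conjunct is either a primitive type written as text or the name of a definition.
DEFINITIONS = {
    "point": (["get_x:msg(int)", "get_y:msg(int)"], "john"),
    "colored_point": (["point", "get_color:msg(color)"], "mary"),
}


def get(name):
    """Return (expanded conjuncts, derivation records) for a defined type name."""
    conjuncts, author = DEFINITIONS[name]
    right_side = " & ".join(conjuncts)
    derivation = [f'{author} says: "{name} = ({right_side})"']
    expanded = []
    for conjunct in conjuncts:
        if conjunct in DEFINITIONS:  # a reference: rewrite it to its right side
            sub_expanded, sub_derivation = get(conjunct)
            expanded.extend(sub_expanded)
            derivation.extend(sub_derivation)
        else:  # a primitive type: keep it as it is
            expanded.append(conjunct)
    return expanded, derivation


expression, derivation = get("colored_point")
print(" & ".join(expression))
for record in derivation:
    print(record)
```

Keeping the derivation as a plain list of records is the simplest choice here; a tree of subproofs would preserve which definition referred to which.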
³ This is also suggested for RDF-Schema [5].

6 Conclusion and Further Research Issues

We believe that some characteristics of the outlined modeling language are important features for distributed semantic expression in the Web, especially
the composition of types from primitives and operations and the embedding of equational specification into the URI space.

Straightforward logical property-object-value statements are only positive assertions about the extension of a property. Negative facts, which delimit the extension, are implicit in Horn clause programming and Datalog through the restriction to minimal models with respect to the set of propositions of the program or database. For a logic program or a deductive database, this set of propositions is completely determined. In a distributed setting like the Semantic Web, however, it is a problem to determine when the set of propositions that have to be considered for an inference task is complete. The use of equational specification avoids this problem, since an equation characterizes positive as well as negative aspects.

Nevertheless, there are a number of issues that must be clarified by further research:

- Formalize the subtype relationship of recursive types in the sense of [1].
- Specify algorithms, e.g. for deciding subtype relationship and equality.
- Show how algorithms can be derived from the axiomatization.
- Show how features of various type systems can be simulated.
- Show uses of union types.
- Explore notions of removal of information, such as the projection of a type to messages with names from a certain set, or a difference operator for types.

References

[1] Roberto M. Amadio and Luca Cardelli: Subtyping recursive types. ACM Transactions on Programming Languages and Systems, 15(4):575-631, September 1993.
[2] Tim Berners-Lee, Dan Connolly, Ralph R. Swick: Web Architecture: Describing and Exchanging Data, W3C Note 7 June 1999, http://www.w3.org/1999/06/07-webdata.
[3] Wolfgang Bibel et al.: Wissensrepräsentation und Inferenz: eine grundlegende Einführung, Braunschweig, 1993.
[4] Luca Cardelli: Type Systems. In Handbook of Computer Science and Engineering. CRC Press, 1997.
[5] Resource Description Framework (RDF) Schema Specification, W3C Proposed Recommendation 03 March 1999, http://www.w3.org/rdf/.
[6] John C. Reynolds: Design of the Programming Language Forsythe, Technical Report CMU-CS-96-146, School of Computer Science, Carnegie Mellon University, Pittsburgh, 1996.
[7] Gio Wiederhold: Mediators in the Architecture of Future Information Systems. In IEEE Computer, March 1992, pp. 38-49.