Author: Miguel Ángel Corella Montoya Supervisor: Pablo Castells Azpilicueta


Advanced Studies Diploma, EPS UAM, 2006

Table of contents

1. Introduction
2. State of the art
    Semantic Web
        RDF / RDFS
        OWL
    Web Services
        SOAP
        WSDL
        UDDI
    Semantic Web services
        OWL-S
        WSMO
        WSDL-S
        SWSO
    Web Service classification
        Non heuristic approaches
        Heuristic approaches
    Other related research areas
3. The Web service classification problem
    Classification problem presentation
    The need for service semantics
4. The heuristic
    Category level
    Service level
    Parameter level
5. Heuristic usage example
6. Implementation and experiments
    The classification framework
        Offline Data Controllers
        Ontology Controllers
        Measure Controllers
        User interface
    Experiments and evaluation
        Accuracy test
        Performance test
        Results evaluation
7. Conclusions and future work
Acknowledgements
References
Publications based on this research work

1. Introduction

Since Tim Berners-Lee first proposed the World Wide Web [4] more than fifteen years ago, the number of WWW users has grown to the point that the Web has become a commonplace commodity both at the workplace and at home, next to other everyday technologies such as the telephone or the TV [11]. As the number of users (providers, brokers, consumers, etc.) kept increasing and the requirements on WWW infrastructures grew more demanding in many respects, an assortment of technologies (e.g. HTML, HTTP, CGI, ASP, JSP, PHP, Flash, etc.) was developed to respond to these needs. Today, the World Wide Web keeps evolving at a steady pace, and it is hard to predict what it will become a decade from now.

One of the most recent and influential trends in the evolution of the WWW is the so-called Semantic Web [5], also promoted by Tim Berners-Lee. This trend aims to enable computers to understand Web content, since nowadays all the information contained in the Web is intended to be consumed by human agents. Computers could then use the information instead of only presenting it (e.g. search engines exploiting the meaning of words and phrases instead of matching predefined keywords) and would thus be able to automate tasks that are performed today by humans (e.g. complex query resolution, information composition, etc.).

At the same time as the Semantic Web vision appeared, the Web underwent another evolutionary trend that has made it not only a collection of documents providing static information, but also a collection of functionalities deployed through the Web, providing procedural capabilities and manipulating data with a certain meaning (e.g. flight tickets, hotel reservations, etc.).
In this direction, Web service technologies (SOAP, WSDL and UDDI) were developed by the end of the past decade to better support the extension of the Web from a distributed source of information to a distributed source of service [22], providing a new model for the Web in which dynamic information is exchanged on demand and users are able to perform real changes in the world [1].

In this context, extending the ideas of the Semantic Web to the description of functionalities, and not only content, is natural. The confluence of the Semantic Web and Web Services has given rise to the field of Semantic Web services [32], which aims to use semantics to endow Web services with a higher potential for automation by attaching extra semantic information beyond current WSDL-based [12] descriptions, so that software programs can analyze and manipulate them. This manipulation is needed to enact capabilities such as automatic selection, invocation, composition, location or discovery of the underlying Web services.

The work presented here focuses on a specific service management task related to the ones just mentioned, namely service publication. (Semantic) Web service publication consists not only of making services accessible through the Web (i.e. offering the service functionality through a URL) but also of including them in service repositories where service consumers can look for the functionalities they need. Nowadays, UDDI is the most widely accepted and used protocol for publishing, searching and finding services all over the Web [26]. These actions are usually performed using UDDI registries, which can be seen as the service repositories mentioned above, easily accessible through a URL. In addition to their publishing and querying capabilities, UDDI registries have a very relevant feature, key to other service tasks (e.g. selection, location or discovery): they enable published services to be classified under some kind of service taxonomy (e.g. UNSPSC, the United Nations Standard Products and Services Code, or NAICS, the North American Industry Classification System), so that service consumers can reduce the number of services they have to search through by using the service class as a search filter.

Unfortunately, the service classification process is performed manually by service providers. This brings some problems that can render those service taxonomies useless for publishing or discovering services:

- Common service taxonomies such as the ones mentioned above (i.e. UNSPSC and NAICS) consist of so many classes (e.g. UNSPSC has over twenty thousand) that the probability of service producers misclassifying published services is very high, making it impossible for service consumers to rely on the taxonomy classification in their service location processes.
- Taxonomies are subject to change and evolution, or even complete replacement by new ones, which makes the maintenance load on repository administrators even heavier.

Having explained the problem, the main goal of the work presented here is to provide automatic mechanisms that help service providers (i.e. service publishers) in the service classification task. This research presents a heuristic that provides publishers with a ranked list of the service categories in which a newly published service fits best. This will enable service consumers to rely on service classifications, thus easing the effort needed in (automatic) service location or discovery.

The document is organized as follows. Section 2 presents the state of the art regarding the techniques, technologies and research trends involved in (semantic) Web service classification. Section 3 states the problem of Web service classification in more detail and explains why service semantics are needed to solve it successfully. Section 4 presents and explains our heuristic and the parts into which it can be divided. Section 5 applies the classification heuristic to a concrete set of Semantic Web services as a usage example. Section 6 presents the classification and test framework developed as part of this work, together with the analysis and evaluation of the classification results obtained with it. Finally, Section 7 discusses the conclusions reached during the research process.


2. State of the art

This section presents the state of the art in the aspects, issues and disciplines involved in the domain of Web service classification as it has been understood in this research. These disciplines include the Semantic Web, Web Services, Semantic Web services and Web service classification. The main objective of the section is to give a general view of the technological situation at the time this research was carried out, in order to explain the considerations and decisions taken along the research process.

2.1. Semantic Web

Nowadays, the Web can be seen as a container of almost all available information. Using a good search engine (e.g. Google), any user can find the piece of information he or she is looking for with little effort. Nevertheless, the current situation can be improved, as there are cases in which the Web (and its search engines) shows some weaknesses. Some examples are analyzed next [11]:

- If the query "articles about Tim Berners-Lee" is entered in a Web search engine, many of the results will point to articles written by Tim Berners-Lee, which is not the information the user is looking for.
- If someone is looking for information about the jaguar (the animal), Web search engines will return a huge number of links to pages about Jaguar (the car brand). Again, this is not the information the user is looking for.

These are clear examples of the main limitation of the current Web: its technologies are not able to capture (i.e. formally represent) the semantics of the content presented. Both the content and the interface provided by the Web today are intended to be consumed by human beings, not by machines (i.e. software) [11]. Figure 1 reinforces this point by showing how information that humans interpret easily is nearly impossible for software programs to process automatically.

Fig. 1. Content as it is presented for humans (left image) versus the same content as it is processed by machines (i.e. software programs) (right image). [11]

The Semantic Web [5] is therefore a new trend that seeks to overcome the limitations of the current Web. Its foundation is to formally add semantic information to Web content in a way that enables computers to consume and understand it. With those semantic annotations, software programs (e.g. search engines) would be able to distinguish between pieces of content beyond the words (namely keywords) they contain (e.g. relationships between words, such as "about" or "by", can be represented and then used in a retrieval process). Figure 2 illustrates the structural differences between how content is organized in the current Web and how it could be organized in the Semantic Web.

Fig. 2. Content as it is structured in the Web (left image) versus the same content as it is structured in the Semantic Web (right image). [11]

To achieve this formal representation of content semantics, the Semantic Web relies on the concept of ontology. The most widely accepted definition is the one presented by Gruber, who states that an ontology is a formal, explicit specification of a shared conceptualization. That is, a formal (i.e. consumable by software programs) representation of a knowledge domain (e.g. paintings, in the example above) shared and agreed upon by all the agents dealing with that domain, in order to enable semantically consistent communication between them (e.g. the ontology concept Painter has to carry the same semantics for all the agents using it). In short, ontologies offer shared vocabularies of classes and relationships among them, used to describe a specific knowledge domain [11].
The idea of the Semantic Web is thus to have Web content semantically annotated with concepts (and relationships) defined in shared domain ontologies, enabling computers to better understand the content they process and, in this way, to improve the tasks performed with that content, e.g. information retrieval. For example, the query presented above, which uses the relation "about", can be resolved successfully as long as "about" and "written by" are represented as different semantic relations in an ontology covering the domain of writers, articles and so on.

Once the use of semantics to improve the current Web is clear, the next issue to solve is how the semantics (i.e. the ontologies) can be represented formally. Since the appearance of the Semantic Web, many languages have been created to solve this representation problem, each with its advantages and drawbacks. Examples are SHOE, OIL, DAML+OIL, WSML, etc. A brief introduction to some of the ontology languages in use today follows. The presentation is not exhaustive, as that is out of the scope of this work: as will be seen later, the classification proposal made in this work is language agnostic, so the language used to represent the semantics involved in the classification process is not really an issue.

2.1.1. RDF / RDFS

The Resource Description Framework (RDF), a W3C Recommendation since 1999, provides means for adding semantics to a document without making any assumption about its structure. RDF is an XML application (i.e. the syntax of the language is based on XML) customized for adding meta-information to Web documents. The description model provided by RDF consists of three main object types:

- Resources: A resource can be an entire Web page, a fragment of a Web page, a whole collection of pages or even an object not directly accessible via the Web (e.g. printed books, films, etc.). Resources are always identified uniquely by assigning them URIs.
- Properties: A property is a specific aspect, characteristic, attribute or relation used to describe a resource.
- Statements: A specific resource (subject) together with a named property (predicate) and the value (object) of that property for the selected resource is called an RDF statement. RDF statements are themselves resources, which enables recursive representations.

With those elements, RDF allows the semantics of any knowledge domain to be represented. But how are the properties used to describe a resource themselves defined? This is where RDF Schema (RDFS) appears. As with XML and XML Schema, RDF is used to describe content (e.g. the different books in a library) and RDFS is used to define the structure of that content (e.g. the Book and Author types of resources, the "written by" property, etc.). In short, RDFS allows the definition of the types of resources (also called classes) involved in a domain and of the properties that describe them, forming a graph or network between resources.
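The subject, predicate and object structure of RDF statements can be illustrated with a minimal Python sketch. It models statements as plain tuples instead of using an RDF library, and the `http://example.org/library#` URIs, the resource names and the `objects` helper are assumptions introduced only for illustration.

```python
from collections import namedtuple

# An RDF statement is a (subject, predicate, object) triple.
Statement = namedtuple("Statement", ["subject", "predicate", "object"])

# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/library#"
# The standard rdf:type property, which links an instance to its class.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Book and Author play the role of classes that an RDFS schema would define;
# the statements below populate them with instances, which is the role of RDF.
graph = {
    Statement(EX + "TheLionTheWitch", RDF_TYPE, EX + "Book"),
    Statement(EX + "CSLewis", RDF_TYPE, EX + "Author"),
    Statement(EX + "TheLionTheWitch", EX + "writtenBy", EX + "CSLewis"),
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {s.object for s in graph
            if s.subject == subject and s.predicate == predicate}


authors = objects(graph, EX + "TheLionTheWitch", EX + "writtenBy")
```

Because every element of a statement is a URI, two independent documents that use the same URI for "written by" are guaranteed to be talking about the same relation, which is what keyword matching cannot offer.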
RDF, in turn, allows the semantic networks created with RDFS to be populated with specific instances of the resources and properties defined in the schema.

2.1.2. OWL

The Web Ontology Language (OWL) is an evolution of the earlier ontology language DAML+OIL and has been a W3C Recommendation since February 2004. It is built on the foundations established by RDF/RDFS and extends the capabilities of the ontologies described with those languages. Some of the extended aspects are:

- Class definition using both value restrictions and cardinality restrictions over defined properties. For example, it is possible to define the class "C.S. Lewis Books" as a restriction stating that the value of the property author must be C.S. Lewis.
- Class definition using Boolean operators over other defined classes (e.g. class union, class intersection, class negation, etc.).
- Relationships between classes (e.g. conjunction, disjunction, equivalence, etc.).
- Relationships between properties (e.g. inverse, symmetric, transitive, etc.).
- Cardinality restrictions (e.g. properties with exactly one value).
- Enumerated classes.

Nevertheless, this added expressivity has a major drawback when the information described in OWL is intended to be used for reasoning or inference, because there are situations in which a reasoner cannot guarantee that:

1. All the conclusions that must be reached are, in fact, reached.
2. The calculations required to reach the conclusions are performed in a finite time.

This limitation led to the creation of three variations of the OWL language in order to enable its use (although with some restrictions) for reasoning activities. These variations, together with the restrictions they place on ontology expressivity, are presented next:

- OWL Full: Provides the maximum expressivity that OWL can offer but does not guarantee either of the properties just mentioned (i.e. that all the conclusions are reached, and in a finite time).
- OWL DL: Provides the maximum expressivity that still ensures all the conclusions are reached in a finite time. To enable this, the main restriction is that the concept of meta-class cannot exist, i.e. a class cannot be an instance of another class.
- OWL Lite: Restricts the use of cardinality restrictions so that only the cardinalities 0 and 1 can be used. In addition, all the restrictions established in OWL DL also apply to OWL Lite.

The relation between these three variations can therefore be expressed as follows:

- Valid OWL Lite ontologies are also valid OWL DL ontologies. Conclusions obtained by inference over OWL Lite ontologies are also valid conclusions in OWL DL.
- Valid OWL DL ontologies are also valid OWL Full ontologies. Conclusions obtained by inference over OWL DL ontologies are also valid conclusions in OWL Full.

Because of the main drawback of OWL discussed above, the validity of the inverse relations between the language variations cannot be ensured.
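The class constructors listed for OWL (value restrictions, cardinality restrictions and Boolean combinations) can be mimicked with a minimal Python sketch in which a class is simply a membership test over a set of triples. This is only an illustration under simplifying assumptions: the book and author data, the helper names and the closed-world membership check are all hypothetical, whereas an actual OWL reasoner works with description-logic semantics and an open-world assumption.

```python
# Triples as (subject, property, value) tuples; all data here is hypothetical.
triples = [
    ("Narnia", "author", "C.S. Lewis"),
    ("Perelandra", "author", "C.S. Lewis"),
    ("TheHobbit", "author", "J.R.R. Tolkien"),
    ("GoodOmens", "author", "Terry Pratchett"),
    ("GoodOmens", "author", "Neil Gaiman"),
]


def values(individual, prop):
    """All values that `prop` takes for `individual` in the triple list."""
    return {v for s, p, v in triples if s == individual and p == prop}


def has_value(prop, value):
    """Value restriction: members have `value` among the values of `prop`."""
    return lambda individual: value in values(individual, prop)


def exact_cardinality(prop, n):
    """Cardinality restriction: members have exactly `n` values for `prop`."""
    return lambda individual: len(values(individual, prop)) == n


def intersection(*classes):
    """Boolean operator: members belong to every one of the given classes."""
    return lambda individual: all(c(individual) for c in classes)


cs_lewis_books = has_value("author", "C.S. Lewis")
single_author = exact_cardinality("author", 1)
solo_lewis_books = intersection(cs_lewis_books, single_author)

books = sorted({s for s, _, _ in triples})
lewis = [b for b in books if cs_lewis_books(b)]
```

Here `cs_lewis_books` corresponds to the "C.S. Lewis Books" restriction mentioned in the text, and `single_author` uses a cardinality of 1, which is the kind of cardinality restriction that OWL Lite still allows.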

2.2. Web Services

As has been mentioned, the World Wide Web is not only a collection of static content, but also a collection of dynamic content and functionalities, i.e. Web services, that operate through the Web and provide some kind of value (e.g. flight tickets, hotel reservations, etc.). As stated in [22], these Web services extend the Web from a distributed source of information to a distributed source of service, providing a model for the Web in which dynamic information is exchanged on demand and users are able to perform real changes in the world [1].

But what exactly is a Web service? There is much ongoing discussion about the perfect definition of what a Web service is. For this work, I have chosen the definitions given by Priest in [30], who notes that today the notion of service is semantically overloaded. Let us look at each of the definitions offered in his work.

Priest stresses the importance of understanding the differences between services in the real world and Web services, which are often mixed up in the literature. A real-world service, generally speaking, is a provision of value in some domain [30]. Customers using the service (i.e. users or service consumers) receive something in exchange for paying for the service. These services can be described in some language related to the service domain, but such a description does not have to include the interaction between service consumers and service suppliers.

There clearly has to be some relation between real-world services and Web services, so the definition above gives the main ideas of what a Web service is. Nevertheless, there are some slight differences that should be considered:

- Web services are usually offered without any need for payment, although there are some examples (e.g. SMS messaging) in which some sort of payment is required.
- The value provided by Web services does not need to be something with a monetary value, but something that is useful for consumers (e.g. a piece of information extracted from a database).
- Web services are offered through the Internet, using standards and/or protocols to achieve the communication between providers and consumers.
- Web services can be seen as sets of functionalities available through the Internet (publicly or privately) that provide some sort of result when executed (i.e. when the service is consumed).

Given these differences, Web services can be formally defined as computational entities accessible over the Internet (using some particular standards and protocols) providing something of value, in the context of some application domain [30]. In addition, Priest states the importance of differentiating between particular provisions of value (e.g. the price of the book The Lord of the Rings), named concrete services, and the general capability to provide a value (e.g. book price lookup), named abstract services. Every concrete service (i.e. a specific execution of a Web service) is thus a particular instance of an abstract service (i.e. the description of the whole capability of a Web service). In general, when the term Web service is used in this work, it refers to the latter type of service, that is, the abstract ones.
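The distinction between abstract and concrete services can be made tangible with a small Python sketch. The class names, the parameter names and the price value are hypothetical and only mirror the book price example from the text; they are not taken from [30].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractService:
    """General capability to provide a value, e.g. book price lookup."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass(frozen=True)
class ConcreteService:
    """One particular provision of value: a single execution of a capability."""
    abstract: AbstractService
    arguments: dict
    result: dict

    def conforms(self):
        """True if the execution uses exactly the declared inputs and outputs."""
        return (set(self.arguments) == set(self.abstract.inputs)
                and set(self.result) == set(self.abstract.outputs))


# Abstract service: the capability as a whole.
price_lookup = AbstractService("BookPriceLookup", ("title",), ("price",))

# Concrete service: one instance of that capability (the price is made up).
lotr_price = ConcreteService(
    price_lookup,
    {"title": "The Lord of the Rings"},
    {"price": 25.0},
)
```

Under this reading, classification operates on objects like `price_lookup`, since a category describes what a service is able to do and not the outcome of one invocation.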


To further describe the concept of Web services, some of their most relevant features are listed next [2]:

- Interoperability: Because Web services act as a standard interface to some internal functionality, they enable that functionality to be used independently of the client environment (e.g. operating system, programming languages, communication protocols, etc.).
- Ease of use: As the internal functionality of Web services is publicly accessible through the Internet, integrating it in other systems is quite simple: no requirements are placed on the client side and few or no settings are needed to make use of that functionality.
- Reusability: As the functionalities are published in the Web and, as already mentioned for the features above, under no restrictions, their reusability is maximized.
- Formalization: Web service descriptions (i.e. the pieces of information available for the discovery and invocation of Web services) are expressed in a formal language, thus enabling consumption by both humans and software programs, although automatic consumption of Web services is not yet achieved as well as would be desirable (see Section 2.3).
- Ubiquity: As Web services are published in the Internet and are accessible through a URL (i.e. in the same way as all other resources in the current Web), the ubiquity of the functionalities offered is clear.

Having presented the main features of Web services, we now turn to the standards and technologies that support them. Since the appearance of the Web service trend [1], several technologies have been developed and used to deal with services. In this work, we focus on the three main standards that have been most used and accepted by the Web service community: SOAP, WSDL and UDDI.
Figure 3 shows how these three standards relate to the participants involved in a service publication and consumption cycle.

Fig. 3. Participants of the service publication and consumption cycle and how they are related to the different Web service related standards.

2.2.1. SOAP

The Simple Object Access Protocol (SOAP) [25] defines XML-based information that can be used for exchanging structured and typed information between peers in a decentralized, distributed environment. It is, fundamentally, a stateless, one-way message exchange paradigm, and it is silent on issues such as message routing, reliable data transfer, firewall traversal, etc. All these features (and the specification details briefly presented next) have made SOAP the most commonly used protocol for communicating with Web services, and the latest version of the specification (SOAP 1.2) has been a W3C Recommendation since June 2003.

SOAP Messages

The standard specification starts by showing an example SOAP message that illustrates the components it defines:

    <?xml version='1.0'?>
    <env:Envelope xmlns:env="...">
      <env:Header>
        <m:reservation xmlns:m="..." env:role="..." env:mustUnderstand="true">
          <m:reference>uuid:093a2da1-q...r-ba5d-pqff98fe8j7d</m:reference>
          <m:dateAndTime>...T13:20:...:00</m:dateAndTime>
        </m:reservation>
        <n:passenger xmlns:n="..." env:role="..." env:mustUnderstand="true">
          <n:name>Miguel Angel Corella</n:name>
        </n:passenger>
      </env:Header>
      <env:Body>
        <p:itinerary xmlns:p="...">
          <p:departure>
            <p:departing>New York</p:departing>
            <p:arriving>Los Angeles</p:arriving>
            <p:departureDate>...</p:departureDate>
            <p:departureTime>late afternoon</p:departureTime>
            <p:seatPreference>aisle</p:seatPreference>
          </p:departure>
          <p:return>
            <p:departing>Los Angeles</p:departing>
            <p:arriving>New York</p:arriving>
            <p:departureDate>...</p:departureDate>
            <p:departureTime>mid-morning</p:departureTime>
            <p:seatPreference/>
          </p:return>
        </p:itinerary>
        <q:lodging xmlns:q="...">
          <q:preference>none</q:preference>
        </q:lodging>
      </env:Body>
    </env:Envelope>

As the example shows, SOAP messages have three main components, one of which is not mandatory. Figure 4 shows the structure of a SOAP message with these three elements graphically. The components are:

- SOAP Envelope: This is the main element of a SOAP message and acts as the container of the actual message contents, i.e. the message header and body (whose content is application defined and not part of the SOAP specifications).
- SOAP Header: This is the non-mandatory element mentioned above. It is intended to contain information that is not considered application payload and that can be used as control information (e.g. passing directives or contextual information related to the processing of the message). Headers thus enable messages to be extended in application-specific ways.
- SOAP Body: This is the mandatory element inside a SOAP envelope, and it is where the main end-to-end information conveyed in a SOAP message must be carried. All the information exchanged is therefore included here.

Fig. 4. Graphical representation of a SOAP message containing both the SOAP Header and the SOAP Body elements [25].

In short, SOAP messages are extensible XML blocks that allow the exchange of application-defined information (specified apart from the SOAP specification). The next issue to explore is how these messages are exchanged.

SOAP Message Exchange

The simplest exchange model for SOAP is a request-response pattern. Some early uses of SOAP (version 1.1) emphasized this pattern as a means for conveying remote procedure calls (RPC). Nevertheless, it is important to note that not all SOAP request-response exchanges can be modelled as RPC calls.

A much larger set of usage scenarios than that covered by the request-response pattern can be modelled simply as XML-based content exchanged in SOAP messages to form a back-and-forth conversation between two endpoints. As SOAP is a stateless protocol, application-defined header blocks accompanying each message in the conversation can be used to correlate the messages exchanged between endpoints at the application level. For instance, in the example presented, a reservation header block with the same value of reference can be used to achieve the correlation.

SOAP also provides a model for handling possible faults during the communication process. SOAP distinguishes between the conditions that result in a fault and the ability to notify that fault to the sender of the faulty message. Details about how faults are notified, the structure of notification messages, etc., are too specific and thus out of the scope of this document.

SOAP Processing Model

The processing model specified by SOAP describes how a node has to process a received SOAP message. The node is required to analyze those parts of a message that are SOAP-specific (the ones that belong to the env namespace). As can be seen, those elements make up the structure of the envelope, which is the only part that a node has to process, since the rest of the information contained in the message is intended to be consumed by the exchanging applications.

SOAP Protocol Bindings

SOAP messages can be exchanged using a variety of underlying protocols, including other application layer protocols. The specification of how SOAP messages may be transferred between SOAP nodes using a specific protocol is called a SOAP binding. Bindings provide the mechanisms to support the different functionalities (e.g. encryption, reliability, etc.) needed by a SOAP application. These functionalities are identified by a URI, so that all applications referencing one use exactly the same semantics. Although different SOAP bindings can be defined, the only normative binding for the current version of SOAP (i.e. version 1.2) is the one for HTTP 1.1. Other bindings are possible but are not standardized in the current recommendation.

2.2.2. WSDL

The Web Service Description Language (WSDL) [12] is an XML-based language for describing Web services. The latest version of the specification (namely, version 2.0) has been a W3C Candidate Recommendation since March 2006. Nevertheless, as WSDL 2.0 descriptions are not the ones used throughout this research, I include here the information about a previous version, more precisely version 1.1. An example of a WSDL-described service (taken from the language specification) is shown next, containing all the elements involved in Web service description. In addition, Figure 5 shows the structure of WSDL graphically in order to ease its understanding. The example describes a Stock Quote service.
    <?xml version="1.0"?>
    <definitions name="StockQuote" ...>

      <message name="GetTradePriceInput">
        <part name="tickerSymbol" type="xsd:string"/>
        <part name="time" type="xsd:timeInstant"/>
      </message>

      <message name="GetTradePriceOutput">
        <part name="result" type="xsd:float"/>
      </message>

      <portType name="StockQuotePortType">
        <operation name="GetTradePrice">
          <input message="tns:GetTradePriceInput"/>
          <output message="tns:GetTradePriceOutput"/>
        </operation>
      </portType>

      <binding name="StockQuoteSoapBinding" type="tns:StockQuotePortType">
        <soap:binding style="rpc" transport="..."/>
        <operation name="GetTradePrice">
          <soap:operation soapAction="..."/>
          <input>
            <soap:body use="encoded" namespace="http://example.com/stockquote"
                       encodingStyle="http://schemas.xmlsoap..."/>
          </input>
          <output>
            <soap:body use="encoded" namespace="http://example.com/stockquote"
                       encodingStyle="http://schemas.xmlsoap..."/>
          </output>
        </operation>
      </binding>

      <service name="StockQuoteService">
        <documentation>My first service</documentation>
        <port name="StockQuotePort" binding="tns:StockQuoteSoapBinding">
          <soap:address location="..."/>
        </port>
      </service>

    </definitions>

Fig. 5. Graphical representation of a service description based on WSDL containing all the different elements provided by the language specification.

Let us briefly comment on each of the elements provided by the WSDL specification:

- Messages: These elements describe abstractly the format of a particular communication message between the service and its consumers. The format of these messages is typically described in terms of XML elements and attributes. Messages are composed of one or more parts, each describing an atomic portion of the message, i.e. a service interface parameter (e.g. the ticker symbol in the example).
- Operations: Sets containing the messages (more precisely, references to the message definitions made earlier in the WSDL description) that the operation accepts as input and output. Operations can be seen as the methods of a programming-language API.
- Port types: Collections of operations grouped in order to assign them a common message exchange mechanism.

- Bindings: These components bind a port type to a concrete message format and transmission protocol. The WSDL specification itself does not state concrete binding details.
- Ports: These elements specify the endpoint at which a service is available and in what terms. This way, each port associates a binding (and, through it, a port type) with a specific network address.
- Services: These are the top-level components described in WSDL. They describe the set of ports that a service provides, thus exposing both the functionality provided by the service and how to access it.

Finally, in addition to all these functional elements, the WSDL specification supports the inclusion of documentation (intended only for human consumption) describing different aspects of a service in natural language.

UDDI

Universal Description, Discovery and Integration (UDDI) [26], whose latest version (v3.0) has been an OASIS standard since 2005, provides mechanisms to find Web services over the Internet and, in addition, to associate with them some extra information not included in WSDL descriptions (e.g. textual descriptions). Using UDDI, service consumers can dynamically look up, as well as discover, services provided by external business partners. All these functionalities are implemented in UDDI registries, i.e. repositories enabling the browsing and discovery of services all over the Web. UDDI registries can have two different kinds of clients. First, there can be businesses willing to publish service descriptions and the interfaces needed to access or invoke those services. Second, there can be clients willing to obtain service descriptions of a certain kind and bind to them programmatically. UDDI itself is layered over the SOAP messaging protocol and assumes that all the requests and responses sent to or from a UDDI registry are UDDI objects exchanged as SOAP messages.
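The two client roles described above (publishers and inquirers) can be illustrated with a minimal in-memory sketch. It is only a toy stand-in and does not use the UDDI API or SOAP; the names ToyRegistry, ServiceRecord, publish and find, as well as the sample providers and URLs, are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """One published service: who offers it, what it does, where its WSDL lives."""
    provider: str
    name: str
    description: str
    wsdl_url: str


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry; NOT the UDDI API."""
    records: list = field(default_factory=list)

    def publish(self, record: ServiceRecord) -> None:
        # Role 1: a business publishes a service description.
        self.records.append(record)

    def find(self, keyword: str) -> list:
        # Role 2: a client looks services up by a keyword that must occur
        # in the service name or in its textual description.
        needle = keyword.lower()
        return [r for r in self.records
                if needle in r.name.lower() or needle in r.description.lower()]


registry = ToyRegistry()
registry.publish(ServiceRecord("ExampleAir", "FlightBooking",
                               "Books airline tickets",
                               "http://example.com/flights?wsdl"))
registry.publish(ServiceRecord("ExampleBank", "StockQuote",
                               "Returns the trade price of a ticker",
                               "http://example.com/stockquote?wsdl"))

matches = registry.find("ticket")
print([m.name for m in matches])
```

A real registry would add the categorization and technical structures that the UDDI standard defines; the sketch only shows why keyword lookup over human-oriented descriptions is a weak basis for automatic discovery.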
As has been seen, WSDL only allows the representation of the functional and technical specification of Web services, i.e. the syntactic information about services. Nevertheless, in order to have meaningful Web services, more information is needed about them beyond this syntactic information. One of the central purposes of UDDI is the representation of data and metadata about Web services, i.e. extra information allowing further Web service management. For instance, a UDDI registry, either publicly or privately accessible, offers standard mechanisms to represent some extra information (e.g. classification or categorization of Web services, the most relevant feature for this research) so that services can be discovered and consumed in a more flexible way. In conclusion, UDDI can be used to represent (and store in repositories accessible over the Internet) information about Web services in a standard way, so that different sorts of queries can be posted to UDDI registries in order to cover the following scenarios:

- Find Web services compatible with an abstract definition of their functionality and/or interface (i.e. the same or equivalent input and output definitions).

- Find Web service providers classified according to a known classification scheme or identifier system.
- Determine the different communication features (e.g. security, transport protocols involved in the communication with the service, etc.) supported by a given Web service.
- Retrieve services from a set of keywords generally related to the Web services the service consumer is looking for.
- Cache technical information about Web services and update that information automatically at runtime.

Once the usage scenarios of UDDI and UDDI registries have been presented, the most relevant elements and structures involved in the standard are introduced:

- businessEntity structure: This structure is used to represent businesses and/or service providers within a UDDI registry. It contains descriptive information about both the providers and the services they offer (e.g. names and textual descriptions in multiple languages, contact and classification information, etc.). Functional and technical information about services is represented in different structures (explained next). The businessEntity information can be seen as the UDDI white pages, as it allows the location of businesses and the inspection of their internal information.
- businessService structure: This structure is used to represent a logical grouping of Web services. At this level, there is still no information about the different interfaces or functionalities provided by those services, only information that allows the grouping of different services under a common rubric (e.g. services providing equivalent functionalities). Each businessService is the logical child of a businessEntity and contains, again, descriptive information (e.g. names, descriptions and classifications) outlining the purpose of the individual Web services contained in the group.
  The information provided by this structure can be seen as the UDDI yellow pages, as it is intended for the location of services based on their description and purpose.
- bindingTemplate structure: The bindingTemplate structure allows the representation of an individual Web service. In this case, the structure contains the functional and technical information needed by applications to bind to and interact with the Web service being described. This structure must contain either the access point at which the service can be found or an indirection mechanism leading to that access point. The information provided in this type of structure can be seen as the UDDI green pages, as it provides the technical information needed to interact with the published Web services.
- tModel structure: Although this component is not as directly related to Web services as the ones already presented, it is important because tModels are in charge of representing all the different classification and categorization taxonomies. As the UDDI model is based on the notion of shared specifications about different information related to Web services, tModels allow the storage, access and management of these different types of information in an independent, but centralized, way. For example, imagine a Web service whose functional description, textual description and categorization information are to be stored in a UDDI registry. UDDI can then reference three different tModels (one for each information type) and treat them as independent information, although they are all related to the same Web service.

Now that the purposes and structures of UDDI have been presented, the last point to be covered is how UDDI registries are used for the location of Web services (i.e. their main purpose). The different mechanisms supported by the UDDI specification are analyzed next:

- It allows registry users to define multiple classification taxonomies (using tModels) that can be used for service organization. This way, by using tModels, Web services can be organised in an arbitrary number of different ways, supporting any classification and retrieval scheme.
- It allows such a classification system to be applied to every entity in the registry. This way, not only individual services can be classified, but also businessEntities and businessServices.
- Finally, the UDDI Inquiry API provides a powerful set of tools for querying UDDI registries, from the simplest queries (e.g. all the services provided by a business) to the most complex ones (e.g. queries combining different criteria such as owners, classification, description keywords, etc.).

Semantic Web services

Web services provide very useful means to populate the Internet with functionalities (easily accessible through a URL) that maximize the reusability of the underlying implementation of those functionalities. Moreover, as these functionalities are provided and used through widely accepted standards and protocols, Web services also maximize the interoperability between systems, i.e. service consumers have no need to know the specific features of the environment (e.g. computer specifications, operating system, development language, etc.) where the implementation of the Web service is running. Nevertheless, Web services present some limitations when intended to be used with no (or minimal) human supervision:
- As has been seen in the presentation of the UDDI standard (in the previous subsection of this work), the service retrieval capabilities provided today by existing service repositories are neither flexible nor good enough to perform automatic discovery and selection (as most of the information managed by repositories is intended for human interaction).
- The interaction process in systems using Web services is almost hard-coded, as there is no possibility of discovering and selecting services at runtime (because of the previous limitation). There are some non-semantic trends and technologies, e.g. EAI, ESB, SOA (e.g. [15], [28] and [31]), quite similar to each other, which provide an approximate solution to this limitation by easing the task of integrating services in a system (or of using them to integrate different applications).
- The creation of a complex process composed of a set of different service invocations is a very hard and inflexible task since, first, the services have to be previously found and manually selected and, second, the complete interaction between them has to be coded by the service composition developer (who has to know exactly the interface and return values provided by each of the services involved in the overall complex process).

In conclusion, Web services provide a great level of interoperability between systems and flexibility for the reuse of previously developed pieces of software and functionalities, but their usage requires great effort from system developers and/or administrators. In order to overcome these limitations, a new research trend has appeared that aims to use Semantic Web technologies (already presented in this work) in conjunction with Web service technologies in order to endow Web services with a higher potential for automation. This new trend has been named Semantic Web services [32]. In a less formal way, the main goal of Semantic Web services is to add semantic information (e.g. using domain ontologies and other Semantic Web techniques) to current WSDL-based service descriptions (already presented) in order to enable the manipulation of services by software programs, easing the automation of different tasks related to the Web service life cycle. Figure 6 shows graphically the relation between all the technologies already commented in this section of the work (i.e. the Web, the Semantic Web, Web services and Semantic Web services).

Fig. 6. Chart showing the relation between Web, Semantic Web, Web Services and Semantic Web services and their most relevant technologies and/or standards currently available.

Let us comment on some of these Web service life cycle tasks from the point of view of Semantic Web services, as they are presented in the specification of one of the main description languages of this area, namely OWL-S [24].

- Automatic Web service discovery: This task consists of the automatic location of Web services that can provide a particular class of service capabilities, while adhering to some client-specified constraints.
  For example, the user may want to find a service that sells airline tickets between two given cities and accepts a particular credit card.
- Automatic Web service invocation: In this case, the goal is the invocation of a Web service by a software program or agent, given only a declarative description of that service, as opposed to the case in which the agent has been pre-programmed to call that particular service. This is required, for example, so that the user can automatically request the purchase of an airline ticket for a particular flight, once the service has been selected, without having to create the specific messages involved in the communication. The invocation of Web services can be seen as a set of remote procedure calls.
- Automatic Web service composition and interoperation: This last task involves the selection, composition and interoperation of Web services to perform some complex task, given a high-level description of an objective. For example, the user may want to make all the travel arrangements for a trip to a conference without having to carry out all the different steps manually.

So, the last issue that has to be solved is how the semantics needed to automate all these service-related tasks can be captured and formally represented in a way that software programs can understand and use. There are different Semantic Web service description language proposals. The most relevant and accepted ones in the Semantic Web service community (although none of them has become an official standard with the status of W3C Recommendation at the time of this writing) are presented next.

OWL-S

The service description language OWL-S (Web Ontology Language for Web Services) [24] has been built over the foundations established by DAML-S [10]. The goal pursued by this language is to provide a collection of tools allowing Web service annotation (i.e. augmentation with semantic information) in such a way that the objectives of the Semantic Web service vision (mentioned before) can be fulfilled. OWL-S is an ontology enabling service providers to capture and formally represent the semantics related to a Web service (by attaching to each service element one or more semantic concepts defined in any sort of ontology). This ontology focused on Web services is based on the OWL ontology language and, so, it provides service representations that are compatible with other languages such as XML and RDF. This way, elements or tools defined and used in such compatible languages (e.g. domain ontologies described in RDF, or XML validators) can be used in conjunction with the OWL-S ontology for the description of Web service semantics. OWL-S provides an upper ontology that can be used (by its specialization and/or instantiation) to describe any sort of Web service. This upper ontology is mainly composed of four different classes.
Let us briefly present each class and its main purpose. Figure 7 shows the representation of the upper ontology as it is presented in the specification.

- The Service class. This class aims to offer a representation of the whole service itself. It is used as a link between the other three main classes (pieces of information) provided by the OWL-S ontology.
- The ServiceProfile class. This class aims to represent what the service does and who is providing it. This way, it can contain information such as provider contact information, a textual description of the service, service parameters and service capabilities/functionalities.
- The ServiceModel class. This class aims to represent how the service works. In this way, it can contain information such as which processes (operations) are offered by the service, which inputs and outputs are requested and provided for the invocation of a specific process, how complex processes (involving more than one service) are scheduled, etc.
- The ServiceGrounding class. This last class aims to represent how to access the service, that is, how the service represented at a conceptual level in OWL-S descriptions can be accessed and invoked, typically by using the underlying syntactic WSDL description and all the existing standards for non-semantic Web services.
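As a rough illustration of how the Service class links the other three classes, the upper ontology can be written down as a handful of subject-property-object statements. The property names presents, describedBy and supports are the ones used by OWL-S; the Python structures and the linked_classes helper are assumptions made for this sketch and do not correspond to any OWL-S tooling.

```python
# Upper-ontology links written as (subject, property, object) statements.
UPPER_ONTOLOGY = [
    ("Service", "presents", "ServiceProfile"),
    ("Service", "describedBy", "ServiceModel"),
    ("Service", "supports", "ServiceGrounding"),
]

# Informal purpose of each linked class, as summarised in the text.
PURPOSE = {
    "ServiceProfile": "what the service does",
    "ServiceModel": "how the service works",
    "ServiceGrounding": "how to access the service",
}


def linked_classes(subject: str) -> dict:
    """Return {property: target class} for every statement about `subject`."""
    return {prop: obj for subj, prop, obj in UPPER_ONTOLOGY if subj == subject}


links = linked_classes("Service")
summary = [f"{prop} -> {cls}: {PURPOSE[cls]}" for prop, cls in links.items()]
for line in summary:
    print(line)
```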

Fig. 7. Graph showing the four main classes of the OWL-S upper ontology, the relationships between them and the main purpose of each class [24].

In OWL-S, each Service can present a set of ServiceProfiles, is describedBy at most one ServiceModel and supports exactly one ServiceGrounding. A closer analysis of the three OWL-S classes linked by Service follows.

ServiceProfile

When a service is provided, three main components have to be present: a service consumer, a service provider and a set of communication mechanisms that enable them to find each other, state the terms of their communication and establish a logical connection between them. To achieve these communication tasks, it has to be possible to access, and process, the description of what the provider offers, what the consumer needs and the terms of use of the service. This kind of information is what the OWL-S ServiceProfile contains. The information contained in a ServiceProfile is described next. Figure 8 shows the different types of information managed by the ServiceProfile and the relationships between them.

- Service general information: This type of information is usually intended to be consumed by human agents, so it basically consists of textual pieces of information. More precisely, the information included here is the name of the service, a textual description and the contact information of the provider.
- Service features: This second type of information can be divided into two subtypes: service categorization and service general attributes. The first one aims at describing the category to which the service belongs (e.g. the UNSPSC code or label of the category in which the service must be classified). The general attributes can be used as containers for any sort of non-functional properties such as Quality of Service (QoS) parameters.
- Service functionality: The description of the service functionality in the ServiceProfile is given from two different points of view. First, the data transformation, i.e. the data inputs and outputs that make up the communication interface with the service (e.g. flight departure and destination). Second, the prerequisites on the environment and the changes produced in it by the invocation of the service (e.g. the flight reservation being made).
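The three kinds of information just listed can be pictured as the fields of a single record. The following is only a sketch of that grouping, not the OWL-S vocabulary itself: the field names, the signature helper and the flight-booking values (including the category label) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceProfile:
    # Service general information (human-oriented text)
    name: str
    description: str
    contact: str
    # Service features
    category: str = ""                               # e.g. a UNSPSC code or label
    attributes: dict = field(default_factory=dict)   # non-functional, e.g. QoS
    # Service functionality: data transformation ...
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    # ... and environment prerequisites / changes
    preconditions: list = field(default_factory=list)
    effects: list = field(default_factory=list)

    def signature(self) -> str:
        """Compact view of the data transformation offered by the service."""
        return f"{self.name}({', '.join(self.inputs)}) -> {', '.join(self.outputs)}"


profile = ServiceProfile(
    name="FlightBooking",
    description="Reserves a seat on a flight between two cities",
    contact="sales@example.com",                 # hypothetical provider contact
    category="air-travel (hypothetical label)",
    attributes={"maxResponseSeconds": 5},        # hypothetical QoS attribute
    inputs=["DepartureCity", "DestinationCity"],
    outputs=["Reservation"],
    preconditions=["SeatAvailable"],
    effects=["SeatReserved"],
)
print(profile.signature())
```

In an actual OWL-S description each input, output, precondition and effect would refer to a concept defined in an ontology rather than to a plain string.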

Fig. 8. Graph showing the OWL-S ServiceProfile class and the relationships between the different concepts and information types [24].

ServiceModel

As stated in the specification of the OWL-S description language, the interaction with a service can be modelled as a process (e.g. a remote procedure call). This way, OWL-S supports the idea of using Process (a subclass of ServiceModel, based on workflow techniques and ideas) to describe how the service works internally. Nevertheless, the ServiceModel class remains available for developing any other type of internal service description. Processes describe the execution flow of a service from the reception of the first input to the provision of the final output; thus, compatibility between the Process description of a service and the interface described in the ServiceProfile must be ensured by service developers. OWL-S presents three different types of process that can be used to model how the service works and interacts with other participants (e.g. the user or other services inside a composition):

- Atomic process: Used to model services in which an input message is expected and an output message is provided. This type of process can be seen as a simple invocation of an underlying Web service.
- Composite process: Used to model complex processes in which some kind of state is maintained. They can be seen as complex workflow processes in which each action (e.g. a user action or an external service invocation) moves the execution of the overall process one step forward.
- Simple process: An abstraction used to ease the representation of both atomic and composite processes inside descriptions of composite services (as these descriptions can become very large as the complete process gets more complex).
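The difference between atomic and composite processes can be sketched in a few lines. The classes below are not an OWL-S implementation: the class names mirror the process types, but the run method, the dictionary-based state and the strictly sequential execution of a composite process are assumptions chosen to keep the example small.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


class AtomicProcess:
    """One input message in, one output message out."""

    def __init__(self, name: str, operation: Callable[[State], State]):
        self.name = name
        self.operation = operation

    def run(self, state: State) -> State:
        return self.operation(state)


class CompositeProcess:
    """Runs its steps in order, keeping the state accumulated so far."""

    def __init__(self, name: str, steps: List[object]):
        self.name = name
        self.steps = steps  # atomic or composite processes

    def run(self, state: State) -> State:
        for step in self.steps:
            # Each step sees the current state and contributes new entries.
            state = {**state, **step.run(state)}
        return state


# Hypothetical travel example: find a flight, then reserve it.
find_flight = AtomicProcess(
    "FindFlight", lambda s: {"flight": f"{s['origin']}-{s['destination']}"})
reserve = AtomicProcess("Reserve", lambda s: {"reserved": s["flight"]})
book_trip = CompositeProcess("BookTrip", [find_flight, reserve])

result = book_trip.run({"origin": "MAD", "destination": "LHR"})
print(result)
```

Because both classes expose the same run method, a composite process can contain other composite processes, which is roughly the role that the simple process abstraction plays in larger descriptions.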
Leaving simple processes aside (as they are only used as an abstraction of the other types of process), atomic processes can be seen as the basic unit of functionality described using the OWL-S Process (ServiceModel) class. This basic unit can be combined with other basic units (as well as with user interactions) to build more complex services (composite processes) that offer more complex capabilities and maintain some kind of state. In order to describe how this state evolves as the composite process is executed, OWL-S offers some control structures (e.g. sequences, if-then-else, loops, etc.) enabling the description of execution flows in the same way a programming language does. Figure 9 shows the complete structure of a ServiceModel (and its specialization ProcessModel) and the different relationships between all the involved components.

Fig. 9. Graph showing the complete structure of the OWL-S ServiceModel (in particular, its ProcessModel specialization) and the relationships between all the involved components [24].

ServiceGrounding

Grounding can be seen as a mapping from a conceptual description to a syntactic description of the elements used to describe how the interaction with a service has to take place (i.e. the different parameters involved in the invocation of the service). OWL-S offers a specific grounding enabling the mapping from the inputs and outputs of atomic processes to the parameters of WSDL descriptions. The problem then is how to map semantic parameter descriptions (in OWL-S, typically, OWL ontology concepts) to WSDL inputs and outputs (defined by XML Schemas). As both OWL and XML Schema are based on XML, the OWL-S specification proposes the usage of XML transformation languages (i.e. XSLT) to translate OWL concepts into WSDL messages. Thus, the mapping between OWL-S and WSDL is performed using the following rules, which Figure 10 shows graphically:

- An OWL-S atomic process is mapped to a WSDL operation.
- The set of inputs of an atomic process is mapped to a single WSDL input message, and the same is done with the set of outputs.
- OWL concepts used in OWL-S descriptions are mapped to WSDL complex types.
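The three mapping rules can be traced with a small function that turns the description of an atomic process into a WSDL-like structure. This is a simplified sketch: the ground_atomic_process function, the dictionary layout and the tns: prefix are assumptions, and the XSLT step that would actually translate OWL instances into XML messages is left out.

```python
def ground_atomic_process(name, inputs, outputs):
    """Map an atomic process to a WSDL-like operation description.

    `inputs` and `outputs` map parameter names to OWL concept names.
    """
    def message(suffix, params):
        # Rule 2: all inputs (or all outputs) end up in one single message,
        # with one part per parameter.
        return {
            "name": f"{name}{suffix}",
            "parts": [{"name": p, "type": f"tns:{concept}"}
                      for p, concept in params.items()],
        }

    return {
        "operation": name,                           # Rule 1: process -> operation
        "input": message("Input", inputs),
        "output": message("Output", outputs),
        # Rule 3: every OWL concept used needs a corresponding complex type.
        "complexTypes": sorted(set(inputs.values()) | set(outputs.values())),
    }


# Hypothetical concepts for the stock quote operation used earlier.
grounding = ground_atomic_process(
    "GetTradePrice",
    inputs={"tickerSymbol": "TickerSymbol", "time": "TimeInstant"},
    outputs={"result": "Price"},
)
print(grounding["input"]["name"], grounding["complexTypes"])
```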

Fig. 10. Graph showing how the different elements of an OWL-S service description are mapped to WSDL in order to enable the invocation of Semantic Web services [24].

WSMO

The Web Service Modelling Ontology (WSMO) [7] provides, as the rest of the proposals do, the means (more precisely, as in the previous case, an ontology) for the formal conceptual description of different aspects related to Web services. To do so, WSMO is expressed in the Web Service Modelling Language (WSML) [8], already presented, and built over the foundations specified by the Web Service Modelling Framework, WSMF [17] (i.e. the ontology provides definitions for all the concepts involved in the framework). As stated in the WSMO specification, this Semantic Web service description language is founded on the following design principles:

- The concept of URI (Uniform Resource Identifier) is used in order to achieve the unique identification of resources. In addition, WSMO also uses the well-known concept of namespaces in order to distinguish between the different vocabularies involved in service descriptions. Finally, it offers the possibility of representing WSMO descriptions in an XML-based format and in other standards supported by the W3C. In conclusion, WSMO tries to be compatible with current Web technologies.
- Semantic Web service descriptions based on WSMO rely completely on the usage of domain ontologies, i.e. all the resources, parameters and other elements involved in the description of a service are defined as concepts contained in different domain ontologies (as opposed to other description languages, e.g. WSDL-S, in which some syntactic description elements are involved). In order to maximize its interoperability, WSMO provides means to support the usage of all the different existing ontology languages.
- All the different elements involved in WSMO descriptions are described in an independent way, i.e. service element descriptions are completely independent of the description of the possible interactions between those service elements.

- WSMO maintains the desirable independence between service consumers and providers, allowing the usage of different domain ontologies by each participant involved in service invocations. This way, WSMO offers means to perform the translation/transformation between different knowledge representations.

As already commented, WSMO is an ontology expressed in WSML and based on the foundations established by WSMF. To understand the relation between both elements, WSMF can be seen as the set of principles, design decisions and/or architectural elements. On the other hand, WSMO is the specific vocabulary that can be used to define Web services based on WSMF principles. Figure 11 shows the main components or architectural elements proposed by WSMF.

Fig. 11. Web Service Modelling Framework (WSMF) and, so, Web Service Modelling Ontology (WSMO) top-level elements [7].

Next, each of the top-level elements will be presented in order to complete the presentation of the WSMO Semantic Web service description language.

Ontologies

As already commented, WSMO-based descriptions rely completely on the usage of ontologies (i.e. all the elements involved in descriptions are concepts defined in different domain ontologies). Thus, this is a key element of the framework, as it provides the means to describe all the different vocabularies that will be involved in further service descriptions. The concept of ontology has already been presented in previous sections of this work, so in this section I will only focus on how ontologies are constructed in WSMO. Next, a piece of WSML code defining the concept of ontology and all its available properties can be found.
Class ontology
    hasNonFunctionalProperties type nonFunctionalProperties
    importsOntology type ontology
    usesMediator type ooMediator
    hasConcept type concept
    hasRelation type relation
    hasFunction type function
    hasInstance type instance
    hasAxiom type axiom

Let us comment on the different pieces of information contained in ontology descriptions:

- Non-functional properties: A set of key-value pairs offering information about the described element that is not related to its functionality (e.g. creator, date, version, etc.).

- Ontology importation: WSMO descriptions rely completely on the usage of many different ontologies, so mechanisms are provided to import all the ontologies needed by one specific description.
- Mediator usage: As conflicts can exist between the ontologies involved in a specific definition (e.g. two ontologies defining exactly the same concept in different ways), mechanisms to resolve these conflicts are provided. The concept of mediation will be presented properly later.
- Concept definition: This is the most obvious section of an ontology definition, as it is here that all the semantic concepts managed by the ontology are described. To do so, the concept class has the following description.

Class concept
    hasNonFunctionalProperties type nonFunctionalProperties
    hasSuperConcept type concept
    hasAttribute type attribute
    hasDefinition type logicalExpression multiplicity = single-valued

- Relationship and function definition: This is another key section for the definition of ontologies, as both relationships (multiple domain and multiple results) and functions (multiple domain and unique result) enable the creation of semantic networks in the ontology by describing the interactions between concepts. For this purpose, the relation and function classes are defined as described next.

Class relation
    hasNonFunctionalProperties type nonFunctionalProperties
    hasSuperRelation type relation
    hasParameter type parameter
    hasDefinition type logicalExpression multiplicity = single-valued

Class function sub-class relation
    hasRange type concept multiplicity = single-valued

- Instance definition: Instances allow the definition of specific individuals of the concepts defined in the ontology (e.g. Spain would be an instance of the concept Country). For such definitions, WSMO offers the concept instance, whose definition can be found next.
Class instance
    hasNonFunctionalProperties type nonFunctionalProperties
    hasType type concept
    hasAttributeValues type attributeValue

- Axioms: Finally, axioms enable the description of different WSMO elements (e.g. concepts) from the point of view of their internal logic. For example, the concept SpanishBanks can be defined as all the instances of Bank having Spain as the value of their locatedAt property. WSMO offers the axiom class to define such logical propositions.

Class axiom
    hasNonFunctionalProperties type nonFunctionalProperties
    hasDefinition type logicalExpression

Web services

A WSMO Web service definition consists of a set of non-functional properties (i.e. the same idea used in the non-functional section of the OWL-S ServiceProfile: textual description, name, etc.), a set of functional properties (i.e. inputs, outputs, preconditions and effects in the OWL-S ServiceProfile) and the description of the behaviour of the service (i.e. the same goals as the OWL-S ServiceModel). Next, the WSML piece of code showing the structure of a Web service definition in WSMO can be found.

Class WebService
    hasNonFunctionalProperties type nonFunctionalProperties
    importsOntology type ontology
    usesMediator type {ooMediator, wwMediator}
    hasCapability type capability multiplicity = single-valued
    hasInterface type interface

Let us comment on the different pieces of information contained in Web service descriptions. Only those not previously presented are included (as some of them already appeared in the definition of WSMO ontologies):

- Service capabilities: The definition of service capabilities stands for the definition of the service functionality based on its preconditions (i.e. data inputs for the service), assumptions (i.e. state of the world before service execution), postconditions (i.e. data outputs) and effects (i.e. state of the world after service execution). Although the names of the elements involved differ from the ones used in OWL-S (i.e. inputs, outputs, preconditions and results), the goal of these elements is exactly the same. The description of the WSMO capability element can be found next.

Class capability
    hasNonFunctionalProperties type nonFunctionalProperties
    importsOntology type ontology
    usesMediator type {ooMediator, wwMediator}
    hasSharedVariables type sharedVariables
    hasPrecondition type axiom
    hasAssumption type axiom
    hasPostcondition type axiom
    hasEffect type axiom

- Service interfaces: WSMO interfaces are intended to describe how the service functionality can be used, i.e. how the underlying Web service can be invoked and how it works internally. Nevertheless, at the time this research was completed, this section of the WSMO proposal was not mature enough to be used (i.e. many details were not yet included in the WSMO specification). So, I will not include a complete explanation of how these interfaces are used. Only the WSML description of the interface concept is included.

Class interface
    hasNonFunctionalProperties type nonFunctionalProperties
    importsOntology type ontology
    usesMediator type {ooMediator, wwMediator}
    hasChoreography type choreography
    hasOrchestration type orchestration

Goals

A goal in WSMO is used to represent the objective that will be reached, or that is expected to be reached, by the invocation of a service. Thus, goals can be seen as service capability descriptions that can be used, for example, in service discovery mechanisms aiming to find a service fulfilling the goal expressed by a service consumer. Next, the structure of the WSMO goal element can be found. As it only includes pieces of information already commented, no further details will be given.

Class goal
    hasNonFunctionalProperties type nonFunctionalProperties
    importsOntology type ontology
    usesMediator type {ooMediator, wwMediator}
    requestCapability type capability
    requestInterface type interface

Mediators

Informally, WSMO mediators can be seen as components allowing the inclusion, in service descriptions, of external services that perform different mediation tasks between two components of those descriptions (e.g. mediation between two ontologies can resolve possible incompatibilities between them, such as different representations of the same concept). Next, the structure of a general mediator can be found.

Class mediator
    hasNonFunctionalProperties type nonFunctionalProperties
    importsOntology type ontology
    hasSource type {ontology, WebService, goal, mediator}
    hasTarget type {ontology, WebService, goal, mediator}
    hasMediationService type {goal, WebService, wwMediator}

Depending on the elements involved in the mediation, WSMO distinguishes between four different types of mediators. Let us see each one of them, as well as their definition in WSML format (i.e. the different specializations of the general mediator description):

- ggMediators: Goal-to-goal mediators can perform tasks such as goal refinement/translation, allowing the conversion of goals into other equivalent ones. This is useful, for example, in service discovery. The user can define the specific goal that the service he is looking for must fulfil, and this specific goal can be refined into equivalent but more general goals, increasing the probability of finding a service providing the desired results.
    Class ggmediator sub-class mediator
        usesmediator type oomediator
        hassource type {goal, ggmediator}
        hastarget type {goal, ggmediator}

- oomediators: Ontology to ontology mediators are critical for the usage of multiple ontologies in a single WSMO description, as they support essential tasks such as the resolution of incompatibilities, language translation, etc. This way, the reusability of ontologies is maximized, as ontology heterogeneity is solved almost transparently (once the service performing the needed translations and transformations is found).

    Class oomediator sub-class mediator
        hassource type {ontology, oomediator}

- wgmediators: Web service to goal mediators are used to represent the fact that a service (totally or partially) achieves the objectives described by a goal. This way, this type of mediator allows the representation of location information, as it relates Web services to the goals that can be fulfilled by their invocation.

    Class wgmediator sub-class mediator
        usesmediator type oomediator
        hassource type {WebService, goal, wgmediator, ggmediator}
        hastarget type {WebService, goal, ggmediator, wgmediator}

- wwmediators: Web service to Web service mediators can be used to achieve two different objectives. First, they can represent the equivalence between two services solving the same goal (e.g. this can be used for automatic failure recovery). Second, they can represent relations between two services involved in a composition process, i.e. how the outputs of the first invoked service have to be plugged into the inputs of the second one.

    Class wwmediator sub-class mediator
        usesmediator type oomediator
        hassource type {WebService, wwmediator}
        hastarget type {WebService, wwmediator}

WSDL-S

WSDL-S [16] has been developed by IBM in conjunction with the LSDIS at the University of Georgia. Of all the Semantic Web service description languages, this is the one that tries hardest to reuse the standards already accepted by the community. More precisely, it is based on some extensions over the WSDL standard, avoiding the creation of a completely new language and/or syntax for describing the semantics of Web services. WSDL-S has been designed and developed using the following foundations and criteria as guidelines:

- The description language should be based on the existing Web service standards most accepted in the community, in order to minimize interoperability issues between traditional services and semantics-based ones.
- The mechanism used to annotate Web services must be independent of the language used to describe the service semantics. This way, any language that can be used to capture domain semantics (e.g. OWL, RDF, etc.) could be used to annotate the different service elements.
- The mechanism used to annotate Web services must be able to add multiple annotations to a single Web service element, in order to maximize the flexibility of the annotations (e.g. annotation with similar but different domain ontologies, based on different description languages, etc.).
- The mechanism used to annotate Web services must be able to annotate XML Schema based data types directly, avoiding the creation of complex translation processes between Web service representations (typically, WSDL) and the different semantics description languages.
- The process needed to move between the semantic representation of the service and the syntactic one, i.e. the one that will finally be invoked (again, typically, WSDL descriptions), must be as easy and direct as possible.

Based on these design criteria, the WSDL-S description language is proposed as an extension of the WSDL Web service description language that allows the inclusion of semantic information directly in current WSDL descriptions. To do so, the extensibility capabilities of the WSDL specification are used, performing the following steps:

- WSDL elements (e.g. messages, operations, etc.) are extended in such a way that a one to one mapping can be performed from the syntactic description contained in WSDL to a conceptual description represented in some semantics description language (e.g. OWL).
- The XSD describing the WSDL elements grows by the addition of a new multi-valued attribute that allows multiple semantic annotations in each WSDL element.
- Two new elements are defined as part of the WSDL operation element, namely, precondition and effect. This way, the importance of the state of the world when invoking services is taken into account within each WSDL operation.
- A new element is included in the definition of the WSDL interface element (included in the WSDL 2.0 standard), enabling the inclusion, in service descriptions, of the categorization information of the service, which can be used, for example, at publication time in UDDI registries.

Once all the extensions to WSDL have been commented, the last issue that must be resolved is how the different WSDL elements have to be annotated using these extensions. WSDL-S states that the elements that have to be semantically annotated are parameters, operations and interfaces.

- Parameter annotation: All the parameters involved in WSDL messages must be annotated depending on their complexity. Atomic parameters (i.e. the ones defined by an XML Schema primitive data type) must be annotated with a concept defined in the semantic description language used. On the other hand, complex parameters (i.e. the ones defined by fragments of XML Schema) can be annotated in two different ways.
  First, by annotating each one of their internal basic parameters. Second, by assigning a concept defined in the semantic description language to the whole complex parameter or data type.
- Operation annotation: Preconditions and effects must be added to the operation definitions in order to reflect the interaction of the service with the real world and so enable the use of this information, for example, in service retrieval processes. These preconditions and effects must be defined in some sort of logical language, for example, SWRL, and must be annotated with the logical expression (or a reference to it) and an identifying name.
- Interface annotation: Interfaces are annotated with the category in which the service is classified. This categorization is performed by including two pieces of information: first, a URI identifying the taxonomy used for the classification (e.g. UNSPSC, NAICS, etc.) and, second, the name of the category in the selected taxonomy.

In conclusion, WSDL-S provides some interesting means to perform the semantic annotation of Web services, with the desirable feature that these semantically enhanced descriptions are completely compatible with current technologies (as the mechanisms used for the annotation are part of the WSDL standard). Nevertheless, the more compatible with current technologies a service description language is, the less expressivity it can provide, as current technologies are too oriented to syntax and include no means to represent, for example, complex parameter definitions based on logical axioms (as can be done in WSMO). More precisely, WSDL-S allows the addition of semantic labels to different WSDL elements, but leaves aside issues as important as how the service works, and other non-functional features such as QoS, etc.

SWSO

SWSO [3], the last Semantic Web service description language included in this work, is the most recent of all the proposals analyzed (more precisely, it appeared in 2005, less than a year before the time of this writing). Due to its novelty, this proposal has neither much acceptance in the community nor enough maturity to be taken into account in a possible development or research effort involving Semantic Web services. Nevertheless, it presents some very interesting ideas. With sufficient time and evolution (at least as much as the other existing proposals), SWSO could perhaps become the Semantic Web service description language finally standardized and issued as a Recommendation by the W3C. Due to the novelty of the specification, I will not include in this work the concrete details of the SWSO proposal. Nevertheless, a list with some (but not all) interesting features of the proposal follows:

- Semantic Web service descriptions in SWSO are created using logical axioms with the expressivity provided by First Order Logic. This way, every service property or feature is described in the form of predicates.
- These logic-based descriptions allow an easy integration with different reasoning tools, enabling the usage of reasoning in the achievement of the different Semantic Web service goals (i.e. discovery, invocation, composition, etc.).
- SWSO integrates some of the main ideas promoted by the three previous approaches, namely, the three-level definition of OWL-S, the logical axiom base of WSMO and the maximization of compatibility with existing Web service technologies of WSDL-S.
- SWSO offers very complete definitions for the least developed and mature points of the other approaches, namely, the service process model definition and the service grounding.

Web Service classification

In addition to the service related tasks already mentioned in the previous section, there are other issues related to services, not usually mentioned in the literature, which are very interesting as research topics and which could be key to achieving better results in the tasks already presented (i.e. discovery, invocation and composition). In this work, one of these alternative (or complementary) tasks has been explored; more precisely, the automatic (or semi-automatic) classification of Web services.

As the number of Web services populating the Web grows, it becomes more difficult to keep track of all of them. Therefore, as has already been commented, automatic mechanisms easing the location (discovery) of services over the Web are needed. In addition, having all the services organised in some way before the discovery processes take place can enable both new discovery mechanisms and improvements in the retrieval results of existing mechanisms [21].

Today, the tendency is to use predefined service taxonomies, containing hierarchies that represent all the different service types (i.e. classes or categories) that exist, to organise services inside service repositories (currently, UDDI registries). Although UDDI allows the usage of any service taxonomy (since version 2.0 of the standard), even user defined ones, some standard taxonomies are usually involved in service organisation, for example, UNSPSC or NAICS (which were the only ones supported by the first version of the UDDI standard), offering a large number of codes and labels to define many different service types or classes.

Nevertheless, classification information is not often included at service publication time. The problem appears when trying to select the category in which a service best fits, as standard taxonomies usually comprise thousands of different categories, codes and labels. Some research works have appeared (in addition to the one presented here) trying to offer (almost) automatic mechanisms to perform this service categorization, that is, automatic Web service classification. The most relevant ones are included here. The Web service classification problem has been addressed in prior work from two main approaches, which can be classified as heuristic (e.g. [27]) and non heuristic (e.g. [9] and [18]).
Let us present each one of the referenced proposals, show how they face the problem and comment on the main drawbacks of each one.

Non heuristic approaches

First, let us focus on the two proposals already referenced as non heuristic approaches. In the proposal presented in [18], the authors offer two different strategies to achieve Web service classification:

- Using the information contained in non-semantic, WSDL-based descriptions to select the category in which the service fits best.
- Dynamically creating the classification taxonomy by using the same non-semantic descriptions as input.

In both strategies, the procedure used by the authors to classify Web services follows almost the same steps.

1. They use word extractors, based on Natural Language Processing techniques, over both the WSDL descriptions and the textual information included in Web service containers (e.g. UDDI registries or online Web service aggregators such as BindingPoint.com).

2. They construct different term vectors (e.g. one for the input information, one for the output information, one for the textual information, etc.) with all the words that have been automatically extracted in the previous step.
3. Finally, they use different classification techniques for term vector categorization, namely, machine learning techniques using different classifiers (e.g. Naïve Bayes) in their first strategy and clustering techniques in the second one.

This way, the Web service classification problem is translated into a text classification problem since, once the keywords have been extracted from the WSDL descriptions, there is no trace of Web services in the rest of the process. Nevertheless, this text classification based approach presents three main drawbacks.

- The hypothesis of finding meaningful words in WSDL service descriptions is a very optimistic assumption, which in my experience does not usually hold. So, the main (valid) information source they use is the textual information contained in UDDI, and the approach therefore depends highly on the existence of this information, which, again, does not always exist for all the available services.
- Regarding the second strategy (i.e. the automatic creation of the service taxonomy using clustering techniques), the main problem is that the resulting taxonomy will not be meaningful for humans, as there is no possibility of giving a meaningful label to automatically created categories.
- Finally, automatically modifying the taxonomy (again in the clustering based approach) implies that the categories in which services are classified are extremely volatile, since they may change each time a new service is added to the repository.

Although the last two problems do not appear at publication time (i.e. when the classification takes place), they can be an issue if the service classification is intended to be used for service retrieval [21], since users could neither browse services by category (categories do not have a meaningful name), nor could they get properly acquainted with the taxonomy (its structure is highly unstable).

The approach to classification in [9] proposes steps similar to the ones described in [18]. The main difference is the usage of Support Vector Machines as the term vector classification mechanism. Again, as in the previously commented approach, service classification is reduced to a text classification problem. In addition, the proposal offered by the authors of [9] provides service publishers with extra information after the overall classification process has taken place. More precisely, a concept lattice, extracted using Formal Concept Analysis over the term vectors, is presented to the publishers. This lattice allows service developers to know how the words included in their service descriptions (the ones extracted during the classification process) contribute to the selection of a specific category. With this information, service developers could, for example, modify words in their service descriptions that may cause ambiguity in the classification process. Due to the similarity between this proposal and the one commented before, it shares the drawbacks pointed out above, namely the assumption of having meaningful words in descriptions and the fact that the classification process has little to do with services, as they are treated as pure pieces of text.
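The reduction of service classification to text classification described above can be illustrated with a minimal sketch. It is not the implementation used in [18] or [9]: the word extractor (a simple split of identifiers on letter case), the single term vector per service, the multinomial Naïve Bayes with Laplace smoothing and all service names and categories are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def extract_terms(text):
    """Hypothetical word extractor: split identifiers such as
    'getFlightPrice' into lowercase words ('get', 'flight', 'price')."""
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)
    return [w.lower() for w in words]


class NaiveBayes:
    """Multinomial Naive Bayes over term vectors, with Laplace smoothing."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)  # category -> term frequencies
        self.doc_counts = Counter()              # category -> number of services
        self.vocabulary = set()

    def train(self, text, category):
        terms = extract_terms(text)
        self.term_counts[category].update(terms)
        self.doc_counts[category] += 1
        self.vocabulary.update(terms)

    def classify(self, text):
        terms = extract_terms(text)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.term_counts[category]
            denom = sum(counts.values()) + len(self.vocabulary)
            # log prior + sum of smoothed log likelihoods of each term
            score = math.log(n_docs / total_docs)
            for term in terms:
                score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


# Invented training data: operation and parameter names taken as plain text.
classifier = NaiveBayes()
classifier.train("getFlightPrice departureAirport arrivalAirport", "travel")
classifier.train("bookHotelRoom checkInDate hotelName", "travel")
classifier.train("getStockQuote tickerSymbol", "finance")
classifier.train("convertCurrency amount exchangeRate", "finance")
print(classifier.classify("searchFlight airportCode"))
```

Once the terms are extracted, nothing in the classifier refers to operations, messages or data types, which is exactly the drawback pointed out above: a service whose identifiers are not meaningful words gives the classifier nothing to work with.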

Heuristic approaches

Regarding Web service classification based on the usage of heuristics, I would like to introduce here the work performed by the authors of [27]. Their proposal presents a framework for automating the semantic annotation of services (i.e. the description of service parameters based on domain ontology concepts). The proposal works as follows:

- A set of domain ontologies exists, used to annotate service descriptions, that is, to assign one or more semantic concepts to each one of the parameters that appear in the Web service interface.
- Domain ontologies are treated as service categories, as each one of them only contains the concepts related to one service domain, and each service domain can be treated as a service category.
- Service classification is used to perform both the annotation of the service with one service category and the selection of the most suitable domain ontology to be used during the semantic annotation of the service description.

To perform this domain ontology selection and service categorization, they rely on the hypothesis that there are structural similarities between the data types defined in the WSDL description (in XML Schema) and the structure of the semantic concepts contained in domain ontologies. Based on this hypothesis, they have developed an algorithm based on schema matching that enables both the comparison between data type definitions and the measurement of the similarity between them. The complete algorithm works as follows:

1. Translate the Web service data type schema to an intermediate representation developed by the authors.
2. Translate the domain ontology concepts to the same intermediate representation.
3. Use the algorithm developed to find similarity values between the service and the domain ontology (as both representations use the same syntax, this is done almost directly).
4. Select the domain ontology with which the service maximizes the similarity value, in order to use it for the automatic service annotation and for the categorization of the service.
5. Perform the rest of the automatic annotation process (it will not be explained, as it is out of the focus of this work).

Although this proposal is the only one (of the ones analyzed in this work) that focuses on using the Web services themselves for the classification (not only for the extraction of relevant words, as happens in the previously commented approaches), it has one main drawback that has to be taken into account. Best practices for Web service definition prescribe document-based descriptions, where service messages should consist of a single part defined by a complex schema (in the types section of the WSDL description) containing all the parameters needed to invoke the selected service operation. So, it is to be expected that most service descriptions will have this form of data type and message representation. The problem is that, with this form of representation, it would be difficult to find similarities with domain ontology concepts as, usually, no single domain concept will contain the complete structure of the service message. This way, the similarity values between services and domain ontologies will decrease and, thus, the probability of service misclassification will grow.

Other related research areas

In addition to the techniques and approaches already mentioned (i.e. the ones focusing exclusively on service classification), there are other research areas related to the one discussed in this work. More precisely, I would like to include here a brief reference to service matchmaking, as it is the one I consider most closely related to the research presented here. A lot of work has been performed regarding service matchmaking (e.g. [19], [23] and [29]). This research topic is related to Web service classification in that the classification approach followed in this work, for example, computes similarity degrees between services in order to assign them to a common category, while service matchmaking aims to find services that match a concrete capability description (which can be seen as a form of similarity computation) in order to automatically invoke them or compose them into further and more complex processes.

Nevertheless, although there are similarities between both research topics, there are also some differences that have to be taken into account before trying to use service matchmaking techniques within the Web service classification domain. The main difference between both research areas is that, while service classification admits some degree of fuzziness in service matching (i.e. the similarity measures used always provide a continuous similarity value), service matchmaking typically does not.
Since service matchmaking is typically used for the selection of services that will later be invoked, or even included in complex processes, the matching between the needed capabilities and the ones provided by services has to be much more rigid. This way, service matchmaking will (usually) use discrete matching levels between services or, more precisely, between services and representations of needs (e.g. no match, complete match, partial match, etc.).
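The contrast between both kinds of measure can be made concrete with a short sketch. Both functions below are assumptions introduced only for illustration: the continuous measure is a plain Jaccard overlap between sets of concept names, and the discrete one is a simple three-level rule. Neither is the measure used in this work nor the one used in the matchmaking proposals cited above.

```python
from enum import Enum


class MatchLevel(Enum):
    NO_MATCH = 0
    PARTIAL_MATCH = 1
    COMPLETE_MATCH = 2


def similarity(provided, requested):
    """Continuous degree in [0, 1]: Jaccard overlap of two concept sets."""
    provided, requested = set(provided), set(requested)
    if not provided and not requested:
        return 1.0
    return len(provided & requested) / len(provided | requested)


def match_level(provided, requested):
    """Discrete degree: does the service cover the requested concepts?"""
    provided, requested = set(provided), set(requested)
    if requested <= provided:
        return MatchLevel.COMPLETE_MATCH
    if provided & requested:
        return MatchLevel.PARTIAL_MATCH
    return MatchLevel.NO_MATCH


# Invented concept names, for illustration only.
offered = {"Flight", "Price", "Airport"}
needed = {"Flight", "Price", "Date"}
print(similarity(offered, needed))   # a value between 0 and 1
print(match_level(offered, needed))  # one of three discrete levels
```

A classifier can rank categories with the continuous value even when no service matches completely, whereas an invocation or composition engine needs the discrete answer to decide whether the service can be used at all.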

3. The Web service classification problem

In this section, the Web service classification problem is presented in order to clarify the motivation of the work presented here. The objectives pursued by this presentation and analysis of the problem are listed below:

- Providing a complete overview of the concrete scenario in which the classification problem appears, including examples of why current Web service technologies provoke this situation.
- Showing the need for Web service semantics (within the concrete domain of the problem) and so motivating why the work presented here has been developed using Semantic Web services.

The content of this section is therefore divided into two subsections, each one covering one of these objectives.

Classification problem presentation

Service categorization (classification) is commonly used to provide more information about services, enabling different approaches to the problem of Web service retrieval and discovery, be it by:

- Using the categorization to enable service consumers to browse manually through repositories in order to find the service that best fits their needs, although this service retrieval method is not likely to be used, as it could be very time consuming if the repositories are too large.
- Using the classification information to enrich the capabilities of software agents or mechanisms performing automatic service discovery (e.g. service search engines, UDDI repositories or even new automatic semantic service discovery approaches [20]).

At this point, it seems that service classification is not a problem but an advantage for other Web service tasks and, indeed, it is. Nevertheless, there are some issues involved in having a service taxonomy organizing the contents of service repositories that render the categorization useless as it is performed nowadays.
Let us present the most relevant ones:

- Classification taxonomies or categorizations can be extremely large, comprising thousands of categories within multiple hierarchical levels. Examples such as UNSPSC (~ categories) or NAICS (~ 2300 categories) have already been cited in this document.
- The number of Web services can grow quite large, not only in public repositories available over the Web, but also in private repositories of large companies, e.g. Microsoft Corporation or IBM Corporation, which have so many departments and so many reusable software components that it is feasible to think of an internal SOA organization in which services can be published, retrieved and consumed by all the different employees.
- The placement of a service under a proper category, manually performed by service publishers or repository administrators, requires a considerable amount of knowledge of the taxonomy, the service characteristics, the application domain, the overall organization of the repository, implicit guidelines, etc., in order to make good classification decisions. Few service publishers, or even repository administrators, could have all this knowledge.
- Versions 2.0 and 3.0 of the UDDI specification (used today in the development of UDDI service registries) do not check the validity of the categorization values introduced as part of service descriptions when these are published in the registry, i.e. categorization information is managed (e.g. taxonomy creation, standard taxonomies usage, used codes validity, etc.) and validated only by registry administrators.

As a consequence of these problems and drawbacks, the classification task often becomes overwhelming for service publishers and repository administrators, leading to such a high degree of misclassification that the service categorization information cannot be used in discovery and retrieval processes. This argument can be reinforced by taking a look at the complete process that a service publisher has to perform in order to publish a Web service in a repository. The objective of this example is to show the different directions the publisher could follow through the process and to analyze, e.g., the number of paths leading to correct or incorrect classifications, the effort involved in each of the publisher's actions, etc. The publication process is represented graphically in Figure 12.

[Figure 12: flowchart of the publication process. Initial task: Introduce service info. Intermediate tasks: Search taxonomy codes, Search service category, Select category. If-then-else decisions: Add category?, Find them?, Capable?, Correct selection?. Final tasks: Save service w/o category, Successful classification, Erroneous classification.]

Fig. 12. Web service publication process in a repository (e.g. UDDI repository) showing the different actions, final states and possible decisions taken by the publisher along the process.

Let us analyze each action independently to understand its complexity and the effort needed by the publisher to complete the whole process. This way, the usefulness of the proposed approach to ease the publication task can be motivated.

- Introduce service info (initial task): This task consists of introducing or preparing the information related specifically to the service but not to its categorization (e.g. service name, business supporting the service, service description, etc.).
- Add category? (if-then-else): This is the first point of the process at which the publisher could decide not to add category information to the service, for many reasons (e.g. the publisher does not understand the usefulness of adding such information, does not know anything about standard classifications or how to create a user-defined one, etc.).
- Search taxonomy codes (task): In this task, as UDDI no longer validates the usage of UNSPSC or NAICS, the service publisher is forced to search for the specification of these taxonomies and the different categories and codes used by them. Finding the specification is not a difficult task; the problem is to find representations of the taxonomies that enable publishers to browse through the huge number of categories. For example, UNSPSC is provided (for free) in PDF format on the official Web page of the standard, which is not a very useful format when there are more than classes to be read.
- Find them? (if-then-else): This second branching point of the process describes the possibility of deciding not to include classification information in the service publication, as it could be impossible to find the taxonomy specification and/or a way to browse it in order to find appropriate categories.
- Search service category (task): This task consists of searching (almost manually) through the complete service taxonomy for the class (and code) that best fits the domain of the service that is going to be published. If the publisher has found a browsing mechanism to manage the taxonomy, this task could be easier but, in any case, it is very costly. To show this, I include (in Figure 13) a screenshot of a Web page offering means to browse through the complete UNSPSC taxonomy.
- Capable? (if-then-else): This branching point describes the situation in which the service publisher, although able to find the taxonomy and having tried to search for the most suitable category, has been unable to find it and has decided not to add categorization information to the published service.
Select category task: In this task, the user has been able to find the taxonomy and search over it the service category. This is the last step, where the service publisher selects the category he/she believes to be the most suitable for the published service. Correct selection? if-then-else: This if-then-else task describes the situation in which the user has selected a category for the published service. Nevertheless, it is possible that the user selects the wrong category (e.g. too much general, too much specific, one that does not completely describe the domain of the service, etc.). Save service w/o category final task: This task consists on the final publication of the service without having included any categorization information. We can consider this as an erroneous final state within the problem domain we are treating in this document, as the categorization information (due to its inexistence) is useless for further discovery processes and/or other processes that can make use of it. Successful classification final task: This task consists on the final publication of the service having successfully selected the category in which best suits. This is the unique successful final state for the publication process as it is understood for this work. Erroneous classification final task: This task consists on the final publication of the service having added categorization information but being this information erroneous. Having selected a category for the service which is not the one in 24 Miguel Ángel Corella Montoya Page 36

which it best fits. Again, within the boundaries of the problem treated here, this final state can be seen as an erroneous state.

Fig. 13. Screenshot of the online UNSPSC browser that can be used (for free) to search for the category in which a Web service best fits. Although it can still be very difficult to find such a category, this online browser is an improvement over the PDF file offered on the official UNSPSC Web page.

It can be seen that, during the classification process, there are more paths leading to an unsuccessfully classified publication (4 out of 5 possible paths) than to a successful one, i.e. one that leaves a usefully classified publication in the repository. Moreover, most of the unsuccessful paths (3 out of 5) are due to issues related to the service taxonomy under which the publisher is trying to classify (e.g. taxonomy specification not found, impossible to find the desired classes, or wrong class selection).

In conclusion, the probability of having published services with wrong or missing classification information is so high that service publishers or repository administrators would have to make a great effort to check this information in order to enable its usage in further discovery processes. So, the work presented here aims at alleviating the work of service publishers and administrators by automatically providing them with a small set of likely appropriate taxonomy categories (ranked by likelihood of appropriateness) when a new service has to be registered in a repository.

Let us present how this proposal approaches the service classification problem. Given a set of service descriptions, already classified under some classification taxonomy, and a new service description, a heuristic for automated service classification is proposed. It is based on the comparison of the unclassified service with the set of already classified services, whereby a measure of the likelihood that the service should be assigned a certain category is computed.

In conclusion, it seems that all the work to be done consists of taking some existing service descriptions, developing a heuristic enabling service comparison and similarity computation, and starting to classify services. Unfortunately, it is not so simple. Why not? Because to develop such a heuristic, some information about the service is needed beyond its functionality description, that is, beyond the information contained in the existing WSDL-based service descriptions.

3.2. The need for service semantics

The previous section stated that the currently existing WSDL-based service descriptions are not sufficient to solve the classification problem. In this section I explain and illustrate this fact and, so, motivate the usage of Semantic Web service descriptions instead of WSDL-based ones in the work presented here.

As already mentioned, WSDL-based descriptions provide us with the complete details about the operations a service provides, as well as the input and output information (parameters) involved in the invocation or usage of those operations. In fact, this information enables comparisons between services. Nevertheless, this service comparison would be greatly enhanced if the descriptions were enriched with further semantic information, as can be seen in the following concrete example.

Let us take two different WSDL-based Web service descriptions: one of a service enabling currency conversion and the other of a service computing the distance between cities.
The complete descriptions of these services are the key issue for this example, so they are included next:

FIRST SERVICE DEFINITION: Currency Converter

<?xml version="1.0" encoding="utf-8"?>
<wsdl:definitions targetnamespace=" xmlns:apachesoap=" xmlns:impl=" xmlns:intf=" xmlns:soapenc=" xmlns:wsdl=" xmlns:wsdlsoap=" xmlns:xsd="
  <wsdl:types>
    <schema targetnamespace=" xmlns="
      <complextype name="currencyconverterrequest">
        <sequence>
          <element name="currencycode1" type="xsd:string"/>
          <element name="currencycode2" type="xsd:string"/>
        </sequence>
      </complextype>

      <complextype name="currencyconverterresponse">
        <sequence>
          <element name="conversionrate" type="xsd:double"/>
        </sequence>
      </complextype>
    </schema>
  </wsdl:types>
  <wsdl:message name="convertcurrencyrequest">
    <wsdl:part name="request" type="impl:currencyconverterrequest"/>
  </wsdl:message>
  <wsdl:message name="convertcurrencyresponse">
    <wsdl:part name="response" type="impl:currencyconverterresponse"/>
  </wsdl:message>
  <wsdl:porttype name="currencyconverterporttype">
    <wsdl:operation name="convertcurrency" parameterorder="request">
      <wsdl:input message="impl:convertcurrencyrequest" name="convertcurrencyrequest"/>
      <wsdl:output message="impl:convertcurrencyresponse" name="convertcurrencyresponse"/>
    </wsdl:operation>
  </wsdl:porttype>
  <wsdl:binding name="currencyconvertersoapbinding" type="impl:currencyconverterporttype">
    <wsdlsoap:binding style="rpc" transport="
    <wsdl:operation name="convertcurrency">
      <wsdlsoap:operation soapaction=""/>
      <wsdl:input name="convertcurrencyrequest">
        <wsdlsoap:body encodingstyle=" namespace=" use="encoded"/>
      </wsdl:input>
      <wsdl:output name="convertcurrencyresponse">
        <wsdlsoap:body encodingstyle=" namespace=" use="encoded"/>
      </wsdl:output>
    </wsdl:operation>
  </wsdl:binding>

  <wsdl:service name="currencyconverterservice">
    <wsdl:port binding="impl:currencyconvertersoapbinding" name="currencyconverter">
      <wsdlsoap:address location="
    </wsdl:port>
  </wsdl:service>
</wsdl:definitions>

SECOND SERVICE DEFINITION: City Distance Calculator

<?xml version="1.0" encoding="utf-8"?>
<wsdl:definitions targetnamespace=" xmlns:apachesoap=" xmlns:impl=" xmlns:intf=" xmlns:soapenc=" xmlns:wsdl=" xmlns:wsdlsoap=" xmlns:xsd="
  <wsdl:types>
    <schema targetnamespace=" xmlns="
      <complextype name="distancecalculatorrequest">
        <sequence>
          <element name="firstcity" type="xsd:string"/>
          <element name="secondcity" type="xsd:string"/>
        </sequence>
      </complextype>
      <complextype name="distancecalculatorresponse">
        <sequence>
          <element name="distance" type="xsd:double"/>
        </sequence>
      </complextype>
    </schema>
  </wsdl:types>
  <wsdl:message name="distancecalculatorrequest">
    <wsdl:part name="request" type="impl:distancecalculatorrequest"/>
  </wsdl:message>
  <wsdl:message name="distancecalculatorresponse">
    <wsdl:part name="response" type="impl:distancecalculatorresponse"/>
  </wsdl:message>

  <wsdl:porttype name="distancecalculatorporttype">
    <wsdl:operation name="calculatedistance" parameterorder="request">
      <wsdl:input message="impl:distancecalculatorrequest" name="calculatedistancerequest"/>
      <wsdl:output message="impl:distancecalculatorresponse" name="calculatedistanceresponse"/>
    </wsdl:operation>
  </wsdl:porttype>
  <wsdl:binding name="distancecalculatorsoapbinding" type="impl:distancecalculatorporttype">
    <wsdlsoap:binding style="rpc" transport="
    <wsdl:operation name="calculatedistance">
      <wsdlsoap:operation soapaction=""/>
      <wsdl:input name="calculatedistancerequest">
        <wsdlsoap:body encodingstyle=" namespace=" use="encoded"/>
      </wsdl:input>
      <wsdl:output name="calculatedistanceresponse">
        <wsdlsoap:body encodingstyle=" namespace=" use="encoded"/>
      </wsdl:output>
    </wsdl:operation>
  </wsdl:binding>
  <wsdl:service name="distancecalculatorservice">
    <wsdl:port binding="impl:distancecalculatorsoapbinding" name="distancecalculator">
      <wsdlsoap:address location="
    </wsdl:port>
  </wsdl:service>
</wsdl:definitions>

Having these service descriptions, let us explore which sections can be used as input information for a heuristic measuring the similarity between services. Note that here, when terms like relevant, valid, etc. are used, they refer only to the scope of the classification problem and, obviously, not to the validity or usefulness of the WSDL specification.

Service section (wsdl:service): As this section only includes information about the location of the service, it is irrelevant, as the URL cannot be used to measure similarity between services.

Binding section (wsdl:binding): As this section only includes information about how parameters and messages have to be encoded in order to establish communication channels between clients and services, it is also irrelevant, as the protocol used in the communication cannot be used to measure similarity between services.

Port type and operation sections (wsdl:porttype and wsdl:operation): These can be seen as the first valuable sections of the WSDL description, as they describe the set of operations that can be performed by a service. The comparison of operation sets can be seen as a form of similarity measure. Nevertheless, using only this information to develop a service similarity measure would yield very poor (and so, useless) results, as we would have no information about the messages involved in the service invocation.

Message and types sections (wsdl:message and wsdl:types): It is in these sections where almost all the useful information about the service is contained. These sections describe the contents of the messages needed to invoke the service and so the relevant information managed by it (that is, the domain of the service).

Then, having the information described by data types, service messages, service operations and port types, we can create a heuristic to measure service similarity. Although this is true, let us take a closer look at the example provided. The currency converter service description contains only one operation, which takes:

As inputs, a message with one part containing two currency codes (of type string).
As outputs, a message with one part containing the conversion rate between the two currencies (a double).

On the other hand, the city distance calculator service has, again, only one operation, which takes in this case:

As inputs, a message with one part containing two city names (of type string).
As outputs, a message with one part containing the calculated distance between the cities in kilometres (a double).

It can be seen that the syntactic descriptions of the two services are exactly the same (i.e. same number of operations, messages and parameters, and same basic data types managed). This situation makes it possible to find high similarity degrees between services that, as in the example shown, have similar syntactic descriptions and, so, to assume they belong to the same service category when, obviously, they do not.

The problem we are facing is that WSDL descriptions are intended to formalize the information needed to use the functionalities a service provides (that is, they describe the service syntactically) once this service has been found and selected. So, the WSDL specification has not been developed taking into account service-related tasks that take place before the invocation, such as discovery, selection, composition or, in this case, classification.
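The collision described above can be reproduced with a minimal Python sketch. The dictionaries are hand-written summaries of the two WSDL operations (an assumption of this example, not the output of a WSDL parser), and the multiset overlap used as `syntactic_similarity` is a hypothetical stand-in for any purely syntactic comparison; it is not the heuristic proposed in this work.

```python
from collections import Counter

# Hypothetical summaries of the two WSDL operations shown above:
# operation name -> (input primitive types, output primitive types).
currency_converter = {
    "convertcurrency": (["xsd:string", "xsd:string"], ["xsd:double"]),
}
distance_calculator = {
    "calculatedistance": (["xsd:string", "xsd:string"], ["xsd:double"]),
}


def type_signature(service):
    """Multiset of (direction, primitive type) pairs; names are ignored."""
    signature = Counter()
    for inputs, outputs in service.values():
        signature.update(("in", t) for t in inputs)
        signature.update(("out", t) for t in outputs)
    return signature


def syntactic_similarity(a, b):
    """Multiset overlap of two type signatures, a value in [0, 1]."""
    sa, sb = type_signature(a), type_signature(b)
    union = sum((sa | sb).values())
    return sum((sa & sb).values()) / union if union else 1.0


print(syntactic_similarity(currency_converter, distance_calculator))  # 1.0
```

Both services reduce to two string inputs and one double output, so any measure built only on this information cannot tell them apart.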

Having only syntactic descriptions, the information that can be used to compare services and compute their similarity is very limited. For example, we can use the number of operations (an important issue, but with little relevance for service classification and comparison), the number of input/output parameters (also important but, again, carrying little key information for service comparison) or the parameter types (this is the key information: the description of the parameters the service deals with). Nevertheless, this type of information leads to a high degree of misclassification (services such as the ones included in the example obtain a high similarity value using only the mentioned information fragments). So, we need some extra information about the service. But what extra information could be extracted from the service messages and parameter data types described in WSDL? Unfortunately, not much. Let us analyze the relevance and validity of each information fragment of the message and types descriptions:

Message and data type structures: WSDL descriptions can be written following different styles or recommendations supported by the specification. Regarding the message structure, the specification offers two styles: the document-oriented style, in which service messages are composed of a single part containing an XML Schema fragment defining a complex type with all the different message parameters; and the RPC-oriented style, in which service messages are composed of as many parts as simple parameters. So, as we can have services providing exactly the same functionality but with descriptions following different styles, the structure of the messages or the data type descriptions cannot be considered key information for the service comparison.

Message and data type names: Although these names are considered key information by other approaches mentioned in the state of the art, which use NLP to process them (see Section 2), the hypothesis of meaningful names in service descriptions is, perhaps, too optimistic. Many examples can be found of services in which the parameter names have nothing to do with the semantics of those parameters (e.g. Axis creates WSDL descriptions automatically from Java interfaces but, by default, replaces all the Java parameter names with in0, in1, etc.; the Google Web Service keeps the names of the different input HTML controls presented on its Web interface, such as q, hl, etc.). In conclusion, I argue that using NLP over non-meaningful names will provide no improvement.

Message data types: Using the information provided by the basic data types (e.g. a double data type for an amount) seems to be our best candidate. And indeed it is, but the results that can be expected from comparing services by the number of strings, doubles, floats, etc. they have are not very promising. Taking the given example, the comparison of the services by their data types yields the highest similarity value, as their interfaces are (syntactically) the same. Nevertheless, it is clear that those services are completely different.

In conclusion, we need to know what the service uses as input and output parameters, but it is not sufficient to know the basic (syntactic) data types involved in

the service interface. Therefore, more information is needed in the service descriptions: some semantic information enabling the classification mechanisms to distinguish between e.g. currency codes and city names, or conversion rates and distances. But how is this information obtained? Obviously, it cannot be found in the WSDL descriptions, as WSDL is not intended to represent the semantics of a service. Nevertheless, there are different approaches and proposals (mentioned before in Section 2) that aim to represent these service semantics in a formal way, converting Web services into Semantic Web services. Let us see the previously presented example from the point of view of one of the Semantic Web services approaches, namely WSMO.

FIRST SEMANTIC SERVICE DEFINITION: Currency Converter

namespace { _" dc _" sample _" }

/* Web service description */
webservice _"

/* Service capability description: Currency converter capability */
capability ConverterCapability

/* Service inputs definition (logical axiom) */
precondition
  definedby
    ?ccode1 memberof sample#currencycode and
    ?ccode2 memberof sample#currencycode.

/* Service outputs definition (logical axiom) */
postcondition
  definedby
    ?conversionrate memberof sample#currencyconversionrate.

SECOND SEMANTIC SERVICE DEFINITION: City Distance Calculator

namespace { _" dc _" sample _" }

/* Web service description */
webservice _"

/* Service capability description: City distance calculator capability */
capability CalculatorCapability

/* Service inputs definition (logical axiom) */
precondition
  definedby
    ?city1 memberof sample#city and
    ?city2 memberof sample#city.

/* Service outputs definition (logical axiom) */
postcondition
  definedby
    ?distance memberof sample#distance and
    ?distance[sample#inunits hasvalue "km"].

It can be seen that, with this type of description, instead of having the service described from a syntactic point of view (e.g. two double numbers as input parameters, a string as output parameter, etc.), we have a description of the service based on the semantics of its inputs and outputs (e.g. duration, speed, currency code, etc.). This kind of semantic information enables us to distinguish between services with the same syntactic description, and so it can be expected that classification and comparison measures will provide better results.

Thus, I advocate here using Semantic Web service descriptions instead of WSDL-based ones, in order to enable the comparison of Web services without using NLP or the other techniques used in the classification approaches mentioned before (see Section 2). Nevertheless, there are two issues I would like to clarify regarding this statement:

The approach presented in this work is completely independent of the language used to describe the semantics, which makes it compatible in the future with whichever description language becomes standard. Accordingly, there was no specific reason for selecting WSMO as the description language for the examples.

The fact that NLP techniques are not used in this approach does not rule out their future usage, if there is evidence of real benefits from their application in improving the results obtained.

In conclusion, the need for Semantic Web service descriptions in the proposed solution to the classification problem has been shown. So, having described the service classification problem and having Web service descriptions (with all the needed information), the next step is to define and describe the classification heuristic that will be used (with these service descriptions as starting point) in order to achieve the objectives stated for this research work.
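As a rough illustration of the point made in this section, the sketch below compares the two services by the ontology concepts that annotate their inputs and outputs. The dictionaries are hand-written from the WSMO capabilities of the example, and `shared_concept_ratio` is a hypothetical exact-match measure chosen only for brevity; the concept similarity actually used by the heuristic is the one defined at the parameter level.

```python
# Hypothetical concept annotations read off the two WSMO capabilities:
# precondition concepts as inputs, postcondition concepts as outputs.
currency_converter = {
    "in": ["sample#currencycode", "sample#currencycode"],
    "out": ["sample#currencyconversionrate"],
}
distance_calculator = {
    "in": ["sample#city", "sample#city"],
    "out": ["sample#distance"],
}


def shared_concept_ratio(a, b):
    """Fraction of distinct (direction, concept) pairs both services share."""
    pairs_a = {(d, c) for d in ("in", "out") for c in a[d]}
    pairs_b = {(d, c) for d in ("in", "out") for c in b[d]}
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0


print(shared_concept_ratio(currency_converter, distance_calculator))  # 0.0
print(shared_concept_ratio(currency_converter, currency_converter))   # 1.0
```

With semantic annotations the two services no longer look alike, even under this crude comparison that ignores how close two concepts are in the ontology.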

4. The heuristic

The main goal of this section is to present the heuristic developed to achieve the classification objective, that is, the different measures and formulas used to perform (almost automatically) the assignment of services to taxonomy categories.

The heuristic is based on the idea of having three different layers, or granularity levels, into which the complete classification process can be divided: the category level, the service level and the parameter level. Following this approach, the heuristic is composed of three different similarity measures between different problem components, each one corresponding to one level. Let us see why the heuristic has been divided this way.

Our main objective, as stated throughout the document, is to assign (almost automatically) a category to a service. More specifically, the category assigned must be the one in which the service fits best, that is, the one that best describes the domain to which the service belongs (e.g. e-mail validation, weather information, address solver, etc.). This assignment must be done according to some criterion. In this case, that criterion is a similarity measure between a service and a category. So, here appears the first level of granularity: the category level, i.e. the heuristic layer where a category is selected or recommended.

But how can the similarity between a service and a category be measured? Several possibilities for determining how similar a service is to a category can be found in the literature. The most relevant for this research have already been mentioned and analyzed in Section 2. In this case, I have based the similarity measurement on the hypothesis of having a repository of previously classified services. This way, each category can be seen as a set of services related to it. The similarity between a service and a category is then measured as a combination of the similarities obtained between the service and all the services already classified under the category (the specific formula used to combine the similarity measures can be found in the formal presentation of the heuristic).

So, the next question is: how can the similarity between two different services be measured? For classification purposes, a service can be seen as the set of operations contained in it. In addition, those operations can be seen as structured sets of input and output parameters. So, the similarity measure between services can be reduced to the similarity measure between structured sets of (semantically annotated) input and output parameters. How the different sets of parameters or, more precisely, their similarity values are combined can be seen in the formal explanation below. Here appears the second level of granularity or heuristic layer: the service layer, that is, the level in which a similarity degree between services is given as a function of the similarity of their input and output parameters.

But, again, how can we compute the similarity between parameters? It is important to remember here that the heuristic is not dealing with basic data type parameters (e.g. double, string, etc.) as they are described in WSDL descriptions. Semantic Web service descriptions, which use semantic concepts to describe parameters, are used instead. So, the issue that has to be solved is how the similarity between domain ontology concepts can be computed.
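The three granularity levels can be pictured as nested data structures. The following Python sketch is only an illustration of that nesting; the class and field names (`Category`, `Service`, `Operation`, `inputs`, `outputs`) and the sample values are assumptions of this example and are not taken from the implementation described later.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    """Parameter level: ontology concepts annotating inputs and outputs."""
    inputs: tuple
    outputs: tuple


@dataclass
class Service:
    """Service level: a service seen as the set of its operations."""
    name: str
    operations: list = field(default_factory=list)


@dataclass
class Category:
    """Category level: a category seen as the services classified under it."""
    code: str
    services: list = field(default_factory=list)


# Hypothetical repository content, reusing the currency converter example.
converter = Service(
    "CurrencyConverter",
    [Operation(("CurrencyCode", "CurrencyCode"), ("ConversionRate",))],
)
finance = Category("finance", [converter])
print(len(finance.services), len(converter.operations))  # 1 1
```

Each similarity measure of the heuristic then compares objects of one of these levels and delegates to the level below.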

Once more, there are many possible ways to solve this problem (some references can be found in the formal presentation of the last heuristic level, see Section 4.3). Nevertheless, in this work a new measure enabling the comparison between concepts contained in the same ontology has been developed, in order to provide a measure fully oriented to the focus domain of this work. This ontology concept similarity measure constitutes the third heuristic layer: the parameter layer, i.e. the assignment of a similarity value to pairs of domain ontology concepts.

The following subsections give a formal explanation of each heuristic layer. For each level, this explanation contains: the formal explanation of the similarity measure, the mathematical representation, the pseudo code enabling an efficient implementation of the measure, and some graphs representing the behaviour of the measure.

4.1. Category level

As already explained, this layer aims to obtain a similarity measure between a service and a category. This similarity value is the one that will be used later for selecting a category to be assigned to a service (i.e. automatic classification) or for recommending/ranking the categories yielding the highest similarity values. Let us explain formally how the similarity measure constituting this layer works.

Let $S$ be the set of all the Web services contained in a repository, and let $C$ be the taxonomy used to classify the services in the repository. Allowing a service to be classified under several categories of the taxonomy, we may define the classification as a mapping $\tau : S \to 2^C$. Given a new service $s$ to be added to $S$, the pursued goal is to find the categories in $C$ that best suit $s$.

Given $c \in C$, let $P(s:c)$ be the probability that $c$ is an appropriate classification for $s$. This probability is estimated by comparison of $s$ with all the services classified under $c$. With this aim, taking $P(s:c) \sim 0$ if $\{x \in S \mid c \in \tau(x)\} = \emptyset$ (i.e. $c$ is disregarded as a potential category for $s$ if there is no previous service $s' \in S$ classified under $c$), the estimation for $P(s:c)$ can be written as shown in expression (1).

$$P(s:c) \sim P\Big(\bigcup_{x \in S} \big(s:c \mid c \in \tau(x)\big)\Big) \tag{1}$$

The inclusion-exclusion principle [33] allows the computation of n-ary unions of sets, such as the one found in expression (1). Expression (2) shows how this principle is written when applied to probability.

$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(A_i) - \sum_{i,j:\, i<j} P(A_i \cap A_j) + \sum_{i,j,k:\, i<j<k} P(A_i \cap A_j \cap A_k) - \dots \pm P\Big(\bigcap_{i=1}^{n} A_i\Big) \tag{2}$$

Now, using expressions (1) and (2), we can refine the estimation of $P(s:c)$ by rewriting the right-hand side of (1), obtaining the representation shown in expression (3), provided that $s:c \mid c \in \tau(x)$ are pairwise independent for all $x \in S$.

$$P(s:c) \sim \sum_{A \subseteq S} (-1)^{|A|+1} \prod_{x \in A} P\big(c \in \tau(x)\big) \cdot P\big(s:c \mid c \in \tau(x)\big) \tag{3}$$

Since $c \in \tau(x)$ is true iff $x \in \{x \in S \mid c \in \tau(x)\}$, and assuming a crisp service classification (i.e. $c \in \tau(x)$ is either true or false, as opposed to a fuzzy classification where $P(c \in \tau(x)) \in [0, 1]$), expression (3) can be reformulated as expression (4), where $\tau^{-1}(c) = \{x \in S \mid c \in \tau(x)\}$.

$$P(s:c) \sim \sum_{A \subseteq \tau^{-1}(c)} (-1)^{|A|+1} \prod_{x \in A} P\big(s:c \mid c \in \tau(x)\big) \tag{4}$$

Finally, $P(s:c \mid c \in \tau(x))$ must be estimated in terms of the similarity between the service $s$ and each service $x$ classified under category $c$. That is, the probability of a service belonging to a category is estimated by a similarity measure between the service and all the services already classified under that category. Applying this estimation to expression (4), we get the final representation of the category layer measure, in expression (5).

$$P(s:c) \sim \sum_{A \subseteq \tau^{-1}(c)} (-1)^{|A|+1} \prod_{x \in A} \mathrm{sim}(s,x) \tag{5}$$

It is easy to see that this similarity measure between services, $\mathrm{sim}(s,x)$, constitutes the second heuristic layer. Let us analyze some relevant properties of the estimation presented in expression (5).

$P(s:c) \in [0, 1]$ provided that $\mathrm{sim}(s,x) \in [0, 1]$ $\forall x \in \{x \in S \mid c \in \tau(x)\}$.

$P(s:c) \geq \mathrm{sim}(s,x)$ $\forall x \in \{x \in S \mid c \in \tau(x)\}$ (in particular, this means that $P(s:c) \sim 1$ if $\mathrm{sim}(s,x) = 1$ for some $x \in \{x \in S \mid c \in \tau(x)\}$).

$P(s:c)$ increases monotonically with respect to $\mathrm{sim}(s,x)$ $\forall x \in \{x \in S \mid c \in \tau(x)\}$.

Since $\forall x \in \{x \in S \mid c \in \tau(x)\}$, $P(s:c) = \mathrm{sim}(s,x) + (1 - \mathrm{sim}(s,x)) \cdot P(s:c \mid S \setminus \{x\})$, where $P(s:c \mid S \setminus \{x\})$ denotes the computation of $P(s:c)$ with $x$ removed from $S$ (footnote 27), $P(s:c)$ can be computed efficiently (i.e. with linear $\Theta(|c|)$ complexity).

Figure 14 shows graphically the behaviour of the presented estimation measure. The test presented in the graphs is a simulation of the similarity measure between a service and a single category which contains only two services. More complex examples cannot easily be displayed graphically (as they are multidimensional).
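To make the relation between expression (5) and the linear-time recurrence concrete, here is a minimal Python sketch. It assumes that the sum in expression (5) ranges over the non-empty subsets of the services classified under the category, and it takes the list of values sim(s,x) as given; the function names are hypothetical.

```python
from itertools import combinations
from math import prod


def p_inclusion_exclusion(sims):
    """Expression (5): sum over non-empty subsets A of (-1)^(|A|+1) * prod(sim).

    Exponential in len(sims); shown only to check the recurrence below.
    """
    total = 0.0
    for size in range(1, len(sims) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(sims, size):
            total += sign * prod(subset)
    return total


def p_recursive(sims):
    """Linear-time form: P = sim + (1 - sim) * P(rest); 0 for an empty category."""
    p = 0.0
    for sim in reversed(sims):
        p = sim + (1.0 - sim) * p
    return p


sims = [0.3, 0.7, 0.1]  # hypothetical sim(s, x) for three services under c
print(abs(p_inclusion_exclusion(sims) - p_recursive(sims)) < 1e-12)  # True
```

Both forms equal one minus the product of the (1 - sim(s,x)) terms, which is why a single service with similarity 1 is enough to drive the estimation to 1.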

Fig. 14. Service to category similarity measure, i.e. P(s:c). Graph (displayed from two angles) showing the behaviour of the estimation with respect to sim(s,x) and sim(s,y) (i.e. similarity between a service s and a category c containing two services x and y).

Finally, in order to help understand both the mathematical formalization of this heuristic layer and how this level has been included in the implementation of this research work (see Section 6), I include here some pseudo code showing how this measure could be implemented efficiently.

// Method used to retrieve an appropriateness value between a service and a
// category (initially called with aServiceIndex pointing to the first service)
GetCategorySimilarityFor (aService, aCategory, aServiceIndex)
    sim <- GetServiceSimilarity (aService, aCategory[aServiceIndex])
    if aCategory has more services then
        nextSim <- GetCategorySimilarityFor (aService, aCategory, aServiceIndex + 1)
        sim <- sim + [(1 - sim) * nextSim]
    return sim

4.2. Service level

In this layer of the heuristic, the main goal is to obtain a similarity measure between two different services regardless of the category they belong to. As explained before, the resulting value of this heuristic layer is used by the previous one, i.e. the category level, as the criterion to estimate the appropriateness of a category for a service. Let us see how the similarity measure works.

Let P be the set of all the parameters appearing in all the services contained in S, and OP the set of service operations. In semantic service descriptions such as WSMO-based ones, where inputs and outputs (i.e. WSMO preconditions and postconditions) are expressed as logical axioms, the concept of operation is not very clear. I will therefore understand by operation each of the different sets of input/output parameters that have to be used to perform a single action through the service (e.g. login, query, purchase, reservation, etc.). It could be interesting to study how using an operation ontology to formally define this type of service parameter structure could help to improve the complete classification heuristic, but this is left as future work for this research.

If the set of parameters of service $s$ is denoted as $P_s \subseteq P$, and its set of operations as $OP_s \subseteq OP$, the similarity measure between services can be defined as stated in expression (6).

$$\mathrm{sim}(s, s') = f\big(\mathrm{sim}(P_s, P_{s'}),\ \mathrm{sim}(OP_s, OP_{s'})\big) \tag{6}$$

As the improvement obtained by combining similarity measures calculated from the comparison of parameters in both a structured way (distinguishing operations) and an unstructured way (comparing every parameter of the first service with every parameter of the second one) has not been proved yet, I will use the function $f(x,y) = y$, that is, only the comparison of structured parameters. The inclusion of the other comparison in the formula, as well as the definition of a combination function $f$, is left as future work at the time of this writing.

So, the comparison between the sets of operations of two services has to be defined. The similarity between $OP_s$ and $OP_{s'}$ is computed as the average of the best possible pairwise similarities obtained by an optimal pairing of the elements from the two sets. This can be formalized as follows. We define $\mathrm{top}(OP, OP')$ as the pair $(op, op') \in OP \times OP'$ that maximizes $\mathrm{sim}(op, op')$. Then, let $OP_1 = OP$, $OP'_1 = OP'$, $OP_k = OP_{k-1} \setminus \{op_{k-1}\}$ and $OP'_k = OP'_{k-1} \setminus \{op'_{k-1}\}$, where $(op_{k-1}, op'_{k-1}) = \mathrm{top}(OP_{k-1}, OP'_{k-1})$. With these definitions, the similarity between two operation sets can be described mathematically as in expression (7).

$$\mathrm{sim}(OP, OP') = \frac{\displaystyle\sum_{i=1}^{\min(|OP|,\,|OP'|)} \mathrm{sim}\big(\mathrm{top}(OP_i, OP'_i)\big)}{\max(|OP|,\,|OP'|)} \tag{7}$$

Therefore, the only thing that remains undefined in expression (7) is how to find the similarity between two operations. As these operations are structures containing parameters, this similarity measure has to be layered on top of a parameter similarity measure which, as already explained, constitutes the next and last layer of the proposed heuristic. In addition, as the structure of the operations is relevant (i.e. distinguishing between input parameters and output parameters is critical for the classification process), another similarity measure is needed to describe how similarity at the service level works. Thus, the similarity measure between two operations $op$ and $op'$ is given in expression (8).

$$\mathrm{sim}(op, op') = \mathrm{sim}(I_{op}, I_{op'}) \cdot \mathrm{sim}(O_{op}, O_{op'}) \tag{8}$$

In expression (8), $I_{op}$, $I_{op'}$, $O_{op}$ and $O_{op'}$ are the sets of input and output parameters of the operations $op$ and $op'$ respectively. Finally, the similarity between two parameter sets is computed in the same way as the similarity between operation sets, i.e. as the average of the best possible pairwise similarities obtained by an optimal pairing of the elements from the two sets (see expression (7) for an equivalent mathematical representation).
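Expressions (6) to (8) can be sketched in a few lines of Python. The sketch follows the step-by-step top() selection given in the text, which picks the best remaining pair at each step. The dictionary layout of an operation, the `exact_match` concept similarity and the handling of empty sets are assumptions made only to keep the example self-contained; the real concept similarity is the one defined at the parameter level.

```python
def set_similarity(first, second, sim):
    """Expression (7): repeated best-pair selection, averaged over the larger set."""
    first, second = list(first), list(second)
    if not first or not second:
        # Assumption: two empty sets match fully, one empty set matches nothing.
        return 1.0 if not first and not second else 0.0
    largest = max(len(first), len(second))
    total = 0.0
    while first and second:
        # top(OP_k, OP'_k): the remaining pair with the highest similarity.
        a, b = max(((x, y) for x in first for y in second),
                   key=lambda pair: sim(pair[0], pair[1]))
        total += sim(a, b)
        first.remove(a)
        second.remove(b)
    return total / largest


def operation_similarity(op, other, concept_sim):
    """Expression (8): product of input-set and output-set similarities."""
    return (set_similarity(op["in"], other["in"], concept_sim)
            * set_similarity(op["out"], other["out"], concept_sim))


def service_similarity(ops, other_ops, concept_sim):
    """Expression (6) with f(x, y) = y: only the structured comparison is used."""
    return set_similarity(
        ops, other_ops, lambda a, b: operation_similarity(a, b, concept_sim))


def exact_match(a, b):
    """Hypothetical placeholder for the parameter-level concept similarity."""
    return 1.0 if a == b else 0.0


converter = [{"in": ["CurrencyCode", "CurrencyCode"], "out": ["ConversionRate"]}]
distance = [{"in": ["City", "City"], "out": ["Distance"]}]
print(service_similarity(converter, converter, exact_match))  # 1.0
print(service_similarity(converter, distance, exact_match))   # 0.0
```

Dividing by the size of the larger set penalises services with unmatched operations or parameters, so a service is only fully similar to another one when every element finds a perfect partner.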

Figure 15 represents graphically the behaviour of the similarity measure between service operations with respect to the similarity values of their input and output parameter sets. The test presented in the graphs is a simulation of the similarity measure between two services having only one operation each. More complex examples cannot easily be displayed graphically (as they are multidimensional).

Fig. 15. Service to service similarity measure. Graph (displayed from two angles) showing the behaviour of the similarity measure with respect to sim(I_op, I_op') and sim(O_op, O_op') (i.e. similarity between two services, each one containing only one operation).

Note that the service comparison performed at this heuristic level returns values in the range [0, 1], provided that the similarity between ontology concepts (i.e. the parameter layer) is also within that range.

Finally, as in the presentation of the previous heuristic layer, and with the goal of easing the comprehension of this service layer, I include below a piece of pseudo code showing how this similarity can be implemented efficiently.
// Method used to retrieve a similarity value between two services assuming
// firstservice has less or equal number of operations than secondservice
GetServiceSimilarity (firstservice, secondservice)
    firstoperations <- firstservice operations
    secondoperations <- secondservice operations
    // the normalizing size is taken before any operation is removed
    maxsize <- max (firstoperations size, secondoperations size)
    globalsim <- 0
    while firstoperations is not empty
        sim <- GetOpMaximumSimilarity (firstoperations, secondoperations)
        globalsim <- globalsim + sim
        remove selected operation pair from operation lists
    globalsim <- globalsim / maxsize
    return globalsim

// Method used to retrieve the optimal operation pairing similarity
GetOpMaximumSimilarity (firstoperationset, secondoperationset)
    maxsim <- -1
    for all the operations in firstoperationset
        for all the operations in secondoperationset
            firstinputs <- current firstoperation inputs
            secondinputs <- current secondoperation inputs
            firstoutputs <- current firstoperation outputs
            secondoutputs <- current secondoperation outputs
            inputsim <- GetInputSimilarity (firstinputs, secondinputs)
            outputsim <- GetOutputSimilarity (firstoutputs, secondoutputs)
            if maxsim < (inputsim * outputsim)
                maxsim <- inputsim * outputsim
                mark current operation pair as the selected pair
    return maxsim

// Method used to retrieve the similarity between input sets. Pseudo code is
// not included as it is practically equivalent to GetServiceSimilarity()
GetInputSimilarity (firstinputset, secondinputset)

// Method used to retrieve the similarity between output sets. Pseudo code is
// not included as it is practically equivalent to GetServiceSimilarity()
GetOutputSimilarity (firstoutputset, secondoutputset)

4.3. Parameter level

In this final layer of the heuristic the goal is to obtain a similarity measure between two service parameters. As the heuristic deals with Semantic Web service descriptions, the term parameter is used here to refer to the domain ontology concepts used to semantically describe the inputs and outputs of the service. So, the goal pursued by this heuristic layer is to compare domain ontology concepts in order to obtain a numerical representation of the similarity between parameters that can be used as the core measure for the previous heuristic layer.

There exist many research efforts devoted to the definition of mechanisms to find the similarity between ontology concepts (e.g. [6], [13] and [14]). In this work, a new approach is presented, as it provides very satisfactory results for the specific domain in which it is to be applied. Nevertheless, it would be interesting to study other approaches and similarity measures and to compare their results with those of this approach. These research directions will be addressed as future work.

The proposed similarity measure works as follows. Let T denote the set of all concepts in the domain ontology. The similarity between two concepts is measured in terms of their distance in the ontology class hierarchy.
Here, it is assumed that the domain ontology hierarchy forms a directed acyclic graph; otherwise the automatic computation of the distance between ontology concepts would be impossible. Then, given two concepts t ∈ T and t' ∈ T, let t_0 be the lowest common ancestor of t and t' in T, and let d = dist(t, t_0) + 1 and d' = dist(t', t_0) + 1 be the number of levels (plus one) between t, t' and t_0 in the concept hierarchy. With these definitions, the similarity measure can be defined as in expression (9).

\[ sim(t, t') = \left( 1 - \frac{\alpha}{h(T)} \cdot \frac{|d - d'|}{d + d'} \right) \cdot \frac{1}{\min(d, d')} \cdot \left( 1 - \frac{\max(d, d') - 1}{h(T)} \right) \qquad (9) \]

To further clarify expression (9), some relevant features of the measure are explained next.

h(T) is the total height of the concept hierarchy, which is introduced to measure the distance between concepts as a proportion of the total depth of the ontology.

The term |d − d'| / (d + d') increases (that is, the resulting similarity value decreases) with the difference in depth level between t and t'. Note that the similarity would seem to increase with the length d + d' of the shortest hierarchical path between them, but this is compensated by dividing by min(d, d'), in such a way that the similarity is essentially not sensitive to the depth of the concepts.

α ∈ [0, 1] (empirically tuned to 0.8 for this approach) is a parameter that ensures a minimum non-zero similarity value, even for the most dissimilar concepts, so that the similarity ranges in some interval [min, 1] above 0, in order to relax the influence of the measure in the heuristic.

The factor 1 − (max(d, d') − 1) / h(T) is introduced to reinforce the decrease of the similarity when the concepts are in the same branch of the ontology hierarchy, to achieve a monotonic decrease of the similarity value from 1 to 0 when one concept is a super concept of the other (that is, when one of the concepts is the first common parent).

Figure 16 shows graphically the behaviour of the similarity measure between ontology concepts with respect to the distances d and d' of both concepts to their lowest common parent.

Fig. 16. Concept to concept similarity measure graph (displayed from two angles) showing the behaviour of the similarity measure with respect to d and d' in an ontology with twenty depth levels.

As in the previous layers, the pseudo code allowing the efficient implementation of this similarity measure is included next to ease the comprehension of the complete mathematical formalization.

// Method used to retrieve a similarity value between two ontology concepts
GetConceptSimilarity (firstconcept, secondconcept, alpha)
    commonparent <- find the first common parent of the concepts
    firstdistance <- find the distance of the firstconcept to the commonparent
    seconddistance <- find the distance of the secondconcept to the commonparent
    ontdepth <- find the ontology depth
    firstterm <- firstdistance / (firstdistance + seconddistance)
    firstterm <- firstterm - (seconddistance / (firstdistance + seconddistance))
    firstterm <- abs (firstterm)
    firstterm <- firstterm * (alpha / ontdepth)
    firstterm <- 1 - firstterm
    secondterm <- 1 / (min (firstdistance, seconddistance))
    thirdterm <- 1 - ((max (firstdistance, seconddistance) - 1) / ontdepth)
    return firstterm * secondterm * thirdterm
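As a complement to the pseudo code, the short Python sketch below computes the same three-factor product from the values d, d' and h(T) directly. It assumes that d and d' already include the "plus one" of their definition (so both are at least 1) and that finding the lowest common ancestor is done elsewhere; the function name `concept_similarity` and its argument names are illustrative, not taken from the framework.

```python
def concept_similarity(d: int, d_prime: int, height: int,
                       alpha: float = 0.8) -> float:
    """Three-factor concept similarity, following the pseudo code above.

    d, d_prime: levels (plus one) from each concept to the lowest common ancestor.
    height:     total height h(T) of the concept hierarchy.
    alpha:      tuning parameter in [0, 1]; 0.8 is the value quoted in the text.
    """
    if d < 1 or d_prime < 1 or height < 1:
        raise ValueError("d, d_prime and height must all be at least 1")
    # Penalty for a difference in depth, scaled by alpha / h(T)
    first_term = 1 - (alpha / height) * abs(d - d_prime) / (d + d_prime)
    # Compensation for the length of the path through the common ancestor
    second_term = 1 / min(d, d_prime)
    # Extra decrease when one concept lies far below the common ancestor
    third_term = 1 - (max(d, d_prime) - 1) / height
    return first_term * second_term * third_term


# Identical concepts: d = d' = 1, so every factor equals 1
same = concept_similarity(1, 1, height=8)

# Values quoted later in the usage example (alpha = 0.8, d = 4, d' = 3, h(T) = 8);
# under this reading of the formula the result is roughly 0.205
example = concept_similarity(4, 3, height=8)
```

Since the measure depends only on three integers per concept pair, it is cheap to tabulate for every pair in advance.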

5. Heuristic usage example

To ease the understanding of how the complete heuristic works, a simple example is included here showing the numeric results obtained by applying the formulas already presented to a concrete set of services.

First of all, let us present the services involved in the classification process. Three services are used, all of them belonging to the repository used for the evaluation tests of the approach presented here (see Section 6). The three services belong to the same category, namely the FaxService category of the service taxonomy used in those tests. Finally, all the services have been semantically described using the WSMO Semantic Web service description language (although the classification approach, as can be seen in the heuristic presentation, is language agnostic). The WSMO descriptions of the services are given next.

Service: FAX Sender Service 1

namespace { _" dc _" type _" }
webService _"
capability FAXSender1Capability
precondition
    definedBy
        ?input0 memberOf type#username and
        ?input1 memberOf type#password and
        ?input2 memberOf type#faxtext and
        ?input3 memberOf type#faxnumber and
        ?input4 memberOf type#faxnumbersender and
        ?input5 memberOf type#faxsubject.
postcondition
    definedBy
        ?output0 memberOf type#acknowledgement.

Service: FAX Sender Service 2

namespace { _" dc _" type _" }
webService _"
capability FAXSender2Capability
precondition
    definedBy
        ?input0 memberOf type#addresssender and
        ?input1 memberOf type#faxsubject and
        ?input2 memberOf type#faxnumber and
        ?input3 memberOf type#faxtext and
        ?input4 memberOf type#personsname.
postcondition
    definedBy
        ?output0 memberOf type#acknowledgement.

Service: FAX Sender Service 3

namespace { _" dc _" type _" }
webService _"
capability ServiceCapability
precondition
    definedBy
        ?input0 memberOf type#faxnumbersender and
        ?input1 memberOf type#faxnumberreceiver and
        ?input2 memberOf type#receivername and
        ?input3 memberOf type#faxtext.
postcondition
    definedBy
        ?output0 memberOf type#acknowledgement.

Figure 17 shows a fragment (as the complete ontology comprises more than 400 named concepts) of the domain ontology used for the semantic annotation of these services. The fragment shown contains all the concepts used in the above descriptions.

Fig. 17. Fragment of the domain ontology used for the annotation of the services used in the example.

Having the domain ontology and the services, let us present the classification example. Suppose FAX Sender Service 1 and FAX Sender Service 2 are already classified in the service repository under the FaxService category, and FAX Sender Service 3 is the new service to be classified in the same repository. Assuming the repository used in this simplified scenario has only one category (i.e. the FaxService category), the goal pursued in the example is to find the similarity between the unclassified service and that category.

Starting from the bottom measure described in Section 4, the similarity between the concepts used to describe each parameter must be computed first. I will only include here the details of the calculation between the first input of FAX Sender Service 3 and the first input of FAX Sender Service 1, as all the other similarities are computed in the same way. Formula (9) of the heuristic presentation is used to compute the similarity between Username (FAX Sender Service 1, input0), which plays the role of t in formula (9), and FaxNumberSender (FAX Sender Service 3, input0), which plays the role of t' in formula (9).
In addition, the remaining variables have to be given a value to compute the similarity, namely α = 0.8, d = 4, d' = 3 and h(T) = 8 (the specific meaning of these variables can be found in the presentation of the heuristic in Section 4). With these values, the similarity between the concepts can be computed as follows.

\[ sim(t, t') = \left( 1 - \frac{0.8}{8} \cdot \frac{|4 - 3|}{4 + 3} \right) \cdot \frac{1}{\min(4, 3)} \cdot \left( 1 - \frac{\max(4, 3) - 1}{8} \right) \]

The remaining similarity values between concepts are computed using the same formula. Next, the similarity between service parameters is computed as the product of the input and output similarities, which are computed in turn as the average of the best possible pairwise similarities obtained by an optimal pairing of the parameters. The values resulting from the comparison of Service 3 to Service 1 are shown in Tables 1 and 2 below.

Table 1. Input similarity between FAX Sender Service 3 (rows) and FAX Sender Service 1 (columns), best possible pairwise similarities and their average.
Columns: input0, input1, input2, input3, input4, input5, Avg. Rows: input0 to input3. [Cell values are missing.]

Table 2. Output similarity between FAX Sender Service 3 (rows) and FAX Sender Service 1 (columns), best possible pairwise similarities and their average.
Columns: output0, Avg. Row: output0. [Cell values are missing.]

Then, the values resulting from the comparison of Service 3 to Service 2 are shown in Tables 3 and 4 below.

Table 3. Input comparison between FAX Sender Service 3 (rows) and FAX Sender Service 2 (columns), best possible pairwise similarities and their average (i.e. input similarity).
Columns: input0, input1, input2, input3, input4, Avg. Rows: input0 to input3. [Cell values are missing.]

Table 4. Output comparison between FAX Sender Service 3 (rows) and FAX Sender Service 2 (columns), best possible pairwise similarities and their average (i.e. output similarity).
Columns: output0, Avg. Row: output0. [Cell values are missing.]

According to the above results, the similarity between the unclassified Service 3 and Service 1 is the product of the averages in Tables 1 and 2, and the similarity between Service 3 and Service 2 is the product of the averages in Tables 3 and 4. The last step consists of using formula (5) to find the similarity of Service 3 to the FAX category, as follows:

\[ sim(s_3, c) = sim(s_3, s_1) + \big( 1 - sim(s_3, s_1) \big) \cdot sim(s_3, s_2) \]

In this formula, s_1, s_2 and s_3 correspond to Service 1, 2 and 3, and c denotes the FAX service category.
The high similarity of Service 3 to the FAX category was to be expected, since the example uses few services, all of them providing the same functionality with similar inputs and exactly the same outputs.
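The last step of the example can be sketched in a few lines of Python. The sketch assumes that formula (5) extends to more than two classified services by applying the same accumulation repeatedly, which is what the two-service instance used in this example suggests; the function name `category_similarity` and the numbers in the usage lines are hypothetical and are not the values of the example.

```python
from functools import reduce
from typing import Iterable


def category_similarity(service_similarities: Iterable[float]) -> float:
    """Accumulate service-to-service similarities into a category similarity.

    Each step applies acc + (1 - acc) * s, so for two services s1 and s2 the
    result is s1 + (1 - s1) * s2, the form used in the example.
    Assumes every similarity lies in [0, 1].
    """
    return reduce(lambda acc, s: acc + (1 - acc) * s, service_similarities, 0.0)


# Hypothetical similarities of a new service to two classified services
score = category_similarity([0.5, 0.5])  # 0.5 + (1 - 0.5) * 0.5 = 0.75
```

With inputs in [0, 1] the accumulated value never decreases and never exceeds 1, and it does not depend on the order in which the services are visited, so a category with several moderately similar services can score higher than any single one of them.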

6. Implementation and experiments

In addition to the classification heuristic explained in the previous sections, an implementation of the heuristic has been developed as part of this research. The implementation has been included in a complete test and evaluation framework that enables anyone to test the classification procedure as well as the behaviour of each of the heuristic layers described in the heuristic presentation (see Section 4). In fact, all the tests whose results are evaluated later in this document have been performed using this framework.

In this section, a complete overview of the framework is presented, covering its architecture, features and the specific technologies used to develop it. In addition, the tests performed over the framework, and therefore over the heuristic, are presented, together with the results obtained and an analysis of these results, in order to evaluate the accuracy and validity of both the framework and the heuristic proposed as a solution for the classification problem. Finally, a brief comparison of the results obtained with those reported by other classification approaches is included.

6.1. The classification framework

As already mentioned, the classification framework has been developed mainly to provide a way to easily test and evaluate the proposed heuristic. In addition, it has been developed with its future usage in mind, not only as part of this research but also as a useful tool for service developers, publishers, etc. By using this tool, almost anyone can classify Web services. Obviously, some elements are required in order to perform the classification, namely: a public repository of classified services, the taxonomy used to classify them, the ontology used to describe them semantically and, of course, some services to be classified.
The concrete information about the repository, ontologies and taxonomies used in the evaluation of this approach can be found in the heuristic evaluation section (see Section 6.2). Before presenting the specifications of the developed framework, let us see the goals I set for the implementation before building it.

Maximize generality: This is the main goal pursued when designing and developing the framework. The complete application has to be composed of elements so general or abstract that almost any service can be classified. It has to be possible for anyone to classify services described in any language, into any repository and under any taxonomy. Moreover, even someone with their own classification approach has to be able to use it in the framework. In conclusion, the developed framework must act as a classification problem skeleton into which the different elements involved in classification can be plugged.

Maximize performance: This goal is not strictly needed in the domain of service classification, as the classification process can be (and usually will be) performed offline, and it is not a problem if it lasts for many minutes. Nevertheless, as this framework is to be used to test the classification results of the proposed heuristic, the complete classification process has to be performed as fast as possible, so that results are available in a few seconds and the classification can be improved at a low time cost.

Let us see, then, the key elements involved in the architecture used to implement the complete classification framework, and how they contribute to achieving these goals. The proposed architecture, showing all the components explained next, can be found in Figure 18.

[Figure 18: diagram with four components: User Interface; Offline Data Controller (repository database(s), optimization database(s)); Ontology Controller (domain ontology(ies), taxonomy ontology(ies), ...); Measure Controller (category measure(s), service measure(s), parameter measure(s)).]

Fig. 18. Overview of the complete architecture proposed (and used) for the implementation of the classification framework. It represents the four main components of the approach.

6.1.1. Offline Data Controllers

These components, as their name suggests, are in charge of controlling the access to the offline information involved in the classification process, i.e. they provide general means to access previously stored information related to the repository, its services, the classification taxonomy, domain ontology information, etc.

Following the main goals pursued by the framework (i.e. the ones mentioned in the introduction of this section), these offline controllers are defined in a very general way in the implementation, that is, with a very simple interface including the main methods that will be needed at run time (e.g. taxonomy retrieval, repository information retrieval including services, parameters, etc.). This generic interface enables the usage of any existing mechanism for storing offline information (e.g. databases, files, etc.) and any specific implementation or solution of those mechanisms (e.g. MySQL, Oracle, different OS file systems, etc.).
This way, any taxonomy or repository can be used as long as it is stored persistently and a specification (implementation) of the provided interface is developed to manage the internal procedures (i.e. read and write actions) needed for the storage mechanism used.

In addition, the use of offline data controllers enables the achievement of the second goal mentioned in the introduction of this section: maximize performance. The generality of the offline data controller component enables:

The creation of methods allowing the storage of results obtained in offline processes (i.e. not at classification run time), for example domain ontology concept similarity values, or parsed and optimally structured service information (e.g. operations, parameters, ontology concepts describing the parameters, etc.).

The optimization of the methods described in the offline data controller interface. The performance they offer will therefore depend on the user's needs, as it is the user who chooses or develops the internal procedures of the controller.

Once the features enabled by these generic components have been explained, I will move on to the specific implementation that has been developed to enable the usage of the classification heuristic proposed in this research work.

Concept offline data controller: This first type of specification is in charge of managing the offline information related to ontology concepts, that is, the information involved in the last level of the heuristic. This offline information is composed of a complete table of concept to concept similarities (as the comparison between domain ontology concepts is the bottom level of the heuristic, it is the task performed most often during the classification process, so having it precalculated reduces the classification time cost) and a table containing ontology concept depths (which eases the precalculation of the concept similarity table).

Service offline data controller: The next specification of the offline controllers is in charge of maintaining all the information related to services, in order to avoid parsing the complete repository each time a service is classified. So, the information managed by this component comprises a table containing all the needed information of the complete service repository used (e.g. service location, operations, messages and parameter names, types and ontology concepts used to describe each service parameter).

Category offline data controller: Finally, this last component is responsible for managing the information used by the first heuristic level, that is, the one providing the service classification. The information used at this level is the complete taxonomy used in the classification as well as a table containing the relationships between services and categories (as all the services stored in the repository are already classified).

As can be seen, the implementation also follows the three-level approach used in the heuristic design. Each of the controllers is only in charge of the specific offline data actions relevant to its granularity level, providing a very modular design that helps to understand the complete architecture and to add possible future updates or improvements.

6.1.2. Ontology Controllers

Components belonging to this second type are in charge of controlling the access to semantic information represented and stored in ontologies (e.g. domain ontologies, operation ontologies, or even service taxonomies represented in an ontological format).
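To make the idea of a generic offline data controller more concrete, the following Python sketch shows one possible shape for the concept-level controller: an abstract interface with store and load methods, plus an in-memory implementation standing in for a database or file backend. All names here (`ConceptOfflineDataController`, `InMemoryConceptController`, `precompute_similarities`) are hypothetical and do not come from the actual framework, and the sketch assumes the concept measure is symmetric so that each unordered pair is stored once.

```python
from abc import ABC, abstractmethod
from itertools import combinations_with_replacement
from typing import Callable, Dict, Iterable, Tuple


class ConceptOfflineDataController(ABC):
    """Generic access to precalculated concept-to-concept similarities."""

    @abstractmethod
    def store_similarity(self, first: str, second: str, value: float) -> None:
        ...

    @abstractmethod
    def load_similarity(self, first: str, second: str) -> float:
        ...


class InMemoryConceptController(ConceptOfflineDataController):
    """Dictionary-backed stand-in for a database or file implementation."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, str], float] = {}

    @staticmethod
    def _key(first: str, second: str) -> Tuple[str, str]:
        # Order the pair so (a, b) and (b, a) share one entry
        return (first, second) if first <= second else (second, first)

    def store_similarity(self, first: str, second: str, value: float) -> None:
        self._table[self._key(first, second)] = value

    def load_similarity(self, first: str, second: str) -> float:
        return self._table[self._key(first, second)]


def precompute_similarities(controller: ConceptOfflineDataController,
                            concepts: Iterable[str],
                            measure: Callable[[str, str], float]) -> None:
    """Offline step: evaluate the measure once per unordered concept pair."""
    for first, second in combinations_with_replacement(sorted(concepts), 2):
        controller.store_similarity(first, second, measure(first, second))
```

At classification time the heuristic would then call only `load_similarity`, so the cost of the concept measure is paid once, offline. A backend for a relational database would implement the same two methods against a similarity table.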

In this case, the goal pursued by including this type of component in the framework implementation is to provide means to ease the inclusion of any type of semantic information in the classification process (e.g. domain ontologies used for service annotation, service taxonomies represented in an ontological way, etc.). In conclusion, these components endow the framework with a high level of semantic interoperability.

In the concrete specification of these components created to support the heuristic presented, two ontology controllers have been used, managing two different types of ontology. The first one contains the description of the concepts used to develop the semantic descriptions of the repository services, i.e. the domain ontology. The second one contains the description of the complete service taxonomy used (i.e. the service categories the heuristic uses to perform the service classification). The technology used for the implementation of both controllers is exactly the same: the Jena Semantic Web Framework. Specifically, the API it provides to manage OWL ontologies has been used, as the domain ontology used in the semantic annotation of Web services and the ontology representing the service taxonomy are both described in that ontology language.

6.1.3. Measure Controllers

Measure controllers are in charge of managing the different measures, operations and formulas used in the classification heuristic (e.g. with the measures proposed in this work there are three controllers, one for each granularity level described in Section 4). As with the ontology controllers, this type of component has been created in order to maximize the generality, flexibility and interoperability (with different classification approaches) of the implemented framework.
The goal is to provide a common interface (divided into the three granularity levels already described) abstracting the specific details of how the comparison between service elements is performed in the heuristic. This way, any sort of similarity measure can be plugged into the framework in a very flexible and simple way, enabling:

The usage of the application by any user (e.g. a user having their own repository and their own implementations of the offline data and ontology controllers).

The comparison of the presented approach with other existing ones (e.g. the similarity measures included in the SimPack library provided by the Dynamic and Distributed Information Systems Group (DDIS)).

The simple and fast testing, evaluation, modification and evolution of measures.
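The plug-in idea behind the measure controllers can be sketched as follows. This is a simplified, assumed design rather than the framework's actual interface: services are reduced to flat lists of concept names, and the names `MeasureControllers`, `exact_match` and `best_match_average` are hypothetical stand-ins for whatever measures a user decides to plug in at each of the three levels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

ParameterMeasure = Callable[[str, str], float]
ServiceMeasure = Callable[[Sequence[str], Sequence[str], ParameterMeasure], float]
CategoryMeasure = Callable[[Sequence[float]], float]


@dataclass
class MeasureControllers:
    """One pluggable measure per granularity level."""
    parameter: ParameterMeasure
    service: ServiceMeasure
    category: CategoryMeasure

    def rank(self, new_service: Sequence[str],
             repository: Dict[str, List[Sequence[str]]]) -> List[Tuple[str, float]]:
        """Score every (non-empty) category and return them best first."""
        scores = {}
        for category, services in repository.items():
            sims = [self.service(new_service, s, self.parameter) for s in services]
            scores[category] = self.category(sims)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def exact_match(first: str, second: str) -> float:
    """Trivial parameter-level plug-in: 1 for identical concept names."""
    return 1.0 if first == second else 0.0


def best_match_average(first: Sequence[str], second: Sequence[str],
                       measure: ParameterMeasure) -> float:
    """Simple service-level plug-in (not the pairing of expression (7))."""
    if not first or not second:
        return 0.0
    total = sum(max(measure(a, b) for b in second) for a in first)
    return total / max(len(first), len(second))


# Swapping any of the three arguments changes the heuristic, not the framework
controllers = MeasureControllers(parameter=exact_match,
                                 service=best_match_average,
                                 category=max)
```

Replacing `exact_match` with an ontology-based measure, or `max` with a different category-level combination, requires no change to `rank`, which is the kind of flexibility the list above describes.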


In particular, for the implementation provided to support the presented heuristic I have developed measure controller specifications for all three levels of the approach. This way, the formulas (shown in Section 4) resulting from the formal definition of each heuristic level have been implemented as three different measure controllers. In addition, each of these specific measure controllers makes use of the previously mentioned specifications of the offline data and ontology controllers.

6.1.4. User interface

Finally, the last component is the application user interface. It exposes the functionalities enabled by all the previous components in a simple, intuitive and comfortable way, so that every user can solve the classification problem (or test similarity measures already developed) with minimum effort. Figure 19 shows a screenshot of the user interface developed for the complete classification framework.

Fig. 19. Screenshot of the classification framework implementing the classification heuristic proposed as the central point of this research work.

6.2. Experiments and evaluation

Using the classification framework already explained, the heuristic and measures proposed in Section 4 have been tested and evaluated in order to obtain formal (and numerical) results supporting the correctness and validity of the proposed approach and, in addition, enabling the formal comparison of this approach with the ones proposed in other research lines (such as the ones mentioned in Section 2).

To perform the different correctness and performance tests, the first thing needed is a service repository, i.e. a set of services stored in some way (not necessarily a UDDI repository).
As the classification heuristic uses Semantic Web service descriptions already classified under a given taxonomy to perform the classification process, the service repository used needs to be populated with real Semantic
