Semantic Web Lecture Part 1. Prof. Do van Thanh

Overview of the lecture
Part 1: Why Semantic Web?
Part 2: Semantic Web components: XML - XML Schema
Part 3: Semantic Web components: RDF - RDF Schema
Part 4: Semantic Web components: DAML - OWL

Why Semantic Web?

Let us start with the current World Wide Web
Definition of the WWW: a wide-area hypermedia information retrieval initiative aiming to give universal access to a large universe of documents.

How was the Web created?
The Web began in March 1989, when Tim Berners-Lee of CERN (European Organization for Nuclear Research) proposed the project as a means of transporting research and ideas effectively throughout the organization.
Its development was later taken over by the World Wide Web Consortium (W3C), an international industry consortium founded in 1994 whose purpose is to develop interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential.

[Figure: Web surfing. A Web browser accesses the WWW, where Web servers hold documents in HTML.]

How does the Web work?
The Web works under the popular client-server model:
The client is called the Web browser.
The server is called the Web server.

Web browser
A software application that enables a user to display and interact with HTML documents hosted by web servers or held in a file system. Examples are Microsoft Internet Explorer, Mozilla Firefox, Opera, Netscape Navigator and Safari. A browser is the most commonly used kind of user agent.

Web server
A computer that is responsible for accepting HTTP requests from clients (the web browsers) and serving them web pages, which are usually HTML documents.

[Figure: interaction between the Web browser and the Web server]
1. The Web browser requests the Uniform Resource Locator http://www.telenor.com/doc1
2. The Web server locates and fetches doc 1
3. The Web server returns doc 1
4. The Web browser interprets doc 1
5. The Web browser displays doc 1 on the screen
Example of a HyperText document with its tags:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
      <title>panda - Personal Area Networks & Data Applications</title>
      <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
      <link href="main.css" rel="stylesheet" type="text/css">
    </head>
    <body>
      <table width="75%" border="0">
        <tr>
          <td><img src="panda.gif" width="128" height="128" align="left"></td>
          <td class="textnormal">
            <h1>panda, Personal Area Networks & Data Applications</h1>
            <p>welcome to PANDA's place on the web</p>
            <p><a href="info.html">[klick here to read about the project]</a></p>
          </td>
        </tr>
      </table>
    </body>
    </html>

HTTP
The language that Web clients and servers use to communicate with each other is called the HyperText Transfer Protocol (HTTP). All Web clients and servers must be able to speak HTTP in order to send and receive hypermedia documents. For this reason, Web servers are often called HTTP servers. The phrase "World-Wide Web" is often used to refer to the collective network of servers speaking HTTP as well as the global body of information available using the protocol.
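To make the browser/server exchange concrete, here is a minimal sketch of what an HTTP request and the first line of a response look like as text. The helper names (build_get_request, parse_status_line) and the canned response are assumptions made for illustration; no network connection is opened, and real clients send more headers than shown here.

```python
from typing import Tuple


def build_get_request(host: str, path: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, protocol version
        f"Host: {host}",          # the Host header is required in HTTP/1.1
        "Connection: close",
        "",                       # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response: str) -> Tuple[str, int, str]:
    """Split the first line of a response into version, status code and reason."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


# Hypothetical request for the document used in the earlier figure.
request_text = build_get_request("www.telenor.com", "/doc1")

# A canned (made-up) server answer, used here instead of a real connection.
canned_response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
status = parse_status_line(canned_response)
```

The point of the sketch is only that HTTP is a plain-text conversation: the browser names the resource it wants, and the server answers with a status line, headers, and the document itself.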

URL: Uniform Resource Locator
The method by which documents or data are addressed in the World Wide Web. A URL contains the following information:
the Internet name of the site containing the resource (document or data);
the type of service the resource is served by (e.g., HTTP, Gopher, WAIS);
the Internet port number of the service (if this is omitted, the browser assumes a commonly accepted default value);
the location of the resource in the directory structure of the server.
For more info: http://www.w3.org/hypertext/www/addressing/url/overview.html

Structure of URL
The following is an outline of the most common form of a URL:
http://www.address.edu:1234/path/subdir/file.ext
Service: http
Host: www.address.edu
Port: 1234
File and resource details: /path/subdir/file.ext
Example: http://info.cern.ch/hypertext/www/addressing/url/overview.html
Retrieve the named HTML document from the CERN HTTP server running on the default port.
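The decomposition of a URL into service, host, port and path can be sketched with Python's standard urllib.parse module. The function name describe_url and the small DEFAULT_PORTS table are assumptions introduced for this illustration; the table only stands in for the "commonly accepted default value" a browser would assume when the port is omitted.

```python
from urllib.parse import urlsplit

# Illustrative defaults a client might assume when the URL gives no port.
DEFAULT_PORTS = {"http": 80, "https": 443, "gopher": 70}


def describe_url(url: str) -> dict:
    """Split a URL into the four parts named on the slide."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "service": parts.scheme,   # type of service, e.g. http
        "host": parts.hostname,    # Internet name of the site
        "port": port,              # explicit port, or the assumed default
        "path": parts.path,        # location of the resource on the server
    }


explicit = describe_url("http://www.address.edu:1234/path/subdir/file.ext")
default = describe_url("http://info.cern.ch/hypertext/www/addressing/url/overview.html")
```

In the first URL the port is stated explicitly; in the second it is omitted, so the sketch falls back to the default port for the http service.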

Hypertext
The operation of the Web relies on hypertext as its means of interacting with users. Hypertext is basically the same as regular text - it can be stored, read, searched, or edited - with an important exception: hypertext contains connections within the text to other documents.

Hypermedia
Hypermedia is hypertext with a difference - hypermedia documents contain links not only to other pieces of text, but also to other forms of media - sounds, images, and movies.

HTML
The standard language the Web uses for creating and recognizing hypermedia documents is the HyperText Markup Language (HTML). HTML is related to the Standard Generalized Markup Language (SGML), a document formatting language used widely in some computing circles: it is defined as an application of SGML rather than a subset of it.

HTML
HTML was designed to specify the logical organisation of a document, with important hypertext extensions. It was not designed to be the language of a WYSIWYG word processor such as Word or WordPerfect. This choice was made because the same HTML document may be viewed by many different "browsers" of very different abilities. For example, HTML allows you to mark selections of text as titles or paragraphs, and then leaves the interpretation of these marked elements up to the browser: one browser may indent the beginning of a paragraph, while another may only leave a blank line.

HTML Document Structure
An HTML document is structured into two parts, the HEAD and the BODY. The HEAD contains information about the document that is not generally displayed with the document, such as its TITLE. The BODY contains the body of the text, and is where you place the document material to be displayed. Elements allowed inside the HEAD, such as TITLE, are not allowed inside the BODY, and vice versa.
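The HEAD/BODY split can be observed programmatically. The following is a minimal sketch using Python's standard html.parser module; the class name StructureParser and the sample document are assumptions made for illustration, and the parser deliberately ignores everything except the TITLE and the text inside the BODY.

```python
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Collect the TITLE from the HEAD and the visible text from the BODY."""

    def __init__(self):
        super().__init__()
        self.section = None      # "head", "body", or None
        self.in_title = False
        self.title = ""
        self.body_text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == self.section:
            self.section = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_title:
            self.title += text            # information about the document
        elif self.section == "body":
            self.body_text.append(text)   # material to be displayed


# Hypothetical sample document, loosely modelled on the PANDA page.
SAMPLE = (
    "<html><head><title>PANDA</title></head>"
    "<body><h1>Welcome</h1><p>Read about the project.</p></body></html>"
)

parser = StructureParser()
parser.feed(SAMPLE)
```

After parsing, parser.title holds the text that a browser would typically show in its window title rather than in the page, while parser.body_text holds the text that is displayed as the document itself.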

[Figure: HTML Document Structure, showing the HEAD and the BODY]

Versions of HTML
HTML 2.0 was the first definitive version.
HTML 3 (late 1995), an ambitious effort by Dave Raggett, was never completed or implemented.
HTML 3.2 was the next official version, integrating support for TABLEs, ALIGN attributes for images, headings and other elements, and a few other finicky details. HTML 3.2 is the current "universal" dialect -- essentially all browsers understand HTML 3.2. It was, however, missing some of the Netscape/Microsoft extensions, such as FRAMEs, EMBED and APPLET. Support for these (after a fashion) came in HTML 4.0.

Versions of HTML
HTML 4.01 is the current official standard. It includes support for most of the proprietary extensions and for extra features (internationalized documents, Cascading Style Sheets, and extra TABLE, FORM, and JavaScript enhancements), but it is not universally supported.
For more info: http://www.utoronto.ca/webdocs/htmldocs/html_spec/html.html
The evolution of HTML has now ceased -- HTML 4.01 is the last version of HTML. For the future, HTML is being replaced by a new language called XHTML, the Extensible HyperText Markup Language.

Limitations of the current Web
Documents are primarily written in HTML, a language that is useful for describing, with an emphasis on visual presentation, a body of structured text interspersed with multimedia objects such as images and interactive forms.
HTML has limited ability to classify the blocks of text on a page, apart from the roles they play in a typical document's organization and in the desired visual layout.

Limitations of the current Web
The focus is on publishing and presenting documents to human beings.
But one must know where things are located: a URL such as http://www.w3.org/addressing/ indicates only the address of a web page, not what exactly it contains.
It is difficult to find things: people need to learn by some other means what is contained where, e.g. from email or advertisements.

Limitations of the current Web
One can argue that Google is great for finding things, but this is not always true.
Example: you want to find the names and addresses of the supermarkets in Alicante by using Google.

No hit with the wrong keyword: the list contains numerous results, but none is correct!

The right word is required

All the supermarkets in the Alicante region are listed.
With HTML, the semantic binding between "supermarket" and "hipermercado" cannot be expressed.
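The supermarket/hipermercado problem can be sketched in a few lines. Everything in this example is hypothetical: the two page texts, the EQUIVALENT_TERMS table and both search functions are invented for illustration, and real search engines work very differently. The sketch only shows that a plain keyword match fails until the equivalence between the terms is stated explicitly somewhere.

```python
# Hypothetical page texts standing in for documents on the Web.
PAGES = {
    "page-a": "Lista de hipermercados en Alicante",
    "page-b": "Alicante beach hotels and restaurants",
}

# An explicit, machine-readable statement that these terms mean the same thing.
EQUIVALENT_TERMS = {
    "supermarket": {"supermarket", "supermercado", "hipermercado"},
}


def keyword_search(query, pages):
    """Return the pages whose text literally contains the query word."""
    needle = query.lower()
    return sorted(name for name, text in pages.items() if needle in text.lower())


def semantic_search(query, pages, equivalents):
    """Return the pages containing the query word or any term declared equivalent."""
    terms = equivalents.get(query.lower(), {query.lower()})
    return sorted(
        name
        for name, text in pages.items()
        if any(term in text.lower() for term in terms)
    )


plain_hits = keyword_search("supermarket", PAGES)
semantic_hits = semantic_search("supermarket", PAGES, EQUIVALENT_TERMS)
```

With the plain keyword match, the query "supermarket" finds nothing, because the relevant page only uses the Spanish word. Once the equivalence is available as data, the same query finds the page.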

Another example
From this web page, a person can extract and compare the prices of four phones.
It is quite difficult to write a program to do so, because HTML specifies neither the phone types nor the prices.
HTML does not have the necessary semantics.
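The difference between presentational and descriptive markup can be sketched as follows. Both fragments, the element names (phone, model, price), the currency attribute and the price format are assumptions invented for this illustration; they are not taken from the web page shown in the lecture.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical presentational HTML: the tags say how things look, not what they are.
HTML_FRAGMENT = """
<tr><td><b>Phone A</b></td><td>1999,-</td></tr>
<tr><td><b>Phone B</b></td><td><font color="red">Now 1499,-</font></td></tr>
"""

# Hypothetical descriptive XML: the tags say what each piece of data is.
XML_FRAGMENT = """
<phones>
  <phone><model>Phone A</model><price currency="NOK">1999</price></phone>
  <phone><model>Phone B</model><price currency="NOK">1499</price></phone>
</phones>
"""


def prices_from_html(fragment):
    """Guess prices with a heuristic: digits followed by ',-'.

    The guess breaks as soon as the page changes its layout or price format,
    and it cannot tell which price belongs to which phone.
    """
    return [int(match) for match in re.findall(r"(\d+),-", fragment)]


def prices_from_xml(document):
    """Read model and price directly from elements that name them."""
    root = ET.fromstring(document)
    return {
        phone.findtext("model"): int(phone.findtext("price"))
        for phone in root.findall("phone")
    }


guessed = prices_from_html(HTML_FRAGMENT)
described = prices_from_xml(XML_FRAGMENT)
```

The HTML version has to rely on a pattern that happens to match the current layout, whereas the XML version can ask for the model and the price by name.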

What is the Semantic Web?

The Semantic Web
Will enable users to search not only for documents that contain data, but also for the desired data itself, through semantic identification and location techniques.
Will support software agents that are able not only to locate data, but also to perform meaningful tasks with data automatically and on the fly, tasks that today must be done manually and episodically by computer users.

The Semantic Web
Uses the descriptive technologies RDF and OWL, and the data-centric, customizable markup language XML, to provide descriptions that supplement or replace the content of Web documents.
The content will be given explicit meaning, making it easier for machines to automatically process and integrate information available on the Web.
The content may manifest as descriptive data stored in Web-accessible databases or as markup within documents (particularly in XHTML interspersed with XML or, more often, purely in XML, with layout/rendering cues stored separately).
The machine-readable descriptions allow content managers to add meaning to the content, thereby facilitating automated information gathering and research by computers.
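As a rough preview of what such machine-readable descriptions make possible, the sketch below stores statements as subject-predicate-object triples, the data model that RDF is built on, and answers a question by matching patterns. It uses plain Python tuples rather than an RDF library, and every identifier in it (the ex: names, the shops, the predicates other than rdf:type) is hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical descriptions; the "ex:" names are invented for this sketch.
TRIPLES: List[Triple] = [
    ("ex:shop1", "rdf:type", "ex:Supermarket"),
    ("ex:shop1", "ex:locatedIn", "ex:Alicante"),
    ("ex:shop2", "rdf:type", "ex:Hotel"),
    ("ex:shop2", "ex:locatedIn", "ex:Alicante"),
    ("ex:shop3", "rdf:type", "ex:Supermarket"),
    ("ex:shop3", "ex:locatedIn", "ex:Madrid"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every part of the pattern that is given."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def subjects_of(triples: List[Triple], predicate: str, obj: str) -> Set[str]:
    """Return every subject that has the given predicate and object."""
    return {s for s, _, _ in match(triples, predicate=predicate, obj=obj)}


# "Which supermarkets are located in Alicante?" asked of the data itself.
supermarkets_in_alicante = sorted(
    subjects_of(TRIPLES, "rdf:type", "ex:Supermarket")
    & subjects_of(TRIPLES, "ex:locatedIn", "ex:Alicante")
)
```

Because the descriptions state what each thing is and where it is, the query is answered from the data directly instead of by guessing from the wording or layout of a page.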