Online Communication and Marketing (Day 2) Zaenal Akbar zaenal.akbar@sti2.at Copyright 2015 STI INNSBRUCK
Topic & Agenda Topic - World Wide Web, Semantic Web and Schema.org Agenda 09.00-10.30 Introduction The Internet and the World Wide Web Break - 15 minutes 10.45-12.15 Semantic Web Markup Languages Schema.org Break - 30 minutes 12.45-14.15 Pro-seminar Working with task in hand in a group (3-5 students) 2
1. INTRODUCTION 3
What do travel consumers do online? (*) ETOA, The New Online Travel Consumer, 2014, http://www.etoa.org 4
Online sources of travel inspiration (*) Think with Google, The 2014 Traveler's Road to Decision, 2014, https://www.thinkwithgoogle.com 5
Top 10 online sources used in travel planning (*) Think with Google, The 2014 Traveler's Road to Decision, 2014, https://www.thinkwithgoogle.com 6
What travelers type into Google when they start to plan a trip (*) Think with Google, The 2014 Traveler's Road to Decision, 2014, https://www.thinkwithgoogle.com 7
Events in Landeck? 8
Events in Vienna? 9
Hotel Schwarzer Adler? 10
How old is david alaba? 11
How is this possible? The answer is the annotation of web pages with structured semantic data: search engines can more easily organize and display them in creative ways 12
How is it relevant? Semantically annotated web pages increase the pages' online visibility. Higher-ranked web pages can attract more visitors. More visitors means more potential customers. More potential customers increase your business success 13
2. FUNDAMENTALS OF THE INTERNET Picture taken from: http://querosaber.sapo.pt/media/galeria_multimedia_v2/offline/19577.0.original.jpg 14
The Internet US Government (1960s): robust, fault-tolerant communication via computer networks ARPANET (1980s): backbone for interconnection of regional academic and military networks 1990s: birth of modern Internet : merging of Academic networks Military networks Commercial enterprise networks Source: https://en.wikipedia.org/wiki/internet https://en.wikipedia.org/wiki/file:internet_map_1024_-_transparent,_inverted.png 15
The Internet http://www.internetworldstats.com/emarketing.htm http://www.bitrebels.com/technology/the-growth-of-the-internet-infographic/ 16
The Internet Architecture Globally connected network of computers Currently 2.9 billion things connected [1] Estimated 25 billion by the end of 2020 Internet of things Source: https://en.wikipedia.org/wiki/internet [1] http://www.zdnet.com/article/25-billion-connected-devices-by-2020-to-build-the-internet-of-things/ 17
The Internet Evolution (timeline, 1945 to 1995): 1945 Memex conceived; 1948 A Mathematical Theory of Communication; 1958 Silicon chip; 1962 First vast computer network envisioned; 1964 Packet switching invented; 1965 Hypertext invented; 1969 ARPANET; 1972 TCP/IP created; 1984 Internet named and goes TCP/IP; 1989 WWW created; 1993 Mosaic created; 1995 Age of eCommerce begins. Source: http://www.isoc.org/internet/history2002_0918_internet_history_and_growth.ppt 18
The Internet Energy use: 2011 estimate: 170 to 307 GW, less than 2% of the energy used by humanity. The estimate includes building, operating and replacing: 750M laptops, 1B smartphones, 100M servers, plus routers, cell towers, optical switches, Wi-Fi transmitters and cloud storage devices 19
The Internet Services based on the Internet: Communication. E-mail: messages and attachments are sent over the Internet infrastructure. Protocols in use: SMTP, POP, IMAP. Chat: short-message based communication. Protocol: e.g. IRC (Internet Relay Chat). Typically: install a client, connect to a server, start a conversation. Examples: ICQ, Skype, Talker, Windows Live Messenger. Internet telephony (Skype, ...): the Internet carries voice traffic; calls are free or cost much less; a serious competitor to traditional telephony; also known as VoIP = Voice over Internet Protocol 20
The Internet Services based on the Internet: Data transfer. File sharing: uploading files to a server for storing and sharing (FTP); peer-to-peer sharing of large files (BitTorrent). Streaming media: real-time delivery of digital media for immediate consumption or enjoyment by end users. Live: radio stations, TV, ...: FM4, ORF TVthek, Das Erste Mediathek. On demand: podcasts, Netflix, Sky, Spotify, Pandora, ... Webcams: weather cameras, animal watch, traffic monitoring, surveillance, sports, online live shows, video chat, live demos, ... 21
The Internet Services based on the Internet: World Wide Web 22
3. THE WORLD WIDE WEB Picture taken from: http://webfoundation.org/about/vision/history-of-the-web/ 23
The World Wide Web 29 October 2014, W3C20 ANNIVERSARY SYMPOSIUM Vinton G. Cerf & Sir Tim Berners-Lee 24
The World Wide Web Invented by Sir Tim Berners-Lee while employed at CERN in 1989 TBL wrote a proposal for a system called World Wide Web [1] TBL wrote the first Web browser and Web server wrote the first Webpage [1] http://www.w3.org/history/1989/proposal.html 25
The World Wide Web: an information space where documents and other web resources are identified by URLs, interlinked by hypertext links, and can be accessed via the Internet. (Figure labels: URI, URL, hyperlink (link), website) 26
The World Wide Web Web 1.0: the first stage of the WWW, the static web. A collection of text documents and other resources, linked by hyperlinks and URLs, usually accessed by web browsers from web servers. Netscape: Netscape is associated with the breakthrough of the Web. It rapidly gained a large user community, which made it attractive for others to present their information on the Web. Google: Google is the incarnation of Web 1.0 mega growth. By 2008 Google had already counted more than 1 trillion unique URLs on the Web [*]. Google and similar search engines showed that a piece of information can be found again faster on the Web than in one's own bookmark list. [*] http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html 27
The World Wide Web Web 2.0: The term "Web 2.0" (2004 to present) is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web http://en.wikipedia.org/wiki/web_2.0 Web 2.0 is a vaguely defined phrase referring to various topics such as social networking sites, wikis, communication tools, and folksonomies. Tim Berners-Lee is right that all these ideas already underlie his original web ideas; however, there are differences in emphasis that may cause a qualitative change. With Web 1.0 technology, significant software skills and investment in software were necessary to publish information. Web 2.0 technology changed this dramatically. 28
The World Wide Web The four major breakthroughs of Web 2.0 are: 1. Blurring the distinction between content consumers and content providers. 2. Moving from media for individuals towards media for communities. 3. Blurring the distinction between service consumers and service providers 4. Integrating human and machine computing in a new and innovative way 29
The World Wide Web 1. Blurring the distinction between content consumers and content providers: wikis, blogs, and Twitter turned the publication of text into a mass phenomenon, as Flickr and YouTube did for multimedia 30
The World Wide Web 2. Moving from media for individuals towards media for communities: social web sites such as del.icio.us, Facebook, FOAF, LinkedIn, MySpace and Xing allow communities of users to smoothly interweave their information and activities 31
The World Wide Web 3. Blurring the distinction between service consumers and service providers: mashups allow web users to easily integrate services implemented by third parties into their own web sites 32
The World Wide Web 4. Integrating human and machine computing in a new and innovative way: Amazon Mechanical Turk gives access to human services through a web service interface, blurring the distinction between manually and automatically provided services 33
The World Wide Web (*) K. Bratcher, Web: The History of the Internet, https://www.tes.com/lessons/hmm6kqb3x9wzpw/web-the-history-of-the-internet 34
The World Wide Web But... The current Web has its limitations when it comes to: 1. finding relevant information 2. extracting relevant information 3. combining and reusing information 35
The World Wide Web Finding information on the current Web is based on keyword search Keyword search has a limited recall and precision due to: Synonyms: e.g. Searching information about Cars will ignore Web pages that contain the word Automobiles even though the information on these pages could be relevant Homonyms: e.g. Searching information about Jaguar will bring up pages containing information about both Jaguar (the car brand) and Jaguar (the animal) even though the user is interested only in one of them 36
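The synonym and homonym problems can be reproduced with a few lines of code. The following is a minimal sketch, not how a real search engine works: the four page texts and the function name keyword_search are invented for illustration, and matching is plain whole-word comparison.

```python
# Toy corpus; page ids and texts are invented for illustration only.
PAGES = {
    "page1": "cheap cars for sale in innsbruck",
    "page2": "used automobiles at fair prices",
    "page3": "the jaguar is a big cat native to the americas",
    "page4": "jaguar unveils a new sports car",
}


def keyword_search(query, pages):
    """Return the ids of pages whose text contains the query as a whole word."""
    term = query.lower()
    return sorted(pid for pid, text in pages.items() if term in text.split())


# Synonym problem: page2 is relevant but is never found (limited recall).
cars_hits = keyword_search("cars", PAGES)

# Homonym problem: the animal and the car brand are mixed (limited precision).
jaguar_hits = keyword_search("jaguar", PAGES)

print(cars_hits)
print(jaguar_hits)
```

The search only compares character strings, so it has no way to know that "automobiles" means the same as "cars", or which "jaguar" the user has in mind.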
The World Wide Web 37
The World Wide Web Keyword search also has limited recall and precision due to: Spelling variants: e.g. organize in American English vs. organise in British English. Spelling mistakes. Multiple languages: information about the same topic is published on the Web in different languages (English, German, Italian, ...). Current search engines provide no means to specify the relation between a resource and a term, e.g. sell / buy 38
The World Wide Web A one-size-fits-all automatic solution for extracting information from Web pages is not possible because pages use different formats and different syntaxes. Even from a single Web page it is difficult to extract the relevant information: Which book is about the web? What is the price of the book? 39
The World Wide Web Extracting information from current web sites can be done using wrappers. (Diagram: Web HTML pages (layout) -> wrapper (extract, annotate, structure) -> structured data, databases, XML (structure)) 40
The World Wide Web The actual extraction of information from web sites is specified using standards such as XSL Transformations (XSLT) (*). Extracted information can be stored as structured data in XML format or in databases. However, wrappers do not really scale, because the extraction depends on the format and layout of each individual web site. (*) https://en.wikipedia.org/wiki/xslt 41
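To make the wrapper idea concrete, here is a minimal sketch in Python rather than XSLT. It assumes a hypothetical page layout in which every book is a div with class "book" containing spans with classes "title" and "price"; the sample page, the prices and the class name BookWrapper are invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical layout of one specific web site (assumption, not a standard).
PAGE = """
<div class="book"><span class="title">Weaving the Web</span><span class="price">12.99</span></div>
<div class="book"><span class="title">Cooking Basics</span><span class="price">8.50</span></div>
"""


class BookWrapper(HTMLParser):
    """Turns the layout above into a list of {"title": ..., "price": ...} records."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._field = None  # name of the field whose text we are currently reading

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "book":
            self.books.append({})  # a new book record starts here
        elif tag == "span" and css_class in ("title", "price") and self.books:
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self.books[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


wrapper = BookWrapper()
wrapper.feed(PAGE)
books = wrapper.books
print(books)
```

The wrapper is tied to the class names and nesting of this one layout. If the site renames a class or moves the price into another element, the wrapper silently stops extracting, which is the scaling problem described above.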
The World Wide Web Tasks often require combining data on the Web: 1. The same information has to be searched for in different digital libraries 2. Information may come from different websites and needs to be combined 42
The World Wide Web 1. Searches for the same information in different digital libraries Example: I want to travel from Innsbruck to Rome. 43
The World Wide Web 2. Information may come from different websites and needs to be combined Example: I want to travel from Innsbruck to Rome where I want to stay in a hotel and visit the city 44
The World Wide Web Increasing automatic linking among data Increasing recall and precision in search Increasing automation in data integration Increasing automation in the service life cycle 45
Summary The World Wide Web is a service that runs on the Internet. Web 1.0 (the static web): content is available on web servers and accessible through browsers; limited to the passive viewing of content. Web 2.0 (the social web): user-generated content; high interaction and collaboration of users in a virtual community 46
Summary Extracting information from the Web is challenging: sources use different formats and different syntaxes, and are distributed. Keyword-based search is imprecise: a search for Car misses pages about Automobiles; a search for Jaguar mixes the car brand and the animal. Adding semantics to data and services is the solution! Technical solution: The Semantic Web 47
4. THE SEMANTIC WEB 48
The Semantic Web Short motivation movie: https://www.youtube.com/watch?v=off08as3sim 49
The Semantic Web More than 2 billion users more than 50 billion pages Static WWW URI, HTML, HTTP 50
The Semantic Web Serious problems in: finding information, extracting information, representing information, interpreting information, maintaining information. Static WWW (URI, HTML, HTTP) -> Semantic Web (RDF, RDF(S), OWL) 51
The Semantic Web The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. T. Berners-Lee, J. Hendler, O. Lassila, The Semantic Web, Scientific American, May 2001 52
The Semantic Web The next generation of the WWW. Information has machine-processable and machine-understandable semantics. Not a separate Web but an augmentation of the current one. Ontologies are the backbone of the Semantic Web 53
Ontology definition: a formal, explicit specification of a shared conceptualization. Formal: machine-readability with computational semantics. Explicit: unambiguous terminology definitions. Shared: commonly accepted understanding. Conceptualization: conceptual model of a domain (ontological theory). Gruber, Toward principles for the design of ontologies used for knowledge sharing, Int. J. Hum.-Comput. Stud., vol. 43, no. 5-6, 1995
Well-defined meaning: "An ontology is an explicit specification of a conceptualization". Gruber, Toward principles for the design of ontologies used for knowledge sharing, Int. J. Hum.-Comput. Stud., vol. 43, no. 5-6, 1995. Ontologies are the modeling foundations of the Semantic Web. They provide the well-defined meaning for information
Explicit, formal, shared, conceptualization. An ontology is: A conceptualization: a model of the most relevant concepts of a phenomenon from the real world. Explicit: the model explicitly states the types of the concepts, the relationships between them and the constraints on their use. Formal: the ontology has to be machine-readable (the use of natural language is excluded). Shared: the knowledge contained in the ontology is consensual, i.e. it has been accepted by a group of people. Studer, Benjamins, D. Fensel, Knowledge engineering: Principles and methods, Data Knowledge Engineering, vol. 25, no. 1-2, 1998.
Ontology example (university domain). Concept: a conceptual entity of the domain, e.g. Person, Student, Professor, Lecture. Property: an attribute describing a concept, e.g. name, email, matr. nr., research field, lecture nr., topic. Relation: a relationship between concepts or properties, e.g. the isa hierarchy (taxonomy), attends, holds. Axiom: a coherency description between concepts / properties / relations via logical expressions, e.g. holds(Professor, Lecture) => Lecture.topic = Professor.researchField
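The four building blocks can be sketched in ordinary code. This is only an illustration, not an ontology language such as RDF(S) or OWL. The assignment of properties to concepts (name and email on Person, matr. nr. on Student, research field on Professor) is an assumption read off the example, and all instance data is invented.

```python
from dataclasses import dataclass


@dataclass
class Person:                 # concept
    name: str                 # property
    email: str                # property


@dataclass
class Student(Person):        # isa relation: Student isa Person
    matr_nr: str


@dataclass
class Professor(Person):      # isa relation: Professor isa Person
    research_field: str


@dataclass
class Lecture:                # concept
    lecture_nr: str
    topic: str


def axiom_satisfied(professor, lecture):
    """Axiom: holds(Professor, Lecture) => Lecture.topic = Professor.researchField."""
    return lecture.topic == professor.research_field


# Invented instance data.
prof = Professor(name="Example Professor", email="prof@example.org",
                 research_field="Semantic Web")
lecture_ok = Lecture(lecture_nr="L1", topic="Semantic Web")
lecture_bad = Lecture(lecture_nr="L2", topic="Databases")

# The "holds" relation as a list of (professor, lecture) pairs.
holds = [(prof, lecture_ok), (prof, lecture_bad)]

# Check the axiom for every pair in the relation.
violations = [lec.lecture_nr for p, lec in holds if not axiom_satisfied(p, lec)]
print(violations)
```

Because the axiom is stated explicitly and formally, a machine can detect that the second lecture is inconsistent with the model without any human reading the data.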
Types of ontologies. Top-level ontology (also called generic, core, foundational, high-level or upper ontology): describes very general concepts like space, time, event, which are independent of a particular problem or domain. Domain ontology: describes the vocabulary related to a generic domain by specializing the concepts introduced in the top-level ontology. Task & problem-solving ontology: describes the vocabulary related to a generic task or activity by specializing the top-level ontologies. Application ontology: the most specific ontologies; concepts in application ontologies often correspond to roles played by domain entities while performing a certain activity. [Guarino, 98] Formal Ontology in Information Systems, http://www.loa-cnr.it/papers/fois98.pdf
The Semantic Web is about Web Data Annotation connecting (syntactic) Web objects, like text chunks, images, to their semantic notion e.g., this image is about Innsbruck, Dieter Fensel is a professor Data Linking on the Web (Web of Data) global networking of knowledge through URI, RDF, and SPARQL e.g., connecting my calendar with my rss feeds, my pictures,... Data Integration over the Web seamless integration of data based on different conceptual models e.g., integrating data coming from my two favorite book sellers 59
Web Data Annotation (*) Images: http://mist-deid.sourceforge.net/docs_2_0/html/use_ui.html 60
Data Linking on the Web (Web of Data) (*) Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Data Linking on the Web Principles Use URIs as names for things anything, not just documents you are not your homepage information resources and non-information resources Use HTTP URIs globally unique names, distributed ownership allows people to look up those names Provide useful information in RDF when someone looks up a URI Include RDF links to other URIs to enable discovery of related information 62
Data Linking on the Web Google Knowledge Graph: 1. Find the right thing 2. Get the best summary 3. Go deeper and broader (*) https://googleblog.blogspot.co.at/2012/05/introducing-knowledge-graph-things-not.html 63
Data integration over the Web Data integration involves combining data residing in different sources and providing users with a unified view of these data. Data integration over the Web can be implemented as follows: 1. Export the data sets to be integrated as RDF graphs 2. Merge identical resources (i.e. resources having the same URI) from different data sets 3. Run queries on the integrated data that were not possible on the individual data sets.
Data integration over the Web 1. Export the first data set as an RDF graph. For example, the first RDF graph contains information about the book The Glass Palace by Amitav Ghosh http://www.w3.org/people/ivan/corepresentations/swtutorial/slides.pdf
Data integration over the Web 1. Export the second data set as an RDF graph. The second RDF graph models information about the same book, this time in French http://www.w3.org/people/ivan/corepresentations/swtutorial/slides.pdf
Data Integration over the Web 2. Merge identical resources (i.e. resources having the same URI) from different data sets Same URI = Same resource http://www.w3.org/people/ivan/corepresentations/swtutorial/slides.pdf
Data integration over the Web 2. Merge identical resources (i.e. resources having the same URI) from different data sets http://www.w3.org/people/ivan/corepresentations/swtutorial/slides.pdf
Data integration over the Web 3. Start making queries on the integrated data. A user of the second dataset may ask queries like: give me the title of the original book. This information is not in the second dataset. It can, however, be retrieved from the integrated dataset, in which the second dataset is connected with the first dataset
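The three steps (export, merge, query) can be sketched with RDF graphs represented as plain sets of (subject, predicate, object) triples. This is a simplified illustration that does not use an RDF library or SPARQL. All URIs, the predicate names and the French title are hypothetical placeholders, not the values from the cited tutorial.

```python
# Hypothetical URIs (assumptions for illustration only).
BOOK_EN = "http://example.org/isbn/original-edition"
BOOK_FR = "http://example.org/isbn/french-edition"
AUTHOR = "http://example.org/person/amitav-ghosh"

# Step 1: each data set exported as a set of triples.
dataset_en = {
    (BOOK_EN, "title", "The Glass Palace"),
    (BOOK_EN, "author", AUTHOR),
    (AUTHOR, "name", "Amitav Ghosh"),
}
dataset_fr = {
    (BOOK_FR, "titre", "French title (placeholder)"),
    (BOOK_FR, "original", BOOK_EN),  # points at the URI used in the first data set
}

# Step 2: merging is a set union; equal URIs denote the same resource,
# so the node BOOK_EN from both data sets becomes one node.
merged = dataset_en | dataset_fr


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Step 3: "give me the title of the original book" for the French edition.
originals = objects(merged, BOOK_FR, "original")
original_titles = [t for o in originals for t in objects(merged, o, "title")]

# The same question cannot be answered from the second data set alone.
titles_from_fr_only = [t for o in objects(dataset_fr, BOOK_FR, "original")
                       for t in objects(dataset_fr, o, "title")]

print(original_titles)
print(titles_from_fr_only)
```

The merge needs no mapping table: it works only because both data sets use the same URI for the original book.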
Summary The Semantic Web extends current web with machine-processable and machine-understandable semantics Its backbones are Ontologies It is about Web Data Annotation Data Linking on the Web Data Integration over the Web
5. MARKUP LANGUAGES Picture taken from: http://webfoundation.org/about/vision/history-of-the-web/ 71
Markup language A system for annotating a document in a way that is distinguishable from the text. In digital media, the annotations are known as tags: <strong>Online Communication and Marketing</strong> <underline>Landeck, Tyrol, Austria</underline> (*) https://en.wikipedia.org/wiki/markup_language (*) Image: https://persistentenlightenment.wordpress.com/2013/04/14/popperberlinpart/ 72
HTML HyperText Markup Language Invented by Tim Berners-Lee at CERN 1993 Current Version: 5 Standard markup language to create web pages Interpreted by browser HTML describes structure of website Semantically With cues for representation 73
HTML 74
XML Extensible Markup Language A markup language for encoding documents in a format that is both human-readable and machine-readable. XML emphasizes simplicity, generality, and usability across the Internet (*) https://en.wikipedia.org/wiki/xml 75
JSON JavaScript Object Notation An open-standard format that uses human-readable text to transmit data objects consisting of attribute-value pairs <name> : <value> <key> : <value> <field> : <value> <attribute> : <value> (*) https://en.wikipedia.org/wiki/json 76
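A short sketch of what attribute-value pairs look like in practice, using Python's standard json module. The object and its values are invented for illustration.

```python
import json

# A data object made of <name> : <value> pairs (values are invented).
event = {"name": "Example Concert", "location": "Landeck", "attendees": 120}

# Serialize to human-readable text for transmission ...
text = json.dumps(event, sort_keys=True)

# ... and parse it back into a data object on the receiving side.
decoded = json.loads(text)

print(text)
```

Plain JSON carries no meaning beyond the key names themselves: a receiver cannot tell whether "name" refers to a person, an event or a product unless both sides agree on that in advance.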
JSON-LD JavaScript Object Notation for Linked Data. Provides links, called the context, between the keys of a JSON document and concepts in an ontology. Playground: http://json-ld.org/playground/ (*) https://en.wikipedia.org/wiki/json-ld 77
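The role of the context can be sketched as follows. The document below is ordinary JSON plus an "@context" entry that maps each key to a concept IRI. The helper term_iris is a deliberately simplified lookup written for this illustration; it is not the full JSON-LD expansion algorithm, and the hotel data and example.org URLs are invented.

```python
import json

doc = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
    },
    "@id": "http://example.org/hotel",
    "name": "Example Hotel",
    "homepage": "http://example.org/",
}


def term_iris(document):
    """Map every plain key of the document to the IRI given in its @context."""
    context = document["@context"]
    iris = {}
    for key in document:
        if key.startswith("@"):
            continue  # keywords such as @context and @id are not terms
        definition = context[key]
        # A term definition is either an IRI string or an object with "@id".
        iris[key] = definition["@id"] if isinstance(definition, dict) else definition
    return iris


iris = term_iris(doc)
print(json.dumps(iris, indent=2, sort_keys=True))
```

The document is still valid JSON for any existing consumer, while a JSON-LD aware consumer can resolve "name" to an unambiguous concept.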
Microdata HTML specification used to embed metadata within existing content on web pages (*) https://en.wikipedia.org/wiki/microdata_(html) 78
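A minimal sketch of Microdata from the consumer side: the HTML snippet carries the attributes itemscope, itemtype and itemprop, and a small parser reads them back. The reader class handles only this flat, un-nested case and is not a complete Microdata processor; the hotel name and telephone number are invented.

```python
from html.parser import HTMLParser

# Visible content annotated with Microdata attributes (values are invented).
SNIPPET = """
<div itemscope itemtype="http://schema.org/Hotel">
  <span itemprop="name">Example Hotel</span>
  <span itemprop="telephone">+43 000 000000</span>
</div>
"""


class MicrodataReader(HTMLParser):
    """Collects the item type and the text of each itemprop (flat items only)."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._prop = None  # itemprop whose text content is expected next

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemscope" in attributes:
            self.item_type = attributes.get("itemtype")
        if "itemprop" in attributes:
            self._prop = attributes["itemprop"]

    def handle_data(self, data):
        if self._prop and data.strip():
            self.properties[self._prop] = data.strip()
            self._prop = None


reader = MicrodataReader()
reader.feed(SNIPPET)
print(reader.item_type, reader.properties)
```

Unlike the layout-specific wrapper approach, the reader depends only on the Microdata attributes, so it keeps working when the visual layout of the page changes.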
RDFa Resource Description Framework in Attributes A set of attribute-level extensions to HTML, XML for embedding rich metadata within web documents (*) https://en.wikipedia.org/wiki/rdfa 79
Summary Markup languages are used to annotate a document in a way that is distinguishable from the text: HyperText Markup Language (HTML), Extensible Markup Language (XML), JavaScript Object Notation (JSON), JavaScript Object Notation for Linked Data (JSON-LD), Microdata, Resource Description Framework in Attributes (RDFa) 80
6. SCHEMA.ORG 81
Schema.org Initiative founded in 2011 by: Bing, Google, Yahoo!, Yandex. A vocabulary for structuring data in web sites. Embedded into HTML using Microdata, RDFa or JSON-LD 82
Schema.org 83
Recipes A set of instructions that describes how to prepare or make something 84
Events An organized event that people may attend at a particular time and place 85
Reviews A review of an item such as a restaurant, movie, or store 86
Products Information about a product, including price, availability, and review ratings 87
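Taking the Event type as an example, a Schema.org annotation in JSON-LD can be generated with a few lines of code. This is a sketch under stated assumptions: the event name, date and venue are invented, and only a handful of Schema.org properties (name, startDate, location, address) are used.

```python
import json

# Invented example data described with Schema.org types and properties.
event = {
    "@context": "http://schema.org",
    "@type": "Event",
    "name": "Example Summer Concert",
    "startDate": "2016-07-01T19:00",
    "location": {
        "@type": "Place",
        "name": "Example Hall",
        "address": "Landeck, Tyrol, Austria",
    },
}

# JSON-LD is embedded into an HTML page inside a script element.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(event, indent=2)
    + "\n</script>"
)

print(script_tag)
```

The resulting script element can be placed in the HTML of the event page; the visible content of the page does not have to change.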
Schema.org (*) Google's testing tool: https://search.google.com/structured-data/testing-tool 88
Summary World Wide Web evolution. The need for semantic structured data. Markup languages. Markup vocabulary 89