WWW
World Wide Web
Aka The Internet

Karst Koymans
Informatics Institute
University of Amsterdam
(version 44, 2014/10/06 11:35:56 UTC)
Tuesday, October 7, 2014

Table of Contents
  WWW history
  Basic concepts
  Client side
  Server side
  Dynamic content
  URI, URL, URN
  Protocol
  Markup (a quick recap from Essential Skills)
  Hybrids (the way forward?!)

WWW history (1)
  1968 Doug Engelbart
    Earlier than ARPANET or UNIX
    Inventor of the mouse, of hypermedia and of videoconferencing

WWW history (2)
  1989 Tim Berners-Lee
    20 years after Engelbart
    First information management proposal at CERN
  1990 WorldWideWeb browser (later Nexus)
    Developed by Berners-Lee on NEXTSTEP
    World Wide Web (with spaces) is the abstract information space

WWW history (3)
  More browsers
  1992 Line mode browser 1.1
    ViolaWWW browser for X Windows
  1993 Mosaic for X Windows (Marc Andreessen, founder of Netscape)

WWW history (4)
  1994 Mozilla (Netscape)
    W3C founded (MIT, INRIA, Keio)
      In 2003 ERCIM took the place of INRIA
      In 2012 http://www.webplatform.org/ was convened
        with an invalid certificate (20131003) for https
      In 2013 Beihang University (China) was invited to join hosting W3C
  1995 Microsoft launches Internet Explorer
    Start of the Browser Wars

WWW history (5)
  Standards based browsers and layout engines
    Gecko (Firefox, Seamonkey, Netscape)
    KHTML (Konqueror)
    WebKit/WebCore (Apple Safari, Google Chrome)
    Blink, derived from WebCore (Chrome, Chromium, Opera)
    Trident (Microsoft Internet Explorer, proprietary [1])
    Presto (Opera, proprietary [2])
  [1] freely available
  [2] freely and commercially available

WWW concepts
  Web browser (client)
  Web server (server)
  URLs and HTTP (protocol)
  HTML and other markup (content)
  Dynamic web pages (interaction)
  Document Object Model (page model)
  3-tier model (architecture)

Browser
  Client making requests for web pages
  Interface for the user
    Graphical User Interface
      IE, Safari, Firefox, Opera, Chrome, Chromium, ...
    Terminal based
      lynx, links, w3m, ...

Web server
  Server responding to requests
    NCSA HTTPd, Apache HTTP Server
    Internet Information Services (IIS, Microsoft)
    Sun Java System Web Server (Sun ONE, iPlanet)
      Partly open sourced (Open Web Server)
  Looks for information in or via files, databases, scripts, ...
    locally or remote
  See also http://www.netcraft.com/

Lesser known or more recent web servers
  nginx ("engine X")
    also reverse proxy and load balancer
    high performance
  Google Web Server (GWS)
    Google custom (mystery?) web server
    also runs Blogger, Google Docs, ...
  lighttpd ("lighty")
    lightweight
    optimised for speed
  qq.com
    Chinese IM and blog service (mostly private now)

Netcraft survey October 2006
Netcraft survey November 2007
Netcraft survey September 2008
Netcraft survey September 2009
Netcraft survey September 2010
Netcraft survey September 2011
Netcraft survey December 2012 (active sites)
Netcraft survey October 2013 (all sites)
Netcraft survey September 2014 (all sites)
Netcraft survey September 2014 (active sites)

3-tier model
  Presentation layer (user tier)
    Communicates with (client) browser
  Business Logic layer (business tier)
    Applies business rules
  Data layer (data tier)
    Interacts with data store (database)

LAMP model
  (L)inux as underlying OS
  (A)pache as presentation layer
  (M)ySQL as data layer
  (P)HP as business logic layer
    Some people use (P)erl and/or (P)ython

Dynamic web pages (client side)
  Client side: Dynamic HTML (DHTML)
    Works with the Document Object Model (DOM)
    Executes ECMAScript (standard) programs
      JavaScript (Netscape), JScript (Microsoft)
  Other client side techniques
    AJAX (Asynchronous JavaScript and XML) for interactive web pages
    Applets (JVM) or ActiveX controls
    Flash, Silverlight, HTML5

ECMAScript engines
  SpiderMonkey/TraceMonkey (Firefox, Seamonkey)
  KJS (Konqueror)
  JavaScriptCore/SquirrelFish/Nitro (Apple Safari)
  V8 (Google Chrome)
  Chakra (Microsoft Internet Explorer 9, a JScript engine)
  Futhark/Carakan (Opera)

Dynamic web pages (server side)
  Server side dynamic web page technologies, which are able to
  generate unique content for each call or user
    Common Gateway Interface (CGI)
    Server Side Includes (SSI)
    Server-side scripting
      PHP, JSP, ASP
    Servlets: server-side Java applications
    ASP.NET: successor of ASP, part of the .NET framework

Document Object Model
  An ongoing W3C activity for standardizing Dynamic HTML
    Level 0: proprietary API for HTML (XML) documents
      refers to what existed before the standardization
    Level 1: standardized API for HTML (XML) documents
    Level 2: modularized, with support for events and styles
    Level 3: support for loading and saving and for keyboard events
    Level 4: DOM4 working draft (July 2014)
      Now part of the DOM Living Specification
  Work now integrated with HTML5 effort

Uniform Resource Identifiers (RFC 3986)
  A URI can have two forms
    A URL (Uniform Resource Locator)
      https://www.os3.nl/
      ftp://ftp.nluug.nl/pub/os/bsd/
      Often identifies the location and access mechanism of the resource
    A URN (Uniform Resource Name)
      urn:<nid>:<nid-specific>
      urn:ietf:rfc:2648
      urn:isbn:0-97-606188-0
      Gives a name to a resource in a certain namespace
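
The URL/URN distinction above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the helper names `classify_uri` and `split_urn` are hypothetical, and the classification looks at nothing but the scheme, so it ignores the finer rules of RFC 3986.

```python
def classify_uri(uri: str) -> str:
    """Classify a URI by its scheme only (a deliberate simplification)."""
    scheme, sep, _rest = uri.partition(":")
    if not sep or not scheme:
        # No scheme at all: this sketch treats it as a relative reference.
        return "relative reference"
    return "URN" if scheme.lower() == "urn" else "URL"


def split_urn(urn: str) -> tuple[str, str]:
    """Split urn:<nid>:<nid-specific> into (nid, nid-specific)."""
    prefix, nid, nid_specific = urn.split(":", 2)
    if prefix.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    return nid, nid_specific


if __name__ == "__main__":
    for uri in ("https://www.os3.nl/", "urn:ietf:rfc:2648", "urn:isbn:0-97-606188-0"):
        print(uri, "->", classify_uri(uri))
    print(split_urn("urn:ietf:rfc:2648"))
```

Note that the namespace-specific part may itself contain colons, which is why the split is limited to the first two.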

Uniform Resource Locators
  <scheme>:<scheme-specific>
    <scheme> is often some Internet protocol
      http, ftp, telnet, rtsp, ...
    <scheme-specific> often starts with // to indicate that an
    Internet address (IP address or DNS domain name) follows
    Other schemes: mailto, news, ...

HTTP URLs
  <scheme> : <hierarchy-indicator>
    http://
  <authority>
    [<userinfo> @ ] <host>[ : <port>]
  <path>
    either begins with / or is empty, see RFC 3986
  ? <query>
    gives extra parameters for identifying the resource
  # <fragment>
    secondary (sub)resource, mostly used in URI references [3]
  [3] A URI reference may be a relative reference, which software
      has to resolve against a base URI

HTTP
  Uses <CR><LF> as end of line convention
  HTTP request/response
    request/response line
    request/response headers
    empty line
    optional body

HTTP request
  <method> <path [4]> <HTTP-version>
    GET (to get/load a resource)
    HEAD (to fetch only the headers)
    PUT (to store a resource)
    POST (to provide input in the body to server side scripts)
    DELETE (to delete a resource)
    OPTIONS (to query the server options)
  [4] The path may include a query, but no fragment
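
As a rough sketch of how the URL components map onto an HTTP request, the following Python fragment splits a URL with the standard library's `urllib.parse.urlsplit` and builds the request text by hand. The function name `build_request` and the example URL are assumptions made for illustration. The `Host` and `Connection` headers are added here because HTTP/1.1 servers expect a `Host` header; real clients send many more.

```python
from urllib.parse import urlsplit


def build_request(url: str, method: str = "GET") -> str:
    """Build a minimal HTTP/1.1 request (request line, headers, empty line)."""
    parts = urlsplit(url)
    # The request target is the path plus an optional query; the fragment
    # stays on the client and is never sent to the server.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"
    lines = [
        f"{method} {target} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    # <CR><LF> ends every line; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical URL, used only to show the components.
    url = "http://www.example.org:8080/docs/index.html?lang=en#intro"
    parts = urlsplit(url)
    print(parts.scheme, parts.hostname, parts.port, parts.path, parts.query, parts.fragment)
    print(build_request(url))
```

The resulting string could be written to a TCP socket connected to the host and port taken from the authority part of the URL.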

HTTP response
  <HTTP-version> <status-code> <comment>
    HTTP/1.1 200 OK
    HTTP/1.1 301 Moved Permanently
    HTTP/1.1 400 Bad Request
    HTTP/1.1 404 Not Found
    HTTP/1.1 501 Method Not Implemented

Request headers
  User-Agent: <client identification>
  Host: <(virtual) server name>
  Cookie: <stored user tracking information>
  Date: <date/time message sent>
  Authorization: <credentials>
  many more ...

Response headers
  Content-Type: <MIME type>
  Content-Length: <page length in bytes>
  Last-Modified: <date of last page change>
  Set-Cookie: <string to keep state>
  Location: <redirection information>
  many more ...

Markup
  SGML
  HTML
  XML
  XHTML
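
The layout of an HTTP response (status line, headers, empty line, optional body) can be illustrated with a small hand-written parser. This is a minimal sketch under simplifying assumptions: the name `parse_response_head` is hypothetical, repeated headers simply overwrite each other, and the sample response is made up.

```python
def parse_response_head(raw: bytes):
    """Split a raw HTTP response into status line fields, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: <HTTP-version> <status-code> <comment>
    fields = lines[0].split(" ", 2)
    version, status = fields[0], int(fields[1])
    comment = fields[2] if len(fields) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, status, comment, headers, body


if __name__ == "__main__":
    sample = (
        b"HTTP/1.1 301 Moved Permanently\r\n"
        b"Location: http://www.example.org/new\r\n"
        b"Content-Length: 0\r\n"
        b"\r\n"
    )
    print(parse_response_head(sample))
```

Production code would normally rely on a library such as Python's `http.client` instead of parsing responses by hand.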

SGML
  Standard Generalized Markup Language
    <!DOCTYPE ...>
    <!ELEMENT ...>
    <!ATTLIST ...>
    <!ENTITY ...>
  DTD: Document Type Definition

HTML
  Instantiation of SGML
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      "http://www.w3.org/tr/html4/strict.dtd">
  All else is bogus: <BLINK>, ...

XML
  Simpler reformulation of SGML
  Some differences
    Every start tag must have a close tag
    Attribute values must always be quoted
    <?xml ...> processing instructions reserved

XHTML
  XML specification of HTML
  Still needs a DTD
    http://www.w3.org/tr/xhtml1/dtd/xhtml1-strict.dtd
  The DTD will be replaced (?) by an XML Schema
    http://www.w3.org/tr/xhtml1-schema/#xhtml1-strict
    making all syntax XML based
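
The two XML rules mentioned above (every start tag needs a close tag, attribute values must be quoted) can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`. The helper name `is_well_formed` and the sample fragments are made up for illustration, and the check covers well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    samples = [
        '<p class="x">one<br/>two</p>',  # closed tags, quoted attribute
        "<p>one<br>two</p>",             # <br> is never closed
        "<p class=x>one</p>",            # attribute value not quoted
    ]
    for sample in samples:
        print(sample, "->", is_well_formed(sample))
```

The second and third fragments are acceptable in HTML 4 but are rejected by an XML parser, which is the difference XHTML authors had to get used to.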

W3C activities
  See http://www.w3.org/
    CSS, XSL
    RDF, Semantic Web
    XML Schema
    SOAP, Web Services
    Accessibility, Internationalization (I18N)

CSS, XSL
  Cascading Style Sheets (CSS1, CSS2, CSS3, CSS4, ...)
    Starting with CSS3 the specification is modular
  Extensible Stylesheet Language (XSL)
    XSL Transformations (XSLT)
    XSL Formatting Objects (XSL-FO)
    XML Path Language (XPath)

RDF (1)
  Resource Description Framework
    Metadata
    Semantic Web
    Web 3.0
    Knowledge
    Machine readable information
    Reinventing Mathematical Logic (?)

RDF (2)
  Example from UvA/SNE research
    NDL (Network Description Language)
    http://www.science.uva.nl/research/sne/ndl/

XML Schema
  Replacement for an SGML DTD
  Is written itself in XML syntax
  Has support for built-in datatypes

HTML5 (1)
  Tries to address web applications
    Typical example is Adobe Flash
  Returns to HTML as a basis, improves and extends it
  Introduces new tags, for instance
    <nav>
    <video>, <audio>
    <canvas>, <figure>
  Promoted by the WHATWG (Apple, Mozilla, Opera)
    Web Hypertext Application Technology Working Group
  Also W3C now has its own HTML5 specification

HTML5 (2)
  HTML5 builds upon and is backward compatible with
    HTML4
    XHTML1
    DOM Level 2
  HTML5's use of DOCTYPEs
    <!DOCTYPE html>
    <!DOCTYPE html SYSTEM "about:legacy-compat">
  Revival of "Tag soup"
    But with standardized error handling
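
To give a feel for what accepting tag soup means, the sketch below feeds sloppy markup to Python's standard `html.parser.HTMLParser`. This parser is lenient but does not implement the HTML5 parsing algorithm, so it only illustrates the tolerant style of processing, not the standardized error handling itself. The class name `TagCollector` and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag seen, without requiring matching close tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


if __name__ == "__main__":
    # Unclosed <p> elements and an unquoted attribute value: fine as tag
    # soup, but not well-formed XML.
    soup = "<!DOCTYPE html><nav><p>one<p>two<video src=clip.ogv></nav>"
    collector = TagCollector()
    collector.feed(soup)
    print(collector.tags)
```

An XML parser would stop at the first error in this input, whereas a tag soup parser keeps going and reports what it finds.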