WWW
World Wide Web
Aka The Internet

Karst Koymans
Informatics Institute
University of Amsterdam
(version 163, 2016/10/06 13:25:13 UTC)
Friday, October 7, 2016

Table of Contents
  WWW history
  Basic concepts
  Client side
  Server side
  Dynamic content
  URI, URL, URN
  Protocol
  Markup (a quick recap from Essential Skills)
  Hybrids (the way forward?!)

WWW history (1)
  1968
    Doug Engelbart
    Earlier than ARPANET or UNIX
    Inventor of the mouse, of hypermedia and of videoconferencing

WWW history (2)
  1989
    Tim Berners-Lee
    20 years after Engelbart
    First information management proposal at CERN
  1990
    WorldWideWeb browser (later Nexus)
    Developed by Berners-Lee on NEXTSTEP
    World Wide Web (with spaces) is the abstract information space

WWW history (3)
  More browsers
  1992
    Line mode browser 1.1
    ViolaWWW browser for X Windows
  1993
    Mosaic for X Windows (Marc Andreessen, founder of Netscape)
  1994
    Mozilla (Netscape)

WWW history (4)
  W3C founded (MIT, INRIA, Keio)
    In 2003 ERCIM took the place of INRIA
    In 2012 http://www.webplatform.org/ was convened,
      with an invalid certificate (20131003) for https
    In 2013 Beihang University (China) was invited to join hosting W3C
  1995
    Microsoft launches Internet Explorer
    Start of the Browser Wars

WWW history (5)
  Standards based browsers and layout engines (20151005)
    Gecko (Firefox, SeaMonkey, Netscape, Thunderbird)
    KHTML (Konqueror)
    WebKit, using WebCore (Apple Safari, Google Chrome)
    Blink, derived from WebCore (Chrome, Chromium, Opera)
    Trident/EdgeHTML (Microsoft Internet Explorer / Microsoft Edge) [1]
    Presto (Opera (until 2013)) [2]
  [1] proprietary, but freely available
  [2] proprietary; freely and commercially available

WWW concepts
  Web browser (client)
  Web server (server)
  URLs and HTTP (protocol)
  HTML and other markup (content)
  Dynamic web pages (interaction)
  Document Object Model (page model)
  3-tier model (architecture)

Browser
  Client making requests for web pages
  Interface for the user
    Graphical User Interface
      IE, Safari, Firefox, Opera, Chrome, Chromium, ...
    Terminal based
      lynx, links, w3m, ...

Web server
  Server responding to requests
    NCSA HTTPd, Apache HTTP Server
    Internet Information Services (IIS, Microsoft)
    Sun Java System Web Server (Sun ONE, iPlanet)
      Partly open sourced (Open Web Server)
  Looks for information in or via files, databases, scripts,
    locally or remote
  See also http://www.netcraft.com/

Lesser known or more recent web servers
  nginx ("engine X")
    also reverse proxy and load balancer
    high performance
  Google Web Server (GWS)
    Google custom (mystery?) web server
    also runs Blogger, Google Docs, ...
  lighttpd ("lighty")
    lightweight
    optimised for speed
  qq.com
    Chinese IM and blog service (mostly private now)

[Figure: Netcraft survey October 2006]
[Figure: Netcraft survey November 2007]
[Figure: Netcraft survey September 2008]
[Figure: Netcraft survey September 2009]
[Figure: Netcraft survey September 2010]
[Figure: Netcraft survey September 2011]
[Figure: Netcraft survey December 2012 (active sites)]
[Figure: Netcraft survey October 2013 (all sites)]
[Figure: Netcraft survey September 2014 (all sites)]
[Figure: Netcraft survey September 2015 (all sites)]
[Figure: Netcraft survey September 2016 (all sites)]
[Figure: Netcraft survey September 2014 (active sites)]
[Figure: Netcraft survey September 2015 (active sites)]
[Figure: Netcraft survey September 2016 (active sites)]

3-tier model
  Presentation layer (user tier)
    Communicates with (client) browser
  Business Logic layer (business tier)
    Applies business rules
  Data layer (data tier)
    Interacts with data store (database)

LAMP model
  (L)inux as underlying OS
  (A)pache as presentation layer
  (M)ySQL as data layer
  (P)HP as business logic layer
    Some people use (P)erl and/or (P)ython

Dynamic web pages (server side)
  Server side dynamic web page technologies,
  which are able to generate unique content for each call or user
    Common Gateway Interface (CGI)
    Server Side Includes (SSI)
    Server-side scripting
      PHP, JSP, ASP
      Servlets: server-side Java applications
      ASP.NET: successor of ASP, part of the .NET framework
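
As an illustration of the first technique in the list above, here is a minimal sketch of what a CGI program does: the web server hands over request data in environment variables such as QUERY_STRING, and the program writes headers, an empty line and the body to standard output. The function name render_cgi_response and the query parameter "name" are hypothetical, chosen only for this example.

```python
import html
import os
from urllib.parse import parse_qs


def render_cgi_response(environ):
    """Build the text a CGI program would write to stdout for one request."""
    # CGI passes the part of the URL after "?" in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical parameter chosen for this illustration.
    name = query.get("name", ["world"])[0]
    # Escape user input before embedding it in HTML.
    body = "<html><body><p>Hello, {}!</p></body></html>\n".format(html.escape(name))
    # Headers first, then an empty line, then the body.
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" + body


if __name__ == "__main__":
    # When started by a web server as a CGI program, os.environ carries the request data.
    print(render_cgi_response(os.environ), end="")
```

Because the output depends on the query string, each call can produce different content, which is the property the slide attributes to server side dynamic pages.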

Dynamic web pages (client side)
  Client side: Dynamic HTML (DHTML)
    Works with the Document Object Model (DOM)
    Executes ECMAScript (standard) programs
      JavaScript (Netscape), JScript (Microsoft)
  Other client side techniques
    AJAX (Asynchronous JavaScript and XML) for interactive web pages
    Applets (JVM) or ActiveX controls
    Flash, Silverlight, HTML5

ECMAScript engines (20151005)
  SpiderMonkey/TraceMonkey (Firefox, SeaMonkey)
  KJS (Konqueror)
  JavaScriptCore/SquirrelFish/Nitro (Apple Safari)
  V8 (Google Chrome, Opera)
  Chakra (JScript) (Microsoft Internet Explorer 9)
  Chakra (JavaScript) (Microsoft Edge)
  Futhark/Carakan (Opera, discontinued)

Document Object Model
  An ongoing W3C/WHATWG activity for standardizing Dynamic HTML
    Level 0: proprietary API for HTML (XML) documents;
      refers to what existed before the standardization
    Level 1: standardized API for HTML (XML) documents
    Level 2: modularized, with support for events and styles
    Level 3: support for loading and saving and for keyboard events
    Level 4: DOM4 working draft (July 2014)
      Now part of the DOM Living Specification
  Also see https://dom.spec.whatwg.org/

More interaction (1)
  XMLHttpRequest (XHR)
    JavaScript API to send requests to servers
    Used by the Ajax web development technique
    Response is integrated into the current web page
      without rendering the complete page again
    Response can be XML, but also JSON, HTML, ...
    Work now integrated with HTML5 effort

More interaction (2)
  WebSockets
    General mechanism to upgrade an existing TCP connection
      (carrying HTTP) to a full-duplex connection
    Uses the Upgrade: [3] HTTP header mechanism as handshake
    Defines the ws: and wss: schemes
    Uses an allowed-origin policy for security
      by including an Origin: header
    HTTP/2 defines a similar mechanism to establish
      multiplexed streams over the same TCP connection
  [3] Originally meant to upgrade http to https, just like STARTTLS for mail

Uniform Resource Identifiers (RFC 3986)
  A URI can have two forms
    A URL (Uniform Resource Locator)
      https://www.os3.nl/
      ftp://ftp.nluug.nl/pub/os/bsd/
      Often identifies the location and access mechanism of the resource
    A URN (Uniform Resource Name)
      urn:<nid>:<nid-specific>
      urn:ietf:rfc:2648
      urn:isbn:0-97-606188-0
      Gives a name to a resource in a certain namespace

Uniform Resource Locators
  <scheme>:<scheme-specific>
    <scheme> is often some Internet protocol
      http, ftp, telnet, rtsp
    <scheme-specific> often starts with // to indicate that an
      Internet address (IP address or DNS domain name) follows
      This is called the authority part
    Other schemes: mailto, news

HTTP URLs
  <scheme> ":" <hierarchy-indicator>
    http://
  <authority>
    [<userinfo> "@"] <host> [":" <port>]
  <path>
    either begins with / or is empty, see RFC 3986
  "?" <query>
    gives extra parameters for identifying the resource
  "#" <fragment>
    secondary (sub)resource, mostly used in URI-references [4]
  [4] A URI reference is a relative URI, which has to be completed by software
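
The HTTP URL components listed above can be pulled apart with Python's standard urllib.parse module. The following is a small sketch; the URL itself is hypothetical and was chosen only to exercise every component. The URN comes from the URI slide.

```python
from urllib.parse import urlsplit

# Hypothetical URL chosen to exercise every component named above.
url = "http://user@www.example.org:8080/pub/os/index.html?lang=en#history"
parts = urlsplit(url)

components = {
    "scheme": parts.scheme,      # before the first ":"
    "userinfo": parts.username,  # optional, before "@"
    "host": parts.hostname,
    "port": parts.port,          # optional, after ":"
    "path": parts.path,          # begins with "/" or is empty
    "query": parts.query,        # after "?"
    "fragment": parts.fragment,  # after "#"
}
for name, value in components.items():
    print(f"{name:9} {value}")

# A URN has a scheme but no authority part: nothing follows "//".
urn = urlsplit("urn:ietf:rfc:2648")
print(urn.scheme, repr(urn.netloc), urn.path)
```

Note that the fragment is handled by the client and is not sent to the server, which is why the request path in HTTP may include a query but no fragment.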

HTTP/1.1
  RFCs 7230-7235
  Uses <CR><LF> as end of line convention
  HTTP request/response
    request/response line
    request/response headers
    empty line
    optional body

HTTP/1.1 request
  <method> <path [5]> <HTTP-version>
    GET (to get/load a resource)
    HEAD (to fetch only the headers)
    PUT (to store a resource)
    POST (to provide input in the body to server side scripts)
    DELETE (to delete a resource)
    OPTIONS (to query the server options)
  [5] The path may include a query, but no fragment

HTTP/1.1 response
  <HTTP-version> <status-code> <comment>
    HTTP/1.1 200 OK
    HTTP/1.1 301 Moved Permanently
    HTTP/1.1 400 Bad Request
    HTTP/1.1 404 Not Found
    HTTP/1.1 501 Method Not Implemented

HTTP/1.1 request headers
  User-Agent: <client identification>
  Host: <(virtual) server name>
  Cookie: <stored user tracking information>
  Date: <date/time message sent>
  Authorization: <credentials>
  ... many more
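
The message layout above (request line, headers, empty line, optional body, all separated by <CR><LF>) can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a complete HTTP implementation: the helper names build_request and parse_status_line, the host name and the User-Agent value are all hypothetical.

```python
CRLF = "\r\n"


def build_request(method, path, host, extra_headers=None, body=b""):
    """Serialize an HTTP/1.1 request: request line, headers, empty line, body."""
    headers = {"Host": host}
    headers.update(extra_headers or {})
    if body:
        # The receiver needs the body length to know where the message ends.
        headers["Content-Length"] = str(len(body))
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # The empty line (a second CRLF) separates the headers from the body.
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body


def parse_status_line(line):
    """Split a response line into (HTTP-version, status-code, comment).

    Simplification: assumes the comment (reason phrase) is present.
    """
    version, code, comment = line.rstrip(CRLF).split(" ", 2)
    return version, int(code), comment


request = build_request(
    "GET", "/index.html?lang=en", "www.example.org", {"User-Agent": "sketch/0.1"}
)
print(request.decode("ascii"))
print(parse_status_line("HTTP/1.1 301 Moved Permanently\r\n"))
```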

HTTP/1.1 response headers
  Content-Type: <MIME type>
  Content-Length: <page length in bytes>
  Last-Modified: <date of last page change>
  Set-Cookie: <string to keep state>
  Location: <redirection information>
  ... many more

HTTP/2
  Based on SPDY
  Defined in RFC 7540
  Improves efficiency and uses multiplexed streams
  Has flow control and prioritization
  Implements server push mode
  Starts out as a normal http(s) connection
    Uses Upgrade: h2c for http connections
    Uses the ALPN (Application-Layer Protocol Negotiation) TLS extension
      (RFC 7301) for https connections, with the h2 protocol identifier

Markup
  SGML
  HTML
  XML
  XHTML

SGML
  Standard Generalized Markup Language
    <!DOCTYPE ...>
    <!ELEMENT ...>
    <!ATTLIST ...>
    <!ENTITY ...>
  DTD: Document Type Definition

HTML
  Instantiation of SGML
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      "http://www.w3.org/tr/html4/strict.dtd">
  All else is bogus: <BLINK>, ...

XML
  Simpler reformulation of SGML
  Some differences
    Every start tag must have a close tag
    Attribute values must always be quoted
    <?xml ...> processing instructions reserved

XHTML
  XML specification of HTML, which still needs a DTD
    http://www.w3.org/tr/xhtml1/dtd/xhtml1-strict.dtd
  The DTD can be replaced by an XML Schema instance
    http://www.w3.org/tr/xhtml1-schema/#xhtml1-strict
    making all syntax XML based
  XML Schema is also referred to as WXS (W3C XML Schema)
    or XSD (XML Schema Definition)
  There are other XML schemata, like RELAX NG or Schematron

W3C activities
  See http://www.w3.org/
    HTML, XML, XHTML
    CSS, XSL
    RDF, Semantic Web
    XML Schema
    SOAP, Web Services
    Accessibility, Internationalization (I18N)
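
Returning to the XML slide: the two differences from SGML-style HTML named there (every start tag needs a close tag, attribute values must be quoted) are well-formedness rules that any XML parser enforces. A minimal sketch using Python's standard xml.etree.ElementTree module follows; the helper name is_well_formed and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if an XML parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


samples = {
    "closed and quoted": '<p class="note">text<br/></p>',
    "missing close tag": '<p class="note">text<br></p>',
    "unquoted attribute": "<p class=note>text</p>",
}
for label, document in samples.items():
    print(f"{label}: {is_well_formed(document)}")
```

Only the first sample is accepted. An HTML 4 parser would tolerate all three, which is the leniency XHTML gives up in exchange for XML tooling.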

CSS, XSL
  Cascading Style Sheets (CSS1, CSS2, CSS3, CSS4, ...)
    Starting with CSS3 the specification is modular
  Extensible Stylesheet Language (XSL)
    XSL Transformations (XSLT)
    XSL Formatting Objects (XSL-FO)
    XML Path Language (XPath)

RDF (1)
  Resource Description Framework
    Metadata
    Semantic Web
    Web 3.0
    Knowledge
    Machine readable information
    Reinventing Mathematical Logic (?)

RDF (2)
  Example from UvA/SNE research
    NDL (Network Description Language)
    http://www.science.uva.nl/research/sne/ndl/

XML Schema
  Replacement for an SGML DTD
  Is written itself in XML syntax
  Has support for built-in datatypes

HTML5 (1)
  Tries to address web applications
    Typical example is Adobe Flash
  Returns to HTML as a basis, but improves and extends it
  Introduces new tags, for instance
    <nav>
    <video>, <audio>
    <canvas>, <figure>
  Promoted by the WHATWG (Apple, Mozilla, Opera)
    Web Hypertext Application Technology Working Group
  Also W3C now has its own HTML5 specification

HTML5 (2)
  HTML5 builds upon and is backward compatible with
    HTML4
    XHTML1
    DOM Level 2
  HTML5's use of DOCTYPEs
    <!DOCTYPE html>
    <!DOCTYPE html SYSTEM "about:legacy-compat">
  Revival of "Tag soup"
    But with standardized error handling