The World Wide Web
A web site consists simply of pages of text and images.
Web pages are rendered by a web browser.
Retrieving a web page online:
  The client opens a web browser on the local machine.
  The web browser needs to determine the IP address of the web server that hosts the web page (for example, www.amazon.com).
  The web browser contacts DNS to resolve the domain name to an IP address.
Uniform Resource Locators (URL)
A URL is used by the web browser to identify a web page, for example: http://www.example.com/directory/file.html
The protocol used in this example is the Hypertext Transfer Protocol (HTTP).
The domain name is www.example.com.
The web browser is requesting the content of file.html, which is stored in the directory folder on the web server.
file.html is a file that describes text and images using the HTML (Hypertext Markup Language) format.
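
As a small illustration of the URL parts described above, the following sketch splits the example URL with Python's standard urllib.parse module. The variable names are chosen here for illustration only.

```python
from urllib.parse import urlsplit

# The example URL from the slide.
url = "http://www.example.com/directory/file.html"
parts = urlsplit(url)

print(parts.scheme)    # the protocol
print(parts.hostname)  # the domain name
print(parts.path)      # the directory and file requested from the web server
```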
Connecting to a Web Server
The URL is used by HTTP to access web information on a remote machine (the web server).
Resolving the domain name to an IP address:
  The web browser checks its DNS cache to resolve the domain name to an IP address.
  If no address is found, the web browser sends a DNS request.
Client-server TCP connection:
  The client opens a TCP connection to port 80 for HTTP.
HTTP requests:
  HTTP requests are encapsulated in TCP segments.
  An HTTP request usually begins with a method such as GET or POST.
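
The sequence above (resolve the name, open a TCP connection on port 80, send a GET request) can be sketched with Python's socket module. This is a minimal sketch, not a full HTTP client: the function names resolve, build_get_request and fetch are hypothetical, the request carries only the headers needed for a simple exchange, and the operating system's resolver stands in for the browser's own DNS cache.

```python
import socket


def resolve(host: str) -> str:
    """Resolve a domain name to an IPv4 address using the system resolver."""
    return socket.gethostbyname(host)


def build_get_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, version
        f"Host: {host}",         # the Host header is required by HTTP/1.1
        "Connection: close",     # ask the server to close after responding
        "",
        "",                      # blank line ends the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Send the GET request over TCP and return the raw response bytes."""
    ip_address = resolve(host)
    with socket.create_connection((ip_address, port), timeout=10) as conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.example.com", "/directory/file.html") would perform the network exchange; build_get_request can be inspected on its own without any network access.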
Hypertext Markup Language (HTML)
Each HTTP response includes a header.
Information in the HTTP response header includes:
  Information about the web server
  Software type and version number (for example, Apache or Google GWS)
  The size of the payload
The header is followed by the main body of the web page as HTML source code.
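
To make the header and body split concrete, the sketch below parses a hand-written sample response. The sample bytes, including the Apache version string, are invented for illustration and do not come from a real server; parse_response is a hypothetical helper, not a library function.

```python
# A made-up response body and header, for illustration only.
body = b"<html><body><p>Hello, web!</p></body></html>"
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache/2.4.57\r\n"  # assumed software type and version
    b"Content-Type: text/html\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    b"\r\n"
) + body


def parse_response(raw: bytes):
    """Split a raw HTTP response into status line, header dict, and body."""
    head, _, payload = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, payload


status, headers, payload = parse_response(raw_response)
print(status)
print(headers["server"], headers["content-length"])
```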
HTTP Request
HTML Coding
HTML gives a structural description of a document using special tags:
  Text formatting: <i>text</i> for italics and <b>text</b> for bold
  Itemized lists are written as: <ul> <li>first-item</li> <li>second-item</li> </ul>
  Hyperlinks are written as: <a href="web-page-url">Description of the other page</a>
  Embedded images are written as: <img src="URL-of-the-image">
  Scripting code is written as: <script> computer code </script>
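
One way to see how tags carry structure is to walk a small page with Python's html.parser module and collect the hyperlink and image URLs. This is only a sketch: the class name LinkAndImageCollector and the logo.png URL are hypothetical.

```python
from html.parser import HTMLParser


class LinkAndImageCollector(HTMLParser):
    """Collect href values of <a> tags and src values of <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])


# A small sample page using the tags from the slide (URLs are examples).
page = """
<ul>
  <li><i>first-item</i></li>
  <li><b>second-item</b></li>
</ul>
<a href="http://www.example.com/directory/file.html">Description of the other page</a>
<img src="http://www.example.com/logo.png">
"""

collector = LinkAndImageCollector()
collector.feed(page)
print(collector.links)
print(collector.images)
```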
HTML forms
HTML forms allow users of the web to submit input values for variables defined by the web server.
Server-side code is used by the web server to process user input.
There are two methods for submitting user-entered data: GET and POST.
GET variables are recommended for operations such as querying a database.
POST variables are recommended for operations such as inserting a record or sending an email. In this case the browser prompts the user to confirm that they wish to submit the information.
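
The difference between the two methods can be sketched with Python's urllib: with GET the form variables travel in the URL, with POST they travel in the request body. The form fields and the form.php path below are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields entered by the user.
form_data = {"first": "Alice", "last": "Smith"}
encoded = urlencode(form_data)

# GET: variables are appended to the URL after a question mark.
get_request = Request(f"http://www.example.com/form.php?{encoded}")

# POST: variables are carried in the request body instead.
post_request = Request(
    "http://www.example.com/form.php",
    data=encoded.encode("ascii"),
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```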
HTML GET method
HTML code with a form
Vulnerabilities in HTTP
HTTP request and response packets are sent in clear text.
The lack of encryption allows an attacker to eavesdrop on the communication and capture the payload.
Therefore, sensitive data should be transmitted using HTTPS.
HTTPS
HTTPS uses the Secure Sockets Layer (SSL) or its successor, Transport Layer Security (TLS), to secure data in transit.
Establishing a secure connection:
1. The browser provides the web server with a list of the security primitives supported on the client machine: hash functions and cryptographic algorithms.
2. The web server chooses the strongest cipher and hash that both it and the client machine support.
3. The web server sends a certificate.
Message flow between the HTTPS client and the HTTPS web server: the client sends its supported hashes and ciphers; the server chooses the strongest hash and cipher and sends a certificate.
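
Step 2 of the negotiation can be sketched as a simple selection over two lists. This is an illustrative model only, not how any particular TLS library implements it: the cipher suite names are real TLS 1.3 suite names, but the strength ordering in SERVER_PREFERENCE and the function choose_cipher are assumptions made for this example.

```python
# Server's supported suites, ordered from most to least preferred.
# The ordering is an assumption for illustration.
SERVER_PREFERENCE = [
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_128_GCM_SHA256",
]


def choose_cipher(client_supported):
    """Return the server's most preferred suite that the client also offers."""
    offered = set(client_supported)
    for suite in SERVER_PREFERENCE:
        if suite in offered:
            return suite
    return None  # no common suite: the connection cannot be established


# Step 1: the client lists what it supports.
client_hello = ["TLS_AES_128_GCM_SHA256", "TLS_CHACHA20_POLY1305_SHA256"]

# Step 2: the server picks the strongest suite both sides support.
chosen = choose_cipher(client_hello)
print(chosen)
```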
HTTPS (continued)
1. The client verifies the certificate.
2. The client and the web server generate a shared key.
3. Symmetric encryption is used to transfer data over the secure channel.
Message flow between the HTTPS client and the HTTPS web server: the client sends a random number R encrypted under the server's public key, E(R, P_s); the server and the client set a shared key; the shared key and a MAC are used to encrypt the data and verify its integrity.
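
The shared-key and MAC steps can be sketched with Python's standard hmac and hashlib modules. This is a simplified model under stated assumptions: the public-key encryption E(R, P_s) and the symmetric encryption of the data are omitted (the standard library has no cipher for them), so the code simply assumes both sides already hold R. The derivation label and the helper names derive_key and verify are hypothetical, and real TLS uses a more elaborate key schedule.

```python
import hashlib
import hmac
import secrets


def derive_key(shared_secret: bytes, label: bytes) -> bytes:
    """Derive a key from the shared secret (simplified, not the TLS schedule)."""
    return hmac.new(shared_secret, label, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the MAC and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


# Random number chosen by the client; assumed to reach the server as E(R, P_s).
r = secrets.token_bytes(32)

# Both sides derive the same MAC key from R.
client_mac_key = derive_key(r, b"mac key")
server_mac_key = derive_key(r, b"mac key")

# The client tags a message; the server checks the tag.
message = b"GET /directory/file.html HTTP/1.1"
tag = hmac.new(client_mac_key, message, hashlib.sha256).digest()

print(verify(server_mac_key, message, tag))          # unmodified message
print(verify(server_mac_key, message + b"x", tag))   # tampered message
```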
Web Server Certificate
Certificates enable a client to verify the identity of a web site.
Certificates are digitally signed by a certificate authority (CA).
A web site obtains a certificate by submitting a certificate signing request.
The certificate includes the following information:
  Name of the CA
  Serial number of the certificate
  Expiration date
  Domain name of the web site
  Identifier of the public-key scheme
  Public key
  Identifier of the cryptographic and hash algorithms
  Digital signature over the certificate data
Extended Validation Certificate
Extended validation certificates can only be signed by high-profile CAs.
Extended validation certificates are designated in the CA field.
Certificate Hierarchy
Low-level certificates are signed by intermediate CAs.
The top-level certificate is known as the root certificate.
Root certificates for top-level domains are called anchor points.
Anchor points are usually stored in the operating system.
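
The hierarchy can be pictured as a walk from a web site's certificate up through its issuers until an anchor point is reached. The sketch below models only the issuer links: it does not check signatures, expiration dates, or revocation, all of which real validation requires. The Cert class, the CA names, and chain_to_anchor are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cert:
    subject: str  # who the certificate identifies
    issuer: str   # the CA that signed it


# Anchor points trusted by the client (assumed to be stored in the OS).
TRUST_ANCHORS = {"Example Root CA"}


def chain_to_anchor(
    leaf: Cert, known: Dict[str, Cert], max_depth: int = 10
) -> Optional[List[Cert]]:
    """Follow issuer links from leaf to a trusted root; None if that fails."""
    chain = [leaf]
    current = leaf
    for _ in range(max_depth):
        if current.subject == current.issuer:  # self-signed root certificate
            return chain if current.subject in TRUST_ANCHORS else None
        parent = known.get(current.issuer)
        if parent is None:  # issuer unknown: the chain is broken
            return None
        chain.append(parent)
        current = parent
    return None  # chain too long or cyclic


root = Cert("Example Root CA", "Example Root CA")
intermediate = Cert("Example Intermediate CA", "Example Root CA")
site = Cert("www.example.com", "Example Intermediate CA")
known_cas = {c.subject: c for c in (root, intermediate)}

chain = chain_to_anchor(site, known_cas)
print([c.subject for c in chain])
```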
Invalid certificate
Dynamic Content
Web content can be static or dynamic.
Scripting languages allow computer code to be executed by a module of the browser.
Client-side scripting is executed by the browser.
Server-side scripting is executed by the server, which hides the code from the user and provides the user only with the output.
Document Object Model (DOM)
The content of a web page can be represented in an organized way.
HTML code is represented in an object-oriented way.
Tags and page elements are represented as parent-child relationships.
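
The parent-child idea can be sketched by building a small tree from HTML with Python's html.parser module. This is a toy model, not a browser's DOM implementation: the Node and TreeBuilder classes and the outline helper are hypothetical, text content is ignored, and only a few void tags are recognized.

```python
from html.parser import HTMLParser


class Node:
    """One element in the tree, linked to its parent and children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    VOID = {"img", "br", "hr", "meta", "link", "input"}  # tags with no end tag

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node  # descend: later tags become its children

    def handle_endtag(self, tag):
        # Climb to the matching open element, then step back to its parent.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def outline(node, depth=0):
    """Return the tree as indented lines, one per element."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed(
    "<html><body><ul><li>first-item</li><li>second-item</li></ul></body></html>"
)
print("\n".join(outline(builder.root)))
```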
JavaScript
Interactive and dynamic web browsing capabilities are introduced through a scripting language called JavaScript.