Review what we learned Part I: Communication and Networking: Week 5-6, Lectures 2-7
Lecture 1 OSI vs TCP/IP model
OSI model    | Protocols             | TCP/IP model
Application  | FTP, SMTP, HTTP, IMAP | Application
Presentation | SSL/TLS, ASCII        | Application
Session      | Sockets               | Application
Transport    | TCP, UDP              | Transport
Network      | IP                    | Internet
Data Link    |                       | Network Interface
Physical     |                       | Physical
Lecture 1 Transmission Control Protocol (TCP) Definition: a connection-based protocol that provides a reliable flow of data between two computers. Connection-oriented protocol: a file or message will be delivered unless the connection fails. Reliable data transmission using message acknowledgement and retransmission. The order in which the data (called segments) is sent and received over the network is critical to the success of these applications. Example applications: HTTP, FTP, and Telnet.
Lecture 1 User Datagram Protocol (UDP) Definition: sends independent packets of data (called datagrams) between computers without guarantees about arrival or sequencing. Transaction-oriented, not connection-based like TCP: we don't know whether a datagram will be delivered; it could get lost on the way. Suitable for simple query-response protocols. Examples: clock server, Domain Name System, ... Suitable for very large numbers of clients. Examples: Voice over IP, IPTV, online games, ...
Lecture 1 UDP vs TCP: 4 main differences Reliability: TCP has mechanisms to manage message acknowledgement and retransmission in case of lost parts, but UDP has no such mechanisms. Ordering: TCP segments are sent in a sequence and received in the same sequence, but UDP does not ensure the order of datagrams. Connection: TCP is a heavyweight connection requiring three packets to set up a socket connection, and it also handles congestion control and reliability; UDP is a lightweight transport-layer protocol. Method of transfer: TCP segments are read as a byte stream (no boundaries between segments), while UDP datagrams are sent individually and are checked for integrity only if they arrive.
Lecture 1 UDP vs TCP
Feature                          | UDP      | TCP
Packet header size               | 8 bytes  | 20-60 bytes
Transport-layer packet entity    | Datagram | Segment
Connection oriented              | No       | Yes
Reliable transport               | No       | Yes
Preserves message boundaries     | Yes      | No
Ordered delivery                 | No       | Yes
Flow control                     | No       | Yes
Congestion control               | No       | Yes
Explicit Congestion Notification | No       | Yes
Lecture 2-5 Sockets A socket consists of: a local socket address (local IP address and service port number); a remote socket address (only for established TCP sockets); a protocol (a transport protocol, e.g., TCP or UDP). A socket address is the combination of an IP address (like a phone number) and a service port number (like an extension). A socket API is an application programming interface (API), usually provided by the operating system.
Lecture 2-5 Service ports Computers often provide more than one type of service, or talk to multiple hosts/computers at a time. Ports are used to distinguish these services. Each service offered by a computer is identified by a port number, represented as a positive 16-bit integer value. Some ports are reserved to support common services: FTP: 21/TCP; HTTP: 80/TCP,UDP. User-level processes/services use port numbers of 1024 and above.
Lecture 2-5 Establish socket connection [Diagram: the client sends a connection request to the server's service port; once the server accepts, a connection is established between the client and the server.]
Lecture 2-5 Establish socket connection Step 1: The server listens on the socket for a client to make a connection request. Step 2: The server accepts the connection if everything goes well. Step 3: The server gets a new socket bound to the same local port, with its remote endpoint set to the client's address and port. Go to Step 1.
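The course's own code is not reproduced here, but the three steps above can be sketched in plain Java sockets. This is a minimal, hypothetical one-shot echo example (class and method names are mine): the server listens, accepts one connection on a new socket, and echoes a line back to the client over the loopback interface.

```java
import java.io.*;
import java.net.*;

public class SocketDemo {
    // Starts a one-shot echo server on an ephemeral port (Step 1: listen),
    // accepts one client (Steps 2-3), echoes one line, and returns what the
    // client reads back.
    public static String echoOnce(String message) {
        try (ServerSocket server = new ServerSocket(0)) {  // port 0 = any free port
            Thread serverThread = new Thread(() -> {
                try (Socket conn = server.accept();         // blocks until a client connects
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(conn.getInputStream()));
                     PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                    out.println(in.readLine());             // echo one line back
                } catch (IOException ignored) { }
            });
            serverThread.start();

            try (Socket client = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                out.println(message);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello"));
    }
}
```

Note that `accept()` returns a new `Socket` for the established connection while the `ServerSocket` keeps listening for further clients.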
Lecture 6 What is HTTP? Hypertext Transfer Protocol (HTTP): an application-layer protocol for distributed, collaborative, hypermedia information systems. Works as a request-response protocol in the client-server computing model. Requires a reliable transport-layer protocol, and therefore commonly uses TCP. [Diagram: the client sends a request to the server and receives results over the network (TCP/IP).]
Lecture 6 How does HTTP work? Client: a browser, a web crawler (used by a search engine), or other software that uses HTTP. Server: a computer hosting a web site with web pages or other content, or providing functions on behalf of the client. HTTP session: a sequence of network request-response transactions. A client initiates a request by establishing a TCP connection to a particular port on an HTTP server. The HTTP server listening on that port waits for a client's request message. Upon receiving the request, the server sends back: completion status information about the request; and the requested content in its message body.
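To make the request-response shape concrete, here is a small sketch of the raw text a client writes to the TCP connection and the status line it reads back. The class and method names are mine, and no real network call is made; this only shows the message format.

```java
public class HttpSketch {
    // Builds the raw text of a minimal HTTP/1.1 GET request.
    public static String buildGet(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";                  // a blank line ends the headers
    }

    // Extracts the completion-status line from a raw response.
    public static String statusLine(String rawResponse) {
        return rawResponse.substring(0, rawResponse.indexOf("\r\n"));
    }

    public static void main(String[] args) {
        System.out.print(buildGet("example.com", "/index.html"));
        System.out.println(statusLine("HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"));
    }
}
```

In a real client, `buildGet(...)` would be written to a `Socket` connected to port 80 and the response read from the same socket.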
Part II: Concurrency and Multi-threading: Week 7-9, Lectures 6-18
Lecture 1 Multitasking: Cooperative vs Preemptive Cooperative multitasking: processes control the CPU. Used in early multitasking operating systems, e.g., Win 3.1. Control is transferred explicitly from one process to another (coroutines) according to a cooperative model. Runtime support is simpler to implement, but the programmer has to handle cooperation: bugs in processes may lock the system. Preemptive multitasking: the kernel schedules CPU time for each task. Used in modern multitasking operating systems: the operating system's kernel manages (schedules) process access to the CPU. Preemption: an action performed by the kernel that forces a process to abandon its running state even if it could safely continue executing. Uses a time-slicing mechanism: a process is suspended when a specified amount of time has expired.
Lecture 1 Preemptive Multitasking [State diagram: process creation -> ready; selection: ready -> running; preemption: running -> ready; a synchronisation statement executed by the process: running -> waiting; a synchronisation statement executed by other processes: waiting -> ready; running -> process termination.]
Lecture 1 Concurrent programming: Two basic units Processes: a program that has a self-contained execution environment; has a complete, private set of basic run-time resources; has its own memory space; is totally controlled by the operating system. Threads: also called lightweight processes; a thread is a dispatchable unit of work within a process. Both processes and threads provide an execution environment, but creating a new thread requires fewer resources than creating a new process. A process can be divided into multiple independent threads.
Lecture 1 More about threads A thread has a definite beginning and an end; runs inside a single process; shares the same address space, the allocated resources and the environment of that process. Every process has at least one thread, called the main thread. A standalone Java application starts with the main thread (main()). This main thread can start new independent threads.
Lecture 1 Difference between threads and processes [Diagram: a single-threaded process has code, data and other-resource memory segments plus one register set and one stack; a multithreaded process shares code, data and other resources among its threads, but each thread has its own register set and stack.]
Lecture 1 Difference between threads and processes Processes are typically independent and might consist of multiple threads. Processes have separate address spaces for code, data and other resources, whereas threads share the address space of the process that created them. Threads are easier to create than processes, but multithreading requires careful programming. Processes use inter-process communication mechanisms provided by the OS to communicate with other processes, while threads can directly communicate with other threads in the same process.
Lecture 1 Context switching Context switching: the procedure by which the system switches between threads running on the available CPUs. A context is the minimal set of data used by a task that must be saved to allow the task to be interrupted at any point in time. Data to be saved include: registers: a small set of data-holding places in the CPU, each of which may hold a computer instruction, a storage address, or any kind of data; the program counter: also known as the instruction address register, a small amount of fast memory that holds the address of the instruction to be executed immediately after the current one; other necessary operating-system-specific data.
Lecture 1 Advantages of multi-threading These are advantages compared with multi-processing Improves the performance of the program by better usage of system resources: Share the same address space, less overhead for operating system Context-switching between threads is normally inexpensive Better usage of CPU time, e.g., while one thread is blocked (e.g., waiting for completion of an I/O operation), another thread can use the CPU time to perform computations Simpler program design: Control and communication between threads is easy and inexpensive. More responsive programs
Lecture 1 Disadvantages of multi-threading These are disadvantages/costs compared with single-threading. Context switching overhead: even though lighter than in multi-processing, the CPU still needs to save the registers, program counter, etc. of the current thread, and load those of the next thread to execute. More complex design: data shared and accessed by multiple threads needs special attention. Increased resource consumption: CPU time, memory for each thread's local stack, and operating system resources to manage the threads.
Lecture 1 Thread vs Runnable Essentially two ways of doing the same thing, but: Extending (subclassing) the standard Thread class: Pro: the derived class itself is a thread object, so it gains full control over the thread life cycle. Con: your task class must be a descendant of Thread, because Java only allows single inheritance. Implementing the Runnable interface: Pro 1: easy to use, e.g., it simply defines the unit of work that will be executed in a thread. Pro 2: your class can extend a class other than Thread. Con: lack of full control over the thread life cycle. Rule of thumb: use Runnable, since it is more general and flexible.
Lecture 1 Two mechanisms of creating threads [Diagram: a thread is created either by extending the Thread class or by implementing the Runnable interface; in both cases the run() method is overridden/implemented.]
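The two mechanisms above can be sketched side by side. This is a minimal illustration (class names are mine); each mechanism appends a marker to a shared log, and `join()` is used so the order is deterministic.

```java
public class CreationDemo {
    static final StringBuffer log = new StringBuffer(); // StringBuffer: thread-safe appends

    // Mechanism 1: extend the Thread class and override run().
    static class MyThread extends Thread {
        @Override public void run() { log.append("T"); }
    }

    // Mechanism 2: implement the Runnable interface and hand it to a Thread.
    static class MyTask implements Runnable {
        @Override public void run() { log.append("R"); }
    }

    public static String runBoth() {
        log.setLength(0);
        try {
            Thread t1 = new MyThread();
            t1.start(); t1.join();                 // join() makes the order deterministic
            Thread t2 = new Thread(new MyTask());
            t2.start(); t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(runBoth()); // TR
    }
}
```

Note that `MyTask` could extend some other class, which is exactly the flexibility the rule of thumb refers to.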
Lecture 1 Issues in multithreading Problems arise when multiple threads access shared resources. Q: How do threads communicate? A: By sharing the same memory space, e.g., access to fields and the objects that reference fields refer to. Efficient, but makes two kinds of errors possible: thread interference and memory consistency errors. Critical section: a piece of code that accesses a shared resource.
Lecture 1 Thread interference Also called race conditions. Errors are introduced when multiple threads access and try to change the same resource, e.g., memory (variables, arrays, objects), system resources (databases) or files. Let's take a look at a very simple example: adding/subtracting some values from 0. We first define a class Counter to do the adding/subtracting. Suppose we reference one Counter object from two threads, each of which adds 1 and then subtracts 1. We expect an output of 0 from both threads, but sometimes it is not. Why?
Lecture 1 Three atomic operations Atomic operation: an action that either happens completely, or doesn't happen at all. Even a simple add() method consists of three such operations: Step 1: load the value from counter into a register; Step 2: add some value to the register; Step 3: store the value in the register back to counter.
Lecture 1 Thread interference Let's take a look at a simple example, where two threads each try to increase an integer value by 1 (Step 1: load the integer value into a register; Step 3: store the register back to the integer). The steps when it goes wrong (value starts at 0): Thread 1 reads the value into r1 (value 0); Thread 2 reads the value into r2 (value 0); Thread 1 increases the value in r1 (value 0); Thread 2 increases the value in r2 (value 0); Thread 1 writes back (value 1); Thread 2 writes back (value 1). Both threads incremented, but the final value is 1, not 2.
Lecture 1 Thread interference: danger and solution Please read and execute my code example for thread interference. You might notice that most of the time it runs perfectly well. This kind of bug is particularly difficult to find and fix. One solution: synchronisation, that is, enforcing exclusive access to a shared resource.
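The course's own interference example is not reproduced here, but a minimal sketch in the same spirit follows (class and method names are mine). Two threads each add 1 and subtract 1 many times; without synchronisation the final value is often nonzero, while the synchronized variant is always 0.

```java
public class CounterDemo {
    static class Counter {
        private int value = 0;
        void add(int n) { value += n; }                  // NOT atomic: load, add, store
        synchronized void safeAdd(int n) { value += n; } // exclusive access via the intrinsic lock
        int get() { return value; }
    }

    // Two threads each add 1 then subtract 1, many times; returns the final value.
    static int run(boolean synchronised) {
        Counter c = new Counter();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                if (synchronised) { c.safeAdd(1); c.safeAdd(-1); }
                else              { c.add(1);     c.add(-1);     }
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        try {
            t1.start(); t2.start();
            t1.join();  t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println("unsynchronised: " + run(false)); // often nonzero
        System.out.println("synchronised:   " + run(true));  // always 0
    }
}
```

The unsynchronised result is nondeterministic, which is exactly why such bugs are hard to reproduce.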
Lecture 1 Memory consistency errors Memory consistency errors: different threads have inconsistent views of the same data. Example: suppose we have one int field, int counter = 0;, shared between two threads A and B. Thread A increments counter: counter++;. Thread B prints out counter: System.out.println(counter);. The output is not predictable: it could be 0 or 1. There is no guarantee that thread A's change to counter will be visible to thread B.
Lecture 1 Memory consistency errors Memory consistency errors: different threads have inconsistent views of the same data. The causes of these errors are complex: we only need to know how to avoid them. Key to avoiding these errors: understanding the happens-before relationship, also denoted as a -> b: event a should happen before event b, and the result must reflect that, no matter in what order those events are actually executed. In Java: a guarantee that memory written to by statement A is visible to statement B; that is, statement A completes its write before statement B starts its read.
Lecture 1 Ways of creating happens-before relationships Write code to explicitly implement happens-before relationships, especially when executing new threads: use Thread.join(); use synchronisation.
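A minimal sketch of the Thread.join() route (names are mine): join() creates a happens-before edge, so everything the worker thread wrote is guaranteed to be visible after join() returns, even though the field is not volatile or synchronized.

```java
public class JoinDemo {
    static int result = 0;   // a plain field: visibility is only guaranteed via happens-before

    // join() creates a happens-before edge: everything the worker wrote
    // before terminating is visible after join() returns.
    public static int compute() {
        Thread worker = new Thread(() -> result = 42);
        try {
            worker.start();
            worker.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return result;       // guaranteed to see 42
    }

    public static void main(String[] args) {
        System.out.println(compute());
    }
}
```

Without the join(), reading `result` from the main thread would be a memory consistency error: it might legitimately see 0.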
Lecture 1 Intrinsic lock Synchronization is built around an internal entity known as the intrinsic lock, or monitor lock. In Java, every object has an intrinsic lock associated with it. The intrinsic lock does two things: enforcing exclusive access to an object's state; and establishing happens-before relationships. How it works: Thread A acquires the object's intrinsic lock before accessing the object's fields (exclusive access), and releases the intrinsic lock when it finishes accessing them. No other thread can acquire the same lock before Thread A releases it (happens-before relationships).
Lecture 1 Mechanisms behind Java synchronisation: monitor An important concept in concurrent programming and operating systems. Monitor: a synchronization construct that allows threads: to have mutual exclusion; to have the ability to wait for a certain condition to become true; to signal other threads that their condition has been met. A monitor can be formally defined as M = (m, c), where m is an intrinsic lock object and c is a condition variable, which is basically a container of threads that are waiting on a certain condition.
Lecture 1 Synchronisation in Java [Diagram: synchronisation tools in Java include volatile variables, atomic operations, and locks; locks include reentrant locks and the synchronized keyword, which covers synchronized methods and synchronized statements.]
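Two of the tools from the diagram can be sketched together (names are mine): an AtomicInteger gives lock-free atomic updates, while a synchronized statement on an explicit monitor object guards a plain int. Both counters end up exactly at 100000.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ToolsDemo {
    static final AtomicInteger atomic = new AtomicInteger(0); // atomic operations: no explicit lock
    static int guarded = 0;
    static final Object lock = new Object();                  // monitor for the synchronized statement

    public static int[] run() {
        atomic.set(0);
        guarded = 0;
        Runnable work = () -> {
            for (int i = 0; i < 50_000; i++) {
                atomic.incrementAndGet();           // atomic read-modify-write
                synchronized (lock) { guarded++; }  // synchronized statement on an explicit monitor
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        try {
            a.start(); b.start();
            a.join();  b.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return new int[] { atomic.get(), guarded }; // both are exactly 100000
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1]);
    }
}
```

A synchronized method would behave like the synchronized statement, using `this` (or the Class object, for static methods) as the monitor.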
Liveness: deadlock Liveness and Deadlock The liveness property in concurrent programming means that something good eventually happens; more specifically, the ability of an application to execute in a timely manner. Liveness problems include deadlock, starvation and livelock. Deadlock: a situation in which threads are blocked forever, waiting for each other. It can be two threads each waiting for the other to release a lock, or more than two threads/processes waiting for resources in a circular chain.
Liveness: deadlock 4 necessary conditions for deadlock A deadlock situation will happen if all of the following conditions (called Coffman conditions) hold simultaneously: Mutual exclusion: at least one resource must be held in a non-sharable mode: only one thread at a time can use the resource; if another thread requests that resource, the requesting thread must wait until the resource has been released. Hold and wait: a thread holds one resource while waiting for another. No preemption: once a thread is holding a resource, that resource cannot be taken away from the thread until the thread voluntarily releases it. Circular wait: a thread must be waiting for a resource which is being held by another thread, which in turn is waiting for the first thread to release a resource.
Liveness: deadlock How to deal with deadlock? Prevention: make one or more conditions invalid: no mutually exclusive access to resources: non-blocking synchronization (next week); threads should not hold resources while waiting; allow resources to be taken away from threads; no circular wait: lock-ordering design, see below. Dynamic avoidance: lock ordering: make sure all locks are always taken in the same order by any thread; lock timeout: put a timeout on lock attempts. Deadlock detection: manually or algorithmically use a data structure, e.g., a graph, to detect deadlock.
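The lock-ordering idea above can be sketched as follows (names are mine): because every thread acquires lockA before lockB, a circular wait (the fourth Coffman condition) can never form, so this code cannot deadlock.

```java
public class LockOrderDemo {
    static final Object lockA = new Object();
    static final Object lockB = new Object();
    static int shared = 0;

    // Every thread takes lockA first, then lockB: one global lock order
    // makes a circular wait impossible.
    static void update() {
        synchronized (lockA) {
            synchronized (lockB) {
                shared++;
            }
        }
    }

    public static int run() {
        shared = 0;
        Runnable work = () -> { for (int i = 0; i < 10_000; i++) update(); };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        try {
            t1.start(); t2.start();
            t1.join();  t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 20000, and never a deadlock
    }
}
```

By contrast, if one thread took lockA then lockB while the other took lockB then lockA, each could end up holding one lock and waiting forever for the other.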
Liveness: deadlock How to detect deadlock: deadlock detection Can be done manually or algorithmically. Use a Resource-Allocation Graph and search for a cycle in the graph: if there is a cycle, there MIGHT exist a deadlock. [Diagram: resource vertices (Resource A, Resource B) and thread vertices (Thread 1, Thread 2); assignment edges point from a resource to the thread holding it, and request edges point from a thread to the resource it is waiting for.]
Thread signalling Thread signalling Threads often have to coordinate (or synchronise) their actions: several threads start at the same time; a thread waits for other threads to finish. To coordinate, they need to signal each other. Two types of thread signals: Synchronous (what we are dealing with): occur as a direct result of thread execution; should be delivered to the currently executing thread. Asynchronous: occur due to an event typically unrelated to the current instruction; the threading library must determine each signal's recipient so that asynchronous signals are delivered properly. Each thread might receive a set of synchronous signals, but it can mask all signals except those that it wishes to receive.
Thread signalling Guarded block: using thread signalling A naive implementation is a non-synchronized guarded block: an empty loop spins until the condition becomes true, wasting precious CPU time. We should use a synchronized guarded block instead: the current thread is suspended while waiting for the condition to become true; it releases the acquired lock on that object and leaves the processor to be used by other threads. We can use Java's thread signalling methods to achieve this. Steps: invoke wait() inside a loop that tests for the condition being waited for, which also releases the lock; another thread that acquires the same lock invokes notifyAll() to inform all threads waiting on that lock that something important has happened.
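The steps above can be sketched as a small guarded-block class (names are mine): the waiting side calls wait() in a loop that re-tests the condition, and the signalling side sets the condition and calls notifyAll() under the same lock.

```java
public class GuardedBlock {
    private boolean ready = false;

    // Consumer side: wait() inside a loop that re-tests the condition;
    // wait() releases the intrinsic lock while the thread is suspended.
    public synchronized void awaitReady() throws InterruptedException {
        while (!ready) {
            wait();
        }
    }

    // Producer side: set the condition, then wake every thread waiting
    // on this object's lock.
    public synchronized void setReady() {
        ready = true;
        notifyAll();
    }

    // Demo: a waiter blocks until the main thread signals it.
    public static boolean demo() {
        GuardedBlock g = new GuardedBlock();
        Thread waiter = new Thread(() -> {
            try { g.awaitReady(); } catch (InterruptedException ignored) { }
        });
        try {
            waiter.start();
            Thread.sleep(50);   // give the waiter time to block (demo only)
            g.setReady();
            waiter.join();      // returns only after the waiter was released
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The while-loop (rather than an if) matters: wait() can return spuriously, so the condition must be re-checked before proceeding.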
Thread signalling Semaphore What is a Semaphore? Semaphore: a variable or abstract data type that is used in concurrent programming for controlling access, by multiple processes or threads, to a common resource. Very simple idea: if the semaphore value is 0, an attempt to decrement this value will cause the calling thread to wait until some other thread increments it. Invented by the famous Dutch computer scientist Edsger Dijkstra in 1965. In Java, it is a counting semaphore, which maintains a set of permits (the semaphore value). Usage: to restrict the number of threads that can access some resource; to send signals between threads.
Thread signalling Semaphore How does a Semaphore work? [Diagram: a semaphore with count 2 and four threads T1-T4; the first two acquire() calls succeed and the count drops 2 -> 1 -> 0; further acquire() calls make T3 and T4 wait; each release() increments the count and lets one waiting thread continue, until all four have finished and the count returns to 2.]
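The scenario in the diagram can be sketched with java.util.concurrent.Semaphore (names are mine): four threads compete for two permits, and a high-water-mark counter verifies that no more than two are ever inside the guarded section at once.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class SemaphoreDemo {
    // Four threads compete for two permits; returns the highest number of
    // threads that were ever inside the guarded section at the same time.
    public static int peakConcurrency() {
        Semaphore permits = new Semaphore(2);          // semaphore value (permit count) starts at 2
        AtomicInteger inside = new AtomicInteger(0);
        AtomicInteger peak = new AtomicInteger(0);

        Runnable work = () -> {
            try {
                permits.acquire();                     // blocks while the count is 0
                int now = inside.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // record the high-water mark
                Thread.sleep(20);                      // pretend to use the resource
                inside.decrementAndGet();
                permits.release();                     // increment the count, waking a waiter
            } catch (InterruptedException ignored) { }
        };
        Thread[] threads = new Thread[4];
        try {
            for (int i = 0; i < 4; i++) (threads[i] = new Thread(work)).start();
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return peak.get();  // never more than 2
    }

    public static void main(String[] args) {
        System.out.println(peakConcurrency());
    }
}
```

A Semaphore initialised with one permit behaves like a mutual-exclusion lock, except that any thread may release it, which is what makes semaphores usable for signalling between threads.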
Producer consumer problem Producer-Consumer problem Producer-Consumer problem: also known as the bounded-buffer problem. Two threads: the producer and the consumer. A shared buffer: a fixed-size queue. The producer: generates a piece of data, puts it into the buffer and starts again. The consumer: removes the data continuously from the buffer, one piece at a time. Requirement: the producer won't try to add data to the buffer if it is full, and the consumer won't try to remove data from an empty buffer. Everyday examples are everywhere, e.g., a rotating sushi bar.
Producer consumer problem Producer-Consumer problem: solutions Three situations: the buffer is full: the producer stops producing, i.e., sleeps; the buffer is empty: the consumer stops removing, i.e., sleeps; the buffer is neither full nor empty: the producer and the consumer continue working, or notify the sleeping producer/consumer to resume. Synchronisation is required to avoid thread interference and deadlock. Deadlock here: both threads are waiting to be awakened by the other. Once you have found a good solution, the problem becomes a design pattern: the Producer-Consumer design pattern.
Producer consumer problem Producer-Consumer design pattern Producer-Consumer design pattern: a classic concurrency or threading design pattern. Used to separate work that needs to be done from the execution of that work. Also useful for decoupling threads that produce and consume data at different rates. Example: an application accepts data while processing it in the order it was received. Producer: produces the data, e.g., queueing up the received data in order (fast). Consumer: consumes the data, e.g., processing the data (slow). The Java Executor framework (introduced later) implements the Producer-Consumer design pattern.
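The three situations above are handled for us by java.util.concurrent.BlockingQueue: put() sleeps while the buffer is full and take() sleeps while it is empty. A minimal sketch (names are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumer {
    // Produces the integers 0..n-1 into a bounded buffer of capacity 3
    // while a consumer drains them; returns what the consumer received.
    public static List<Integer> run(int n) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(3); // fixed-size shared buffer
        List<Integer> consumed = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    buffer.put(i);               // blocks (sleeps) while the buffer is full
                }
            } catch (InterruptedException ignored) { }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    consumed.add(buffer.take()); // blocks (sleeps) while the buffer is empty
                }
            } catch (InterruptedException ignored) { }
        });

        try {
            producer.start(); consumer.start();
            producer.join();  consumer.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return consumed;                         // join() makes this read safe
    }

    public static void main(String[] args) {
        System.out.println(run(5));
    }
}
```

The same pattern can be hand-built with wait()/notifyAll() on a shared queue, which is essentially what BlockingQueue implementations do internally.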
Executors framework Thread pools Thread pool: a managed collection of worker threads that are created and waiting to perform tasks. A thread pool also contains a job queue which holds tasks waiting to be executed. Benefits of thread pools: improved performance when executing large numbers of tasks: worker threads are reused, reducing per-task invocation overhead; a means of bounding the resources consumed by threads when executing a collection of tasks; no management of the life cycle of threads: you just focus on the tasks you want the threads to perform, instead of creating, managing and coordinating threads. The best tool for creating thread pools in Java is the Executors framework.
Executors framework Thread Pool [Diagram: tasks 1-3 enter the job queue; worker threads 1-n in the thread pool take tasks from the queue and execute them.]
Executors framework A few important concepts: Callable Runnable: an interface that should be implemented by any class whose instances are intended to be executed by a thread. Callable: an interface implemented by a task that returns a result and may throw an exception. Runnable vs Callable: Similarity: both are designed for classes whose instances are potentially executed by another thread. Difference: Runnable does not return a result and cannot throw a checked exception; Callable = a Runnable object with a return value.
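The pieces above fit together like this (names are mine): a fixed thread pool from Executors receives Callable tasks, each submit() returns a Future, and get() blocks until the result is ready.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorDemo {
    // Submits n Callable tasks (each returns k*k) to a fixed pool of 3
    // worker threads and sums the results.
    public static int sumOfSquares(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int k = i;
                futures.add(pool.submit(() -> k * k)); // Callable<Integer>: returns a result
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();                      // blocks until the task completes
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();                           // release the worker threads
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(4)); // 1 + 4 + 9 + 16 = 30
    }
}
```

Submitting a Runnable instead would also work, but its Future would carry no result, which is exactly the Runnable/Callable difference described above.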
Part III: Web applications and development: Week 10-11, Lectures 18-20
Java servlet Java servlet life cycle Java Servlets run inside a Servlet-compatible Servlet Container, e.g., Apache Tomcat, JBoss, Jetty, etc. A Servlet's life cycle consists of the following steps: Step 1: load the Servlet class. Step 2: create an instance of the Servlet. Step 3: call the servlet's init() method. Step 4: call the servlet's service() method. Step 5: call the servlet's destroy() method. Note 1: by default, a servlet is not loaded until the first request is received for it. Note 2: when the servlet container unloads the servlet, the destroy() method is called and the container finalises the servlet and collects garbage.
Java servlet Java servlet life cycle [Diagram: the servlet container loads the servlet class, instantiates it, and calls init(); each request/response pair is handled by service(); destroy() is called before finalisation and garbage collection.]
Java servlet Session Management What is a session and why use it? The HTTP protocol and web servers are stateless: for the web server, every request is a new request, even if it is the same request from the same client. Web applications sometimes require client information to process the request accordingly. Example 1: after you log in with the correct authentication credentials, how does the server remember that you have logged in? Example 2: when you add an entry to your cart, how does the server know what you added earlier? We need to make the server remember what the user entered before. Session: a conversation between client and server that can consist of multiple requests and responses between them.
Java servlet Session Management Session ID Session ID: a piece of data that is used in HTTP to identify a session. The client stores the session ID, while the server associates that ID with other client information, such as a user name. Steps: Step 1: the client starts a session, e.g., requests a page. Step 2: the server allocates a random session ID for the request and also stores the user information. Step 3: the session ID is then communicated back to the client. Step 4: when the client sends subsequent requests, it also sends back the same session ID. Step 5: the server decides whether the session has expired. Step 6: if not expired, the server associates the user information with that session ID and responds to the requests.
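Steps 2, 5 and 6 on the server side can be sketched without any servlet API (class and method names are hypothetical, and a real implementation would use SecureRandom plus expiry timestamps, as in the database table shown later):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of the server-side session table described above.
public class SessionStore {
    private final Map<String, String> sessions = new HashMap<>(); // session ID -> user name
    private final Random random = new Random(); // demo only: real code uses SecureRandom

    // Step 2: allocate a random session ID and store the user information.
    public String create(String username) {
        String id = Long.toHexString(random.nextLong()); // ID sent back to the client (Step 3)
        sessions.put(id, username);
        return id;
    }

    // Steps 5-6: look up the ID on a subsequent request; null means the
    // session is unknown (or could be treated as expired).
    public String lookup(String sessionId) {
        return sessions.get(sessionId);
    }
}
```

In a servlet container this bookkeeping is done for you: the container issues the ID (typically in a JSESSIONID cookie) and exposes the stored data through the HttpSession object.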
Java servlet Session Management How to associate user information with an ID Three typical ways of associating user information with an ID: Hidden form fields: a unique hidden field in the HTML whose value the server can set to the session ID to keep track of the session. Drawback 1: a form with the hidden field must be submitted every time a request is made from client to server. Drawback 2: not secure: a hacker can get the hidden field value from the HTML source and use it to hijack the session. Cookies: a small piece of information that is sent from the server and stored in the client's browser. When the client makes further requests, it adds the cookie to the request header, and we can utilise it to keep track of the session. URL rewriting: appends a session identifier parameter to every request and response to keep track of the session.
Java servlet Session Management How to associate user information [Sequence diagram: the client logs in by POSTing a username and password; if the login is successful, the server creates a session ID, stores it in a database table (session ID, username, created time, expired time, last access time), and returns it in a Set-Cookie header (e.g., SESSIONID=24D6442B89D1B65FECF1C8D9FC2232D0); on subsequent requests the client sends the cookie back, the server looks up the session ID, checks that the session is still valid, and returns the content for that user.]
Java servlet Session Management What is Model-View-Controller (MVC)? MVC: a design pattern for efficiently relating the user interface to underlying data models. Three main components: Model: represents the underlying data structures in a software application and the functions to retrieve, insert, and update the data. Note: it contains no information about the user interface. View: a collection of classes representing the elements of the user interface that the user sees on the screen. Controller: classes connecting the model and the view, used to communicate between classes in the model and the view.
Java servlet Session Management What is Model-View-Controller (MVC)? [Diagram: the user sends a request to the controller; the controller manipulates the model; the model updates the view; the view returns results to the user.]
Java servlet Session Management Advantages of MVC Better complexity management: the software separates presentation (view) from application logic (model); consequently, the code is cleaner and easier to understand; it enables large teams of developers to work in parallel. Flexibility: the presentation or user interface can be completely revamped without touching the model. Reusability: the same model can be used for other applications with different user interfaces.