A Small Web Server. Programming II - Elixir Version. Johan Montelius. Spring Term 2018

Similar documents
Rudy: a small web server. Johan Montelius. October 2, 2016

The Actor Model. CSCI 5828: Foundations of Software Engineering Lecture 13 10/04/2016

AVL Trees. Programming II - Elixir Version. Johan Montelius. Spring Term 2018

The Application Layer HTTP and FTP

Application Protocols and HTTP

Lecture 8: February 19

CSMC 412. Computer Networks Prof. Ashok K Agrawala Ashok Agrawala Set 2. September 15 CMSC417 Set 2 1

Web Server Project. Tom Kelliher, CS points, due May 4, 2011

Elixir, functional programming and the Lambda calculus.

STARCOUNTER. Technical Overview

CS 43: Computer Networks. HTTP September 10, 2018

KTH ROYAL INSTITUTE OF TECHNOLOGY. Remote Invocation. Vladimir Vlassov and Johan Montelius

Operating Systems. 18. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Spring /20/ Paul Krzyzanowski

CSC209: Software tools. Unix files and directories permissions utilities/commands Shell programming quoting wild cards files

CSC209: Software tools. Unix files and directories permissions utilities/commands Shell programming quoting wild cards files. Compiler vs.

Computer Networks Prof. Ashok K. Agrawala

HTTP Reading: Section and COS 461: Computer Networks Spring 2013

Erlang: concurrent programming. Johan Montelius. October 2, 2016

Socket Programming Assignment 6: Video Streaming with RTSP and RTP

HTTP Server Application

Application Layer: The Web and HTTP Sec 2.2 Prof Lina Battestilli Fall 2017

CSC 2209: CLOUD STORAGE FINAL PROJECT

2/13/2014. A protocol is an agreed-upon convention that defines how communication occurs between two (or more?) endpoints

CSC209 Review. Yeah! We made it!

Request for Comments: 913 September 1984

Introduction to Erlang. Franck Petit / Sebastien Tixeuil

SC/CSE 3213 Winter Sebastian Magierowski York University CSE 3213, W13 L8: TCP/IP. Outline. Forwarding over network and data link layers

Session 8. Reading and Reference. en.wikipedia.org/wiki/list_of_http_headers. en.wikipedia.org/wiki/http_status_codes

porcelain Documentation

Distributed Systems. How do regular procedure calls work in programming languages? Problems with sockets RPC. Regular procedure calls

CSSE 460 Computer Networks Group Projects: Implement a Simple HTTP Web Proxy

殷亚凤. Processes. Distributed Systems [3]

My malloc: mylloc and mhysa. Johan Montelius HT2016

Peer to Peer Instant Messaging

A programmer can create Internet application software without understanding the underlying network technology or communication protocols.

Computer Networks. Wenzhong Li. Nanjing University

Project 2 Implementing a Simple HTTP Web Proxy

Rocking with Racket. Marc Burns Beatlight Inc

ECE 650 Systems Programming & Engineering. Spring 2018

Taking the Derivative

The Actor Model, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 18 10/23/2014

COMPUTER NETWORK. Homework #1. Due Date: March 29, 2017 in class

Programming Assignment IV

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

Project 2 Group Project Implementing a Simple HTTP Web Proxy

CSE 333 Lecture server sockets

A process. the stack

Introduction to computer networking

Web Client And Server

CHAPTER 2. Troubleshooting CGI Scripts

Remote Invocation Vladimir Vlassov and Johan Montelius

Session 9. Deployment Descriptor Http. Reading and Reference. en.wikipedia.org/wiki/http. en.wikipedia.org/wiki/list_of_http_headers

Lecture 5: Declarative Programming. The Declarative Kernel Language Machine. September 12th, 2011

Project 1: A Web Server Called Liso

P2P Programming Assignment

Internet Content Distribution

DATA COMMUNICATOIN NETWORKING

CSE 333 Lecture server sockets

Flash: an efficient and portable web server

Distributed Systems 8. Remote Procedure Calls

Protocol Buffers, grpc

String Computation Program

printf( Please enter another number: ); scanf( %d, &num2);

Processes. Johan Montelius KTH

Erlang Concepts. Programming for Beginners, Summer 2011

Programming with MPI

Project 1: Web Client and Server

Casty: a streaming media network. Johan Montelius. October 2, 2016

CS143 Handout 05 Summer 2011 June 22, 2011 Programming Project 1: Lexical Analysis

Web Mechanisms. Draft: 2/23/13 6:54 PM 2013 Christopher Vickery

let s implement a memory Asynchronous vs Synchronous the functional way a first try

Version 2.1 User Guide 08/2003

Chapter 1. Introduction. 1.1 Understanding Bluetooth as a Software Developer

Scalability, Performance & Caching

A Client-Server Exchange

Programming Assignment IV Due Monday, November 8 (with an automatic extension until Friday, November 12, noon)

Getting Started With Squeeze Server

Remote Procedure Calls (RPC)

CME 193: Introduction to Scientific Python Lecture 1: Introduction

Internet Technology. 06. Exam 1 Review Paul Krzyzanowski. Rutgers University. Spring 2016

Client-side SOAP in S

An Erlang primer. Johan Montelius

Serving dynamic content. Issues in serving dynamic content. add.com: THE Internet addition portal! CGI. Page 2

CS 417 9/18/17. Paul Krzyzanowski 1. Socket-based communication. Distributed Systems 03. Remote Procedure Calls. Sample SMTP Interaction

CIS 505: Software Systems

CS52 - Assignment 8. Due Friday 4/15 at 5:00pm.

A programming example. A Ray Tracer. ray tracing. Architecture

Chapter Two Bonus Lesson: JavaDoc

Distributed Systems. 03. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Fall 2017

CS 326: Operating Systems. Networking. Lecture 17

Python INTRODUCTION: Understanding the Open source Installation of python in Linux/windows. Understanding Interpreters * ipython.

AN408. A Web-Configurable LabVIEW Virtual Instrument for a BL2600 and RN1100

Computer Organization & Assembly Language Programming

Lab 03 - x86-64: atoi

Programming Assignment 1

ELEC / COMP 177 Fall Some slides from Kurose and Ross, Computer Networking, 5 th Edition

Input File Syntax The parser expects the input file to be divided into objects. Each object must start with the declaration:

Sending MAC Address Function

UDP and TCP. Introduction. So far we have studied some data link layer protocols such as PPP which are responsible for getting data

New York University Computer Science Department Courant Institute of Mathematical Sciences

Transcription:

A Small Web Server Programming II - Elixir Version Johan Montelius Spring Term 2018 Introduction Your task is to implement a small web server in Elixir. exercise is that you should be able to: The aim of this describe the procedures for using a socket API describe the structure of a server process As a side effect you will also learn a bit about the HTTP protocol. 1 A HTTP parser Let s start by implementing a HTTP parser. We will not build a complete parser nor will we implement a server that can reply to queries (yet) but we will do enough to understand how Elixir can parse a HTTP request. 1.1 A HTTP request Open a file http.ex and declare a module HTTP on the first line. defmodule HTTP do ID1019 KTH 1 / 12

We re only going to implement the parsing of a HTTP get request; and avoiding some details to make life easier. If you download the RFC 2616 from www.ietf.org you will find this descriptions of a request. From the RFC we have: Request = Request-Line ; Section 5.1 *(( general-header ; Section 4.5 request-header ; Section 5.3 entity-header ) CRLF) ; Section 7.1 CRLF [ message-body ] ; Section 4.3 So a request consist of: a request line, an optional sequence of headers (the * means zero or more), a carriage-return line-feed (CRLF) and, an optional body (optional since it is enclosed in []). We also note that each header is terminated by a carriage-return line-feed. OK, let s go; we implement each parsing function so that it will parse its element and return a tuple consisting of the parsed result and the rest of the string. def parse_request(r0) do {request, r1} = request_line(r0) {headers, r2} = headers(r1) {body, _} = message_body(r2) {request, headers, body} 1.2 The request line Now take a look at the RFC and the definition of the request line. Request-Line = Method SP Request-URI SP HTTP-Version CRLF The request line consist of: a method, a request URI and, a http version. All separated by space characters and terminated by a carriage-return and line-feed. The method is one of: OPTIONS, GET, HEAD etc. Since we are only interested in get requests, life becomes easy. To understand how we implement the parser you must know how strings are represented in Elixir. Strings are lists of integers and one way of writing ID1019 KTH 2 / 12

an integer is?g, that is the ASCII value of the character G. So the string GET could be written [?G,?E,?T], or (if you know your ASCII) as [71, 69, 84]. Now let s do some string parsing. def request_line([?g,?e,?t, 32 r0]) do {uri, r1} = request_uri(r0) {ver, r2} = http_version(r1) [13, 10 r3] = r2 {{:get, uri, ver}, r3} We match the input string with a list starting with the integers of G, E and T, followed by 32 (which is the ASCII value for space). After having matched the input string with GET we continue with the rest of the string r0. We find the URI, the version and finally, the carriage return and line feed that is the of the request line. We then return the tuple {{:get, uri, ver}, r3}, the first element is the parsed representation of the request line and r3 is the rest of the string. 1.3 The URI Next we implement the parsing of the URI. This requires recursive definitions. I know that you can write this as a tail recursive function but right now I don t think it is worth the trouble. def request_uri([32 r0]), do: {[], r0} def request_uri([c r0]) do {rest, r1} = request_uri(r0) {[c rest], r1} The URI is returned as a string. There is of course a whole lot of structure to that string. We will find the resource that we are looking for possibly with query information etc. For now we leave it as a string, but feel free to parse it later. ID1019 KTH 3 / 12

1.4 Version information Parsing the version is simple, it s either version 1.0 or 1.1. We represent this by the atoms v11 and v10. This means that we later can switch on the atom rather than again parsing a string. It does of course mean that our program will stop working when 1.2 is release but that will probably not be this week. def http_version([?h,?t,?t,?p,?/,?1,?.,?1 r0]) do {:v11, r0} def http_version([?h,?t,?t,?p,?/,?1,?.,?0 r0]) do {:v10, r0} 1.5 Headers Headers also have internal structure but we are only interested in dividing them up into individual strings and most important find the of the header section. We implement this as two recursive functions; one that consumes a sequence of headers and one that consumes individual headers. def headers([13, 10 r0]), do: {[], r0} def headers(r0) do {header, r1} = header(r0) {rest, r2} = headers(r1) {[header rest], r2} def header([13, 10 r0]), do: {[], r0} def header([c r0]) do {rest, r1} = header(r0) {[c rest], r1} ID1019 KTH 4 / 12

1.6 The body The last thing we need is parsing of the body and we will make things very easy (even cheating). We assume that the body is everything that is left but the truth is not that simple. If we call our function with a string as input argument, there is little discussion of how large the body is - but this is not easy if we want to parse an incoming stream of bytes. When do we reach the, when should we stop waiting for more? The length of the body is therefore encoded in the headers of the request. Or rather in the specification of HTTP 1.0 and 1.1 there are several alternative ways of determining the length of the body. If you dig deeper into the specs you will find that it is quite messy. In our little world we will however treat the rest of the string as the body. def message_body(r), do: {r, []} 1.7 a small test You now have all the pieces to parse a request; try the string below. Note how we encode the special characters in a string, \r is carriage-return and \n is line-feed. 'GET /index.html HTTP/1.1\r\nfoo 34\r\n\r\nHello' 1.8 What about replies We will not deliver very interesting replies from our server but here is a function that generates a HTTP reply with a 200 status code (200 is all OK). Another function can be convenient to have when we want to generate a request. Also export these from the http module, we are going to use them later. Note the double \r\n, one to the status line and one to the header section. A proper reply should containing headers that describe the content and the size of the body but a normal browser will understand what we mean. ID1019 KTH 5 / 12

def ok(body) do "HTTP/1.1 200 OK\r\n\r\n #{body}" def get(uri) do "GET #{uri} HTTP/1.1\r\n\r\n" Note the double \r\n, one to the status line and one to the header section. A proper reply should containing headers that describe the content and the size of the body but a normal browser will understand what we mean. 2 The first reply Your task now is to start a program that waits for an incoming request, delivers a reply and then terminates. This is not much of a web server but it will show you how to work with sockets. The important lesson here is that a socket that a server listen to is not the same thing as the socket later used for communication. Call the first test rudy (since it is a rudimentary server), open a new file and add a module declaration. You should define four procedures: init(port): the procedure that will initialize the server, takes a port number (for example 8080), opens a listening socket and passes the socket to handler/1. Once the request has been handled the socket will be closed. handler(listen): will listen to the socket for an incoming connection. Once a client has connect it will pass the connection to request/1. When the request is handled the connection is closed. request(client): will read the request from the client connection and parse it. It will then parse the request using your http parser and pass the request to reply/1. The reply is then sent back to the client. reply(request): this is where we decide what to reply, how to turn the reply into a well formed HTTP reply. The program could have the following structure: ID1019 KTH 6 / 12

defmodule Rudy do def init(port) do opt = [:list, active: false, reuseaddr: true] case :gen_tcp.listen(port, opt) do {:ok, listen} ->... :gen_tcp.close(listen) :ok {:error, error} -> error def handler(listen) do case :gen_tcp.accept(listen) do {:ok, client} ->... {:error, error} -> error def request(client) do recv = :gen_tcp.recv(client, 0) case recv do {:ok, str} ->...... :gen_tcp.s(client, response) {:error, error} -> IO.puts("RUDY ERROR: #{error}") :gen_tcp.close(client) def reply({{:get, uri, _}, _, _}) do HTTP.ok("Hello!") ID1019 KTH 7 / 12

2.1 Socket API In order to implement the above procedures you will need the functions defined in the Erlang :gen tcp library. Look up the library in the Kernel Reference Manual found under Application/kernel in the documentation. The following will get you starting: :gen tcp.listen(port, option): this is how a listening socket is opened by the server. We will pass the port number as an argument and use the following option: :list, active: false, reuseaddr: true. Using these option we will see the bytes as a list of integers instead of a binary structure. We will need to read the input using recv/2 rather than having it sent to us as messages. The port address should be used again and again. :gen tcp.accept(listen): this is how we accept an incoming request. If it succeeds we will have a communication channel open to the client. :gen tcp.recv(client, 0): once we have the connection to the client we will read the input and return it as a string. The augment 0, tells the system to read as much as possible. :gen tcp.s(client, reply): this is how we s back a reply, in the form of a string, to the client. :gen tcp.close(socket): once we are done we need to close the connection. Note that we also have to close the listening socket that we opened in the beginning. Fill in the missing pieces, compile and start the program. Use your browser to retrieve the page by accessing http://localhost:8080/foo. Any luck? 2.2 A server Now a server should of course not terminate after one request. The server should run and provide a service until it is manually terminated. To achieve this we need to listen to a new connection once the first has been handled. This is easily achieved by modifying handler/1 procedure so that it calls itself recursively once the first request has been handled. A problem is of course how to terminate the server. If it is susped waiting for an incoming connection the only way to terminate it is to kill the process. ID1019 KTH 8 / 12

You don t want to kill the Elixir shell so one solution is to let the server run as a separate Elixir process and then register this process under a name in order to kill it. def start(port) do Process.register(spawn(fn -> init(port) ), :rudy) def stop() do Process.exit(Process.whereis(:rudy), "Time to die!") This is quite brutal and one should of course do things in a more controlled manner but it works for now. 3 The assignment You should complete the rudimentary server described above and do some experiments. Set up the server on one machine and access it from another machine. A small benchmark program can generate requests and measure the time it takes to receive the answers. defmodule Test do @number_requests 100 def bench(host, port) do start = Time.utc_now() run(@number_requests, host, port) finish = Time.utc_now() diff = Time.diff(finish, start, :millisecond) IO.puts("Benchmark: #{@number_requests} requests in #{diff} ms") defp run(0, _host, _port), do: :ok defp run(n, host, port) do request(host, port) run(n - 1, host, port) ID1019 KTH 9 / 12

defp request(host, port) do opt = [:list, active: false, reuseaddr: true] {:ok, server} = :gen_tcp.connect(host, port, opt) :gen_tcp.s(server, HTTP.get("foo")) {:ok, _reply} = :gen_tcp.recv(server, 0) :gen_tcp.close(server) You might want to change the procedure request/2 during debugging to see if the socket communication actually works. Remove the print-out calls so that we re measuring the performance of the server and not the performance of the output function. Insert a small delay (10ms) in the handling of the request to simulate file handling, server side scripting etc. def reply({{:get, uri, _}, _, _}) do :timer.sleep(10) HTTP.ok("Hello!") Now, how many request per second can we serve? Is our artificial delay significant or does it disappear in the parsing overhead? What happens if you run the benchmarks on several machines at the same time? Run some tests and report your findings. 4 Increasing throughput Our web server as it looks right now will wait for a request, server the request and wait for the next. If the serving of a request deps on other processes, such as the file system or some database, we will be idle waiting while a new request might already be available for parsing. Your task is to increase the throughput of our server by letting each request be handled concurrently. This means that the server should handle each request in its own Elixir process. Should we create a new process for each incoming reply? Does it not take time to create a process (not really)? What will happen if we have thousand ID1019 KTH 10 / 12

of requests a minute? A better approach might be to create a pool of handlers and assign request to them; if there are now available handlers the request will be put on hold or ignored. Elixir even allows several processes to listen to a socket. One could create a fixed number of sequential servers that are all listening to the same socket. This is probably the easiest solution but you have to do some reading to understand the TCP module. The Elixir system has support for multicore architectures. If you have dualcore handy you can do some performance testing. Read more on how to start Elixir with multiprocessor support. Things might even improve a single core processor since the system will be better in hiding latency. Do some performance measurements to see if your concurrent solution outperforms the sequential solution. You will have to change the benchmark program so that it is also concurrent. 5 Optional extensions If you want to have some more challenges you can try to do the following improvements. 5.1 HTTP parsing If you have done things the easy way you might have taken for granted that the whole HTTP request is complete in the first string that you read from the socket. This might not be that case, the request might be divided into several blocks and you will have to concatenated these before you can read a complete request. One simple solution is as follows: do a first scan of the string and look for a double CR, LF. This will be the of the header section; if you don t find it you will have to wait for more. How do you know how long the body is? 5.2 Deliver files It s not much of an web server if it can only deliver canned answers. Let s ext the server so that it can deliver files. In order to do this we need to parse the URI request and separate the path and file form a possible query ID1019 KTH 11 / 12

or index. Once we have the file in our hand we need to make up a proper reply header containing the proper size, type and coding descriptors. 5.3 Robustness The way the server is terminated is of course not very elegant. One could do a much better job by having the sockets being active and deliver the connections as messages. The server could then be free to receive either control messages from a controlling process or a message from the socket. Another problem with the current implementation is that it will leave sockets open if things go wrong. A more robust way is to trap exceptions and close sockets, open files etc before terminating. ID1019 KTH 12 / 12