Chapter 4: Application Protocols 4.1: Layer : Internet Phonebook : DNS 4.3: The WWW and s

Chapter 4: Application Protocols 4.1: Layer 5-7 4.2: Internet Phonebook : DNS 4.3: The WWW and E-Mails OSI Reference Model Application Layer Presentation Layer Session Layer Application Protocols Chapter 3: Internet Protocols Chapter 2: Computer Networks Transport Layer Network Layer Data Link Layer Physical Layer Page 1

Layer 5: Session Layer Layer 5 is the lowest of the application orientated layers; it controls dialogs, i.e. the exchange of related information: Synchronization of partner instances by synchronization points: data can have been transferred correctly but have to be nevertheless partially retransmitted. (Crash of a sender in the mid of the data transmission process.) Therefore, synchronization points can be set on layer 5 at arbitrary times of the communication process. If a connection breaks down, not the entire data transmission has to be repeated; the transmission can remount at the last synchronization point. Dialog management during half duplex transmission: layer 5 controls the order in which the communication partners are allowed to send their data. Connection establishment, data transmission, and connection termination for layer 5 to 7. Use of different tokens for the assignment of transmission authorizations, for connection termination, and for the setting synchronization points. Page 2

Layer 6: Presentation Layer Layer 6 hides the use of different data structures or differences in their internal representation The same meaning of the data with the sender and the receiver is guaranteed Adapt character codes ASCII 7-bit American Standard Code for Information Interchange EBCDIC 8-bit Extended Binary Coded Digital Interchange Code Adapt number notation 32/40/56/64 bits Little Endian (byte 0 of a word is right) vs. Big Endian (byte 0 is left) Abstract Syntax Notation One, ASN.1 as transfer syntax Substantial tasks of layer 6: 1.) Negotiation of the transfer syntax 2.) Mapping of the own data to the transfer syntax 3.) and further data compression, data encryption (source coding) Page 3

Codes Source coding generally converts the representation of messages into a sequence of code words Efficient coding Remove redundancies Data compression Codes are meaningful only if they are clearly decodable, i.e. each sequence of characters, which consists of code words, can be divided definitely into a sequence of code words. In communication, immediately decodable codes are important, i.e. character sequences from code words can be decoded definitely from the beginning of the character sequence word by word, without considering following characters. Prefix code: no code word may be a prefix of another. Example: C = {0, 10, 011, 11111} is a definite code, but not immediately decodable To each definite code, an immediately decodable code exists, which is not longer. Page 4

Information Theory What is information? Definition: The mean information content (entropy) of a character is defined by N p log i= 1 i a p i = N p log i= 1 i a 1 p i with N - Number of different characters p i - Frequency of a character i (i=1,, N) A -Basis In a transferred sense: The entropy indicates, how surprised we are, which character comes next. Example 1: Given: 4 characters All N=4 characters are equivalent frequently (p i = 0.25 i) 4 Entropy: 0,25log 4 = log 4 = 2[ bit] 2 2 = i 1 There does not exist a better coding as with 2 bits per character Example 2: Given: 4 characters The first character has the frequency p 1 =1, thus is p 2 = p 3 = p 4 = 0 Entropy: 1 1 log 1+ lim3* p log = 0 + 0 = 0 2 a p 0 The entropy is 0 [bit], i.e. because anyway only character 1 is transferred, we did not even code and transfer it. p Page 5

Huffman Code Lehrstuhl für Informatik 4 The entropy indicates how many bits at least are needed for coding. A good approximation to that theoretical minimum (for mean code word length) is the use of a binary tree. The characters which are to be coded are at the leafs. Huffman code (a prefix code) Precondition: the frequency of the occurrence of all characters is well-known. Principle: more frequently arising characters are coded shorter than rarer ones 1.) List all characters as well as their frequencies 2.) Select the two list elements with the smallest frequency and remove them from the list 3.) Make them the leafs of a tree, whereby the probabilities for both elements are being added; place the tree into the list 4.) Repeat steps 2 and 3, until the list contains only one element 5.) Mark all edges: Father left son with 0 Father right son with 1 The code words result from the path from the root to the leafs Page 6

Huffman Code - Example The characters A, B, C, D and E are given with the probabilities p(a) = 0.27, p(b) = 0.36, p(c) = 0.16, p(d) = 0.14, p(e) = 0.07 Entropy: 2,13 4 p(adceb) = 1.00 0 1 2 p(ced) = 0.37 0 1 p(c) = 0.16 1 p(ed) = 0.21 0 1 3 p(ab) = 0.63 0 1 p(a) = 0.27 p(b) = 0.36 p(e) = 0.07 p(d) = 0.14 Resulting Code Words: w(a) = 10, w(b) = 11, w(c) = 00, w(d) = 011, w(e) = 010 Page 7

Huffman Code - Example A - Adenin 0,5 C - Cytosin 0,3 G - Guanin 0,15 T - Thymin 0,05 A (0,5) 0 C (0,3) (1,0) 1 (0,5) 0 1 (0,2) 0 1 G (0,15) T (0,05) Entropy 1.65 bit It can exist a coding with less than 2 bits per character on the average Page 8

Frequency of Characters and Character Sequences (English language) Letters Digrams Trigrams E 13,05 TH 3,16 THE 4,72 T 9,02 IN 1,54 ING 1,42 O 8,21 ER 1,33 AND 1,13 A 7,81 RE 1,30 ION 1,00 N 7,28 AN 1,08 ENT 0,98 I 6,77 HE 1,08 FOR 0,76 R 6,64 AR 1,02 TIO 0,75 S 6,46 EN 1,02 ERE 0,69 H 5,85 TI 1,02 HER 0,68 D 4,11 TE 0,98 ATE 0,66 L 3,60 AT 0,88 VER 0,63 C 2,93 ON 0,84 TER 0,62 F 2,88 HA 0,84 THA 0,62 U 2,77 OU 0,72 ATI 0,59 M 2,62 IT 0,71 HAT 0,55 P 2,15 ES 0,69 ERS 0,54 Y 1,51 ST 0,68 HIS 0,52 W 1,49 OR 0,68 RES 0,50 G 1,39 NT 0,67 ILL 0,47 B 1,28 HI 0,66 ARE 0,46 V 1,00 EA 0,64 CON 0,45 K 0,42 VE 0,64 NCE 0,43 X 0,30 CO 0,59 ALL 0,44 J 0,23 DE 0,55 EVE 0,44 Q 0,14 RA 0,55 ITH 0,44 Z 0,09 RO 0,55 TED 0,44 Codes like the Huffman code are not limited necessarily to individual characters. It can be more meaningful (depending on the application) to code directly whole character strings example: the English language. Page 9

Arithmetic Coding Characteristics: Achieves optimality (coding rate) as the Huffman coding Difference to Huffman: the entire data stream has an assigned probability, which consists of the probabilities of the contained characters. Coding a character takes place with consideration of all previous characters. The data are coded as an interval of real numbers between 0 and 1. Each value within the interval can be used as code word. The minimum length of the code is determined by the assigned probability. Disadvantage: the data stream can be decoded only as a whole. Page 10

Arithmetic Coding: Example Code data ACAB with p A = 0.5, p B = 0.2, p C = 0.3 0 p A = 0.5 pb = 0.2 p C = 0.3 0.5 0.7 1 0 p AA = 0.25 p AB = 0.1p AC = 0.15 p BA p BB p BC p CA p CB p CC 0.25 0.35 0.5 0.6 0.68 0.7 0.85 0.91 1 p ACA = 0.075 p ACB = 0.03 p ACC = 0.045 0.35 0.425 0.455 0.5 0.35 p ACAA = 0.0375 p ACAB = 0.015 p ACAC = 0.0225 0.3875 0.4025 0.425 ACAB can be coded by each binary number from the interval [0.3875, 0.4025), rounded up to log 2 (p ACAB ) = 6.06 i.e. 7 bit, e.g. 0.0110010 Page 11

Layer 7: Application Layer Collection of often used communication services Identification of communication partners Detection of the availability of communication partners Authentication Negotiation of the grade of the transmission quality Synchronization of cooperating applications Page 12