Architectures for Distributed Systems

Distributed Systems and Middleware 2013, 2: Architectures

Components
A distributed system consists of components. Each component has a well-defined interface and can be replaced by another component with the same interface.

Architectural styles
How should components be organized? How should they interact with each other?

System architecture
How components are placed on real machines.

Architectural styles (1)
Layered architecture: a component at layer i is allowed to call components at the underlying layer i-1. Requests flow down from layer N to layer 1, and responses flow back up.
Object-based (method-call-based) architecture: components (objects) are connected through a remote procedure call mechanism.

Architectural styles (2)
Data-centered architecture: processes communicate through a common repository.
Event-based architecture: processes communicate through the propagation of events. In a publish/subscribe (pub/sub) system, components publish events on an event bus, and only the processes that have subscribed to those events receive them.
Shared data-space architecture: a combination of the data-centered and event-based styles, in which components publish data into a shared data space and receive data delivered from it.
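
A minimal sketch of the publish/subscribe idea, assuming everything runs in a single Python process. The class EventBus and its methods subscribe and publish are hypothetical names chosen for illustration. Publishers and subscribers know only the topic name, never each other; a real distributed event bus would deliver events asynchronously over the network instead of by direct function calls.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process event bus: publishers and subscribers share only topics."""

    def __init__(self):
        # topic name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback that will receive every event published on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver the event to all subscribers of topic; return how many received it."""
        callbacks = self._subscribers.get(topic, [])  # .get() does not create empty topics
        for callback in callbacks:
            callback(event)
        return len(callbacks)


bus = EventBus()
received = []
bus.subscribe("temperature", received.append)
delivered = bus.publish("temperature", 21.5)  # one subscriber receives it
ignored = bus.publish("humidity", 40)         # nobody subscribed, nothing delivered
```

The publisher never learns who received the event, which is the decoupling that distinguishes the event-based style from the object-based one.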

System architectures
What is a system architecture? It is an instance of a distributed system obtained after deciding its components, their interaction, and their placement. By placement, the following forms exist:
- Centralized architectures
- Decentralized architectures
- Various hybrid forms

Centralized architectures: the client-server model
Processes in a distributed system are divided into two groups:
- Server: a process implementing a specific service
- Client: a process requesting a service from a server
The model can be implemented with a connectionless protocol in a LAN (which suffices when the network is fairly reliable) and with a reliable connection-oriented protocol in a WAN.

Application layering
The client-server model has the following three levels:
- User-interface level
- Processing level
- Data level
[Figure: simplified organization of an Internet search engine into these three levels]

Multi-tiered architecture
The three levels can be distributed across several machines.
Two-tiered architecture: the three levels are distributed over two kinds of machines, clients and servers. There are five possibilities, (a) to (e).
[Figure: alternative client-server organizations (a)-(e)]
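
To make the three application levels concrete, here is a toy search engine split along the same lines, assuming an in-memory page store and a naive ranking that simply counts matching query terms. All names (PAGES, fetch_pages, query_generator, rank, html_page) and the example.org URLs are hypothetical.

```python
# Data level: hypothetical in-memory "database" of web pages
PAGES = {
    "https://example.org/chord": "chord is a scalable peer-to-peer lookup service",
    "https://example.org/pastry": "pastry is scalable object location and routing",
    "https://example.org/napster": "napster used a central index server",
}


def fetch_pages():
    """Data level: return all stored pages (url -> text)."""
    return PAGES


# Processing level
def query_generator(keywords):
    """Turn the user's keyword expression into lowercase query terms."""
    return keywords.lower().split()


def rank(terms, pages):
    """Score each page by how many query terms it contains; best first, ties by URL."""
    scored = []
    for url, text in pages.items():
        words = set(text.split())
        score = sum(term in words for term in terms)
        if score > 0:
            scored.append((score, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


# User-interface level
def html_page(keywords):
    """Generate an HTML list of ranked results for display to the user."""
    ranked = rank(query_generator(keywords), fetch_pages())
    items = "".join(f"<li>{url}</li>" for url in ranked)
    return f"<ul>{items}</ul>"
```

In a two-tiered deployment, html_page could run on the client while the processing and data levels run on the server; moving rank to the client as well corresponds to a "fatter" client organization.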

Three-tiered architecture
A single server can be replaced by multiple servers running on different machines, so a server may need to act as a client of another server (for example, an application server that queries a database server). This kind of distribution is called vertical distribution.

Question 2-1
(1) Show an example system for each of the architectural styles described above.
(2) Show an example system for each of the organizations (a)-(e) of the two-tiered architecture.
(3) Show a system (other than an Internet search engine) that can be realized with a three-tiered architecture.

Decentralized architectures
Vertical distribution is division at the logical level; it is only one of many possible ways of organizing a distributed system.
Horizontal distribution is division at the physical level: a client or server is physically split up into logically equivalent parts, each part operates on its own share of the complete data set, and together the parts balance the load.
Peer-to-peer (P2P) systems are architectures that support horizontal distribution.

Peer-to-peer (P2P) systems
What is a P2P system? Resources (files, bandwidth, computation power, services, etc.) are distributed and shared among peers (user processes).
Characteristics of P2P systems:
- The processes that constitute a P2P system are all equal.
- Interaction between processes is symmetric: each process acts as a client and a server at the same time.
- The processes form a network called the overlay network, consisting of processes and overlay links (communication channels). A process cannot communicate directly with an arbitrary other process; it must send requests through the available communication channels, i.e., via its neighboring peers.
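
Horizontal distribution can be sketched as hash partitioning, assuming a simple key-value workload: every part runs the same logic (the parts are logically equivalent), but each holds only the keys that hash to it, so storage and request load are spread across the parts. PartitionedStore is a hypothetical name, and hashing modulo the number of parts is just one simple choice.

```python
import hashlib


class PartitionedStore:
    """Hypothetical key-value store split horizontally into n equivalent parts."""

    def __init__(self, n_parts):
        # Each part owns its own share of the complete data set
        self.parts = [dict() for _ in range(n_parts)]

    def _part_for(self, key):
        # Stable hash; Python's built-in hash() of str is salted per process
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.parts)

    def put(self, key, value):
        self.parts[self._part_for(key)][key] = value

    def get(self, key):
        return self.parts[self._part_for(key)].get(key)


store = PartitionedStore(3)
for i in range(300):
    store.put(f"user{i}", i)
```

A known drawback of hashing modulo n is that changing n remaps most keys; the consistent hashing used by Chord later in this lecture moves only a small fraction of keys when a peer joins or leaves.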

[Figure: example of an overlay network. Peers (processes) on hosts A-F are connected by overlay links (TCP connections) that run over the physical network of hosts and switches (routers).]

Types of P2P systems
- Hybrid P2P
- Structured P2P
- Unstructured P2P
- Hierarchical P2P

P2P-based file sharing
Goal: files are distributed across all peers. When a peer sends a query asking for a file to one of the other peers, it receives a reply indicating which peer retains the file.
File placement:
- Centralized placement: one server retains all files. The server and the network become bottlenecks.
- Distributed placement: M files are distributed over N peers. Processing and communication traffic can be balanced; the problem is how efficiently queries can be routed.
Query routing: storing the index of all files (the routing table) in one place or in every peer does not scale as the network grows, so the routing table must be kept adequately small.

Hybrid P2P: example (Napster)
[Figure: Peer A (requesting file X) sends a request for X to the index server and gets a reply pointing to Peer E (storing file X); Peer A then downloads X directly from Peer E.]
The user finds the peer responsible for the requested file through the index server.
Advantages: fast search, easy security guarantees, easy content management.
Disadvantages: the server must be maintained, the server is a single point of failure (poor fault tolerance), and size scalability is limited by the server.
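
The hybrid (Napster-style) flow can be sketched in two steps, assuming in-memory objects stand in for network messages: the requester asks the central index server which peer holds a file, then downloads the file directly from that peer. IndexServer, Peer, and their methods are hypothetical names; peers A and E and file X mirror the figure.

```python
class IndexServer:
    """Hypothetical central index: file name -> set of peers that store it."""

    def __init__(self):
        self.index = {}

    def register(self, peer_name, file_names):
        for name in file_names:
            self.index.setdefault(name, set()).add(peer_name)

    def lookup(self, file_name):
        """Return the (sorted) names of peers that hold the file."""
        return sorted(self.index.get(file_name, set()))


class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)  # file name -> content

    def download_from(self, other, file_name):
        # Direct peer-to-peer transfer: the index server is not involved
        return other.files[file_name]


server = IndexServer()
peers = {"A": Peer("A", {}), "E": Peer("E", {"X": b"contents of X"})}
for p in peers.values():
    server.register(p.name, p.files)

holders = server.lookup("X")                               # step 1: ask the index server
data = peers["A"].download_from(peers[holders[0]], "X")    # step 2: download directly
```

Every search goes through IndexServer, which is why search is fast but the server becomes the maintenance burden and single point of failure listed above.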

Structured P2P architectures
DHT (Distributed Hash Table): methods for efficiently and deterministically finding the peer that retains a file, given a query with the key (hash) of the file.
Examples: CAN, Chord [1], Pastry [2], Tapestry
[1] I. Stoica et al., "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications," Proc. ACM SIGCOMM '01, 2001.
[2] A. Rowstron and P. Druschel, "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems," Proc. IFIP/ACM Middleware 2001, 2001.

Challenges in structured P2P
To realize efficient file sharing:
- In which peers should files be stored?
- How should queries be routed to reach the peers with the target files?
[Figure: a user (peer) issues a query to the DHT, which maps data files (contents) to peers.]

Chord: file placement
Assign a responsible peer to each file in a fair manner:
- With a hash function, compute an m-bit key for each file.
- Store the file in the peer whose ID equals its key; if no peer has that ID, store the file in the first succeeding peer.
[Figure: the key space (m-bit keys, 0 to 2^m - 1) mapped onto the peer ID space (m-bit IDs, 0 to 2^m - 1); red boxes are files, blue boxes are peers.]
Problem: when a query with a key is issued, how can the responsible peer be found? Searching the whole peer ID space one by one takes O(2^m) steps.

Chord: routing
There is a tradeoff between lookup table size and the number of hops needed to reach the target peer (N is the number of peers, N ≤ 2^m):
- If each peer has a complete lookup table covering all peers, the target peer is reached in 1 hop, but the table size is O(N).
- If each peer has no index at all, the table size is O(1), but reaching the target peer takes O(N) hops.
Chord's approach: each peer has a partial view, a lookup table with O(log N) entries, and reaches the target peer in O(log N) hops by recursively narrowing the search space (similar to binary search).
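
File placement in Chord can be sketched as follows. chord_key reduces the 160-bit SHA-1 digest modulo 2^m so that small rings such as m = 3 can be shown; this reduction is an assumption for illustration (with m = 160 the digest is used unchanged). successor implements the rule above: the first peer ID at or after the key, wrapping around the ring. place_files is a hypothetical helper name.

```python
import bisect
import hashlib


def chord_key(name, m):
    """m-bit key for a file name: SHA-1 digest taken modulo 2**m (assumption for small m)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(key, peer_ids):
    """ID of the first peer at or after key on the ring, wrapping to the smallest ID."""
    ids = sorted(peer_ids)
    i = bisect.bisect_left(ids, key)
    return ids[i] if i < len(ids) else ids[0]


def place_files(file_names, peer_ids, m):
    """Map each file name to the peer responsible for its key."""
    return {name: successor(chord_key(name, m), peer_ids) for name in file_names}


# Ring with m = 3 and peers 0, 1, 3
m = 3
peer_ids = [0, 1, 3]
responsible = [successor(key, peer_ids) for key in range(2 ** m)]
placement = place_files(["a.txt", "b.txt"], peer_ids, m)
```

Here responsible lists, for keys 0 to 7, the peers 0, 1, 3, 3, 0, 0, 0, 0: keys 4 to 7 wrap around to peer 0 because no peer has an ID of 4 or more.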

Chord: solution (1/2), dividing the search space into m intervals
[Figure: peer ID space with m = 8 bits (IDs 0 to 255); the intervals of peer 99 start at 100, 101, 103, 107, ..., and the m-th interval starts at 99 + 2^7 = 227 and contains 2^(8-1) = 128 IDs.]
To decide the routing table of peer i (e.g., peer 99), divide the search space into m (= 8) intervals with 2^0, 2^1, 2^2, 2^3, ..., 2^(m-1) entries. The peer with ID i = 99 has a routing table with m = 8 entries:

Interval      Next peer to ask
[100, 101)    peer 100 (or the succeeding peer if peer 100 does not exist)
[101, 103)    peer 101 (or the succeeding peer)
[103, 107)    peer 103 (or the succeeding peer)
...
[227, 99)     peer 227 (or the succeeding peer; peer 228 in the example below)

Note: for peer i, the 1st interval starts at i + 1, and the m-th interval starts at (i + 2^(m-1)) mod 2^m.

Chord: solution (2/2), step-by-step narrowing of the search space
(1) When peer 99 receives a query with key 233, it identifies the interval of its lookup table that contains the key ([227, 99)) and forwards the query to peer 228.
(2) When peer 228 receives the query, it identifies the interval containing the key in its own lookup table ([232, 236)) and forwards the query to peer 232.
(3) Peer 232 receives the query, identifies the interval containing the key ([233, 234), whose next peer is 233), and forwards the query to peer 233.

Worst case
[Figure: with N = 32 peers, a query whose key lies in the last interval passes through the 1st, 2nd, 3rd, 4th, and 5th peer before arriving.]
The number of hops is O(log N), which equals O(m) when all 2^m IDs are occupied; for N = 32, log2 32 = 5.

Chord: example (1/3)
M files are distributed across N peers, and a file is searched for by its key.
(1) Assign a hash value (key) to each file. The key has m bits and is computed by the hash function SHA-1 (2^m is much larger than both N and M).
(2) Construct a virtual ring of peers (m = 128 or 160). The ring consists of IDs 0 to 2^m - 1, and every peer is associated with an ID on the ring.
(3) Decide the responsible peer for each file. A file with key k is retained by the peer with the smallest ID such that ID ≥ k (wrapping around from 2^m - 1 to 0). This peer is denoted successor(k): the ID of the first peer found starting from k.
[Figure: example ring with m = 3.]
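
The interval starts in the note above and the three hops of the example can be checked by hand with m = 8 (2^m = 256), assuming, as the example implies, that peer 227 does not exist while peers 228, 232, and 233 do:

```latex
\begin{aligned}
\text{start}_k(99) &= (99 + 2^{k-1}) \bmod 256 = 100,\ 101,\ 103,\ 107,\ 115,\ 131,\ 163,\ 227 \quad (k = 1,\dots,8)\\
\textstyle\sum_{k=1}^{8} 2^{k-1} &= 2^8 - 1 = 255 \quad \text{(every ID except 99 is covered exactly once)}\\
233 &\in [227,\ 99) \;\Rightarrow\; \text{forward to } \operatorname{successor}(227) = 228\\
\text{start}_k(228) &= (228 + 2^{k-1}) \bmod 256 = 229,\ 230,\ 232,\ 236,\ 244,\ 4,\ 36,\ 100\\
233 &\in [232,\ 236) \;\Rightarrow\; \text{forward to } \operatorname{successor}(232) = 232\\
\text{start}_1(232) &= 233, \quad 233 \in [233,\ 234) \;\Rightarrow\; \text{forward to } \operatorname{successor}(233) = 233
\end{aligned}
```

Each hop at least halves the remaining clockwise distance to the key, which is where the bound of m hops comes from.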

Chord: example (2/3)
(4) Construct a routing table, called the finger table, for each peer.
- The finger table has m entries (m equals log2 N only when all 2^m IDs are occupied by peers).
- The key space with 2^m items is divided into m intervals with 2^0, 2^1, ..., 2^(m-1) items.
- Each entry specifies, for one interval, the ID of the next peer to search (succ.).
Finger table of peer 0 (m = 3, peers 0, 1, 3):
1st int.: [1, 2), next is peer 1
2nd int.: [2, 4), next is peer 3
3rd int.: [4, 0), next is peer 0
The k-th interval of peer n (1 ≤ k ≤ m) is computed as:
start_k = (n + 2^(k-1)) mod 2^m
int_k = [start_k, start_(k+1))
succ_k = the first peer with ID ≥ start_k, i.e., successor(start_k)

Chord: example (3/3)
What happens when peer 3 receives a query with key 1?
- Peer 3 does NOT have file 1, so it forwards the query to peer 0 based on its finger table (key 1 lies in its 3rd interval [7, 3), whose succ. is peer 0).
- Peer 0 does NOT have file 1, so it forwards the query to peer 1 (key 1 lies in its 1st interval [1, 2)).
- The query finally reaches peer 1, which has file 1.
How many hops does a query take? A query is forwarded at most m times (see the worst-case example above).

Question 2-2
Suppose Chord is used, and answer the following questions (write your answers on the answer sheet).
Set of peer IDs: {2, 4, 6, 7, 9, 12, 15}
Set of file keys: {0, 2, 5, 8, 10, 13, 14, 15}
(1) In which peer is each file stored?
(2) Complete the finger table of peer 4.
(3) How is a query for file 0 routed when it starts from peer 4?
(4) What happens when peer 4 leaves the network? After that, what happens when a new peer with ID 5 joins?

Answer sheet for Question 2-2
[Figure: ring of IDs 0 to 15; for each peer, a "Files to retain =" field and a finger table with columns start, int., succ.]
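
The finger-table formulas and the forwarding rule of these slides can be simulated in a few lines. This is a sketch under simplifying assumptions: one process holds the global peer list, which it uses both to compute succ_k and to decide when the responsible peer has been reached, whereas a real Chord peer knows only its own finger table and its immediate neighbors. The rule "forward to the succ. of the interval containing the key" follows the slides; the original Chord pseudocode instead forwards to the closest preceding finger. All function names are hypothetical.

```python
import bisect


def successor(key, peers):
    """First peer ID >= key on the ring, wrapping to the smallest ID."""
    ids = sorted(peers)
    i = bisect.bisect_left(ids, key)
    return ids[i] if i < len(ids) else ids[0]


def in_interval(x, start, end, m):
    """True if x lies in the half-open ring interval [start, end) modulo 2**m."""
    size = 2 ** m
    return (x - start) % size < (end - start) % size


def finger_table(n, peers, m):
    """Entries (start_k, end_k, succ_k) for k = 1..m, as defined above."""
    size = 2 ** m
    table = []
    for k in range(1, m + 1):
        start = (n + 2 ** (k - 1)) % size
        end = (n + 2 ** k) % size  # = start_(k+1); wraps back to n when k = m
        table.append((start, end, successor(start, peers)))
    return table


def lookup(origin, key, peers, m):
    """Forward a query hop by hop from origin; return the list of peers visited.

    Simulation shortcut: the global peer list tells us when the current peer is
    successor(key), i.e., the peer that retains the file.
    """
    path = [origin]
    node = origin
    while successor(key, peers) != node:
        # The m intervals cover every ID except node itself, so one always matches
        for start, end, succ in finger_table(node, peers, m):
            if in_interval(key, start, end, m):
                node = succ
                break
        path.append(node)
    return path
```

For example, finger_table(0, [0, 1, 3], 3) gives [(1, 2, 1), (2, 4, 3), (4, 0, 0)], matching the table of peer 0, and lookup(3, 1, [0, 1, 3], 3) returns [3, 0, 1], the path in example (3/3). With a hypothetical peer set [99, 100, 101, 103, 228, 232, 233] and m = 8, lookup(99, 233, ...) returns [99, 228, 232, 233].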

Unstructured P2P architectures
Characteristics of unstructured P2P:
- There are no constraints on the topology (the connections between peers), so search is flexible.
- Each peer has a list of c peers selected at random (a partial view).
- The list is periodically exchanged between neighboring peers.
- All peers together form a random graph.

Unstructured P2P: example (Gnutella)
[Figure: peers A to F; Peer A (requesting file X) floods its query through its neighbors in communication order, and Peer E (storing file X) returns a hit.]
A file is searched for by flooding a request, without an index server.
Advantages: fault tolerance, privacy preservation.
Disadvantage: flooding consumes a large amount of network bandwidth.
To join the network, a new peer must obtain the address of one of the existing peers. Each peer can keep connections with up to four peers.

Managing topology in unstructured P2P
Disadvantage of unstructured P2P: search is done by flooding queries, so efficiency is poor.
Improvement: manage the topology in two layers.
- Structured topology (upper layer): a protocol maintains an optimal topology for a given criterion; links to neighbor peers are selected, by that criterion (e.g., having common data, being geographically close), from peers selected at random by the lower layer.
- Random topology (lower layer): a protocol maintains a random graph; links to neighbor peers are selected at random.

Hierarchical P2P
Unstructured P2P does not scale as the network grows, because flooding a request overloads the entire network. A broker that collects the resource usage of peers in each other's proximity makes it possible to quickly select a peer with sufficient resources.
Superpeers: peers that maintain an index for all peers in their group and act as brokers.
[Figure: regular peers each attached to a superpeer; the superpeers are connected to each other in a superpeer network.]
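
Flooding in an unstructured overlay can be sketched as a breadth-first search limited by a hop count (TTL), which is how Gnutella-style networks bound a flood. The overlay below is a hypothetical graph of peers A to F, loosely following the figure, with Peer E storing file X. Duplicate queries are dropped, but each forwarded copy is still counted as a message, which is the bandwidth cost noted above. The function name flood is hypothetical.

```python
from collections import deque


def flood(graph, origin, holders, ttl):
    """Flood a query from origin over an unstructured overlay.

    graph:   dict peer -> list of neighbor peers (overlay links)
    holders: set of peers storing the requested file
    ttl:     maximum number of hops a query may travel
    Returns (set of peers that reported a hit, number of query messages sent).
    """
    seen = {origin}
    hits = set()
    messages = 0
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if peer in holders:
            hits.add(peer)
        if remaining == 0:
            continue  # TTL exhausted: do not forward further
        for neighbor in graph[peer]:
            messages += 1  # every forwarded copy costs bandwidth
            if neighbor not in seen:  # a peer drops queries it has already seen
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


# Hypothetical overlay loosely following the figure (Peer E stores file X)
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "F"],
    "D": ["B", "C", "E"],
    "E": ["D"],
    "F": ["C"],
}
found, cost = flood(graph, "A", {"E"}, ttl=3)
```

With ttl=3 the query reaches Peer E after 11 messages; with ttl=2 it sends 7 messages and finds nothing. This tradeoff between search scope and bandwidth is what motivates superpeers.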

Summary
Architectures for distributed systems
- Architectural styles: layered, object-based, event-based, data-centered
- System architectures:
  - Centralized architecture: the client-server model
  - Decentralized architecture: vertical/horizontal distribution of the client-server model
- Peer-to-peer systems: hybrid P2P, structured P2P, unstructured P2P, hierarchical P2P