Information Systems Engineering Distributed and Mobile Information Systems

Serge Shumilov
Department of Computer Science III, University of Bonn, Germany
shumilov@cs.uni-bonn.de

Outline
1. What is a Distributed Information System
2. Examples of Distributed Information Systems
3. Common Characteristics
4. Basic Design Issues
5. Basic Technologies
6. Preview: MA-INF 3204 "Distributed and Mobile Information Systems"
Information Systems Engineering, 2009

1. What is a Distributed Information System?
Definition: A distributed information system is one in which multiple components located at networked computers communicate and coordinate their actions only by passing messages.
This definition leads to the following characteristics of distributed systems:
- concurrency of components
- lack of a global clock
- independent failures of components

1.1 Centralized vs. Distributed System Characteristics
Centralized System | Distributed System
One component with non-autonomous parts | Multiple autonomous components
Component shared by users all the time | Components are not shared by all users
All resources accessible | Resources may not be accessible
Software runs in a single process | Software runs in concurrent processes on different processors
Single point of control | Multiple points of control
Single point of failure | Multiple points of failure

1.2 Distributed System Types
[Figure: distributed system types classified along three axes]
- Control: fully distributed; autonomous, fully cooperative; autonomous, transaction-based; master-slave
- Data: fully replicated; not fully replicated, master directory; local data, local directory
- Processors: homogeneous special purpose; heterogeneous special purpose; homogeneous general purpose; heterogeneous general purpose

2. Examples of Distributed Systems
- Local Area Network and Intranet
- Database Management System
- Embedded Systems / Automatic Teller Machine Network
- Internet / World Wide Web
- Mobile and Ubiquitous Computing

2.1 Local Area Network (LAN)
[Figure: an intranet made of local area networks with desktop computers, email servers, a Web server, a file server, and print and other servers, connected through a router/firewall to the rest of the Internet]

2.2 Database Management System (DBMS)

2.3 Automatic Teller Machine Network

2.4 Internet
[Figure: a portion of the Internet: intranets and ISPs connected by a backbone and a satellite link; the legend distinguishes desktop computers, servers, and network links]

2.4.1 World Wide Web (WWW)

2.4.2 Web Servers and Web Browsers
[Figure: Web browsers reach Web servers over the Internet]
- Web servers: www.google.com, www.cdk3.net, www.w3c.org
- Browser requests: http://www.google.com/search?q=shumilov, http://www.iai.uni-bonn.de/, http://www.w3c.org/protocols/activity.html
- The last request is resolved in the file system of www.w3c.org: directory Protocols, file Activity.html

2.5 Mobile and Ubiquitous Computing
[Figure: portable devices (laptop, mobile phone, camera, printer) at a host site, connected through a wireless LAN in the host intranet and a GSM/GPRS gateway to the Internet and to the home intranet]

3. Common Characteristics
Certain common characteristics can be used to assess distributed systems:
- What are we trying to achieve when we construct a distributed system?
- Which problems will we encounter?
These characteristics represent properties that distributed systems should have. They can be seen as advantages and disadvantages of a distributed system compared to a centralized one. Such a classification is not simple: many characteristics can play both roles, as an advantage and as a disadvantage.
- Distribution and Mobility
- Openness
- Concurrency
- Scalability
- Fault tolerance
- Heterogeneity
- Security
- Complexity
- Transparency

3.1 Distribution and Mobility
According to the given definition, the system consists of several distributed components.
- Users can be geographically separate. This is important for large corporations, where business decisions must be made by people in different locations, but those decisions must be based on company-wide data.
- Resource sharing: sharing of data, hardware and software resources.
- Mobility of users and resources: a user or resource can change its location. For example, mobile code is code that can be sent from one computer to another and run at the destination (e.g. Java applets and the Java virtual machine).

3.2 Heterogeneity
Variety and differences in:
- Networks
- Computer hardware
- Operating systems
- Programming languages
- Implementations by different developers
Heterogeneous systems can use the best tools for each task. Different components of an application can run on hardware that is optimized for a specific task.
For example, an application might need to retrieve large amounts of statistical or experimental data from a database, perform complex computations on that data (such as computing a weather model), and display the results of that computation in the form of maps. By running the database, the computational engine, and the graphics rendering engine on hardware that is optimized for each task, performance can improve dramatically.

3.2.1 Heterogeneous Applications and Methods
[Figure: tool 1, tool 2 and tool 3 isolated from each other]
Tool isolation:
- closed applications
- proprietary data exchange formats
- obscure data semantics
- no reuse due to poor documentation

3.2.2 Heterogeneous Data Modeling
[Figure: data source 1, data source 2 and data source 3 isolated from each other]
Data isolation:
- data sets built independently according to local needs
- many complex data types
- different standards, formats, data models and representations
- different storage systems
Sample records from one data source:
2716BD 561000.10/qhKD/T;fs2,h5,e2,hw/br/dbngr/wf2$
2716BD 561010.35/qhKD/T;fs2,e e4,hw/br/gngr/wf4,ks3 ks4$
2716BD 561020.51/qhKD/T;fs2,e,hw2/br/blgngr/ks4,wf5$
2716BD 561030.61/qhKD/T;fs2,e2 e,hw2/br/gn=blgr/ks4,wf5$
2716BD 561041.25/qhKD/T;fs2 fs,pf2,k/wa/dgr/ks4,wf5$
2716BD 561051.56/qhKD/T;fs2,k2 k/wa/bn=dgr$
2716BD 561062.20/qhO/H;zg4/Hnb/dbn$
2716BD 561073.90/qhO/H;zg4/Hnb/ro=bn//,,(Z3)$
2716BD 561084.00/qp/mS;h2/a/hbngr/et$

3.3 Openness
- Use of equipment and software from different vendors.
- Detailed interfaces of components need to be published.
- New components have to be integrated with existing components.
- Differences in data representation of interface types on different processors (of different vendors) have to be resolved.
- Easier extensions and improvements of distributed systems.

3.4 Scalability
Adaptation of distributed systems to:
- accommodate more users
- increase performance by adding new resources (this is the hard one)
Usually done by adding more and/or faster processors.
Components should not need to be changed when the scale of a system increases. The design of components should be scalable!
Where is my data now?

3.5 Concurrency
- Concurrent processing to enhance performance.
- Components in distributed systems can be executed in concurrent processes.
- Components can access and use shared resources (e.g. variables, databases, device drivers).
- Integrity of the system may be violated if concurrent updates are not coordinated:
  - lost updates
  - inconsistent analysis
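The lost-update problem can be illustrated with a small sketch. The account, the amounts and all names below are hypothetical; the first function replays one uncoordinated interleaving step by step, the second coordinates the same two updates with a lock.

```python
import threading


def lost_update_interleaving() -> int:
    """Replay, step by step, an uncoordinated interleaving of two updates."""
    balance = 100
    read_a = balance        # component A reads 100
    read_b = balance        # component B reads 100 before A writes
    balance = read_a + 10   # A writes 110
    balance = read_b + 20   # B writes 120 and overwrites A's update
    return balance


class Account:
    """Shared resource whose updates are coordinated by a lock."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount: int) -> None:
        # Holding the lock makes the read-modify-write step atomic.
        with self._lock:
            self.balance = self.balance + amount


def coordinated_updates() -> int:
    """Run the same two deposits concurrently, but under the lock."""
    account = Account(100)
    threads = [threading.Thread(target=account.deposit, args=(amount,))
               for amount in (10, 20)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance
```

In the uncoordinated interleaving the deposit of 10 disappears; with the lock both deposits are applied regardless of the order in which the threads run.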

3.6 Failure Handling (Fault Tolerance)
The ability to continue in operation after a fault has occurred.
- Hardware, software and networks can fail!
- Distributed systems (properly constructed) can operate even at low levels of hardware/software/network reliability.
- Fault tolerance is achieved by:
  - recovery
  - redundancy

3.7 Security
More fragile and susceptible to external attack.
In a distributed system, components communicate by sending messages through a network:
- Doctors request records from hospitals
- Users purchase products through electronic commerce systems
Security is required for:
- Concealing the contents of messages: security and privacy
- Identifying a remote user or other agent correctly: authentication
New challenges:
- Denial-of-service attacks
- Security of mobile code
[Figure: process p, acting for principal A, and process q, acting for principal B, communicate over a secure channel]

3.8 Complexity
Typically, distributed systems are more complex than centralised systems.
- Difficult manageability: more effort is required for system management.
- Unpredictability: unpredictable responses depending on the system organisation and network load, e.g. real-time ordering of events.
[Figure: real-time ordering of events: processes X, Y, Z and A send and receive messages m1, m2 and m3 at times t1, t2 and t3 along a physical time axis]

3.9 Transparency
Definition: For distributed systems, transparency means that a distributed system should hide its distributed nature from its users, appearing and functioning as a normal centralized system.
Transparency is a cross-cutting characteristic that runs through the other properties.
Formal definitions of most of these properties can be found in the Reference Model of Open Distributed Processing (RM-ODP, ISO/IEC 10746).
http://en.wikipedia.org/wiki/RM-ODP

3.9.1 Transparencies
- Access transparency: enables local and remote resources to be accessed using identical operations. Examples: file system operations in NFS, navigation in the Web, SQL queries.
- Location transparency: enables resources to be accessed without knowledge of their physical or network location (for example, which building or IP address). Examples: file system operations in NFS, pages in the Web, tables in distributed databases.
- Concurrency transparency: enables several processes to operate concurrently using shared resources without interference between them. Examples: NFS, automatic teller machine network, distributed DBMS.
- Replication transparency: enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or programmers. Examples: distributed DBMS, mirroring of Web pages.

3.9.2 Transparencies
- Failure transparency: enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components. Examples: distributed DBMS, embedded systems (satellites).
- Mobility transparency: allows the movement of resources and clients within a system without affecting the operation of users or programs. Examples: NFS, Web pages.
- Performance transparency: allows the system to be reconfigured to improve performance as loads vary. Examples: high-performance computing grids.
- Scaling transparency: allows the system and applications to expand in scale without change to the system structure or the application algorithms. Examples: World Wide Web, distributed DBMS.

4. Basic Design Issues
Definition: a model is a formalized interpretation which deals with empirical entities, phenomena, and physical processes in a mathematical or logical way.
Basically, a model is a simplified abstract view of a complex system. It may focus on particular views, enforcing the "divide and conquer" principle for a compound problem.
Specific fundamental models for distributed information systems:
- Architecture model
- Communication model
- Concurrency model
- Fault model
- Security model

4.1 Architecture model
Examples:
- The architecture of a building defines the components, materials, structure and/or behavior of the building
- Hardware architecture of a hardware device
- Software architecture of a software system
An architecture model provides a consistent, application-appropriate and possibly aesthetically pleasing specification of the structure, the individual components and their connections.
- Layers and tiers
- Bottom-up design / Top-down design
- One-tier / Two-tier (client/server) / Three-tier (middleware)
- N-tier architectures
- Clusters and tier distribution

4.1.1 Layers and tiers
[Figure: clients on top of three layers: presentation layer, application logic (business rules, business objects, business processes), and resource manager (database, persistent storage)]
- A client is any user or program that wants to perform an operation over the system. Clients interact with the system through a presentation layer.
- The application logic determines what the system actually does. It takes care of enforcing the business rules and establishing the business processes. The application logic can take many forms: programs, constraints, business processes, etc.
- The resource manager deals with the organization (storage, indexing, and retrieval) of the data necessary to support the application logic. This is typically a database but it can also be a text retrieval system or any other data management system providing querying capabilities and persistence.
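The three layers can be sketched as three small classes. This is only an illustration under assumed names (a hypothetical account example with an in-memory dictionary standing in for the database), not a prescribed design.

```python
class ResourceManager:
    """Resource management layer: storage and retrieval of data."""

    def __init__(self) -> None:
        self._balances = {}  # in-memory stand-in for a database

    def store(self, owner: str, balance: int) -> None:
        self._balances[owner] = balance

    def load(self, owner: str) -> int:
        return self._balances[owner]


class ApplicationLogic:
    """Application logic layer: enforces the business rules."""

    def __init__(self, resources: ResourceManager) -> None:
        self._resources = resources

    def open_account(self, owner: str, deposit: int) -> None:
        if deposit < 0:  # hypothetical business rule
            raise ValueError("initial deposit must not be negative")
        self._resources.store(owner, deposit)

    def balance(self, owner: str) -> int:
        return self._resources.load(owner)


class PresentationLayer:
    """Presentation layer: the only part a client interacts with."""

    def __init__(self, logic: ApplicationLogic) -> None:
        self._logic = logic

    def show_balance(self, owner: str) -> str:
        return f"{owner}: {self._logic.balance(owner)} EUR"


logic = ApplicationLogic(ResourceManager())
logic.open_account("alice", 50)
screen = PresentationLayer(logic).show_balance("alice")
```

Each layer only talks to the layer directly below it, which is what later makes it possible to place the layers on different tiers.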

4.1.2 A game of boxes and arrows
There is no problem in system design that cannot be solved by adding a level of indirection. There is no performance problem that cannot be solved by removing a level of indirection.
- Each box represents a part of the system.
- Each arrow represents a connection between two parts of the system.
- The more boxes, the more modular the system: more opportunities for distribution and parallelism. This allows encapsulation, component-based design, and reuse.
- The more boxes, the more arrows: more sessions (connections) need to be maintained and more coordination is necessary. The system becomes more complex to monitor and manage.
- The more boxes, the greater the number of context switches and intermediate steps to go through before one gets to the data. Performance suffers considerably.
- System designers try to balance the flexibility of modular design with the performance demands of real applications. Once a layer is established, it tends to migrate down and merge with lower layers.

4.1.3 Top-down design
- The functionality of a system is divided among several modules.
- Modules cannot act as separate components; their functionality depends on the functionality of other modules.
- Hardware is typically homogeneous and the system is designed to be distributed from the beginning.
[Figure: top-down design and the resulting top-down architecture: presentation layers PL A, PL B and PL C over application logic AL A to AL D and resource managers RM 1 and RM 2]

4.1.4 Top-down design
Top-down design:
1. define access channels and client platforms
2. define presentation formats and protocols for the selected clients and protocols
3. define the functionality necessary to deliver the contents and formats needed at the presentation layer
4. define the data sources and data organization needed to implement the application logic
[Figure: the steps proceed downwards from the client through the presentation layer, the application logic layer and the resource management layer of the information system]

4.1.5 Bottom-up design
[Figure: a new application built on top of legacy systems, next to a legacy application]
- In a bottom-up design, many of the basic components already exist. These are stand-alone systems which need to be integrated into new systems.
- The components do not necessarily cease to work as stand-alone components. Often old applications continue running at the same time as new applications.
- This approach is widely applicable because the underlying systems already exist and cannot be easily replaced.
- Much of the work and products in this area are related to middleware, the intermediate layer used to provide a common interface, bridge heterogeneity, and cope with distribution.

4.1.6 Bottom-up design
[Figure: bottom-up design and the resulting bottom-up architecture: presentation layers PL A, PL B and PL C over application logic AL A to AL D, which reaches the legacy applications and legacy systems through wrappers]

4.1.7 Bottom-up design
Bottom-up design:
1. define access channels and client platforms
2. examine existing resources and the functionality they offer
3. wrap existing resources and integrate their functionality into a consistent interface
4. adapt the output of the application logic so that it can be used with the required access channels and client protocols
[Figure: the steps proceed upwards from the resource management layer through the application logic layer and the presentation layer to the client]
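Step 3, wrapping existing resources behind a consistent interface, might look like the following sketch. Both legacy systems, their methods and their data formats are invented for illustration.

```python
class LegacyTextSystem:
    """Hypothetical legacy system returning semicolon-separated text."""

    def lookup(self, key: str) -> str:
        return f"{key};Ada Lovelace"


class LegacyRecordSystem:
    """Hypothetical legacy system returning upper-case records."""

    def fetch_record(self, number: int) -> dict:
        return {"NO": number, "NAME": "GRACE HOPPER"}


class TextSystemWrapper:
    """Adapts LegacyTextSystem to the common find_customer interface."""

    def __init__(self, legacy: LegacyTextSystem) -> None:
        self._legacy = legacy

    def find_customer(self, customer_id: str) -> dict:
        ident, name = self._legacy.lookup(customer_id).split(";")
        return {"id": ident, "name": name}


class RecordSystemWrapper:
    """Adapts LegacyRecordSystem to the common find_customer interface."""

    def __init__(self, legacy: LegacyRecordSystem) -> None:
        self._legacy = legacy

    def find_customer(self, customer_id: str) -> dict:
        record = self._legacy.fetch_record(int(customer_id))
        return {"id": str(record["NO"]), "name": record["NAME"].title()}


# The application logic sees one interface, whatever lies underneath.
sources = [TextSystemWrapper(LegacyTextSystem()),
           RecordSystemWrapper(LegacyRecordSystem())]
customers = [source.find_customer("7") for source in sources]
```

The legacy systems are left untouched and can keep serving their old applications; only the wrappers know about their native formats.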

4.1.8 One-tier: fully centralized
[Figure: 1-tier architecture: all layers reside in the server]
- The presentation layer, application logic and resource manager are built as a monolithic entity.
- Users/programs access the system through display terminals, but what is displayed and how it appears is controlled by the server. (These are dumb terminals.)
- This was the typical architecture of mainframes, offering several advantages:
  - no forced context switches in the control flow (everything happens within the system),
  - everything is centralized, so managing and controlling resources is easier,
  - the design can be highly optimized by blurring the separation between layers.

4.1.9 Two-tier: client/server
[Figure: 2-tier architecture: presentation layer at the client, application logic and resource manager in the server]
As computers became more powerful, it was possible to move the presentation layer to the client. This has several advantages:
- Clients are independent of each other: one could have several presentation layers depending on what each client wants to do.
- One can take advantage of the computing power at the client machine to have more sophisticated presentation layers. This also saves computer resources at the server machine.
- It introduces the concept of API (Application Program Interface), an interface to invoke the system from the outside. It also allows designers to think about federating the systems into a single system.
- The resource manager only sees one client: the application logic. This greatly helps with performance since there are no client connections/sessions to maintain.

4.1.10 API in client/server
- Client/server systems introduced the notion of service (the client invokes a service implemented by the server).
- Together with the notion of service, client/server introduced the notion of service interface (how the client can invoke a given service).
- Taken all together, the interfaces to all the services provided by a server (whether they are application- or system-specific) define the server's Application Program Interface (API), which describes how to interact with the server from the outside.
- Many standardization efforts were triggered by the need to agree on common APIs for each type of server.
[Figure: a server offering several services, each with its own service interface; together the service interfaces form the server's API, on top of the resource management layer]

4.1.11 Technical aspects of the 2-tier architecture
There are clear technical advantages when going from one-tier to two-tier architectures:
- take advantage of client capacity to off-load work to the clients
- work within the server takes place within one scope (almost as in 1-tier)
- the server design is still tightly coupled and can be optimized by ignoring presentation issues
- still relatively easy to manage and control from a software engineering point of view
However, two-tier systems have disadvantages:
- The server has to deal with all possible client connections. The maximum number of clients is given by the number of connections supported by the server.
- Clients are tied to the system since there is no standard presentation layer. If one wants to connect to two systems, then the client needs two presentation layers.
- There is no failure or load encapsulation. If the server fails, nobody can work. Similarly, the load created by a client will directly affect the work of others since they are all competing for the same resources.

4.1.12 The main limitation of client/server
[Figure: one client connected to Server A and Server B]
- The responsibility of dealing with heterogeneous systems is shifted to the client.
- The client becomes responsible for knowing where things are, how to get to them, and how to ensure consistency.
- If clients want to access two or more servers, a 2-tier architecture causes several problems:
  - the underlying systems don't know about each other
  - there is no common business logic
  - the client is the point of integration (increasingly fat clients)
- This is tremendously inefficient from all points of view (software design, portability, code reuse, performance since the client capacity is limited, etc.).
- There is very little that can be done to solve these problems while staying within the 2-tier model.

4.1.13 Three-tier: middleware
[Figure: 3-tier architecture]
- In a 3-tier system, the three layers are fully separated.
- The layers are also typically distributed, taking advantage of the complete modularity of the design (in two-tier systems, the server is typically centralized).
- A middleware-based system is a 3-tier architecture. This is a bit oversimplified but conceptually correct since the underlying systems can be treated as black boxes. In fact, 3-tier only makes sense in the context of middleware systems (otherwise the client has the same problems as in a 2-tier system).

4.1.14 Middleware
[Figure: clients access the middleware (global application logic), which sits on top of Server A and Server B, each with its local application logic and local resource managers]
Middleware is just a level of indirection between clients and other layers of the system. It introduces an additional layer of business logic encompassing all underlying systems. By doing this, a middleware system:
- simplifies the design of the clients by reducing the number of interfaces,
- provides transparent access to the underlying systems,
- acts as the platform for inter-system functionality and high-level application logic, and
- takes care of locating resources, accessing them, and gathering results.
But a middleware system is just a system like any other! It can also be 1-tier, 2-tier, 3-tier...
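The role of middleware as a level of indirection can be sketched as follows. The registry, the two servers and the service names are assumptions made for this example; real middleware platforms offer far more than this.

```python
class ServerA:
    """Hypothetical underlying system that knows about orders."""

    def orders(self, customer: str) -> list:
        return [f"{customer}-order-1"]


class ServerB:
    """Hypothetical underlying system that knows about invoices."""

    def invoices(self, customer: str) -> list:
        return [f"{customer}-invoice-1"]


class Middleware:
    """Single entry point that locates services on the underlying servers."""

    def __init__(self) -> None:
        self._registry = {}  # service name -> callable on some server

    def register(self, name: str, handler) -> None:
        self._registry[name] = handler

    def call(self, name: str, *args):
        # Locating the resource is the middleware's job, not the client's.
        return self._registry[name](*args)

    def customer_overview(self, customer: str) -> dict:
        # Global application logic: gather results from several systems.
        return {
            "orders": self.call("orders", customer),
            "invoices": self.call("invoices", customer),
        }


middleware = Middleware()
middleware.register("orders", ServerA().orders)
middleware.register("invoices", ServerB().invoices)
overview = middleware.customer_overview("acme")
```

The client needs a single interface and no knowledge of which server holds which data; the price is one more hop on every call.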

4.1.15 Technical aspects of middleware
The introduction of a middleware layer helps in that:
- the number of necessary interfaces is greatly reduced:
  - clients see only one system (the middleware),
  - local applications see only one system (the middleware),
- it centralizes control (middleware systems themselves are usually 2-tier),
- it makes necessary functionality widely available to all clients,
- it allows functionality to be implemented that would otherwise be very difficult to provide, and
- it is a first step towards dealing with application heterogeneity (some forms of it).
The middleware layer does not help in that:
- it is another level of indirection,
- it is complex software,
- it is a development platform, not a complete system.

4.1.16 A three-tier middleware-based system...
[Figure: external and internal clients access a middleware system containing connecting logic, control, user logic and middleware wrappers; the wrappers connect to the underlying 2-tier systems and resource managers]

4.1.17 N-tier: connecting to the Web
[Figure: N-tier architecture: a client with a Web browser reaches a Web server and HTML filter (presentation layer), followed by the middleware with the application logic layer and the resource management layer of the information system]
- N-tier architectures result from connecting several three-tier systems to each other and/or from adding an additional layer to allow clients to access the system through a Web server.
- The Web layer was initially external to the system (a true additional layer); today, it is slowly being incorporated into a presentation layer that resides on the server side (part of the middleware infrastructure in a three-tier system, or part of the server directly in a two-tier system).
- The addition of the Web layer led to the notion of application servers, a term used to refer to middleware platforms supporting access through the Web.

4.1.18 N-tier systems in reality
[Figure: requests from the Internet pass a firewall and reach a Web server cluster; LANs and gateways connect the cluster, internal clients, middleware with application logic, the resource management layer (database server, file server, application), and further middleware with wrappers and gateways to additional resource management layers]

4.2 Communication Model
Architecture modeling must cover:
- Organizational/business analysis
- Task/workplace analysis
- Component/actor/agent analysis (both human and system)
Usually, several components/actors cooperate in a business process or task. The communication model intends to capture their interactions within a joint task.
Communication model = conceptual specification of what kind of information objects are exchanged between components cooperating in and carrying out a task, and how:
- Blocking or synchronous interactions
- Non-blocking or asynchronous interactions

4.2.1 Blocking or synchronous interaction
[Figure: the client sends a call and stays idle until the answer arrives; the server receives the call and sends the response]
- Traditionally, information systems use blocking calls (the client sends a request to a service and waits for the response of the service to come back before continuing its work).
- Synchronous interaction requires both parties to be online: the caller makes a request, the receiver gets the request, processes the request and sends a response, and the caller receives the response.
- The caller must wait until the response comes back. The receiver does not need to exist at the time of the call (TP monitors, CORBA or DCOM create an instance of the service/server/object when called if it does not exist already), but the interaction requires both client and server to be alive at the same time.
Because it synchronizes client and server, this mode of operation has several disadvantages:
- connection overhead
- higher probability of failures
- difficult to identify and react to failures
- it is a one-to-one system; it is not really practical for nested calls and complex interactions (the problems become even more acute)

4.2.2 Overhead of synchronism
- Synchronous invocations require a session to be maintained between the caller and the receiver.
- Maintaining sessions is expensive and consumes CPU resources. There is also a limit on how many sessions can be active at the same time (thus limiting the number of concurrent clients connected to a server).
- For this reason, client/server systems often resort to connection pooling to optimize resource utilization:
  - have a pool of open connections
  - associate a thread with each connection
  - allocate connections as needed
- Synchronous interaction requires a context for each call and a context management system for all incoming calls. The context needs to be passed around with each call as it identifies the session, the client, and the nature of the interaction.
[Figure: two consecutive request()/answer exchanges within one session; if the context is lost, the session needs to be restarted]
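Connection pooling can be sketched with a thread-safe queue of idle connections. The Connection class is a hypothetical stand-in for an expensive session, and the sketch leaves out the per-connection threads mentioned above.

```python
import queue
from contextlib import contextmanager


class Connection:
    """Stand-in for an expensive session with the server."""

    def __init__(self, ident: int) -> None:
        self.ident = ident

    def request(self, payload: str) -> str:
        return f"conn{self.ident}:{payload}"


class ConnectionPool:
    """Keeps a fixed number of open connections and hands them out on demand."""

    def __init__(self, size: int) -> None:
        self._idle = queue.Queue()
        for ident in range(size):
            self._idle.put(Connection(ident))

    def idle_count(self) -> int:
        return self._idle.qsize()

    @contextmanager
    def connection(self, timeout: float = 1.0):
        # Blocks while all connections are in use; raises queue.Empty
        # if none becomes free within the timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection to the pool


pool = ConnectionPool(size=2)
with pool.connection() as conn:
    reply = conn.request("ping")
    idle_during_call = pool.idle_count()
```

The pool size bounds the number of concurrent sessions the server has to maintain, independently of how many clients exist.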

4.2.3 Failures in synchronous calls
[Figure: a synchronous call with four numbered points: 1 at the client's request(), 2 at the server's receive, 3 at the server's return, 4 at the client's "do with answer"]
If the client or the server fails, the context is lost and resynchronization might be difficult.
- If the failure occurs before 1, nothing has happened.
- If the failure occurs after 1 but before 2 (receiver crashes), then the request is lost.
- If the failure occurs after 2 but before 3, side effects may cause inconsistencies.
- If the failure occurs after 3 but before 4, the response is lost but the action has been performed (do it again?).
Who is responsible for finding out what happened?
Finding out when the failure took place may not be easy. Worse still, if there is a chain of invocations (e.g., a client calls a server that calls another server), the failure can occur anywhere along the chain.
[Figure: after a timeout the client tries again, so the server may receive, process and return the same request twice]
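The "do it again?" question can be made concrete with a sketch in which the response is lost after the action has been performed. The server, the request identifiers and the duplicate filter are assumptions for illustration; the filter is one simple way to avoid repeated execution and is not taken from the slides.

```python
class LostResponse(Exception):
    """Models a failure after step 3 but before step 4."""


class Server:
    def __init__(self, drop_first_reply: bool) -> None:
        self.balance = 0
        self._drop = drop_first_reply
        self._replies = {}  # request id -> reply already computed

    def deposit(self, request_id: str, amount: int, deduplicate: bool) -> int:
        if deduplicate and request_id in self._replies:
            return self._replies[request_id]  # repeat: resend the old reply
        self.balance += amount                # the action is performed
        self._replies[request_id] = self.balance
        if self._drop:
            self._drop = False
            raise LostResponse()              # the reply never arrives
        return self.balance


def call_with_retry(server: Server, request_id: str, amount: int,
                    deduplicate: bool, attempts: int = 3) -> int:
    for _ in range(attempts):
        try:
            return server.deposit(request_id, amount, deduplicate)
        except LostResponse:
            continue  # the client cannot tell whether the action happened
    raise RuntimeError("no response from server")


naive_server = Server(drop_first_reply=True)
naive_result = call_with_retry(naive_server, "r1", 10, deduplicate=False)

safe_server = Server(drop_first_reply=True)
safe_result = call_with_retry(safe_server, "r1", 10, deduplicate=True)
```

Without the filter a blind retry deposits the amount twice; with it the retry only fetches the reply that was lost.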

4.2.4 Two solutions
Enhanced support:
- Client/server systems and middleware platforms provide a number of mechanisms to deal with the problems created by synchronous interaction:
  - Transactional interaction: to enforce exactly-once execution semantics and enable more complex interactions with some execution guarantees
  - Service replication and load balancing: to prevent the service from becoming unavailable when there is a failure (however, the recovery at the client side is still a problem of the client)
Asynchronous interaction:
- Using asynchronous interaction, the caller sends a message that gets stored somewhere until the receiver reads it and sends a response. The response is sent in a similar manner.
- Asynchronous interaction can take place in two forms:
  - non-blocking invocation (a service invocation where the call returns immediately without waiting for a response, similar to batch jobs)
  - persistent queues (the call and the response are actually persistently stored until they are accessed by the client and the server)
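A non-blocking invocation can be sketched with Python's standard concurrent.futures module. The slow service and its delay are invented for the example.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def slow_service(value: int) -> int:
    """Hypothetical remote service that takes a while to answer."""
    time.sleep(0.05)
    return value * 2


def non_blocking_call(executor: ThreadPoolExecutor, value: int) -> Future:
    # submit() returns immediately with a Future instead of the result.
    return executor.submit(slow_service, value)


with ThreadPoolExecutor(max_workers=2) as executor:
    future = non_blocking_call(executor, 21)
    local_work = sum(range(10))   # the caller keeps working meanwhile
    answer = future.result()      # block only when the answer is needed
```

Here the caller still has to be alive when the answer arrives; persistent queues remove that restriction as well.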

4.2.5 Message queuing
Reliable queuing turned out to be a very good idea and an excellent complement to synchronous interactions:
- Suitable for modular design: the code for making a request can be in a different module (even on a different machine!) than the code for dealing with the response
- It is easier to design sophisticated distribution modes (multicast, transfers, replication, coalescing messages), and it also helps to handle communication sessions in a more abstract way
- More natural way to implement complex interactions between heterogeneous systems
[Figure: in a synchronous call the invoking execution thread is blocked between request and response; with queues the invoking thread remains active, putting requests into one queue and fetching responses from another, while the invoked thread fetches requests and puts responses]
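The put/fetch pattern can be sketched with two in-memory queues. Note the assumption: queue.Queue lives in memory and is not persistent, so this shows only the interaction pattern, not the reliability of a real message queuing system.

```python
import queue
import threading

requests = queue.Queue()   # caller puts, server fetches
responses = queue.Queue()  # server puts, caller fetches


def server_loop() -> None:
    """Fetch requests until the shutdown marker (None) arrives."""
    while True:
        message = requests.get()
        if message is None:
            break
        request_id, payload = message
        responses.put((request_id, payload.upper()))


# The caller can put requests before the server is even running.
requests.put((1, "hello"))
requests.put((2, "world"))

worker = threading.Thread(target=server_loop)
worker.start()
requests.put(None)  # shutdown marker, queued behind the real requests
worker.join()

# The code that fetches responses could live in a different module.
results = [responses.get() for _ in range(2)]
```

Request identifiers travel with each message, so responses can be matched to requests without keeping a session open.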

5. Middleware
Definition: Middleware is software that provides a programming abstraction and masks the heterogeneity of the underlying networks, hardware, operating systems, and programming languages (e.g. CORBA, Web Services).
- Understanding middleware
  - Middleware as a programming abstraction
  - Middleware as infrastructure
- A quick overview of conventional middleware platforms
  - RPC
  - TP monitors
  - Object brokers
- Middleware convergence

5.1 Programming abstractions
Programming languages and almost any form of software system always evolve towards higher levels of abstraction:
- hiding hardware and platform details
- more powerful primitives and interfaces
- leaving difficult tasks to intermediaries (compilers, optimizers, automatic load balancing, automatic data partitioning and allocation, etc.)
- reducing the number of programming errors
- reducing the development and maintenance cost of the applications developed by facilitating their portability
Middleware is primarily a set of programming abstractions developed to facilitate the development of complex distributed systems:
- to understand a middleware platform one needs to understand its programming model
- from the programming model, the limitations, general performance, and applicability of a given type of middleware can be determined in a first approximation
- the underlying programming model also determines how the platform will evolve and fare when new technologies evolve

5.2 The genealogy of middleware
[Figure: layers of abstraction, from top to bottom]
- Application servers, TP monitors, object brokers, message brokers
- Transactional RPC, object-oriented RPC (RMI), asynchronous RPC: specialized forms of RPC, typically with additional functionality or properties but almost always running on RPC platforms
- Remote Procedure Call: hides communication details behind a procedure call and helps bridge heterogeneous platforms
- Sockets: operating-system-level interface to the underlying communication protocols
- TCP, UDP: the User Datagram Protocol (UDP) transports data packets without guarantees; the Transmission Control Protocol (TCP) verifies correct delivery of data streams
- Internet Protocol (IP): moves a packet of data from one node to another
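How RPC hides communication details behind a procedure call can be sketched with a client stub. Everything here is an assumption for illustration: the JSON message format, the procedure table, and the transport, which is a plain function call where a real system would use sockets.

```python
import json


def add(a: int, b: int) -> int:
    return a + b


PROCEDURES = {"add": add}  # procedures exported by the server


def server_dispatch(raw: bytes) -> bytes:
    """Server side: unmarshal the request, run the procedure, marshal the reply."""
    request = json.loads(raw.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    """Client side: turns attribute access into marshalled requests."""

    def __init__(self, transport) -> None:
        self._transport = transport  # callable: request bytes -> reply bytes

    def __getattr__(self, name: str):
        # Only called for names not found normally, i.e. remote procedures.
        def remote_call(*args):
            raw = json.dumps({"proc": name, "args": list(args)}).encode("utf-8")
            reply = self._transport(raw)
            return json.loads(reply.decode("utf-8"))["result"]
        return remote_call


stub = ClientStub(server_dispatch)
total = stub.add(2, 3)  # looks like a local call to the programmer
```

Swapping the transport for a socket connection would not change the calling code, which is the point of the abstraction.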

5.3 And the Internet? And Java?
Programming abstractions are a key part of middleware but not the only one: a programming abstraction without good supporting infrastructure (i.e., a good implementation and support system underneath) does not help.
Programming abstractions, in fact, appear in many cases in reaction to changes in the underlying hardware or the nature of the systems being developed.
Java is a programming language that abstracts the underlying hardware: programmers see only the Java Virtual Machine regardless of what computer they use
- code portability (not the same as code mobility)
- the first step towards standardizing middleware abstractions (since they can now be based on a virtual platform everybody agrees upon)
The Internet is a different type of network that requires one more specialization of existing abstractions:
- The Simple Object Access Protocol (SOAP) of Web services is RPC wrapped in XML and mapped to HTTP for easy transport through the Internet
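
As a rough illustration of "RPC wrapped in XML", the sketch below packs a procedure call into a SOAP 1.1 envelope and unpacks it again on the receiving side. The application namespace `urn:example:inventory`, the parameter name `product_id`, and the helper names are assumptions made for this example; in a real deployment the resulting bytes would typically be sent as the body of an HTTP POST, which is not shown here.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:inventory"  # hypothetical application namespace


def build_request(method, params):
    """Wrap a procedure call (name plus named arguments) in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(payload):
    """Recover the method name and arguments on the receiving side."""
    envelope = ET.fromstring(payload)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = list(body)[0]                    # the single call element in the body
    method = call.tag.split("}", 1)[1]      # strip the "{namespace}" prefix
    params = {child.tag: child.text for child in call}
    return method, params


payload = build_request("Check_inventory", {"product_id": 42})
method, params = parse_request(payload)
```

Note that the argument comes back as text: XML carries no native types, which is one reason Web services add schema descriptions on top of the plain envelope.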

5.4 Middleware as infrastructure
[Figure: RPC in DCE. In the DCE development environment, the IDL compiler turns IDL sources into a client stub, a server stub, and interface headers. The client process consists of the client code, a language-specific call interface, the client stub, the RPC API, and the RPC run-time service library; the server process mirrors this with the server code, a language-specific call interface, the server stub, the RPC API, and the RPC run-time service library. The DCE runtime environment provides the RPC protocols, security service, cell service, distributed file service, and thread service.]

5.5 Infrastructure
As the programming abstractions reach higher and higher levels, the underlying infrastructure implementing the abstractions must grow accordingly:
- Additional functionality is almost always implemented through additional software layers
- The additional software layers increase the size and complexity of the infrastructure necessary to use the new abstractions
The infrastructure is also intended to support additional functionality that makes development, maintenance, and monitoring easier and less costly:
- RPC => transactional RPC => logging, recovery, advanced transaction models, language primitives for transactional demarcation, transactional file system, etc.
- The infrastructure is also there to take care of all the non-functional properties typically ignored by data models, programming models, and programming languages: performance, availability, recovery, instrumentation, maintenance, resource management, etc.

5.6 Understanding middleware
To understand middleware, one needs to understand its dual role as programming abstraction and as infrastructure.
Programming abstraction:
- Intended to hide low-level details of hardware, networks, and distribution
- Trend is towards increasingly more powerful primitives that, without changing the basic concept of RPC, have additional properties or allow more flexibility in the use of the concept
- Evolution and appearance to the programmer are dictated by the trends in programming languages (RPC and C, CORBA and C++, RMI and Java, Web services and SOAP/XML)
Infrastructure:
- Intended to provide a comprehensive platform for developing and running complex distributed systems
- Trend is towards service-oriented architectures at a global scale and standardization of interfaces
- Another important trend is towards single-vendor software stacks to minimize complexity and streamline interaction
- Evolution is towards integration of platforms and flexibility in the configuration (plus autonomic behavior)

5.7 Basic middleware: RPC
One cannot expect the programmer to implement a complete infrastructure for every distributed application. Instead, one can use an RPC system (our first example of low-level middleware).
What does an RPC system do?
- Hides distribution behind procedure calls
- Provides an interface definition language (IDL) to describe the services
- Generates all the additional code necessary to make a procedure call remote and to deal with all the communication aspects
- Provides a binder in case it has a distributed name and directory service system
[Figure: path of a remote call. Client process: call to remote procedure, client stub procedure (bind, marshalling, send), communication module. Server process: communication module, dispatcher (select stub), server stub procedure (unmarshalling, return), remote procedure.]
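
The division of labour between client stub, dispatcher, and server stub can be sketched in a few lines. This is a minimal in-process illustration, not a real RPC system: JSON stands in for the marshalling format, a plain function call stands in for the communication modules, and the service `lookup_product` with its small catalog is a made-up example.

```python
import json


# Server side: the actual procedure (hypothetical example service).
def lookup_product(product_id):
    catalog = {1: "bolt", 2: "nut"}
    return catalog.get(product_id, "unknown")


PROCEDURES = {"lookup_product": lookup_product}  # dispatcher table


def server_dispatch(request_bytes):
    """Server stub: unmarshal the request, select the procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    procedure = PROCEDURES[request["proc"]]
    result = procedure(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    """Client stub: looks like a local object but forwards every call."""

    def __init__(self, transport):
        # transport is a callable bytes -> bytes that stands in for the network
        self._transport = transport

    def __getattr__(self, proc_name):
        def remote_call(*args):
            # Marshal the call, send it, and unmarshal the reply.
            request = json.dumps({"proc": proc_name, "args": list(args)}).encode("utf-8")
            reply = self._transport(request)
            return json.loads(reply.decode("utf-8"))["result"]
        return remote_call


stub = ClientStub(transport=server_dispatch)
name = stub.lookup_product(2)
```

From the caller's point of view `stub.lookup_product(2)` is an ordinary call; everything between the two stubs could be replaced by a socket connection without changing the client code. In a real RPC system the stubs are generated from the IDL description rather than written by hand.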

5.8 What can go wrong here?
- RPC is a point-to-point protocol in the sense that it supports the interaction between two entities (the client and the server)
- When there are more entities interacting with each other (a client with two servers, a client with a server and the server with a database), RPC treats the calls as independent of each other. However, the calls are not independent
- Recovering from partial system failures is very complex. For instance, the order was placed but the inventory was not updated, or payment was made but the order was not recorded
- Avoiding these problems using plain RPC systems is very cumbersome
[Figure: an inventory control client runs Lookup_product, Check_inventory, IF supplies_low THEN Place_order, Update_inventory... against two servers. Server 2 (products) offers New_product, Lookup_product, Delete_product, Update_product on a DBMS with the products database. Server 3 (inventory) offers Place_order, Cancel_order, Update_inventory, Check_inventory on a DBMS with the inventory and order database.]

5.9 Transactional RPC
The solution to this limitation is to make RPC calls transactional, that is, instead of providing plain RPC, the system should provide TRPC.
What is TRPC?
- same concept as RPC plus additional language constructs and run-time support (additional services) to bundle several RPC calls into an atomic unit
- usually, it also includes an interface to databases for making end-to-end transactions using the XA standard (implementing 2-Phase Commit)
- and anything else the vendor may find useful (transactional callbacks, high-level locking, etc.)
Simplifying things quite a bit, one can say that, historically, TP Monitors are RPC-based systems with transactional support. We have already seen an example of this: Encina.
[Figure: the Encina stack, with the components Encina Monitor, Structured File Service, Peer-to-Peer Comm, Reliable Queuing Service, Distributed Applications, Encina Toolkit, and OSF DCE.]
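
The idea of bundling several calls into an atomic unit can be illustrated with a stripped-down two-phase commit. The sketch below is an assumption-laden toy: the `Participant` class, its `fail_on_prepare` switch, and `run_transaction` are invented for this example, everything runs in one process, and logging, timeouts, and coordinator recovery (which real XA implementations need) are left out.

```python
class Participant:
    """A resource manager that can prepare, commit, or roll back one update."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.committed = []    # stands in for durable state
        self._pending = None   # tentative state held between the two phases

    def prepare(self, update):
        if self.fail_on_prepare:
            return False       # vote "no"
        self._pending = update
        return True            # vote "yes"

    def commit(self):
        self.committed.append(self._pending)
        self._pending = None

    def rollback(self):
        self._pending = None


def run_transaction(work):
    """Coordinator: work is a list of (participant, update) pairs."""
    prepared = []
    # Phase 1: collect votes; a single "no" aborts everyone prepared so far.
    for participant, update in work:
        if participant.prepare(update):
            prepared.append(participant)
        else:
            for p in prepared:
                p.rollback()
            return False
    # Phase 2: every participant voted "yes", so all of them commit.
    for participant, _ in work:
        participant.commit()
    return True


orders = Participant("orders")
inventory = Participant("inventory", fail_on_prepare=True)
aborted = run_transaction([(orders, "Place_order"), (inventory, "Update_inventory")])
```

Because the inventory participant votes "no", the order is not recorded either, which is exactly the partial-failure case that plain RPC leaves to the application programmer.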

5.10 TP Monitors
[Figure: the inventory control client now brackets its calls in a transaction: IF supplies_low THEN BOT Place_order Update_inventory EOT. Server 2 (products) offers New_product, Lookup_product, Delete_product, Update_product on a DBMS with the products database. Server 3 (inventory) offers Place_order, Cancel_order, Update_inventory, Check_inventory on a DBMS with the inventory and order database.]
The design cycle with a TP Monitor is very similar to that of RPC:
- define the services to implement and describe them in IDL
- specify which services are transactional
- use an IDL compiler to generate the client and server stubs
Execution requires a bit more control since now interaction is no longer point-to-point:
- transactional services maintain context information and call records in order to guarantee atomicity
- stubs also need to support more information like transaction id and call context
Complex call hierarchies are typically implemented with a TP Monitor and not with plain RPC.
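
How stubs can carry a transaction id between BOT and EOT may be sketched as follows. This is only an illustration under simplifying assumptions: the `transaction` context manager, the `transactional_stub` factory, and the in-memory `call_log` are hypothetical names, nesting and concurrency are ignored, and the stubs merely record calls instead of contacting a server.

```python
import contextlib
import itertools

_tx_ids = itertools.count(1)
_current_tx = None   # transaction id of the enclosing BOT/EOT block, if any
call_log = []        # stands in for the monitor's call records


@contextlib.contextmanager
def transaction():
    """BOT on entry, EOT on exit; calls made inside share one transaction id."""
    global _current_tx
    tx_id = next(_tx_ids)
    _current_tx = tx_id
    call_log.append(("BOT", tx_id))
    try:
        yield tx_id
        call_log.append(("EOT", tx_id))
    except Exception:
        call_log.append(("ABORT", tx_id))
        raise
    finally:
        _current_tx = None


def transactional_stub(service_name):
    """Build a stub that tags each call with the current transaction id."""
    def call(*args):
        call_log.append((service_name, _current_tx, args))
        return True
    return call


place_order = transactional_stub("Place_order")
update_inventory = transactional_stub("Update_inventory")

supplies_low = True
if supplies_low:
    with transaction() as tx:
        place_order("bolt", 100)
        update_inventory("bolt", 100)
```

The call records grouped under one transaction id are what would let a monitor decide which calls to undo if the block ends with an abort instead of EOT.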

5.11 TP Monitor Example
[Figure: a TP Monitor environment. A front end accepts requests such as "Yearly balance?" and "Monthly average revenue?" through interfaces to user-defined services. The control component (load balancing, concurrency control and recovery, replication, distribution, scheduling, priorities, monitoring) routes them to application servers 1 to 3, which run the user programs implementing the services and use a recoverable queue. Wrappers connect the application servers to the back ends: Branch 1, Branch 2, and the Finance Dept.]

5.12 TP-Heavy vs. TP-Light = 3-tier vs. 2-tier
A TP-heavy monitor provides:
- a full development environment (programming tools, services, libraries, etc.),
- additional services (persistent queues, communication tools, transactional services, priority scheduling, buffering),
- support for authentication (of users and access rights to different services),
- its own solutions for communication, replication, load balancing, storage management... (similar to an operating system).
Its main purpose is to provide an execution environment for resource managers (applications), with guaranteed reasonable performance. This is the traditional monitor: CICS, Encina, Tuxedo.
A TP-light monitor is a database extension:
- it is implemented as threads, instead of processes,
- it is based on stored procedures ("methods" stored in the database that perform a specific set of operations) and triggers,
- it does not provide a development environment.
Light monitors are appearing as databases become more sophisticated and provide more services, such as integrating part of the functionality of a TP Monitor within the database.
Instead of writing a complex query, the query is implemented as a stored procedure. A client, instead of running the query, invokes the stored procedure.
Stored procedure languages: Sybase's Transact-SQL, Oracle's PL/SQL.

5.13 Databases and the 2-tier approach
[Figure: a client and an external application access a database management system, which contains the user-defined application logic, the database developing environment, and the database resource manager.]
Databases are traditionally used to manage data. However, simply managing data is not an end in itself. One manages data because one has some concrete application logic in mind. This is often forgotten when considering databases.
But if the application logic is what matters, why not move the application logic into the database? This is what many vendors are advocating. By doing this, they propose a 2-tier model with the database providing the tools necessary to implement complex application logic.
These tools include triggers, replication, stored procedures, queuing systems, and standard access interfaces (ODBC, JDBC).
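
To make "application logic inside the database" concrete, the sketch below moves the reorder rule of the inventory example into a trigger, using the SQLite engine that ships with Python. The table layout, the threshold of 10 units, and the reorder quantity of 100 are assumptions chosen for illustration; SQLite has triggers but no stored procedures, so only the trigger half of the TP-light toolbox is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (product TEXT PRIMARY KEY, quantity INTEGER NOT NULL);
CREATE TABLE orders (product TEXT NOT NULL, quantity INTEGER NOT NULL);

-- Application logic inside the database: reorder when stock runs low.
CREATE TRIGGER reorder_when_low
AFTER UPDATE OF quantity ON inventory
WHEN NEW.quantity < 10
BEGIN
    INSERT INTO orders (product, quantity) VALUES (NEW.product, 100);
END;
""")

# The client only updates the stock level; it never places the order itself.
conn.execute("INSERT INTO inventory VALUES ('bolt', 50)")
conn.execute("UPDATE inventory SET quantity = 5 WHERE product = 'bolt'")
placed = conn.execute("SELECT product, quantity FROM orders").fetchall()
```

The client issues a single update and the database places the order as part of the same statement, so the "IF supplies_low THEN Place_order" rule no longer lives in client code.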

5.14 CORBA
The Common Object Request Broker Architecture (CORBA) is part of the Object Management Architecture (OMA) standard, a reference architecture for component-based systems.
The key parts of CORBA are:
- Object Request Broker (ORB): in charge of the interaction between components
- CORBA services: standard definitions of system services
- A standardized IDL language for the publication of interfaces
- Protocols for allowing ORBs to talk to each other
CORBA was an attempt to modernize RPC by making it object-oriented and providing a standard.
[Figure: a client (CORBA object) with its client stub (proxy) and a server (CORBA object) with its server stub (skeleton) and the CORBA Basic Object Adaptor. Both sides use the CORBA library as the interface to remote calls (marshalling, serialization) and communicate through the Object Request Broker (ORB), which also gives access to the CORBA services.]

5.15 CORBA follows the RPC model
CORBA follows the same model as RPC: they are trying to solve the same problem.
- CORBA is often implemented on top of RPC
- Unlike RPC, however, CORBA proposes a complete architecture and identifies parts of the system in much more detail than RPC ever did (RPC is an inter-process communication mechanism, CORBA is a reference architecture that includes an inter-process communication mechanism)
- CORBA standardized component-based architectures, but many of the concepts behind it were already in place long ago
Development is similar to RPC:
- define the services provided by the server using IDL (define the server object)
- compile the definition using an IDL compiler. This produces the client stub (proxy, server proxy, proxy object) and the server stub (skeleton). The method signatures (services that can be invoked) are stored in an interface repository
- program the client and link it with its stub
- program the server and link it with its stub
Unlike in RPC, the stubs make client and server independent of the operating system and programming language.

5.16 Objects everywhere: IIOP and GIOP
- In order for ORBs to be a truly universal component architecture, there has to be a way to allow ORBs to communicate with each other (one cannot have all components in the world under a single ORB)
- For this purpose, CORBA provides a General Inter-ORB Protocol (GIOP) that specifies how to forward calls from one ORB to another and get the responses back
- The Internet Inter-ORB Protocol (IIOP) specifies how GIOP messages are translated into TCP/IP
- There are additional protocols to allow ORBs to communicate with other systems
- The idea was sound but came too late and was soon superseded by Web services
[Figure: a client (CORBA object) on ORB 1 and a server (CORBA object) on ORB 2. Each ORB maps GIOP onto IIOP, and the two ORBs exchange messages over the Internet (TCP/IP).]
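
What "forwarding a call from one ORB to another" looks like on the wire can be hinted at with the fixed header that precedes every GIOP message. The sketch below covers only that 12-byte header in its GIOP 1.0 form (magic, version, byte-order flag, message type, body size); the helper names are made up for this example, and the request and reply bodies, which carry the actual call, are omitted.

```python
import struct

GIOP_MAGIC = b"GIOP"
MSG_REQUEST, MSG_REPLY = 0, 1  # the first two GIOP message type codes


def pack_header(msg_type, body_size, little_endian=False):
    """Build the fixed 12-byte header that precedes every GIOP message."""
    flags = 1 if little_endian else 0
    size_format = "<I" if little_endian else ">I"
    return (GIOP_MAGIC
            + struct.pack("BBBB", 1, 0, flags, msg_type)  # version 1.0, byte order, type
            + struct.pack(size_format, body_size))


def unpack_header(header):
    """Parse a header and return (version, message type, body size)."""
    if header[:4] != GIOP_MAGIC:
        raise ValueError("not a GIOP message")
    major, minor, flags, msg_type = struct.unpack("BBBB", header[4:8])
    size_format = "<I" if flags & 1 else ">I"  # sender announces its byte order
    (body_size,) = struct.unpack(size_format, header[8:12])
    return (major, minor), msg_type, body_size


header = pack_header(MSG_REQUEST, body_size=64)
version, msg_type, body_size = unpack_header(header)
```

Under IIOP these bytes, followed by the message body, are simply written to a TCP connection between the two ORBs. The byte-order flag lets the sender use its native byte order and leaves any conversion to the receiver.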

5.17 The best of two worlds: Object Monitors
Middleware technology should be interpreted as different stages of evolution of an ideal system. Current systems do not compete with each other per se; they complement each other. The competition arises as the underlying infrastructures converge towards a single platform:
- OBJECT REQUEST BROKERS (ORBs): reuse and distribution of components via a standard, object-oriented interface and a number of services that add semantics to the interaction between components.
- TRANSACTION PROCESSING MONITORS: an environment to develop components capable of interacting transactionally and the tools necessary to maintain transactional consistency.
And Object Transaction Monitors?
Object Monitor = ORB + TP Monitor

5.18 Conventional middleware today
- RPC and the model behind RPC are at the core of any middleware platform, even those using asynchronous interaction. RPC, however, has become part of the low-level infrastructure and is rarely used directly by application developers
- TP Monitors are still as important as they have been in the past decades, but they have become components in larger systems and are hidden behind additional layers intended for enterprise application integration and Web services. Like RPC, the functionality of TP Monitors is starting to migrate to the low levels of the infrastructure and becoming invisible to the developer
- CORBA is being replaced by other platforms, although its ideas are still being used and copied in new systems. CORBA suffered from three developments that changed the technology landscape:
  - the quick adoption of Java and the Java Virtual Machine,
  - the Internet and the emergence of the Web,
  - the rise of J2EE and related technologies to an almost de facto standard for middleware

5.19 Middleware convergence
In practice, one always needs more than one type of middleware. The question is what is offered by each product.
Existing systems implement a great deal of overlapping functionality: what in CORBA are called the services.
[Figure: overlapping functionality across platforms, with the labels RPC, runtime engine, name services, application wrappers, platform support, and repository.]
Because of this overlapping functionality, there are many possible combinations. That all these combinations are possible does not mean they all make sense.
In an integrated environment, this functionality should be incorporated not by plugging in heavy, stand-alone components but by designing a coherent system from the beginning. This is not always feasible nowadays.

5.20 Ideal System
[Figure: transaction management, object management, process management, message management, and data management, all resting on a COMMON INFRASTRUCTURE.]

5.21 Suggested References
- Distributed Systems. Concepts and Design. George Coulouris, Jean Dollimore, Tim Kindberg. Addison Wesley, 2005 (4th edition)
- Web Services. Concepts, Architectures and Applications. Gustavo Alonso, Fabio Casati, Harumi Kuno. Springer, 2003
- From P2P to Web Services and Grids: Peers in a Client/Server World. Ian J. Taylor. Springer, 2004

6. Preview: MA INF 3204 "Distributed and Mobile Information Systems"
Prerequisites (Recommended)
- Basic knowledge of and possibly some practical experience with Java programming
- Basics of network technologies and information systems
Contents (Tentative)
- Technologies
  - Short review of basic middleware technologies (Sockets, Java RMI, CORBA)
  - Web Services (SOAP, WSDL, UDDI)
  - Grid infrastructures (GRIA, Web Services Resource Framework (WSRF))
- Algorithms (on the example of Web Services)
  - Service coordination protocols (WS-Coordination, WS-Transaction)
  - Data access and manipulation protocols (OGSA-DAI)
  - Service discovery and integration (OWL-S, SAWSDL)
  - Service composition and execution (BPEL, Scufl)
  - Data management issues specific to P2P and mobile environments

6.1 What You Will Learn
- Infrastructure and basic technologies for construction of modern distributed and mobile information systems
- Problems that occur during construction of distributed and mobile systems
- Principles and techniques to solve them
- Practical applications of distributed and mobile systems

Contact information: Serge Shumilov, shumilov@cs.uni-bonn.de