Cloud Computing
Lecture 4 and 5
Grid
2012-2013

Introduction
Up until now:
- Definition of Cloud Computing.
- Grid Computing:
  - Schedulers: Condor, SGE.

Summary
- Core Grid: Globus Toolkit
- Condor-G

Grid: Conceptual Architecture
Layers, from top to bottom:
- Applications: tools and applications.
- Collective Services: discovery, negotiation, diagnostics and monitoring.
- Communication and Resource Management Protocols: secure access to resources.
- Fabric: resources (CPU, storage, networking).

What is the Globus Toolkit?
- A set of tools that solve common problems in distributed application development, especially in the context of grids:
  - Heterogeneity.
  - Complexity and security issues.
  - Lack of standardization and interoperability.
- It began in 1997 and is developed by the Globus Alliance.

Globus Toolkit: Core Grid Infrastructure
- Uses existing resources: clusters with scheduling, distributed file systems, networks, security systems.
- Layers its own services on top of them.
- Provides XML configuration.
- The goal is a generic, service-centred interaction model based on:
  - Command line utilities.
  - Web Services.

Examples of the benefits of using the Globus Toolkit in a Grid
- Replace logins on different systems with a single sign-on.
- Group system information into a single repository.
- Submit remote tasks to:
  - Remote machines.
  - CPU sharing portals.
  - Workflow engines.
- Provide high-bandwidth data transfers.
- Provide managed data transfers.

Globus Toolkit v.4 (GT4): Scope
Core features:
- Infrastructure for building new services.
- Security: applying a uniform policy across different systems.
- Execution management: manage the lifecycle of the application and its jobs/processes.
- Data management: locate, transfer and access data.
- Monitoring: monitoring dynamic grid systems.

GT4: Base
[Figure: GT4 component diagram. Base: Java, C, Python. Security: GSI-OpenSSH, MyProxy, Delegation, CAS. Execution: GridWay. Data: Data Rep, GridFTP, Replica Location, Reliable File. Monitoring: MDS4.]
Web Services with:
- WS-Resource Framework: records service call state and provides information.
- Security using WS-Security.
- Tools for compiling and starting Web Services in C, Java and Python.

WS-Resource Framework
[Figure: a Service in front of several Resources, each addressed by an EPR and exposing RPs; operations GetRP, GetMultRPs, SetRP, QueryRPs, Subscribe, SetTermTime, Destroy.]
- State representation: Resource Property (RP).
- State identification: Endpoint Reference (EPR).
- State interfaces: GetRP, QueryRPs, GetMultipleRPs, SetRP.
- Service lifecycle management: SetTerminationTime, ImmediateDestruction.
- Notification interfaces: Subscribe, Notify.
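
The WS-Resource Framework operations listed above can be illustrated with a small in-memory model. This is only a sketch of the idea (stateful resources behind a service, addressed by a reference, with properties, a lifetime and notifications). It is not the GT4 API: the class `ResourceHome`, its method names and the string used as an endpoint reference are all hypothetical, and no SOAP or XML is involved.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional


class ResourceHome:
    """Toy stand-in for a service that manages stateful resources (hypothetical)."""

    def __init__(self) -> None:
        self._resources: Dict[str, Dict[str, Any]] = {}
        self._expiry: Dict[str, Optional[float]] = {}
        self._subscribers: Dict[str, List[Callable[[str, str, Any], None]]] = {}
        self._ids = itertools.count(1)

    def create(self, **properties: Any) -> str:
        """Create a resource and return a string that plays the role of its EPR."""
        epr = f"epr-{next(self._ids)}"
        self._resources[epr] = dict(properties)
        self._expiry[epr] = None
        self._subscribers[epr] = []
        return epr

    # State interfaces (GetRP, GetMultipleRPs, SetRP, QueryRPs)
    def get_rp(self, epr: str, name: str) -> Any:
        return self._resources[epr][name]

    def get_multiple_rps(self, epr: str, names: List[str]) -> Dict[str, Any]:
        return {n: self._resources[epr][n] for n in names}

    def set_rp(self, epr: str, name: str, value: Any) -> None:
        self._resources[epr][name] = value
        for callback in self._subscribers[epr]:
            callback(epr, name, value)  # plays the role of Notify

    def query_rps(self, epr: str, predicate: Callable[[str, Any], bool]) -> Dict[str, Any]:
        return {n: v for n, v in self._resources[epr].items() if predicate(n, v)}

    # Notification interface (Subscribe)
    def subscribe(self, epr: str, callback: Callable[[str, str, Any], None]) -> None:
        self._subscribers[epr].append(callback)

    # Lifecycle management (SetTerminationTime, immediate destruction)
    def set_termination_time(self, epr: str, when: float) -> None:
        self._expiry[epr] = when

    def destroy(self, epr: str) -> None:
        del self._resources[epr]
        del self._expiry[epr]
        del self._subscribers[epr]

    def sweep(self, now: float) -> List[str]:
        """Destroy every resource whose termination time has passed."""
        expired = [e for e, t in self._expiry.items() if t is not None and t <= now]
        for epr in expired:
            self.destroy(epr)
        return expired
```

A client would call `create(status="pending")` to obtain a reference, `subscribe` to follow changes, and `set_termination_time` so that a later `sweep` removes the resource without an explicit `destroy`.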

GT4: Security
- Delegation: mechanism to ensure that certificates are valid across different systems.
- CAS: Community Authorization Service. Provides authorization for groups of users.
- GSI-OpenSSH: ssh with support for credentials, avoiding multiple logins.
- MyProxy: credential server. Users keep their certificates on the server and provide a single key to read all the certificates they need.

GT4: Monitoring
MDS4:
- MDS-Index: gathers monitoring information (e.g. through GetRP).
- MDS-Trigger: compares the gathered information with management rules and sends alerts (e.g. by running scripts).
- MDS-Archive: manages the archive of monitoring information.
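
The MDS-Trigger idea (compare gathered information with rules, then alert) can be sketched in a few lines. This is an illustration only and does not use any MDS4 interface: the `Rule` type, the `evaluate` function and the metric names are hypothetical, and the "index" is just a dictionary of per-host metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]


@dataclass
class Rule:
    """A management rule: a condition on the metrics and an action to run (hypothetical)."""
    name: str
    condition: Callable[[Metrics], bool]
    action: Callable[[str, Metrics], None]


def evaluate(index: Dict[str, Metrics], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Check every host in the index against every rule and run the matching actions.

    Returns the (host, rule name) pairs that fired, in host order.
    """
    fired = []
    for host, metrics in sorted(index.items()):
        for rule in rules:
            if rule.condition(metrics):
                rule.action(host, metrics)
                fired.append((host, rule.name))
    return fired
```

In a real deployment the action would typically start a script or send a message; here it is any callable, which keeps the sketch easy to test.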

GT4: Data Management
- GridFTP: efficient data transfer.
- Reliable File Transfer: manages GridFTP.
- Data Replication.
- Replica Location.

GridFTP
FTP service optimized for high throughput in large-scale networks:
- FTP with added extensions.
- The channels use security.
- Multiple transfer channels.
- Transfer of partial files.
- Server-server transfers.
[Figure: basic and server-server transfers.]
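
Two of the GridFTP features above, multiple transfer channels and partial-file transfers, combine naturally: a file is cut into byte ranges and each range travels on its own channel. The sketch below only simulates this in memory with threads. It does not speak FTP or GridFTP, and the helper names `split_ranges` and `parallel_copy` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_ranges(size: int, channels: int) -> List[Tuple[int, int]]:
    """Split [0, size) into at most `channels` contiguous (offset, length) ranges."""
    if channels < 1:
        raise ValueError("at least one channel is required")
    base, extra = divmod(size, channels)
    ranges = []
    offset = 0
    for i in range(channels):
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def parallel_copy(source: bytes, channels: int) -> bytes:
    """Copy `source` range by range, one simulated channel per range."""
    destination = bytearray(len(source))

    def send(byte_range: Tuple[int, int]) -> None:
        offset, length = byte_range
        # Each channel writes only its own slice, so ranges never overlap.
        destination[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=channels) as pool:
        list(pool.map(send, split_ranges(len(source), channels)))
    return bytes(destination)
```

Because every range carries its own offset, a failed range can be resent alone, which is also what makes restarting an interrupted transfer cheap.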

Striped GridFTP
GridFTP supports striped (multi-node) transfers because, in high-bandwidth networks, the local file system is often the bottleneck:
- A control channel.
- Multiple transfer channels on each of several nodes.
- Requires a shared file system on all nodes.

RFT: Reliable File Transfer
A manager for data transfer requests with:
- Server-server transfers.
- Monitoring for restarts.
- Database to tolerate failures.
- Allows clients to submit a request and disconnect.
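
The RFT behaviour described above (requests kept in a database, progress monitored so that a transfer can restart, clients free to disconnect) can be sketched with SQLite. This is a toy model under stated assumptions, not the RFT service: the `TransferQueue` class, its table layout and the `fail_after` parameter used to simulate a crash are all hypothetical, and no data is actually moved.

```python
import sqlite3
from typing import Optional, Tuple


class TransferQueue:
    """Toy persistent queue of transfer requests with restart checkpoints (hypothetical)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS transfers ("
            "id INTEGER PRIMARY KEY, source TEXT, destination TEXT, "
            "size INTEGER, done INTEGER DEFAULT 0, state TEXT DEFAULT 'pending')"
        )
        self.db.commit()

    def submit(self, source: str, destination: str, size: int) -> int:
        """Record a request and return its id; the client may disconnect afterwards."""
        cur = self.db.execute(
            "INSERT INTO transfers (source, destination, size) VALUES (?, ?, ?)",
            (source, destination, size),
        )
        self.db.commit()
        return cur.lastrowid

    def status(self, transfer_id: int) -> Tuple[str, int, int]:
        """Return (state, bytes done, total size) for a request."""
        return self.db.execute(
            "SELECT state, done, size FROM transfers WHERE id = ?", (transfer_id,)
        ).fetchone()

    def run(self, transfer_id: int, chunk: int, fail_after: Optional[int] = None) -> bool:
        """Advance a transfer chunk by chunk, checkpointing after every chunk.

        `fail_after` simulates a crash after that many chunks. Calling run()
        again resumes from the last checkpoint instead of starting over.
        """
        _, done, size = self.status(transfer_id)
        chunks = 0
        while done < size:
            if fail_after is not None and chunks == fail_after:
                return False
            done = min(size, done + chunk)
            chunks += 1
            state = "done" if done == size else "active"
            self.db.execute(
                "UPDATE transfers SET done = ?, state = ? WHERE id = ?",
                (done, state, transfer_id),
            )
            self.db.commit()
        return True
```

Passing a file path instead of `":memory:"` makes the checkpoints survive a process restart, which is the point of keeping the state in a database.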

RLS: Replica Location Service
Data repository manager:
- Consistent local state stored in the Local Replica Catalogs (LRCs).
- Global state stored in the Replica Location Indices (RLIs), without consistency guarantees.
- Configurable topology.

GT4 Tools
[Figure: GT4 component diagram.]

GT4: Execution
- GRAM: Job Management.
- GridWay: Meta Scheduler.

GRAM: Remote Job Management
- Triggers data transfers and manages jobs.
- Keeps persistent state of the jobs.
- Uses security services.
- Delegates user credentials.
- It is not a scheduler.
- Used as an interface for schedulers and meta-schedulers.
[Figure: applications on top (workflow, meta-schedulers, batch jobs, parameter sweep); schedulers below (Condor, SGE, LSF, PBS, LoadLeveler, Fork).]

Scalability: GRAM4
- Receives a job file in RSL (Resource Specification Language) with the executable name, input filenames, output filenames and destination machine.
- Interacts with schedulers (SGE, Condor, etc.) and sends jobs to them.
- Manages job and file information with high scalability:
  - Manages up to 32k active jobs.
  - Monitors node load.
  - Handles bursts of up to 50 jobs.
  - Processes a job every 2 seconds.

GridWay: Meta-Scheduler
[Figure: layered view. Applications: users reach the GridWay scheduler through a portal or the command line. Middleware: services such as MDS and GridFTP. Infrastructure: SGE, PBS and LSF clusters, which may be heterogeneous and distributed.]

GridWay Architecture
[Figure: the portal and the command line reach the GridWay core (request manager, dispatch manager, scheduler, job pool, host pool) for job submission, monitoring and control. The core relies on job preparation through Grid data services (GridFTP, RFT), an execution manager for Grid job execution services (pre-WS and WS), and an information manager for resource discovery and monitoring through Grid monitoring services (MDS2, MDS2 GLUE, MDS4).]

GridWay Scheduling
Relevant information:
- About jobs:
  - Fixed priority.
  - Urgency flag.
  - User quota.
  - Deadline.
  - Waiting time.
- About resources:
  - Rank (preferences).
  - Fixed priority.
  - Past use history.
  - Failure history.
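
To make the scheduling inputs above concrete, here is a minimal sketch of how such information could be combined: jobs are ordered by a priority score and each job takes the best-scoring resource that still has a free slot. This is not GridWay's actual policy. The `Job` and `Resource` types, the weights and the scoring formulas are all assumptions chosen for illustration, and user quota and past use history are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    fixed_priority: int
    urgent: bool
    waiting_time: float  # seconds spent in the queue
    deadline: float      # seconds left until the deadline


@dataclass
class Resource:
    name: str
    rank: float          # user preference, higher is better
    fixed_priority: int
    failures: int        # failure history
    free_slots: int


def job_score(job: Job) -> Tuple[bool, float]:
    """Urgent jobs go first; otherwise priority, waiting time and deadline add up."""
    score = job.fixed_priority + job.waiting_time / 60.0
    score += 100.0 / max(job.deadline, 1.0)  # closer deadlines weigh more
    return (job.urgent, score)


def resource_score(resource: Resource) -> float:
    """Prefer high rank and priority; penalize past failures (weight is arbitrary)."""
    return resource.rank + resource.fixed_priority - 5.0 * resource.failures


def schedule(jobs: List[Job], resources: List[Resource]) -> List[Tuple[str, str]]:
    """Return (job, resource) assignments; jobs that find no free slot are left out."""
    slots = {r.name: r.free_slots for r in resources}
    ordered = sorted(resources, key=resource_score, reverse=True)
    plan = []
    for job in sorted(jobs, key=job_score, reverse=True):
        for resource in ordered:
            if slots[resource.name] > 0:
                slots[resource.name] -= 1
                plan.append((job.name, resource.name))
                break
    return plan
```

Keeping the two scoring functions separate mirrors the split on the slide between information about jobs and information about resources, and makes it easy to swap in a different policy.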

GridWay: Meta-Scheduler
- Dynamic scheduler:
  - Set of pre-defined policies (priority-based, proportional, by waiting time, by deadlines, etc.).
  - Interface for user scheduling code.
- Based on services.
- Job resubmission when better nodes become available.
- Detects violations of advertised characteristics.
- Able to handle dynamic changes in job execution requests.

Next time
- Security in Grids.
- Grid case studies.