Framework for Querying Distributed Objects Managed by a Grid Infrastructure


Ruslan Fomkin and Tore Risch

Department of Information Technology, Uppsala University, P.O. Box 337, SE-751 05 Uppsala, Sweden
{Ruslan.Fomkin, Tore.Risch}@it.uu.se

Abstract. Queries over scientific data often imply expensive analyses of data requiring a lot of the computational resources available in Grids. We are developing a customizable query processor built on top of an established Grid infrastructure, the NorduGrid middleware, and have implemented a framework for managing long-running queries in a Grid environment. With the framework the user does not specify the detailed job and parallelization descriptions required by NorduGrid. Instead s/he specifies queries in terms of an application-oriented schema describing the contents of files managed by the Grid and accessed through wrappers. When a query is received, the system generates NorduGrid job descriptions and submits them to NorduGrid for execution. The framework takes the limitations of NorduGrid into account. It includes a submission mechanism, a job babysitter, and a generic data exchange mechanism. The submission mechanism generates a number of jobs for parallel execution of a user query over wrapped data files. The task of the babysitter is to submit generated jobs to NorduGrid for execution, to monitor their execution status, and to download results from the execution. The generic exchange mechanism provides a way to exchange objects through files between Grid execution nodes and user applications.

1 Introduction

Nowadays a lot of scientific data are stored in Grids, and scientists need to access and analyze them. Their analyses often imply expensive computations that process a lot of data. Scientists therefore need external computational resources to run their analyses, and storage resources to store and share huge amounts of data. Many Grids are being developed to provide such computational resources and storage facilities. For example, the ATLAS collaboration [1] motivates many Grid projects such as LCG [2], EGEE [3], and NorduGrid [4]. These projects provide storage facilities to store and share data produced by ATLAS [1] and to be produced by the Large Hadron Collider (LHC) [5], along with computational resources to analyze the data.

This work is funded by The Swedish Research Council (VR).

J.-M. Pierson (Ed.): VLDB DMG 2005, LNCS 3836, pp. 58-70, © Springer-Verlag Berlin Heidelberg 2005

A typical analysis of data for the LHC projects is the selection of subsets of the input data. The selections, called cuts, consist not only of simple logical predicates but also of numerical computations. We show that such analyses can be expressed in a declarative way using an extensible query language.

We are developing POQSEC [6] (Parallel Object Query System for Expensive Computations), which processes scientific analyses specified as declarative SQL-like queries over data distributed in the Grid. It utilizes computational resources of Swegrid [7] and storage resources of the Nordic countries through the middleware Grid infrastructure NorduGrid [4]. The goal of the POQSEC project is to provide a transparent and scalable way to specify and execute scientific queries. A user should be able to specify his/her query transparently in a client database without regard to where it will be executed or how data will be accessed. Currently we have implemented a framework for submitting user queries for execution in the Grid. The system then creates jobs executing the queries, submits the jobs to NorduGrid, monitors the execution of the jobs by NorduGrid, downloads the results of the jobs, and delivers the results of the queries to the user.

The user states queries to POQSEC in terms of a database schema available in the client database. The schema contains both an application-oriented part and Grid meta-data. The application schema describes data stored inside files in Grid storage resources, for example events produced by ATLAS. Wrappers are defined for accessing the contents of these files; e.g., in our application we use a wrapper of the ROOT library [8]. The Grid meta-data contain information about the files. Thus user queries can restrict data both in terms of application data contents and in terms of meta-data about files. The latter is very important since there is a huge number of Grid data files and queries normally touch only a small percentage of them.

User queries are parallelized into a number of jobs for execution. The parallelization is done by partitioning data between jobs. Our preliminary results show that the parallelization gives significant performance improvements.

The rest of the paper is organized as follows. Related work is discussed in Sect. 2. Section 3 describes the POQSEC architecture. It is followed by a description of an application from High Energy Physics, which is our test case. The implementation of the framework is discussed in Sect. 5, and Sect. 6 concludes the paper.

2 Related Work

Another system that utilizes a Grid infrastructure and provides a high-level declarative query language for data access and analysis is the Distributed Query Processing system (DQP) [9] and its web service version OGSA-DQP [10]. DQP is part of the Grid infrastructure mygrid [11], which fully controls resources and where resources can be allocated dynamically. The resources for the query execution are allocated and provided by a user, and any of them can be utilized by DQP dynamically. This differs from our system, where NorduGrid is a middleware above autonomous local batch systems that control the computational resources. Unlike DQP, we need to consider the NorduGrid limitation that jobs are not guaranteed to start immediately.

Furthermore, as part of a job description NorduGrid requires descriptions of resources to be specified in advance. This includes, for example, estimating the execution time and the number of computational nodes for jobs.

STORM [12] is a distributed query processing environment for processing selections over distributed large scientific datasets and transferring the selected data to its clients. STORM does not leverage an existing Grid infrastructure for data transportation, job scheduling, and batch query processing as POQSEC does.

In [13,14] a batch database system is developed to support scientific queries. It is there applied to astronomical data. The data are stored in back-end SQL servers managed by a front-end batch query system. In POQSEC we use a middleware approach to access wrapped data stored in native format rather than storing the data in SQL databases.

The ATLAS Distributed Analysis (ADA) project [15] has the goal of providing a high-level interface for scientists who analyze data produced by ATLAS and the LHC. The users specify jobs containing datasets for processing in terms of meta-data, and their analyses as snippets of programming code, for example in C++ or Fortran. A typical analysis performs selections that include computations over input datasets and aggregations over the results of the selections. The jobs are submitted to Grid resources and their execution is monitored. In contrast, POQSEC uses a declarative high-level query language to specify analyses, and the goal of POQSEC is transparent execution without considering whether the query will be executed on Grid resources or locally.

3 POQSEC Architecture

The architecture presented in Fig. 1 illustrates the current implementation of POQSEC. The POQSEC architecture takes into account the limitations of the NorduGrid (NG) middleware. NorduGrid and its limitations are briefly described in Sect. 3.1. Section 3.2 describes the POQSEC components and their interaction with NorduGrid.

3.1 NorduGrid Middleware

NorduGrid (also called the Advanced Resource Connector) [4,16] is a middleware between Grid users and computational resources that are managed by local batch systems. Thus NorduGrid does not control computational resources; instead it submits user tasks to local batch systems on clusters, and each local batch system allocates cluster nodes according to its policy and the current load of the cluster. The Computing Elements (CE) are clusters where Grid jobs are executed, while Storage Elements (SE) are file servers where the data to be queried are stored. The CEs and SEs are managed by NorduGrid and are accessible by submitting Grid jobs to an NG Client. The NG client is a set of command-line tools to submit, monitor, and manage jobs on the Grid. It also has commands to move data between storage elements and clients, and to query Grid resource information such as the load on different CEs and job statistics. Users of NorduGrid always first initiate communication with the NG client.

Fig. 1. Architecture of the current implementation of POQSEC

The NG client includes a resource brokering service [17] to find suitable resources for jobs. Jobs are described in a resource specification language, xrsl [18], which includes specification of, e.g.:

- A user executable and its arguments to be run on some suitable computing element.
- Files to be transported to and from the chosen computing element before and after the execution.
- Maximal CPU time for the execution.
- Runtime environments for the execution. A runtime environment is an additional required software package, e.g. an application library such as ROOT [8].
- Standard input, output, and error files for the execution.
- Optional names of the computing elements where the executable can run.
- The number of parallel sub-jobs to be run on the computing element.

In summary, NorduGrid requires a lot of user specifications to fully describe computation tasks as xrsl scripts. An example of such a script is shown in Fig. 4. POQSEC simplifies this considerably by automatically generating NorduGrid interactions and job scripts to execute a task specified as a declarative query over the contents of data. To manage jobs generated by POQSEC, to track their executions, and to download results we provide a babysitter integrated with the POQSEC framework.

3.2 POQSEC Components and Their Interaction with NorduGrid

The Query Coordinator of POQSEC (Fig. 1) manages user queries submitted to POQSEC for execution on the Grid. It communicates with an NG client directly through a command-line interface. Both the query coordinator and the NG client run on the same node, the Grid Client Node, which is a user-accessible computer node. On it the user must first initialize his/her Grid credentials, required for using NG client services, according to the Grid Security Infrastructure (GSI) [19] mechanism.

The POQSEC Client component is a personal POQSEC database running on the Grid client node and communicating with the query coordinator. It could also run on a node separate from the Grid client node, e.g. on a user's desktop computer, if GSI is used for the communication with the query coordinator. Queries are submitted through the POQSEC client to the query coordinator for further execution on Grid resources.

The components of the query coordinator are the Coordinator Server and the Babysitter. The coordinator server contains a Grid Meta-Database, a Submission Database, and a Job Queue. The Grid meta-database stores information about data files and computational elements accessible through POQSEC. It is needed since Grid resources are heterogeneous and require Grid users to know which computational elements are able to execute their jobs and which properties of the computational elements are required for job execution, e.g. runtime environments. POQSEC users need not specify this information when submitting queries since it is stored in the Grid meta-database. The submission database contains descriptions of queries submitted from the POQSEC client and job descriptions generated by POQSEC to execute the queries. The job queue contains jobs that are created but not yet submitted to NorduGrid for execution.

The process of submitting and evaluating a query is presented in Fig. 2. When a query is received (1) from the POQSEC client, the coordinator server first registers the query in the submission database and stores there a number of job descriptions to parallelize the query execution. The number of jobs to create is currently provided by the user as part of the query submission (we are working on automating this). Information about computational resources and data files from the Grid meta-database is used to generate these job descriptions. xrsl scripts are generated from the job descriptions, as sketched below, and are stored (2) in the local storage. Then the jobs are registered in the job queue.
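The generator itself is not shown in the paper. As a minimal illustrative sketch of what such a generation step could look like, the following Python fragment renders a dictionary-based job description into the xrsl attribute syntax of Sect. 3.1; the function and field names here are hypothetical, not POQSEC's actual code:

# Illustrative sketch only: render one job description as an xrsl string.
# The job-description fields used here are hypothetical.
def to_xrsl(job):
    clauses = [
        '(executable=%s)' % job['executable'],
        '(arguments="%s")' % job['arguments'],
        '(inputfiles=%s)' % ''.join('(%s "%s")' % f for f in job['inputfiles']),
        '(outputfiles=(%s ""))' % job['outputfile'],
        '(cputime=%d)' % job['cputime'],
        # Alternative clusters are OR-ed with the xrsl "|" operator (cf. Fig. 4).
        '(|%s)' % ''.join('(cluster=%s)' % c for c in job['clusters']),
    ]
    return '&' + ''.join(clauses)

print(to_xrsl({'executable': 'aleh', 'arguments': 'aleh.dmp',
               'inputfiles': [('aleh', '/home/udbl/ruslan/amox/bin/aleh')],
               'outputfile': 'result.out', 'cputime': 20,
               'clusters': ['hagrid.it.uu.se', 'sigrid.lunarc.lu.se']}))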

Fig. 2. Interactions between POQSEC components and NorduGrid

The babysitter picks (3) jobs from the job queue and submits (4) them as xrsl scripts to the NG client for execution on Grid resources. Once a job has been submitted, the babysitter regularly polls (5) the NG client for the job's status and reports (6) the status to the coordinator server to update the submission database. When a job is finished, the babysitter downloads (11) the result to the Local Storage, which is the file system of the Grid client node, and notifies (12) the coordinator server. The result can be retrieved (13) to the POQSEC client after the query is finished.

On each CE NorduGrid maintains an NG Grid Manager. It receives (7) job descriptions from NG clients; in our case these jobs execute POQSEC subqueries. The NG Grid manager uploads (8) input files from SEs to the local CE Storage before each job is submitted to the local batch system. The local batch system allocates CE nodes for each job according to its policies and current load, and then starts the job executions. For POQSEC these jobs contain Executors that evaluate (9) subqueries over the uploaded data and store (10) the results in local CE storage files. The babysitter polls (5) the NG client regularly for finished executions. After a job has finished, the babysitter requests the NG client to download (11) the result to the local storage of the Grid client node and notifies (12) the coordinator server that the job is ready. Since a given POQSEC query often generates many jobs, a query is ready only when all its jobs are finished. However, partial results can be obtained once some jobs are finished.
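The paper describes the babysitter's behavior but not its code. Below is a minimal, heavily simplified sketch of such a polling cycle in Python, shelling out to the classic NG client command-line tools (ngsub, ngstat, ngget); the ngsub option for reading an xrsl file, the job-id parsing, and the status string matched here are assumptions, not verified NorduGrid output formats:

import subprocess
import time

def submit(xrsl_path):
    # Steps 3-4: submit one generated xrsl script; assumes ngsub -f <file>.
    out = subprocess.run(['ngsub', '-f', xrsl_path], check=True,
                         capture_output=True, text=True).stdout
    return out.split()[-1]  # assumption: the job id is the last token printed

def babysit(xrsl_paths, poll_seconds=60):
    jobs = {submit(p): 'SUBMITTED' for p in xrsl_paths}
    while any(s != 'DOWNLOADED' for s in jobs.values()):
        time.sleep(poll_seconds)  # steps 5-6: poll and report statuses
        for jobid in [j for j, s in jobs.items() if s != 'DOWNLOADED']:
            status = subprocess.run(['ngstat', jobid], capture_output=True,
                                    text=True).stdout
            if 'FINISHED' in status:  # assumed terminal-state string
                subprocess.run(['ngget', jobid], check=True)  # steps 11-12
                jobs[jobid] = 'DOWNLOADED'
    return jobs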

4 User Application

Our current test application analyzes data produced by the LHC projects for the presence of charged Higgs bosons [20]. Input data for the analyses are events, which describe collision events between elementary particles. Each event comprises sets of particles of various types such as electrons and muons, sets of other particles called jets, and sets of event parameters such as the missing momentum in the x and y directions (PxMiss and PyMiss). Each particle is described by its own set of parameters, e.g. the ID number of the type of the particle (Kf), momentum in the x, y, and z directions (Px, Py, and Pz), and amount of energy (Ee). The data are stored in files managed by an object-oriented data analysis framework, ROOT [8].

Fig. 3. The schema of the application data

Analysis of events consists of selecting those events that can potentially contain charged Higgs bosons. A number of predicates, called cuts, are applied to each event, and if the event satisfies all of them it is selected. A cut is a selection of events of interest for further analysis according to a scientist's theory.

The application is implemented as an extension of a functional and object-oriented mediator system, Amos II [21]. It is called ALEH (Analysis of LHC Events for containing charged Higgs bosons). ALEH has a ROOT wrapper to access data from files managed by ROOT. An object-relational schema of the event data is defined in Fig. 3. It is a view of the relevant parts of ROOT files. A number of analysis queries implementing cuts are defined as derived functions expressed in a query language, AmosQL [22]. Often a researcher selects events satisfying several cuts. For example, such a query in AmosQL is:

SELECT ev FROM Event ev
WHERE jetvetocut(ev) AND zvetocut(ev) AND topcut(ev) AND
      misseecuts(ev) AND leptoncuts(ev) AND threeleptoncut(ev);

The query is expressed in terms of derived functions, which define the cuts. The definition of one of the cuts in AmosQL is:

CREATE FUNCTION zvetocut (Event ev) -> Event
AS SELECT ev
   WHERE NOTANY(oppositeleptons(ev)) OR
         (abs(invmass(oppositeleptons(ev)) - zmass) >= minzmass)

where invmass calculates the invariant mass of a pair of two given leptons, zmass is the mass of the Z particle, minzmass is the range of closeness, and oppositeleptons is a derived function defined as another query:

CREATE FUNCTION oppositeleptons (Event ev) -> <Lepton, Lepton>
AS SELECT l1, l2
   FROM Lepton l1, Lepton l2
   WHERE l1 = particles(ev) AND l2 = particles(ev) AND Kf(l1) = -Kf(l2);
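The helper invmass used in zvetocut is not defined in the paper. Given the particle parameters above (energy Ee and momentum components Px, Py, Pz), it presumably computes the standard relativistic invariant mass of the lepton pair, in natural units (c = 1):

\[
m_{\mathrm{inv}} = \sqrt{(Ee_1 + Ee_2)^2 - (Px_1 + Px_2)^2 - (Py_1 + Py_2)^2 - (Pz_1 + Pz_2)^2}
\]

zvetocut thus rejects an event only when it contains an opposite-sign lepton pair whose invariant mass lies within minzmass of the Z mass, i.e. a pair consistent with a Z boson decay, which is the usual reading of a Z veto.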

5 Implementation

A POQSEC client running our test application ALEH has an interface to a coordinator server through which a user can submit queries for execution in the Grid, monitor the status of submitted queries, and retrieve the results of finished queries. To submit a query the user invokes a system interface function named submit and specifies there the query defined in terms of the application schema, the set of file names to be processed by the query, the number of jobs over which to parallelize the query, the CPU time required for processing one job, and optionally a computing element where the query's jobs should be executed. If no computing element is specified, the jobs are submitted to an NG client along with a list of possible computing elements for execution. The result of the submit function is an object used to monitor the status and to retrieve the result.

The test data, which are events, are produced by ATLAS simulation software and stored on storage resources accessible through NorduGrid. Paths to the data files are stored in the Grid meta-database of the coordinator server in a format according to the xrsl specification [18]. Thus the user provides file names without paths during submission. For example, suppose the user wants to execute the general analysis query presented in Sect. 4 over eight specific files containing equal numbers of events, parallelized into four jobs so that each job processes two files, where the CPU time for executing the query over two files is 20 minutes, on any of the available computational resources of Swegrid. The user submits the query and assigns the result of the submission to a variable :s:

SET :s = submit("SELECT ev FROM Event ev WHERE
      jetvetocut(ev) AND zvetocut(ev) AND topcut(ev) AND
      misseecuts(ev) AND leptoncuts(ev) AND threeleptoncut(ev)",
     {"bkg2events_ruslan_000.root","bkg2events_ruslan_001.root",
      "bkg2events_ruslan_002.root","bkg2events_ruslan_003.root",
      "bkg2events_ruslan_004.root","bkg2events_ruslan_005.root",
      "bkg2events_ruslan_006.root","bkg2events_ruslan_007.root"},4,20);

The submission is then translated into four xrsl scripts, which are submitted to an NG client for execution. One of the scripts is presented in Fig. 4. The executable there is the ALEH application, which contains the wrapper of ROOT files.

It is necessary for the user to specify which files to analyze in order to restrict the amount of data for processing. In the example the user specifies the file names explicitly. Alternatively the user can define a query over the meta-database of the coordinator server to retrieve the file names. The local batch systems of all computational elements available through NorduGrid require a specification of CPU time, and thus the user needs to provide this (we are working on estimating it automatically).

& (executable=aleh)
(arguments="aleh.dmp")
(inputfiles=
  (aleh "/home/udbl/ruslan/amox/bin/aleh")
  (aleh.dmp "/home/udbl/ruslan/amox/bin/aleh.dmp")
  (query.osql "query.osql")
  (bkg2events_ruslan_001.root "gsiftp://se1.hpc2n.umu.se:2811/se3/ruslan_poqsec/bkg2events_ruslan_001.root")
  (bkg2events_ruslan_000.root "gsiftp://se1.hpc2n.umu.se:2811/se3/ruslan_poqsec/bkg2events_ruslan_000.root"))
(outputfiles=(result.out ""))
(cputime=20)
(|(runtimeenvironment=root)
  (runtimeenvironment=apps/hep/atlas-8.0.8)
  (runtimeenvironment=apps/physics/hep/root)
  (runtimeenvironment=atlas-8.0.8)
  (runtimeenvironment=apps/hep/atlas-9.0.3))
(stdin="query.osql")
(stdout="outgen.out")
(stderr="errgen.err")
(gmlog="grid.debug")
(middleware>="nordugrid")
(|(cluster=sg-access.pdc.kth.se)
  (cluster=bluesmoke.nsc.liu.se)
  (cluster=hagrid.it.uu.se)
  (cluster=hive.unicc.chalmers.se)
  (cluster=ingrid.hpc2n.umu.se)
  (cluster=sigrid.lunarc.lu.se))
(jobname="poqsec: swegrid.xrsl")

Fig. 4. Example of the xrsl file with name swegrid.xrsl

The performance of many queries can be significantly improved by parallelization into several jobs. Our experience shows that parallelizing the execution of a query gives dramatic improvements. For example, the above submission took 24 minutes, calculated as the elapsed time from when the query was submitted until all job results had been downloaded from the Grid. A submission of the same query without parallelization, as one job, took 3 hours and 45 minutes, of which 3 hours and 10 minutes were spent on the query evaluation. This is a much longer response time compared with the parallelized Grid execution.

During the execution of a query submitted to POQSEC the user can monitor its status by calling status(:s). The status of the query is computed from the statuses of its batch jobs; the status DOWNLOADED is returned only if the results of all jobs of the query have been downloaded, as sketched below. The user can then retrieve the result data by executing retrieve(:s). The result of the query can also be retrieved by using the function wait(:s). The difference is that if wait is invoked before the results of the jobs are available, the system waits until the coordinator server notifies it that all jobs are downloaded and then retrieves the result, while retrieve just prints a message if the query is not finished (we are implementing functions to retrieve partial results). The user can cancel his/her query submission by executing cancel(:s).
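The aggregation rule quoted above is simple enough to state in a few lines. The following Python sketch is illustrative only; apart from DOWNLOADED, which the paper names, the state names are hypothetical:

def query_status(job_statuses):
    # A query is DOWNLOADED only when every one of its jobs is (see above).
    if all(s == 'DOWNLOADED' for s in job_statuses):
        return 'DOWNLOADED'
    # Hypothetical intermediate states for a partially finished query.
    if any(s == 'DOWNLOADED' for s in job_statuses):
        return 'PARTIAL'   # some results could already be retrieved
    return 'RUNNING'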

Fig. 5. Schema of the Grid meta-database and the submission database

The coordinator server, the babysitter, and the NG client run on the same Grid client node as the POQSEC client. The coordinator server contains the Grid meta-database and the submission database. The user is able to query the coordinator server for data from the Grid meta-database and to request updates of the Grid meta-database through the POQSEC client. The babysitter polls the coordinator server to pick up jobs from the job queue and to request updates of the submission database. A schema of the Grid meta-database and the submission database is presented in Fig. 5. The types Cluster and DataFile and its subtype EventData are parts of the Grid meta-database. The submission database is represented by the types Submission and Job.

When the coordinator server receives query submissions from the POQSEC client, it generates job descriptions and creates xrsl files for NorduGrid and script files for the POQSEC executors. For example, for the submission given above the coordinator server generates four xrsl files and four script files. An example of one of the xrsl files is given in Fig. 4. The POQSEC script files contain commands for the executors to load the input data from the data files through the ROOT wrapper and to execute the user query. In our example one of the script files contains:

load_root_file("bkg2events_ruslan_001.root");
load_root_file("bkg2events_ruslan_000.root");
save("result.out",SELECT ev FROM Event ev WHERE
     jetvetocut(ev) AND zvetocut(ev) AND topcut(ev) AND
     misseecuts(ev) AND leptoncuts(ev) AND threeleptoncut(ev));
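POQSEC's partitioning code is not shown in the paper. The sketch below illustrates one plausible scheme for generating the executor script files: a block partitioning of the submitted file list, which reproduces the two-files-per-job split of the example; the paper does not state the actual partitioning policy, and the function name is hypothetical:

def make_scripts(query, files, njobs):
    # Split the file list into njobs roughly equal blocks and emit one
    # executor script per block, in the format shown above.
    per_job = (len(files) + njobs - 1) // njobs
    scripts = []
    for i in range(njobs):
        block = files[i * per_job:(i + 1) * per_job]
        lines = ['load_root_file("%s");' % f for f in block]
        lines.append('save("result.out",%s);' % query)
        scripts.append('\n'.join(lines))
    return scripts

For the example submission, make_scripts(query, files, 4) over the eight ROOT files yields four scripts, each loading two files before the save call.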

The results of the query executions are saved by the executors in files (here in result.out) in such a way that they can be read by the POQSEC client. Objects, in our case events, that originally were the same will be treated by the POQSEC client as the same object regardless of whether they came from different sources. The other three xrsl files and three script files are similar except that they have different input data files. The automatic generation of these files by POQSEC exempts the user from manually creating such files for each job.

The main purpose of the babysitter is to interact with the NG client to submit jobs, to monitor the status of executing jobs, and to download finished jobs. Each interaction with the NG client can take from several seconds to a minute; thus the coordinator server does not contact the babysitter immediately when a job is created. Instead the babysitter polls the coordinator server regularly when it is not interacting with the NG client.

6 Conclusion and Future Work

We have implemented a framework which provides basic tools for executing long-running batch queries on Grid resources over wrapped scientific data distributed in the Grid. The framework is part of our development of POQSEC (Parallel Object Query System for Expensive Computations), the goal of which is to provide a fully transparent query execution system for scientific applications.

Our ongoing work is to automate estimates of the maximal CPU time required for the execution of an arbitrary query on a partition of the input data. The estimates will be based on probing the query on a number of small samples, as sketched below. We also investigate strategies for suspending those jobs for which the maximal CPU time was underestimated, and then resuming them on other resources.

Another piece of ongoing work concerns the automatic parallelization of a user query submitted for execution on Grid resources. Deciding automatically how many jobs to parallelize the query into depends on the current load of the computational resources of the Grid and on which computational elements would be chosen for the execution of the generated jobs. We combine this with the development of our own resource brokering algorithm. The resource brokering algorithm will not just decide where to execute the query but also into how many jobs to parallelize it. It should take into account that different computing elements of the Grid have different policies. We will base our resource brokering algorithm on job statistics available from the NorduGrid middleware [23].

Job execution on computational resources accessible through NorduGrid can fail, and users of NorduGrid need to deal with such failures. We are investigating various strategies to deal with failures of job executions.
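As a rough illustration of the probing idea mentioned above: the paper does not specify the cost model, so the linear scaling in the number of events and the safety factor below are our assumptions, and the function is a sketch, not POQSEC code:

def estimate_cputime(probe, sample_sizes, target_size, safety=2.0):
    # probe(n) runs the query on a sample of n events and returns CPU seconds.
    rates = [probe(n) / n for n in sample_sizes]  # observed seconds per event
    # Take the worst observed rate and pad it, to reduce the risk of a job
    # being killed for exceeding its requested cputime.
    return safety * max(rates) * target_size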

References

1. ATLAS collaboration.
2. LHC Computing Grid.
3. EGEE: Enabling Grids for E-sciencE.
4. Eerola, P., Ekelöf, T., Ellert, M., Hansen, J.R., Konstantinov, A., Kónya, B., Nielsen, J.L., Ould-Saada, F., Smirnova, O., Wäänänen, A.: Science on NorduGrid. In Neittaanmäki, P., Rossi, T., Korotov, S., Oñate, E., Périaux, J., Knörzer, D., eds.: ECCOMAS (2004)
5. LHC - the Large Hadron Collider.
6. Fomkin, R., Risch, T.: Managing long running queries in Grid environment. In Meersman, R., Tari, Z., Corsaro, A., eds.: OTM Workshops. LNCS 3292, Springer (2004)
7. Swegrid.
8. Brun, R., Rademakers, F.: ROOT - an object oriented data analysis framework. In: AIHENP 96 Workshop. Nucl. Inst. & Meth. in Phys. Res. A 389 (1997)
9. Smith, J., Gounaris, A., Watson, P., Paton, N.W., Fernandes, A.A.A., Sakellariou, R.: Distributed query processing on the Grid. In Parashar, M., ed.: GRID. LNCS 2536, Springer (2002)
10. Alpdemir, M.N., Mukherjee, A., Gounaris, A., Paton, N.W., Watson, P., Fernandes, A.A.A., Fitzgerald, D.J.: OGSA-DQP: A service for distributed querying on the Grid. In Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E., eds.: EDBT. LNCS 2992, Springer (2004)
11. mygrid.
12. Narayanan, S., Kurç, T.M., Çatalyürek, Ü.V., Saltz, J.H.: Database support for data-driven scientific applications in the grid. Parallel Processing Letters 13 (2003)
13. Nieto-Santisteban, M.A., Gray, J., Szalay, A.S., Annis, J., Thakar, A.R., O'Mullane, W.: When database systems meet the Grid. In: CIDR (2005)
14. O'Mullane, W., Li, N., Nieto-Santisteban, M.A., Szalay, A.S., Thakar, A.R., Gray, J.: Batch is back: CasJobs, serving multi-TB data on the Web. Technical Report MSR-TR, Microsoft Research (2005)
15. Adams, D., Deng, W., Chetan, N., Kannan, C., Sambamurthy, V., Harrison, K., Tan, C., Soroko, A., Liko, D., Orellana, F., Branco, M., Haeberli, C., Albrand, S., Fulachier, J., Lozano, J., Fassi, F., Rybkine, G.: ATLAS distributed analysis. In: CHEP04 (2004)
16. The NorduGrid/ARC User Guide (2005)
17. Ellert, M.: The NorduGrid brokering algorithm (2004)
18. Smirnova, O.: Extended Resource Specification Language Reference Manual (2005)
19. Welch, V., Siebenlist, F., Foster, I., Bresnahan, J., Czajkowski, K., Gawor, J., Kesselman, C., Meder, S., Pearlman, L., Tuecke, S.: Security for Grid services. In: HPDC 03, IEEE Computer Society (2003)
20. Hansen, C., Gollub, N., Assmagan, K., Ekelöf, T.: Discovery potential for a charged Higgs boson decaying in the chargino-neutralino channel of the ATLAS detector at the LHC. SN-ATLAS (2005)
21. Risch, T., Josifovski, V., Katchaounov, T.: Functional data integration in a distributed mediator system. In: The Functional Approach to Data Management: Modeling, Analyzing, and Integrating Heterogeneous Data. Springer-Verlag (2003)

22. Flodin, S., Hansson, M., Josifovski, V., Katchaounov, T., Risch, T., Sköld, M.: Amos II Release 7 User's Manual. Uppsala Database Laboratory (2005)
23. Konstantinov, A.: The Logger Service, Functionality Description and Installation Manual (2005)


More information

Distributing storage of LHC data - in the nordic countries

Distributing storage of LHC data - in the nordic countries Distributing storage of LHC data - in the nordic countries Gerd Behrmann INTEGRATE ASG Lund, May 11th, 2016 Agenda WLCG: A world wide computing grid for the LHC NDGF: The Nordic Tier 1 dcache: Distributed

More information

A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS

A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS Raj Kumar, Vanish Talwar, Sujoy Basu Hewlett-Packard Labs 1501 Page Mill Road, MS 1181 Palo Alto, CA 94304 USA { raj.kumar,vanish.talwar,sujoy.basu}@hp.com

More information

The ATLAS PanDA Pilot in Operation

The ATLAS PanDA Pilot in Operation The ATLAS PanDA Pilot in Operation P. Nilsson 1, J. Caballero 2, K. De 1, T. Maeno 2, A. Stradling 1, T. Wenaus 2 for the ATLAS Collaboration 1 University of Texas at Arlington, Science Hall, P O Box 19059,

More information

PROOF-Condor integration for ATLAS

PROOF-Condor integration for ATLAS PROOF-Condor integration for ATLAS G. Ganis,, J. Iwaszkiewicz, F. Rademakers CERN / PH-SFT M. Livny, B. Mellado, Neng Xu,, Sau Lan Wu University Of Wisconsin Condor Week, Madison, 29 Apr 2 May 2008 Outline

More information

ICAT Job Portal. a generic job submission system built on a scientific data catalog. IWSG 2013 ETH, Zurich, Switzerland 3-5 June 2013

ICAT Job Portal. a generic job submission system built on a scientific data catalog. IWSG 2013 ETH, Zurich, Switzerland 3-5 June 2013 ICAT Job Portal a generic job submission system built on a scientific data catalog IWSG 2013 ETH, Zurich, Switzerland 3-5 June 2013 Steve Fisher, Kevin Phipps and Dan Rolfe Rutherford Appleton Laboratory

More information

Heterogeneous Relational Databases for a Grid-enabled Analysis Environment

Heterogeneous Relational Databases for a Grid-enabled Analysis Environment Heterogeneous Relational Databases for a Grid-enabled Analysis Environment Arshad Ali 1, Ashiq Anjum 1,4, Tahir Azim 1, Julian Bunn 2, Saima Iqbal 2,4, Richard McClatchey 4, Harvey Newman 2, S. Yousaf

More information

ATLAS Experiment and GCE

ATLAS Experiment and GCE ATLAS Experiment and GCE Google IO Conference San Francisco, CA Sergey Panitkin (BNL) and Andrew Hanushevsky (SLAC), for the ATLAS Collaboration ATLAS Experiment The ATLAS is one of the six particle detectors

More information

Problems for Resource Brokering in Large and Dynamic Grid Environments*

Problems for Resource Brokering in Large and Dynamic Grid Environments* Problems for Resource Brokering in Large and Dynamic Grid Environments* Catalin L. Dumitrescu CoreGRID Institute on Resource Management and Scheduling Electrical Eng., Math. and Computer Science, Delft

More information

Web Services for Visualization

Web Services for Visualization Web Services for Visualization Gordon Erlebacher (Florida State University) Collaborators: S. Pallickara, G. Fox (Indiana U.) Dave Yuen (U. Minnesota) State of affairs Size of datasets is growing exponentially

More information

Initial experiences with GeneRecon on MiG

Initial experiences with GeneRecon on MiG Initial experiences with GeneRecon on MiG Thomas Mailund and Christian N.S. Pedersen Bioinformatics Research Center (BiRC), Dept. of Computer Science, University of Aarhus, Denmark, Email: {mailund,cstorm}@birc.dk

More information

A distributed tier-1. International Conference on Computing in High Energy and Nuclear Physics (CHEP 07) IOP Publishing. c 2008 IOP Publishing Ltd 1

A distributed tier-1. International Conference on Computing in High Energy and Nuclear Physics (CHEP 07) IOP Publishing. c 2008 IOP Publishing Ltd 1 A distributed tier-1 L Fischer 1, M Grønager 1, J Kleist 2 and O Smirnova 3 1 NDGF - Nordic DataGrid Facilty, Kastruplundgade 22(1), DK-2770 Kastrup 2 NDGF and Aalborg University, Department of Computer

More information

e-science for High-Energy Physics in Korea

e-science for High-Energy Physics in Korea Journal of the Korean Physical Society, Vol. 53, No. 2, August 2008, pp. 11871191 e-science for High-Energy Physics in Korea Kihyeon Cho e-science Applications Research and Development Team, Korea Institute

More information

Tests of PROOF-on-Demand with ATLAS Prodsys2 and first experience with HTTP federation

Tests of PROOF-on-Demand with ATLAS Prodsys2 and first experience with HTTP federation Journal of Physics: Conference Series PAPER OPEN ACCESS Tests of PROOF-on-Demand with ATLAS Prodsys2 and first experience with HTTP federation To cite this article: R. Di Nardo et al 2015 J. Phys.: Conf.

More information

Grid services. Enabling Grids for E-sciencE. Dusan Vudragovic Scientific Computing Laboratory Institute of Physics Belgrade, Serbia

Grid services. Enabling Grids for E-sciencE. Dusan Vudragovic Scientific Computing Laboratory Institute of Physics Belgrade, Serbia Grid services Dusan Vudragovic dusan@phy.bg.ac.yu Scientific Computing Laboratory Institute of Physics Belgrade, Serbia Sep. 19, 2008 www.eu-egee.org Set of basic Grid services Job submission/management

More information