Meta-Monitoring for performance optimization of a computing cluster for data-intensive analysis


IEKP-KA/2016-XXX

Meta-Monitoring for performance optimization of a computing cluster for data-intensive analysis
Meta-Monitoring zur Leistungsoptimierung eines Rechner-Clusters für datenintensive Analysen

Bachelor's thesis by Sebastian Brommer
at the Department of Physics, Institut für Experimentelle Kernphysik (IEKP)

Reviewer: Prof. Dr. Günther Quast
Second reviewer: Dr. Manuel Giffels

2 "One of the great mistakes is to judge policies and programs by their intentions rather than their results." GARRY TRUDEAU

Contents

1 Introduction
2 Essentials
  2.1 LHC
  2.2 Compact Muon Solenoid
  2.3 Computing
      High Energy Physics Computing Jobs
      CMS Computing Model
      Local IEKP Computing
3 The HappyFace framework
  3.1 Monitoring
  3.2 General Concept
  3.3 Implementation
      HappyFace Core
      Database
      Modules
4 Local HappyFace Instance
  4.1 Collector
      Data Sources
      Additional Components
  4.2 HappyFace Instance
  4.3 Batch System Modules
      Job Status
      Site Status
      History
  4.4 Cache Modules
      Cache Details and Cache Summary
      Cache HitMiss
      Cache Life Time
      Cache Distribution
5 Conclusion and Outlook
A Appendix
  A.1 Configuration Keys
  A.2 Additional Plots and Tables
List of Figures
Listings
Bibliography

1 Introduction

The generation and processing of enormous amounts of data is one of the biggest challenges of modern society. IBM estimates that 2500 PB of data are generated every day [1]. Naturally, data-intensive analysis is and will be a crucial part of industry, economy and scientific research projects. Large computing clusters are required to process immense amounts of data in a short period of time, as is for example done to provide accurate real-time search results. Nowadays, computing infrastructure is oftentimes very complex. New and innovative technologies are tested and implemented on a daily basis, and many solutions are designed to improve specialized use cases. This rapid development improves the efficiency of data analysis on both the software and the hardware side. On the other hand, every new product has to be tested, documented and maintained. It is not uncommon that newly developed frameworks only offer inadequate means for monitoring, testing and optimizing their performance in day-to-day operation. The development focuses on solving the actual problem rather than on easy monitoring solutions. Many projects offer interfaces to obtain the necessary monitoring information but lack easy access and visualization. High Energy Physics is a perfect example of a highly data-driven science project [2]. Institutes like the Institute of Experimental Nuclear Physics (IEKP) at the Karlsruhe Institute of Technology (KIT) operate their own computing clusters to perform scientific analyses. The Meta-Monitoring framework HappyFace [3] was developed at IEKP to offer a new monitoring solution for modern computing systems. HappyFace provides an individually customized monitoring solution concerning virtually any aspect of a computing cluster. It unifies visualization and data handling, while the actual data acquisition can be freely customized to suit the needs of developers, operators and supervisors.

In the course of this thesis, a new meta-monitoring instance was set up in order to monitor the computing cluster at the IEKP. This cluster consists of several complex services and has grown over several years. It includes a batch system [4] with access to several computing resources such as external cloud resources or a large desktop cluster. Furthermore, modern hardware for high-throughput data analysis [5, 6] was added recently. Chapter 2 provides an introduction to the application of computing clusters in High Energy Physics and associated analysis tasks. Moreover, it introduces the computing structure at IEKP and its components as one of many possible applications of HappyFace. Chapter 3 provides a rough overview of the HappyFace framework and its features. In Chapter 4, the main features and design ideas behind the new HappyFace instance are explained in detail. Finally, Chapter 5 gives a quick outlook on further developments before concluding the thesis with a short summary.

2 Essentials

2.1 LHC

The Large Hadron Collider [7] is a ring accelerator and collider at CERN (European Organization for Nuclear Research) [8] near Geneva. To date, it is the biggest accelerator built by mankind and provides the highest particle energy ever artificially created on earth. With its recent upgrade, the LHC now collides two proton beams at a center-of-mass energy of 13 TeV instead of the previous 8 TeV. The accelerator contains two separate beam pipes to accelerate the two proton beams in opposite directions. The tunnel where the LHC is located has a total length of 26.7 km and was originally built for the LEP experiment between 1984 and 1989. Since 2015, the LHC has been capable of running at a luminosity of 10³⁴ cm⁻² s⁻¹ for proton-proton collisions and 10²⁷ cm⁻² s⁻¹ for heavy ions. With the Run-2 upgrade, the beam crossing interval was reduced to 25 ns, resulting in a crossing frequency of 40 MHz. Four major experiments are part of the LHC complex: ALICE [9], ATLAS [10], CMS [11] and LHCb [12]. ALICE (A Large Ion Collider Experiment) was designed to analyze the QCD (Quantum Chromodynamics) sector of the Standard Model by studying the collision of heavy nuclei. LHCb is an experiment which was designed to study heavy flavor physics. Its goal is to find indirect evidence of new physics, for example by measuring CP violation and rare decays of B mesons. ATLAS (A Toroidal LHC ApparatuS) and CMS (Compact Muon Solenoid) are the two biggest detectors. They are designed as multi-purpose detectors to cover a wide range of physics topics. Recent successes of ATLAS and CMS were the discovery of the Higgs boson [13] and the increase in precision of several other Standard Model parameters.

Figure 2.1: Overview of the CMS detector [14]

2.2 Compact Muon Solenoid

Like ATLAS, the CMS detector is built in a barrel shape around the beam pipe, as shown in Figure 2.1. In total, the detector is 21.6 m long and has a diameter of 14.6 m. Like all modern detectors, CMS is made up of several sub-detectors, each serving a dedicated role in particle identification and measurement. The innermost part is a silicon tracker that is made up of silicon pixels and silicon micro strips. This inner tracker allows a spatial resolution of 15 µm to 20 µm and consists of a barrel region and two end caps. This allows a very precise reconstruction of the innermost particle tracks originating from the collision point. The calorimeters are placed around this silicon tracker. CMS has an electromagnetic as well as a hadronic calorimeter. The electromagnetic calorimeter consists of more than 75 000 lead tungstate crystals. This material has a very high density and a short radiation length, which allows the calorimeter to be very compact while still ensuring full energy deposition. The hadronic calorimeter is positioned around the electromagnetic calorimeter so that no particle energy gets lost. A mix of brass absorber and plastic scintillator material is used to determine the energy of hadronic particles.

A superconducting magnet is located around the calorimeters and the inner silicon tracker. The magnet is designed to ensure a homogeneous magnetic field of 3.8 T in the inner part of the detector. The magnet is operated at a temperature of 1.8 K and is the largest superconducting magnet ever built [15]. Its magnetic field is essential for the measurement of the charge and transverse momentum p_T of particles in the silicon tracker and the calorimeters. A muon system is built around the superconducting magnet and interleaved with the return yoke of the magnet. As the name indicates, CMS was built to measure and detect muons with high precision. The muon system is able to identify and track muons. Any charged particle that is able to pass the two calorimeters and is detected by the muon system is most likely a muon. Since the muon system is the outermost part of the detector, it is also the biggest part of the detector. It uses gas-filled chambers to track particles, which lowers the costs of the detector. Since 40 million bunch crossings take place inside the detector every second, it is essential to use very efficient triggers to filter out uninteresting events. CMS uses a two-step system: a Level-1 Trigger (L1) and a High-Level Trigger (HLT). The Level-1 Trigger is made of programmable electronics and is fully automated. The L1 combines information from the trackers and calorimeters to make a very fast decision whether an event is potentially of interest. The HLT is a software trigger with the purpose of filtering events using basic event reconstruction to look for characteristic physical signatures. It executes a quick analysis using algorithms designed to be very fast rather than exact. In the end, a few hundred events per second are selected, resulting in a data flow of roughly 600 MB s⁻¹ [16] for Run 2.

2.3 Computing

In order to make use of the data produced by the detector, a great variety of software and computing resources is needed to enable a successful analysis of the measured data.

High Energy Physics Computing Jobs

There are three main computing tasks in High Energy Physics. Theoretical predictions of events have to be calculated, the detector data has to be reconstructed and a physical analysis has to be made.

Monte Carlo Simulations

Monte Carlo simulations are essential for comparing measured data with theoretical predictions. Monte Carlo datasets are composed of a simulation of particle collisions as well as a full simulation of the behavior of the resulting particles in the detector.

In 2012, the CMS collaboration simulated more than 6.5 billion events. These simulations get more demanding with a higher center-of-mass energy: more processes are possible and more processes occur in addition to the hard interaction process. Although recent software improvements have increased the speed of Monte Carlo simulations (by up to 50 %) [17], the need for simulated events is still difficult to satisfy. Improving the simulation process will continue to be one of the major challenges for the computing infrastructure of High Energy Physics. Monte Carlo simulations are CPU intensive, but their Input/Output (I/O) load is very low since they produce data themselves and only need simulation parameters as input. Overall, they are the most time-consuming computing task in High Energy Physics.

Data Reconstruction

In this process, signals recorded by the detector get reconstructed into physical objects. Signals from the tracker allow a reconstruction of particle tracks, while particle energies are calculated from the calorimeter data. By putting all this data together, the initial interaction process that took place can be reconstructed and displayed, as shown in Figure 2.2.

Figure 2.2: Reconstruction of an event in CMS showing the largest jet pair event observed so far. The di-jet system has a total mass of 6.14 TeV [18].

The recent upgrade of the LHC leads to an increase of

complexity concerning this task, as more events have to be reconstructed and the amount of pile-up processes (soft scattering processes that take place during the same bunch crossing) grows due to the increase of the center-of-mass energy. The reconstruction is an I/O-intensive task, since raw detector data is needed to create a much more compact physical dataset for further analysis. The reconstructed datasets are often reduced even more to exclude variables and parameters that are not relevant for a certain physical analysis and only take up valuable disk space. This process is called skimming and is very useful for saving resources and time, since the datasets being used get smaller.

Physical Analysis

The physical analysis itself is very diverse. The requirements differ depending on the type and size of the datasets that are used in the analysis. In general, physical analysis tasks are I/O intense, as they use different datasets from reconstructed and simulated events.

CMS Computing Model

In order to provide solutions to these tasks, the LHC uses a specialized computing model, which is realized in the Worldwide LHC Computing Grid (WLCG). The WLCG is the world's largest computing grid, connecting more than 170 computing facilities in more than 40 countries. "The mission of the WLCG project is to provide global computing resources to store, distribute and analyse the 30 Petabytes of data annually generated by the LHC at CERN [...]" [2]. Instead of using a few large resources, such as supercomputers, the WLCG relies on sharing the overall load between many computing centers. This approach is much more dynamic. In HEP, a high output over a long period of time is favored rather than a high output per minute or second [19]. The WLCG approach allows a high overall output whilst granting contribution to everybody who wants to be part of the community, regardless of the size of the resources provided. In order to ensure a well-organized structure, the WLCG uses a hierarchical model of resources:

1. The detector data enters the grid at the Tier0 computing resource at CERN. Here the data is aggregated, stored on tape and distributed further.
2. Tier1 centers are also responsible for safekeeping and data reconstruction.
3. Tier2 centers' tasks are simulations and processing of the data.

4. Tier3 centers are utilized for physical analysis by the users of the WLCG. They are only loosely connected to the grid.

This model was designed to provide the CMS detector measurements to as many people as possible. It enables smaller universities and institutes to do successful research without the need for excessive computing resources. A more detailed description of the CMS computing model can be found in [20, 21].

Local IEKP Computing

At the IEKP, several local computing resources are available and listed in the following. The available resources are comparable to a Tier3 center of the WLCG. New systems are implemented and tested here, for example a cache-based analysis middleware solution (HPDA) [5]. The local resources are joined together via the batch system HTCondor.

Processing Resources

A number of different local resources are usable at IEKP. The most obvious ones are the desktop PCs. About 60 desktop machines are available with different specifications depending on the age of the machine. Specifications range from old dual-core PCs with 2 GB of RAM to new ones with modern i7 processors, SSDs and enough RAM to support regular desktop work as well as job execution from the batch system simultaneously. Another resource is the EKPSG/EKPSM cluster. It adds a distributed caching mechanism to the existing work flow of a batch system (described in the Cache System paragraph below). The cluster consists of 5 machines, each providing 32 cores and 64 GB of RAM as well as SSDs for caching and hard drives for storage. IEKP owns several file servers to store skims, datasets and analysis results. Furthermore, IEKP hosts several service machines. These servers have multiple purposes such as hosting general infrastructure (firewall, web space, ...) or specialized services like HTCondor. Their specifications differ from server to server. All these services are connected to the backbone of the SCC (Steinbuch Centre for Computing), which provides the IT management at KIT.

Batch System

HTCondor is a high-throughput distributed batch computing system [4]. The project has existed since 1984 and is developed and maintained at the University of Wisconsin-Madison. It is available for every common operating system. Its main goal is to allow an easy management of computing resources whilst staying flexible and stable even for big computing systems.

Figure 2.3: A representation of the HTCondor setup at IEKP. Users submit jobs to the HTCondor manager. The HTCondor manager handles the execution of jobs and returns the results, including the location of the result data, to the user. Icons made by Freepik, licensed under Creative Commons BY 3.0.

The basic features of HTCondor include a job management mechanism, scheduling and priority policies as well as resource management and monitoring. In the default work flow, a user submits a job to HTCondor and HTCondor chooses where and when to run the job, supervises the process and informs the user after the job is completed. The strength of HTCondor compared to other batch systems is its ability to utilize the available cluster processing in an efficient and effective way, such as detecting idle workstations and using otherwise wasted computing resources. HTCondor is able to adapt without requiring the resources to be permanently available. HTCondor can even move jobs away from certain machines without the need to restart the whole job. Resource requests (jobs) and resource offers (machines) get matched based on a policy that is defined by the user using the ClassAds language [22]. Figure 2.3 shows that HTCondor is a central component of the day-to-day work setup at IEKP.

Cache System

The High-Performance Data Analysis concept (HPDA) is being developed at KIT and addresses the issue of high-I/O analyses being inefficient due to insufficient bandwidth. The idea behind HPDA is to combine the strengths of dedicated storage, where all files are saved on one file server and every node accesses the files from there, and integrated storage, where every node has its own file storage. A system of shared caches is used to fulfill this task. In order to work efficiently, HPDA includes an API (application programming interface) to exchange information with the batch system (HTCondor). In addition, a caching algorithm is implemented to determine whether a file is worth caching, and

if the cache is full, which files are going to be deleted in order to keep the cache as full as possible. A coordinator component is responsible for coordinating all the worker nodes and their caches [6]. More information can be found in the original design paper [5].

3 The HappyFace framework

3.1 Monitoring

Monitoring is a crucial part of modern computing infrastructure. Monitoring means "supervising activities in progress to ensure they are on-course and on-schedule in meeting the objectives and performance targets" [23]. The number of cloud-based services has increased rapidly in the last years, and so has the complexity of the infrastructures behind these services. To properly operate and manage such complex infrastructures, effective and efficient monitoring is constantly needed [24], especially for every modern computing system used in High Energy Physics. Monitoring is the key tool for providing information on hardware and software performance as well as user-based statistics. It allows a performance and workload analysis on top of an activity and availability analysis. This is crucial when testing new systems such as the HPDA framework. Monitoring data can further be used to inform future investment steps, identify problems and optimize the performance of monitored computing systems.

3.2 General Concept

The idea behind HappyFace is to create a meta-monitoring framework instead of a traditional monitoring framework. Rather than collecting raw data where it originates (e.g. on every node in a computing network), HappyFace is designed to aggregate data from existing monitoring services and provide this information in a more compact way. In addition, an interpretation of the data is done in HappyFace. HappyFace collects data from different data sources and displays them on one website, no matter how different the data sources are. With aggregated data stored in a database, HappyFace's history function allows showing the previous state of monitored resources. New data gets acquired at a regular time interval to ensure

up-to-date data. The modular structure of HappyFace is a flexible approach and allows for a highly customizable monitoring service, including results of custom tests and plots. As this is only a small overview, more information can be found in the original design paper [3].

Figure 3.1: Diagram displaying the work flow of the HappyFace framework from data sources to final output on the web page. Adapted from [3].

3.3 Implementation

The main design idea is to create a monitoring framework that is as simple as possible. The original version was implemented in Perl in 2008 [25]. The current version 3.0 is written in Python and can be found at [26]. HappyFace has several components: the core, a database and the actual modules. Figure 3.1 illustrates the work flow of HappyFace.

HappyFace Core

The HappyFace core is the heart of any HappyFace instance. It is responsible for providing the basic features of HappyFace such as a module template, a web page template, access to the database, a download service to load data into HappyFace, and logging. HappyFace contains two main Python scripts: acquire.py executes all module Python scripts to fill the database with the newest data, and render.py is executed every time a user accesses the web page and generates an output web page using the module HTML templates and the data stored in the database.

Database

The database is the central place where all information is stored. The default installation of HappyFace uses an SQLite database, but a setup with a PostgreSQL database is also possible. Every module obtains its own data table in the database. Every time a new acquire.py cycle gets triggered, a new row of data is added to the table. Modules may also feature an arbitrary number of subtables. These subtables can be used to add additional information to a module, e.g. a detailed overview of all jobs in a batch system. Subtables are linked to the main table of the module in order to maintain a logical structure. All parameters stored in the database can be accessed via the plotgenerator. The plotgenerator is a powerful tool which allows users to receive detailed information on every parameter available in HappyFace. As demonstrated in Figure 3.2, every value stored in the database can be plotted in a custom-chosen time frame. The values do not have to belong to the same module; every value available in the database can be plotted.

Modules

Modules define the content of a HappyFace instance. Every module can be designed individually. Each module has its own configuration file in which an arbitrary number of configuration parameters, such as the location of the data source, are definable. The module code itself is responsible for processing information and storing the desired values in the database. Plots are generated during this step using the matplotlib library and saved as Portable Network Graphics files in a static archive. To reduce the size of the database, only the location of a plot is stored in the database rather than the plot itself. Every module consists of two files: a Python script which collects, processes and stores the data in a database table, and an HTML template, where the style of the module output such as tables, plots and interactive elements is defined. This is done via the MAKO template library. MAKO combines the HTML template and data from the database to provide the full HTML code that is used for the website. Modules may be organized into arbitrary categories. Per category, an arbitrary number of modules can be featured. Each module has a status parameter which is used to determine the quality of a module. The status ranges from OK to Critical (green, yellow and red). The status of a whole category is determined by the statuses of the modules it contains and is displayed in the navigation bar. The status calculation is fully up to the module developer. The design may differ from module to module. The fact that every module has an individual Python script allows nearly any monitoring idea to be realized.

Figure 3.2: Plot generated by the plotgenerator tool of HappyFace. It shows the number of claimed slots and running jobs. The curves are almost identical, which indicates that the system is running just fine. As the acquire.py cycle is executed once every 15 minutes, a data point is taken every 15 minutes.

4 Local HappyFace Instance

In order to monitor the local resources at IEKP, a dedicated monitoring system was needed. The HappyFace framework proved to be the best solution for this task, as many different parameters from various data sources can be monitored without problems. In order to fulfill this task, the ekplocal project consists of two separate parts running on two different machines: the collector and the actual HappyFace instance. The project is available at [27].

4.1 Collector

The first component is the collector. It is a small software tool currently running on ekpcms6. Its purpose is to collect data from various sources, put it in a process-friendly format and transfer the data to the ekphappyface machine, where the HappyFace instance is located. The collector is independent from HappyFace, but the output data design is in line with the HappyFace design to allow an easy interaction. The collector reformats information from the data sources and combines it to create several new metrics that were not available before. The collector is written in Python.

Data Sources

Four different data sources are used at the moment: the batch system (HTCondor), the cache worker nodes (ekpsg/ekpsm), the cache coordinator and Ganglia. Each data source behaves differently and requires a different extraction and processing method.

Figure 4.1: Scheme displaying the work flow as well as the data flow of the ekplocal project. Input data is collected by the collector running on ekpcms6. After that, the data is transferred to ekphappyface, where the HappyFace core processes the data, stores it in a database, generates the plots and displays the output web page. Icons made by Designmodo and Freepik, licensed under Creative Commons BY 3.0.

Batch System

HTCondor provides an interface which returns information on the command line. Users are able to specify the parameters to be displayed as well as to constrain certain parameters. A complete documentation of the available HTCondor commands can be found at [28]. The collector uses three different HTCondor API calls:

1. condor_q: provides information on currently running jobs and active users (used in the module Job Status),
2. condor_status: provides information on currently active machines and site usage (used in the module Site Status),
3. condor_history: provides information on jobs that have left the batch system (used in the module History).

A minimal sketch of how these command-line calls could be scripted is shown below.
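The sketch runs the three tools from Python and stores their raw text output for later processing; it is a simplified illustration of what a collector-like script could do, not the actual collector code, and the output file names are made up.

import subprocess

# Hypothetical illustration: run the three HTCondor command-line tools the
# collector relies on and keep their raw text output for later processing.
COMMANDS = {
    "condor_q.txt": ["condor_q"],              # currently queued and running jobs
    "condor_status.txt": ["condor_status"],    # currently known slots and machines
    "condor_history.txt": ["condor_history"],  # jobs that already left the batch system
}

def collect_raw_condor_data(output_dir="."):
    """Run each command and write its standard output to a file."""
    for filename, command in COMMANDS.items():
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        with open(f"{output_dir}/{filename}", "w") as handle:
            handle.write(result.stdout)

if __name__ == "__main__":
    collect_raw_condor_data()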

The condor_status call used for the Site Status module is wrapped in a small shell command (Listing 4.1):

echo "{" $(condor_status -format "%s" Name -format ":{\"State\":\"%s\"" State -format ",\"Activity\":\"%s\"" Activity -format ",\"LoadAvg\":\"%s\"}," LoadAvg) "}" > condor_status.json

Listing 4.1: Example of the condor_status command used for generating data for the Site Status Module.

The command in Listing 4.1 generates a file called condor_status.json. This file is an almost valid JSON file (the syntax gets corrected in the module code). The actual API call is done by the condor_status command. The echo command is used to minimize the corrections that have to be done later on in the HappyFace instance.

Cache Worker Nodes

The cache worker nodes provide an interface responding to HTTP calls. The API returns a summary status of all worker nodes. In addition, it is also able to return a list of every cached file, its size, age and a cache parameter to determine the importance of a file (used in the modules Cache Details and Cache Summary). The collector accumulates this information from every worker node and merges it into two different JSON files. One contains the summary data and one the detailed information on every file present across all caches, as seen in Listing 4.2.

{"ekpsg02": {
    "/storage/a/cmetzlaff/htda/benchmark6_1/kappa_DoubleMu_Run2015D_Sep2015_13TeV_root": {
        "allocated": ...,
        "maintained": ...,
        "score": ...,
        "size": ...
    }, [...]
}, [...]}

Listing 4.2: The collector output from the worker nodes. The JSON file contains the name of the worker node, a full file name, the file size, the internal caching score and the points in time a file was last allocated and maintained (further explained in Chapter 4.4). The time is given in UNIX time and the file size in bytes.
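As a rough sketch of this merging step, the snippet below aggregates per-node file listings into one summary and one details structure. The node names follow the ekpsg cluster described in Chapter 2.3, but the HTTP endpoint and the field handling are assumptions modeled on Listing 4.2, not the actual worker node API.

import json
import requests

# Assumed worker node names; the endpoint below is a placeholder for the real API,
# which returns data shaped like Listing 4.2: {file_path: {"allocated": ..., "score": ..., "size": ...}}
WORKER_NODES = ["ekpsg01", "ekpsg02", "ekpsg03", "ekpsg04", "ekpsg05"]
FILES_URL = "http://{node}:8080/cache/files"  # hypothetical endpoint

def collect_cache_data():
    details = {}  # per node: full file listing
    summary = {}  # per node: number of files and total cached size
    for node in WORKER_NODES:
        try:
            files = requests.get(FILES_URL.format(node=node), timeout=10).json()
        except requests.RequestException:
            continue  # the real collector logs unavailable worker nodes
        details[node] = files
        summary[node] = {
            "file_count": len(files),
            "cached_size": sum(entry["size"] for entry in files.values()),
        }
    return summary, details

if __name__ == "__main__":
    summary, details = collect_cache_data()
    with open("cache_summary.json", "w") as handle:
        json.dump(summary, handle)
    with open("cache_details.json", "w") as handle:
        json.dump(details, handle)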

Cache Coordinator

The Cache Coordinator interface responds in the same fashion as the cache worker node API. At the moment, two different pieces of information are extracted from the coordinator, the first being the lifetime of files in the cache. The coordinator retains the point in time a file was deleted from the cache and how long this file stayed cached (used in the module Cache Life Time). The second statistic is whether a job used cached files and ran on the right worker node (used in the module Cache HitMiss).

Ganglia

Ganglia is a monitoring system for high-performance computing systems such as the setup used at IEKP [29]. Ganglia gathers information on usage parameters such as CPU load or network usage for every machine in the network. Machines can also be organized in groups to give an overview, e.g. of all desktop machines. Ganglia plots measured parameters on the fly and offers a web front end for easy access to the monitored data. The plots are extracted with a bash script using wget.

Additional Components

The upload component of the collector uses the Python package requests. Files are uploaded into a dedicated folder on ekphappyface. The security authentication is handled via the Apache server running on ekphappyface. The collector logs error messages such as unavailable worker nodes or failed uploads of files. A cronjob triggers a collection cycle every fifteen minutes. The average duration of a cycle is shown in Figure 4.2.

Figure 4.2: Plot showing the distribution of the collector run time. In total, 1979 collection cycles were completed between 10 Dec 2015 and 04 Jan 2016. The first and last bin are overflow bins. An unexpected error in the network connection of ekpcms6 caused run times longer than 120 s. Generally, a full cycle lasts less than 60 s.
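The upload step described above could look like the following sketch; the target URL, the credential handling and the file list are placeholders, since the real authentication is configured on the Apache side of ekphappyface.

import logging
import requests

logging.basicConfig(filename="collector.log", level=logging.INFO)

# Placeholder upload target and credentials; the real values depend on the
# Apache configuration on ekphappyface and are not part of this sketch.
UPLOAD_URL = "https://ekphappyface.example.org/upload/"
AUTH = ("collector", "secret")

def upload_files(paths):
    """Upload the produced JSON and PNG files to the HappyFace machine."""
    for path in paths:
        with open(path, "rb") as handle:
            response = requests.post(UPLOAD_URL, files={"file": handle}, auth=AUTH)
        if response.status_code != 200:
            logging.error("Upload of %s failed with status %s", path, response.status_code)

if __name__ == "__main__":
    upload_files(["condor_status.json", "cache_summary.json", "cache_details.json"])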

4.2 HappyFace Instance

The HappyFace instance is located on the ekphappyface machine. At the moment, the instance contains five different categories: Batch System Modules, Cache Modules and three categories showing Ganglia plots of the ekpsg/ekpsm nodes, the file server and the overall cluster. The acquire.py process is triggered four times per hour to roughly match the collector timing.

4.3 Batch System Modules

Three Batch System Modules monitor the current status of the batch system. They consist of

1. a module that monitors the current status of jobs,
2. a module that monitors the current site usage,
3. a module that displays the batch system usage during the last day.

Jobs are user-submitted computing tasks (Chapter 2.3.1). Slots, on the other hand, are the most finely grained representation of computing resources where a job is able to run. Slots are provided by the different computing resources connected to the batch system.

Jobs

HTCondor enforces a certain job work flow. At first, a job is submitted to the batch system and waits until a fitting slot is found. Then the job is executed based on the submitted configuration parameters and, once finished, its status code and output information are returned to the user. In order to categorize jobs, HTCondor relies on status codes. The status codes shown in Table 4.1 represent the state of a job at a given point in time.

Table 4.1: Job status codes used in HTCondor. Codes 5 and 6 can appear, though it is quite unlikely.
  code  definition  explanation
  1     idle        Jobs that are either in queue or paused
  2     running     Jobs that are currently being executed in a slot
  3     removed     Jobs that were manually removed from HTCondor
  4     completed   Jobs that successfully finished their task
  5     held        Jobs that are on hold
  6     transfer    Jobs that are returning their output

Jobs are very individual. Their run time can vary between 10 minutes and up to 24 hours. Nearly all jobs are single-core jobs, so they use only one CPU, but it is also possible to process multi-core jobs.

Slots

Sites are built from several multi-core machines. Each CPU core in a site is assumed to be able to run one CPU-intensive task. To represent this structure, a site offers slots to the batch system, where each slot represents one CPU core and is able to run

one job at a time. Most of the jobs use one core and 2 GB of RAM. For example, a machine with 16 cores is able to host and run a total of 16 jobs at the same time. Therefore, the machine has 16 individual slots listed in HTCondor. The current status of a slot is also represented via a status code in HTCondor. There are only two common slot statuses: either a slot is claimed, or it is unclaimed. In theory, HTCondor does differentiate between a state and an activity of a slot. Experience shows that all cases other than those two are negligible. In practice, a slot is either claimed and runs a job, or it is unclaimed and idle.

Job Status

The first module is the Job Status Module. It provides an overview of the jobs that are currently processed by HTCondor. The module uses the parameters listed in Listing 4.3.

{"ekpcms6.physik.uni-karlsruhe.de#...#...":      // job ID
    {"RAM": "8",                                 // RAM requested by the job in gigabytes
     "Status": "1",                              // current status of the job
     "Cpu_1": "0.0",                             // user CPU time used
     "LastJobStatus": "0",                       // status of the job before the present status
     "HostName": "undefined",                    // name of the machine the job is running on
     "User": "sieber@physik.uni-karlsruhe.de",   // user ID
     "RequestedCPUs": "1",                       // number of cores requested by the job
     "QueueDate": "...",                         // point in time the job was submitted
     "JobStartDate": "undefined",                // point in time the job started
     "Cpu_2": "0.0"                              // system CPU time used
    }
}

Listing 4.3: Example of the data extracted for a single job.

The main component of this module is a jobs-per-user plot, as shown in Figure 4.3. To create the plot, jobs are sorted by user names and their current status. All common status codes given in Table 4.1 are included. "Removed" and "completed" jobs can show up if the collector collects data from condor_q during a management cycle of HTCondor. These jobs do not show up in the next iteration, as they are no longer considered as currently processed by HTCondor. "Queued" jobs have a present Status of 1 and a LastJobStatus of 0. "Queued" jobs have not started running and are thus waiting to get started. The queue time of a job can be calculated via

    queue time = current time - QueueDate.    (4.1)

Figure 4.3: Plot shown in the Job Status Module. This plot was created on 17 Dec 2015, 15:09. In total, there were 98 jobs "running" and 50 jobs "queued".

It is also possible to determine the efficiency of jobs. The efficiency is defined as

    efficiency = (Cpu_1 + Cpu_2) / (current time - JobStartDate).    (4.2)

The efficiency ranges from one, a perfect job that uses the CPU all the time, to zero, no CPU usage at all. The parameter must be interpreted with care: jobs that have just started may have a low efficiency reported because of timing issues between HTCondor, the collector and HappyFace. The CPU time is not updated in real time by HTCondor, which may result in a considerably worse efficiency being reported although the job is running just fine. This effect disappears for jobs with longer run times. The actual RAM usage of a job is difficult to track. HTCondor provides several parameters concerning the RAM usage, but none of these seem to represent the genuine usage, as it is difficult to consistently describe RAM usage. The requested RAM is the most accurate value HappyFace provides concerning this issue, although the real RAM usage is most likely lower than this value. The module provides a detailed table that can be found in Appendix Table A.1. It displays the plotted data as well as the efficiency per user and the sites a certain user utilizes at the moment. The HappyFace color code is used to indicate critical values: if the efficiency of "running" user jobs is too low, the table column is marked in red.
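The two quantities from Equations 4.1 and 4.2 can be computed directly from a job record such as the one in Listing 4.3. The helper below is only an illustrative sketch and assumes that the timestamps are available as UNIX times.

import time

def queue_time(job, now=None):
    """Queue time according to Eq. 4.1: current time minus QueueDate."""
    now = time.time() if now is None else now
    return now - float(job["QueueDate"])

def efficiency(job, now=None):
    """Efficiency according to Eq. 4.2: (user CPU time + system CPU time) / wall time."""
    now = time.time() if now is None else now
    wall_time = now - float(job["JobStartDate"])
    if wall_time <= 0:
        return 0.0  # freshly started jobs cannot be rated reliably
    return (float(job["Cpu_1"]) + float(job["Cpu_2"])) / wall_time

# Hypothetical example record containing only the fields used above
job = {"QueueDate": "1450000000", "JobStartDate": "1450003600",
       "Cpu_1": "1700.0", "Cpu_2": "100.0"}
print(queue_time(job), efficiency(job))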

The status of the module is influenced by multiple factors. The parameters used for the calculation of the status can be configured via the configuration parameters of the module. If the module status is not OK, the module returns an error message which explains why. The configuration parameters are listed in Appendix A.1.

Site Status

The Site Status Module is designed similarly to the Job Status Module. It displays the details of the different computing resources available to the batch system and their current status. The module provides a visual overview, as shown in Figure 4.4. It identifies every slot that is online and is able to determine the number of actually active machines.

Figure 4.4: Plot shown in the Site Status Module. This plot was created on 08 Jan 2016, 12:09 and displays how many slots are running per site and their status.

In order to shorten site names, the module requires a list of site names as a configuration key. By using these, the module is able to match slot names like slot1@ekpcloudc9577fff-72a1-4bc8-9ee4-dd476a689bd2.ekpcloud to the site ekpcloud. This configuration parameter ensures that the sites are identified correctly, regardless of the naming convention used in HTCondor. Additional configuration keys used for the status calculation can be found in Appendix A.1. A minimal matching sketch is given below.
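The matching of slot names to configured site names could be implemented as in the following sketch; the site list is the one from the module configuration in Appendix A.1, while the function itself is an illustrative assumption, not the module code.

# Site names as given in the module configuration (Appendix A.1)
SITES = ["gridka", "ekpcms6", "ekpcloud", "ekpsg", "ekpsm", "bwforcluster"]

def match_site(slot_name, sites=SITES):
    """Map a slot name such as
    slot1@ekpcloudc9577fff-72a1-4bc8-9ee4-dd476a689bd2.ekpcloud
    to the site ekpcloud. Returns 'unknown' if no configured site name matches."""
    machine = slot_name.split("@")[-1]  # drop the leading 'slotN@' part
    for site in sites:
        if site in machine:
            return site
    return "unknown"

print(match_site("slot1@ekpcloudc9577fff-72a1-4bc8-9ee4-dd476a689bd2.ekpcloud"))  # ekpcloud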

A detailed table displays the plotted data, as well as the average load on claimed and unclaimed slots. It is very easy to identify malfunctioning slots, as a claimed slot should have a high average load whilst an unclaimed slot should have an average load close to zero. The average load of slots that recently changed their status can be unrepresentative, similar to the efficiency of jobs. The HTCondor documentation does not explain which time window is used to calculate this value. Colors are used to quickly identify sites with bad load values. Another feature is the HTCondor version list. This table shows which site uses which HTCondor client version. The HTCondor client versions may differ from site to site, but they are all compatible with each other.

History

The History Module is the most complex module concerning the batch system. It combines information from condor_q and condor_history and allows creating a full batch system usage overview of the last day. The output of this module consists of two plots. One shows the status of all jobs during the last day (Figure 4.5a), the other one shows the site usage during the last day (Figure 4.5b).

Figure 4.5: Two plots used in the History Module: (a) job history plot, (b) site history plot. They were generated on 12 Jan 2016 and represent the usage of the IEKP resources in the last 24 hours.

The collector merges information from condor_history and condor_q into one data file. Essentially, six parameters are needed for the creation of the job history plot (all points in time rounded to hours):

1. QueueDate: point in time a job entered the batch system,
2. JobStartDate: point in time a job started "running", thus leaving the queue,
3. CompletionDate: point in time a job was completed; if a job is still "running", this value is zero,

4. EnteredCurrentStatus: point in time a job entered its current status (if a job is completed, CompletionDate and EnteredCurrentStatus are identical),
5. JobStatus: the current/latest status of a job,
6. LastJobStatus: the second-to-last status of a job.

By utilizing these parameters, the module is able to create an accurate timetable of every job. This is done by assuming that jobs always have a queue time and finish successfully during their first run, as the HTCondor interface does not return more than the last two status codes. This is an accurate description for most of the jobs. Figure 4.6 shows all possible combinations of job statuses.

Figure 4.6: Diagram showing all possible status configurations of a job (combinations of completion date, second-to-last status and last status) and how likely they are (common, rare, highly unlikely). Some of them, like "removed" followed by "removed", do not make a lot of sense and appeared most likely because of bugged jobs or failing machines. The most common jobs are the ones that finish in the normal fashion, so their last status is "finished", the second-to-last status is "running" and the completion date is not zero (2nd row). Another example are "queued" jobs: they have no completion date, the second-to-last status is not given and the last status is idle (5th row).

At first, the completion date of a job is checked. This allows a separation between finished and currently active jobs. The two available status codes of a job and their corresponding timestamps allow the reconstruction of a job's timetable. Lists with 25 entries, each representing

one hour of the last day for every possible job status are used to combine the timetable data of each individual job. The lists get filled based on the information which status a job is in at a given point in time, as shown in the example in Figure 4.7.

Figure 4.7: Sequence diagram of an example job. The job entered the batch system 20 hours ago, stayed in the queue for one hour, ran for one hour and then finished. The list entries plot_data_queued[4], plot_data_running[5] and plot_data_finished[6] are increased by one (index 0 represents 24 hours ago, index 1 represents 23 hours ago, etc.).

In the end, the plot is generated by plotting the status lists. The plot in Figure 4.5b utilizes the HostName parameter of every job, matches it with the given site names as in the Site Status Module and plots the values. A condensed sketch of the list filling is shown below.
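The sketch assumes UNIX timestamps, uses the simplification described above (a job is queued, then runs, then finishes) and the same 25-entry lists as in Figure 4.7; boundary handling is simplified, and this is not the actual module code.

import time

HOURS = 25  # index 0 represents 24 hours ago, index 24 the current hour

def hour_index(timestamp, now):
    """Map a UNIX timestamp to one of the 25 hourly bins of the last day."""
    return HOURS - 1 - int((now - timestamp) // 3600)

def clamp(index):
    return max(0, min(index, HOURS - 1))

def fill_history(jobs, now=None):
    now = time.time() if now is None else now
    plot_data_queued = [0] * HOURS
    plot_data_running = [0] * HOURS
    plot_data_finished = [0] * HOURS
    for job in jobs:
        q_idx = hour_index(job["QueueDate"], now)
        s_idx = hour_index(job["JobStartDate"], now)
        # a CompletionDate of zero means the job is still running
        e_idx = hour_index(job["CompletionDate"] or now, now)
        for hour in range(clamp(q_idx), clamp(s_idx)):
            plot_data_queued[hour] += 1    # hours spent waiting in the queue
        for hour in range(clamp(s_idx), clamp(e_idx)):
            plot_data_running[hour] += 1   # hours spent running
        if job["CompletionDate"]:
            plot_data_finished[clamp(e_idx)] += 1
    return plot_data_queued, plot_data_running, plot_data_finished

# Example corresponding to Figure 4.7: queued 20 h ago, started 19 h ago, finished 18 h ago
now = time.time()
example = {"QueueDate": now - 20 * 3600, "JobStartDate": now - 19 * 3600,
           "CompletionDate": now - 18 * 3600}
print(fill_history([example], now))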

4.4 Cache Modules

The Cache Modules monitor the status of the HPDA framework. They consist of

1. an overview module,
2. a module that monitors the efficiency of job distribution among the different machines,
3. a module that monitors the efficiency of dataset distribution among the different machines,
4. a module that monitors the lifetime of cached files.

In theory, files that are used frequently get cached and are available as a local copy, whereas files that are needed infrequently get stored on a remote file server. Caching a file means that it gets stored on a Solid-State Drive (SSD) connected to a worker node. The framework must detect which files are required by a job in order to send the job to the right worker node. Otherwise the job would have no access to the cached file and there would not be any benefit in using a cache system at all. Two components are responsible for keeping the cache up to date. An allocation algorithm determines the theoretical distribution of files on the caches. This process is triggered every 5 to 10 minutes. The maintain algorithm is responsible for enforcing this distribution and checking the actual file status. This is done at a longer time interval.

Score

The score is a parameter used to indicate how important a file is. Every time a job uses a certain file, the score of this file is increased. File scores also decrease if files are not used over a longer period of time. When the cache is full, the score determines which files have to be deleted and which files take up the freed space.

Cache Details and Cache Summary

These two modules use data from the cache worker nodes. A summary table gives an overview of the overall status of the cache: how many files are cached, how much space is available and used, and how many machines are available in the cluster. The details module features a table that shows how many files are stored on every machine. The plots generated for this module show the size distribution, the maintain and allocation time distributions and the distribution of the score parameter of every file. The plots can be found in Appendix Figure A.1.

Cache HitMiss

The Cache HitMiss Module is a fine indicator of the overall performance of the HPDA framework and the caching algorithm. As explained before, the full benefit of the HPDA setup is achieved if all files needed are cached and a job is running on the right machine. These two conditions are represented by the cachehit rate and the locality rate. The Coordinator calculates these two rates for every job. The cachehit rate indicates how many of the needed files were cached. The value ranges from one (all files cached) to zero (no files cached). The locality rate represents how many files were cached on the machine the job ran on. A locality rate of one means all files were cached on the machine the job ran on; a locality rate of zero means no files were available in the cache connected to the executing machine. If the files are distributed over multiple caches or not all files are cached, values between one and zero are also possible. This data gets represented in a 2D scatter plot, as seen in Figure 4.8.

Figure 4.8: Plot representing the locality rate and cachehit rate of jobs between 15 Dec 2015 and 22 Dec 2015. Jobs with a cachehit rate and locality rate of one are ideal.

Since the locality rate is dependent on the cachehit rate, only values located in the bottom triangle of the plot are possible. It is not possible to have a locality rate of 1 if only 50 % of the files used are actually in the cache. The data for this plot, as well as the point in time a job was executed, is provided by the Cache Coordinator, which allows constraining the period of time shown.
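The scatter representation itself only needs a few lines of matplotlib, as sketched below with made-up rate values; in the module, the rates come from the Cache Coordinator data.

import matplotlib
matplotlib.use("Agg")  # render to a file, as HappyFace stores plots as PNG files
import matplotlib.pyplot as plt

# Made-up example rates; the locality rate can never exceed the cachehit rate.
cachehit_rates = [1.0, 0.8, 0.5, 0.3, 1.0]
locality_rates = [1.0, 0.6, 0.5, 0.0, 0.7]

plt.scatter(cachehit_rates, locality_rates)
plt.xlabel("cachehit rate")
plt.ylabel("locality rate")
plt.xlim(0, 1.05)
plt.ylim(0, 1.05)
plt.savefig("cache_hitmiss.png")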

Cache Life Time

This module has a basic purpose: it displays the life time of files in the cache. As previously mentioned, the basic functionality of a cache includes the possibility to determine which files can be deleted from the cache in order to free up space for new files. The Cache Coordinator provides the information how long a deleted file stayed in the cache and when it was deleted. This information gets displayed in a histogram, as shown in Appendix Figure A.2.

Cache Distribution

Cache organization and the distribution of files are crucial parts of a cache-based computing system. In HPDA, files are grouped in so-called datasets. The different datasets emerge from the file structure the users use: files in the same folder belong to the same dataset. These datasets normally consist of multiple separate files. The file numbers range from one to several thousand files per dataset. In HPDA, an algorithm is implemented to distribute the files of one dataset equally among all machines in the cluster. In an ideal scenario, every machine caches an equally sized part of every dataset. This distribution is done by file size, not by file count.

{"ekpsg02": {
    "/storage/a/cmetzlaff/htda/benchmark6_1": {
        "file_count": 40,
        "size": 61440}, [...]
    "ds_count": 14,
    "error_count": 0,
    "status": "Aquisition successful"},
 "ekpsg04": {
    "/storage/a/cmetzlaff/htda/benchmark6_1": {
        "file_count": 40,
        "size": 61440}, [...]
    "ds_count": 14,
    "error_count": 0,
    "status": "Aquisition successful"}, [...]}

Listing 4.4: Example of the data used to determine the distribution of a dataset among the machines. In this case, the dataset benchmark6_1 is stored on ekpsg02 and ekpsg04 with equal file size and an equal number of files. This would be an ideal distribution.

The data is extracted from every worker node in the HPDA cluster, filtered and combined into one JSON file by the collector instance. The data provided by the collector is shown in Listing 4.4. In order to test the distribution algorithm, the Cache Distribution Module

calculates a metric to determine how optimal a dataset distribution is:

    metric = sum_{k=1}^{n} ((optimal_size - actual_size(k)) / dataset_total_size)²,    (4.3)

where n is the total number of machines in the cluster. The optimal_size is calculated via

    optimal_size = dataset_total_size / n,    (4.4)

and actual_size(k) is the aggregated size parameter of machine k. The value gets normalized in order to ensure metric values between zero and one:

    metric_norm = metric / (1 - 1/n).    (4.5)

In order to visualize this metric, a 2D scatter plot is used. Figure 4.9 shows the metric plotted against the number of files in a dataset.

Figure 4.9: Plot showing the dataset distribution on 12 Jan 2016. There are a total of 25 datasets cached, of which 2 were not completely read. In addition, one machine was not active, so the actual distribution may differ.

Some metric values are not possible due to the fact that datasets with fewer files than machines in the cluster cannot be distributed equally over all machines, since files are not split. A short computational sketch of the metric is given below.
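Following Equations 4.3 to 4.5, the normalized metric can be computed per dataset from collector output like that in Listing 4.4. The sketch below assumes that the cached sizes of one dataset on all machines are already known; it is an illustration, not the actual module code.

def distribution_metric(sizes_per_machine):
    """Normalized distribution metric following Eqs. 4.3 to 4.5.

    sizes_per_machine: cached size of one dataset on every machine in the
    cluster (machines without a share of the dataset contribute 0).
    Returns 0 for a perfectly even distribution and 1 for the worst case.
    """
    n = len(sizes_per_machine)
    total = sum(sizes_per_machine)
    if n < 2 or total == 0:
        return 0.0
    optimal = total / n  # Eq. 4.4
    metric = sum(((optimal - size) / total) ** 2 for size in sizes_per_machine)  # Eq. 4.3
    return metric / (1.0 - 1.0 / n)  # Eq. 4.5

# Example: a 61440 B dataset cached in equal parts on 2 of 5 machines
print(distribution_metric([30720, 30720, 0, 0, 0]))  # 0.375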

5 Conclusion and Outlook

The ekplocal HappyFace meta-monitoring instance, which was set up during this thesis, monitors the local computing setup at IEKP. This includes the HPDA caching system and the local batch system HTCondor. Multiple modules were specifically designed and developed to monitor this complex infrastructure. The new instance adds monitoring components to key infrastructures for day-to-day work at IEKP. A custom collector was implemented to aggregate data from different data sources and provide presorted sets of data for the HappyFace modules. The different modules allow a quick overview of key system parameters. The ekplocal project still offers room for improvements and extensions. Several features can be added to make this monitoring instance even more powerful. New developments in the monitored resources provide new features and unexpected correlations, so adaptations might be required. Additional monitoring targets such as the cloud manager ROCED [30] can easily be added due to the modular design of HappyFace. ROCED is being developed at IEKP and handles the allocation of cloud resources based on current demand. It is able to request, start and shut down virtual machines from external cloud resources and embed them into the current cluster setup. ROCED features an interface which can be used to access useful data and implement several HappyFace modules. In its current state, the ekplocal instance is running without major problems and provides a status report with newly aggregated data every 15 minutes (a snapshot of the web output is shown in Figure 5.1). It provides satisfying status information about the computing instance and grants a quick overview of the overall health of the computing resources at IEKP. The new monitoring instance will help to improve the general performance of the computing resources at IEKP to ensure a smooth analysis process in the future.

Figure 5.1: Snapshot of the ekplocal HappyFace instance. It shows the Job Status Module and some detailed information on the current status. The navigation bar with the status of the different categories is shown at the top. This snapshot was taken on 16 Feb 2016.

A Appendix

A.1 Configuration Keys

Default Keys

Every module has a set of default configuration keys. These are:

[module_name]
module = PythonScriptName
name = Name on frontend
description = description on frontend
instruction = instruction on frontend
type = rated
weight =
# Source Url
sourceurl = "link to source file"
# size of the plot in y (if the module features a plot)
plotsize_y = 5
# size of the plot in x
plotsize_x = 8.9

Listing A.1: Default configuration keys.

Depending on the module, a number of different config keys are used.

Job Status

# status parameters
# the minimum amount of jobs required to determine a status
jobs_min = 50
# how long jobs stay queued at most, in hours, for the status
qtime_max = 48
# ratio between running and idle jobs for the status
running_idle_ratio =
# how many long-queued jobs for the status

qtime_max_jobs =
# minimal efficiency for the status
min_efficency =
# different sites - input a python list with strings
sites = ["gridka", "ekpcms6", "ekpcloud", "ekpsg", "ekpsm", "bwforcluster"]
# additional plotting parameters
# distance between the biggest bar and the right end of the plot in %
plot_right_margin =
# width of the bars in the plot
plot_width =
# how many bars the plot shows at least before scaling bigger
min_plotsize = 3
# x-value when to use a log scale in the plot
log_limit = 500

Listing A.2: Additional Job Status configuration keys.

jobs_min: the minimum amount of jobs required to determine a status.
qtime_max and qtime_max_jobs: if more jobs than qtime_max_jobs are queued longer than qtime_max and have not started yet, the module status turns critical.
running_idle_ratio: the minimum ratio between running and idle jobs. If the ratio is below the given parameter, the status turns critical.
min_efficiency: if the average efficiency is below this value, the module status turns critical.

Site Status

# status parameters
# how many slots per machine should be running
machine_slot_min = 2
# how many slots must be running to determine a status
slots_min = 20
# limit for claimed_unclaimed_ratio
claimed_unclaimed_ratio =
# weak slots have a load below this value
weak_threshold =
# different sites - input a python list with strings
sites = ["gridka", "ekpcms6", "ekpcloud", "ekpsg", "ekpsm", "bwforcluster"]
# additional plotting parameters
# how many bars the plot shows at least before scaling bigger
min_plotsize = 3
# x-value when to use a log scale in the plot

log_limit =
# width of the bars in the plot
plot_width =
# distance between the biggest bar and the right end of the plot in %
plot_right_margin = 0.1

Listing A.3: Additional Site Status configuration keys.

machine_slots_min: how many slots should be claimed per machine.
slots_min: the minimum amount of slots required to determine a status.
claimed_unclaimed_ratio: the minimum ratio between claimed and unclaimed slots. If the ratio is below the given parameter, the status turns critical.
weak_threshold: slots with a load below this value are considered weak slots if they are claimed.

History

# number of hours in the plot (maximum is 24, given by the constraint on the condor_history command)
plotrange = 24
# width of the bars in the plot
plot_width = 1

Listing A.4: Additional History Module configuration keys.

Cache Modules

The cache modules use histograms to display data. Therefore, an additional configuration parameter is provided.

# number of bins in the histograms
nbins = 50

Listing A.5: Additional cache module configuration keys.

A.2 Additional Plots and Tables

Figure A.1: This plot shows the distributions used in the Cache Details Module. The plot was created on 10 Jan 2016, 15:24. As expected, the allocation time plot shows peaks for every machine. The other parameters differ depending on file size and machine.

Table A.1: Detailed table to Figure 4.3.

  User
  Host           Undefined, ekpsg, ekpsm    ekpsg, ekpsm
  Queued Jobs    50                         0
  Idle Jobs      0                          0
  Running Jobs
  Removed Jobs   0                          0
  Cores used
  RAM used       2.9 GB                     1.2 GB
  Efficiency

Figure A.2: The cache lifetime plot shows data from September 2015. After September, no files were automatically deleted from the cache, so this old data must be sufficient.

Figure A.3: Snapshot of the ekplocal HappyFace instance. It shows the Cache Details Module.


More information

On-demand provisioning of HEP compute resources on cloud sites and shared HPC centers

On-demand provisioning of HEP compute resources on cloud sites and shared HPC centers On-demand provisioning of HEP compute resources on cloud sites and shared HPC centers CHEP 2016 - San Francisco, United States of America Gunther Erli, Frank Fischer, Georg Fleig, Manuel Giffels, Thomas

More information

The CMS Computing Model

The CMS Computing Model The CMS Computing Model Dorian Kcira California Institute of Technology SuperComputing 2009 November 14-20 2009, Portland, OR CERN s Large Hadron Collider 5000+ Physicists/Engineers 300+ Institutes 70+

More information

First LHCb measurement with data from the LHC Run 2

First LHCb measurement with data from the LHC Run 2 IL NUOVO CIMENTO 40 C (2017) 35 DOI 10.1393/ncc/i2017-17035-4 Colloquia: IFAE 2016 First LHCb measurement with data from the LHC Run 2 L. Anderlini( 1 )ands. Amerio( 2 ) ( 1 ) INFN, Sezione di Firenze

More information

ISTITUTO NAZIONALE DI FISICA NUCLEARE

ISTITUTO NAZIONALE DI FISICA NUCLEARE ISTITUTO NAZIONALE DI FISICA NUCLEARE Sezione di Perugia INFN/TC-05/10 July 4, 2005 DESIGN, IMPLEMENTATION AND CONFIGURATION OF A GRID SITE WITH A PRIVATE NETWORK ARCHITECTURE Leonello Servoli 1,2!, Mirko

More information

Stephen J. Gowdy (CERN) 12 th September 2012 XLDB Conference FINDING THE HIGGS IN THE HAYSTACK(S)

Stephen J. Gowdy (CERN) 12 th September 2012 XLDB Conference FINDING THE HIGGS IN THE HAYSTACK(S) Stephen J. Gowdy (CERN) 12 th September 2012 XLDB Conference FINDING THE HIGGS IN THE HAYSTACK(S) Overview Large Hadron Collider (LHC) Compact Muon Solenoid (CMS) experiment The Challenge Worldwide LHC

More information

A New Segment Building Algorithm for the Cathode Strip Chambers in the CMS Experiment

A New Segment Building Algorithm for the Cathode Strip Chambers in the CMS Experiment EPJ Web of Conferences 108, 02023 (2016) DOI: 10.1051/ epjconf/ 201610802023 C Owned by the authors, published by EDP Sciences, 2016 A New Segment Building Algorithm for the Cathode Strip Chambers in the

More information

Computing at the Large Hadron Collider. Frank Würthwein. Professor of Physics University of California San Diego November 15th, 2013

Computing at the Large Hadron Collider. Frank Würthwein. Professor of Physics University of California San Diego November 15th, 2013 Computing at the Large Hadron Collider Frank Würthwein Professor of Physics of California San Diego November 15th, 2013 Outline The Science Software & Computing Challenges Present Solutions Future Solutions

More information

CSCS CERN videoconference CFD applications

CSCS CERN videoconference CFD applications CSCS CERN videoconference CFD applications TS/CV/Detector Cooling - CFD Team CERN June 13 th 2006 Michele Battistin June 2006 CERN & CFD Presentation 1 TOPICS - Some feedback about already existing collaboration

More information

Overview. About CERN 2 / 11

Overview. About CERN 2 / 11 Overview CERN wanted to upgrade the data monitoring system of one of its Large Hadron Collider experiments called ALICE (A La rge Ion Collider Experiment) to ensure the experiment s high efficiency. They

More information

Evaluation of the computing resources required for a Nordic research exploitation of the LHC

Evaluation of the computing resources required for a Nordic research exploitation of the LHC PROCEEDINGS Evaluation of the computing resources required for a Nordic research exploitation of the LHC and Sverker Almehed, Chafik Driouichi, Paula Eerola, Ulf Mjörnmark, Oxana Smirnova,TorstenÅkesson

More information

Batch Services at CERN: Status and Future Evolution

Batch Services at CERN: Status and Future Evolution Batch Services at CERN: Status and Future Evolution Helge Meinhard, CERN-IT Platform and Engineering Services Group Leader HTCondor Week 20 May 2015 20-May-2015 CERN batch status and evolution - Helge

More information

Tracking and flavour tagging selection in the ATLAS High Level Trigger

Tracking and flavour tagging selection in the ATLAS High Level Trigger Tracking and flavour tagging selection in the ATLAS High Level Trigger University of Pisa and INFN E-mail: milene.calvetti@cern.ch In high-energy physics experiments, track based selection in the online

More information

Monitoring system for geographically distributed datacenters based on Openstack. Gioacchino Vino

Monitoring system for geographically distributed datacenters based on Openstack. Gioacchino Vino Monitoring system for geographically distributed datacenters based on Openstack Gioacchino Vino Tutor: Dott. Domenico Elia Tutor: Dott. Giacinto Donvito Borsa di studio GARR Orio Carlini 2016-2017 INFN

More information

CMS Conference Report

CMS Conference Report Available on CMS information server CMS CR 2005/021 CMS Conference Report 29 Septemebr 2005 Track and Vertex Reconstruction with the CMS Detector at LHC S. Cucciarelli CERN, Geneva, Switzerland Abstract

More information

b-jet identification at High Level Trigger in CMS

b-jet identification at High Level Trigger in CMS Journal of Physics: Conference Series PAPER OPEN ACCESS b-jet identification at High Level Trigger in CMS To cite this article: Eric Chabert 2015 J. Phys.: Conf. Ser. 608 012041 View the article online

More information

CC-IN2P3: A High Performance Data Center for Research

CC-IN2P3: A High Performance Data Center for Research April 15 th, 2011 CC-IN2P3: A High Performance Data Center for Research Toward a partnership with DELL Dominique Boutigny Agenda Welcome Introduction to CC-IN2P3 Visit of the computer room Lunch Discussion

More information

CERN openlab II. CERN openlab and. Sverre Jarp CERN openlab CTO 16 September 2008

CERN openlab II. CERN openlab and. Sverre Jarp CERN openlab CTO 16 September 2008 CERN openlab II CERN openlab and Intel: Today and Tomorrow Sverre Jarp CERN openlab CTO 16 September 2008 Overview of CERN 2 CERN is the world's largest particle physics centre What is CERN? Particle physics

More information

The CMS data quality monitoring software: experience and future prospects

The CMS data quality monitoring software: experience and future prospects The CMS data quality monitoring software: experience and future prospects Federico De Guio on behalf of the CMS Collaboration CERN, Geneva, Switzerland E-mail: federico.de.guio@cern.ch Abstract. The Data

More information

UW-ATLAS Experiences with Condor

UW-ATLAS Experiences with Condor UW-ATLAS Experiences with Condor M.Chen, A. Leung, B.Mellado Sau Lan Wu and N.Xu Paradyn / Condor Week, Madison, 05/01/08 Outline Our first success story with Condor - ATLAS production in 2004~2005. CRONUS

More information

Monitoring of Computing Resource Use of Active Software Releases at ATLAS

Monitoring of Computing Resource Use of Active Software Releases at ATLAS 1 2 3 4 5 6 Monitoring of Computing Resource Use of Active Software Releases at ATLAS Antonio Limosani on behalf of the ATLAS Collaboration CERN CH-1211 Geneva 23 Switzerland and University of Sydney,

More information

Fast pattern recognition with the ATLAS L1Track trigger for the HL-LHC

Fast pattern recognition with the ATLAS L1Track trigger for the HL-LHC Fast pattern recognition with the ATLAS L1Track trigger for the HL-LHC On behalf of the ATLAS Collaboration Uppsala Universitet E-mail: mikael.martensson@cern.ch ATL-DAQ-PROC-2016-034 09/01/2017 A fast

More information

New strategies of the LHC experiments to meet the computing requirements of the HL-LHC era

New strategies of the LHC experiments to meet the computing requirements of the HL-LHC era to meet the computing requirements of the HL-LHC era NPI AS CR Prague/Rez E-mail: adamova@ujf.cas.cz Maarten Litmaath CERN E-mail: Maarten.Litmaath@cern.ch The performance of the Large Hadron Collider

More information

The creation of a Tier-1 Data Center for the ALICE experiment in the UNAM. Lukas Nellen ICN-UNAM

The creation of a Tier-1 Data Center for the ALICE experiment in the UNAM. Lukas Nellen ICN-UNAM The creation of a Tier-1 Data Center for the ALICE experiment in the UNAM Lukas Nellen ICN-UNAM lukas@nucleares.unam.mx 3rd BigData BigNetworks Conference Puerto Vallarta April 23, 2015 Who Am I? ALICE

More information

CouchDB-based system for data management in a Grid environment Implementation and Experience

CouchDB-based system for data management in a Grid environment Implementation and Experience CouchDB-based system for data management in a Grid environment Implementation and Experience Hassen Riahi IT/SDC, CERN Outline Context Problematic and strategy System architecture Integration and deployment

More information

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez Scientific data processing at global scale The LHC Computing Grid Chengdu (China), July 5th 2011 Who I am 2 Computing science background Working in the field of computing for high-energy physics since

More information

Software and computing evolution: the HL-LHC challenge. Simone Campana, CERN

Software and computing evolution: the HL-LHC challenge. Simone Campana, CERN Software and computing evolution: the HL-LHC challenge Simone Campana, CERN Higgs discovery in Run-1 The Large Hadron Collider at CERN We are here: Run-2 (Fernando s talk) High Luminosity: the HL-LHC challenge

More information

ATLAS, CMS and LHCb Trigger systems for flavour physics

ATLAS, CMS and LHCb Trigger systems for flavour physics ATLAS, CMS and LHCb Trigger systems for flavour physics Università degli Studi di Bologna and INFN E-mail: guiducci@bo.infn.it The trigger systems of the LHC detectors play a crucial role in determining

More information

Data Reconstruction in Modern Particle Physics

Data Reconstruction in Modern Particle Physics Data Reconstruction in Modern Particle Physics Daniel Saunders, University of Bristol 1 About me Particle Physics student, final year. CSC 2014, tcsc 2015, icsc 2016 Main research interests. Detector upgrades

More information

Muon Reconstruction and Identification in CMS

Muon Reconstruction and Identification in CMS Muon Reconstruction and Identification in CMS Marcin Konecki Institute of Experimental Physics, University of Warsaw, Poland E-mail: marcin.konecki@gmail.com An event reconstruction at LHC is a challenging

More information

Using the In-Memory Columnar Store to Perform Real-Time Analysis of CERN Data. Maaike Limper Emil Pilecki Manuel Martín Márquez

Using the In-Memory Columnar Store to Perform Real-Time Analysis of CERN Data. Maaike Limper Emil Pilecki Manuel Martín Márquez Using the In-Memory Columnar Store to Perform Real-Time Analysis of CERN Data Maaike Limper Emil Pilecki Manuel Martín Márquez About the speakers Maaike Limper Physicist and project leader Manuel Martín

More information

The LHC Computing Grid

The LHC Computing Grid The LHC Computing Grid Visit of Finnish IT Centre for Science CSC Board Members Finland Tuesday 19 th May 2009 Frédéric Hemmer IT Department Head The LHC and Detectors Outline Computing Challenges Current

More information

Striped Data Server for Scalable Parallel Data Analysis

Striped Data Server for Scalable Parallel Data Analysis Journal of Physics: Conference Series PAPER OPEN ACCESS Striped Data Server for Scalable Parallel Data Analysis To cite this article: Jin Chang et al 2018 J. Phys.: Conf. Ser. 1085 042035 View the article

More information

CERN s Business Computing

CERN s Business Computing CERN s Business Computing Where Accelerated the infinitely by Large Pentaho Meets the Infinitely small Jan Janke Deputy Group Leader CERN Administrative Information Systems Group CERN World s Leading Particle

More information

Reprocessing DØ data with SAMGrid

Reprocessing DØ data with SAMGrid Reprocessing DØ data with SAMGrid Frédéric Villeneuve-Séguier Imperial College, London, UK On behalf of the DØ collaboration and the SAM-Grid team. Abstract The DØ experiment studies proton-antiproton

More information

Track reconstruction with the CMS tracking detector

Track reconstruction with the CMS tracking detector Track reconstruction with the CMS tracking detector B. Mangano (University of California, San Diego) & O.Gutsche (Fermi National Accelerator Laboratory) Overview The challenges The detector Track reconstruction

More information

Storage Resource Sharing with CASTOR.

Storage Resource Sharing with CASTOR. Storage Resource Sharing with CASTOR Olof Barring, Benjamin Couturier, Jean-Damien Durand, Emil Knezo, Sebastien Ponce (CERN) Vitali Motyakov (IHEP) ben.couturier@cern.ch 16/4/2004 Storage Resource Sharing

More information

Data Quality Monitoring at CMS with Machine Learning

Data Quality Monitoring at CMS with Machine Learning Data Quality Monitoring at CMS with Machine Learning July-August 2016 Author: Aytaj Aghabayli Supervisors: Jean-Roch Vlimant Maurizio Pierini CERN openlab Summer Student Report 2016 Abstract The Data Quality

More information

Compact Muon Solenoid: Cyberinfrastructure Solutions. Ken Bloom UNL Cyberinfrastructure Workshop -- August 15, 2005

Compact Muon Solenoid: Cyberinfrastructure Solutions. Ken Bloom UNL Cyberinfrastructure Workshop -- August 15, 2005 Compact Muon Solenoid: Cyberinfrastructure Solutions Ken Bloom UNL Cyberinfrastructure Workshop -- August 15, 2005 Computing Demands CMS must provide computing to handle huge data rates and sizes, and

More information

Modules and Front-End Electronics Developments for the ATLAS ITk Strips Upgrade

Modules and Front-End Electronics Developments for the ATLAS ITk Strips Upgrade Modules and Front-End Electronics Developments for the ATLAS ITk Strips Upgrade Carlos García Argos, on behalf of the ATLAS ITk Collaboration University of Freiburg International Conference on Technology

More information

System upgrade and future perspective for the operation of Tokyo Tier2 center. T. Nakamura, T. Mashimo, N. Matsui, H. Sakamoto and I.

System upgrade and future perspective for the operation of Tokyo Tier2 center. T. Nakamura, T. Mashimo, N. Matsui, H. Sakamoto and I. System upgrade and future perspective for the operation of Tokyo Tier2 center, T. Mashimo, N. Matsui, H. Sakamoto and I. Ueda International Center for Elementary Particle Physics, The University of Tokyo

More information

Precision Timing in High Pile-Up and Time-Based Vertex Reconstruction

Precision Timing in High Pile-Up and Time-Based Vertex Reconstruction Precision Timing in High Pile-Up and Time-Based Vertex Reconstruction Cedric Flamant (CERN Summer Student) - Supervisor: Adi Bornheim Division of High Energy Physics, California Institute of Technology,

More information

The GAP project: GPU applications for High Level Trigger and Medical Imaging

The GAP project: GPU applications for High Level Trigger and Medical Imaging The GAP project: GPU applications for High Level Trigger and Medical Imaging Matteo Bauce 1,2, Andrea Messina 1,2,3, Marco Rescigno 3, Stefano Giagu 1,3, Gianluca Lamanna 4,6, Massimiliano Fiorini 5 1

More information

Atlantis: Visualization Tool in Particle Physics

Atlantis: Visualization Tool in Particle Physics Atlantis: Visualization Tool in Particle Physics F.J.G.H. Crijns 2, H. Drevermann 1, J.G. Drohan 3, E. Jansen 2, P.F. Klok 2, N. Konstantinidis 3, Z. Maxa 3, D. Petrusca 1, G. Taylor 4, C. Timmermans 2

More information

Detector Control LHC

Detector Control LHC Detector Control Systems @ LHC Matthias Richter Department of Physics, University of Oslo IRTG Lecture week Autumn 2012 Oct 18 2012 M. Richter (UiO) DCS @ LHC Oct 09 2012 1 / 39 Detectors in High Energy

More information

CLOUDS OF JINR, UNIVERSITY OF SOFIA AND INRNE JOIN TOGETHER

CLOUDS OF JINR, UNIVERSITY OF SOFIA AND INRNE JOIN TOGETHER CLOUDS OF JINR, UNIVERSITY OF SOFIA AND INRNE JOIN TOGETHER V.V. Korenkov 1, N.A. Kutovskiy 1, N.A. Balashov 1, V.T. Dimitrov 2,a, R.D. Hristova 2, K.T. Kouzmov 2, S.T. Hristov 3 1 Laboratory of Information

More information

The full detector simulation for the ATLAS experiment: status and outlook

The full detector simulation for the ATLAS experiment: status and outlook The full detector simulation for the ATLAS experiment: status and outlook A. Rimoldi University of Pavia & INFN, Italy A.Dell Acqua CERN, Geneva, CH The simulation of the ATLAS detector is a major challenge,

More information

Data Transfers Between LHC Grid Sites Dorian Kcira

Data Transfers Between LHC Grid Sites Dorian Kcira Data Transfers Between LHC Grid Sites Dorian Kcira dkcira@caltech.edu Caltech High Energy Physics Group hep.caltech.edu/cms CERN Site: LHC and the Experiments Large Hadron Collider 27 km circumference

More information

PoS(EPS-HEP2017)523. The CMS trigger in Run 2. Mia Tosi CERN

PoS(EPS-HEP2017)523. The CMS trigger in Run 2. Mia Tosi CERN CERN E-mail: mia.tosi@cern.ch During its second period of operation (Run 2) which started in 2015, the LHC will reach a peak instantaneous luminosity of approximately 2 10 34 cm 2 s 1 with an average pile-up

More information

How to discover the Higgs Boson in an Oracle database. Maaike Limper

How to discover the Higgs Boson in an Oracle database. Maaike Limper How to discover the Higgs Boson in an Oracle database Maaike Limper 2 Introduction CERN openlab is a unique public-private partnership between CERN and leading ICT companies. Its mission is to accelerate

More information

N. Marusov, I. Semenov

N. Marusov, I. Semenov GRID TECHNOLOGY FOR CONTROLLED FUSION: CONCEPTION OF THE UNIFIED CYBERSPACE AND ITER DATA MANAGEMENT N. Marusov, I. Semenov Project Center ITER (ITER Russian Domestic Agency N.Marusov@ITERRF.RU) Challenges

More information

ATLAS PILE-UP AND OVERLAY SIMULATION

ATLAS PILE-UP AND OVERLAY SIMULATION ATLAS PILE-UP AND OVERLAY SIMULATION LPCC Detector Simulation Workshop, June 26-27, 2017 ATL-SOFT-SLIDE-2017-375 22/06/2017 Tadej Novak on behalf of the ATLAS Collaboration INTRODUCTION In addition to

More information

The ATLAS Conditions Database Model for the Muon Spectrometer

The ATLAS Conditions Database Model for the Muon Spectrometer The ATLAS Conditions Database Model for the Muon Spectrometer Monica Verducci 1 INFN Sezione di Roma P.le Aldo Moro 5,00185 Rome, Italy E-mail: monica.verducci@cern.ch on behalf of the ATLAS Muon Collaboration

More information

Prompt data reconstruction at the ATLAS experiment

Prompt data reconstruction at the ATLAS experiment Prompt data reconstruction at the ATLAS experiment Graeme Andrew Stewart 1, Jamie Boyd 1, João Firmino da Costa 2, Joseph Tuggle 3 and Guillaume Unal 1, on behalf of the ATLAS Collaboration 1 European

More information

PoS(EPS-HEP2017)492. Performance and recent developments of the real-time track reconstruction and alignment of the LHCb detector.

PoS(EPS-HEP2017)492. Performance and recent developments of the real-time track reconstruction and alignment of the LHCb detector. Performance and recent developments of the real-time track reconstruction and alignment of the LHCb detector. CERN E-mail: agnieszka.dziurda@cern.ch he LHCb detector is a single-arm forward spectrometer

More information

Big Data Analytics and the LHC

Big Data Analytics and the LHC Big Data Analytics and the LHC Maria Girone CERN openlab CTO Computing Frontiers 2016, Como, May 2016 DOI: 10.5281/zenodo.45449, CC-BY-SA, images courtesy of CERN 2 3 xx 4 Big bang in the laboratory We

More information

Physics CMS Muon High Level Trigger: Level 3 reconstruction algorithm development and optimization

Physics CMS Muon High Level Trigger: Level 3 reconstruction algorithm development and optimization Scientifica Acta 2, No. 2, 74 79 (28) Physics CMS Muon High Level Trigger: Level 3 reconstruction algorithm development and optimization Alessandro Grelli Dipartimento di Fisica Nucleare e Teorica, Università

More information

An ATCA framework for the upgraded ATLAS read out electronics at the LHC

An ATCA framework for the upgraded ATLAS read out electronics at the LHC An ATCA framework for the upgraded ATLAS read out electronics at the LHC Robert Reed School of Physics, University of the Witwatersrand, Johannesburg, South Africa E-mail: robert.reed@cern.ch Abstract.

More information

High Throughput WAN Data Transfer with Hadoop-based Storage

High Throughput WAN Data Transfer with Hadoop-based Storage High Throughput WAN Data Transfer with Hadoop-based Storage A Amin 2, B Bockelman 4, J Letts 1, T Levshina 3, T Martin 1, H Pi 1, I Sfiligoi 1, M Thomas 2, F Wuerthwein 1 1 University of California, San

More information

Challenges and Evolution of the LHC Production Grid. April 13, 2011 Ian Fisk

Challenges and Evolution of the LHC Production Grid. April 13, 2011 Ian Fisk Challenges and Evolution of the LHC Production Grid April 13, 2011 Ian Fisk 1 Evolution Uni x ALICE Remote Access PD2P/ Popularity Tier-2 Tier-2 Uni u Open Lab m Tier-2 Science Uni x Grid Uni z USA Tier-2

More information

AliEn Resource Brokers

AliEn Resource Brokers AliEn Resource Brokers Pablo Saiz University of the West of England, Frenchay Campus Coldharbour Lane, Bristol BS16 1QY, U.K. Predrag Buncic Institut für Kernphysik, August-Euler-Strasse 6, 60486 Frankfurt

More information

Insight: that s for NSA Decision making: that s for Google, Facebook. so they find the best way to push out adds and products

Insight: that s for NSA Decision making: that s for Google, Facebook. so they find the best way to push out adds and products What is big data? Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

More information

An SQL-based approach to physics analysis

An SQL-based approach to physics analysis Journal of Physics: Conference Series OPEN ACCESS An SQL-based approach to physics analysis To cite this article: Dr Maaike Limper 2014 J. Phys.: Conf. Ser. 513 022022 View the article online for updates

More information

Performance of the ATLAS Inner Detector at the LHC

Performance of the ATLAS Inner Detector at the LHC Performance of the ALAS Inner Detector at the LHC hijs Cornelissen for the ALAS Collaboration Bergische Universität Wuppertal, Gaußstraße 2, 4297 Wuppertal, Germany E-mail: thijs.cornelissen@cern.ch Abstract.

More information

Grid Computing at the IIHE

Grid Computing at the IIHE BNC 2016 Grid Computing at the IIHE The Interuniversity Institute for High Energies S. Amary, F. Blekman, A. Boukil, O. Devroede, S. Gérard, A. Ouchene, R. Rougny, S. Rugovac, P. Vanlaer, R. Vandenbroucke

More information

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in versions 8 and 9. that must be used to measure, evaluate,

More information

Benchmarking third-party-transfer protocols with the FTS

Benchmarking third-party-transfer protocols with the FTS Benchmarking third-party-transfer protocols with the FTS Rizart Dona CERN Summer Student Programme 2018 Supervised by Dr. Simone Campana & Dr. Oliver Keeble 1.Introduction 1 Worldwide LHC Computing Grid

More information

One Pool To Rule Them All The CMS HTCondor/glideinWMS Global Pool. D. Mason for CMS Software & Computing

One Pool To Rule Them All The CMS HTCondor/glideinWMS Global Pool. D. Mason for CMS Software & Computing One Pool To Rule Them All The CMS HTCondor/glideinWMS Global Pool D. Mason for CMS Software & Computing 1 Going to try to give you a picture of the CMS HTCondor/ glideinwms global pool What s the use case

More information

ATLAS ITk Layout Design and Optimisation

ATLAS ITk Layout Design and Optimisation ATLAS ITk Layout Design and Optimisation Noemi Calace noemi.calace@cern.ch On behalf of the ATLAS Collaboration 3rd ECFA High Luminosity LHC Experiments Workshop 3-6 October 2016 Aix-Les-Bains Overview

More information

DIRAC pilot framework and the DIRAC Workload Management System

DIRAC pilot framework and the DIRAC Workload Management System Journal of Physics: Conference Series DIRAC pilot framework and the DIRAC Workload Management System To cite this article: Adrian Casajus et al 2010 J. Phys.: Conf. Ser. 219 062049 View the article online

More information

Clustering and Reclustering HEP Data in Object Databases

Clustering and Reclustering HEP Data in Object Databases Clustering and Reclustering HEP Data in Object Databases Koen Holtman CERN EP division CH - Geneva 3, Switzerland We formulate principles for the clustering of data, applicable to both sequential HEP applications

More information

CMS High Level Trigger Timing Measurements

CMS High Level Trigger Timing Measurements Journal of Physics: Conference Series PAPER OPEN ACCESS High Level Trigger Timing Measurements To cite this article: Clint Richardson 2015 J. Phys.: Conf. Ser. 664 082045 Related content - Recent Standard

More information

Storage and I/O requirements of the LHC experiments

Storage and I/O requirements of the LHC experiments Storage and I/O requirements of the LHC experiments Sverre Jarp CERN openlab, IT Dept where the Web was born 22 June 2006 OpenFabrics Workshop, Paris 1 Briefly about CERN 22 June 2006 OpenFabrics Workshop,

More information

Experience of the WLCG data management system from the first two years of the LHC data taking

Experience of the WLCG data management system from the first two years of the LHC data taking Experience of the WLCG data management system from the first two years of the LHC data taking 1 Nuclear Physics Institute, Czech Academy of Sciences Rez near Prague, CZ 25068, Czech Republic E-mail: adamova@ujf.cas.cz

More information

Data handling and processing at the LHC experiments

Data handling and processing at the LHC experiments 1 Data handling and processing at the LHC experiments Astronomy and Bio-informatic Farida Fassi CC-IN2P3/CNRS EPAM 2011, Taza, Morocco 2 The presentation will be LHC centric, which is very relevant for

More information

MONTE CARLO SIMULATION FOR RADIOTHERAPY IN A DISTRIBUTED COMPUTING ENVIRONMENT

MONTE CARLO SIMULATION FOR RADIOTHERAPY IN A DISTRIBUTED COMPUTING ENVIRONMENT The Monte Carlo Method: Versatility Unbounded in a Dynamic Computing World Chattanooga, Tennessee, April 17-21, 2005, on CD-ROM, American Nuclear Society, LaGrange Park, IL (2005) MONTE CARLO SIMULATION

More information

The Compact Muon Solenoid Experiment. Conference Report. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland

The Compact Muon Solenoid Experiment. Conference Report. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland Available on CMS information server CMS CR -2009/098 The Compact Muon Solenoid Experiment Conference Report Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland 15 April 2009 FROG: The Fast And Realistic

More information

Monte Carlo Production Management at CMS

Monte Carlo Production Management at CMS Monte Carlo Production Management at CMS G Boudoul 1, G Franzoni 2, A Norkus 2,3, A Pol 2, P Srimanobhas 4 and J-R Vlimant 5 - for the Compact Muon Solenoid collaboration 1 U. C. Bernard-Lyon I, 43 boulevard

More information

PoS(ACAT08)100. FROG: The Fast & Realistic OPENGL Event Displayer

PoS(ACAT08)100. FROG: The Fast & Realistic OPENGL Event Displayer FROG: The Fast & Realistic OPENGL Event Displayer Center for Particle Physics and Phenomenology (CP3) Université catholique de Louvain Chemin du cyclotron 2, B-1348-Louvain-la-Neuve - Belgium E-mail: loic.quertenmont@cern.ch

More information

CERN and Scientific Computing

CERN and Scientific Computing CERN and Scientific Computing Massimo Lamanna CERN Information Technology Department Experiment Support Group 1960: 26 GeV proton in the 32 cm CERN hydrogen bubble chamber 1960: IBM 709 at the Geneva airport

More information

CMS Simulation Software

CMS Simulation Software CMS Simulation Software Dmitry Onoprienko Kansas State University on behalf of the CMS collaboration 10th Topical Seminar on Innovative Particle and Radiation Detectors 1-5 October 2006. Siena, Italy Simulation

More information

CMS Alignement and Calibration workflows: lesson learned and future plans

CMS Alignement and Calibration workflows: lesson learned and future plans Available online at www.sciencedirect.com Nuclear and Particle Physics Proceedings 273 275 (2016) 923 928 www.elsevier.com/locate/nppp CMS Alignement and Calibration workflows: lesson learned and future

More information

Philippe Laurens, Michigan State University, for USATLAS. Atlas Great Lakes Tier 2 collocated at MSU and the University of Michigan

Philippe Laurens, Michigan State University, for USATLAS. Atlas Great Lakes Tier 2 collocated at MSU and the University of Michigan Philippe Laurens, Michigan State University, for USATLAS Atlas Great Lakes Tier 2 collocated at MSU and the University of Michigan ESCC/Internet2 Joint Techs -- 12 July 2011 Content Introduction LHC, ATLAS,

More information

Improved ATLAS HammerCloud Monitoring for Local Site Administration

Improved ATLAS HammerCloud Monitoring for Local Site Administration Improved ATLAS HammerCloud Monitoring for Local Site Administration M Böhler 1, J Elmsheuser 2, F Hönig 2, F Legger 2, V Mancinelli 3, and G Sciacca 4 on behalf of the ATLAS collaboration 1 Albert-Ludwigs

More information

THE ATLAS INNER DETECTOR OPERATION, DATA QUALITY AND TRACKING PERFORMANCE.

THE ATLAS INNER DETECTOR OPERATION, DATA QUALITY AND TRACKING PERFORMANCE. Proceedings of the PIC 2012, Štrbské Pleso, Slovakia THE ATLAS INNER DETECTOR OPERATION, DATA QUALITY AND TRACKING PERFORMANCE. E.STANECKA, ON BEHALF OF THE ATLAS COLLABORATION Institute of Nuclear Physics

More information

LHCb Computing Resources: 2018 requests and preview of 2019 requests

LHCb Computing Resources: 2018 requests and preview of 2019 requests LHCb Computing Resources: 2018 requests and preview of 2019 requests LHCb-PUB-2017-009 23/02/2017 LHCb Public Note Issue: 0 Revision: 0 Reference: LHCb-PUB-2017-009 Created: 23 rd February 2017 Last modified:

More information

Capturing and Analyzing User Behavior in Large Digital Libraries

Capturing and Analyzing User Behavior in Large Digital Libraries Capturing and Analyzing User Behavior in Large Digital Libraries Giorgi Gvianishvili, Jean-Yves Le Meur, Tibor Šimko, Jérôme Caffaro, Ludmila Marian, Samuele Kaplun, Belinda Chan, and Martin Rajman European

More information

The JINR Tier1 Site Simulation for Research and Development Purposes

The JINR Tier1 Site Simulation for Research and Development Purposes EPJ Web of Conferences 108, 02033 (2016) DOI: 10.1051/ epjconf/ 201610802033 C Owned by the authors, published by EDP Sciences, 2016 The JINR Tier1 Site Simulation for Research and Development Purposes

More information

The Compact Muon Solenoid Experiment. Conference Report. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland

The Compact Muon Solenoid Experiment. Conference Report. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland Available on CMS information server CMS CR -2008/100 The Compact Muon Solenoid Experiment Conference Report Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland 02 December 2008 (v2, 03 December 2008)

More information

The LHC Computing Grid. Slides mostly by: Dr Ian Bird LCG Project Leader 18 March 2008

The LHC Computing Grid. Slides mostly by: Dr Ian Bird LCG Project Leader 18 March 2008 The LHC Computing Grid Slides mostly by: Dr Ian Bird LCG Project Leader 18 March 2008 The LHC Computing Grid February 2008 Some precursors Computing for HEP means data handling Fixed-target experiments

More information