DAL ALGORITHMS AND PYTHON


CERN Summer Student Report
Bahar Aydemir
Supervisors: Igor Soloviev, Giuseppe Avolio
September 15, 2017

Contents

1 Introduction
2 Work Done
  2.1 Implementation Details
  2.2 Example Usage By The User
  2.3 DAL algorithms in core
  2.4 New Python Class: AppConfig
  2.5 New Python Class: SegConfig
  2.6 Python script : dal_dump_app_config.py
  2.7 Python script : dal_dump_apps.py
  2.8 Optimizations
    2.8.1 Python side
    2.8.2 C++ side
  2.9 Measurements
  2.10 Tests
3 Conclusion
4 Acknowledgements

1 Introduction

The Trigger and Data Acquisition (TDAQ) system of the ATLAS detector at the Large Hadron Collider (LHC) at CERN is composed of a large number of distributed hardware and software components. The TDAQ system consists of about 3000 computers and more than 25000 applications which, in a coordinated manner, provide the data-taking functionality of the overall system. A number of online services are required to configure, monitor and control ATLAS data taking. In particular, the configuration service provides the configuration of the above components.

The configuration of the ATLAS data acquisition system is stored in an XML-based object database named OKS. The DAL (Data Access Library) allows C++, Java and Python clients to access its information in a distributed environment. Some information has a quite complicated structure, so its extraction requires special algorithms. These algorithms are available in C++ and are partially reimplemented in Java.

The goal of the project is to make the C++ DAL algorithms usable from Python. This also allows web applications with a Python back-end (such as Django) to have proper access to the system configuration.

2 Work Done

Two solutions were proposed to reach the goal. The first is to rewrite the DAL algorithms in Python; the second is to call the C++ versions at run time through bindings built with the Boost.Python library. The former requires maintenance whenever the original algorithm is modified. The latter uses the original algorithm as its base, so it follows the evolution of the original algorithms without extra work.

Binding the generated C++ DAL classes directly to Python would result in two 'DAL objects' representing the same thing: one object from the Boost.Python binding to C++ and one object from the pure Python side. Therefore, instead of the regular approach, the following method is used.

Figure 1 - An example function call flow

As illustrated in Figure 1, the function call is made on the Python object. On the Python side the arguments are converted into strings and passed to the C++ side by calling a library function that wraps the original algorithm. The wrapper then retrieves the C++ objects and calls the original algorithm. The result is converted into strings again and passed back to the Python side. Finally, the returned value is used to construct the desired structures and the result is served to the user.

Using wrapper functions provides a more Pythonic experience, such as returning dictionaries from functions with multiple return values. It also increases flexibility on a case-by-case basis.
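The call flow described above can be imitated in a few lines of pure Python. This is only a sketch under stated assumptions: the "C++ side" is simulated by ordinary Python functions, and every name (`DalObject`, the `_helper_*` functions, the dictionary keys and the sample data) is hypothetical and not taken from the DAL package. It shows the two ideas of the method: only string UIDs cross the language boundary, and the Python wrapper can reshape multiple return values into a dictionary.

```python
# Illustrative sketch only: every name below is hypothetical, and the
# "C++ side" is simulated in pure Python so the example runs on its own.

# --- stand-in for the compiled helper library ------------------------------
_PARTITION_LOG_ROOTS = {"part_demo": "/tmp"}   # invented sample data
_SEGMENT_TIMEOUTS = {"seg_demo": (60, 10)}     # invented sample data


def _helper_get_output_error_directory(self_uid, partition_uid):
    """Receives UIDs as strings, resolves them, returns a plain string."""
    # self_uid is unused in this toy version; a real helper would look up
    # the application object from it before calling the algorithm.
    return _PARTITION_LOG_ROOTS[partition_uid] + "/" + partition_uid


def _helper_get_timeouts(segment_uid):
    """An algorithm with two output values, returned here as a tuple."""
    return _SEGMENT_TIMEOUTS[segment_uid]


# --- pure-Python DAL objects -----------------------------------------------
class DalObject:
    """Minimal stand-in for a generated Python DAL object."""

    def __init__(self, uid):
        self.id = uid


def get_output_error_directory(self, partition):
    # Pass only string UIDs across the language boundary.
    return _helper_get_output_error_directory(self.id, partition.id)


def get_timeouts(self):
    # Turn multiple return values into a dictionary for the caller.
    action, short = _helper_get_timeouts(self.id)
    return {"action_timeout": action, "short_timeout": short}


# Attach the wrappers as methods of the Python class.
DalObject.get_output_error_directory = get_output_error_directory
DalObject.get_timeouts = get_timeouts

app = DalObject("app_demo")
partition = DalObject("part_demo")
segment = DalObject("seg_demo")
print(app.get_output_error_directory(partition))
print(segment.get_timeouts())
```

Because the Python object is never replaced by a bound C++ object, the user keeps working with a single kind of DAL object, which is the motivation given above for not binding the generated classes directly.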

2.1 Implementation Details

To explain the implementation details of this method, the implementation of the following C++ algorithm from the BaseApplication class is inspected:

    void get_output_error_directory(std::string& dir_name, const Partition&) const

This function takes two parameters, one as the output and the other as the input. The output value is returned by the Python call, so it does not need to be passed into the wrapper. The Python wrappers always take the generated DAL object and the configuration database as parameters. The other input arguments follow; in this example that is the partition parameter. In the following code snippet, the final assignment redirects the desired function to the wrapper function.

    def helper_wrapper(self, db, partition):
        # Convert all DAL objects to single strings (their UIDs) and pass
        # the current configuration DB as an additional argument.
        return libdal_algo_helper.get_output_error_directory(self.id, db._obj, partition.id)

    dal.BaseApplication.get_output_error_directory = helper_wrapper

String ids are passed to the C++ side instead of the objects themselves, as seen in the return statement above and in the function signature below. Instead of binding the instance method directly, a helper function is introduced, and this helper is bound via Boost.Python.

    std::string get_output_error_directory(const std::string& self_name,
                                           python::ConfigurationPointer& db,
                                           const std::string& partition_name)
    {
        // Get the Configuration DB
        Configuration *conf = db.get();
        // Get pointer to ourselves
        const daq::core::BaseApplication *self = conf->get<daq::core::BaseApplication>(self_name);
        // Get the DAL object for the argument, here a 'Partition'
        const daq::core::Partition *partition = conf->get<daq::core::Partition>(partition_name);
        // Call the DAL algorithm.
        std::string result;
        self->get_output_error_directory(result, *partition);
        return result;
    }

The helper first gets a pointer to the configuration database from the parameter passed. Then, for every parameter that is a DAL object, the corresponding C++ DAL instance is retrieved. Finally, the DAL algorithm is called and the result is returned.

2.2 Example Usage By The User

    if __name__ == '__main__':
        import config

        # 'database' and 'partition_name' are expected to be defined by the caller.
        # Open test database
        db = config.Configuration(database)
        # Get Partition object
        partition = db.get_dal('Partition', partition_name)
        # Get HLTSV application, note this is a derived class of BaseApplication
        hltsv = db.get_dal('HLTSVApplication', 'HLTSV')
        # Call the DAL algorithm and print the result
        print(hltsv.get_output_error_directory(db, partition))

First the database is opened and the partition object is retrieved as usual. Then the desired object is fetched and the function is called with the database and partition parameters. The result of the call is shown below:

    /tmp/part_hlt_baydemir

2.3 DAL algorithms in core

The following algorithms are implemented using the method described above:

    dal.BaseApplication.get_application
    dal.BaseApplication.get_output_error_directory
    dal.Component.disabled
    dal.Component.get_parents
    dal.ComputerProgram.get_info
    dal.ComputerProgram.get_parameters
    dal.Partition.get_all_applications
    dal.Partition.get_segment
    dal.ResourceBase.get_applications
    dal.ResourceBase.get_resources
    dal.Segment.find_is_server_by_mask
    dal.Segment.get_timeouts
    dal.Variable.get_value

2.4 New Python Class: AppConfig

The class describes application configuration parameters. The following methods are provided to get the description of an application:

    get_app_id()
    get_base_app()
    get_host()
    get_segment()
    get_seg_id()
    get_initialization_depends_from()
    get_shutdown_depends_from()
    get_info()
    get_some_info()

2.5 New Python Class: SegConfig

The class describes segment configuration parameters. The following methods are provided to get the description of nested segments and applications:

    get_segment_id()
    get_segment()
    get_applications()
    get_nested_segments()
    get_controller()
    get_infrastructure()
    get_hosts()

For both the AppConfig and SegConfig classes, simple functions such as getting the id, application or segment return immediately from the Python side without using bindings. More complicated functions that need additional checks and operations use the same binding method as the DAL algorithms.
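The split between simple accessors and bound functions can be pictured with the following sketch. It is not the actual AppConfig class: `AppConfigSketch`, `FakeBinding`, the constructor arguments and the returned dictionary are all assumptions made for illustration, and the compiled helper library is replaced by a small Python object that counts how often it is called.

```python
class FakeBinding:
    """Hypothetical stand-in for the compiled helper library."""

    def __init__(self):
        self.calls = 0  # number of times the "C++ side" was entered

    def get_info(self, app_id):
        self.calls += 1
        # Invented result layout, for illustration only.
        return {"program_names": [app_id + "_main"], "environment": {}}


class AppConfigSketch:
    """Hypothetical class: simple getters stay in Python, the rest delegates."""

    def __init__(self, app_id, host, seg_id, binding):
        self._app_id = app_id
        self._host = host
        self._seg_id = seg_id
        self._binding = binding

    # Simple accessors: answered from attributes, no binding call needed.
    def get_app_id(self):
        return self._app_id

    def get_host(self):
        return self._host

    def get_seg_id(self):
        return self._seg_id

    # Complicated function: forwarded to the binding with a string id.
    def get_info(self):
        return self._binding.get_info(self._app_id)


binding = FakeBinding()
cfg = AppConfigSketch("app_demo", "host_demo", "seg_demo", binding)
print(cfg.get_app_id(), cfg.get_host(), binding.calls)
print(cfg.get_info(), binding.calls)
```

Keeping the cheap accessors on the Python side avoids a language-boundary crossing for values the object already holds.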

2.6 Python script : dal_dump_app_config.py

This script uses the dal.partition.get_all_applications function.

Usage:

    dal_dump_app_config.py [-d database-name] -p name-of-partition [-t [types...]] [-c [ids...]] [-s [ids...]]

Options/Arguments:

    -d database-name      name of the database (ignore TDAQ_DB variable)
    -p name-of-partition  name of the partition object
    -t [types]            filter out all applications except given classes (and their subclasses)
    -c [ids]              filter out all applications except those which run on given hosts
    -s [ids]              filter out all applications except those which belong to given segments

Figure 2 - An example output of dal_dump_app_config.py

As can be seen in Figure 2, the script retrieves the number of applications, the ids of the applications, and their hosts and segments.

2.7 Python script : dal_dump_apps.py

This script uses both the AppConfig.get_info and dal.partition.get_all_applications functions. Its usage is the same as that of the script above. As shown in Figure 3, the script retrieves the number of applications, the ids of the applications, the start and restart arguments, the possible program names and the environment variables of the applications.

Figure 3 - An example output of dal_dump_apps.py

The usage and the outputs of both scripts are identical to those of the C++ versions.
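The command-line interface listed in Section 2.6 could be parsed as sketched below. This is an assumption about how such a parser might look, not the code of the actual scripts: the `dest` names, the `build_parser` and `keep_application` helpers, and the dictionary layout of an application are hypothetical. The class filter here matches class names exactly and does not handle subclasses, which the real option does.

```python
import argparse


def build_parser():
    """Build a parser for the options listed in Section 2.6 (sketch)."""
    parser = argparse.ArgumentParser(prog="dal_dump_app_config.py")
    parser.add_argument("-d", dest="database", metavar="database-name",
                        help="name of the database (ignore TDAQ_DB variable)")
    parser.add_argument("-p", dest="partition", metavar="name-of-partition",
                        required=True, help="name of the partition object")
    parser.add_argument("-t", dest="types", metavar="types", nargs="*",
                        default=[], help="keep only the given classes")
    parser.add_argument("-c", dest="hosts", metavar="ids", nargs="*",
                        default=[], help="keep only applications on given hosts")
    parser.add_argument("-s", dest="segments", metavar="ids", nargs="*",
                        default=[], help="keep only applications in given segments")
    return parser


def keep_application(app, args):
    """Return True if 'app' passes all filters; an empty filter accepts all.

    'app' is a hypothetical dict with 'class', 'host' and 'segment' keys.
    """
    if args.types and app["class"] not in args.types:
        return False
    if args.hosts and app["host"] not in args.hosts:
        return False
    if args.segments and app["segment"] not in args.segments:
        return False
    return True


if __name__ == "__main__":
    demo_args = build_parser().parse_args(["-p", "part_demo", "-c", "host_a"])
    demo_apps = [
        {"id": "app_1", "class": "ClassA", "host": "host_a", "segment": "seg_1"},
        {"id": "app_2", "class": "ClassB", "host": "host_b", "segment": "seg_1"},
    ]
    for app in demo_apps:
        if keep_application(app, demo_args):
            print(app["id"], app["host"], app["segment"])
```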

2.8 Optimizations

For the dal_dump_apps.py script, running on the ATLAS partition with its 13355 application objects requires 13355 calls of the get_info function, each on a newly created application object. Every call from Python creates and destroys the same C++ objects over and over again. Recursive function calls in particular, such as getting nested segments, extend the execution time to an unfeasible period. Therefore, the objects should be reused. For this reason, the following optimizations are implemented on both sides.

2.8.1 Python side

Global maps:

    initialized_app_config
    initialized_seg_config

When an AppConfig or SegConfig object is needed, the maps are first checked for an existing object. If the object has been initialized before, it is returned immediately. Otherwise a new object is created and stored in the map. These maps prevent the creation of multiple objects that describe the same application or segment.

Flags:

    get_all_applications_called

This flag prevents repeated calls of the get_all_applications function. If it is true, the needed objects are taken from the initialized_app_config map.

2.8.2 C++ side

Globals:

    std::vector<daq::core::AppConfig> all_apps

AppConfig objects are stored in this vector after the get_all_applications function is called. For later calls, the objects are taken directly from this vector.

    <daq::core::SegConfig> root_seg

The SegConfig object that corresponds to the root segment is stored in this variable. It is filled after the get_segment function call. The nested segments and the whole segment tree are accessible from this variable.

Flags:

    get_all_applications_called
    root_segment_exists

These flags are used for filling the above structures in a similar manner.

2.9 Measurements

After the optimizations, measurements were made on a computer with the following hardware specifications:

    Memory: 4 GB
    CPU: Intel(R) Core(TM) 2 Duo CPU E8500 @ 3.16GHz

The scripts were tested on the ATLAS partition with 13355 application objects. Before the optimizations the execution time was over one hour, for the reason explained above. After the optimizations the time was reduced to reasonable limits compared to the C++ versions.
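The Python-side caching of Section 2.8.1 can be sketched as follows. The names `initialized_app_config` and `get_all_applications_called` come from the text; everything else (the `AppConfigStub` class, the `get_app_config` function, the `fetch_ids` callback) is a hypothetical simplification, and the expensive call into C++ is replaced by a plain Python function.

```python
# Global map and flag named as in Section 2.8.1; the rest is hypothetical.
initialized_app_config = {}
get_all_applications_called = False


class AppConfigStub:
    """Hypothetical stand-in for an AppConfig object."""

    def __init__(self, app_id):
        self.app_id = app_id


def get_app_config(app_id):
    """Return the cached object for app_id, creating it only once."""
    cached = initialized_app_config.get(app_id)
    if cached is None:
        cached = AppConfigStub(app_id)
        initialized_app_config[app_id] = cached
    return cached


def get_all_applications(fetch_ids):
    """Call the expensive fetch_ids() only on the first invocation.

    fetch_ids stands in for the call into the C++ algorithm and returns
    an iterable of application ids.
    """
    global get_all_applications_called
    if not get_all_applications_called:
        for app_id in fetch_ids():
            get_app_config(app_id)
        get_all_applications_called = True
    # Later calls are served entirely from the map.
    return list(initialized_app_config.values())


if __name__ == "__main__":
    fetch_count = []

    def demo_fetch():
        fetch_count.append(1)
        return ["app_1", "app_2"]

    first = get_all_applications(demo_fetch)
    second = get_all_applications(demo_fetch)
    print(len(fetch_count), first[0] is second[0])
```

The design trades memory for time: each application or segment is described by exactly one object for the lifetime of the process, so repeated and recursive lookups stop paying the cost of object construction.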

    Script                    Time for filling the data structures    Total time with printing
    dal_dump_app_config.py    ~0.5 s (total)                          ~0.6 s (total)
    dal_dump_apps.py          ~7 s                                    ~240 s

Table 1 - Time elapsed for Python scripts

2.10 Tests

The DAL functions were tested individually by comparing their outputs with those of the original C++ algorithms. dal_dump_app_config.py and dal_dump_apps.py were tested on the ATLAS partition, and the correctness of the implementation was checked. The tests are provided for future use.

3 Conclusion

The thirteen core DAL algorithms, the AppConfig and SegConfig classes and their public functions have been made available in Python. With the method explained above, future modifications of the original C++ algorithms will not require maintenance on the Python side, since the Python side is based directly on the C++ algorithms. Moreover, the package is integrated into DAL so that users can use it conveniently. For now, the binding code is written manually, but if some generic rules on the algorithm parameters are established, such methods could be generated automatically on the fly and/or by genconfig.

To conclude my experience in the Summer Student Programme 2017, I have learned a lot about how software is developed and integrated into a huge project, about the communication between different team members, and about the organization of a research center that includes more than 13,000 individuals as students, fellows, staff, sub-contractors and visiting scientists from around the world. I am very glad to have had the opportunity to join the research teams in a multicultural, multidisciplinary environment for the whole summer while learning and socializing.

4 Acknowledgements

I would like to thank my supervisors Igor Soloviev and Giuseppe Avolio for their guidance, their explanations of every detail of the project and their support throughout the whole time. With their help I have learned a lot about how to be a team member and work in a complex project.
I would also like to thank the Summer Student Team, Jennifer Dembski, Eszter Badinova, Céline Delieutraz and Despoina Driva, for the great organization and for all their help and support whenever a need or problem arose.