SSCL-Preprint-546
December 1993
Distribution Category: 1

J. Marstaller

Comparative Performance Measures of Relational and Object-Oriented Databases Using High Energy Physics Data

Superconducting Super Collider Laboratory

Disclaimer Notice

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government or any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. The Superconducting Super Collider Laboratory is an equal opportunity employer.

DISCLAIMER

Portions of this document may be illegible in electronic image products. Images are produced from the best available original document.

SSCL-Preprint-546

Comparative Performance Measures of Relational and Object-Oriented Databases Using High Energy Physics Data*

J. Marstaller

Superconducting Super Collider Laboratory†
2550 Beckleymeade Ave.
Dallas, TX 75237, USA

December 1993

*Presented at the Third International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics.
†Operated by the Universities Research Association, Inc., for the U.S. Department of Energy under Contract No. DE-AC35-89ER40486.

COMPARATIVE PERFORMANCE MEASURES OF RELATIONAL AND OBJECT-ORIENTED DATABASES USING HIGH ENERGY PHYSICS DATA

J. Marstaller
Superconducting Super Collider Laboratory
2550 Beckleymeade Avenue
Dallas, TX 75237-3997

ABSTRACT

The major experiments at the SSC are expected to produce up to 1 Petabyte of data per year. The use of database techniques can significantly reduce the time it takes to access data. The goal of this project was to test which underlying data model, the relational or the object-oriented, would be better suited for archiving and accessing high energy physics data. We describe the relational and the object-oriented data model and their implementation in commercial database management systems. To determine scalability we tested both implementations for 10-MB and 100-MB databases using storage and timing criteria.

1. Introduction

Historically, analysis of High Energy Physics (HEP) event data from detectors has been done using a serial, tape-based approach. Database computing offers several advantages over the traditional approach in physics analysis: 1) It may not be necessary to read an entire event into the computer when only a few quantities are needed for a given analysis, thus reducing the I/O factor considerably. 2) Databases allow simultaneous access to the data. 3) Analysis of the data is immediate and interactive. 4) Data can be selected via simple queries using the standard query languages. The basis for the project described here is within the framework of the Petabyte Access and Storage project (PASS).1 The extent to which HEP can benefit from database technology is to be evaluated. A HEP database, using existing data from the Collider Detector at Fermilab (CDF), was designed to make initial performance measurements of a relational database management system (RDBMS) versus an object-oriented database management system (OODBMS). Approximately 1 GB of CDF data was taken and divided into 100 datasets to be stored on disk.
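The query advantage listed above (point 4) can be illustrated with a minimal sketch. The table and column names below are hypothetical, and Python's sqlite3 merely stands in for the commercial systems the paper tested: the point is that a query pulls only the few quantities an analysis needs, instead of deserializing entire events from tape.

```python
import sqlite3

# Hypothetical miniature event store: one row per reconstructed track,
# keyed by the run and event it belongs to.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE track (run INTEGER, event INTEGER, pt REAL, eta REAL)")
con.executemany("INSERT INTO track VALUES (?, ?, ?, ?)",
                [(1, 1, 12.5, 0.3), (1, 1, 4.2, -1.1), (1, 2, 33.0, 0.9)])

# Select only two quantities, for the tracks that pass a cut, rather
# than reading every event in full.
rows = con.execute(
    "SELECT run, event, pt FROM track WHERE pt > 10 ORDER BY pt").fetchall()
print(rows)  # [(1, 1, 12.5), (1, 2, 33.0)]
```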
A database requires the stored data to be organized according to some underlying data model, which in turn provides a set of stable structures. The goal is to determine the data model that produces better results.

2. Conceptual Design

When designing a database, the first step is to collect user requirements. These requirements can be put into a high-level construct that provides a preliminary view of how the user perceives the database. This high-level model, the conceptual model, describes how the user, i.e., a physicist, conceives the domain area (HEP data). The conceptual data model was used as a means to produce a logical data model specific to the relational and the object-oriented implementation. A semantic data model, the entity-relationship (ER) model, was chosen to illustrate the conceptual design. An entity-relationship diagram is a graphical technique that depicts the enterprise being modeled as a collection of sets, relationships between the sets, and attributes associated with both entity and relationship sets. The most common extension to the ER diagram includes the abstraction concepts of aggregation and generalization. Aggregation abstracts a collection in such a way that objects (entities) are viewed as a single higher-level object. Generalization is a form of abstraction which captures the commonalities of objects (entities) while ignoring their differences. The extended entity-relationship (EER) model is fairly simple to construct and has a reasonably unique interpretation. This model produced a diagram (Figure 1) which served as a means of communication among users and designers. The conceptual design ensured that no loss of information occurred, i.e., that all entities mentioned in the data analysis tool* were captured.

Figure 1. Conceptual design (entities include Run, Tape, Trigger, Event, Hits, Particle, Jet, Vertex, Segment, Track, Muon, Photon, Electron, and Track fit, connected by has/has-a relationships).

3. Relational Design

The next step in designing the data model is to create a logical data model. The logical model has constructs that are easy for the users to follow and yet still avoids physical details of implementation. It results, however, in a direct computer implementation, in this case an implementation in the relational database management system. The EER model also depicts the logical data model. Figure 2 shows the partial model.
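A minimal sketch of how such a relational implementation can carry the Run -> Event -> Track hierarchy, with each child relation augmented by its parent's primary key, might look like the following. Table and column names are illustrative, not the paper's actual CDF schema, and sqlite3 stands in for the commercial RDBMS tested.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Each child relation carries the primary key of its parent, so the
-- Run -> Event -> Track hierarchy is recoverable by joins.
CREATE TABLE run   (run_id INTEGER PRIMARY KEY);
CREATE TABLE event (run_id INTEGER REFERENCES run(run_id),
                    event_id INTEGER,
                    PRIMARY KEY (run_id, event_id));
-- Track is a weak entity: its key identifies it only together with
-- the key of the event (and run) it belongs to.
CREATE TABLE track (run_id INTEGER, event_id INTEGER, track_id INTEGER,
                    pt REAL,
                    PRIMARY KEY (run_id, event_id, track_id),
                    FOREIGN KEY (run_id, event_id)
                        REFERENCES event(run_id, event_id));
""")
con.execute("INSERT INTO run VALUES (7)")
con.execute("INSERT INTO event VALUES (7, 1)")
con.execute("INSERT INTO track VALUES (7, 1, 1, 9.8)")

# Traverse the hierarchy by joining each relation to its parent's key.
n = con.execute("""SELECT COUNT(*) FROM run
                   JOIN event USING (run_id)
                   JOIN track USING (run_id, event_id)""").fetchone()[0]
print(n)  # 1
```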
The conceptual data model was used as a means to produce a logical data model specific to the relational and object-oriented implementation. The objects have an inherent hierarchical structure based upon level of detail. To store this data in a traditional relational database, links need to be established between the objects to represent this hierarchy. Each run and event is given a unique identifier. Each child relation (such as Track, Vertex, etc.) is augmented to include the primary key of its parent. In other words, any relation in the hierarchy must include the primary key of its parent relation.3 This is comparable to data from computer-aided design applications. The hierarchy was modeled using weak entities. Weak entities depend on the existence of other entities for their identification.4 The CASE tool we used, ERDRAW,5 uses non-directed edges, labeled ID, to show the dependency. The entity Track was modeled to reflect the CDF data. The entity Track is the generalization of two homogeneous entities,6 Defaulttrack and Otherversiontrack. The homogeneous entities Defaulttrack and Otherversiontrack contain only the common attributes, whereas the generalization Track has additional attributes. ERDRAW provides directed edges labeled ISA for illustration.

Figure 2. Part one of the logical design for the RDBMS, using ERDRAW notation.

4. Object-Oriented Design

The ultimate goal of the object-oriented data model design is to produce a system that consists of a collection of objects. The focal point of the design task is to determine and specify those classes that govern the structure and functionality of these objects. As in the relational design method, the key is to identify the abstractions that occur in the problem domain. Objects are concepts, abstractions, or things with crisp boundaries within the problem domain (HEP). An object class describes a group of objects with similar properties (attributes), common behavior (methods), common relationships to other objects, and common semantics. A class can be viewed as a template, which specifies the intended purpose of the objects, or as an abstraction mechanism. Once the object classes were established, we identified and recognized relationships that exist between the classes within the application domain (HEP). The main goal

is to recognize patterns for aggregation, generalization, specialization, or inheritance on both the object and the class level. This object-oriented logical data model was produced using OMTool.7 OMTool follows the basic conventions for object model notation as described in Rumbaugh.8 Figure 3 shows the logical design for the OODBMS. The class Event is on top of the hierarchy, and depending on the event, the corresponding instantiations of the other classes can be found; i.e., Hit, Track, etc., have a relationship to Event, so each of these classes is part of an Event. The class Particle is modeled as an abstract class (one that has no direct instantiations). The subclasses of Particle, the different types of particles, inherit all attributes from their superclass.

Figure 3. Logical design for the OODBMS, using OMTool notation.
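The class structure just described can be sketched in a few lines. This is only an illustration of the abstract-class and aggregation patterns, with hypothetical attribute and method names; the paper's actual OODBMS schema is defined in OMTool notation, not Python.

```python
from abc import ABC, abstractmethod

# Particle is modeled as an abstract class: it has no direct
# instantiations, and its subclasses inherit the common attributes.
class Particle(ABC):
    def __init__(self, energy: float):
        self.energy = energy

    @abstractmethod
    def kind(self) -> str: ...

class Electron(Particle):
    def kind(self) -> str:
        return "electron"

class Photon(Particle):
    def kind(self) -> str:
        return "photon"

# Event sits on top of the hierarchy; Hit, Track, Particle, etc. are
# parts of an Event (aggregation).
class Event:
    def __init__(self):
        self.particles: list[Particle] = []

ev = Event()
ev.particles += [Electron(31.5), Photon(8.9)]
print([p.kind() for p in ev.particles])  # ['electron', 'photon']
```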

5. Testing Criteria

For the purpose of this project, we wanted to measure how well the underlying data model reflects and enhances the performance of the system. Our test measures response time, i.e., the real time elapsed from the point a program calls the database system with a particular query until the results of the query, if any, have been placed into the program's variables. There are two broad categories of performance measures: time and storage. We wanted to determine the storage efficiency of the database management systems, i.e., determine what fraction of a database is actual source data and what is database management overhead. We had a 10-MB and a 100-MB database and checked the size of the database dump files on disk. We looked at only three time measurements: 1) the time it takes to create a database, 2) the time it takes to archive a database, and 3) the time it takes to complete the queries. All tests are run on disk-resident databases.

6. Results

The goal of our test was to model a specific environment, the high energy physics analysis environment, and the aspects of how the two database systems perform in this environment; therefore, much time was spent modeling the HEP data and reflecting this environment in the relational and object-oriented data models. All measurements are summarized in Figure 4.

                   RDBMS               OODBMS
               10 MB    100 MB     10 MB    100 MB
  Storage      8.4 MB   84 MB      6.4 MB   64 MB
  Populate     3:47.0   28:18.0    2:54.5   32:18.9
  Dump db         7.2    1:37.6      22.7    4:12.1
  Load db        27.8    5:33.4      22.7    4:12.1
  Query: vfit    24.4      36.0      15.5      44.9

Figure 4. Time measurements in min:sec.1/10 sec.

Storage: To measure the overhead of each database management system, we determined the size of the files to be archived. The relational database dump file is larger because it stores a self-contained database that includes all of its pertaining information. The object-oriented database schema is not part of the recorded database file size.
For both database systems, the schema remains static throughout analysis. The schema must, however, exist prior to loading a database file into an existing healthy database. For the OODBMS, the schema files are of insignificant size, so we did not archive them with the database file. The OODBMS database file is approximately 25% smaller than the RDBMS database file.
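That "approximately 25% smaller" figure can be checked directly against the dump-file sizes reported in Figure 4:

```python
# Dump-file sizes from Figure 4 (MB on disk).
rdbms  = {"10 MB": 8.4, "100 MB": 84.0}
oodbms = {"10 MB": 6.4, "100 MB": 64.0}

savings = {size: 1 - oodbms[size] / rdbms[size] for size in rdbms}
for size, s in savings.items():
    # Both sizes give the same ratio, about 24% smaller.
    print(f"{size} database: OODBMS dump file is {s:.0%} smaller")
```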

Time:

1. The first time measurement was how long it takes to create and fill (populate) a database, for both database systems and both sizes (10 MB and 100 MB). The RDBMS showed much more consistent results in creating and loading both database sizes. When loading a relational database, only the actual data is loaded, whereas when loading an object-oriented database the pointers, or paths, have to be loaded as well. This accounts for the time increase for the OODBMS. For 10 MB the OODBMS performed approximately 25% better than the RDBMS; for 100 MB, however, its performance deteriorates to approximately 12% worse than the RDBMS.

2. Archiving the database files was the next time measurement recorded. For both DBMSs, we used vendor-provided utilities to archive the data files. Archiving included loading a database file from disk and dumping a database file for storage. Both measurements are important when 1 GB of data is divided into separate datasets so that archiving can eventually be done in a distributed, parallel fashion. The RDBMS performed consistently better than the OODBMS. To retrieve a database file, the RDBMS requires approximately 18% less time for 10 MB and 24% less for 100 MB. To store a database file, the RDBMS requires approximately 31% less time for 10 MB and 38% less for 100 MB than the OODBMS.

3. For the query used as a test measurement, we selected a track refitting query. There are other physics query results,* but the refitting query reflects a standard traversal over three quite large tables, five levels deep. The RDBMS scales up better without optimization from the designer (25%). In the OODBMS, where the user has to take care of all query optimization, the performance can be increased dramatically by using such features as "path of," which stores a predefined indexed path for a query.

Both relational and object-oriented databases have their own inherent problems for the physics research community.
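The response-time measure behind these numbers, the wall-clock time from issuing a query until the results sit in the program's variables, can be sketched with a toy harness. Again sqlite3 and the table below are stand-ins, not the systems or schema actually tested.

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE track (event INTEGER, pt REAL)")
con.executemany("INSERT INTO track VALUES (?, ?)",
                [(i % 100, float(i % 50)) for i in range(10_000)])

# Response time: real time elapsed from the call into the database
# until the query results have been placed into program variables.
t0 = time.perf_counter()
rows = con.execute("SELECT event, pt FROM track WHERE pt > 40").fetchall()
elapsed = time.perf_counter() - t0
print(f"{len(rows)} rows in {elapsed:.4f} s")
```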
Relational databases provide a simple way of extracting data with SQL, but all analyses must be done outside the database. In other words, for the algorithms there still needs to be a layer of application programs on top of the database. SQL does not have sufficient capabilities, such as recursion. This causes problems such as the cost of writing application programs, the granularity of the data present in the programs, and the impedance mismatch. Nevertheless, this approach seems rather attractive to physicists, since most of the algorithms are already implemented in FORTRAN. Object-oriented databases, on the other hand, are less mature than relational systems and are more difficult to fine-tune for performance. The grand picture is, of course, to develop a system that is flexible, modular, scalable, and distributed. It should, essentially, take data and process it into information, preferably visual information, so that the user can extract facts and draw meaningful conclusions. The use of databases is essential for such projects; however, the design of such a database is not yet fully understood. The next-generation database application has little in common with current business data processing. Since a need for much larger, more complex datasets exists, new capabilities in areas such as type extensions, complex objects, rule processing, and archival storage are required.
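The "layer of application programs on top of the database" can be illustrated with a sketch: a multi-level traversal like the track refitting query has to be driven from host-language code, one query per level, because the SQL of the day had no recursion, and the algorithm itself lives outside the database. The two-level schema, the refit stand-in, and sqlite3 are all hypothetical here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE event (event_id INTEGER PRIMARY KEY);
CREATE TABLE track (track_id INTEGER PRIMARY KEY,
                    event_id INTEGER REFERENCES event(event_id),
                    pt REAL);
""")
con.execute("INSERT INTO event VALUES (1)")
con.executemany("INSERT INTO track VALUES (?, 1, ?)", [(1, 5.0), (2, 7.5)])

def refit(pt):
    # Stand-in for the refitting algorithm, which in practice was
    # existing FORTRAN code layered on top of the database.
    return pt * 1.01

# The traversal logic lives in the host language; SQL fetches only
# one level of the hierarchy at a time.
results = []
for (event_id,) in con.execute("SELECT event_id FROM event"):
    for (pt,) in con.execute(
            "SELECT pt FROM track WHERE event_id = ?", (event_id,)):
        results.append(round(refit(pt), 3))
print(results)
```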

This project was not designed to be another benchmark in the area of database comparison; therefore, traditional database benchmarks, such as insert, were not measured. We wanted to show the performance characteristics of the database management systems in the application domain, i.e., high energy physics applications. Benchmarks designed for generic use and overall performance were applicable, but less desirable than designing specific tests.

7. Acknowledgments

This paper has been produced within the framework of the PASS project1 at the SSCL. The author participated in this project, designing and implementing the databases and creating the performance measures. She would like to thank Alain Gauthier for developing the track refitting query incorporating existing FORTRAN code. She would like to express special thanks to Ed May and Ute Nixdorf from the SSCL, who were crucial to the success of this exercise.

8. References

1. E. May, D. Lifka, R. Lusk, L. Price, D. Baden, R. Grossman, C. T. Day, S. Loken, J. F. Mcfarlane, L. Cormell, P. Leibold, D. Liu, M. Marquez, U. Nixdorf, B. Scipioni, and T. Song, The PASS Project: Database Computing for the SSC, Proposal to the DOE HPCC Initiative, Argonne National Laboratory, 1991, unpublished.
2. J. Marstaller, Comparative Performance Measures of Relational and Object-Oriented Databases for High Energy Physics Data, Baylor University, 1993, unpublished.
3. E. F. Codd, Extending the Database Relational Model to Capture More Meaning, ACM TODS (December 1979).
4. V. M. Markowitz and A. Shoshani, Representing Extended Entity-Relationship Structures in Relational Databases: A Modular Approach, ACM Transactions on Database Systems, 1992.
5. E. Szeto and V. M. Markowitz, ERDRAW: A Graphical Schema Specification Tool, Lawrence Berkeley Laboratory, 1990.
6. V. M. Markowitz and A. Shoshani, Representing Extended Entity-Relationship Structures in Relational Databases: A Modular Approach, ACM Transactions on Database Systems, 1992.
7. General Electric Company, OMTool: User Guide for Sun Microsystems Workstations, Pennsylvania, General Electric Company, 1992.
8. J. Rumbaugh et al., Object-Oriented Modeling and Design, Prentice Hall, New Jersey, 1991.