
CDF/DOC/COMP UPG/PUBLIC/4522
March 9, 1998
Version 1.0

AN INTERIM STATUS REPORT FROM THE OBJECTIVITY/DB WORKING GROUP

W. David Dagenhart, Kristo Karr, Simona Rolli and Krzysztof Sliwa
Tufts University, Department of Physics and Astronomy, Medford, Massachusetts, USA

Ruth Pordes and Mingshen Gao
Fermi National Accelerator Laboratory, Batavia, Illinois

ABSTRACT

We present some preliminary results of benchmarks of the Objectivity database system obtained by our group. We show results using prototype "loader" and "reader" modules that run under AC++, which provide the necessary interface between the Objectivity/DB database and the TRYBOS/YBOS records. We also summarize the highlights from the 1998 CERN RD45 Status Report. The selected items are relevant to the architecture, scalability and performance issues of the large volume data management system based on Objectivity/DB, as we propose for CDF Run II data.

1 Introduction

We present some preliminary results of benchmarks of the Objectivity database system. First, we show results derived using prototype modules that run under AC++. Second, we show tests of simpler standalone processes that write and read arrays. Finally, there is a brief summary of the 1998 CERN RD45 Status Report. The selected items are relevant to the architecture, scalability and performance issues of the large volume data management system based on Objectivity/DB, as we propose for CDF Run II data.

2 AC++ Prototype Module Results

In this section, we present results derived using prototype modules that run under AC++ (Framework). One module reads banks from a TRYBOS record and copies them to Objectivity persistent objects. We call it the "loader". The second module is an input module. It reads the data files created by the "loader", then uses functions to recreate the banks in a TRYBOS record, and passes that record to Framework. Other modules can process that TRYBOS record in a completely transparent fashion.

At present, 36 Run 1 banks have been redefined as persistent classes and can be loaded and read. The persistent objects store the data field and the bank number from the bank. The rest of the bank header and the type header are not stored, but can be recreated by a function. The following results were measured using 150 MBytes of data from a Run 1 Monte Carlo file. These results are preliminary.

At present, the granularity of the objects is one bank to one object. We are already working on composite objects, which contain multiple bank-sized objects or arrays of them. When these are implemented, we expect performance to improve significantly. These composite objects are not flat arrays of bytes, but contain other objects. In addition, we have done studies of large flat arrays and are considering using them only for the storage of raw data, not reconstructed or higher level objects. For both composite objects and flat arrays, we expect the performance to improve, because the persistent objects will be larger.

The database includes event objects and tag objects. All objects in an event are connected to these through associations. The input module iterates over the tag objects. Associations are used to access all the other objects in the event. The reading benchmarks reflect reading 100% of the data in the database. They do not show the benefit one would see when reading nonsequentially. It is important to keep in mind that the following results will change as our data model evolves. Results of benchmarks done with TRYBOS are included for comparison.
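As an illustration of this data model, the following self-contained C++ sketch mimics the loader and reader described above using plain STL containers in place of Objectivity persistent classes and associations. All class and member names are hypothetical; this is not the actual AC++ or Objectivity/DB code, only a picture of the one-bank-to-one-object granularity and of the tag-object/association navigation.

// Minimal stand-in for the loader/reader data model described above.
// Plain C++ containers replace Objectivity persistent classes and
// associations; all names are hypothetical.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// One Run 1 bank mapped to one persistent object: only the bank
// number and the data field are stored; the headers are recreated
// by a function on the reading side.
struct PersistentBank {
    std::string name;            // bank name, e.g. "LRIH"
    int bankNumber = 0;
    std::vector<int32_t> data;   // the bank data field
};

// Tag object: holds summary quantities plus associations (here,
// shared_ptr stands in for Objectivity associations) to the banks.
struct EventTag {
    int eventNumber = 0;
    double sumEt = 0.0;          // example tag quantity
    std::vector<std::shared_ptr<PersistentBank>> banks;
};

// "Loader": copy the banks of one event into persistent objects.
EventTag loadEvent(int eventNumber,
                   const std::vector<PersistentBank>& rawBanks) {
    EventTag tag;
    tag.eventNumber = eventNumber;
    for (const auto& b : rawBanks)
        tag.banks.push_back(std::make_shared<PersistentBank>(b));
    return tag;
}

// "Reader": iterate over the tag objects and rebuild a (fake) record
// by following the associations.
void readAll(const std::vector<EventTag>& tags) {
    for (const auto& tag : tags) {
        std::size_t words = 0;
        for (const auto& bank : tag.banks) words += bank->data.size();
        std::cout << "event " << tag.eventNumber << ": rebuilt "
                  << tag.banks.size() << " banks, " << words
                  << " data words\n";
    }
}

int main() {
    std::vector<PersistentBank> raw = {{"LRIH", 1, {1, 2, 3}},
                                       {"TRKS", 1, {4, 5}}};
    std::vector<EventTag> db = {loadEvent(42, raw)};
    readAll(db);
}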

Objectivity space overhead = 31%
TRYBOS space overhead = 33%
Objectivity Load and Write speed = 0.43 MBytes/sec
TRYBOS Bank Copy and Write speed = 1.8 MBytes/sec
TRYBOS Write speed = 4.1 MBytes/sec
Objectivity Read speed (objects into memory) = 3.7 MBytes/sec
Objectivity Read speed (with copy to TRYBOS record) = 2.8 MBytes/sec
TRYBOS Read speed = 8.3 MBytes/sec

The rest of this section gives details of how the tests were done. Skip to the next section if you are not interested in the details.

The space overhead is calculated by dividing the size of the Objectivity output files after ootidy (147 MBytes) by the size of the data fields in the banks (112 MBytes), then subtracting 1 and converting to a percentage. For TRYBOS, first a bank copy program is run that copies out the banks which the loader can translate. The size of the original TRYBOS file is 238 MBytes. The size of the file with only the selected banks is 150 MBytes. This 150 MByte file was used for all the TRYBOS tests. The input file is /simona/simona/cdfloader/wbb q.trk on b0it04.

The write speed is calculated by dividing the size of the output file by the time to create it. For Objectivity, the time includes the time to run the process (373 sec) plus the ootidy time (63 sec) minus the time it takes to read in the data using the standard TRYBOS input module (91 sec for the 238 MByte input file). For TRYBOS, there are two ways to measure the write speed, depending on what you want to compare. In the first method, there is one TRYBOS record that is filled by the input module; the output module then takes a second record, copies in the data one bank at a time, and writes the second record to disk. This is analogous to loading the objects and then writing. In the second method, the record delivered by the input module is written directly to a new disk file; there is no copy of banks to a new record. For TRYBOS, the time is the process time minus the time to read in the data.

The read speed is the size of the database files divided by the time to read through them. All the data is read into memory, including the contents of the oovarrays in the case of Objectivity. Only the input module is run; there is no processing done on the event. For Objectivity, it takes 52 seconds to read 147 MBytes when the data is copied into a TRYBOS record, and 39.4 seconds when the data is only read into memory and not copied to a TRYBOS record. For TRYBOS, it takes 18.1 seconds to read in the 150 MByte file. This is using a modified version of the FrameMods input module that does not reallocate a new event record every event. Using the standard module the read rate is 2.6 MBytes per second; in this case, most of the time is spent allocating new records, so this is not a very interesting number.
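The short C++ program below simply re-derives the quoted Objectivity numbers from the inputs given above (147 MByte output after ootidy, 112 MBytes of bank data, 373 sec process time, 63 sec ootidy, 91 sec input time, 39.4 and 52 second read times); it adds nothing new and is only a worked example of the formulas described in the text.

// Re-derivation of the quoted Objectivity numbers from the inputs
// given in the text (sizes in MBytes, times in seconds).
#include <cstdio>

int main() {
    const double dbSize   = 147.0;  // Objectivity files after ootidy
    const double dataSize = 112.0;  // size of the bank data fields
    const double procTime = 373.0;  // loader process time
    const double tidyTime = 63.0;   // ootidy time
    const double readTime = 91.0;   // TRYBOS input of the 238 MByte file

    // Space overhead: output size over payload size, minus 1.
    std::printf("overhead    = %.0f%%\n", (dbSize / dataSize - 1.0) * 100.0);

    // Write speed: output size over (process + ootidy - input) time.
    std::printf("write speed = %.2f MB/s\n",
                dbSize / (procTime + tidyTime - readTime));

    // Read speeds: database size over the measured read times.
    std::printf("read (memory only)    = %.1f MB/s\n", dbSize / 39.4);
    std::printf("read (copy to TRYBOS) = %.1f MB/s\n", dbSize / 52.0);
}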

The results depend heavily on the hardware used. In all cases, b0it04 was used. b0it04 is a Silicon Graphics 195 MHz R10000 machine. The disk was local to the machine. It was a Seagate ST19171N with a rated "Internal Formatted Transfer Rate" of 7.86 to 12.2 MBytes/sec. The disk was tested, using low level I/O, to write at least 4 MBytes/sec. The disk read speed was tested with the UNIX command "time cat wbb11.try" piped to /dev/null, where wbb11.try is the 150 MByte disk file; the measured rate was 8.4 MBytes/second. All the benchmarks were run when no other significant processes were running, as determined using the UNIX "top" utility.

Note that about 30% of the time to write data was taken by the ootidy process, but ootidy only reduces the size of the output file by 1% or 2%. It is not clear that the size savings is worth the extra write time. We define 1 MByte/sec to be 1,000,000 bytes per second.

3 Simpler Standalone Results

3.1 Reading/Writing Many Identical Fixed Size Arrays

First, we show tests of a process that writes a large number of identical persistent objects to the database. Each object contains one array of a fixed size. The test is repeated with different array sizes. The arrays are filled with random garbage. To keep it simple, there are no oovarrays or associations. A second process is used to read through the data created by the first process. See figure 1 and table 1.

The details of the test are very similar to the test described in the first section. The same hardware was used. The size of the data is roughly 160 MBytes in every case. The write speed is the size of the output file divided by the time to create it (including the time for ootidy). The read speed is the size of the input file divided by the time to read it all into memory. Objects are located using the Objectivity scan function to initialize the iterator. The hardware read speed was calculated using the UNIX command "time cat filename" piped to /dev/null. The hardware read speed was recalculated for every file to account for the variation caused by which part of the disk the file is written to. For the larger objects, the reading speed is roughly 90% as fast as the hardware can go. The hardware write speed was tested using low level I/O to write 1 MByte blocks to a file. The hardware write speed is very roughly 4 MBytes per second. The writing speed is less than 1/4 as fast as the hardware will go. Benchmarks at different page sizes are in progress.

The plot in figure 1 shows space overhead as a function of object size. The space overhead is the sum of two components. One component varies like 1/(object size). This is important for smaller objects. This component will increase proportionally as associations, oovarrays and other things are added to objects. In the simple case tested here, the overhead is 14 bytes per object. The other component depends on the page size.
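As a much simplified stand-in for the write test just described, the sketch below writes roughly 160 MBytes of fixed-size records filled with random bytes and reports the achieved rate. Plain fwrite() to a local file (the file name bench.dat is hypothetical) replaces Objectivity here, so it only illustrates the measurement procedure, not the database overheads being studied.

// Standalone write benchmark in the spirit of section 3.1: write
// ~160 MB of fixed-size records filled with random bytes and report
// the achieved rate. Plain fwrite() stands in for Objectivity.
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main(int argc, char** argv) {
    const std::size_t objectSize = argc > 1 ? std::atoi(argv[1]) : 8000;
    const std::size_t totalBytes = 160u * 1000u * 1000u;   // ~160 MBytes
    const std::size_t nObjects   = totalBytes / objectSize;

    std::vector<unsigned char> buf(objectSize);
    for (auto& b : buf) b = static_cast<unsigned char>(std::rand());

    std::FILE* f = std::fopen("bench.dat", "wb");
    if (!f) return 1;

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < nObjects; ++i)
        std::fwrite(buf.data(), 1, buf.size(), f);
    std::fclose(f);
    auto t1 = std::chrono::steady_clock::now();

    // 1 MByte/sec is defined as 1,000,000 bytes per second, as in the text.
    double sec = std::chrono::duration<double>(t1 - t0).count();
    std::printf("%zu objects of %zu bytes: %.2f MB/s\n",
                nObjects, objectSize, nObjects * objectSize / sec / 1e6);
}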

Table 1: Read and write speeds as a function of object size (with page size 8192 bytes). Errors on the speeds are roughly 10%. The columns give the size of the objects (bytes), the space overhead, the write speed (MBytes/s), the read speed (MBytes/s), and the hardware read speed (MBytes/s).

The saw-tooth part of the curve in figure 1 relates to space wasted at the beginning/end of the pages. Changing the page size would be like rescaling the x axis in figure 1. The height and shape of the sawtooth peaks would not change. The peaks are located at (pagesize)/n and (n+1/2)*(pagesize), for n = 1, 2, 3, ...

It is a little difficult to compare the results in section 1 to these results. The average bank size for the results in section 1 was 160 bytes, but the overhead per object was larger, because there are associations, oovarrays, tag objects, event objects and other things that make the overhead per object larger. These things also take a significant amount of time to create and a little bit more time to read.

Figure 1: Space overhead as a function of the size of the persistent objects. The overhead is calculated by dividing the size of the database output file by the size of the input data (number of objects times the size of the data in each object), then subtracting 1 and converting to a percentage. The curve is a prediction based on our understanding of how Objectivity works. The points show our test results. The top plot covers objects of size 32 to 1,000 bytes, the lower left plot 1,000 to 9,000 bytes, and the lower right plot 9,000 to 90,000 bytes. The overhead is a sum of contributions from two sources. First, there are 14 bytes of overhead per object; this varies like 1/x and is only significant in the top plot. Second, the sawtooth function reflects the space wasted at the end/beginning of each page.
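The function below is one possible reading of this two-component model (14 bytes per object, plus page-boundary waste for whole objects packed into 8192 byte pages). It is our interpretation for illustration only; the actual Objectivity page layout may differ, which is why the measured points and the predicted curve in figure 1 do not have to agree exactly.

// Predicted space overhead versus object size, under the simple model
// described above: 14 bytes of per-object overhead plus space wasted
// when whole objects do not fill a page exactly. This is our reading
// of the behaviour, not the actual Objectivity page layout.
#include <cstdio>

double predictedOverhead(double objectBytes, double pageBytes = 8192.0) {
    const double perObject = 14.0;                  // bytes of overhead per object
    const double stored = objectBytes + perObject;  // what one object costs on a page
    if (stored <= pageBytes) {
        // Small objects: an integer number of objects per page,
        // the remainder of the page is wasted.
        int perPage = static_cast<int>(pageBytes / stored);
        return pageBytes / (perPage * objectBytes) - 1.0;
    }
    // Large objects: an object spans an integer number of pages,
    // the tail of the last page is wasted.
    int pages = static_cast<int>((stored + pageBytes - 1.0) / pageBytes);
    return pages * pageBytes / objectBytes - 1.0;
}

int main() {
    const double sizes[] = {80.0, 500.0, 2000.0, 9000.0, 50000.0};
    for (double size : sizes)
        std::printf("%8.0f bytes -> %5.1f%% overhead\n",
                    size, 100.0 * predictedOverhead(size));
}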

3.2 Bank Loader Program Using Flat Arrays

In this test, an alternate schema was tried. It shows that we can expect performance to improve with larger objects. It is also a schema we are considering using for the raw data. This schema uses large flat variable-sized arrays. "Flat" means that the type information is not stored within the schema; it is just an array of bytes. This approach requires another package (TRYBOS) to do the type conversions and access the data. The schema just gives persistent storage for arrays. In this test, all the banks in an event with the same name were stored in the same array (one might also store entire TRYBOS records this way). We used the same hardware, the same measurement assumptions, and the same 150 MBytes of Monte Carlo data used in the test described in section 1. The overhead was measured to be 15%. The writing speed was measured to be 0.89 MBytes/sec. Reading is not implemented yet for this alternate schema.

3.3 Nonsequential Reading of Data

One can expect to save a lot of time by reading data nonsequentially. We demonstrate this in a simple test. The process reads data from events which were written into the database by our loader program. There is a cut based on data in the tag object. The rest of the data in the event is only accessed if the cut is passed. The numbers below show that a nonsequential read is faster. In each case there are 10,000 total events.

Events passing cut = 9997, time = 152 seconds
Events passing cut = 7510, time = 138 seconds
Events passing cut = 5241, time = 104 seconds
Events passing cut = 4055, time = 87 seconds
Events passing cut = 2276, time = 59 seconds
Events passing cut = 707, time = 19 seconds
Events passing cut = 553, time = 17 seconds

The results above will change as we change the event structure in our database. The only conclusion one should draw is that nonsequential reading is faster. One should not presume that the relationship between time and the fraction of events selected will be the same in the final data model.
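The access pattern used in this test can be sketched as follows in plain C++; shared_ptr stands in for Objectivity associations, and the tag quantity, cut and event content are invented for illustration only.

// Sketch of the nonsequential read pattern: iterate over the small
// tag objects, apply the cut there, and only follow the association
// into the full event data when the cut is passed.
#include <cstddef>
#include <cstdio>
#include <memory>
#include <vector>

// The bulk of an event: only touched when the tag cut is passed.
struct EventData {
    std::vector<double> tracks;
};

// Small tag object holding summary quantities and an association
// (here a shared_ptr) to the full event data.
struct EventTag {
    double sumEt = 0.0;
    std::shared_ptr<EventData> event;
};

int main() {
    // Build 10,000 toy events, as in the test above.
    std::vector<EventTag> tags(10000);
    for (std::size_t i = 0; i < tags.size(); ++i) {
        tags[i].sumEt = static_cast<double>(i % 100);   // fake tag quantity
        tags[i].event = std::make_shared<EventData>();
        tags[i].event->tracks.assign(50, 1.0);
    }

    // Nonsequential read: cut on the tag, follow the association only
    // for events that pass, so most of the data is never read.
    const double etCut = 90.0;
    std::size_t passed = 0;
    double sum = 0.0;
    for (const auto& tag : tags) {
        if (tag.sumEt < etCut) continue;                // cheap: only the tag is touched
        ++passed;
        for (double t : tag.event->tracks) sum += t;    // expensive part
    }
    std::printf("events passing cut: %zu, track sum: %.0f\n", passed, sum);
}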

4 Highlights from the 1998 RD45 Status Report

For the year 1998 the RD45 collaboration has been asked to reach the following milestones:

- demonstrate that an ODBMS can satisfy the requirements of a typical simulation, reconstruction and analysis scenario with a data volume of up to 1 TB
- investigate the impact on the everyday work of the end-user physicist when using an ODBMS for event data storage; the work should address issues related to individual developers' schema and collections for simulation, reconstruction and analysis
- demonstrate the feasibility of using an ODBMS and MSS at data rates sufficient for the ATLAS and CMS 1997 test beam requirements

Milestone I

The main requirement for reconstruction is that the ODBMS should be able to keep up with the rate at which data is acquired: 100 MB/sec for ATLAS/CMS and 1.5 GB/sec for ALICE. (For CDF this number is of order 60 MB/sec, and it is the reconstruction rate with which the input/output module should keep up.) It is assumed that reconstruction is done using a farm. The NA45 experiment has already demonstrated that up to 32 streams can write into a single Objectivity/DB federation using a lock-free strategy. We think that an extrapolation to 60 streams is not unreasonable. This would give a rate of 1 MB/second per stream. The conclusion is that I/O rates for reconstruction purposes are not considered to be a problem.

For analysis it is assumed that some 150 users will be performing analysis concurrently at any one time. The following setup has been developed at Caltech: a 256-processor system was built from 16 nodes, each with 16 processors. Each node was connected to a 4-way striped disk array capable of delivering 22 MB/s. The nodes were connected by a fast switching fabric. Data were put on two nodes (for a total of 2 x 10 GB), with the I/O-intensive clients on the nodes holding the data; the CPU-intensive clients were located on the other nodes. The following assumption was made: 1 physics job executes as N clients (or client threads), each client doing mostly sequential reading. The load of M physics jobs was simulated by putting M*N Objectivity clients on the machine. Each client does mostly sequential reading (traversing containers): 1/3 of all clients read 10 KB objects, 1/3 read 100 KB objects and computed for 0.1 sec per object, and 1/3 read 500 KB objects and computed for 10 sec per object.

The system got up to 158 clients running in parallel. The conclusions are that HP Exemplar I/O scales to 100+ readers for data on 2 nodes. There was no I/O performance degradation with 100+ readers. There were no crashes of the lockserver (570 entries in the lockserver table). The combined throughput of all clients was about 18 MB/second, essentially constant for up to 100 concurrent clients. These results suggest that scaling to 150 concurrent analyses is achievable today, without resorting to replication.
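As a back-of-the-envelope cross-check of the reconstruction figures quoted at the start of this milestone, the snippet below converts the required acquisition rates into numbers of 1 MB/second writing streams (the per-stream rate and all figures are taken from the text above; nothing new is assumed).

// Number of 1 MB/s writing streams implied by the quoted rates.
#include <cstdio>

int main() {
    const double perStreamMB = 1.0;   // MB/second per stream, from the text
    struct { const char* expt; double rateMB; } req[] = {
        {"ATLAS/CMS", 100.0}, {"ALICE", 1500.0}, {"CDF Run II", 60.0}};
    for (const auto& r : req)
        std::printf("%-10s %6.0f MB/s -> %4.0f streams\n",
                    r.expt, r.rateMB, r.rateMB / perStreamMB);
    // NA45 has demonstrated 32 concurrent streams; CDF would need about 60.
}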

4.0.2 Scalability

In version 4 of Objectivity it was not possible to create individual databases (DBs) larger than 2 GB, which, combined with the limit of 2^16 databases allowed in a federation, limited the maximum volume of data one could store in a federated database. In version 5, RD45 has verified that the 2 GB limit no longer exists; files up to 25 GB in size have been created. The file size is now only limited by the underlying filesystem: it is practically unlimited on 64-bit systems. The largest federation created to date is of order 0.5 TB (limited only by disk space). Many federations containing over 1000 DBs have been built. Using 25 GB databases, just 40 are needed to build a federation of 1 TB. In practice, building federations larger than of the order of 1 TB requires an interface to a mass storage system. In version 6 of Objectivity, due at the end of 1998, a mapping between containers and files will be possible. With this modification, one could build very large federations using small files, of the order of 1 GB, which may be advantageous over using large files. Attempts to build very large federations using HPSS-managed storage are currently under way, as CERN plans to use Objectivity/HPSS for production with about 300 TB of data for two experiments, COMPASS and NA45.

4.0.3 Data reclustering

The data reclustering issue has been investigated both in ATLAS and CMS. To study the potential performance gains of reclustering, a prototype has been developed in CMS, based on a mechanism for clustering data into collections and accessing the collections with read-ahead optimization. The read-ahead optimization allows the clustering of different types of objects to be managed in an independent way, and also makes it possible for the batch reclustering operation to conserve the database size while preserving optimal throughput. The schedule only needs to be computed once per job, and this allows the optimizer to use fairly complex computations.

4.0.4 Data Import/Export

In Objectivity/DB it is possible to copy a database and attach it to another federation. It is required that the target federation be compatible, i.e. have a consistent schema for at least the subset of objects in the database that are copied, and share database parameters such as the database page size. A copied database may be (re-)attached with a new database ID, in which case the object identifiers of all contained objects are automatically updated. A more complete solution, however, is to provide a deep copy utility, which copies objects and the objects that they reference. Such a tool has been developed by BaBar to assist in their data import/export. An associated problem is that of maintaining consistency between federations. BaBar intend to use a simple database-id allocation scheme which ensures that the database IDs used by different federations are compatible.
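The sketch below illustrates the kind of non-conflicting database-id allocation just mentioned: each federation draws its database IDs from a disjoint range, so a database copied from one federation can be attached to another without an ID collision. The scheme, the range size and the federation names are purely hypothetical; the text does not specify BaBar's actual implementation.

// Illustration of a non-conflicting database-id allocation scheme.
#include <cstdio>
#include <map>
#include <string>

class DbIdAllocator {
    static constexpr int kRangeSize = 4096;  // ids reserved per federation (arbitrary)
    std::map<std::string, int> next_;        // next free id in each federation's range
public:
    explicit DbIdAllocator(const std::map<std::string, int>& rangeIndex) {
        for (const auto& kv : rangeIndex)
            next_[kv.first] = kv.second * kRangeSize + 1;
    }
    int allocate(const std::string& federation) {
        return next_.at(federation)++;       // ids never overlap across federations
    }
};

int main() {
    // Hypothetical federation names, each given its own id range.
    DbIdAllocator alloc({{"prompt-reco", 0}, {"physics-analysis", 1}});
    std::printf("prompt-reco db id:      %d\n", alloc.allocate("prompt-reco"));
    std::printf("physics-analysis db id: %d\n", alloc.allocate("physics-analysis"));
}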

4.0.5 Production database services

The federation catalogue is crucial for the database, and it should be protected from possible corruption and failures. A number of production database services will be established at CERN during 1998, most importantly a dedicated server on which the lockserver for the primary partition would run. This machine would also contain a copy of the federated database catalog and schema. These servers would have mirrored filesystems for the operating system and database data, in order to offer maximum reliability. The data servers on which the Objectivity/DB servers (AMS servers) would run would have several hundred GB of disk space, typically managed by HPSS. At least one data server per experiment would be non-HPSS managed, for data that must reside permanently online, such as calibration data, production control and data collections of event tags. Such services will be established for ALICE, NA45, ATLAS, CMS and COMPASS.

At SLAC the BaBar experiment will take some 200 TB of data per year starting in 1999. They intend to use a combination of Objectivity/DB and HPSS on which to base their event store. The details of the data model are given elsewhere.

4.1 Milestone II

Access to a given Objectivity/DB federation is determined by setting an environment variable. Helper classes, distributed as part of the HepODBMS class libraries, reduce the amount of knowledge needed about the details of frequently performed database operations, for example initialization et cetera. At the interactive analysis stage, browsers are being developed to navigate through the database, find an appropriate collection of events and analyse it. The system allows a user to access the data as a logically single entity, without the need to know the physical location of the data, or the details of the staging system or of a book-keeping system.

Collections

Collections of persistent objects, even collections of collections, are of obvious importance for the users and the database administrator. Database administrators will try to optimize the overall system performance by redefining the physical clustering of event collections shared by one or more physics analysis groups. A workshop was held in February 1998 at CERN, and the key requirements have been defined:

- a single class for the user interface
- an STL-like interface, including a forward iterator
- support for very large collections of events
- support for a "description" of the collection
- set-style operations based on a unique event identifier
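A minimal C++ sketch of what such a collection interface might look like is given below. It is purely illustrative (plain STL underneath, hypothetical names); it is not the RD45 prototype, only a concrete rendering of the requirements listed above.

// Illustrative event collection: unique event identifiers, a
// description, an STL-style forward iterator, and a set-style operation.
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <set>
#include <string>

class EventCollection {
    std::string description_;
    std::set<long> ids_;          // unique event identifiers
public:
    explicit EventCollection(std::string description)
        : description_(std::move(description)) {}

    const std::string& description() const { return description_; }
    void add(long eventId) { ids_.insert(eventId); }

    // Forward iteration over the event identifiers.
    using const_iterator = std::set<long>::const_iterator;
    const_iterator begin() const { return ids_.begin(); }
    const_iterator end() const { return ids_.end(); }

    // Set-style operation based on the unique event identifier.
    EventCollection intersect(const EventCollection& other,
                              const std::string& description) const {
        EventCollection result(description);
        std::set_intersection(ids_.begin(), ids_.end(),
                              other.ids_.begin(), other.ids_.end(),
                              std::inserter(result.ids_, result.ids_.end()));
        return result;
    }
};

int main() {
    EventCollection dimuon("events with two muons");
    EventCollection highEt("events with large sum Et");
    const long a[] = {1, 2, 3, 5};
    const long b[] = {2, 5, 8};
    for (long id : a) dimuon.add(id);
    for (long id : b) highEt.add(id);

    EventCollection both = dimuon.intersect(highEt, "two muons and large sum Et");
    std::printf("%s:", both.description().c_str());
    for (long id : both) std::printf(" %ld", id);
    std::printf("\n");
}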

The goal is to develop a first prototype of the event collection classes in time for the RD45 workshop at the end of April 1998, and to incorporate a version of these classes in the 98A release of LHC++, scheduled for June/July 1998.

LHC++ analysis model

A new approach to analysis is possible if the data is stored in an ODBMS. Compared to the NTUPLEs with which most HEP users are quite familiar, it is no longer necessary to repeat the entire NTUPLE-generation stage if an additional variable is required in the NTUPLE. This is possible because of the associations provided by the ODBMS. A scheme for performing interactive data analysis has been developed in the context of LHC++.

Schema Handling Issues

Each persistent-capable class is given a type number, allocated sequentially. Maintaining the type-numbering scheme of an application (or library) in agreement with the target federated database schema is therefore an essential requirement. To remove the type number coupling between different packages introduced by the sequential type number allocation, Objectivity offers the so-called "named schema" feature. This feature allows the type number space to be divided into named subsets when running the DDL processor. Each of these individual named schemata reserves a range of 64K type numbers, allowing the developers of different packages to avoid mutual conflicts. Some 16 schema names have already been allocated for the various LHC++ packages. User schemata are then possible, allocating a named schema for each user on demand.

4.2 Milestone III

4.2.1 ODBMS-MSS interface

The production version is scheduled to be delivered by the end of 1998. A prototype of the interface has been produced for IBM AIX systems, the only system on which HPSS is currently officially supported. The interface is provided in such a way that end-user sites are able to optimize the I/O layer, even substituting a different mass storage system, provided that a compatible interface is written. Objectivity/DB applications will be unaware that the associated data resides in HPSS-managed storage. When an object is accessed it will be returned immediately if the corresponding database is already on disk; if not, the client will block on the implicit database open whilst the server, through HPSS, causes the necessary file to be reloaded from tape.

The current HPSS-NFS and "simple API" interfaces, in which one block is read at a time, are efficient for very large block sizes, between 1 and 10 MB. Unfortunately, databases typically transfer much smaller amounts of data (the page size we are considering for the Run II CDF Objectivity/DB database is 8 kB). A better strategy would be to read multiple blocks at a time and hence minimize the interaction with the HPSS server.
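The sketch below illustrates such a multi-block (read-ahead) strategy in ordinary C++, with a local file standing in for an HPSS-resident database: many 8 kB page requests are satisfied from one larger fetch, reducing the number of round trips to the storage layer. The class, parameter values and file name are hypothetical; this is not the Objectivity or HPSS API.

// Read-ahead page reader: fetch a window of pages per request to the
// (slow) storage layer and serve individual 8 kB page reads from it.
#include <cstddef>
#include <cstdio>
#include <vector>

class PageReader {
    std::FILE* file_;
    std::size_t pageSize_;
    std::size_t window_;              // pages fetched per request
    std::vector<char> buffer_;
    long firstCached_ = -1;           // first page number held in the buffer
public:
    PageReader(std::FILE* f, std::size_t pageSize = 8192, std::size_t window = 128)
        : file_(f), pageSize_(pageSize), window_(window),
          buffer_(pageSize * window) {}

    // Return a pointer to page n, refilling the window only when the
    // requested page falls outside the cached range.
    const char* page(long n) {
        if (firstCached_ < 0 || n < firstCached_ ||
            n >= firstCached_ + static_cast<long>(window_)) {
            std::fseek(file_, n * static_cast<long>(pageSize_), SEEK_SET);
            std::size_t got = std::fread(buffer_.data(), 1, buffer_.size(), file_);
            (void)got;                // short reads near EOF are fine for this sketch
            firstCached_ = n;         // one large request instead of many small ones
        }
        return buffer_.data() + (n - firstCached_) * pageSize_;
    }
};

int main(int argc, char** argv) {
    // Any local file stands in for the tape-backed database here.
    std::FILE* f = std::fopen(argc > 1 ? argv[1] : "bench.dat", "rb");
    if (!f) return 1;
    PageReader reader(f);
    long checksum = 0;
    for (long p = 0; p < 1000; ++p) checksum += reader.page(p)[0];
    std::printf("checksum of first bytes of 1000 pages: %ld\n", checksum);
    std::fclose(f);
}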

However, the performance implications of the current prototype are not yet well understood, and it is expected that stress testing over the coming months will suggest areas where improvements are required. An alternative solution would be to use HPSS as a conventional staging system and let Objectivity/DB read and write directly to standard UNIX filesystems. This would avoid the performance overheads associated with reading from (or writing to) HPSS-managed disk storage, but would require additional space management of the disk pool. At CERN this direction is now being pursued; the existing tape staging software already provides such a capability and is being interfaced to HPSS. This is currently considered the most viable short term solution.

4.2.2 OBJY-HPSS installation at CERN

In the current HPSS test configuration at CERN, the various HPSS components are distributed across multiple systems. For example, the tape mover(s), disk mover(s) and HPSS nameserver all run on different systems. In addition, an IBM system is currently being used to evaluate the Objy-HPSS prototype interface. As such, this system runs both the Objy server (AMS) and the HPSS disk mover, together with the rest of the environment required by HPSS, such as DCE.

4.2.3 OBJY-HPSS configuration at SLAC

Unlike CERN, SLAC currently plans to run the various HPSS components and the Objectivity/DB server on a single powerful system. Although such a scenario has the advantage of reducing the network overhead involved in the inter-module communication, it is inherently less scalable; it is nevertheless well-suited to the environment at SLAC, where the system will be used to support a single experiment.

4.2.4 Functionality tests

The basic functionality required by the proof of concept prototype has been demonstrated. Tests have also been made to access tape-resident databases. This area needs a significant amount of further study and is going to be the subject of primary attention in 1998. The target for a production-quality interface remains the end of 1998, and it is scheduled for inclusion in Objectivity/DB Version 6.0. The schedule is not unrealistic given the fact that two large volume experiments, BaBar and COMPASS, will start taking data in 1999.

4.2.5 CMS test beam experiences

There has been no integration with HPSS (given the small amount of data, less than 100 GB). The data rates involved were well below 1 MB/sec. The two test beam activities can be considered a production demonstration of the overall LHC++ environment, from data taking to analysis.

The H2 test-beam

Online (event data recording):
- DAQware (ODBMS unaware)
- Objectivity/DB formatter (Objy-dependent)
- Control system (could use ODBMS)
- CDR (dependent on the Objy fault tolerant option)
- Asynchronous data recording (Objy dependent)

Offline data processing:
- reconstruction framework (ODBMS-based)
- interface to simulation
- user persistent classes (Objy dependent)

Interactive analysis environment:
- data browser
- HisOOgrams (ODBMS-based)
- HisOOgrams visualizer (ODBMS-aware)

After a few days of running, the system ran essentially unattended without major problems. The only manual operation was to change the output disk every 9 GB. Further development is expected in 1998.

The X5B test-beam

Online (data recording):
- DAQ/conversion of ZEBRA files
- Objectivity/DB formatter
- Central Data Recording
- Online monitoring/data quality

Offline (data processing):
- simulation framework (interface to GEANT-4)
- analysis and reconstruction framework

Interactive analysis tools:
- HisOOgrams
- HisOOgrams visualizer (HEPExplorer/HEPInventor)

The Objy reformatter performs the following operations: it gets the data from the ZEBRA server using the proxy pattern, creates the databases and containers, creates the event structure, and populates the database. It is clear that the reformatter is very similar in nature to our CdfLoader module.

4.3 Database Administration tools

A tool for monitoring and administering an Objectivity/DB federated database has been developed. A first version has been built using the Objectivity/DB Java binding. Using this tool the database administrator is able to observe, control and manage the basic federated database functionality as well as the autonomous partition and data replication options. The functionality of this tool is divided into three major groups: configuration, handling the functionality of the autonomous partition and data replication options (in other words, it allows the administrator to create or delete partitions, replicate database images, vary partitions on/offline, resynchronise images and so on); control, allowing an administrator to monitor and control the database servers; and statistics, offering the possibility to run a number of tests to check the data transfer throughput of a given autonomous partition.

4.4 Tests with large numbers of images

The data replication option has been tested with up to 100 images, the limit coming from the number of nodes that could conveniently be used for this purpose and not from any limitation in Objectivity/DB. The time taken to both create persistent objects and commit the corresponding transaction increases with the number of images involved. This is expected: not only does the transaction not complete until the data involved has been safely written to disk on all servers, but more network traffic is involved. There have been wide area tests, with two images at CERN and one at Caltech. The data rate is strongly correlated with the hour of the day; the measurements varied from 2 Kbit/second to 20 Kbit/second. The conclusion is that the basic functionality offered by the Data Replication Option behaves as documented. It is important to stress that the required network bandwidth must be made available. Offline replication remains the most appropriate option for large data volumes, with the networks typically used in HEP today. Replication is still a viable solution for small data volumes, such as calibration data.

5 Use of OBJECTIVITY/DB in HEP

AMS (Alpha Magnetic Spectrometer) is an experiment that will take data on the NASA Space Shuttle (to be launched in May 1998) and later on the International Space Station. The AMS collaboration has been using Objectivity/DB in tests and plans to use it to store their production data, slow control parameters and NASA auxiliary data.

ALEPH - started an exercise to convert their ADAMO-based mini-DST to Objectivity/DB.

ALICE - the ALICE offline team is currently focusing on GEANT-4 (which uses Objectivity/DB as its standard output). They plan to study Objectivity/DB based solutions, but only in the context of GEANT-4.

ATLAS - is developing a number of prototype applications using Objectivity/DB, both off-line and on-line.

BaBar - plans to use Objectivity/DB to store their data starting in 1999. They expect to collect about 200 TB of data per year, all of which will be stored in the federated Objectivity/DB database. The HPSS storage manager will also be used.

BELLE - will start taking data in the fall of 1998; they plan to use Objectivity/DB to store detector constants, and later the mini/micro-DST for rapid data analysis.

CHORUS - is using Objectivity/DB for an on-line emulsion scanning database. They are also evaluating Objectivity/DB as a potential solution for the proposed TOSCA experiment.

CMS - is using Objectivity/DB for a number of prototype applications, including the test beam activities. The current baseline assumption is that (as ATLAS) they will use Objectivity/DB coupled with HPSS to store their data as persistent objects in an object database.

NA45 - they have been using Objectivity/DB in production. A number of production runs have been performed, with a total data volume of 30 GB. For 1998 their plan is to make tests of Objectivity/DB together with the central data recording and HPSS, in preparation for the 1999 data run, where a data volume at the TB scale is anticipated.

NA-48 - maintains the detector configuration database in Objectivity/DB. Recently, they initiated a project to store their micro-DST (and perhaps more) in Objectivity/DB (following ZEUS).

RHIC - the RHIC experiments at Brookhaven plan to adopt a common strategy

for their data storage. The current plan is to use Objectivity/DB and HPSS. Experiments involved include BRAHMS, PHENIX, PHOBOS and STAR. Data volumes for both PHENIX and STAR are expected to be at the TB/year scale.

ZEUS - they built a tag database in Objectivity/DB and have been using it in production. This database was built from the physics data in the ADAMO database. The new system is considered significantly more flexible than the old one, and it offers much improved performance, by allowing users to access only the data they need.

6 Comments on comparisons quoted by the ROOT team

The difference in the size of the files comes from two factors: 1) no optimization of the Objy database page size was performed, which can account for perhaps 1/3 of the difference; 2) ROOT uses data compression, and LHC++ at present does not. (It is possible to use data compression with Objectivity/DB. One can compress physical entities (files, containers) or logical entities (VArrays or objects). Compression has been tried by RD45 using compression based on zlib (gunzip), with the default compression method. Tests performed with the ATLAS Production Ntuple showed possible gains in file sizes. One should remember that the gains depend on the data used and on the access patterns, as compression increases the imbalance between sequential and random access speeds. It also increases the CPU load on the server or client.)
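As an illustration of the kind of zlib test mentioned above, the short program below compresses a buffer with zlib's default method and reports the size reduction. The synthetic payload and the build line are our own choices; real gains depend entirely on the data, as the text stresses.

// Compress a synthetic "ntuple-like" buffer with zlib's default method.
// Build with: g++ compress_test.cc -lz
#include <cmath>
#include <cstdio>
#include <vector>
#include <zlib.h>

int main() {
    // Synthetic payload: smoothly varying floats (compresses well;
    // real detector data will behave differently).
    std::vector<float> ntuple(1000000);
    for (std::size_t i = 0; i < ntuple.size(); ++i)
        ntuple[i] = std::sin(0.001 * static_cast<double>(i));

    const uLong srcLen = ntuple.size() * sizeof(float);
    std::vector<Bytef> out(compressBound(srcLen));
    uLongf outLen = out.size();

    // Default compression method and level, as in the RD45 tests.
    if (compress(out.data(), &outLen,
                 reinterpret_cast<const Bytef*>(ntuple.data()), srcLen) != Z_OK)
        return 1;

    std::printf("%lu -> %lu bytes (factor %.2f)\n",
                static_cast<unsigned long>(srcLen),
                static_cast<unsigned long>(outLen),
                static_cast<double>(srcLen) / outLen);
}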

The difference in the time to traverse the database: this was measured by doing the comparison in a very skewed way, basically comparing apples and oranges. The database given to the ROOT team by RD45 was built to demonstrate the ease of traversing from the "tag" to the rest of the data. It was not intended to be "efficient", just to demonstrate the functionality. The difference in database traversal speed went away completely when the Objy database and LHC++ were configured more properly. It is basically impossible to find out from cdf4497 how the various comparisons were made. This is too bad, as there is room for comparisons; however, they should be conducted in a sensible way, to demonstrate that the architecture of the system will scale to our requirements. This is the issue at hand.

The difference in the 1-D histogramming speed: after a few simple changes to make the comparison more fair and meaningful, LHC++ was demonstrated to deliver similar performance to ROOT. A large factor in the histogramming speed was due to the old and unoptimized version used by the ROOT team together with the Objy database (ROOT was 30x faster than the "old" LHC++ histograms; the new version of the LHC++ histogram code is 25x faster than the "old" code). The LHC++ team plan to completely replace the current histograms with a new templated design. LHC++ gave the ROOT team the library which was available at the time the request was made, in order not to slow down the exercise, which was meant as an attempt to show that it is possible to access and manipulate NTUPLEs stored in Objectivity using ROOT.

7 References

"A solution for data handling based on an Object Oriented Database System and its Applications to CDF RUN II", CDF/DOC/COMP UPG/PUBLIC/4346

"Flows and controls for a data handling solution based on an Object Oriented Database System", CDF/DOC/COMP UPG/PUBLIC/4493

"Gedanken experiments for a data handling solution based on an Object Oriented Database System and its Applications to CDF RUN II", CDF/DOC/COMP UPG/PUBLIC/4492

"Implementation plan for a data handling solution based on an Object Oriented Database System", CDF/DOC/COMP UPG/PUBLIC/4503

"RD45 - A persistent Object Manager for HEP", CERN/LHCC

"Object databases and mass storage systems: the prognosis", CERN/LHCC

"ATLAS Computing Technical Proposal", CERN/LHCC

"Object databases and their impact on storage-related aspects of HEP computing", CERN/LHCC 97-7

"Object database features and HEP data management", CERN/LHCC 97-8

"Using an object database and mass storage system for physics analysis", CERN/LHCC

"Status Report of the RD45 project", CERN/LHCC 97-6

"Status Report of the RD45 project", CERN/LHCC 98-x (not yet available to the general public)


More information

A L I C E Computing Model

A L I C E Computing Model CERN-LHCC-2004-038/G-086 04 February 2005 A L I C E Computing Model Computing Project Leader Offline Coordinator F. Carminati Y. Schutz (Editors on behalf of the ALICE Collaboration) i Foreword This document

More information

Parallel Pipeline STAP System

Parallel Pipeline STAP System I/O Implementation and Evaluation of Parallel Pipelined STAP on High Performance Computers Wei-keng Liao, Alok Choudhary, Donald Weiner, and Pramod Varshney EECS Department, Syracuse University, Syracuse,

More information

Evaluation of the computing resources required for a Nordic research exploitation of the LHC

Evaluation of the computing resources required for a Nordic research exploitation of the LHC PROCEEDINGS Evaluation of the computing resources required for a Nordic research exploitation of the LHC and Sverker Almehed, Chafik Driouichi, Paula Eerola, Ulf Mjörnmark, Oxana Smirnova,TorstenÅkesson

More information

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Operating Systems Lecture 7.2 - File system implementation Adrien Krähenbühl Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Design FAT or indexed allocation? UFS, FFS & Ext2 Journaling with Ext3

More information

Computing at Belle II

Computing at Belle II Computing at Belle II CHEP 22.05.2012 Takanori Hara for the Belle II Computing Group Physics Objective of Belle and Belle II Confirmation of KM mechanism of CP in the Standard Model CP in the SM too small

More information

Geant4 Computing Performance Benchmarking and Monitoring

Geant4 Computing Performance Benchmarking and Monitoring Journal of Physics: Conference Series PAPER OPEN ACCESS Geant4 Computing Performance Benchmarking and Monitoring To cite this article: Andrea Dotti et al 2015 J. Phys.: Conf. Ser. 664 062021 View the article

More information

An Introduction to Software Architecture. David Garlan & Mary Shaw 94

An Introduction to Software Architecture. David Garlan & Mary Shaw 94 An Introduction to Software Architecture David Garlan & Mary Shaw 94 Motivation Motivation An increase in (system) size and complexity structural issues communication (type, protocol) synchronization data

More information

Technical Paper. Performance and Tuning Considerations for SAS on Dell EMC VMAX 250 All-Flash Array

Technical Paper. Performance and Tuning Considerations for SAS on Dell EMC VMAX 250 All-Flash Array Technical Paper Performance and Tuning Considerations for SAS on Dell EMC VMAX 250 All-Flash Array Release Information Content Version: 1.0 April 2018 Trademarks and Patents SAS Institute Inc., SAS Campus

More information

Technology Insight Series

Technology Insight Series IBM ProtecTIER Deduplication for z/os John Webster March 04, 2010 Technology Insight Series Evaluator Group Copyright 2010 Evaluator Group, Inc. All rights reserved. Announcement Summary The many data

More information

NetVault Backup Client and Server Sizing Guide 2.1

NetVault Backup Client and Server Sizing Guide 2.1 NetVault Backup Client and Server Sizing Guide 2.1 Recommended hardware and storage configurations for NetVault Backup 10.x and 11.x September, 2017 Page 1 Table of Contents 1. Abstract... 3 2. Introduction...

More information

Deduplication File System & Course Review

Deduplication File System & Course Review Deduplication File System & Course Review Kai Li 12/13/13 Topics u Deduplication File System u Review 12/13/13 2 Storage Tiers of A Tradi/onal Data Center $$$$ Mirrored storage $$$ Dedicated Fibre Clients

More information

CERN and Scientific Computing

CERN and Scientific Computing CERN and Scientific Computing Massimo Lamanna CERN Information Technology Department Experiment Support Group 1960: 26 GeV proton in the 32 cm CERN hydrogen bubble chamber 1960: IBM 709 at the Geneva airport

More information

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide V7 Unified Asynchronous Replication Performance Reference Guide IBM V7 Unified R1.4.2 Asynchronous Replication Performance Reference Guide Document Version 1. SONAS / V7 Unified Asynchronous Replication

More information

LHC-B. 60 silicon vertex detector elements. (strips not to scale) [cm] [cm] = 1265 strips

LHC-B. 60 silicon vertex detector elements. (strips not to scale) [cm] [cm] = 1265 strips LHCb 97-020, TRAC November 25 1997 Comparison of analogue and binary read-out in the silicon strips vertex detector of LHCb. P. Koppenburg 1 Institut de Physique Nucleaire, Universite de Lausanne Abstract

More information

Implementation and Evaluation of Prefetching in the Intel Paragon Parallel File System

Implementation and Evaluation of Prefetching in the Intel Paragon Parallel File System Implementation and Evaluation of Prefetching in the Intel Paragon Parallel File System Meenakshi Arunachalam Alok Choudhary Brad Rullman y ECE and CIS Link Hall Syracuse University Syracuse, NY 344 E-mail:

More information

The CMS data quality monitoring software: experience and future prospects

The CMS data quality monitoring software: experience and future prospects The CMS data quality monitoring software: experience and future prospects Federico De Guio on behalf of the CMS Collaboration CERN, Geneva, Switzerland E-mail: federico.de.guio@cern.ch Abstract. The Data

More information

5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks 485.e1

5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks 485.e1 5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks 485.e1 5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks Amdahl s law in Chapter 1 reminds us that

More information

An SQL-based approach to physics analysis

An SQL-based approach to physics analysis Journal of Physics: Conference Series OPEN ACCESS An SQL-based approach to physics analysis To cite this article: Dr Maaike Limper 2014 J. Phys.: Conf. Ser. 513 022022 View the article online for updates

More information

RealDB: Low-Overhead Database for Time-Sequenced Data Streams in Embedded Systems

RealDB: Low-Overhead Database for Time-Sequenced Data Streams in Embedded Systems RealDB: Low-Overhead Database for Time-Sequenced Data Streams in Embedded Systems Project Report Submitted to the Faculty of the Rochester Institute of Technology, Computer Science Department In partial

More information

Specifying Storage Servers for IP security applications

Specifying Storage Servers for IP security applications Specifying Storage Servers for IP security applications The migration of security systems from analogue to digital IP based solutions has created a large demand for storage servers high performance PCs

More information

Mass-Storage Structure

Mass-Storage Structure Operating Systems (Fall/Winter 2018) Mass-Storage Structure Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review On-disk structure

More information

File Management. Chapter 12

File Management. Chapter 12 File Management Chapter 12 Files Used for: input to a program Program output saved for long-term storage Terms Used with Files Field basic element of data contains a single value characterized by its length

More information

Multi-version Data recovery for Cluster Identifier Forensics Filesystem with Identifier Integrity

Multi-version Data recovery for Cluster Identifier Forensics Filesystem with Identifier Integrity Multi-version Data recovery for Cluster Identifier Forensics Filesystem with Identifier Integrity Mohammed Alhussein, Duminda Wijesekera Department of Computer Science George Mason University Fairfax,

More information

The ATLAS EventIndex: an event catalogue for experiments collecting large amounts of data

The ATLAS EventIndex: an event catalogue for experiments collecting large amounts of data The ATLAS EventIndex: an event catalogue for experiments collecting large amounts of data D. Barberis 1*, J. Cranshaw 2, G. Dimitrov 3, A. Favareto 1, Á. Fernández Casaní 4, S. González de la Hoz 4, J.

More information

ALICE ANALYSIS PRESERVATION. Mihaela Gheata DASPOS/DPHEP7 workshop

ALICE ANALYSIS PRESERVATION. Mihaela Gheata DASPOS/DPHEP7 workshop 1 ALICE ANALYSIS PRESERVATION Mihaela Gheata DASPOS/DPHEP7 workshop 2 Outline ALICE data flow ALICE analysis Data & software preservation Open access and sharing analysis tools Conclusions 3 ALICE data

More information

Contents Overview of the Compression Server White Paper... 5 Business Problem... 7

Contents Overview of the Compression Server White Paper... 5 Business Problem... 7 P6 Professional Compression Server White Paper for On-Premises Version 17 July 2017 Contents Overview of the Compression Server White Paper... 5 Business Problem... 7 P6 Compression Server vs. Citrix...

More information

Memory - Paging. Copyright : University of Illinois CS 241 Staff 1

Memory - Paging. Copyright : University of Illinois CS 241 Staff 1 Memory - Paging Copyright : University of Illinois CS 241 Staff 1 Physical Frame Allocation How do we allocate physical memory across multiple processes? What if Process A needs to evict a page from Process

More information

Technical Documentation Version 7.4. Performance

Technical Documentation Version 7.4. Performance Technical Documentation Version 7.4 These documents are copyrighted by the Regents of the University of Colorado. No part of this document may be reproduced, stored in a retrieval system, or transmitted

More information

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition Chapter 10: Mass-Storage Systems Silberschatz, Galvin and Gagne 2013 Objectives To describe the physical structure of secondary storage devices and its effects on the uses of the devices To explain the

More information

CLOUD-SCALE FILE SYSTEMS

CLOUD-SCALE FILE SYSTEMS Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients

More information

OS-caused Long JVM Pauses - Deep Dive and Solutions

OS-caused Long JVM Pauses - Deep Dive and Solutions OS-caused Long JVM Pauses - Deep Dive and Solutions Zhenyun Zhuang LinkedIn Corp., Mountain View, California, USA https://www.linkedin.com/in/zhenyun Zhenyun@gmail.com 2016-4-21 Outline q Introduction

More information