A Design of an Experiment to Model Data Base


IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-2, NO. 2, JUNE 1976

A Design of an Experiment to Model Data Base System Performance

SAKTI P. GHOSH, SENIOR MEMBER, IEEE, AND WILLIAM G. TUEL, JR., MEMBER, IEEE

Abstract-A statistical design of an experiment for modeling the performance (access time) of a data base (DB) system in a controlled environment has been outlined. A three-factor pilot experiment with retrieval sequence, logical view (combined with access method), and target segment type as factors was designed and the results analyzed for a specific IMS batch DB. It was found that: 1) the variation of sequences of DB retrieval calls in the application program did not have any significant effect on the access time, whereas the other two factors did have a significant effect; 2) the variability in access time is not completely explained by these three factors; and 3) the distribution of the residual error is "nonnormal," with a large positive skew.

Index Terms-Data bases, data management systems, design of experiment, design technique, file organization, information management system, statistical.

I. INTRODUCTION

A DATA BASE SYSTEM is a software system whose main objective is to provide a convenient way to access and modify large amounts of stored data. It provides an environment in which a user can represent and view information of the real world with complex logical relations. A practical data base (DB) can contain millions of records; hence, the efficiency or performance of a DB system becomes very important to a user. To be able to design a DB system intelligently it is necessary to know the access time needed to retrieve a particular segment of the DB. The access time varies widely across different DB designs, and a poor design can significantly impair the usefulness of the entire DB system.

In order to be specific, we shall concentrate on IMS/360, a DB system available from IBM. There are many factors that can affect performance of DB calls in IMS.
Some of them are: the user's view of the DB (also referred to as logical view), the relative position of the target segments in the logical view, the underlying access method used to support the logical view, the type of data language (DL) call statements executed, the size of the DB buffer pool, the statistical properties of the DB, the memory management algorithm, the data set device allocations, the operating system and its parameters, the hardware configuration, etc. Studying the performance of IMS is clearly a difficult task because of the large number and unknown nature of the factors that affect the access time of a particular call at a particular state of the computing system.

A survey of the difficulties involved in understanding the performance of computers in general has been given by Grenander and Tsao [1]. Little attention has been paid to the particular case of DB systems, e.g., IMS. Most of the work up to now has been confined to studying performance through indirect factors like page faults, page swaps, etc. Some of the references on such work are given in the Reference section. Tsao, Comeau, and Margolin [2] and Tsao and Margolin [3] have used statistical factorial experiments to study factors affecting page swaps in a time-shared environment. No attempt has been made up to now to study the factors which directly affect performance of DB's in an isolated environment.

In this paper we describe a statistical approach to modeling the performance of IMS in a controlled, batch mode environment. Access time has been taken as the measure of performance. A geographical DB using the data from the Bureau of Census DIME file [10] was organized under IMS Version 2.3. IMS was run on an IBM S/370 Model 145 under OS VS2 release 1.6.

Manuscript received November 5, 1974; revised November 13. The authors are with the IBM Research Laboratory, San Jose, CA.
The data calls for IMS were embedded in a PL/I program and the elapsed time to execute a given sequence of calls was taken as access time.

Section II gives a detailed description of the IMS environment. Section III describes the statistical design of a three-factor controlled factorial experiment. Section IV discusses the experimental results. The last section discusses the conclusions reached from the experiment.

II. THE IMS ENVIRONMENT

IMS views data as hierarchical structures both at the physical description level and at the logical description level. A physical DB description (DBD) can be defined for storing and retrieving data by one of the following access methods: the hierarchical sequential access method (HSAM), the hierarchical index sequential access method (HISAM), the hierarchical direct access method (HDAM), and the hierarchical index direct access method (HIDAM). A brief description of these access methods is given in the Appendix; however, readers without a basic knowledge of hierarchical data structures and access methods are urged to consult [4]-[6]. Since HSAM has limited direct access capability, our discussion considers only the latter three access methods.

The Census Bureau DIME file contains three types of entities: branches (which may be street sections or geographical features), blocks (enclosed areas), and nodes (intersections). Each branch is identified by its full name and, if it is a street, by the lowest address on the left side of the street. Associated with each branch are the blocks on its left and right sides and the nodes at each end of the branch. Blocks and nodes are identified by unique numbers, and the corresponding IMS segments contain data specific to an entire block or to a node.

Fig. 1. Three physical DBD's: BRANCH DB (HIDAM), BLOCK DB (HISAM), and NODE DB (HDAM).

Fig. 2. Three logical views (LV1, LV2, LV3). (Note: In the figure, if an access method is not specified for a segment type, then the segment is retrieved using the access method associated with its parent. In the experiment, the segment types which are qualified by the SSA's of the GU calls are shaded. A detailed explanation of the qualifications of the GU calls is given later in the paper.)

From these three entities, three interrelated physical DBD's were constructed. An important factor affecting the access time is the access method underlying a physical DBD. In order to study the effect of the different access methods, each physical DBD was organized using a different access method, and possible logical relations were specified by means of pointers. They are shown diagrammatically in Fig. 1. An example is shown in the Appendix.

A logical view may be identical to or contained within one physical view, or may be constructed from multiple physical views. Pointers (or key values) are used to connect multiple physical views to form a complex logical view. Each logical view has associated with it all three physical DBD's (based on the three access methods), but the root segments of the different logical views have different access methods. Fig. 2 shows the three logical views that are constructed. The logical view used is one of the factors which
affect performance. In the experiment, the logical view along with the access method supporting it was used as a factor; henceforth the term "logical view" should be interpreted as that combination.

Having established his logical view, a user issues one or more calls to retrieve, replace, insert, or delete an instance of a segment type in the IMS data base. These calls are referred to as DLI calls. There are nine DLI functions, which are as follows:

DLI Functions                   Codes
Get Unique                      GU
Get Next                        GN
Get Next within Parent          GNP
Get Hold Unique                 GHU
Get Hold Next                   GHN
Get Hold Next within Parent     GHNP
Insert                          ISRT
Delete                          DLET
Replace                         REPL

GHOSH AND TUEL: MODELING DATA BASE SYSTEM PERFORMANCE

The first six function codes are called DLI retrieve calls because they are used to fetch data from the IMS data base. The GU function is used for retrieving a unique segment from the data base satisfying certain qualifications on the segment type and key field. The GN function is used for retrieving the next sequential segment from the data base satisfying certain qualifications. The GNP function is used for retrieving the next sequential segment associated with a segment at a lower level of the hierarchical structure satisfying certain qualifications. The GHU, GHN, and GHNP functions are similar to the GU, GN, and GNP functions, respectively, except that they are used before the execution of ISRT, DLET, and REPL. The ISRT function is used to insert segments, the DLET function is used for deleting segments from the data base, and the REPL function is used for replacing the data of a segment.

The IMS organized DB is permanently stored on an auxiliary storage device; however, to facilitate retrieval there is a DB buffer pool in the main memory for temporary storage of data blocks. When a DLI function is issued, the DB buffer pool is first checked to see if the requested segment, or a pointer to the requested segment, is present in the DB buffer pool. If not, data blocks are fetched from auxiliary storage into the DB buffer pool using direct or symbolic pointers to locate the data. When the DB buffer pool is full, space is created by deleting data blocks. The DLI functions ISRT, DLET, and REPL modify segments in the buffer pool and leave them there. When the DB buffer pool is full or when the application program terminates, the modifications are written into the DB. Thus these functions are not completely executed during the DLI calls.

The argument of a DLI call may contain one or more segment search arguments (SSA's).
If an SSA contains the segment type and the key value of a segment to be referenced, it is called a qualified SSA. If it contains only the segment type, then it is called unqualified. The details of specification or execution of unqualified or partly qualified SSA's are discussed in [4]-[6].

One measure of performance of an IMS organization is the access time associated with executing the DLI functions. From the point of view of IMS, an application program can be characterized by a sequence of DLI functions, which is one factor that should be considered. Hence the performance of an IMS organization associated with an application program can be measured by the sum of the access times associated with each of the DLI functions in the application program. The access time of each DLI function depends on the type of DLI function, the SSA, the logical view, and also on other factors. In the next section we shall focus on some of the significant factors that influence performance of the IMS DB outlined in this section.

III. STATISTICAL DESIGN OF AN EXPERIMENT

A statistical design of an experiment to study some of the important factors that affect the performance of the IMS DB described previously is outlined. Factors to be considered are chosen from among those that are controllable by the high-level user; then an experiment is designed to test whether these factors and their interactions are significant in comparison to the uncontrolled errors (i.e., the variability due to uncontrolled factors).

As there are many (interacting) factors affecting access time of IMS, it is appropriate to set up a factorial experiment to study performance. Logical view plays an important role in IMS. The user views the DB in the format provided by the logical view, while IMS processes the DB according to the structure (using the supporting access method) defining that logical view. Hence it is important to find out what effects changes in the logical views have on access time.
We shall choose the logical view as one factor for the factorial experiment. In order to reduce the number of factors in the experiment it was decided to combine the access method and the logical view into one factor. This technique is widely used in statistical design of experiments and is called confounding (of access methods with logical views) (see [7]). The three levels of this factor are the three different logical views.

The three logical views are created from the three physical views shown in Fig. 1. The logical view LV1, shown in Fig. 2, is created from BRANCH DB, BLOCK DB, and NODE DB. It is structurally identical with the physical view of BRANCH DB. The logical view LV2 has as its root segment the BLOCK DB, and the descendant segments are created from the BRANCH DB. The logical view LV3 has as its root segment the NODE DB, and the descendant segments are created from the BRANCH DB. Both LV2 and LV3 have the same descendant segment types, as shown in Fig. 2. The different access methods associated with the different segments of the three logical views are shown in Fig. 2.

Application programs are also an important factor affecting performance of IMS because they contain the DLI functions. We shall consider application program as another factor for the factorial experiment. Characterization of application programs by DLI calls requires a little more insight into the retrieval process during the execution of a DLI call. In Section II we discussed the characterization of an application program using the sequence of DLI functions in it. Since the DLI functions ISRT, DLET, and REPL may not be completely executed during the program execution, their characterization is difficult to measure without building special probes and monitoring during actual execution of the process. In many practical DB environments, ISRT, DLET, and REPL constitute a small percentage of the DLI calls; hence they are omitted from this experiment.
The execution of a GHU is almost identical with that of GU with the only exception being that the retrieved segment is held for executing a DLET or REPL function. Similarly, GHN is equivalent to GN, and GHNP is equivalent to GNP. Hence for application program retrieval characterization it is considered sufficient to examine sequences of the three DLI calls GU, GN, and GNP. With one minor exception, execution of a GU call does not depend on the position in the DB established by the previous DLI call, whereas the GN and GNP calls do utilize the relative position established by the previous DLI call. Hence, it is sufficient to characterize application programs by sequences of DLI calls which begin with GU. Since each sequence of DLI calls starts with GU and there are only three distinct calls, there are only

three possible sets of DLI calls, namely GU-GN-GNP; GU-GN; and GU-GNP. Thus in our factorial experiment, the application program factor contains three levels. In order to allow for a little more variability in the access time between the sequences, we have chosen the three levels as

AP1: GU(S1) GN(S2) GNP(S3)
AP2: GU(S4) GN(S5) GN(S6) GN(S7)
AP3: GU(S8) GNP(S9) GNP(S10) GNP(S11)

where the Si are the SSA's of the DLI calls.

The third factor in the experiment is the set of segment types of the logical view referred to by the SSA. For the experiment described here, the SSA's for DLI calls of the same category qualify the same segment type but not necessarily the same value of the attribute. The attributes in the SSA's for GU are chosen as the key fields of the segment types, with values chosen at random over the domain of existing key fields. The attributes and their values in the SSA for GN or GNP calls are unqualified, i.e., only the segment type is qualified for these two categories of DLI calls. Two levels of this factor are considered as follows.

Level ST1: The SSA's of GU and GN calls specify the root segment type of the logical view, and the SSA of GNP calls specifies the last segment type of the hierarchical structure specified by the logical view. (The hierarchy is considered ordered top to bottom, left to right.)

Level ST2: The SSA's of GU and GN calls specify the next-to-last segment type of the hierarchical structure specified by the logical view, and the SSA of GNP calls specifies the last segment type. (Note: the key of the segment sought is specified, not the keys of a path to the segment.)

Typical combinations of the three factors are given in the Appendix. Other factors like the amount of storage available, the allocation of data sets on devices, DB buffer pool sizes, operating system, hardware configuration, and contents of the DB are held fixed.
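The two factors just defined can be written down compactly. The following is a minimal sketch, not the authors' PL/I program: the names `AP_LEVELS`, `ST_LEVELS`, and `build_calls` are hypothetical, and the three-segment hierarchy used in the usage note is an illustrative placeholder rather than one of the paper's logical views.

```python
# Call sequences for the three application-program levels, as listed in the text.
AP_LEVELS = {
    "AP1": ["GU", "GN", "GNP"],
    "AP2": ["GU", "GN", "GN", "GN"],
    "AP3": ["GU", "GNP", "GNP", "GNP"],
}

# For each segment-type level, the position in the hierarchy that the SSA
# of each call category names (root, next-to-last, or last segment type).
ST_LEVELS = {
    "ST1": {"GU": "root", "GN": "root", "GNP": "last"},
    "ST2": {"GU": "next_to_last", "GN": "next_to_last", "GNP": "last"},
}


def build_calls(ap, st, hierarchy):
    """Return (call, segment type) pairs for one AP level and one ST level.

    `hierarchy` is the list of segment types of a logical view, ordered
    top to bottom, left to right (an assumption about how a caller would
    represent the view; it needs at least two entries).
    """
    position = {
        "root": hierarchy[0],
        "next_to_last": hierarchy[-2],
        "last": hierarchy[-1],
    }
    return [(call, position[ST_LEVELS[st][call]]) for call in AP_LEVELS[ap]]
```

For example, `build_calls("AP3", "ST1", ["ROOT", "MID", "LEAF"])` pairs the GU call with the root segment type and each GNP call with the last segment type. Key values for the qualified GU SSA are left out here because the text draws them at random at run time.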
The effects of the remaining factors, such as the order of choosing the level combinations and the actual key values used in the SSA's, are assumed to be averaged out by proper randomization during the actual conduct of the experiment.

As we are interested in studying the effects of the three factors and their interactions, a factorial design with a first factor with three levels, a second factor with three levels, and a third factor with two levels is considered reasonable. Eighteen level combinations (3 x 3 x 2) are associated with this design. In the experiment the access time is measured for each of the 18 combinations of the levels of the three factors. The set of observations is repeated (replication) a number of times with different sets of random keys to obtain high reliability in the estimates of the parameters of the model.

Let $t_{ijkl}$ = the observed access time (in milliseconds) when logical view is at level $i$, application program is at level $j$, and segment type factor is at level $k$, for the $l$th replication, where $i = 1, 2, 3$; $j = 1, 2, 3$; $k = 1, 2$; $l = 1, 2, \ldots, r$.

The linear model for the design and analysis of the experiment [7], [8] is assumed to be

$$t_{ijkl} = \mu + \mathit{tlv}_i + \mathit{tap}_j + \mathit{tst}_k + \mathit{tlvxap}_{ij} + \mathit{tlvxst}_{ik} + \mathit{tapxst}_{jk} + \mathit{tlvxapxst}_{ijk} + \mathit{tr}_l + e_{ijkl}$$

where
$\mu$ = constant;
$\mathit{tlv}_i$ = effect on the access time due to the $i$th level of logical view;
$\mathit{tlvxap}_{ij}$ = effect on the access time due to the interaction between the $i$th level of logical view and the $j$th level of application program;
$\mathit{tlvxapxst}_{ijk}$ = effect on the access time due to the interactions between the $i$th level of logical view, the $j$th level of application program, and the $k$th level of segment type;
$\mathit{tr}_l$ = effect on the access time due to the $l$th replication, etc.;
$e_{ijkl}$ = uncontrolled random error with expectation $E(e_{ijkl}) = 0$ and variance $E(e_{ijkl}^2) = \sigma^2$.
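The 3 x 3 x 2 layout and its randomized execution order can be enumerated mechanically. The sketch below is an illustration under stated assumptions: the function names and the fixed `seed` are hypothetical, and shuffling the combinations independently within each replication is one simple way to randomize the order of level combinations, not necessarily the scheme the authors coded.

```python
import itertools
import random

LV_LEVELS = (1, 2, 3)   # logical view (confounded with access method)
AP_LEVELS = (1, 2, 3)   # application program (call sequence)
ST_LEVELS = (1, 2)      # segment type


def level_combinations():
    """All (i, j, k) level combinations of the three factors."""
    return list(itertools.product(LV_LEVELS, AP_LEVELS, ST_LEVELS))


def run_order(replications, seed=0):
    """Return a list of (l, i, j, k) runs with a random order per replication."""
    rng = random.Random(seed)          # seed is an assumption, for repeatability
    combos = level_combinations()
    plan = []
    for l in range(1, replications + 1):
        order = combos[:]              # copy so each replication is shuffled afresh
        rng.shuffle(order)
        plan.extend((l, i, j, k) for (i, j, k) in order)
    return plan
```

Each tuple `(l, i, j, k)` corresponds to one observation $t_{ijkl}$ of the linear model, so `len(run_order(r))` equals 18 times the number of replications.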
This model assumes that the effects of the different factors and their interactions on the access time are additive. The effect due to replication is really not a factor and has no interactions with the other factors. For details on the subject see [7] and [8].

The above parameters (e.g., $\mathit{tlv}_i$, $\mathit{tr}_l$, etc.) are uniquely estimable when the following constraints are imposed:

$$\sum_i \mathit{tlv}_i = \sum_j \mathit{tap}_j = \sum_k \mathit{tst}_k = \sum_l \mathit{tr}_l = 0$$

$$\sum_i \mathit{tlvxap}_{ij} = \sum_k \mathit{tapxst}_{jk} = 0 \quad \text{for each } j$$

$$\sum_j \mathit{tlvxap}_{ij} = \sum_k \mathit{tlvxst}_{ik} = 0 \quad \text{for each } i$$

$$\sum_i \mathit{tlvxst}_{ik} = \sum_j \mathit{tapxst}_{jk} = 0 \quad \text{for each } k$$

$$\sum_i \mathit{tlvxapxst}_{ijk} = 0 \quad \text{for all } j, k$$

$$\sum_j \mathit{tlvxapxst}_{ijk} = 0 \quad \text{for all } i, k$$

$$\sum_k \mathit{tlvxapxst}_{ijk} = 0 \quad \text{for all } i, j.$$

When such constraints are imposed, the t's are no longer absolute access times but deviations around 0. These parameters are estimated by least squares methods, and the observations are analyzed by using analysis of variance techniques. These are well-known techniques and are given in [7] and [8]. Analysis of variance gives an insight into the amount of variability that is generated by the different factors and their interactions.

IV. EXPERIMENTAL RESULTS

The DB environment described in Section II was used to run the pilot experiment. DLI calls were generated by a PL/I program which also performed the randomization of levels and the selection of the keys at random. In order to provide controlled results, no other jobs were run concurrently. One hundred thirty-three replications of the 18 combinations of levels were executed, yielding 2394 observations of access time. Within each replication the 18 combinations were executed in a random sequence to reduce any bias that could arise due to the order in which the combinations were executed. The values of the attributes in the SSA of GU were chosen at random, i.e., uniformly over the actual key values represented for the segment type being retrieved. The access time for each combination of levels was measured in milliseconds of elapsed time.

The parameters of the linear model were estimated by the method of least squares, ignoring the effects of replications. The residual error for each observation was then calculated. Its frequency distribution is shown in Fig. 3.

Fig. 3. Distribution of residual errors.

It was found that the residual errors do not have a symmetric distribution. They have considerably less skewness on the negative side of the mode. Hence, Wilcoxon's signed-rank test for means [11] was used to test the mean of the residual errors.
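For a balanced layout such as this one, least-squares estimation under sum-to-zero constraints reduces to arithmetic on means, which makes the fitting step easy to illustrate. The sketch below is a simplified illustration, not the authors' analysis code: `estimate_effects` is a hypothetical name, only the constant and the three main effects are returned, replication is ignored, and the residual is taken against the cell mean, which is the fitted value when all interactions are included.

```python
from collections import defaultdict
from statistics import mean


def estimate_effects(observations):
    """Estimate mu and main effects from ((i, j, k), access_time) pairs.

    Assumes a balanced design: every (i, j, k) cell holds the same
    number of observations.
    """
    cells = defaultdict(list)
    for combo, t in observations:
        cells[combo].append(t)
    cell_mean = {combo: mean(ts) for combo, ts in cells.items()}

    # With equal cell sizes, the mean of cell means is the grand mean.
    grand = mean(list(cell_mean.values()))

    def main_effect(axis):
        # Marginal mean of each level minus the grand mean; these sum to zero.
        groups = defaultdict(list)
        for combo, m in cell_mean.items():
            groups[combo[axis]].append(m)
        return {level: mean(ms) - grand for level, ms in groups.items()}

    effects = {
        "mu": grand,
        "tlv": main_effect(0),
        "tap": main_effect(1),
        "tst": main_effect(2),
    }
    # Residual of each observation about its fitted (cell-mean) value.
    residuals = [t - cell_mean[combo] for combo, t in observations]
    return effects, residuals
```

The returned `residuals` list is what a histogram such as Fig. 3 or a signed-rank test would be computed from; the interaction terms can be obtained in the same way by subtracting the lower-order effects from the two-way and three-way marginal means.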
The value of the Wilcoxon statistic for the 2394 residual errors was [...]. Since the Wilcoxon statistic deviates so far from zero (beyond the 99-percent level of significance, which is 2.97), the conclusion is that the mean of the residual errors for the underlying model is not zero. This means that the model (without the replications as a factor) does not account for all the factors affecting access time of the given IMS data base. It is possible that there are some uncontrolled effects which have not been averaged out or that the underlying process is not linear.

The variability of the data was then analyzed using the analysis of variance technique; the results are presented in Table I. Since the errors do not have a normal distribution, the ratios of the mean sums of squares do not have the usual statistical distribution (also known as the F distribution). But, because of the low skewness on the negative side of the errors, the distribution of the ratios will also reflect the same shape; hence any ratio which was significant under the F distribution is also significant in this situation. Thus the variations due to the main effects of ST and LV, and the interaction between ST and LV, are significantly larger than the variability due to the error at the usual 5-percent level of significance. The exact distribution of the ratio of two mean sums of squares when the denominator does not have a "normal" distribution is not known; thus the other ratios may or may not be insignificant. The ratios for the interactions between AP and LV, and also the triple-order interaction, are [...] and 0.822, respectively. It is likely that they are insignificant at

the 5-percent level because the 5-percent level of significance for the F ratio for 2 and ∞ degrees of freedom is [...].

For testing the main effect of the level of application program some detailed analysis was performed. As the mean square of the error was estimated with 2244 degrees of freedom, this estimate of the variance of the error was considered to be fairly accurate. Thus, the difference between the means of any levels of the application program can be tested by the test for the difference between means. The normalized difference (i.e., difference divided by its standard deviation) between means has the "normal" distribution under the Central Limit Theorem; hence it can be tested by the "normal" distribution. The results are given in Table II. Thus the difference between any two levels of the application program was insignificant. Hence there is good reason to believe that the access time, in this experiment, does not depend on the sequence of the DLI calls. For making any reasonable conclusions regarding the interaction between AP and ST, further analysis is needed.

TABLE I
RESULTS OF ANALYSIS OF VARIANCE

Source of Variation                                   Degrees of Freedom   Mean Sums of Squares (MSS)   Ratio of the MSS to Error MSS
Replication                                           [...]                [...]                        [...]
Main effect due to logical views (LV)                 [...]                [...]                        [...]
Main effect due to application program (AP)           [...]                [...]                        [...]
Main effect due to segment type of logical view (ST)  [...]                [...]                        [...]
Interaction between LV and AP                         [...]                [...]                        [...]
Interaction between LV and ST                         [...]                [...]                        [...]
Interaction between AP and ST                         [...]                [...]                        [...]
Interaction between LV, AP, and ST                    [...]                [...]                        [...]
Error                                                 [...]                [...]

TABLE II
TESTING THE DIFFERENCE BETWEEN LEVELS OF AP

Name of the difference   Value of the difference   Normalized difference   Result of the test of significance at 5%
AP1 - AP2                [...]                     [...]                   Insignificant
AP1 - AP3                [...]                     [...]                   Insignificant
AP2 - AP3                [...]                     [...]                   Insignificant
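The 2244 error degrees of freedom quoted in the text can be checked from the layout alone, and the normalized difference between two level means follows from the error mean square. The sketch below is a bookkeeping illustration with hypothetical function names. It assumes the usual balanced-design standard deviation for a difference of two means, $\sqrt{2 \cdot \mathrm{MSS}_{error}/n}$, where $n$ is the number of observations behind each mean (2394/3 = 798 per AP level in this layout); no numerical values from Table I or Table II are reproduced.

```python
from itertools import combinations
from math import prod, sqrt


def anova_df(levels, replications):
    """Degrees of freedom per source for a replicated full factorial.

    `levels` maps factor name -> number of levels. Replication is treated
    as a non-interacting effect, as in the model of the text.
    """
    names = list(levels)
    df = {"Replication": replications - 1}
    for size in range(1, len(names) + 1):
        for group in combinations(names, size):
            # df of a main effect or interaction: product of (levels - 1).
            df[" x ".join(group)] = prod(levels[n] - 1 for n in group)
    total = replications * prod(levels.values()) - 1
    df["Error"] = total - sum(df.values())
    return df


def normalized_difference(mean_a, mean_b, error_mss, n_per_level):
    """Difference of two level means divided by its standard deviation."""
    return (mean_a - mean_b) / sqrt(2.0 * error_mss / n_per_level)
```

Calling `anova_df({"LV": 3, "AP": 3, "ST": 2}, 133)` gives 2244 for the `"Error"` entry, matching the figure in the text. A normalized difference would then be compared with the two-sided 5-percent point of the standard normal distribution (about 1.96).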
There is a good possibility that the access time effect of application programs can be characterized by a linear function of the type

$$\alpha \times (\text{number of GU calls}) + \beta \times (\text{number of GN calls}) + \gamma \times (\text{number of GNP calls})$$

but this has not been explored.

As the experiment requires sequential observation of the effects of the different level combinations, the state of the system may change between one replication and the next. In order to estimate this effect on the residual error, the factor AP and its interactions were deleted and replication was treated as an additional factor in the model. The residual errors were analyzed again under the following model for the access time:

$$t_{ikl} = \mu + \mathit{tlv}_i + \mathit{tst}_k + \mathit{tlvxst}_{ik} + \mathit{tr}_l + e_{ikl}$$

where $i = 1, 2, 3$; $k = 1, 2$; $l = 1, 2, \ldots, 399$.

The restrictions on this model are the same as before. Under this model the parameters are estimated and the errors recalculated. The new error distribution is shown in Fig. 4. The Wilcoxon statistic now has the value [...], which is very close to the 1-percent level of significance. This implies that for this data ST, LV, and ST x LV are the important controllable factors that affect access time. Possibly replication (R) also has a significant effect. Statistically, there is no reason to believe that the mean of the residual errors (after these factors are eliminated) differs from zero.

Using the original model again, the average access time for each combination of levels was calculated to test the difference between them. The critical difference calculated under the "normality" assumption ($t_{0.95}\,\hat{\sigma}\sqrt{2/133}$) was [...] ms. This was used to test the differences. Table III gives the results. All combinations which are covered by one continuous line are not significantly different. In the table the first coordinate of the combination refers to the logical view, the

second to the retrieval sequence, and the third to the segment type.

Fig. 4. Distribution of residual errors with replication factor.

TABLE III

Combination of Levels   Average access time (ms.)   Critical difference line
(3,2,1)                 [...]
(3,3,1)                 [...]
(3,1,1)                 [...]
(2,2,1)                 [...]
(2,1,1)                 [...]
(2,3,1)                 [...]
(1,3,1)                 [...]
(1,1,1)                 [...]
(1,2,1)                 [...]
(2,3,2)                 [...]
(2,2,2)                 [...]
(2,1,2)                 [...]
(3,3,2)                 [...]
(3,2,2)                 [...]
(3,1,2)                 [...]
(1,1,2)                 [...]
(1,2,2)                 [...]
(1,3,2)                 [...]

Critical difference = [...] ms.

V. CONCLUSIONS

In this paper we have outlined the design and analysis of a controlled experiment for studying the performance effects of several DB design parameters. A factorial experiment was used to capture the interactions between the parameters. For this particular environment 1) the logical view/access method, 2) the segment type retrieved, and 3) their interaction are significant components which affect response time, whereas the particular call sequence is not significant.

The residual error distribution is highly asymmetrical with positive skew. Hence standard hypothesis tests are not valid and new nonparametric methods are required. Further analysis is required to understand the reason behind such distributions.

APPENDIX

ACCESS METHODS

An index sequential organization method (ISAM), showing the track indices, is given in the following diagram. The blocks are records and the numbers in them represent keys. The first row represents the track indices.

[Diagram: ISAM organization with track indices.]

There are many other techniques for handling the overflow problem associated with index sequential organizations. ISAM has many good features, but there are some disadvantages associated with it. One of them is the handling of the overflow problem. Two access methods which combine ISAM with another access method, known as the ordinary sequential access method or OSAM, in a hierarchical manner are given here.

A pointer points to the location of a record in a particular track on a particular cylinder. A pointer is symbolically written as [...] to indicate that it is pointing to the location of the zth record in the yth track on the xth cylinder.

One of the access methods which is based on an ISAM and an OSAM is HISAM. In HISAM the original records are organized as ISAM, and the records which are added later (overflow records) are stored in an OSAM in the order that they arrive. The overflow records are chained by pointers to their exact location in the index sequential arrangement so that they can be retrieved quickly. Suppose the keys of the records on the third track of the second cylinder of the ISAM part of a HISAM organization are 212, 215, and 230. Initially, these records are stored on that track with their blank pointers as follows.

[Diagram: third track of the second cylinder, holding keys 212, 215, and 230 with blank pointers.]

Now suppose a new record with key 219 is to be added to the HISAM organization. The new record is stored in the first available space in the OSAM. Suppose that location is the third record in the fifth track on the tenth cylinder. The new record is stored there. The pointer associated with the key 215 is now filled in to point to this location. Thus after the insertion the third track of the second cylinder appears as follows.

[Diagram: the same track after the insertion, with the pointer of key 215 filled in.]

The pointer associated with the newly inserted record, in the location of the third record on the fifth track of the tenth cylinder, points to the next record in the index sequential organization, i.e., the key 230. The inserted record appears as follows.

[Diagram: the inserted record with key 219 and its pointer to key 230.]

Another hierarchical access method which uses an ISAM organization and an OSAM organization in a hierarchical manner is HIDAM. HIDAM is used to create an index from a nonkey or key field. The index file is organized as a HISAM organization. The records themselves are stored in an OSAM organization. In the HISAM organization a record is now called an index record. It consists of the value of the attribute on which the indexing has been performed and a pointer which points to the first record in the OSAM organization relevant to the value of the attribute. The records relevant to the same value of the index are chained together in the OSAM organization. The insertions and deletions of indices in the HISAM are performed in exactly the same manner as described before. Similarly, the records can be deleted and inserted in the OSAM, and then the pointers from the index file and/or the records in OSAM are changed accordingly.

When the records are stored in OSAM but are addressed by a key-to-address transformation, then the access method is referred to as HDAM.

SAMPLE DATA BASE AND CALL SEQUENCES

A few records from the DIME file are shown in Fig. 5. Some of the pointers providing logical relationships are exhibited. Three sample call sequences are listed below. Each level of each factor is represented.

AP: level 1   ST: level 1   LV: level 1
  GU   BRNAME (BRKEY = GUADELUPECREEK)
  GN   BRNAME
  GNP  NSNODE          returns data from node #

AP: level 2   ST: level 2   LV: level 3   (access through NODE data base)

  GU   BRTYPE(NS) (TYPEKEY = [...])
  GN   BRTYPE(NS)
  GN   BRTYPE(NS)
  GN   BRTYPE(NS)      returns 4 BRTYPE segments

AP: level 3   ST: level 1   LV: level 2   (access through BLOCK data base)
  GU   BLOCK (BLOCKKEY = [...])
  GNP  BRNAME(NS)
  GNP  BRNAME(NS)
  GNP  BRNAME(NS)      returns 3 BRNAME segments which occur within the given block

The other 15 level combinations are constructed similarly.

Fig. 5. Sample data base records (BRANCH data base, BLOCK data base, and NODE data base).

ACKNOWLEDGMENT

The authors wish to thank M. C. Smyly, B. Krampetz, and D. B. Hildebrand of the IBM Research Laboratory, San Jose, CA, for their help during this work.

REFERENCES

[1] U. Grenander and R. F. Tsao, "Quantitative methods for evaluating computer system performance: A review and proposal," in Statistical Computer Performance Evaluation, W. Freiberger, Ed. New York: Academic, 1972, pp. [...].
[2] R. F. Tsao, L. W. Comeau, and B. H. Margolin, "A multi-factor paging experiment: I-The experiment and conclusions," in Statistical Computer Performance Evaluation, W. Freiberger, Ed. New York: Academic, 1972, pp. [...].
[3] R. F. Tsao and B. H. Margolin, "A multi-factor paging experiment II: Statistical methodology," in Statistical Computer Performance Evaluation, W. Freiberger, Ed. New York: Academic, 1972, pp. [...].
[4] Information Management System Version II: IBM Education Guide No. ZR [...].
[5] Information Management System/360, Version 2 System Programming Reference Manual, IBM publ. SH [...].
[6] Information Management System/360, Version 2 System Application Design Guide, IBM publ. SH [...].
[7] O. Kempthorne, The Design and Analysis of Experiments. Huntington, NY: R. E. Krieger, [...].
[8] W. G. Cochran and G. M. Cox, Experimental Design.
New York: Wiley, [9] M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, vol. 2. New York: Hafner, [10] The DIME Geocoding System, Rep. 4, 1970 Census Use Study Series, Bureau of the Census, U.S. Commerce Dep., Washington, DC. [11] J. L. Hodges, Jr., and E. L. Lehman, Basic Concepts of Probability and Statistics. San Francisco, CA: Holden-Day. [12] J. Buzen, "Analysis of system bottlenecks using a queuing net-

10 106 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-2, NO. 2, JUNE 1976 work model," presented at the ACM SIGOPS Workshop System Performance Evaluation, ACM, New York, [13] J. E. Shemer and D. E. Heying, "Performance modeling and empirical measurements in a system designed for batch and time sharing users," Proc. AFIPS FJCC, vol. 35, pp , [141 S. Sherman, F. Baskett, 111, and J. C. Browne, "Trace driven modeling and analysis of CPU scheduling in a multiprogramming system," Comm. ACM, vol. 15, pp , [15] P. Denning, "The working set model for program behavior," Comm. ACM, vol. 5, pp , ing Guest Lecturer at many universities in the United States and abroad. Dr. Ghosh is a life member of the Calcutta Statistical Association. He is also a member of the ACM, American Statistical Association, and the Institute of Mathematical Statistics. Wiliam G. Tuel, Jr. (S'63-M'65) received the S 6 B.S.E.E., M.S.E.E., and Ph.D. degrees from the Rensselaer Polytechnic Institute, Troy, NY, in 1962, 1964, and 1965, respectively. His graduate work was supported by a NASA fellowship. He joined the Research Laboratory, IBM Corporation, San Jose, CA, in 1965 and was involved in the areas of automatic control theory and applications of computers to electrical power systems. From 1967 to 1968 he was assigned to the IBM Nordic Laboratory in Sakti P. Ghosh (M'69-SM'74) received the B.S. degree (Honors) and the M.Sc. degree in statis- tics from Calcutta University, Calcutta, India, in 1955 and 1957, respectively. He received the Ph.D. degree in statistics from the University of California, Berkeley, in He joined the IBM Research Division at the Thomas J. Watson Research Center in His contribution to the STORM System led to the x2 and F probability computation subroui I_ i tines in IBM Scientific Packages. He developed the Balanced Filing Schemes based on fmite geometrics and the consecutive retrieval properties for file organizations. 
He is currently a Research Staff Member with the Research Laboratory, IBM Corporation, San Jose, CA. He is working on data management techniques and performance. He has published more than 35 technical papers. He has held a teaching position at New York University and has served as Visit- Stockholm, Sweden, and studied the automation of machine tool monitoring. He received an Outstanding Contribution Award for this work. Currently, he is Manager of the Data Base System Characterization Project with the Systems Evaluation Department, Research Laboratory, IBM Corporation, San Jose, CA. This project involves the measurement and modeling of computer systems and workloads, particularly data base systems. Dr. Tuel is a member of Tau Beta Pi, Eta Kappa Nu, Sigma Xi, and the Association for Computing Machinery. Distributing a Data Base with Logical Associations on a Computer Network for Parallel Searching SAKTI P. GHOSH, Abstract-The problem of distributing a data base (with logical associations between segment types) on a computer network such that multiple segment types satisfying a query can be retrieved in paralel from different nodes has been introduced. Properties of such distributions without redundancy and with redundancy have been discussed. Lower bounds on the number of nodes needed for such distributions have been given. Algorithms for constructing such distributions have also been given. Distributions of data bases for queries whose target segments form a combinatorial set have been studied in detail. Closed form expressions for redundancy have been obtained for such query sets. Index Tenns-Algorithm of data bases, computer network, data bases, distributing data bases, multiple segment queries, parallel search, redundancy. I. INTRODUCTION C ONNECTING COMPUTERS or multiple terminals to computers started with reservation systems for airlines in the fifties. 
Many sophisticated computer networks C Manuscript received June 15, 1974;revised August 6, The author is with the IBM Research Laboratory, San Jose, CA SENIOR MEMBER, IEEE have been built since then by research groups. The most famous of them all is the computer network sponsored by the Advanced Research Projects Agency (ARPA). The ARPA network is a decentralized, heterogeneous network connecting about 32 (as of this time) computing sites throughout the United States. Much basic research has been done with data transmission, resource sharing, load leveling, data synchronization, etc. An excellent tutorial on the subject along with some important references has been provided by Merrill [3]. Most of the work on computer networks up to now has concentrated on the communication feasibility aspect of the network. Distributing data bases on computer networks is just beginning to surface. Merrill [3] does mention a few aspects of data distribution, viz., location of data/data descriptors, data access rationale, replication of data/data descriptors, partitioning of data, etc. These problems are being examined by computer systems experts, but very little basic research has been done towards understanding them. Casey [1] has investigated the probiem of optimum conditions for creating multiple copies of a file. None of the research up to now has been directed at


More information

A STRUCTURAL OPTIMIZATION METHODOLOGY USING THE INDEPENDENCE AXIOM

A STRUCTURAL OPTIMIZATION METHODOLOGY USING THE INDEPENDENCE AXIOM Proceedings of ICAD Cambridge, MA June -3, ICAD A STRUCTURAL OPTIMIZATION METHODOLOGY USING THE INDEPENDENCE AXIOM Kwang Won Lee leekw3@yahoo.com Research Center Daewoo Motor Company 99 Cheongchon-Dong

More information

LOGICAL OPERATOR USAGE IN STRUCTURAL MODELLING

LOGICAL OPERATOR USAGE IN STRUCTURAL MODELLING LOGICAL OPERATOR USAGE IN STRUCTURAL MODELLING Ieva Zeltmate (a) (a) Riga Technical University, Faculty of Computer Science and Information Technology Department of System Theory and Design ieva.zeltmate@gmail.com

More information

Computer Architecture

Computer Architecture Computer Architecture Lecture 1: Digital logic circuits The digital computer is a digital system that performs various computational tasks. Digital computers use the binary number system, which has two

More information

Architectural Styles. Software Architecture Lecture 5. Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved.

Architectural Styles. Software Architecture Lecture 5. Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved. Architectural Styles Software Architecture Lecture 5 Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved. Object-Oriented Style Components are objects Data and associated

More information

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Abstract Deciding on which algorithm to use, in terms of which is the most effective and accurate

More information

A Modified Weibull Distribution

A Modified Weibull Distribution IEEE TRANSACTIONS ON RELIABILITY, VOL. 52, NO. 1, MARCH 2003 33 A Modified Weibull Distribution C. D. Lai, Min Xie, Senior Member, IEEE, D. N. P. Murthy, Member, IEEE Abstract A new lifetime distribution

More information

Maximizing IMS Database Availability

Maximizing IMS Database Availability Maximizing IMS Database Availability Rich Lewis IBM August 3, 2010 Session 7853 Agenda Why are databases unavailable We will discuss the reasons What can we do about it We will see how we can eliminate

More information

INTERLEAVING codewords is an important method for

INTERLEAVING codewords is an important method for IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 2, FEBRUARY 2005 597 Multicluster Interleaving on Paths Cycles Anxiao (Andrew) Jiang, Member, IEEE, Jehoshua Bruck, Fellow, IEEE Abstract Interleaving

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

An Evaluation of Deficit Round Robin Fair Queuing Applied in Router Congestion Control

An Evaluation of Deficit Round Robin Fair Queuing Applied in Router Congestion Control JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 18, 333-339 (2002) Short aper An Evaluation of Deficit Round Robin Fair ueuing Applied in Router Congestion Control Department of Electrical Engineering National

More information

Job Re-Packing for Enhancing the Performance of Gang Scheduling

Job Re-Packing for Enhancing the Performance of Gang Scheduling Job Re-Packing for Enhancing the Performance of Gang Scheduling B. B. Zhou 1, R. P. Brent 2, C. W. Johnson 3, and D. Walsh 3 1 Computer Sciences Laboratory, Australian National University, Canberra, ACT

More information

Quantitative Models for Performance Enhancement of Information Retrieval from Relational Databases

Quantitative Models for Performance Enhancement of Information Retrieval from Relational Databases Quantitative Models for Performance Enhancement of Information Retrieval from Relational Databases Jenna Estep Corvis Corporation, Columbia, MD 21046 Natarajan Gautam Harold and Inge Marcus Department

More information

6. Concluding Remarks

6. Concluding Remarks [8] K. J. Supowit, The relative neighborhood graph with an application to minimum spanning trees, Tech. Rept., Department of Computer Science, University of Illinois, Urbana-Champaign, August 1980, also

More information

Analysis of Two-Level Designs

Analysis of Two-Level Designs Chapter 213 Analysis of Two-Level Designs Introduction Several analysis programs are provided for the analysis of designed experiments. The GLM-ANOVA and the Multiple Regression programs are often used.

More information

(Refer Slide Time: 00:02:00)

(Refer Slide Time: 00:02:00) Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 18 Polyfill - Scan Conversion of a Polygon Today we will discuss the concepts

More information

IT 540 Operating Systems ECE519 Advanced Operating Systems

IT 540 Operating Systems ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (3 rd Week) (Advanced) Operating Systems 3. Process Description and Control 3. Outline What Is a Process? Process

More information

High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore

High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore Module No # 09 Lecture No # 40 This is lecture forty of the course on

More information

T consists of finding an efficient implementation of access,

T consists of finding an efficient implementation of access, 968 IEEE TRANSACTIONS ON COMPUTERS, VOL. 38, NO. 7, JULY 1989 Multidimensional Balanced Binary Trees VIJAY K. VAISHNAVI A bstract-a new balanced multidimensional tree structure called a k-dimensional balanced

More information