A Mixed Fragmentation Algorithm for Distributed Object Oriented Databases 1

Fernanda Baião
Department of Computer Science - COPPE/UFRJ
Federal University of Rio de Janeiro - Brazil
baiao@cos.ufrj.br

Marta Mattoso
Department of Computer Science - COPPE/UFRJ
Federal University of Rio de Janeiro - Brazil
marta@cos.ufrj.br

1 Research partially supported by CNPq.
Proceedings of the Ninth International Conference on Computing Information, Winnipeg, Canada, June 1998. Also to appear in the special edition of the Journal of Computing Information.

Abstract

The performance of applications on Object Oriented Database Management Systems (OODBMSs) is strongly affected by Distributed Design, which reduces irrelevant data accessed by applications and data exchange among sites. This work proposes an algorithm for the fragmentation phase of the distributed design of object oriented databases, according to a set of heuristics obtained from experimental results. The proposed algorithm addresses specific characteristics of OODBMSs such as management of class extensions and object relationships, and its major contributions are: (i) it observes performance issues, by allowing a class which has a very small extension not to be fragmented; (ii) it proposes horizontal, vertical and mixed fragmentation (horizontal and vertical) of a class; and (iii) it permits specific OO characteristics to drive the primary and derived fragmentation based on the dependency between classes. This paper also evaluates the proposed algorithm with a case study using the OO7 Benchmark database schema, and compares the final fragmentation schema against other alternative fragmentation schemas.

1 Introduction

Distributed and parallel processing on Object Oriented Database Management Systems (OODBMSs) may improve the performance of non-conventional applications that manipulate large volumes of data. This is addressed by removing irrelevant data accessed by queries and transactions and by reducing data exchange among sites, which are the main goals of the distributed design [1].

Distributed design involves making decisions on the placement of data across the sites of a computer network [2]. In a top-down approach, the distributed design has two phases: fragmentation and allocation. The fragmentation phase is the process of clustering in fragments information accessed simultaneously by applications, and the allocation phase is the process of distributing the generated fragments over the database system sites.

To fragment a class, it is possible to use two basic techniques: vertical fragmentation and horizontal fragmentation. In an object oriented (OO) environment, horizontal fragmentation distributes class instances across the fragments, which will have exactly the same structure but different contents. Thus, a horizontal fragment of a class contains a subset of the whole class extension. On the other hand, vertical fragmentation breaks the logical structure of the class (its attributes and methods) and distributes them across the fragments, which will logically contain the same objects, but with different properties. It is also possible to perform mixed fragmentation on a class, combining these two techniques. Horizontal fragmentation is usually subdivided into primary and derived fragmentation to address the relationship between entities: primary horizontal fragmentation is applied on owner entities, while derived fragmentation is applied on member entities according to the owner fragmentation [2, 3].

Many researchers have worked on distributed design in the relational model, including [2, 4, 5]. However, in the OO model, the fragmentation process is much more complex than in the relational case. Karlapalem et al.
[1] describe the different aspects of a distributed OODBMS that are critical to the distributed design process: the data model, method invocation, types of location transparency, and transaction management. The authors also present preliminary ideas to be used in fragmentation algorithms for OO distributed design. In the same way, Maier et al. [6] consider the effect of distribution over complex objects. Also, the works of Ezeife and Barker [3] and Savonnet et al. [7] propose algorithms for horizontal fragmentation of OODBs, while Bellatreche et al. [8], Ezeife and Barker [9] and Malinowski [10] address vertical fragmentation. However, mixed fragmentation is not considered in any of the related works.

The semantic differences between the relational and OO models inhibit a straightforward migration from relational distributed design algorithms to OO algorithms. The relational distributed design is based on queries and relationships between entities. The OO distributed design, on the other hand, has to consider, in addition, inheritance mechanisms, complex relationships and method execution. Also, while relational operations are only set oriented, OO

operations are also pointer based, and therefore may have a dual nature involving both set operations (search over class extensions) and navigation (traversals).

This dual nature of access patterns has made object clustering a hard task for the OO distributed database designer. Usually, the components of a complex object are physically clustered together on disk, while a class with a large extension groups its own objects together. This may generate conflicts in the OODBMS object clustering policy, and we believe that this object clustering policy is strongly related to the fragmentation strategy. While derived horizontal fragmentation privileges navigational access (from a complex object to its components), vertical fragmentation favors access to the class extension and the use of class attributes and methods, by removing irrelevant data accessed by operations. Therefore, both fragmentation techniques should coexist for different classes in the distributed design. There are also cases in which the best option is to perform horizontal and vertical fragmentation on a class simultaneously. Although this choice is not simple for the designer to make, we believe that algorithms that force all classes to have the same fragmentation policy (either horizontal or vertical) will end up producing unsuitable fragmentation for classes with different access patterns, resulting in poor performance.

Therefore, this work proposes a strategy and its algorithms for the fragmentation phase of the distributed design of OODBs. The allocation phase (and the selection of replication techniques) is not considered in the present work. Our proposed fragmentation strategy, which was already defined in our previous work [13], is divided into three steps: (i) the analysis phase, which chooses the best fragmentation technique for each class of the database, based on the extensive implementation experiments of [11]; (ii) the vertical fragmentation phase; and (iii) the horizontal fragmentation phase.
We present here the algorithms of each phase. A more detailed description of them can be found in [12]. The main contributions of our algorithm when compared to the ones in the literature are the existence of a specific step to analyze the main issues involved in deciding the class fragmentation policy, and the use of mixed fragmentation techniques, which increases the performance of applications that access vertical and/or horizontal fragments of the database.

The structure of this work is the following: the next Section presents some issues in choosing the best fragmentation technique for each class of the database schema, and lists some heuristics to be used in the proposed algorithms for the fragmentation phase of the distributed design of OODBs, which are presented in Section 3. Section 4 evaluates our algorithms with a case study using the OO7 Benchmark database schema. Finally, Section 5 concludes this paper.

2 Choosing the fragmentation technique

2.1 Issues in fragmentation

There are many important issues that must be addressed in the OO distributed design in order to choose the best fragmentation policy for each class, thus obtaining an optimal fragmentation schema. This Section focuses on three items already identified in [1] that deserve special attention from the distributed database designer, since they have a great influence on the quality of the distributed design and may impact system performance: (i) attributes, relationships and method links between the classes, i.e., database semantics; (ii) existence of class extensions and their size, i.e., quantitative information; and (iii) application characteristics, i.e., operations. The next sub-sections list the information required by our strategy concerning these issues and summarize their influence on the distributed design by defining heuristics for class fragmentation to be used in the first step.
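To make the three fragmentation options concrete, the following minimal Python sketch applies them to a handful of objects. It is only an illustration under stated assumptions: objects are modeled as plain dictionaries, the attribute names follow the AtomicPart class used later in the case study, the build dates are simplified to hypothetical integer years, and the helper names (`horizontal_fragment`, `vertical_fragment`, `mixed_fragment`) are not part of the proposed algorithms.

```python
# Hypothetical sample objects; bdate is simplified to an integer year.
parts = [
    {"id": 1, "type": "t1", "x": 0, "y": 5, "bdate": 1995},
    {"id": 2, "type": "t2", "x": 3, "y": 1, "bdate": 1996},
    {"id": 3, "type": "t1", "x": 7, "y": 2, "bdate": 1997},
]


def horizontal_fragment(objects, predicate):
    """Same structure, subset of the instances (a selection)."""
    return [o for o in objects if predicate(o)]


def vertical_fragment(objects, attributes):
    """Same instances, subset of the properties (a projection)."""
    return [{a: o[a] for a in attributes} for o in objects]


def mixed_fragment(objects, attributes, predicate):
    """Horizontal fragmentation applied to a vertical fragment.

    The predicate may only use attributes kept by the projection.
    """
    return horizontal_fragment(vertical_fragment(objects, attributes), predicate)


old = horizontal_fragment(parts, lambda o: o["bdate"] < 1996)
dates = vertical_fragment(parts, ["id", "bdate"])
old_dates = mixed_fragment(parts, ["id", "bdate"], lambda o: o["bdate"] < 1996)
```

The mixed fragment is built in the order used by the strategy described in this paper, i.e., the horizontal selection is applied to the vertical fragment and not to the whole class.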
2.2 Obtaining relevant information

We encapsulate this task in what we call the Interface Module (see Figure 1), which is responsible for making this data available to our algorithms, either by consulting the database global conceptual schema or by asking the database designer directly.

2.2.1 Semantics

For each class in the database schema, it is important to know the following semantic information: (i) its attributes, their classifications (simple or complex) and the referred classes of each (for complex attributes); (ii) its methods, their classifications (simple or complex) and the referred classes of each (for complex methods); and (iii) its relationships, their cardinalities and the referred classes.

Complex attributes, complex methods and relationships (generalization and aggregation) in the object oriented model define links between classes [3]. These links generate navigational paths that must be considered in the fragmentation phase of the distributed design, in order to improve navigational query processing performance by reducing data transfer among sites. Also, the different object clustering possibilities make the analysis of those links an essential task [14, 15].

In order to propose heuristics for the distributed design of OODBs considering class attributes, methods and relationships, we present the following classification for classes in a database schema. This taxonomy is an extension of the existence dependency taxonomy presented in [1], and addresses the clustering dependency of a class with respect to another class according to the cardinality of the links from one to the other. In fact, this classification may be performed automatically by the Interface Module considering only available information

in the conceptual schema. For each pair of classes Ci and Cj in the database, the pair falls in one of the following situations:

- Independent: Ci is independent of Cj if either (1) there are no links from Ci to Cj (i.e., objects of the two classes are not linked in the database); or (2) all the links from Ci to Cj have an optional cardinality (i.e., an object of one class may or may not be related to objects of the other).
- Non-shared dependent: Ci is non-shared dependent on Cj if there is at least one link from Ci to Cj with cardinality 1 (i.e., each object of one class is related to exactly 1 object of the other).
- Shared dependent: Ci is shared dependent on Cj if all the links from Ci to Cj have cardinality "many" (i.e., each object of one class is related to many objects of the other).

2.2.2 Quantitative information

For each class in the database schema, the relevant quantitative information is: (i) the existence or not of a collection representing the class extension; and (ii) the estimated size of the class extension (small, medium or large), compared to other classes in the database schema.

Intuitively, the importance of quantitative information lies in the fact that classes with a very large extension must not be treated in the same way as the ones with a small extension in the fragmentation process. Clearly, the distributed design must dedicate special attention to classes with large extensions, in order to improve the performance of operations that scan all their instances.

2.2.3 Operations

An operation represents an access to one or more classes in the database during a transaction. A transaction is composed of queries and method calls. The analysis of a transaction leads to three access types: a simple predicate defined on a class, a path expression, or an isolated access to a class (such as a constructor call within a method).
Operations can then be obtained by decomposing transactions, considering recursively the method calls embedded in them and the queries. Let C(O) be the set of classes accessed by an operation O. Each operation can be classified in one of the two following categories, based on the number of classes referenced by it:

- extension operation, if C(O) is a unitary set, i.e., if O directly references data from only one class c;
- navigation operation, if C(O) has more than one element, i.e., O refers to data from several classes. In this case, C(O) represents a list of classes referred to by O. We call the first element of C(O) its root class.

Therefore, for each operation it is important to know: (i) its classification (extension or navigation); (ii) the accessed class path; and (iii) its probable execution frequency, to determine its priority for fragmentation.

2.3 Defining some heuristics

2.3.1 Semantics

The classification defined in Section 2.2.1 will be used in our strategy according to the following heuristics:

- Independent classes can be clustered without any restriction, as their objects are not necessarily related;
- Two classes A and B (where A is non-shared dependent on B) should be clustered in the same fragment in order to improve the performance of applications that access objects of B from the complex attributes or methods of A. The best option is to carry out derived horizontal fragmentation of B with respect to A;
- Two classes A and B (where A is shared dependent on B) should also be clustered in the same fragment in order to improve the performance of applications that access objects of B from the complex attributes or methods of A. In most cases, the best option is also to carry out derived horizontal fragmentation of B with respect to A, as in the previous case. But here it is important to notice that some conflicts may arise, due to the possible existence of another class C linked to B (C → B).
At this point, the fragmentation strategy must analyze some issues (which will be discussed in Section 3) to decide which is the most important link, to be used as a guideline for the derived horizontal fragmentation of B.

2.3.2 Quantitative information

Although quantitative information is especially important for the purpose of distribution, not all OODBMS products implement the concept of class extension automatically. According to the experiments presented in [11], the distributed design must not treat classes with a large extension and classes with no extension in the same way. However, quantitative information is not considered in any of the algorithms from the literature. Those experiments led to the following heuristics for the distributed design of OODBs:

- a class with large cardinality having a collection that implements its extension should be isolated from all the other classes in the database, and mixed fragmentation (vertical + horizontal) is recommended;
- pseudo-classes (classes with no instances) should not be considered in the fragmentation process.

2.3.3 Operations

Since all the operations are classified either as extension or navigation, as defined in Section 2.2.3, the following heuristics can be proposed for class fragmentation:

- for an extension operation on a class with a large cardinality, this class extension must be isolated in vertical fragments, which must be horizontally fragmented, if possible. This will probably result in mixed fragmentation;
- for a navigation operation, all the referenced instances

in the navigation path must be grouped in one horizontal fragment. The root class must be horizontally fragmented in a primary way, and the non-root classes must be derived fragmented according to their preceding class in the path.

Figure 1 - Overview of the proposed strategy for the distributed design of OODBs (diagram elements: User Information / Global Conceptual Design; Information (Interface Module); Class Information; Operation Information; Analysis Phase; Conflict Analysis, with a pair of conflicting navigation paths as input and a pair of non-conflicting navigation paths as output; Vertical Fragmentation; Primary Horizontal Fragmentation; set of classes not to be fragmented; set of classes to be vertically fragmented; set of lists of classes to be horizontally fragmented; set of vertical class fragments; set of horizontal class fragments; set of mixed class fragments)

3 Proposed strategy and algorithms

3.1 Overview

This Section proposes a new strategy for the fragmentation phase of the distributed design of OODBs that uses the heuristics from Section 2.3. Figure 1 presents an overview of our strategy, illustrating the information flow between the steps composing it, which will be detailed in the following sub-sections.

3.2 Step 1: Analysis phase

The first step of our strategy analyzes operations and semantic information from the Interface Module, and uses the heuristics from Section 2.3 to decide on the most adequate fragmentation technique (horizontal and/or vertical) for each class in the database schema. Its outputs are: (i) a set of classes to be horizontally fragmented; and (ii) a set of classes to be vertically fragmented.
In this step, operations are sorted in descending order of execution frequency (thus priority is given to the most frequent operations), and the classes involved in those operations are indicated for horizontal and/or vertical fragmentation according to the defined heuristics. The classes in the intersection of the horizontal and vertical sets (notice that only root classes in the horizontal set may be included in the vertical set) may proceed to mixed fragmentation. In this case, the algorithm for horizontal fragmentation will be performed on the vertical fragments of those classes.

function AnalysisPhase (C: the set of classes in the schema,
                        O: the set of operations)
returns Ch: set of lists of classes to be horizontally fragmented
        Cv: set of classes to be vertically fragmented
        Cn: set of classes not to be fragmented
begin
  sort O in descending order according to the operation frequency
  for each Oi that is in O do
    if Oi is an extension operation then
      if (C(Oi) is in C) and (cardinality of C(Oi) = large) then
        Cv += C(Oi); C -= C(Oi)
    if Oi is a navigation operation then
      for each non-root class c that is in C(Oi) do
        if c is in Cv then
          break the list of classes C(Oi) at class c, forming 2 sublists sub1 and sub2
          sublistsOfOi += sub1
          C(Oi) = sub2
      sublistsOfOi += C(Oi)
      for each sublist s that is in sublistsOfOi do
        for each list of classes l that is in Ch do
          if s and l are two conflicting navigational paths then
            let Oj be the operation that originated l
            if freq(Oi) and freq(Oj) are almost the same then
              for each class Ck that is in l and conflicts with s do
                Ch -= list l; C += all the elements from l
                call ConflictAnalysis(Ck, s, l, freq(Oi), freq(Oj))
                Ch += returned lists s, l
                C -= all the non-root classes from s and l
        Ch += s; C -= all the elements from s
  Cn = C
  return Ch, Cv, Cn
end

Figure 2 - Algorithm for the analysis phase: deciding the most adequate fragmentation technique for each class
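The core of this step can be sketched in Python as follows. This is a simplified reading of the analysis phase, not the authors' implementation: it is assumed that each operation is given as a (frequency, class path) pair, that a hypothetical predicate `is_large` decides whether an extension is large, and that conflicting paths are only reported instead of being resolved by conflict analysis. The sample data come from Table 2 of the case study, and the 100,000-object threshold is an arbitrary assumption.

```python
def analysis_phase(classes, operations, is_large):
    """Simplified analysis phase (conflict resolution is omitted).

    classes    -- iterable of class names
    operations -- list of (frequency, path); path is a tuple of class names,
                  with a single element for an extension operation
    is_large   -- assumed predicate: does the class have a large extension?
    """
    remaining = set(classes)          # C in the notation of the paper
    cv, ch, conflicts = [], [], []
    # Most frequent operations first; sorted() is stable for equal frequencies.
    for _freq, path in sorted(operations, key=lambda op: op[0], reverse=True):
        if len(path) == 1:            # extension operation
            c = path[0]
            if c in remaining and is_large(c):
                cv.append(c)
                remaining.discard(c)
            continue
        # Navigation operation: break the path at every non-root class in Cv.
        sublists, current = [], [path[0]]
        for c in path[1:]:
            if c in cv:
                sublists.append(tuple(current))
                current = [c]
            else:
                current.append(c)
        sublists.append(tuple(current))
        for s in sublists:
            if any(l[:len(s)] == s for l in ch):
                continue              # already covered by a list in Ch
            if any(set(s[1:]) & set(l[1:]) for l in ch):
                conflicts.append(s)   # would be sent to conflict analysis
                continue
            ch.append(s)
            remaining -= set(s)
    return cv, ch, sorted(remaining), conflicts


cardinality = {"BaseAssembly": 200, "CompositePart": 500,
               "AtomicPart": 100_000, "Connection": 300_000}
operations = [
    (100, ("AtomicPart",)),
    (50, ("AtomicPart",)),
    (50, ("AtomicPart",)),
    (30, ("BaseAssembly", "CompositePart", "AtomicPart", "Connection")),
    (30, ("BaseAssembly", "CompositePart", "AtomicPart")),
    (10, ("BaseAssembly", "CompositePart")),
    (10, ("BaseAssembly",)),
]
cv, ch, cn, conflicts = analysis_phase(
    cardinality, operations, lambda c: cardinality[c] >= 100_000)
```

Under these assumptions the sketch places AtomicPart in the vertical set and produces the two lists (BaseAssembly, CompositePart) and (AtomicPart, Connection) for the horizontal set. It is not meant to reproduce every decision of the case study in Section 4, which also depends on the designer's size estimates.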
Figure 2 presents the algorithm for the Analysis Phase. To help the decision between primary and derived fragmentation, the algorithm for Step 1 includes in the horizontal set lists of related classes, instead of isolated ones. Those lists reflect the structure of the navigation paths accessed by the most frequent operations. The algorithm ensures that the lists inserted in the horizontal set do not share any of their non-root classes: when the algorithm tries to include in the horizontal set a non-root class D which already belongs to another list, it proceeds to conflict analysis. The conflict is solved based on the class clustering dependency classification obtained from the Interface Module. Thus, according to the degree of dependency between D and its preceding classes in the conflicting paths, class D will preferably belong to the path where there is a dependency, and non-shared dependency will receive higher priority. Figure 3 presents the algorithm for this conflict analysis.

3.3 Step 2: Vertical fragmentation

The second step of our strategy defines the vertical

function ConflictAnalysis (Y: Class,
                           N1 = (Ci, Cj, ..., X, Y, ..., Cn),
                           N2 = (Cp, Cq, ..., Z, Y, ..., Cm),
                           f1: frequency of N1, f2: frequency of N2)
returns N1, N2: final navigation paths
begin
  let d1 be the clustering dependency between X and Y
  let d2 be the clustering dependency between Z and Y
  if d1 = d2 then
    if f1 > f2 then
      N1 = N1; N2 = (Cp, Cq, ..., Z)
    else
      N1 = (Ci, Cj, ..., X); N2 = N2
  else
    select
      case d1 = non-shared dependent
        N1 = N1; N2 = (Cp, Cq, ..., Z)
      case d1 = shared dependent
        if d2 = non-shared dependent then
          N1 = (Ci, Cj, ..., X); N2 = N2
        else
          N1 = N1; N2 = (Cp, Cq, ..., Z)
      case d1 = independent
        N1 = (Ci, Cj, ..., X); N2 = N2
  return N1, N2
end

Figure 3 - Algorithm for conflict analysis: deciding the most relevant links between classes

function VerticalFragmentation (Cv: set of classes to be vertically fragmented,
                                O: the set of operations)
returns Fv: set of vertical class fragments
begin
  for each Ck that is in Cv do
    for each Oi that is in O do
      for each element (attribute or method) ei of Ck that is accessed by Oi do
        for each element (attribute or method) ej of Ck that is accessed by Oi do
          if there is a link between ei and ej then
            value of this link += freq(Oi)
          else
            create a link between ei and ej
            value of this link = freq(Oi)
    N = empty set of nodes; A = empty set of links; G = (N, A)
    firstnode = any element of Ck
    N += {firstnode}
    while there is an element of Ck that is not in N do
      chosenlink = the link with the greatest value to one of the graph extremities
      if chosenlink forms a cycle in the graph G then
        let cp be this cycle
        if cp can be an affinity cycle then
          mark cp as a fragment candidate
      if there is a fragment candidate then
        let cf be this candidate
        if cf cannot be extended then
          mark cf as a fragment
          Fv(Ck) += cf
    Fv += Fv(Ck)
    if Ck is a root class in Ch then
      substitute Ck in Ch by its fragment that contains the relevant element for the derived fragmentation
  return Fv
end

Figure 4 - Algorithm for the vertical fragmentation phase: defining vertical fragments of classes in Cv

fragments of the classes
indicated in the first step. The algorithm presented in [5] was used for this step, with some adaptations needed to consider both attributes and methods of a class. Figure 4 shows this algorithm.

3.4 Step 3: Horizontal fragmentation

The third step of our strategy defines the horizontal fragments of the classes indicated in the first step. Both primary and derived horizontal fragments of classes must be defined.

function PrimaryHorizontalFragmentation (Ch: set of classes to be horizontally fragmented,
                                         O: set of operations,
                                         Fv: set of vertical class fragments)
returns Fh: set of horizontal class fragments
        Fm: set of mixed class fragments
begin
  Cr = empty set
  for each list Li that is in Ch do
    Cr += root class of Li
  for each Ck that is in Cr do
    for each pair of operations Oi, Oj extracted from the same transaction
        such that C(Oi) = C(Oj) = Ck do
      Oext += Oi; Oext += Oj
      if there is a link between Oi and Oj then
        value(link) += freq(Oi)
      else
        create a link between Oi and Oj with value(link) = freq(Oi)
    if Oext is empty then
      if the class has a large extension then
        define horizontal fragments of Ck in a circular manner
    for each operation Oi that is in Oext do
      for each operation Oj that is in Oext do
        if Oi => Oj then   // logic predicate
          create a logic implication link between Oi and Oj
        if Oi and Oj are next to each other then
          create a proximity link between Oi and Oj
    N = empty set of nodes; A = empty set of links; G = (N, A)
    N += any operation of Oext
    while there is an operation of Oext that is not in N do
      chosenlink = the link with the greatest value to one of the graph extremities
      if chosenlink forms a cycle in the graph G then
        let cp be this cycle
        if cp can be an affinity cycle then
          mark cp as a fragment candidate
      if there is a fragment candidate then
        let cf be this candidate
        if cf cannot be extended then
          mark cf as a group of operations
    for each group of operations g = (O1, O2, ..., Oq) do
      for each operation Oi that is in g do
        for each operation Oj that is in g do
          if (Oi != Oj) and (Oi => Oj) then
            g -= Oi
      mark g as an operation term
    TO = {t1, t2, ..., tt}   // set of operation terms
    Tab = empty table        // table of operation terms on Ck
    E = empty set            // set of Ck elements in Tab
    while there is a Ck element in TO which is not in E do
      let e be the least frequent element of Ck in TO such that e is not in E
      create a new column in Tab
      fill each element of the new column with operations over e such that
        the combination of elements in the same row defines a term in TO
    for each row r in Tab do
      if there are vertical fragments of Ck in Fv then   // mixed fragmentation
        Fm += combination of all elements in r applied to vertical fragments of Ck containing these elements
      Fh += the combination of all the elements in row r of Tab applied to the whole class Ck
  return Fh, Fm
end

Figure 5 - Algorithm for the horizontal fragmentation phase: defining primary horizontal fragments and/or mixed fragments of classes in Ch

To define primary fragments, we developed an algorithm that is an extension of the one used in the previous step, and thus uses the same concepts and data structures as in [5]. This reduces implementation difficulties, and provides a uniform paradigm for dealing with both vertical and primary horizontal class fragmentation. The algorithm developed to implement this

step is shown in Figure 5. The definition of derived fragments is straightforward, since the class paths received as inputs provide a guideline. Therefore, there is no need to develop an algorithm to perform this task. In order to group in one horizontal fragment all the objects from different classes referenced by the same navigation operation, the distributed designer must define derived horizontal fragments of each non-root class according to its preceding class in the path.

4 A Case study using the OO7 Benchmark

The OO7 Benchmark has been used with many OODBMSs to evaluate their performance. In this Section, our proposed algorithms for the distributed design of OODBs presented in Section 3 are applied to a reduced version of the OO7 Benchmark data model [16]. The final fragmentation schema will be evaluated against some performance issues, and compared to other alternative fragmentation schemas. The following sub-sections present the reduced OO7 Benchmark database schema, the set of transactions considered, the utilization of the proposed algorithms, the final fragmentation schema, and finally an evaluation of the results.

4.1 The reduced OO7 Benchmark database schema

Figure 6 and Table 1 show the reduced OO7 Benchmark database schema and the estimated cardinalities for its classes.

Figure 6 - The reduced OO7 Benchmark database schema (classes: DesignObject, BaseAssembly, CompositePart, AtomicPart, Connection; attributes: Id, Type, BuildDate, x, y, Length; relationships: ComponentsShared, ComponentsPrivate, RootPart, Parts, From, To)

Class            Cardinality
BaseAssembly     200
CompositePart    500
AtomicPart       100,000
Connection       300,000

Table 1 - Cardinalities of classes in the OO7 Benchmark database schema

4.2 Transactions

The set of transactions considered in this case study was selected from the set of queries and traversals defined for the OO7 Benchmark in [16].
This selection reflects typical situations for data retrieval in OODBMS applications involving both set operations (queries over class extensions) and navigation (traversals) that were evaluated in [11]:

- Query Q2 (Query Q3): Choose a range for dates that will contain 1% (11%) of the dates found in the database's atomic parts. Retrieve the Ids of the atomic parts that satisfy this range predicate;
- Query Q5: Find all base assemblies that use a private composite part with a build date later than the build date of the base assembly. Report the number of qualifying base assemblies found;
- Traversal T1: Traverse the assembly hierarchy. As each base assembly is visited, visit each of its referenced private composite parts. As each composite part is visited, perform a depth first search on its graph of atomic parts. Return the Ids of the atomic parts that have the minimum and the maximum dates when done;
- Traversal T6: Traverse the assembly hierarchy. As each base assembly is visited, visit each of its referenced private composite parts. As each composite part is visited, visit the root atomic part. Return a count of the number of atomic parts visited when done.

4.3 Applying the algorithms

Table 2 presents the sorted set of operations obtained from the previous transactions.

Trans     Extracted operation                              Freq  Type  Ref classes
Query Q2  O1: bdate < 01/10/96                             100   ext   AtomicPart
Query Q3  O2: bdate ≥ 01/10/96                             50    ext   AtomicPart
          O3: bdate < 01/10/97                             50    ext   AtomicPart
Trav T1   O4: baseassembly.componentsprivate.parts.to      30    nav   BaseAssembly, CompositePart, AtomicPart, Connection
Trav T6   O5: baseassembly.componentsprivate.rootpart      30    nav   BaseAssembly, CompositePart, AtomicPart
Query Q5  O6: baseassembly.componentsprivate               10    nav   BaseAssembly, CompositePart
          O7: bdate < componentsprivate.bdate              10    ext   BaseAssembly

Table 2 - The sorted set of operations extracted from the selected queries and traversals
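The "ext" and "nav" entries of Table 2 follow the rule of Section 2.2.3: an operation is an extension operation when C(O) is a unitary set, and a navigation operation otherwise, in which case the first class of the list is its root class. A minimal Python sketch of this rule is given below; the function names `classify` and `root_class` and the tuple encoding of C(O) are assumptions made for illustration, and only three rows of Table 2 are encoded.

```python
def classify(ref_classes):
    """Return 'ext' when C(O) holds a single class, 'nav' otherwise."""
    return "ext" if len(set(ref_classes)) == 1 else "nav"


def root_class(ref_classes):
    """The first class of C(O) is the root class of the operation."""
    return ref_classes[0]


# Three rows of Table 2: operation name -> (frequency, C(O)).
sample = {
    "O1": (100, ("AtomicPart",)),
    "O4": (30, ("BaseAssembly", "CompositePart", "AtomicPart", "Connection")),
    "O7": (10, ("BaseAssembly",)),
}

# Process the operations from the most to the least frequent.
ranked = sorted(sample.items(), key=lambda kv: kv[1][0], reverse=True)
for name, (freq, refs) in ranked:
    print(name, freq, classify(refs), root_class(refs))
```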
The range for dates used by query Q2 was less than 01/10/1996 (as it appears in Operation O1), and in query Q3 was between 01/10/1996 and 01/10/1997 (as it appears in Operations O2 and O3). The Analysis phase will return two sets of classes (Cv and Ch), representing the sets of classes to be vertically and horizontally fragmented, as defined in Figure 7. The second step defines vertical fragments of the classes in Cv (AtomicPart and BaseAssembly). The groups of

attributes referenced simultaneously by the operations will be identified and each group will define a new class fragment. The performance of operations will be improved, since this will reduce unnecessary information accessed by them.

Figure 7 - Execution of the analysis phase step-by-step

Applying the algorithm described in Figure 4, the final vertical fragments will be:

Fv1(AtomicPart) = π(type, x, y)(AtomicPart)
Fv2(AtomicPart) = π(id, to, bdate)(AtomicPart)
Fv1(BaseAssembly) = π(id, type)(BaseAssembly)
Fv2(BaseAssembly) = π(bdate, componentsprivate)(BaseAssembly)

The last step of the algorithm addresses primary and derived horizontal fragmentation of classes in Ch. The primary horizontal fragments of the Ch root classes (AtomicPart and BaseAssembly) will be defined by the algorithm of Figure 5. The derived horizontal fragments of the remaining classes (CompositePart, Connection) will be easily defined by the distributed designer. Applying the algorithm described in Figure 5, primary horizontal fragmentation will be performed on the vertical fragments of classes AtomicPart and BaseAssembly, resulting in mixed fragments (these will represent a selection of objects from the vertical fragments that satisfy some range predicates inserted in the affinity graph). The final mixed fragments will be defined as:

Fm1(AtomicPart) = σ(bdate < 01/10/1996) Fv2(AtomicPart)
Fm2(AtomicPart) = σ((bdate ≥ 01/10/1996) and (bdate < 01/10/1997)) Fv2(AtomicPart)
Fm3(AtomicPart) = ELSE
Fdi(Connection) = Connection ⋉ Fmi(AtomicPart), i = 1 to 3
Fm1(BaseAssembly) = σ(bdate < componentsprivate.bdate) Fv2(BaseAssembly)
Fm2(BaseAssembly) = ELSE
Fdi(CompositePart) = CompositePart ⋉ Fmi(BaseAssembly), i = 1 to 2

Figure 8 - The final OO7 Benchmark distributed database (Fragments 1 to 7)

4.4 Evaluating the final distributed database

Figure 8 illustrates the final OO7 Benchmark distributed database obtained from our proposed fragmentation strategy. From the illustration, we may see that queries Q2, Q3 and Q5 will have their performance improved, since each of them will directly access only one fragment (5, 6 and 2, respectively). Traversals T1 and T6 should not perform badly, because irrelevant data accessed by them is eliminated. Their entire navigation paths are not clustered into one fragment, but this was done in order to improve the performance of queries Q2

and Q3, which are more frequent. Prioritizing the most frequent transactions should improve the overall system performance. Also, the clustering of the navigation paths {BaseAssembly, CompositePart} and {AtomicPart, Connection} will reduce the communication overhead during their execution. It is important to notice that the next phase of the distributed design (the allocation phase), which is outside the scope of this work, may reduce this communication overhead even further, by allocating some of the defined fragments at the same site or through replication techniques.

5 Conclusions

In this paper we have proposed a new strategy for the fragmentation phase of the distributed design of OODBs. We have identified the most relevant issues to be considered in the fragmentation process, such as the dual nature of OO applications, which involve both set operations and navigation. Our strategy can therefore detect, in the same database schema, classes that should be horizontally fragmented because they are accessed through navigation, and classes that should be vertically fragmented because they are searched over a large extension and because of their attribute usage. Particularly in the OO model, vertical fragmentation also has a special appeal due to the existence of class methods. This work has pointed out the importance of mixed fragmentation, and has implemented it in the proposed algorithms. The benefits of mixed fragmentation were already identified in [5] for the relational model, but it was not proposed in any of the related works for the OO model. The main contribution of this paper lies in providing a sequence of steps to be followed when fragmenting an OODB, based on a set of heuristics, together with the related algorithms.
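As an informal illustration of mixed fragmentation, the following Python sketch composes a vertical fragment (projection), a primary horizontal fragment defined over it (selection) and a derived fragment (semijoin). It is only a sketch under stated assumptions: the helper names `project`, `select` and `semijoin`, the toy objects, the `from_id` reference attribute and the reading of the cut-off date as 1 October 1996 are hypothetical and are not taken from the algorithms of this paper. The identifier `id` is kept in every vertical fragment so that objects could be reconstructed.

```python
from datetime import date

# Toy AtomicPart extension: attribute names follow the case study, values are invented.
atomic_parts = [
    {"id": 1, "type": "t1", "x": 0, "y": 0, "to": [10], "bdate": date(1995, 5, 1)},
    {"id": 2, "type": "t2", "x": 1, "y": 2, "to": [11, 12], "bdate": date(1997, 3, 9)},
]
# Toy Connection extension: "from_id" is a hypothetical reference to the owner AtomicPart.
connections = [
    {"cid": 10, "from_id": 1},
    {"cid": 11, "from_id": 2},
    {"cid": 12, "from_id": 2},
]


def project(objects, attrs, key="id"):
    """Vertical fragmentation: keep the identifier plus the listed attributes."""
    return [{k: o[k] for k in (key, *attrs)} for o in objects]


def select(objects, predicate):
    """Horizontal fragmentation: keep the objects that satisfy the predicate."""
    return [o for o in objects if predicate(o)]


def semijoin(members, owners, ref, key="id"):
    """Derived fragmentation: keep the members whose owner is in the owner fragment."""
    owner_ids = {o[key] for o in owners}
    return [m for m in members if m[ref] in owner_ids]


CUTOFF = date(1996, 10, 1)  # assumed reading of the range predicate boundary

fv2 = project(atomic_parts, ("to", "bdate"))           # vertical fragment
fm1 = select(fv2, lambda o: o["bdate"] < CUTOFF)       # mixed fragment 1
fm_else = select(fv2, lambda o: o["bdate"] >= CUTOFF)  # remaining objects
fd1 = semijoin(connections, fm1, ref="from_id")        # derived fragment 1
fd_else = semijoin(connections, fm_else, ref="from_id")
```

Only two mixed fragments are built here for brevity; additional range predicates would be handled in the same way, each one yielding one mixed fragment of the owner class and one derived fragment of the member class.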
The first step of this sequence analyzes the database structure and applications to decide the most adequate fragmentation strategy (horizontal and/or vertical) for each class; the second and third steps define, respectively, the vertical and horizontal fragments of the classes indicated in the first step, considering derived fragmentation of classes to improve navigation performance. We have presented an evaluation of the proposed algorithm using the OO7 Benchmark database schema. The final fragmentation schema considers characteristics that were not addressed in previous algorithms, such as the use of horizontal and/or vertical fragmentation of a class and the identification of small class extensions (fragmenting a small class extension accessed through navigation would incur a communication overhead that dominates the operation). The final fragmentation schema offers a high degree of parallelism (achieved by horizontal fragmentation) together with an important reduction of irrelevant data (obtained with vertical fragmentation). Even though applying heuristics to the distributed design of OODBs is not a trivial task for the user, the implemented algorithms help the designer define a suitable fragmentation schema, considering all relevant information provided. We are currently working on machine learning techniques (such as ILP) to improve the decision in conflicting situations.

References

[1] Karlapalem, K. et al., Issues in Distribution Design of Object-Oriented Databases. In: Özsu, M. et al. (eds), Distributed Object Management, Morgan Kaufmann Publishers, 1994
[2] Özsu, M., Valduriez, P., Principles of Distributed Database Systems, New Jersey, Prentice-Hall, 1991
[3] Ezeife, C., Barker, K., A Comprehensive Approach to Horizontal Class Fragmentation in a Distributed Object Based System, Distributed and Parallel Databases, 3(3), pp. 247-272, 1995
[4] Navathe, S., Ra, M., Vertical Partitioning for Database Design: A Graphical Algorithm. In: Proc. of 1989 ACM SIGMOD, pp. 440-450, 1989
[5] Navathe, S. et al., A Mixed Fragmentation Methodology for Initial Distributed Database Design, Journal of Computer and Software Engineering, vol. 3(4), 1995
[6] Maier, D. et al., Issues in Distributed Object Assembly. In: Özsu, M. et al. (eds), Distributed Object Management, Morgan Kaufmann Publishers, 1994
[7] Savonnet, M. et al., Using Structural Schema Information as Heuristics for Horizontal Fragmentation of Object Classes in Distributed OODB. In: Proc. IX Intl. Conf. on Parallel & Distributed Computing Systems, pp. 732-737, France, 1996
[8] Bellatreche, L. et al., Vertical Fragmentation in Distributed Object Database Systems with Complex Attributes and Methods. In: Proc. of the 7th Intl. Workshop on Database and Expert Systems Applications, 1996
[9] Ezeife, C., Barker, K., Vertical Class Fragmentation in a Distributed Object Based System, Technical Report 94-03, Dept. of Computer Science, University of Manitoba, 1994
[10] Malinowski, E., Fragmentation Techniques for Distributed Object-Oriented Databases, Thesis, Univ. of Florida, 1996
[11] Lima, F., Mattoso, M., Performance Evaluation of Distribution in OODBMS: a Case Study with O2. In: Proc. IX Intl. Conf. on Parallel & Distributed Computing Systems, pp. 720-726, France, 1996
[12] Baião, F., A Strategy for the Distributed Design of Object Oriented Databases, Thesis, COPPE/UFRJ, Rio de Janeiro, Brazil, 1997 (in Portuguese)
[13] Baião, F., Mattoso, M., A Mixed Fragmentation Strategy for Distributed OO Databases. In: Proc. of The Second Workshop on CSCW in Design, pp. 42-48, Bangkok, Thailand, 1997
[14] Chen, Y., Su, S., Implementation and Evaluation of Parallel Query Processing Algorithms and Data Partitioning Heuristics in Object Oriented Databases, Distributed and Parallel Databases, 4(2), pp. 107-142, 1996
[15] Cluet, S., Delobel, C., A General Framework for the Optimization of Object-Oriented Queries. In: Proc. of 1992 ACM SIGMOD, 21(2), pp. 383-391, San Diego, 1992
[16] Carey, M. et al., The OO7 Benchmark. In: Proc. of 1993 ACM SIGMOD, 22(2), pp. 12-21, 1993