Preserving Constraints for Aggregation Relationship Type Update in XML Document

Eric Pardede 1, J. Wenny Rahayu 1, and David Taniar 2

1 Department of Computer Science and Computer Engineering, La Trobe University, Bundoora 3083, Australia
E-mail: {ekpardede, wenny}@cs.latrobe.edu.au
2 School of Business System, Monash University, Clayton 3800, Australia
E-mail: David.Taniar@infotech.monash.edu.au

Abstract
Despite the increasing demand for effective XML document repositories, many users are still reluctant to store XML documents in their natural tree form. One main reason is the limitation of the XML languages used to define and manipulate XML documents. In particular, current XML languages lack support for update operations. Even though some of these languages provide minimal update facilities, they do not address the preservation of document constraints. The result is updated documents with very low database integrity. This paper proposes XML document updates that do not violate the semantic constraints. We focus on the constraints found in the aggregation relationship, which is the most frequently used relationship in an XML tree. Having classified the constraint types, we propose algorithms based on the type of operation: deletion and insertion.

1 Introduction
In the last few years, interest in storing XML documents in Native XML Databases (NXDs) has grown rapidly. The main idea is to store the documents in their natural tree form. However, it is no secret that many users still prefer DBMSs based on established data models, such as the Relational Model, for their document storage. One reason is the incompleteness of NXD query languages. Many proprietary XML query languages, and even W3C-standardized languages, still have limitations compared with SQL in the Relational Model. One of the most important limitations is the lack of support for update operations [1]. Different NXDs apply different strategies for XML update.
Very frequently, after update operations the XML document contains many dangling references, loses its key attributes, has unnecessary duplications, and shows many other problems that indicate very low database integrity. There is no algorithm, let alone a query language, that has considered the integrity issues raised by update operations. We find this an important issue to raise and to investigate further. This paper proposes algorithms for updating XML documents without violating constraints and creating integrity problems. Nevertheless, we realize that the area is too big for a single research paper. Therefore, we focus on the update of the aggregation relationship type. Aggregation is a relationship type in which a composite object ("whole") consists of component objects ("parts") [2]. It is the core relationship type in an XML document. We will
distinguish the different aggregation constraints in an XML document tree, analyze the potential integrity problems when they are updated, and finally propose algorithms to avoid those integrity problems. The rest of this paper is structured as follows. Section 2 briefly discusses the XML update strategies used by NXDs. Section 3 describes the different aggregation constraints in XML documents. Section 4 proposes the algorithms, divided by update operation type. We conclude the paper in Section 5.

2 XML document update: an overview
So far there are three main strategies for updating XML documents inside NXDs [1, 3]. It is important to mention that none of them addresses the integrity constraints of the XML document being updated.

The first strategy is to provide a proprietary update language that allows updates within the server. Usually such systems have versioning capabilities that enable users to retrieve different versions of the documents. Systems that use this strategy include Ipedo [4] and SODA [5].

The second strategy is to use XUpdate, the standard proposed by the XML:DB Initiative for updating distinct parts of a document [6]. Some open-source NXDs, such as eXist and Xindice, use this option [7].

The third strategy is followed by most NXD products: the XML document is retrieved, updated using an XML API, and then returned to the database. One system using this strategy is TIMBER [8].

These different strategies have limited database interchangeability. To unite them, [9] proposed expressing the update processes for XML documents in an XML language. These processes are embedded into XQuery and thus can be used by any NXD that supports this language. The update is applicable to ordered and unordered XML documents, and to single or multiple levels of nodes. Nonetheless, even this proposal has not answered the basic question: we do not know how the update operations can affect the semantic correctness of the updated XML documents.
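The integrity gap is easy to reproduce with the third, API-based strategy. The sketch below is a minimal illustration, not any product's update mechanism: it uses Python's standard `xml.etree.ElementTree` as a stand-in for a generic XML API, and the element names `publication`, `title`, and `pubref` are hypothetical. Removing a publication is accepted without complaint and leaves the school's reference dangling.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one publication, referenced by title from a school.
DOC = """<department>
  <publication><title>XML Update</title></publication>
  <school><pubref>XML Update</pubref></school>
</department>"""


def dangling_refs(root):
    """Return the pubref values that match no existing publication title."""
    titles = {t.text for t in root.iter("title")}
    return [r.text for r in root.iter("pubref") if r.text not in titles]


root = ET.fromstring(DOC)
print(dangling_refs(root))  # no dangling references yet

# Retrieve-update-return with a plain XML API: the removal is accepted
# because the API itself performs no semantic constraint checking.
root.remove(root.find("publication"))
print(dangling_refs(root))  # the school's reference now points nowhere
```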
3 Aggregation relationship in XML document
By nature, XML documents are structured as a set of aggregation relationships. Semantically, this relationship type can be distinguished by different constraints based on cardinality, homogeneity, adhesion, exclusivity, ordering, shareability, and dependency. Each of these influences how the part components relate to the whole component. Most constraints, with the exception of shareability and dependency, can be identified in an XML data model such as the Semantic Network Diagram [10]. We use a running example to describe the different aggregation constraints (see Fig. 1).
Figure 1. Aggregations in XML Document Tree

Aggregation cardinality identifies the number of instances of a particular part component that a single instance of the whole component can relate to. For example, a school has one or more publications [1..N].

Aggregation homogeneity identifies whether the component types that make up the whole component are homogeneous or heterogeneous. For example, the publication content is a homogeneous aggregation of section documents.

Aggregation adhesion identifies whether the whole and part components must or need not coexist and adhere to each other. For example, Faculty has a weak-adhesion aggregation with component Department, which means that the existence of the latter does not totally depend on the former.

Aggregation exclusivity identifies that, at any given time, an instance of the whole component can be composed of only one particular part component type and NOT the other part component types. For example, Department has an exclusive disjunction, since it must be either a group of schools or a group of research centres.

Ordered aggregation identifies whether the part components must compose the whole component in a particular order. The opposite is unordered aggregation, which is usually not explicitly mentioned. In the example, Address has an ordered aggregation.

Aggregation shareability identifies whether instance(s) of a part component can be shared by more than one instance of one or more whole components. If they can be shared, we call it a shareable aggregation. This aggregation type can be depicted neither in the Semantic Network Diagram nor in XML Schema [11]. We cannot enforce the shareable constraint by allowing the same part component to be owned by more than one whole component.
Therefore, the solution is to store the part components separately and then link them to the whole components. For example, assume that the publication is shareable. The usual practice solution is shown in Fig. 2a.
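To make this linking concrete, the following sketch (an illustration only, not the paper's schema) stores each publication once and lets a school and a research centre hold reference values, in the spirit of Fig. 2a. In XML Schema such links would normally be declared with `xs:key` and `xs:keyref`; here, as an assumption, the key is a hypothetical `title` attribute and the references are hypothetical `pubref` elements. The helper classifies every reference as resolved (exactly one matching key), dangling (none), or ambiguous (more than one), which are the situations the deletion and insertion algorithms below aim to prevent.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical layout in the spirit of Fig. 2a: each publication is stored once
# and the whole components (school, research centre) only hold references.
DOC = """<department>
  <publication title="XML Update"><author>A</author></publication>
  <publication title="XML Query"><author>B</author></publication>
  <school><pubref>XML Update</pubref></school>
  <res-centre><pubref>XML Update</pubref><pubref>XML Query</pubref></res-centre>
</department>"""


def reference_report(root, key_tag="publication", key_attr="title", ref_tag="pubref"):
    """Classify each reference by how many key instances carry its value."""
    key_counts = Counter(e.get(key_attr) for e in root.iter(key_tag))
    report = {"resolved": [], "dangling": [], "ambiguous": []}
    for ref in root.iter(ref_tag):
        hits = key_counts[ref.text]  # Counter returns 0 for missing values
        if hits == 1:
            report["resolved"].append(ref.text)
        elif hits == 0:
            report["dangling"].append(ref.text)
        else:
            report["ambiguous"].append(ref.text)
    return report


root = ET.fromstring(DOC)
print(reference_report(root))  # the shared publication resolves for both owners
```

Deleting the first publication would move both "XML Update" references into the dangling list, and adding a second publication with the same title would make them ambiguous.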
Aggregation dependency is somewhat similar to adhesion. However, it is more concerned with how the part component depends on the whole component. If the existence of the part totally depends on the whole, we call it an existence-dependent aggregation. This means that removing the whole component also removes all associated part components. All components in the above example are existence-dependent. However, say we now want to change the relationship between department and faculty to be existence-independent. The solution is to detach the part component from the whole component and include a reference to the part component under the whole component (see Fig. 2b). This is very similar to the solution for the shareable constraint in Fig. 2a.

Figure 2. Special Structure for Non-Shareable and Existence-Independent Constraints: (a) Non-Shareable Constraint; (b) Existence-Independent Constraint

4 Proposed algorithm
The update operations can be divided into three main groups: deletion, insertion, and replacement. In this section we show the algorithms for the first two only, because a replacement operation affects the constraints in the same way as a deletion followed by an insertion. Since XML nodes can be either attributes or elements, we propose a different algorithm for each. In addition, specific algorithms for keys and key references (the latter for insertion only) are proposed separately. Note that our algorithms assume that the schema is written in XML Schema [11]. Therefore, for example, cardinality can only be checked on an element, since the minOccurs constraint can only be attached to an element.

4.1 Algorithm for deletion
Deletion operations may violate some aggregation constraints, as described below.

Cardinality constraint is violated if we delete a part component so that the number of instances of that part component falls below its minimum cardinality.
For example, deleting the dean of a faculty should be restricted, since the aggregation between faculty and dean has cardinality [1..1].
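The cardinality check in this example can be sketched as follows. This is a minimal illustration assuming the minimum occurrence (the schema's minOccurs value) is already known and passed in as a number; reading it from an XML Schema is outside the sketch, and the function name `delete_child` is hypothetical. A deletion is accepted only if the current number of instances exceeds the minimum, so that at least `min_occurs` instances remain afterwards.

```python
import xml.etree.ElementTree as ET


def delete_child(parent, child, min_occurs):
    """Remove child from parent only if the minimum cardinality still holds.

    min_occurs plays the role of the schema's minOccurs for child's tag.
    Returns True if the element was deleted, False if the deletion was rejected.
    """
    siblings = parent.findall(child.tag)  # current instances of this part type
    if len(siblings) <= min_occurs:
        return False  # reject: one fewer instance would fall below the minimum
    parent.remove(child)
    return True


# Faculty-dean is a [1..1] aggregation, so the only dean cannot be deleted.
faculty = ET.fromstring("<faculty><dean/><department/></faculty>")
print(delete_child(faculty, faculty.find("dean"), min_occurs=1))

# School-publication is [1..N]: one of two publications may be removed.
school = ET.fromstring("<school><publication/><publication/></school>")
print(delete_child(school, school.find("publication"), min_occurs=1))
```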
Adhesion constraint is violated if we delete a part component in a strong-adhesion aggregation. The example from the previous point also applies; this time it is the adhesion semantics that is discarded.

Shareability and dependency constraints are violated if we delete a part component key, because the reference key in the whole component will then point to a non-existent instance. For example, as in Fig. 3, if we delete the publication "XML Update", the reference keys inside school and research centre will be dangling.

Having discussed the potential problems, we now propose the algorithms (functions) that perform the delete operation.

Algorithm 1.1. Key Deletion
  Pass the Key
  For all nodes in the document
    Check the Key References
    If the Key Reference refers to the Key   -- shareability & dependency constraints
    THEN nullify or delete the Key Reference
  Delete the Key
  For all sibling nodes of the Key
    Delete the sibling

Algorithm 1.2. Attribute Deletion
  Pass the Attribute
  If the attribute is a Key attribute THEN go to Algorithm 1.1
  ELSE
    Check the use constraint   -- adhesion constraint
    If the constraint is required THEN reject the deletion
    ELSE delete the Attribute

Algorithm 1.3. Element Deletion
  Pass the Element
  If the element is a Key element THEN go to Algorithm 1.1
  ELSE
    Check the minimum occurrence constraint   -- cardinality constraint
    If the constraint exists THEN
      Check the instance occurrence
      If the instance occurrence > minimum occurrence THEN delete the element
      ELSE reject the deletion
    ELSE delete the element

4.2 Algorithm for insertion
As with deletion, some insertion operations violate the aggregation constraints, as described below.

Cardinality constraint is violated if we insert a part component so that the number of instances of that part component exceeds its maximum cardinality. For example, inserting a second dean into a faculty should be restricted, since it is a [1..1] aggregation.
Homogeneity constraint is violated if we insert a new part component type into a homogeneous aggregation. For example, inserting a node remark as part of the component content.

Exclusivity constraint is violated if we insert a new part component type into an exclusive disjoint aggregation. For example, inserting a research centre into a department that has only school components.

Ordering constraint is violated if we insert a part component out of its defined order. For example, inserting a new part node state after the suburb node in an address.

Shareability and existence-independent constraints are violated on two occasions. First, if we insert a duplicate part component key. For example, as in Fig. 3, inserting another publication with the title "XML Update" creates a potential integrity problem, since the reference key might now point to more than one instance. Second, if we insert a key reference that does not refer to any key instance. For example, if we insert the reference key "XML UpdateS" under school, it will point to no key instance.

Unlike deletion, we propose four algorithms for insertion; the additional algorithm is for inserting a key reference (keyref).

Algorithm 2.1. Key Insertion
  Pass the Key name and content
  For all existing instances of the Key name
    Check the existing key content   -- shareability & dependency constraints
    If the existing key content is the same as the new key content THEN reject the insertion
  Insert the Key

Algorithm 2.2. KeyRef Insertion
  Pass the KeyRef name and content
  Check the Key being referred to by the KeyRef
  For all existing instances of the Key being referred to
    Check the existing key content   -- shareability & dependency constraints
  If there is an existing key content equal to the new KeyRef content THEN insert the new KeyRef
  ELSE reject the insertion

Algorithm 2.3. Attribute Insertion
  Pass the Attribute name and content
  If the attribute is under a choice constraint   -- exclusivity constraint
  THEN
    For all existing attribute instances under the constraint
      If an existing attribute instance has a different name from the new attribute THEN reject the insertion
  If the attribute is under a homogeneous constraint   -- homogeneity constraint
  THEN
    Check the existing attribute instances under the constraint
    If an existing attribute instance has a different name from the new attribute THEN reject the insertion
  If the attribute is a Key attribute THEN go to Algorithm 2.1
  ELSE If the attribute is a KeyRef attribute THEN go to Algorithm 2.2
  ELSE insert the attribute

Algorithm 2.4. Element Insertion
  Pass the Element name and content
  If the element is under a choice constraint   -- exclusivity constraint
  THEN
    For all existing element instances under the constraint
      If an existing element instance has a different name from the new element THEN reject the insertion
  If the element is under a homogeneous constraint   -- homogeneity constraint
  THEN
    Check the existing element instances under the constraint
    If an existing element instance has a different name from the new element THEN reject the insertion
  If the element is a Key element THEN go to Algorithm 2.1
  ELSE If the element is a KeyRef element THEN go to Algorithm 2.2
  ELSE
    Check the maximum occurrence constraint   -- cardinality constraint
    If the maximum occurrence constraint exists THEN
      Check the instance occurrence
      If the instance occurrence >= maximum occurrence THEN reject the insertion
    Check the sequence constraint   -- ordering constraint
    If the sequence constraint exists THEN
      If there is a previous element THEN insert the element after the previous element
      ELSE If there is a next element THEN insert the element before the next element
    ELSE insert the element at the end

5 Conclusion and future work
In this paper, we propose algorithms to avoid constraint violations during XML update operations. We focus on aggregation-type constraints, which can be distinguished by their cardinality, homogeneity, adhesion, exclusivity, ordering, shareability, and dependency. The algorithms are grouped by update operation type, in this case deletion and insertion. With these checking algorithms, XML query languages can become more powerful, which can in turn increase the use of tree-form XML repositories such as Native XML Databases. At the time of writing, we are still applying the algorithms to one XML language, XQuery. For future work, we also aim to develop algorithms, and to embed them into XQuery, for other relationship constraints such as association and inheritance.

References
[1] K. Staken, Introduction to Native XML Databases, http://www.xml.com/pub/a/2001/10/31/nativexmldb.html, 2001.
[2] J. Rumbaugh, et al., Object-Oriented Modelling and Design, Prentice Hall, 1991.
[3] R. Bourret, XML and Databases, http://www.rpbourret.com/xml/XMLAndDatabases.htm, 2003.
[4] Ipedo, Ipedo XML Database, available at: http://www.ipedo.com/html/products.html, 2004.
[5] SODA Technology, SODA, available at: http://www.sodatech.com/products.html, 2004.
[6] XML:DB, XUpdate: XML Update Language, http://www.xmldb.org/xupdate/, 2000.
[7] W. M. Meier, eXist: Native XML Database, in XML Data Management: Native XML and XML-Enabled Database Systems, A. B. Chaudhri, A. Rashid & R. Zicari (Eds), Addison Wesley, 43-68, 2003.
[8] H. V. Jagadish, S. Al-Khalifa, A. Chapman, L. V. S. Lakshmanan, A. Nierman, S. Paparizos, J. M. Patel, D. Srivastava, N. Wiwatwattana, Y. Wu & C. Yu, TIMBER: A native XML database, VLDB Journal, Vol. 11, No. 4, 279-291, December 2002.
[9] I. Tatarinov, Z. G. Ives, A. Y. Halevy & D. S. Weld, Updating XML, ACM SIGMOD, Santa Barbara, CA, USA, 21-24 May 2001.
[10] L. Feng, E. Chang & T. S. Dillon, A Semantic Network-Based Design Methodology for XML Documents, ACM Trans. Information Systems, Vol. 20, No. 4, 390-421, October 2002.
[11] E. van der Vlist, XML Schema, O'Reilly, Sebastopol, CA, USA, 2002.