WITH the multidimensional model for data warehouses,

Size: px
Start display at page:

Download "WITH the multidimensional model for data warehouses,"

Transcription

1 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 7, JULY Effcent Computaton of Iceberg Cubes by Boundng Aggregate Functons Xuzhen Zhang, Paulne Lenhua Chou, and Guozhu Dong, Senor Member, IEEE Abstract The ceberg cubng problem s to compute the multdmensonal group-by parttons that satsfy gven aggregaton constrants. Prunng unproductve computaton for ceberg cubng when nonantmonotone constrants are present s a great challenge because the aggregate functons do not ncrease or decrease monotoncally along the subset relatonshp between parttons. In ths paper, we propose a novel bound prune cubng (BP-Cubng) approach for ceberg cubng wth nonantmonotone aggregaton constrants. Gven a cube over n dmensons, an aggregate for any group-by partton can be computed from aggregates for the most specfc n-dmensonal parttons (MSPs). The largest and smallest aggregate values computed ths way become the bounds for all parttons n the cube. We provde effcent methods to compute tght bounds for base aggregate functons and, more nterestngly, arthmetc expressons thereof, from bounds of aggregates over the MSPs. Our methods produce tghter bounds than those obtaned by prevous approaches. We present ceberg cubng algorthms that combne boundng wth effcent aggregaton strateges. Our experments on real-world and artfcal benchmark data sets demonstrate that BP-Cubng algorthms acheve more effectve prunng and are several tmes faster than state-of-the-art ceberg cubng algorthms and that BP-Cubng acheves the best performance wth the top-down cubng approach. Index Terms Data mnng, data cube, prunng, data warehouses. Ç 1 INTRODUCTION WITH the multdmensonal model for data warehouses, a data set conssts of tuples over dmensons and measures, and queres nvolve aggregatng the measures over parttons of tuples sharng dentcal dmenson values. For example, the Structured Query Language (SQL) n Fg. 1a gves the monthly total amount of sales for each cty on the Sales data set n Table 1. The (Month, Cty) group-by dvdes the data set nto parttons of tuples sharng dentcal (Month, Cty) values, and the measure Sale s aggregated to yeld Sum(Sale) for each partton. The Cube operator was proposed [6] to compute all potental group-bys of a data set, leadng to the noton of a data cube. The ceberg cube was proposed n [4], where only parttons whose aggregate values satsfy an aggregaton constrant are produced. By addng HAVING CountðÞ 10 to the SQL query n Fg. 1a, we get an SQL query (n Fg. 1b) for computng those (Month, Cty) parttons, each havng 10 tuples. The aggregaton constrant CountðÞ 10 s antmonotone: If a partton does not contan 10 tuples and thus fals the constrant, then all ts subparttons wll have fewer tuples and also fal the constrant. The bottom-up cubng (BUC) strategy was proposed [4], where aggregates are computed startng from. X. Zhang and P.L. Chou are wth the School of Computer Scence and IT, RMIT Unversty, GPO Box 2476v, Melbourne 3001, Australa. E-mal: {zhang, lchou}@cs.rmt.edu.au.. G. Dong s wth the Department of Computer Scence and Engneerng, Wrght State Unversty, Dayton, OH E-mal: guozhu.dong@wrght.edu. Manuscrpt receved 18 Sept. 2005; revsed 12 Apr. 2006; accepted 15 Jan. 2007; publshed onlne 25 Apr For nformaton on obtanng reprnts of ths artcle, please send e-mal to: tkde@computer.org, and reference IEEECS Log Number TKDE Dgtal Object Identfer no /TKDE the partton of all tuples and then followed recursvely wth the subparttons, and antmonotone constrants are used for prunng. In On-Lne Analytcal Processng (OLAP) applcatons, aggregaton constrants often nvolve complex aggregates. The constrant SumðxÞ 5; 000 and VarðxÞ 100, whch states that the total proft ðxþ s at least $5,000, and the varance of proft s at most $100, s a typcal example of constrants for supportng enterprse data analyss and decson makng. As x can be postve or negatve, the subset relatonshp between a group 1 and ts subgroups may not mply monotoncally ncreasng (or decreasng) aggregate values. Computng ceberg cubes wth such constrants has been a challengng problem. Exstng proposals n the lterature have all followed the approach of dervng weaker but antmonotone constrants [7], [15]. They are restrcted n two aspects: 1) the weaker constrants derved from nonantmonotone constrants are often very loose and not effectve enough for prunng, especally for complex constrants, and 2) aggregates are computed followng the BUC framework, and only one group s aggregated at a tme n the recursve parttonng process, whch lmts the amount of nformaton readly avalable for prunng. In ths paper, we propose a novel technque, called bound prune cubng (BP-Cubng), for effcently computng ceberg cubes wth nonantmonotone aggregaton constrants. The man deas are to effectvely derve tght bounds for possble aggregates and to use such bounds to prune unproductve computaton. An mportant part s played by the most specfc parttons (MSPs), namely, those nonempty parttons that cannot be further dvded (for the subcube under consderaton). MSPs can be vewed as the 1. We use group and partton as synonyms /07/$25.00 ß 2007 IEEE Publshed by the IEEE Computer Socety Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

2 904 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 7, JULY 2007 Fg. 1. Two SQL group-by queres on the sales data set. (a) An SQL group-by. (b) Addng a constrant to (a). TABLE 1 A Sales Data Set, Partally Aggregated Fg. 2. Data cube lattce and subdata cube. (a) Cube(ABCD) decomposton. (b) CubeðBCDÞja 1. Month, Product, SalesMan, and Cty are dmensons. Sale s the measure. basc unts for computng data cubes, snce other parttons are unons of some MSPs. The aggregate for any partton n the data cube can be computed from aggregatng some MSPs. Decomposng an aggregate functon nto the base aggregate functons Count, Max, Mn, and Sum on MSPs, the aggregate functon can be tghtly bounded. By usng MSPs, nonexstng groupngs of tuples wll not nfluence the estmaton of the bounds of aggregates, 2 whch s another reason why tght bounds can be obtaned. Wth a four-dmensonal data cube on the average Sale (Avg(Sale)) of Sales, sx MSPs are shown n Table 1. For each MSP, AvgðSaleÞ ¼SumðSaleÞ=CountðÞ. 3 We now llustrate cases where SumðSaleÞ > 0 for each MSP (the general case s treated later). Both Sum(Sale) and CountðÞ ncrease monotoncally over unons of postve MSPs. The upper bound of Avg(Sale) can be obtaned from the upper bound for Sum(Sale) and the lower bound for CountðÞ, whch s Sum ðsumðsaleþþ=mn ðcountðþþ, where ranges over the sx MSPs. Ths can, n fact, be mproved: The upper bound for Avg(Sale) s actually Max AvgðSaleÞ, where ranges over all MSPs (see Secton 6). For Table 1, Max AvgðSaleÞ s 200=5 ¼ 40, reached at the group (January, Toy, John, Perth). The MSPs can be obtaned by a sngle scan of the raw data. The bound-prune technque descrbed above wll be appled recursvely for all subcubes. We wll optmze the computaton for subcubes by reusng computed results for supercubes. Our bound-prune technque can be appled n the BUC approach. More mportantly, t also fts ncely nto the topdown cubng approach, where multway shared computaton of aggregates mproves cubng effcency. Bound-prune s a general technque that can prune for constrants of the form F ðxþ, F ðxþ, or FðxÞ n ½ 1 ; 2 Š. The aggregate functon F can be an SQL aggregate functon or a commonly seen complex aggregate functon such as Average (Avg) or Varance (Var). As wll be shown later 2. Ths s an ssue for other approaches. 3. For ease of dscusson, we assume that NULL s not allowed for measures and, therefore, CountðSaleÞ ¼CountðÞ. n our experments on real-world and synthetc benchmark data sets, our BP-Cubng algorthms are several tmes faster than exstng prunng technques, ncludng the most recent Dvde-and-Approxmate (DnA) algorthm [15]. Organzatonally, Secton 2 gves some prelmnares. Secton 3 defnes boundablty of aggregate functons. We dscuss how we can derve the bounds for base aggregates, ther functons, and complex aggregates n Sectons 4, 5, and 6, respectvely. Secton 7 then presents the BP-Cubng algorthms. We report expermental evaluatons n Secton 8. We dscuss related works n Secton 9 and conclude n Secton SOME PRELIMINARIES ON DATA CUBES We wll use uppercase letters to denote dmensons and lowercase letters to denote dmenson values. A group-by s a tuple of dmensons of the form (A, B, C), and a partton of (A, B, C) s a tuple of dmenson values of the form (a, b, c). The group-bys n a data cube form a lattce structure called the cube lattce. Fg. 2 shows an example fourdmensonal cube lattce. The empty group-by (whch aggregates all tuples n a data set) s at the bottom of the lattce, and the group-by wth all dmensons s at the top. The edges n the lattce represent subset-superset relatonshps between group-bys. A specal value s used n specfyng parttons, wth the meanng that t can match any value (of the applcable dmenson). For the data set n Table 1, ðjan; ; ; Þ denotes the partton of tuples havng Month ¼ January and havng no restrcton on the other dmensons. For brevty, s usually omtted n namng parttons; for example, ðjan; ; ; Þ s wrtten as (Jan). The subset relatonshp also exsts between parttons from dfferent group-bys. For example, the 3D partton (Jan, TV, Perth) s a subset of each of the 2D parttons (Jan, TV), (Jan, Perth), and (TV, Perth). Gven a data set S of n dmensons A 1 ;...;A n and a measure X, the correspondng data cube s usually referred to as CubeðA 1 ;...;A n Þ. Here, we thnk of the cube as the set of possble group-bys or the set of possble groups for all group-bys. For example, the four-dmensonal cube n Table 1 can be denoted as Cube(Month, Product, SalesMan, Cty). For ease of dscusson, we also refer to the cube as CubeðXÞ, and thnk of t as the measure parttons (the possble bags of measure values for the possble groups). Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

3 ZHANG ET AL.: EFFICIENT COMPUTATION OF ICEBERG CUBES BY BOUNDING AGGREGATE FUNCTIONS 905 Specfcally, let g 1 ;...;g m be all possble groups of the cube. By an abuse of notaton, we wll use X to denote the bag (multset) ft½xšjt 2 g g, and we wll refer to X 1 ;...;X m as the measure parttons. Now, gven an aggregate functon F, we apply F to the measure parttons n the data cube to get F ðcubeðxþþ ¼ ff ðx ÞjX 2 CubeðXÞg. 4 Wth the aggregate functon Sum(Sale), we can wrte Sum(Cube(Sale)) for the four-dmensonal cube n Table 1. Aggregate functons are categorzed nto dstrbutve, algebrac, and holstc functons, dependng on how an aggregate on a partton can be computed from aggregates on ts subparttons [6]. 1) An aggregate functon F s dstrbutve f there s a functon G such that F ðsþ ¼ GðfF ðs Þj ¼ 1;...;ngÞ for a data set S and parttonng fs 1 ;...;S n g of S. 5 For example, Sum and Max are dstrbutve, wth G ¼ F, and Count s dstrbutve, wth G ¼ Sum. 2) An aggregate functon F s algebrac f t can be computed by a functon H wth several arguments, each of whch s obtaned by applyng a dstrbutve aggregate functon. Avg, Var, Standard-Devaton, and MaxN and MnN 6 are algebrac functons; for example, Avg s algebrac, snce AvgðSÞ ¼SumðSÞ=CountðSÞ, and both Sum and Count are dstrbutve. All dstrbutve aggregatons are algebrac. 3) An aggregate functon F s holstc f t s not dstrbutve or algebrac. Rank, Mode, and Medan are examples of holstc functons. We wll call the dstrbutve aggregatons used for computng algebrac functons F (tem 2 above) the auxlary aggregates. Exstng cubng algorthms have made use of the propertes of dstrbutve and algebrac functons to compute supergroups from subgroups [1], [10], [11], [17]. No effcent cubng algorthms for holstc functons have been reported. It s worth notng that n most applcatons, aggregate functons are algebrac functons. In the dscussons below, the term aggregate functons wll refer to dstrbutve and algebrac functons unless specfed otherwse. 3 BOUNDING DATA CUBES The basc dea of boundng s to estmate the upper and lower bounds for a cube from the smallest or the MSPs of the base table. In ths secton, we defne the boundablty concept for aggregate functons. In Sectons 4 and 5, we wll dscuss how we can bound the base aggregate functons Count, Max, Mn, and Sum and how we can bound algebrac functons gven as arthmetc expressons of the base aggregate functons. Defnton 1 (most specfc partton and data cube core). Gven a data set on dmensons A 1 ;...;A n and measure X, all n-dmensonal parttons form the core of the data cube: fða 1 ;...;a n Þja 2 domanða Þ;a 6¼ ; 1 ng: Each partton n the core s an MSP. The multset of measure values for tuples n an MSP g, namely, ft½xšjt 2 gg, san MSP of the measure. All aggregates n a data cube can be computed from ts MSPs. For the Sales data set n Table 1 and Cube(Month, 4. Strctly speakng, FðX Þ s the F aggregate of X for group g. 5. S ¼[ n ¼1 S and S \ S j ¼ for 6¼ j. 6. MaxN and MnN return, respectvely, the N largest and smallest values. Product, SalesMan, Cty), the core conssts of sx (Month, Product, SalesMan, Cty) parttons, namely, (Jan, Toy, John, Perth), (Mar, TV, Peter, Perth), (Mar, TV, John, Perth), (Mar, TV, John, Sydney), (Apr, TV, Peter, Perth), and (Apr, Toy, Peter, Sydney). Any aggregate n ths cube can be computed from a subset of the sx MSPs. We thus can use MSPs to bound the aggregates n a data cube. Defnton 2 (Aggregate functon bound). An upper bound of an aggregate functon F for the data cube on X s a real number U such that for any partton X of CubeðXÞ, t s the case that FðX ÞU. Smlarly, we can defne a lower bound for F ðcubeðxþþ to be a real number L such that for any partton X of CubeðXÞ, F ðx ÞL. Observaton 1. Gven a data cube on measure X and an aggregate functon F, the tghtest upper bound and lower bound are, respectvely, reached by the largest and smallest aggregate values that can be produced by any set of MSPs of the data cube. Example 1. In Table 1, Sum(Sale) for any group n the data cube s not larger than the sum of Sum(Sale) for all sx MSPs, whch s 700. On the other hand, Sum(Sale) for any group n the data cube s not smaller than the mnmal Sum(Sale) among the sx MSPs, whch s 100. Therefore, for Sum(Sale), Cube(Month, Product, SalesMan, Cty) has the tghtest upper bound of 700 and the tghtest lower bound of 100. The noton of MSP and data cube core s also appled to subdata cubes. Fg. 2a shows Cube (ABCD), and ts MSPs are the ABCD parttons. The polygon on the left of Fg. 2a denotes a set of lattce structures, as shown n Fg. 2b. The polygon on the rght of Fg. 2a represents Cube(BCD). The MSPs for Cube(BCD) are BCD parttons, whch are unons of ABCD parttons, wth equal values for B, C, and D. The lattce structure, as shown n Fg. 2b, s called a subdata cube. For the lattce n Fg. 2b, (a 1, B, C, D) parttons, whch are a subset of the ABCD parttons, form ts core. The relatonshp between cores for a cube and ts subcubes, and that for a cube and ts lower-dmensonal counterparts, are mportant for computng bounds from MSPs, wthout ncurrng extra cost and for effectve prunng wth bounds, as shown n our BP-Cubng algorthms, dscussed n Secton 7. In later dscussons, we use the term data cube to refer to a standard or subdata cube. Defnton 3 (Subdata cube). Consder dmensons A 1 ;...;A n and B 1 ;...;B k and values b 1 2 domanðb 1 Þ;...;b k 2 domanðb k Þ: The groups aggregatng tuples that satsfy B ¼ b for 1 k form a subdata cube, or smply a subcube, denoted as CubeðA 1 ;...;A n Þjb 1 ;...;b k. The core of the subdata cube comprses MSPs wth B ¼ b for 1 k. B 1 ;...;B k are condtonal dmensons for the subcube. Example 2. Contnung wth Example 1, consder CubeðProduct; SalesMan; CtyÞjMar: There are three MSPs for the data cube: (Mar, TV, Peter, Perth), (Mar, TV, John, Perth), and Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

4 906 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 7, JULY 2007 (Mar, TV, John, Sydney). Tghter bounds can be obtaned on subcubes. Indeed, the tghtest upper bound for Sum(Sale) for the data cube s 100 þ 100 þ 100 ¼ 300, and the tghtest lower bound s Mnðf100; 100; 100gÞ ¼ 100. Based on Observaton 1, we can see that a data cube can be bounded from ts MSPs. However, exhaustvely checkng the power set of MSPs s equvalent to computng the complete data cube and s not computatonally feasble. We defne boundable aggregate functons as follows. Defnton 4 (Boundable aggregate functon). An aggregate functon F s boundable for a data cube f some upper and lower bounds of F for the data cube can be determned by some algorthm wth a sngle scan of some auxlary aggregate values of the MSPs of the data cube. We wll use F ðcubeðxþþ and F ðcubeðxþþ to denote, respectvely, the upper and lower bounds, computed by a gven sngle-scan algorthm. 7 It should be ponted out that the bounds computed may not be the tghtest. To ensure the effectveness of prunng, we am to derve bounds that are as tght as possble. Let us use some example aggregate functons to explan ths defnton. Example 3. Consder the aggregate functon Count and measure X wth n MSPs X 1 ;...;X n, where CountðXÞ ¼SumðfCountðX Þj ¼ 1...ngÞ: The auxlary aggregate functon s Count. For any partton g of CubeðXÞ, the number of tuples n g s not larger than the total number of tuples of all MSPs; n other words, CountðgÞ SumðfCountðX Þj ¼ 1...ngÞ. On the other hand, we also have CountðgÞ MnðfCountðX Þj ¼ 1...ngÞ: As a result, SumðfCountðX Þj ¼ 1...ngÞ s an upper bound, and MnðfCountðX Þj ¼ 1...ngÞ s a lower bound. They can be obtaned by a sngle scan of the auxlary aggregates of MSPs. Therefore, CountðCubeðXÞÞ s boundable. We now gve an example of aggregate functons whose bounds can be nfnte. Example 4. Gven measure X and aggregate functon 1/Sum, consder CubeðXÞ wth MSPs X 1 ;...;X n. Then, 1=SumðXÞ ¼1=SumðfSumðX Þj ¼ 1...ngÞ. As wll be explaned later n Tables 3 and 4, the sgns of SumðX Þ are mportant for computng the bounds: 1. If SumðX Þ > 0 for all ¼ 1...n, then the upper and lower bounds are computed from, respectvely, the postve lower and upper bounds of SumðCubeðXÞÞ: 1=SumðCubeðXÞÞ ¼ 1=MnðfSumðX Þj ¼ 1...ngÞ 7. We wll wrte FðCubeðXÞÞ and FðCubeðXÞÞ to mean, respectvely, the upper and lower bounds computed by our algorthm. That algorthm s the sum of the boundng formulas presented n ths paper. TABLE 2 The Bounds of Base Aggregate Functons Gven a data set, X s the measure, and X 1 ;...;X n are the MSPs. (a) There s such that SumðX Þ > 0. (b) There s such that SumðX Þ < 0. and 1=SumðCubeðXÞÞ ¼ 1=SumðfSumðX Þj ¼ 1...ngÞ: 2. If SumðX Þ < 0 for all ¼ 1...n, then the upper and lower bounds are computed from, respectvely, the negatve lower and upper bounds of SumðCubeðXÞÞ as 1=SumðCubeðXÞÞ ¼ 1=SumðfSumðX Þj ¼ 1...ngÞ and 1=SumðCubeðXÞÞ ¼ 1=MaxðfSumðX Þj ¼ 1...ngÞ: 3. For general cases, where SumðX Þ can be postve or negatve, the upper bound 1=SumðCubeðXÞÞ s computed from the smallest SumðxÞ 0 for any partton n the cube. For any group g of CubeðXÞ, SumðgÞ can be the sum of any postve or negatve MSP aggregates and can have a mnmum of 0. Thus, the upper bound for 1=SumðCubeðXÞÞ s 1. Smlarly, the lower bound 1=SumðCubeðXÞÞ s computed from the largest SumðxÞ < 0 for any partton n the cube, whch can be a negatve approachng 0. Therefore, the lower bound for 1=SumðCubeðXÞÞ s 1. Computng the upper and lower bounds for arthmetc expressons of base aggregate functons such as 1/Sum s summarzed later n Table 4. 4 BOUNDING BASE AGGREGATE FUNCTIONS We now consder how we can bound the base aggregate functons of SQL. Theorem 1. The base aggregate functons Count, Mn, Max, and Sum are boundable, and ther bounds are lsted n Table 2. We use the followng example for Sum to explan the reasonng for the bounds of Table 2 and how they are obtaned by a sngle scan of MSPs. Example 5. Suppose we are to compute a data cube on measure X, whch may take postve or negatve values. Let the MSPs of X be X 1 ;...;X n. Wth a sngle scan of X 1 ;...;X n, we have SumðX Þð1 nþ and the postve/negatve and maxmal/mnmal values among them: Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

5 ZHANG ET AL.: EFFICIENT COMPUTATION OF ICEBERG CUBES BY BOUNDING AGGREGATE FUNCTIONS 907 TABLE 3 The Sgn Bounds of Base Aggregate Functons TABLE 4 Boundng Arthmetc Expressons X s the measure of a data set, and the MSPs for the data cube are X 1 ;...;X n. 1. If there exsts such that SumðX Þ > 0, then the sum for any group n CubeðXÞ s not greater than the sum of all such postve SumðX Þ. Otherwse, SumðX Þ0 for ¼ 1...n; the sum for any group s not greater than the maxmal SumðX Þ. Thus, SumðCubeðXÞÞ ¼ Sum SumðX Þ>0 SumðX Þ f there exsts SumðX Þ > 0; SumðCubeðXÞÞ ¼ MaxSumðX Þ; otherwse: 2. If there exsts such that SumðX Þ < 0, then the sum of any group n CubeðXÞÞ s not less than the sum of all such SumðX Þ. Otherwse, SumðX Þ0 for all ¼ 1...n; the sum of any group s not less than the mnmal SumðX Þ. Therefore, SumðCubeðXÞÞ ¼ Sum SumðX Þ<0 SumðX Þ; f there exsts SumðX Þ < 0; SumðCubeðXÞÞ ¼ MnSumðX Þ; otherwse: To bound functons that are arthmetc expressons of base aggregate functons, sometmes, the sgn bounds of aggregatons need to be computed (see Secton 5). These nclude postve upper bound ðf þ Þ, postve lower bound ðf þ Þ, negatve upper bound ðf Þ, and negatve lower bound ðf Þ, where F s ether a base aggregate functon Mn, Max, Sum, or Count or an arthmetc expresson of base aggregate functons. F þ CubeðXÞ s defned to be a real l 0 obtaned by a sngle-scan algorthm such that l F ðgþ 0 for any partton g n CubeðXÞ. The other sgn bounds can be defned smlarly. The postve (negatve) bounds are applcable only f there are nonnegatve (negatve) aggregates among the MSPs. The sgn bounds of base aggregate functons are lsted n Table 3. We explan how the sgn bounds for Sum are computed. Computaton for other functons s straghtforward. We dscuss how we can compute the postve bounds of Sum, wth dscussons on the negatve bounds mrror the same The short notatons E 1, E 2, and ðe 1 op E 2 Þ denote E 1 ðcubeðxþþ, E 2 ðcubeðxþþ, and ðe 1 op E 2 ÞðCubeðXÞÞ, respectvely, where X s the measure of a data set. When the operator s or /, to bound ðe 1 op E 2 Þ, not all four knds of computaton are applcable. As ndcated, the sgns of E 1 ðx Þ and E 2 ðx Þð ¼ 1...nÞ decde the applcable computaton. case as n postve bounds. The postve upper bound s the sum of all postve sums of the MSPs. To compute the postve lower bound, two cases exst: 1) f SumðX Þ0 for all X ð1 nþ, then the postve lower bound s the mnmal SumðX Þ0 and 2) f the sums of MSPs nclude both negatve and postve ones, then the sum of any set of MSPs can be as low as 0, and so, 0 s the postve lower bound. 5 BOUNDING ARITHMETIC EXPRESSIONS OF BASE AGGREGATE FUNCTIONS Consder a data cube CubeðXÞ and an arthmetc expresson of auxlary aggregates E ¼ E 1 op E 2, where E 1 and E 2 are base aggregate functons or arthmetc expressons of base aggregates. If the operator s þ or, then the upper (lower) bounds of EðCubeðXÞÞ can be computed from the upper (lower) bounds of E 1 ðcubeðxþþ and E 2 ðcubeðxþþ. If the operator s or /, then we need to use the sgn bounds of E 1 ðcubeðxþþ and E 2 ðcubeðxþþ to bound EðCubeðXÞÞ. Proposton 1. Gven CubeðXÞ on measure X and an arthmetc expresson E ¼ E 1 op E 2, where the operator s þ,,, or /, and E 1 and E 2 are base aggregate functons or arthmetc expressons of base aggregates, EðCubeðXÞÞ and EðCubeðXÞÞ can be computed from the bounds or sgn bounds of E 1 ðcubeðxþþ and E 2 ðcubeðxþþ, as shown n Table 4. Proof. Let X 1 ;...;X n be the MSPs of CubeðXÞ. We consder each of the four possble operators: 1. E 1 þ E 2. By the defnton of upper bound, for any group g 2 CubeðXÞ, E 1 ðgþ E 1 ðcubeðxþþ, and E 2 ðgþ E 2 ðcubeðxþþ. We have E 1 ðgþþe 2 ðgþ E 1 ðcubeðxþþ þ E 2 ðcubeðxþþ: Thus, ðe 1 þ E 2 ÞðCubeðXÞÞ ¼ E 1 ðcubeðxþþ þ E 2 ðcubeðxþþ: The proof s smlar for ðe 1 þ E 2 ÞðCubeðXÞÞ ¼ E 1 ðcubeðxþþ þ E 2 ðcubeðxþþ: Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

6 908 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 7, JULY E 1 E 2. Gven g 2 CubeðXÞ, E 1 ðgþ E 1 ðcubeðxþþ and E 2 ðgþ E 2 ðcubeðxþþ. Thus, E 1 ðgþ E 2 ðgþ E 1 ðcubeðxþþ E 2 ðcubeðxþþ and ðe 1 E 2 ÞðCubeðXÞÞ ¼ E 1 ðcubeðxþþ E 2 ðcubeðxþþ: The proof s smlar for ðe 1 E 2 ÞðCubeðXÞÞ ¼ E 1 ðcubeðxþþ E 2 ðcubeðxþþ: 3. E 1 E 2. Let g be a group of CubeðXÞ. Let Y 1 ;Y 2 ;...;Y k be the MSPs that are contaned n g. Based on ther sgns under E 1, we form two subsets from these MSPs: PE 1 ¼fY j je 1 ðy j Þ0; 1 j kg; NE 1 ¼fY j je 1 ðy j Þ < 0; 1 j kg: Smlarly, based on ther sgns under E 2, we form the followng subsets from these MSPs: PE 2 ¼fY j je 2 ðy j Þ0; 1 j kg; and NE 2 ¼fY j je 2 ðy j Þ < 0; 1 j kg: For the upper bound, we frst consder the case where PE 1 6¼; and PE 2 6¼; or NE 1 6¼; and NE 2 6¼;. E 1 ðgþe 2 ðgþ can have postve values. Thus, E 1 ðgþe 2 ðgþ Max E1 þðcubeðxþþ Eþ 2 ðcubeðxþþ; E 1 ðcubeðxþþ E 2 ðcubeðxþþ : Otherwse, E 1 ðgþe 2 ðgþ can only be negatve. We have E 1 ðgþe 2 ðgþ Max E1 þ ðcubeðxþþ E 2 ðcubeðxþþ; E 1 ðcubeðxþþ Eþ 2 ðcubeðxþþ : For all cases, by combnng the two nequaltes wth E 1 ðgþe 2 ðgþ on the left-hand sde, we get ðe 1 E 2 ÞðCubeðXÞÞ: Max E1 þðcubeðxþþ Eþ 2 ðcubeðxþþ; E 1 ðcubeðxþþ E 2 ðcubeðxþþ; E þ 1 ðcubeðxþþ E 2 ðcubeðxþþ; E 1 ðcubeðxþþ Eþ 2 ðcubeðxþþ : The proof for the lower bound s qute smlar to that for the upper bound. The possble postve values of g are Mn E1 þ ðcubeðxþþ Eþ 2 ðcubeðxþþ; E1 ðcubeðxþþ E 2 ðcubeðxþþ and the possble negatve values of g are Mn E1 þðcubeðxþþ E 2 ðcubeðxþþ; E 1 ðcubeðxþþ Eþ 2 ðcubeðxþþ : By combnng the last two nequaltes, we get ðe 1 E 2 ÞðCubeðXÞÞ: Mn E1 þ ðcubeðxþþ Eþ 2 ðcubeðxþþ; E 1 ðcubeðxþþ E 2 ðcubeðxþþ; E þ 1 ðcubeðxþþ E 2 ðcubeðxþþ; E 1 ðcubeðxþþ Eþ 2 ðcubeðxþþ : 4. E 1 =E 2. The proof s smlar to that for the operator and s omtted. tu Example 6. Consder a data cube CubeðXÞ, where the measure X can be postve or negatve, and the aggregate functon Avg. Suppose the MSPs are X 1 ;...;X n. For any MSP X, we note that AvgðX Þ¼SumðX Þ=CountðX Þ, and CountðX Þ s always postve. Followng Table 4, AvgðCubeðXÞÞ s Max Sum þ ðcubeðxþþ=count þ ðcubeðxþþ; Sum ðcubeðxþþ=count þ ðcubeðxþþ and AvgðCubeðXÞÞ s Mn Sum þ ðcubeðxþþ=count þ ðcubeðxþþ; Sum ðcubeðxþþ=count þ ðcubeðxþþ : If there exsts some such that SumðX Þ > 0, then AvgðCubeðXÞÞ ¼ Sum þ ðcubeðxþþ=countðcubeðxþþ; and AvgðCubeðXÞÞ ¼ Sum þ ðcubeðxþþ=countðcubeðxþþ: Otherwse, SumðX Þ0 for all ¼ 1...n; then, we have AvgðCubeðXÞÞ ¼ Sum ðcubeðxþþ=countðcubeðxþþ; and AvgðCubeðXÞÞ ¼ Sum ðcubeðxþþ=countðcubeðxþþ: We then follow Table 3 to compute the sgn bounds n the above expressons. The sgn bounds for base aggregate functons are lsted n Table 3. We now dscuss how we can derve the sgn bounds for arthmetc expressons. Note that n Table 5, postve (negatve) bounds are applcable only f there exsts an MSP X n the cube such that ðe 1 ðx Þ op E 2 ðx ÞÞ 0 ððe 1 ðx Þ op E 2 ðx ÞÞ < 0Þ: Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

7 ZHANG ET AL.: EFFICIENT COMPUTATION OF ICEBERG CUBES BY BOUNDING AGGREGATE FUNCTIONS 909 TABLE 5 The Sgn Bounds of Arthmetc Expressons Short notatons E 1, E 2, and ðe 1 op E 2 Þ denote E 1 ðcubeðxþþ, E 2 ðcubeðxþþ, and ðe 1 op E 2 ÞðCubeðXÞÞ, respectvely, where X s the measure of a data set. For example, the postve upper bound ðe 1 þ E 2 ÞðCubeðXÞÞ þ s applcable only f there exsts X such that E1ðX ÞþE 2 ðx Þ0: For the operators þ and, when E 1 þ E 2 or E 1 E 2 can be postve or negatve for MSPs, the postve lower and negatve upper bounds for both expressons are 0: ths s because the combnaton of postve and negatve MSPs can produce the aggregate value of 0. For the operators and /, the sgns of E 1 and E 2 for MSPs decde the applcable computaton. We use an example to explan Table 5. Example 7. Gven data cube CubeðXÞ wth MSPs X 1 ;...;X n, consder the aggregate functon EðXÞ ¼1=ðMaxðXÞþMnðXÞÞ: To compute EðCubeðXÞÞ, three cases can arse: Case 1. MnðX Þ0 for all ¼ 1...n. It follows that MaxðX Þ0 for all ¼ 1...n. Then, EðCubeðXÞÞ ¼ 1= Max þ ðcubeðxþþ þ Mn þ ðcubeðxþþ : Case 2. MaxðX Þ0and MnðX Þ0for all ¼ 1...n. Then, EðCubeðXÞÞ ¼ 1= Max ðcubeðxþþ þ Mn ðcubeðxþþ : Case 3. MaxðX Þ and MnðX Þ can be postve or negatve ð1 nþ. Based on Table 4, we have EðCubeðXÞÞ ¼ 1=ðMaxðÞ þ MnðÞÞ þ ðcubeðxþþ. Based on Table 5, ðmaxðþ þ MnðÞÞ þ ðcubeðxþþ s 0, whch means that the upper bound for 1=ðMaxðÞ þ MnðÞÞðCubeðXÞÞ s 1. Proposton 1 leads to the followng theorem about boundng arthmetc expressons of base aggregate functons. Theorem 2. Gven a data cube on measure X, an arthmetc expresson E ¼ E 1 op E 2 of base aggregate functons, where the operator s þ,,, or/.e s boundable for CubeðXÞ, and ts bounds can be computed followng Tables 2, 3, 4, and 5. Proof. By Proposton 1, we can compute the bounds for an arthmetc expresson from the bounds or sgn bounds of the operand expressons, as lsted n Table 4. Table 5 shows how the sgn bounds of any expresson can be computed from the bounds for base aggregate functons. Tables 2 and 3 demonstrate that a sngle scan of the MSPs of a data cube can compute the bounds of all base aggregate functons. As a result, all arthmetc expressons of base aggregate functons are boundable, and the bounds can be computed followng Tables 2, 3, 4, and 5. tu We now show how we can compute the upper bound for the complex aggregate functon Var based on the theorem. Example 8. The Var for X ¼fx j ¼ 1...Ng s defned as 8 P ðx XÞ 2 VarðXÞ ¼ N, where X denotes the Avg of X. It can be rewrtten as follows: VarðXÞ ¼ ¼ ¼ ¼ P ðx2 2 X x þ X 2 Þ ; N P P x2 x N P x2 2 X N N 2 X2 þ X 2 ¼ Sum x 2 CountðXÞ SumðXÞ 2 : CountðXÞ þ N X2 N ; P x2 N X2 ; Let QSumðXÞ denote Sum x 2. It follows that VarðXÞ s an arthmetc expresson of QSumðXÞ, SumðXÞ, and CountðXÞ. Consder a data cube CubeðXÞ wth MSPs X 1 ;...;X n. Based on Table 4, we have QSumðÞ VarðCubeðXÞÞ ¼ ðcubeðxþþ CountðÞ SumðÞ 2ðCubeðXÞÞ: CountðÞ We next consder each of the two subexpressons separately. We frst consder ð QSumðÞ CountðÞ ÞðCubeðXÞÞ: For any MSP X, QSumðX Þ and CountðX Þ are always postve. By followng Table 4, we have the followng fractonal expresson, where the numerator and denomnator expressons can be computed followng Table 2: QSumðÞ CountðÞ ðcubeðxþþ ¼ QSumðCubeðXÞÞ=CountðCubeðXÞÞ. We then consder ð SumðÞ CountðÞ Þ2 ðcubeðxþþ. Based on Table 4, we have SumðÞ 2 Mn ðsum=countþ þ ðsum=countþ þ ; CountðÞ ðsum=countþ ðsum=countþ : Count s always postve. Byfollowng Table 5 for 2ðCubeðXÞÞ: computng sgn bounds, we get SumðÞ CountðÞ 8. There s another defnton of Varance: VarðXÞ ¼ dscussons here also apply to ths defnton. P ðx XÞ2 N 1. The Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

8 910 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 7, JULY ; Mn Sum þ ðcubeðxþþ=countðcubeðxþþ 2 Sum ðcubeðxþþ=countðcubeðxþþ : In the above equaton, CountðCubeðXÞÞ ¼ Sum CountðX Þ: Moreover, based on Table 3, Sum þ ðcubeðxþþ ¼ Mn SumðX Þ f SumðX Þ0 for all ¼ 1...n; otherwse, t s 0. Smlarly, Sum ðcubeðxþþ ¼ Max SumðX Þ f SumðX Þ < 0 for all ¼ 1...n; otherwse, t s 0. In summary, we compute VarðCubeðXÞÞ as follows: f SumðX Þ0 for ¼ 1::n; Sum SumðX 2Þ=Mn CountðX Þ 2 ; Mn SumðX Þ= Sum CountðX Þ otherwse; f SumðX Þ < 0 for ¼ 1::n; Sum SumðX 2Þ=Mn CountðX Þ 2 ; Max otherwse; Sum SumðX Þ= Sum CountðX Þ SumðX 2Þ=Mn CountðX Þ: Wth these equatons, obvously tghter VarðCubeðXÞÞ s obtaned f the sums of MSPs are of the same sgn. 6 OPTIMIZATION FOR BOUNDING COMPLEX FUNCTIONS When rewrtng several commonly used algebrac aggregaton functons, namely, Avg, Var, and Standard-Devaton, nto arthmetc expressons of dstrbutve aggregate functons, we notce that very often, they use the followng subexpresson: F ðxþ ¼Sum G 1 ðx Þ= Sum G 2 ðx Þ; where G 1 and G 2 are some dstrbutve aggregate functons, and X 1 ;...;X n are MSPs. (We can rewrte AvgðXÞ as SumðXÞ=CountðXÞ and VarðXÞ as QSumðXÞ=CountðXÞ ðsumðxþ=countðxþþ 2 : Moreover, we have SumðXÞ=CountðXÞ ¼Sum SumðX Þ=Sum CountðX Þ; whch s F, wth G 1 ¼ Sum and G 2 ¼ Count.) Therefore, t s desrable to gve tghter bounds for F than those obtaned by Table 4. The next result provdes not only tght bounds but also the optmal. Theorem 3. Let CubeðXÞ be a data cube over measure X wth MSPs X 1 ;...;X n, let G 1 and G 2 be aggregate functons, and let F be as defned above. If G 2 ðx Þ > 0 for all ¼ 1...n, then we can bound F usng FðCubeðXÞÞ ¼ Max FðX Þ; F ðcubeðxþþ ¼ Mn F ðx Þ: Moreover, these bounds are the optmal bounds for F ðcubeðxþþ. Proof. Wthout loss of generalty, we can assume that Mn FðX Þ¼FðX 1 Þ, and Max F ðx Þ¼FðX n Þ. Gven a group g of CubeðXÞ, let X g1 ;...;X gk be the MSPs contaned n g. Thus, g ¼ X g1 [...[ X gk. Then, F ðgþ ¼ Sum j G 1 ðx gj Þ=Sum j G 2 ðx gj Þ; ¼ G 1ðX g1 Þþ...þ G 1 ðx gk Þ ; G 2 ðx g1 ÞþþG 2 ðx gk Þ ¼ F ðx g 1 ÞG 2 ðx g1 Þþ...þ F ðx gk ÞG 2 ðx gk Þ : G 2 ðx g1 Þþ...þ G 2 ðx gk Þ The dvson n the last step s permtted, snce G 2 ðx g Þ > 0 for all. Therefore, F ðgþ FðX n Þ ¼ F ðx g 1 ÞG 2 ðx g1 Þþ...þ F ðx gk ÞG 2 ðx gk Þ G 2 ðx g1 Þþ...þ G 2 ðx gk Þ F ðx n Þ G 2ðX g1 Þþ...þ G 2 ðx gk Þ G 2 ðx g1 Þþ...þ G 2 ðx gk Þ ¼ ðfðxg 1 Þ FðXnÞÞG 2 ðxg 1 Þþ...þðF ðxg k Þ F ðxnþþg 2 ðxg k Þ G 2 ðxg 1 Þþ...þG 2 ðxg k Þ : Snce F ðx gj ÞF ðx n Þ and G 2 ðx gj Þ > 0 for all j ¼ 1...k, we have G 2 ðx g1 Þþ...þ G 2 ðx gk Þ > 0 and ðf ðx gj Þ F ðx n ÞÞ G 2 ðx gj Þ0 for all j. Therefore, FðgÞ F ðx n Þ0 or, equvalently, F ðgþ F ðx n Þ. Smlarly, t can be shown that F ðgþ F ðx 1 Þ. Thus, we have proven that F ðx n Þ and F ðx 1 Þ are the upper and lower bounds for F ðcubeðxþþ, respectvely. As Mn F ðx Þ s the smallest aggregate for an MSP and thus s a real aggregate n F ðcubeðxþþ, all other lower bounds for F ðcubeðxþþ must be greater than Mn F ðx Þ. Mn F ðx Þ s thus the tghtest lower bound for F ðcubeðxþþ. Smlarly, we can prove that Max FðX Þ s the tghtest upper bound for F ðcubeðxþþ. Next, we show that the bounds obtaned by followng Tables 2 and 4 are weaker. Consderng Max FðX Þ (and stll assumng that Max F ðx Þ¼FðX n Þ), the case for Mn F ðx Þ s smlar. By followng Tables 2 and 4, F ðcubeðxþþ s computed: Case 1. There exsts j such that G 1 ðx j Þ > 0. F ðcubeðxþþ s Sum G 1 ðx Þ=Sum G 2 ðx Þ¼ Sum G 1 ðx Þ= Mn G 2 ðx Þ: G 1 ðx Þ0 Snce G 2 ðx Þ > 0 for all ¼ 1...n, and G 1 ðx j Þ > 0, F ðx j Þ > 0. Snce Max FðX Þ¼FðX n Þ, t follows that G 1 ðx n Þ > 0. Snce Sum G 1 ðx ÞG 1 ðx n Þ and G 1 ðx Þ0 Mn G 2 ðx ÞG 2 ðx n Þ, we have Sum G 1ðX Þ0 G 1 ðx Þ= Mn G 2 ðx Þ G 1ðX n Þ G 2 ðx n Þ ¼ F ðx nþ: Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

9 ZHANG ET AL.: EFFICIENT COMPUTATION OF ICEBERG CUBES BY BOUNDING AGGREGATE FUNCTIONS 911 TABLE 6 Boundng Complex Aggregate Functons Fg. 3. A G-tree for the data set n Table 1. CubeðXÞ s a data cube on measure X wth MSPs fx 1 ;...;X n g. Hence, Max F ðx Þ¼F ðx n Þ s a tghter bound. Case 2. For all ð1 nþ, G 1 ðx Þ0. FðCubeðXÞÞ s Sum As G 1 ðx Þ=Sum G 2 ðx Þ¼Max G 1 ðx Þ= Max G 2 ðx Þ: 0 Max G 1 ðx ÞG 1 ðx n Þ and 0 <G 2 ðx n ÞMax G 2 ðx Þ, 0 Max G 1 ðx Þ= Max G 2 ðx ÞG 1 ðx n Þ=G 2 ðx n Þ¼FðX n Þ: Hence, Max F ðx Þ¼FðX n Þ s a tghter bound. tu By followng Theorem 3, tghter bounds for Avg and Var can be computed, as shown n Table 6. In Secton 7, we present an effcent mplementaton of the BP-Cubng algorthm, utlzng the boundng results dscussed so far. 7 BOUND-PRUNE CUBING ALGORITHMS We frst present the group tree (G-tree) data structure that we wll use for ceberg cubng. We then explan how bound prune cubng (BP-Cubng) s mplemented on the G-tree. Fnally, we brefly dscuss how our algorthms utlze antmonotone constrants. The ceberg cube wth the constrant Avg(Sale) n [15, 20] on the Sales data set n Table 1 s used as a runnng example throughout ths secton. 7.1 The G-Tree The underlyng data structure for BP-Cubng s the G-tree. A G-tree s the compresson of a gven nput data set and s constructed by one scan of the data set. It s used for both the top-down and bottom-up bound prune cubng algorthms (BP-Cubng(TD) and BP-Cubng(BU), respectvely), although there are some mnor dfferences dependng on the traversal strategy. The G-tree for the data set n Table 1 s shown n Fg. 3. A G-tree for an n-dmensonal data set s of depth n, where each level represents a dmenson. A path startng from the root collapses the tuples wth common dmenson values along the path. Each tree node keeps the auxlary aggregates necessary to compute the ceberg cube. In our example, the auxlary aggregates n a node are Sum(Sale) and CountðÞ. For the leftmost path from the root of the G-tree n Fg. 3, the node (March) shows that there are 70 tuples wth SumðSaleÞ ¼300 n the ðmarch; ; ; Þ partton, whereas the node (Peter) shows that there are 40 tuples wth SumðSaleÞ ¼100 n the ðmarch; TV; Peter; Þ partton. For a gven data set, dfferent dmenson orders result n dfferent G-trees. Dependng on whether the G-tree s traversed top-down or bottom-up n computng cubes, the tree should be constructed n dfferent orders of dmensons. Antmonotone prunng s effectve when the most dscrmnatng dmenson s examned frst. Ths suggests the cardnalty-descendng order of dmensons durng cube computaton. As a result, for bottom-up traversal, the cardnalty-ascendng order should be used n constructng the G-tree. In contrast, for top-down traversal, the cardnalty-descendng order should be used n constructng the G-tree. Our experments have confrmed that such heurstcs ndeed have a postve mpact on the effectveness of prunng and the effcency of cubng algorthms. 7.2 Top-Down Bound-Prune Cubng on G-Trees An observaton of Fg. 3 reveals some useful facts: The path from the root to a node of the G-tree aggregates a group, and each level of the G-tree computes a group-by of the cube lattce. For the G-tree n Fg. 3, the root node has the aggregate for the group ð; ; ; Þ, wth CountðÞ ¼ 88, and SumðSaleÞ ¼700. The nodes at level 1 compute the aggregates for groups n ðmonth; ; ; Þ, whch are ðmarch; ; ; ; 70; 300Þ, ðjanuary; ; ; ; 5; 200Þ, and ðaprl; ; ; ; 13; 200Þ. The nodes at the next three levels compute the aggregates for the group-bys (Month, Product), (Month, Product, SalesMan), and (Month, Product, SalesMan, Cty), respectvely. The leaf nodes gve the MSPs for Cube(Month, Product, SalesMan, Cty). We summarze these n the followng observatons. Observaton 2. In a G-tree, the aggregates n each node are the aggregates for the group wth dmenson values on the path from the root to the node. Observaton 3. In a G-tree, for n dmensons A 1 ;...;A n, the leaf nodes are the MSPs for CubeðA 1 ;...;A n Þ. Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

10 912 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 7, JULY 2007 Fg. 4. Top-down bound prune cubng of CubeðABCDÞ: For each node, we show a G-tree (before : ), the group-bys (after : ) computed n the G-tree, and the shared dmensons (after the / ). The numbers show the order n whch trees are bult. G-trees are constructed to compute all group-bys n a data cube. When constructng a G-tree for an n-dmensonal data set, we smultaneously compute n group-bys, namely, the group-bys whose dmensons are prefxes of the lst of dmensons ordered by the levels of the tree. To compute the other group-bys n the cube, we collapse one dmenson from a gven G-tree at a tme to construct a sub-g-tree 9 and compute the correspondng group-bys of the sub-g-tree. For example, by collapsng dmenson Product n the orgnal G-tree, gven n Fg. 3, we get the subtree fmonth; ð ProductÞ; SalesMan; Cty; Customerg, shown n Fg. 5. Fg. 4 shows the (sub)g-trees constructed for computng the data cube on dmensons A, B, C, and D. Each node n Fg. 4 represents a G-tree and the set of all correspondng group-bys. The ABCD tree at the top s constructed by one scan of the data set, and the correspondng group-bys (A, B, C, D), whch are the (A, B, C), (A, B), (A), and () trees are computed durng ths scan. The sub-g-trees of the ABCD tree, whch are the ð AÞBCD, Að BÞCD, and ABð CÞD trees, are formed by collapsng dmensons A, B, and C, respectvely. The CD tree s a subtree of the BCD tree and recursvely s also a subtree of the ABCD tree. The dmensons after / n the nodes denote common prefx dmensons for the tree at the node and all of ts subtrees. A s the common prefx dmenson for the ACD tree and ts subtrees. All group-bys that are computed on the ACD tree and ts subtrees form subcubes for A values CubeðCDÞja. The leaf nodes orgnatng from a are the MSPs for CubeðCDÞja. In the sub-g-tree constructon process, we also compute the bounds and use them for prunng, as descrbed n the observaton below. Observaton 4. Wth top-down aggregaton, gven a G-tree G wth n dmensons A 1 ;...;A n and a subtree G k, by collapsng a dmenson A k, 1 < k < n, A 1 ;...;A k 1 are the common prefx dmensons for G k and all ts subtrees. Each node ða 1 ;...;a k 1 Þ of G gves the prefx dmenson values for CubeðA kþ1 ;...;A n Þja 1 ;...;a k 1. The core of the cube conssts of the MSPs correspondng to the leaf nodes of the branches orgnatng from the node ða 1 ;...;a k 1 Þ. If the bounds of CubeðA kþ1 ;...;A n Þja 1 ;...;a k 1 fal the gven constrant, then those branches can be pruned. Example 9. Consder the G-tree G n Fg. 3 as the orgnal tree and the G-tree G p n Fg. 5 as the subtree. Month s the prefx dmenson for the group-bys on G p and the subtree of G p. The subcubes are 9. It should be ponted out that a sub-g-tree s not part of an orgnal G-tree but s obtaned by collapsng a gven dmenson of the orgnal G-tree. Fg. 5. The subtree on fmonth; ð ProductÞ; SalesMan; Ctyg obtaned by collapsng dmenson Product n the orgnal G-tree, gven n Fg CubeðSalesMan; CtyÞjMarch, 2. CubeðSalesMan; CtyÞjJanuary, and 3. CubeðSalesMan; CtyÞjAprl. In G, the leaf nodes of the March subtree are the MSPs of CubeðSalesMan; CtyÞjMarch. Wth the constrant Avg(Sale) n [15, 20], G s pruned n constructng the subtree G p. By followng Table 6, the bounds for CubeðSalesMan; CtyÞjMarch are computed from the three leaf nodes orgnatng from node (March) of G: AvgðCubeðSaleÞjMarÞ ¼Maxðf100=40; 100=20; 100=10gÞ ¼ 10: AvgðCubeðSaleÞjMarÞ ¼Mnðf100=40; 100=20; 100=10gÞ ¼ 2:5: As [2.5, 10] volates Avg(Sale) n [15, 20], all three branches orgnatng from (March) are removed from further computaton. Smlarly, Avg(Sale) of CubeðSalesman; CtyÞjJanuary s bounded as [40, 40], whch volates the constrant, and s pruned. Avg(Sale) of CubeðSalesman; CtyÞjAprl s bounded as [12.5, 25], whch does not volate the constrant, and the branches from (Aprl) are kept n G p. The nodes n dashed lnes n Fg. 5 hghlght the fact that the branches from (March) and (January) are pruned before the G p tree s generated, and mportantly, they are permanently pruned from all future recursve computaton. 7.3 The Top-Down Bound-Prune Cubng Algorthm In ths secton, we present the Top-Down Bound-Prune Cubng (BP-Cubng(TD)) algorthm, shown n Algorthm 1, for computng ceberg data cubes. It performs boundprunng n the top-down multway aggregaton manner, and t uses the G-trees. Algorthm 1. The Top-Down Bound-Prune Cubng Algorthm Input: A data set D over dmensons A 1 ;...;A n and aggregaton constrant C, assumed global. Output: An n-dmensonal ceberg cube on D satsfyng C. (1) Buld the G-tree TðA 1 ;...;A n Þ from D; (2) Output aggregates satsfyng C, computed when T was bult; (3) for ¼ 1...ðn 1Þ, do (4) BP-CubngðTðA 1 ;...;A n Þ;A Þ; Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

11 ZHANG ET AL.: EFFICIENT COMPUTATION OF ICEBERG CUBES BY BOUNDING AGGREGATE FUNCTIONS 913 Fg. 6. The G-tree for the data set n Table 1, wth header table and sde lnks. // B, the th dmenson of T s the dmenson to be collapsed Procedure BP-Cubng ðtðb 1 ;...;B k Þ;B Þ (5) T s nl; //construct subtree T s by collapsng B. (6) for each node n 1 of dmenson B 1 n T, do (7) V 1 dmenson values on the path from root to n 1 ; // the MSPs for CubeðB þ1 ;...;B k ÞjV 1 (8) Let M be the set of leaves orgnatng from n 1 ; (9) Compute the bounds for CubeðB þ1 ;...;B k ÞjV 1 from M; // prune the paths f the bounds volate C (10) f the bounds do not volate C then (11) Collapse B and add the paths passng n 1 of T to T s ; (12) Output aggregates on T s that satsfy C; (13) for j ¼ðþ1Þ...ðk 1Þ do (14) BP-CubngðT s ðb 1 ;...;B 1 ;B þ1 ;...;B k Þ;B j Þ Let D be a data set wth n dmensons A 1 ;...;A n and C be the ceberg constrant. To compute the ceberg cube, frst, a G-tree T s bult by one scan of D (lne 1). As mentoned n Observaton 2, we smultaneously aggregate n group-bys, namely, ða 1 Þ, ða 1 ;A 2 Þ;...; ða 1 ;A 2 ;...;A n Þ, when buldng the G-tree. Those groups that satsfy C are output (lne 2). The Bound-Prune Cubng (BP-Cubng) procedure computes the ceberg cube by recursvely collapsng dmensons and buldng sub-g-trees. Gven a G-tree, before a subtree s bult, bounds for branches are computed, and those branches whose bounds fal the constrant are pruned from further computaton (lnes 9-11). The leaf nodes of a G-tree are the MSPs for the G-tree, and these MSPs are used to compute the correspondng aggregate bounds. T s s a subtree of T, obtaned by collapsng dmenson B ; the collapsng computaton s descrbed n lnes Aggregates are computed durng the constructon of T s. Then, BP- Cubng s recursvely called (lnes 13-14), where group-bys of dmensons after B are computed. 7.4 Bottom-Up Bound-Prune Cubng on G-Trees The BP-Cubng(BU) algorthm s smlar to Algorthm 1. Wth the bottom-up cubng strategy, a group s computed before ts subgroups. There are several dfferences concernng 1) the G-tree used, 2) how sub-g-trees are constructed, and 3) how boundng wth MSPs are done. We descrbe these below. Fg. 7. The subtree of the G-tree n Fg. 6, wth Cty ¼ Sydney. We frst consder the G-tree used. In BUC, a G-tree wll also contan a header table and sde lnks. A header table entry represents a one-dmensonal group together wth the correspondng aggregates, and t s lnked to nodes wth the correspondng dmenson value. Fg. 6 shows such a G-tree for the sample Sales data set. The sde-lnks wll be used to effcently derve aggregates of other groups n subsequent computaton. We now consder how sub-g-trees are constructed. In the frst G-tree for a data set, the header table contans onedmensonal groups. In the sub-g-trees constructed from a gven G-tree, the header table contans groups whose dmensonalty s one level hgher than that of groups for the gven G-tree. For the G-tree n Fg. 6, we buld sub-gtrees for the 2D subgroups of the (Cty) groups, namely, the groups of (Salesman, Cty), (Product, Cty), and (Month, Cty). Smlarly, we buld sub-g-trees for the 3D subgroups of the (Salesman, Cty) groups, and so on. The sub-g-trees are obtaned by mergng paths n the orgnal G-tree. For example, to compute the 2D subgroups of Sydney, the subtree n Fg. 7 s constructed by mergng the paths from the root to the nodes on the Sydney sde lnk. In ths process, the aggregates for the nodes above the (Sydney) nodes need to be modfed to remove the contrbuton of the non-sydney ctes. For example, the node (John) has the aggregates of (30, 200) for both Perth and Sydney but only has the aggregates of (10, 100) for Sydney. We now turn to boundng wth MSPs. Recall that, wth the top-down aggregaton strategy, all MSPs for a (sub) data cube are convenently avalable for boundng (by convenently avalable, we mean that they are avalable wthout extra overhead). In the bottom-up aggregaton, the convenently avalable MSPs are those ponted to by sde lnks of certan header table entres. The observaton below descrbes how bounds are computed, and prunng s acheved. Observaton 5. For a G-tree of dmensons A 1 ;...;A n, the leaf nodes are the MSPs for CubeðA 1 ;...;A n Þ. Wth the BUC strategy, for a k 2 domanða k Þð1 k nþ, the subgroups of ða k Þ to be computed from the branches on the sde lnk of a k comprse CubeðA 1 ;...;A k 1 Þja k, and MSPs of the cube are the leaf nodes on the sde lnk for a k (avalable for boundng). The subtrees from the sde lnk of a k can be pruned f the bounds of CubeðA 1 ;...;A k 1 Þja k fal the gven constrant. Example 10. Wth CubeðMonth; Product; SalesmanÞjSydney, shown n Fg. 6, ts MSPs are the leaf nodes (Sydney, 10, 100) and (Sydney, 5, 100) on the sde lnk for Sydney. Thus, Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

12 914 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 7, JULY 2007 TABLE 7 Census, TPC-R, and Weather Multdmensonal Data Sets Fg. 8. Bottom-up bound prune cubng (BP-Cubng) on four dmensons A, B, C, and D CubeðMonth; Product; SalesmanÞjSydney ¼ 100=5 ¼ 20; CubeðMonth; Product; SalesmanÞjSydney ¼ 100=10 ¼ 10: As [10, 20] does not volate the constrant Avg(Sale) n [15, 20], the subtree for Sydney (shown n Fg. 7) s thus created. In Fg. 8, we use an example to gve some hgh-level vew of the BUC process for a four-dmensonal data cube. Each node represents the group that s computed on some G-tree. From top to bottom, nodes are lnked by the subtree relatonshp. The superscrpts over the nodes ndcate the order n whch the G-trees are created. Wth a tree, the dmensons after / are the shared dmensons for the tree and all ts subtrees. They are also the condtonal dmensons where the tree s created. Generally, the BP-Cubng(BU) algorthm s gven n Algorthm 2. Algorthm 2. The Bottom-Up Bound-Prune Cubng Algorthm Input: A data set D over dmensons A 1 ;...;A n and aggregaton constrant C. Output: The n-dmensonal ceberg cube on D, satsfyng C. (1) Buld the G-tree TðA 1 ;...;A n Þ from D; (2) Output all aggregates n TðA 1 ;...;A n Þ:Header; (3) foreach a 2 T:Header, do (4) BP-CubngðTðA 1 ;...;A n Þ; fagþ; Procedure BP-Cubng ðtðb 1 ;...;B k Þ;SÞ // S s a set of dmenson values as the condton for subcubes. (5) T s ¼ nl; (6) Suppose a s on the th dmenson of T; (7) Let M be the nodes followng the sde lnk of a; (8) Compute the bounds for CubeðB 1 ;...;B 1 ÞjS from the leaves orgnatng from M; (9) f the bounds do not volate C, then // prunng // Secton 7.4 (10) Construct the subtree T s ðb 1 ;...;B 1 Þ; (11) Output aggregates on T s :Header that satsfy C; (12) foreach a s 2 T s :Header do (13) BP-CubngðT s ðb 1 ;...;B 1 Þ;S[fa s gþ; 7.5 Interacton wth Antmonotone Constrants Our dscussons have focused on complex nonantmonotone aggregaton constrants. Antmonotone constrants can be easly ncorporated n BP-Cubng as follows: Suppose an antmonotone constrant C 0 s present, n addton to the nonantmonotone constrant C. Frst, n lnes 2 and 12 of Algorthm 1, a group g s checked aganst both C and C 0 before beng output. More mportantly, before the bounds are computed at lne 9, the branches of T orgnatng from n are pruned from further computaton f n fals C 0, as all subgroups of n wll also fal C 0. Wth top-down cubng, the prefx groups before the collapsng dmenson that fal C 0 are pruned. Wth bottom-up cubng, header-table entres that fal C 0 are pruned from further computaton. 8 EXPERIMENTS In ths secton, we evaluate the performance of both the BP- Cubng(TD) and BP-Cubng(BU) algorthms wth experments. We compare the performance of these algorthms wth that of the DnA algorthm [14], [15] on non-antmonotone aggregaton constrants and also wth that of the BUC algorthm [4] on antmonotone constrants. DnA s recent work on prunng for nonantmonotone aggregaton constrants. As DnA s not desgned for prunng for antmonotone constrants, extra processng s needed to prune for such constrants. On the other hand, BUC uses the same partton-based bottom-up aggregaton strategy as DnA and s desgned for antmonotone constrants. Thus, BP-Cubng s compared wth BUC on prunng wth antmonotone constrants. To do far comparson, all algorthms were mplemented wth all possble optmzaton technques. BUC was mplemented wth the collapsng duplcates optmzaton [4]. We added collapsng duplcates and ndexng tuples to DnA, even though such optmzatons were not reported n the orgnal papers [14], [15]. Block memory allocaton s used n the BP-cubng algorthms to reduce the number of calls of the dynamc memory allocaton functons. It s assumed that, for all algorthms, data structures used can ft nto memory. All experments were performed on a PC wth an 686 processor runnng GNU/Lnux. As outputtng the groups can take a sgnfcant amount of tme for bg data cubes, we choose to exclude the tme for output n the tmng for the algorthms. The thresholds for constrants are selected n a way such that there are at least 10 groups n the output. We expermented wth constrants nvolvng the aggregate functons Count, Avg, Var, and Sum. 8.1 Data Sets We used real-world, as well as artfcal, data sets n our experments. When selectng the data sets, we consdered the followng data characterstcs: dense versus sparse and random versus skewed. Table 7 summarzes the data sets used n our experments. The US census data set 10 was collected n a 1990 US households survey. The orgnal data set had 61 attrbutes such as hrswork1 (hours worked last week), nchld (number of own chldren on the household), and valueh (value of house). We selected 12 dscrete attrbutes as 10. ftp://ftp.pums.org/pums/data/p19001.z. Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

13 ZHANG ET AL.: EFFICIENT COMPUTATION OF ICEBERG CUBES BY BOUNDING AGGREGATE FUNCTIONS 915 Fg. 9. BP-Cubng versus BUC on the CountðÞ constrant. (a) Census. (b) TPC-R. (c) Weather. (d) Weather-5. dmensons and a numercal attrbute as the measure. The data set s dense and skewed. The Weather data set 11 contans real weather reports from varous weather statons n We used these nne attrbutes as dmensons: staton-d, longtude, solar-alttude, lattude, present-weather, weather-change-code, day, hour, and brghtness. Ther cardnaltes, respectvely, are 6,505, 351, 179, 152, 99, 10, 8, 3, and 2. We randomly generated values between 1 and 100 as the measure. Ths Weather data set s the same as that used n the experments n [4], where BUC was shown to be effcent for sparse data. The TPC-R data set 12 s an artfcal data set provded by the Transacton Processng Councl and s desgned for testng the performance of representatve complex queres n hgh-level busness decson-makng envronment. Its dmensons nclude customer, suppler, order, and shpment. The orgnal TPC-R data set conssts of several relatonal tables. We constructed a joned relaton as our multdmensonal data set. A numercal attrbute s selected as the measure. The TPC-R data set s relatvely dense and random. 8.2 BP-Cubng versus BUC on CountðÞ Ths set of experments s desgned to evaluate two aspects of BP-Cubng: the tree-based aggregaton strategy and prunng wth smple antmonotone constrants. Ths nvolves comparng the performance of the BP-Cubng algorthms wth BUC on computng ceberg cubes for the constrant CountðÞ. For the specal case of ¼ 1, the full data cubes are computed. Snce there s no prunng, the dfference n effcency should be solely attrbuted to the aggregaton strateges. Fg. 9 shows the runtme of the algorthms. Fg. 9a shows that BUC uses seconds to compute the full data cube over the dense and skewed Census data set, and BP-Cubng(BU) and BP-Cubng(TD) take and seconds, respectvely, whch are 1.9 and 3.76 tmes faster than BUC. Fg. 9b shows that BUC uses seconds to compute the full cube for the TPC-R data set. In contrast, BP-Cubng(BU) and BP-Cubng(TD) use only 17.9 and 5.89 seconds, respectvely, whch are 4.9 and 14.9 tmes faster. On the other hand, BUC s more effcent on the sparse Weather data than the BP-Cubng algorthms. Fg. 9c shows that BUC uses 6.16 seconds on computng the full cube, whereas BP-Cubng(BU) and BP-Cubng(TD) use and seconds, respectvely. When executed on the data set obtaned by removng the four dmensons wth the largest cardnaltes of 6,505, 351, 179, and 152 from the Weather data set, the BP-Cubng algorthms and BUC are all able to compute the full cube quckly: BUC uses 0.3 seconds, whereas both BP-Cubng algorthms use 0.02 seconds, as shown n Fg. 9d. Ths set of experments has shown that top-down multway aggregaton usng G-trees s a superb aggregaton strategy, and the expermental results suggest that treebased aggregaton s generally more effcent than recursve partton-based aggregaton. Our experments also confrmed the observaton n [4] that partton-based bottomup aggregaton s sutable for computng sparse data cubes. Even though BP-Cubng(BU) and BUC both use the same bottom-up strategy, BP-Cubng(BU) has better performance than BUC when dealng wth antmonotone constrants. Ths mples that aggregaton on G-trees s a superor strategy. In Fg. 9c, we see that ntally, BP-Cubng(TD) s more effcent than BP-Cubng(BU) for computng the full cube on Weather. However, for constrants wth the threshold 5, BP-Cubng(BU) outperforms BP-Cubng(TD). Ths may have happened because the sze of the G-tree s large for sparse data, and the top-down aggregaton strategy may have aggregated many groups, whch turns out to fal the constrant. 8.3 BP-Cubng versus DnA on AvgðXÞ n ½ 1 ; 2 Š In ths set of experments, we compare the performance of the BP-Cubng algorthms aganst DnA on the nonantmonotone constrant AvgðXÞ n ½ 1 ; 2 Š. Fg. 10 shows that the BP-Cubng algorthms scale very well when the ½ 1 ; 2 Š range threshold becomes looser. In all data sets, the BP-Cubng algorthms show modest lnear ncrease n computaton tme. In contrast, the performance of DnA degrades sgnfcantly (whch may be due to the fact that ts prunng s at the tuple record level: When many groups need to be processed, the search cost for the mnmal partton to approxmate a group becomes hgh (see Secton 9 for more dscussons)). Fgs. 10a and 10b show that n the dense data sets Census and TPC-R, the BP-Cubng algorthms sgnfcantly outperform DnA, and the BP-Cubng algorthm shows the best performance. For the constrant Avg (X) n [500, 50,000] over Census, DnA fnshes n seconds, whereas BP-Cubng(TD) fnshes n only 3.53 seconds, whch s tmes faster. At the lower end, for the constrant Avg(X) n [500, 10,000], BP-Cubng(TD) s 7.09 tmes faster. In TPC-R, BP-Cubng(TD) outperforms DnA by tmes. BP-Cubng(BU) also outperforms DnA overall, especally for larger constrant ranges. For the constrant Avg(X) n [500, 50,000] over Census, BP-Cubng(BU) s 2.33 tmes faster than DnA. Fg. 10c shows that n the sparse Weather data set, BP-Cubng(BU) acheves more sgnfcant effcency mprovement over DnA than BP-Cubng(TD). In ths data set, Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

14 916 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 7, JULY 2007 Fg. 10. BP-Cubng versus DnA on the AvgðXÞ n ½ 1 ; 2 Š constrant. (a) Census. (b) TPC-R. (c) Weather. the fgure may suggest that the mprovement of the BP- Cubng algorthms over DnA s not as pronounced as n Census and TPC-R. However, Fg. 9 has shown that n the sparse Weather data, the partton-based aggregaton strategy of DnA s more effcent than the tree-based aggregaton strategy of BP-Cubng. Wth an aggregaton strategy that works not that well, the BP-Cubng algorthms stll acheve better performance than DnA. Such dramatc result can only be attrbuted to the effectveness of bound prunng. As experments have shown that BP-Cubng(TD) has better performance than BP-Cubng(BU) n general, BP-Cubng(TD) s used n later experments for comparng wth DnA. 8.4 BP-Cubng versus DnA on VarðXÞ In ths secton, we compare our boundng technques aganst DnA on the constrant VarðXÞ, whch nvolves the complex nonmonotone functon Var. For BP- Cubng, we obtan the upper bound for Var by followng Table 6. For DnA, we derve the upper bound for Var as follows ([14, Example 4.2]): QSUMðXÞ Count 1ðcÞ psum 1ðcÞ 2 nsum 1ðcÞ 2 Count 2ðXÞ Count 3ðXÞ psum 2ðXÞnsum 2ðXÞ þ 2 ðcount 4ðcÞÞ 2 ; whose notatons are explaned later n Secton 9. The runtme of both BP-Cubng(TD) and DnA s shown n Fgs. 11a, 11b, and 11c. When the Var threshold gets larger, the ceberg cubes get smaller, and both algorthms fnsh faster. BP-Cubng s always faster than DnA at all Var thresholds. The speedup s usually around several tmes. Ths can be attrbuted to the tghter bounds obtaned usng MSPs. Fg. 11. BP-Cubng versus DnA on VarðXÞ and SumðXÞ. (a) VarðXÞ : Census. (b) VarðXÞ : TPC-R. (c) VarðXÞ : Weather. (d) SumðXÞ : Census. (e) SumðXÞ : TPC-R. (f) SumðXÞ : Weather. 8.5 BP-Cubng versus DnA on SumðXÞ We now compare BP-Cubng wth DnA on computng ceberg cubes defned by SumðXÞ. As shown n Table 2, SumðXÞ s a representatve for our general boundng theory, whch does not nvolve any optmzaton. The upper bound of SumðXÞ needs to be computed for prunng. In BP-Cubng, ths s computed followng Table 2. In DnA, ths s computed followng [15] (see Secton 9). The artfcal data sets TPC-R and Weather contan only postve measure values, whch renders SumðXÞ an antmonotone constrant. We randomly negated 10 percent of the measure values to make the constrant nonantmonotone so that prunng technques can be sensbly compared. Fgs. 11d, 11e, and 11f demonstrate the runtme of both algorthms on the three data sets under dfferent values of. BP-Cubng(TD) sgnfcantly outperforms DnA at all thresholds for SumðXÞ. To ascertan the contrbuton of prunng toward the effcency gan of BP-Cubng over DnA, for each algorthm, we collected the number of groups that are examned for prunng, namely, groups whose aggregates are bounded (approxmated) and tested aganst the gven constrant. There are two types of examned groups: a true postve (or nonsoluton group) s a group whose approxmated aggregate and ts real aggregate pass the constrant, and a Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

15 ZHANG ET AL.: EFFICIENT COMPUTATION OF ICEBERG CUBES BY BOUNDING AGGREGATE FUNCTIONS 917 TABLE 8 Number of Examned Groups versus Number of Soluton Groups on SumðXÞ Bound-prune cubng denotes (BP-Cubng(TD)). false postve (or soluton group) s a group whose approxmated aggregate passes the constrant, but whose real aggregate fals the constrant. Table 8 shows the number of examned groups versus the number of soluton groups for the three data sets. The three columns lst, respectvely, the number of soluton groups n the output and the number of groups examned n BP-Cubng(TD) and DnA. Substantally fewer groups are examned n BP-Cubng(TD) at all thresholds. Ths shows that BP-Cubng wth MSPs has a clear advantage. Observe that, when the number of soluton groups decreases, the numbers of groups examned by both algorthms decrease as well, but that number n DnA decreases at a much slower rate than that n BP-Cubng(TD). 9 RELATED WORK DnA [14], [15] s a recent approach for prunng wth nonantmonotone aggregaton constrants n ceberg cubng. The man dea of DnA s to dvde a partton of tuples nto two subspaces of postve and negatve measure values, respectvely, so that a gven constrant can be rewrtten usng antmonotone or monotone constrants n subspaces. For example, consder a measure X, whch can be postve or negatve. Gven the constrant AvgðXÞ, the authors would frst rewrte the orgnal aggregate nto AvgðXÞ ¼psumðXÞ=Count 1 ðxþ nsumðxþ=count 2 ðxþ; where psum and nsum are the (absolute) sum of postve/ negatve X values, and Count 1 and Count 2 are rewrtng Count n the postve and negatve spaces. Then, they would use psumðxþ=count 1 ðcþ nsumðcþ=count 2 ðxþ as a weaker antmonotone approxmator for Avg(X), where c s some smallest subpartton of the gven partton. If the approxmate fals the threshold, then all groups that are subgroups of X and supergroups of c are pruned. BP-Cubng dffers from DnA n several aspects: 1. Rather than the separately monotone rewrtng strategy of DnA, n BP-Cubng, rewrtng follows the prncples that an algebrac aggregate functon can be expressed as an algebrac expresson of dstrbutve aggregate functons and that the aggregate value for a group can be computed from the aggregates of MSPs. For example, gven measure X wth MSPs X 1 ;...;X n, AvgðXÞ s rewrtten nto SumðXÞ=CountðXÞ ¼ Sum ðsumðx ÞÞ=Sum ðcountðx ÞÞ: 2. In BP-Cubng, the optmzaton for complex aggregate functons such as Avg and Var can produce very tght bounds. Contnung wth the prevous example, by followng Table 6, the optmzed upper bound for AvgðXÞ s MaxðfAvgðc ÞgÞ, where c terates over the smallest parttons. As Avgðc Þ s the real aggregate of a group n the search space, MaxðfAvgðc ÞgÞ s the optmal approxmator and s tghter (smaller) than psumðxþ=count 1 ðcþ SumðcÞ=Count 2 ðxþ; whch s the bound derved by DnA. 3. MSPs are nonempty groups of tuples and BP-Cubng prunes groups. Moreover, the G-tree structure of BP- Cubng greatly facltates top-down multway aggregaton and saves computaton, especally for dense data. DnA calculates aggregate bounds from tuples and prunes tuples the bottom-up recursve parttonng aggregaton strategy can ncur extra cost searchng for and prunng tuples that do not occur n any groups that satsfy a gven constrant. 4. Extra processng s needed n DnA to ncorporate antmonotone constrants such as CountðÞ n. Contnung wth the prevous example, consder the constrant AvgðXÞ and CountðÞ n and the (ab) partton wth c, d, and e as further parttonng dmenson values. In DnA, wth recursve parttonng, even f (e) s nfrequent, subparttons of (ab) and (e), namely, (abe), (abce), (abde), and (abcde), are not pruned (the Rollback tree was proposed to prune these groups). Note that these are only some of the subparttons of (e) that should be pruned. In BP-Cubng, as dscussed n Secton 7.5, all subparttons of a partton falng an antmonotone constrant are pruned. The top-k average technque [7] was desgned specfcally for the constrant AvgðxÞ and CountðÞ k and s not a general prunng technque for ceberg cubng. The tree-based bottom-up aggregaton strategy was frst proposed n [7]. Top-down aggregaton was proposed ndependently n [16] for performng multway aggregaton n cubng; however, the work was focused on how to ncorporate bottom-up prunng for antmonotone aggregaton constrants nto top-down aggregaton. The BUC algorthm was proposed n [4], where prunng wth antmonotone constrants was performed n the bottom-up recursve parttonng aggregaton framework. Our study s also related to constrant data mnng [2], [3], [8], [9], [12], [13]. Agrawal and Srkant [2] wrote the frst paper on prunng wth the antmonotone constrant CountðÞ for assocaton mnng. In other works on Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

16 918 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 7, JULY 2007 assocaton mnng, prunng strateges wth specfc constrants were proposed, namely, mnmum mprovement constrants [3], succnct constrants [8], convertble constrants [9], tem constrants [12], and support constrants [13]. These constrants are orthogonal to the aggregaton constrants that we consder n ceberg cubng. 10 CONCLUSIONS In computng ceberg cubes, prunng wth nonantmonotone aggregaton constrants has been a challengng problem. In ths paper, we have proposed prunng wth nonantmonotone constrants by estmatng ther upper and lower bounds on a data cube from aggregates of the most specfc (or mnmal) parttons. We have proposed ceberg cubng algorthms, called BP-Cubng, whch ncorporate boundng and prunng, and ncorporated top-down and bottom-up aggregaton strateges by usng the G-tree structure. Our extensve experments on real and artfcal data sets have shown that BP-Cubng s effectve for prunng under varous data characterstcs, and our ceberg cubng algorthms sgnfcantly outperform state-of-the-art ceberg cubng algorthms. ACKNOWLEDGMENTS The authors thank the revewers for ther helpful comments. A sgnfcantly abrdged verson of ths paper appeared n [5]. [13] K. Wang, Y. He, and J. Han, Pushng Support Constrants nto Frequent Itemset Mnng, Proc. Int l Conf. Very Large Data Bases (VLDB 00), [14] K. Wang, Y. Jang, J.X. Yu, G. Dong, and J. Han, Pushng Aggregate Constrants by Dvde-and-Approxmate, Proc. 19th IEEE Int l Conf. Data Eng. (ICDE 03), [15] K. Wang, Y. Jang, J.X. Yu, G. Dong, and J. Han, Dvde-and- Approxmate: A Novel Constrant Push Strategy for Iceberg Cube Mnng, IEEE Trans. Knowledge and Data Eng., vol. 17, no. 3, pp , [16] D. Xn, J. Han, X. L, and B.W. Wah, Star-Cubng: Computng Iceberg Cubes by Top-Down and Bottom-Up Integraton, Proc. Int l Conf. Very Large Data Bases (VLDB 03), [17] Y. Zhao, P. Deshpande, and J.F. Naughton, An Array-Based Algorthm for Smultaneous Multdmensonal Aggregates, Proc. ACM Int l Conf. Management of Data (SIGMOD 97), Xuzhen Zhang receved the PhD degree from the Unversty of Melbourne. She s currently a senor lecturer n the School of Computer Scence and Informaton Technology, RMIT Unversty. Her research nterests nclude database technology, data mnng, machne learnng, and ther applcatons n new areas. She s a member of the ACM. Paulne Lenhua Chou receved the PhD degree from RMIT Unversty. She currently works for the Australan government as a data analyst. Her research nterests nclude database technology and data mnng. REFERENCES [1] S. Agarwal, R. Agrawal, P. Deshpande, A. Gupta, J. Naughton, R. Ramakrshnan, and S. Sarawag, On the Computaton of Multdmensonal Aggregates, Proc. Int l Conf. Very Large Data Bases (VLDB 96), pp , [2] R. Agrawal and R. Srkant, Fast Algorthms for Mnng Assocaton Rules, Proc. Int l Conf. Very Large Data Bases (VLDB 94), pp , [3] R.J. Bayardo, R. Agrawal, and D. Gunopulos, Constrant-Based Rule Mnng n Large Dense Databases, Proc. 15th IEEE Int l Conf. Data Eng. (ICDE 99), [4] K. Beyer and R. Ramakrshnan, Bottom-Up Computaton of Sparse and Iceberg Cubes, Proc. ACM Int l Conf. Management of Data (SIGMOD 99), [5] L. Chou and X. Zhang, Computng Complex Iceberg Cubes by Multway Aggregaton and Boundng, Proc. Sxth Int l Conf. Data Warehousng and Knowledge Dscovery (DaWak 04), [6] J. Gray, S. Chaudhur, A. Bosworth, A. Layman, D. Rechart, and M. Venkatrao, Data Cube: A Relatonal Aggregaton Operator Generalzng Group-By, Cross-Tab and Subtotals, Data Mnng and Knowledge Dscovery, vol. 1, no. 1, pp , [7] J. Han, J. Pe, G. Dong, and K. Wang, Effcent Computaton of Iceberg Cubes wth Complex Measures, Proc. ACM Int l Conf. Management of Data (SIGMOD 01), [8] R. Ng, L.V.S. Lakshmanan, J. Han, and A. Pang, Exploratory Mnng and Prunng Optmzatons of Constraned Assocatons Rules, Proc. ACM Int l Conf. Management of Data (SIGMOD 98), pp , [9] J. Pe, J. Han, and L.V.S. Lakshmanan, Mnng Frequent Itemsets wth Convertble Constrants, Proc. 17th IEEE Int l Conf. Data Eng. (ICDE 01), pp , [10] K.A. Ross and D. Srvastava, Fast Computaton of Sparse Data Cubes, Proc. Int l Conf. Very Large Data Bases (VLDB 97), pp , [11] S. Sarawag, R. Agrawal, and R. Gupta, On Computng the Data Cube, Techncal Report RJ10026, IBM Almaden Research Center, [12] R. Srkant, Q. Vu, and R. Agrawal, Mnng Assocaton Rules wth Item Constrants, Proc. Thrd Int l Conf. Knowledge Dscovery and Data Mnng (KDD 97), pp , Guozhu Dong receved the PhD degree from the Unversty of Southern Calforna n He s a professor at Wrght State Unversty. He prevously taught at the Unversty of Melbourne and Flnders Unversty. He has served on numerous program commttees ncludng the IEEE Internatonal Conference on Data Engneerng (ICDE), the IEEE Internatonal Conference on Data Mnng (ICDM), the Internatonal Conference on Database Theory (ICDT), the ACM Symposum on Prncples of Database Systems (PODS), the ACM SIGKDD Internatonal Conference on Knowledge Dscovery and Data Mnng, and the Internatonal Conference on Very Large Data Bases (VLDB). He was a program commttee cochar of the 2003 Internatonal Conference on Web-Age Informaton Management (WAIM), s on the steerng commttee of the same conference seres, s on the edtoral board of the Internatonal Journal of Informaton Technology, and wll serve as a program commttee cochar of the 2007 Jont Internatonal Conference of APWeb and WAIM. Hs man research nterests are databases, data mnng, and bonformatcs. Hs research has been funded by government agences such as the US Natonal Scence Foundaton (NSF), the Australan Research Councl (ARC), and the US Ar Force Research Laboratory (AFRL) and prvate corporatons. He has publshed more than 100 artcles n journals and conferences, holds three US patents, and was a recpent of the IEEE ICDM 2005 Best Paper Award. He s a senor member of the IEEE and member of the ACM.. For more nformaton on ths or any other computng topc, please vst our Dgtal Lbrary at Authorzed lcensed use lmted to: IEEE Xplore. Downloaded on November 18, 2008 at 20:54 from IEEE Xplore. Restrctons apply.

Multiway pruning for efficient iceberg cubing

Multiway pruning for efficient iceberg cubing Multway prunng for effcent ceberg cubng Xuzhen Zhang & Paulne Lenhua Chou School of CS & IT, RMIT Unversty, Melbourne, VIC 3001, Australa {zhang,lchou}@cs.rmt.edu.au Abstract. Effectve prunng s essental

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming CS 4/560 Desgn and Analyss of Algorthms Kent State Unversty Dept. of Math & Computer Scence LECT-6 Dynamc Programmng 2 Dynamc Programmng Dynamc Programmng, lke the dvde-and-conquer method, solves problems

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Math Homotopy Theory Additional notes

Math Homotopy Theory Additional notes Math 527 - Homotopy Theory Addtonal notes Martn Frankland February 4, 2013 The category Top s not Cartesan closed. problem. In these notes, we explan how to remedy that 1 Compactly generated spaces Ths

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

CHAPTER 2 DECOMPOSITION OF GRAPHS

CHAPTER 2 DECOMPOSITION OF GRAPHS CHAPTER DECOMPOSITION OF GRAPHS. INTRODUCTION A graph H s called a Supersubdvson of a graph G f H s obtaned from G by replacng every edge uv of G by a bpartte graph,m (m may vary for each edge by dentfyng

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC Trees with Lumped and Distributed Elements

Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC Trees with Lumped and Distributed Elements Explct Formulas and Effcent Algorthm for Moment Computaton of Coupled RC Trees wth Lumped and Dstrbuted Elements Qngan Yu and Ernest S.Kuh Electroncs Research Lab. Unv. of Calforna at Berkeley Berkeley

More information

Fast Computation of Shortest Path for Visiting Segments in the Plane

Fast Computation of Shortest Path for Visiting Segments in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 4 The Open Cybernetcs & Systemcs Journal, 04, 8, 4-9 Open Access Fast Computaton of Shortest Path for Vstng Segments n the Plane Ljuan Wang,, Bo Jang

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Optimization Methods: Integer Programming Integer Linear Programming 1. Module 7 Lecture Notes 1. Integer Linear Programming

Optimization Methods: Integer Programming Integer Linear Programming 1. Module 7 Lecture Notes 1. Integer Linear Programming Optzaton Methods: Integer Prograng Integer Lnear Prograng Module Lecture Notes Integer Lnear Prograng Introducton In all the prevous lectures n lnear prograng dscussed so far, the desgn varables consdered

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

PYTHON IMPLEMENTATION OF VISUAL SECRET SHARING SCHEMES

PYTHON IMPLEMENTATION OF VISUAL SECRET SHARING SCHEMES PYTHON IMPLEMENTATION OF VISUAL SECRET SHARING SCHEMES Ruxandra Olmd Faculty of Mathematcs and Computer Scence, Unversty of Bucharest Emal: ruxandra.olmd@fm.unbuc.ro Abstract Vsual secret sharng schemes

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the

More information

A Five-Point Subdivision Scheme with Two Parameters and a Four-Point Shape-Preserving Scheme

A Five-Point Subdivision Scheme with Two Parameters and a Four-Point Shape-Preserving Scheme Mathematcal and Computatonal Applcatons Artcle A Fve-Pont Subdvson Scheme wth Two Parameters and a Four-Pont Shape-Preservng Scheme Jeqng Tan,2, Bo Wang, * and Jun Sh School of Mathematcs, Hefe Unversty

More information

Oracle Database: SQL and PL/SQL Fundamentals Certification Course

Oracle Database: SQL and PL/SQL Fundamentals Certification Course Oracle Database: SQL and PL/SQL Fundamentals Certfcaton Course 1 Duraton: 5 Days (30 hours) What you wll learn: Ths Oracle Database: SQL and PL/SQL Fundamentals tranng delvers the fundamentals of SQL and

More information

A Clustering Algorithm for Chinese Adjectives and Nouns 1

A Clustering Algorithm for Chinese Adjectives and Nouns 1 Clusterng lgorthm for Chnese dectves and ouns Yang Wen, Chunfa Yuan, Changnng Huang 2 State Key aboratory of Intellgent Technology and System Deptartment of Computer Scence & Technology, Tsnghua Unversty,

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Bran Curless Sprng 2008 Announcements (5/14/08) Homework due at begnnng of class on Frday. Secton tomorrow: Graded homeworks returned More dscusson

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

UNIT 2 : INEQUALITIES AND CONVEX SETS

UNIT 2 : INEQUALITIES AND CONVEX SETS UNT 2 : NEQUALTES AND CONVEX SETS ' Structure 2. ntroducton Objectves, nequaltes and ther Graphs Convex Sets and ther Geometry Noton of Convex Sets Extreme Ponts of Convex Set Hyper Planes and Half Spaces

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

On Some Entertaining Applications of the Concept of Set in Computer Science Course

On Some Entertaining Applications of the Concept of Set in Computer Science Course On Some Entertanng Applcatons of the Concept of Set n Computer Scence Course Krasmr Yordzhev *, Hrstna Kostadnova ** * Assocate Professor Krasmr Yordzhev, Ph.D., Faculty of Mathematcs and Natural Scences,

More information

IP mobility support is becoming very important as the

IP mobility support is becoming very important as the 706 IEEE TRANSACTIONS ON COMPUTERS, VOL. 52, NO. 6, JUNE 2003 A New Scheme for Reducng Lnk and Sgnalng Costs n Moble IP Young J. Lee and Ian F. Akyldz, Fellow, IEEE Abstract IP moblty support s provded

More information

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR Judth Aronow Rchard Jarvnen Independent Consultant Dept of Math/Stat 559 Frost Wnona State Unversty Beaumont, TX 7776 Wnona, MN 55987 aronowju@hal.lamar.edu

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES UbCC 2011, Volume 6, 5002981-x manuscrpts OPEN ACCES UbCC Journal ISSN 1992-8424 www.ubcc.org VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

More information

Intro. Iterators. 1. Access

Intro. Iterators. 1. Access Intro Ths mornng I d lke to talk a lttle bt about s and s. We wll start out wth smlartes and dfferences, then we wll see how to draw them n envronment dagrams, and we wll fnsh wth some examples. Happy

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.15 No.10, October 2015 1 Evaluaton of an Enhanced Scheme for Hgh-level Nested Network Moblty Mohammed Babker Al Mohammed, Asha Hassan.

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

Boosting Spatial Pruning: On Optimal Pruning of MBRs

Boosting Spatial Pruning: On Optimal Pruning of MBRs Boostng Spatal Prunng: On Optmal Prunng of MBRs Tobas Emrch, Hans-Peter Kregel, Peer Kröger, Matthas Renz, Andreas Züfle Insttute for Informatcs, Ludwg-Maxmlans-Unverstät München Oettngenstr. 67, D-8538

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries Run-Tme Operator State Spllng for Memory Intensve Long-Runnng Queres Bn Lu, Yal Zhu, and lke A. Rundenstener epartment of Computer Scence, Worcester Polytechnc Insttute Worcester, Massachusetts, USA {bnlu,

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Інформаційні технології в освіті ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Some aspects of programmng educaton

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

TF 2 P-growth: An Efficient Algorithm for Mining Frequent Patterns without any Thresholds

TF 2 P-growth: An Efficient Algorithm for Mining Frequent Patterns without any Thresholds TF 2 P-growth: An Effcent Algorthm for Mnng Frequent Patterns wthout any Thresholds Yu HIRATE, Ego IWAHASHI, and Hayato YAMANA Graduate School of Scence and Engneerng, Waseda Unversty {hrate, ego, yamana}@yama.nfo.waseda.ac.jp

More information

Lecture 5: Probability Distributions. Random Variables

Lecture 5: Probability Distributions. Random Variables Lecture 5: Probablty Dstrbutons Random Varables Probablty Dstrbutons Dscrete Random Varables Contnuous Random Varables and ther Dstrbutons Dscrete Jont Dstrbutons Contnuous Jont Dstrbutons Independent

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information