High-dimensional Proximity Joins. Kyuseok Shim Ramakrishnan Srikant Rakesh Agrawal. IBM Almaden Research Center. 650 Harry Road, San Jose, CA 95120

Size: px
Start display at page:

Download "High-dimensional Proximity Joins. Kyuseok Shim Ramakrishnan Srikant Rakesh Agrawal. IBM Almaden Research Center. 650 Harry Road, San Jose, CA 95120"

Transcription

1 High-dimensinal Prximity Jins Kyusek Shim Ramakrishnan Srikant Rakesh Agrawal IBM Almaden Research Center 650 Harry Rad, San Jse, CA Abstract Many emerging data mining applicatins require a prximity (similarity) jin between pints in a high-dimensinal dmain. We present a new algrithm that utilizes a new data structure, called the -kd tree, fr fast spatial prximity jins n high-dimensinal pints. This data structure reduces the number f neighbring leaf ndes that are cnsidered fr the jin test, as well as the traversal cst f nding apprpriate branches in the internal ndes. The strage cst fr internal ndes is independent f the number f dimensins. Hence the prpsed data structure scales t high-dimensinal data. We analyze the cst f the jin fr the -kd tree and the R-tree family, and shw that the -kd tree will perfrm better fr high-dimensinal jins. Empirical evaluatin, using synthetic and real-life datasets, shws that prximity jin using the -kd tree is typically 2 t 40 times faster than the R + tree, with the perfrmance gap increasing with the number f dimensins. We als discuss hw sme f the ideas f the -kd tree can be applied t the R-tree family. These biased R-trees perfrm better than the crrespnding traditinal R-trees fr highdimensinal prximity jins, but d nt match the perfrmance f the -kd tree. 1 Intrductin Many emerging data mining applicatins require ecient prcessing f prximity (similarity) jins n high-dimensinal pints. Examples include applicatins in time-series databases[afs93, ALSS95], multimedia databases [Jag94, NBE + 93, NC91], medical databases [ACF + 93, TBS90], and scientic databases[vas93]. Sme typical queries in these applicatins include: (1) discver all stcks with similar price mvements; (2) nd all pairs f similar images; (3) retrieve music scres similar t a target music scre. These queries are ften a prelude t clustering the bjects. Fr example, given all pairs f similar images, the images can be clustered int grups such that the images in each grup are similar. T mtivate the need fr multidimensinal indices in such applicatins, cnsider the prblem f nding all pairs f similar time-sequences. The technique in [ALSS95] slves this prblem by breaking each time-sequences int a set f cntiguus subsequences, and nding all subsequences similar t each ther. If tw sequences have \enugh" similar subsequences, they are cnsidered similar. T nd similar subsequences, each subsequence is mapped t a pint in a multi-dimensinal Currently at Bell Labratries, Murray Hill, NJ. 1

2 space. Typically, the dimensinality f this space is quite high. The prblem f nding similar subsequences is nw reduced t the prblem f nding pints that are clse t the given pint in the multi-dimensinal space. A pair f pints are cnsidered \clse" if they are within distance f each ther with sme distance metric (such as L 2 r L1 nrms) that invlves all dimensins, where is specied by the user. A multi-dimensinal index structure (the R + tree) was used fr nding all pairs f clse pints. This apprach hlds fr ther dmains, such as image data. In this case, the image is brken int a grid f sub-images, key attributes f each sub-image mapped t a pint in a multi-dimensinal space, and all pair f similar sub-images are fund. If \enugh" sub-images f tw images match, a mre cmplex matching algrithm is applied t the images. A clsely related prblem is t nd all bjects similar t a given bjects. This translates t nding all pints clse t a query pint. Even if there is n direct mapping frm an bject t a pint in a multi-dimensinal space, this paradigm can still be used if a distance functin between bjects is available. An algrithm is presented in [FL95] fr generating a mapping frm an bject t a multi-dimensinal pint, given a set f bjects and a distance functin. Current spatial access methds (see [Sam89, Gut84] fr an verview) have mainly cncentrated n string map infrmatin, which is a 2-dimensinal r 3-dimensinal space. While they wrk well with lw dimensinal data pints, the time and space fr these indices grw rapidly with dimensinality. Mrever, while CPU cst is high fr prximity jins, existing indices have been designed with the reductin f I/O cst as their primary gal. We discuss these pints further later in the paper, after reviewing current multidimensinal indices. T vercme the shrtcmings f current indices fr high-dimensinal prximity jins, we prpse a structure called the -kd tree. This is a main-memry data structure ptimized fr perfrming prximity jins. The -kd tree als has a very small build time. This lets the -kd tree use the prximity distance limit as a parameter in building the tree. Empirical evaluatin shws that the build plus jin time fr the -kd tree is typically 2 t 40 times less than the jin time fr the R + tree [SRF87], 1 with the perfrmance gap increasing with the number f dimensins. A pure main-memry data structure wuld nt be very useful, since the data in many applicatins will nt t in memry. We present a jin algrithm that can handle large amunts f data while still using the -kd tree. Prblem Denitin We will cnsider tw versins f the spatial prximity jin prblem: Self-jin: Given a set f N high-dimensinal pints and a distance metric, nd all pairs f pints that are within distance f each ther. 1 Our experiments indicated that the R + tree was better than the R tree [Gut84] r the R tree [BKSS90] tree fr high-dimensinal prximity jins. 2

3 Nn-self-jin: Given tw sets S 1 and S 2 f high-dimensinal pints and a distance metric, nd pairs f pints, ne each frm S 1 and S 2, that are within distance f each ther. The distance metric fr tw n dimensinal pints ~ X and ~ Y that we cnsider is L p = nx 1 jx i? Y i j p! 1=p ; 1 p 1: L 2 is the familiar Euclidean distance, L 1 the Manhattan distance, and L1 crrespnds t the maximum distance in any dimensin. Prperties The prblem f high-dimensinal prximity jins with sme distance metric and parameter has the fllwing prperties: The feature vectr chsen fr similarity cmparisn is high-dimensinal. Every dimensin f the feature vectr is mapped int a numeric value whse bunds are knwn. The distance functin is cmputed cnsidering every dimensin f the feature vectr. The prximity distance limit is nt large, since indices are nt eective when the selectivity f the prximity jin is large (i.e. when every pint matches with mst ther pints). Paper Organizatin. In Sectin 2, we give an verview f existing spatial indices, and describe their shrtcmings when used fr high-dimensinal prximity jins. Sectin 3 describes the -kd tree and the algrithm fr prximity jins. In Sectin 4, we analyze the perfrmance fr the -kd tree and the R + tree fr prximity jins. We give a perfrmance evaluatin in Sectin 5. Sectin 6 discusses biased R + trees. We cnclude in Sectin 7. 2 Current Multidimensinal Index Structures We rst discuss the R-tree family f indices, which are the mst ppular multi-dimensinal indices, and describe hw t use them fr prximity jins. We als give a brief verview f ther indices. We then discuss prblems f the current index structures fr high-dimensinal prximity jins. 2.1 The R-tree family R-tree [Gut84] is a balanced tree in which each nde represents a rectangular regin. Each internal nde in a R-tree stres a minimum bunding rectangle (MBR) fr each f its children. The MBR cvers the space f the pints in the child nde. The MBRs f siblings can verlap. The decisin whether t traverse a subtree in an internal nde depends n whether its MBR verlaps with the space cvered by query. When a nde becmes full, it is split. Ttal area f the tw MBRs resulting frm the split is minimized while splitting a nde. Figure 1 shws an example f R-tree. This tree 3

4 N1 L1 L2 N2 L3 N1 N2 L1 L2 L3 L4 L4 (a) Space Cvered by Bunding Rectangles (b) R-tree Figure 1: Example f an R-tree cnsists f 4 leaf ndes and 3 internal ndes. The MBRs are N1,N2,L1,L2,L3 and L4. The rt nde has tw children whse MBRs are N1 and N2. R tree [BKSS90] added tw majr enhancements t R-tree. First, rather than just cnsidering the area, the nde splitting heuristic in R tree als minimizes the perimeter and verlap f the bunding regins. Secnd, R tree intrduced the ntin f frced reinsert t make the shape f the tree less dependent n the rder f the insertin. When a nde becmes full, it is nt split immediately, but a prtin f the nde is reinserted frm the tp level. With these tw enhancements, the R tree generally utperfrms R-tree. R + tree [SRF87] impses the cnstraint that n tw bunding regins f a nn-leaf nde verlap. Thus, except fr the bundary surfaces, there will be nly ne path t every leaf regin, which can reduce search and jin csts. X-tree [BKK96] avids splits that culd result in high degree f verlap f bunding regins fr R -tree. Their experiments shw that the verlap f bunding regins increases signicantly fr high dimensinal data resulting in perfrmance deteriratin in the R -tree. Instead f allwing splits that prduce high degree f verlaps, the ndes in X-tree are extended t mre than the usual blck size, resulting in s called super-ndes. Experiments shw that X-tree imprves the perfrmance f pint query and nearest-neighbr query cmpared t R -tree and TV-tree (described belw). N cmparisn with R + -tree is given in [BKK96] fr pint data. Hwever, since the R + - tree des nt have any verlap, and the gains fr the X-tree are btained by aviding verlap, ne wuld nt expect the X-tree t be better than the R + -tree fr pint data. Prximity Jin The jin algrithm using R-tree cnsiders each leaf nde, extends its MBR with -distance, and nds all leaf ndes whse MBR intersects with this extended MBR. The algrithm then perfrms a nested-lp jin r srt-merge jin fr the pints in thse leaf ndes, the jin cnditin being that the distance between the pints is at mst. (Fr the srt-merge jin, the pints are rst srted n ne f the dimensins.) T reduce redundant cmparisns between pints when jining tw leaf ndes, we culd rst screen pints. The bundary f each leaf nde is extended by, and nly pints that lie within the intersectin f the tw extended regins need be jined. Figure 2 shws an example, where the rectangles with slid lines represent the MBRs f tw leaf ndes and the dtted lines illustrate the 4

5 L1 L2 L1 L2 (a) Overlapping MBRs (R tree and R? tree) (b) Nn-verlapping MBRs (R + tree) Figure 2: Screening pints fr jin test extended bundaries. The shaded area cntains screened pints. 2.2 Other Index Structures kdb tree [Rb81] is similar t the R + tree. The main dierence is that the bunding rectangles cver the entire space, unlike the MBRs f the R + tree. hb-tree [LS09] is similar t the kdb tree except that bunding rectangles f the children f an internal nde are rganized as a K-D tree [Ben75] rather than as a list f MBRs. (The K-D-tree is a binary tree fr multi-dimensinal pints. In each level f the K-D-tree, nly ne dimensin, chsen cyclically, is used t decide the subtree fr traversal.) Further, the bunding regins may have rectangular hles in them. This reduces the cst f splitting a nde cmpared t the kdb tree. TV-tree [LJF94] uses a variable number f dimensins fr indexing. TV-tree has a design parameter (\active dimensin") which is typically a small integer (1 r 2). Fr any nde, nly dimensins are used t represent bunding regins and t split ndes. Fr the ndes clse t the rt, the rst dimensins are used t dene bunding rectangles. As the tree grws, sme ndes may cnsist f pints that all have the same value n their rst, say, k dimensins. Since the rst k dimensins can n lnger distinguish the pints in thse ndes, the next dimensins (after the k dimensins) are used t stre bunding regins and fr splitting. This reduces the strage and traversal cst fr internal ndes. Grid-le [NHS84] partitins the k-dimensinal space as a grid; multiple grid buckets may be placed in a single disk page. A directry structure keeps track f the mapping frm grid buckets t disk pages. A grid bucket must t within a leaf page. If a bucket verws, the grid is split n ne f the dimensins. 2.3 Prblems with Current Indices The index structures described abve suer frm the fllwing prblems fr perfrming prximity jins with high-dimensinal pints: 5

6 Figure 3: Number f neighbring leaf ndes. Number f Neighbring Leaf Ndes. The splitting algrithms in the R-tree variants utilize every dimensin equally fr splitting in rder t minimize the vlume f hyper-rectangles. This causes the number f neighbring leaf ndes within -distance f a given leaf nde t increase dramatically with the number f dimensins. T see why this happens, assume that a R-tree has partitined the space s that there is n \dead regin" between bunding rectangles. Then, with a unifrm distributin in a 3-dimensinal space, we may get 8 leaf ndes as shwn in Figure 3. Ntice that every leaf nde is within -distance f every ther leaf nde. In an n dimensinal space, there may be O(2 n ) leaf ndes within -distance f every leaf nde. The prblem is smewhat mitigated because f the use f MBRs. Hwever, the number f neighbrs within -distance still increases dramatically with the number f dimensins. This prblem als hlds fr ther multi-dimensinal structures, except perhaps the TV-tree. Hwever, the TV-tree suers frm a dierent prblem { it will nly use the rst k dimensins fr splitting, and des nt cnsider any f the thers (unless many pints have the same value in the rst k dimensins). With enugh data pints, this leads t the same prblem as fr the R-tree, thugh fr the ppsite reasn. Since the TV-tree uses nly the rst k dimensins fr splitting, each leaf nde will have many neighbring leaf ndes within -distance. Nte that the prblem aects bth the CPU and I/O cst. The CPU cst is aected because f the traversal time as well as time t screen all the neighbring pages. I/O cst is aected because we have t access all the neighbring pages. Strage Utilizatin. The kdb tree and R-tree family, including the X-tree, represent the bunding regins f each nde by rectangles. The bunding rectangles are represented by \min" and \max" pints f the hyper-rectangle. Thus, the space needed t stre the representatin f bunding rectangles increases linearly with the number f dimensins. This is nt a prblem fr the hb-tree (which des nt stre MBRs), the TV-tree (which nly uses a few dimensins at a time), r the grid le. Traversal Cst. When traversing a R-tree r kdb tree, we have t examine the bunding regins f children in the nde t determine whether t traverse the subtree. This step requires checking the ranges f every dimensin in the representatin f bunding rectangles. Thus, the CPU cst f examining bunding rectangles increases prprtinally t the number f dimensins f data pints. This prblem is mitigated fr the hb-tree r the TV-tree. This is nt a prblem fr the 6

7 grid-le. Build Time. The set f bjects participating in a spatial jin may ften be pruned by selectin predicates [LR94] (e.g. nd all similar internatinal funds). In thse cases, it may be faster t perfrm the nn-spatial selectin predicate rst (select internatinal funds) and then perfrm spatial jin n the result. Thus it is smetimes necessary t build a spatial index n-the-y. Current indices are designed t be built nce; the cst f building them can be mre than the cst f the jin [PD96]. Skewed Data. Handling skewed data is a prblem fr the grid-le. In a k-dimensinal space, a single data page verw may result in a k?1 dimensinal slice being added t the grid-le directry. If the grid-le had n buckets befre the split, and the splitting dimensin had m partitins, n=m new cells are added t the grid after the split. Thus, the size f the directry structure can grw rapidly fr skewed high-dimensinal pints. Summary. Each index has gd and bad features fr prximity jin f high-dimensinal pints. It wuld be dicult t design a general-purpse multi-dimensinal index which des nt have any f the shrtcmings listed abve. Hwever, by designing a special-purpse data structure, we can attack these prblems. We nw describe a new data structure, -kd tree, which is a special-purpse data structure fr this purpse. This data structure can be cnsidered analgus t hash tables built fr equality jins. 3 The -kd tree We intrduce the -kd tree in Sectin 3.1 and then discuss its design ratinale in Sectin kd tree denitin We rst dene the -kd tree 2. We then describe hw t perfrm prximity jins using the -kd tree, rst fr the case where the data ts in memry, and then fr the case where it des nt. -kd tree We assume, withut lss f generality, that the c-rdinates f the pints in each dimensin lie between 0 and +1. Pinters t the data pints are stred in leaf ndes. We start with a single leaf nde. Whenever the number f pints in a leaf nde exceeds a threshld, the leaf nde is split, and cnverted t an interir nde. If the leaf nde was at level i, the ith dimensin is used fr splitting the nde. The nde is split int b1=c parts, such that the width f each new leaf nde in the ith dimensin is either r slightly greater than. In the rest f this sectin, we assume withut lss f generality that is an exact divisr f 1. An example f -kd tree fr tw dimensinal space, with = 0:25, is shwn in Figure 4. 2 It is really a trie, but we call it a tree since it is cnceptually similar t kdb tree. 7

8 rt leaf leaf leaf leaves Figure 4: -kd tree Prximity Jin using the -kd tree Let x be an internal nde in the -kd tree. We use x[i] t dente the ith child f x. Let f be the fanut f the tree. Nte that f = 1=. Figure 5 describes the jin algrithm. The algrithm initially calls self-jin(rt), fr the self-jin versin, r jin(rt1, rt2), fr the nn-self-jin versin. The prcedures leaf-jin(x, y) and leaf-self-jin(x) perfrm a srt-merge jin n leaf ndes. Fr high-dimensinal data, the -kd tree will rarely use all the dimensins fr splitting. (Fr instance, with dimensins and a f 0.1, there wuld have t be mre than pints befre all dimensins are used.) Thus we can usually use ne f the free unsplit dimensin as a cmmn \srt dimensin". The pints in every leaf nde are kept srted n this dimensin, rather then being srted repeatedly during the jin. When jining tw leaf ndes, the algrithm des a srt-merge using this dimensin. Nte that we can use the same jin cde fr all the L p distance metrics, with nly the nal test between a pair f pints being metric-dependent. Memry Management The value f is ften given at run-time. Since the value f is a parameter fr building the data structure, it may nt be pssible t build a disk-based versin f the -kd tree in advance. Instead, we srt the multi-dimensinal pints with the rst splitting dimensin and keep them as an external le. We rst describe the jin algrithm, assuming that main-memry can hld all pints within a 2 distance n the rst dimensin, and then generalize it. The jin algrithm rst reads pints whse values in the srted dimensin lie between 0 and 2, builds the -kd tree fr thse pints in main memry, and perfrms the prximity jin in memry. The algrithm then deallcates the space used fr the pints whse values in the srted dimensin are between 0 and, reads pints whse values are between 2 and 3, build the -kd tree fr these pints, and perfrms the jin 8

9 prcedure jin(x, y) begin if leaf-nde(x) and leaf-nde(y) then leaf-jin(x, y); else if leaf-nde(x) then begin fr i = 1 t f d jin(x, y[i]); end else if leaf-nde(y) then begin fr i = 1 t f d jin(x[i], y); end else begin fr i = 1 t f? 1 d begin jin(x[i], y[i]); jin(x[i], y[i+1]); jin(x[i+1], y[i]); end jin(x[f], y[f]); end end prcedure self-jin(x) begin if leaf-nde(x) then leaf-self-jin(x); else begin fr i = 1 t f?1 d begin self-jin(x[i], x[i]); jin(x[i], x[i+1]); end self-jin(x[f], x[f]); end end Figure 5: Jin algrithm (a) Step 1 (b) Step 2 (c) Step 3 (d) Step 4 Figure 6: Memry Management: Example 1 prcedure again. This prcedure is cntinued until all the pints have been prcessed. Nte that we nly read each pint the disk nce. This prcedure is illustrated in Figure 6. The shaded prtin illustrates the data present in main memry in each step. This prcedure wrks because the build time fr the -kd tree is extremely small. It can be generalized t the case where a 2 chunk f the data des nt t in memry. The basic idea is t partitin the data int 2 chunks using an additinal dimensin. Then, the jin prcedure (i.e read pints int memry, build -kd, perfrm jin and s n) is instead repeated fr each 4 2 chunk f the data using the additinal dimensin. This prcess is shwn in Figure 7. The rst square shws the data partitined int a grid, with the fur 2 chunks in the tp left crner (11, 12, 21 and 22) shaded. These 4 chunks are read int memry, and the jin is perfrmed between these chunks, and fr each chunk. Chunks 11 and 21 are then drpped, chunks 13 and 23 read int memry, and the jins between 12, 22, 13 and 23 perfrmed. This prcess cntinues till all jins are cmplete. 9

10 Figure 7: Memry Management: Example Design Ratinale Tw distinguishing features f -kd tree are: Biased Splitting : We d nt treat all dimensins equally fr chsing split pints. Instead, we preferentially split ne dimensin multiple times. Sized Splitting : When we split a nde, we split the nde in sized chunks. We discuss belw hw these features help -kd tree slve the prblems with current indices utlined in Sectin 2. Number f Neighbring Leaf Ndes. Recall that with current indices, the number f neighbring leaf pages may increase expnentially with the number f dimensins. The -kd slves this prblem because f the biased splitting. When the length f the bunding rectangle f each leaf ndes in the split dimensin is at least, at mst tw neighbring leaf ndes need t be cnsidered fr the jin test. Hwever, as the length f the bunding rectangle in the split dimensin becmes less than, the number f neighbr leaf ndes fr jin test increases. Hence when a leaf nde becmes full, we split the nde int several children, each f size in the split dimensin. We split the nde int several children at nce, rather than gradually, in rder t reduce the build time and the number f children in each neighbr.

11 D0 X D2 X D1 (a) Chsing split dimensin glbally (b) Chsing split dimensin lcally Figure 8: Glbal and Lcal Ordering f Split Dimensins We have tw alternatives fr chsing the next split dimensin: glbal rdering and lcal rdering. Glbal rdering uses the same split dimensin fr all the ndes in the same level, while lcal rdering chses the split dimensin based n the distributin f pints in each nde. Examples f these tw cases are shwn in Figure 8, fr a 3-dimensinal space. Fr bth rderings, the dimensin D0 is used fr splitting in the rt nde (i.e. level 0). Fr glbal rdering, nly D1 is used fr splitting in level 1. Hwever, fr lcal rdering, bth D1 and D2 are chsen alternatively fr neighbring ndes in level 1. Cnsider the leaf nde labeled X. With glbal rdering, it has 5 neighbr leaf ndes (shaded in the gure). The number f neighbrs increases t 9 fr lcal rdering. Ntice that the space cvered by the neighbrs fr glbal rder is a prper subset f that cvered by the neighbrs fr lcal rdering. The dierence in the space cvered by the tw rderings increases as decreases. Hence we chse glbal rdering fr split dimensins, rather than lcal rdering. When the number f pints are s huge that the -kd tree is frced t split every dimensin, then the number f neighbrs will be cmparable t ther indices. Hwever, till that limit, the number f neighbrs depends n the number f pints (and their distributin) and, and is independent f the number f dimensins. The rder in which dimensins are chsen fr splitting can signicantly aect the space utilizatin and jin cst if crrelatins exist between sme f the dimensins. This prblem can be slved by statistically analyzing a sample f the data, and chsing fr the next split the dimensin that has the least crrelatin with the dimensins already used fr splitting. Space Requirements. Fr each internal nde, we simply need an array f pinters t its children. We d nt need t stre minimum bunding rectangles because they can be cmputed. Hence the space required depends nly n the number f pints (and their distributin), and is independent f the number f dimensins. Traversal Cst. Since we split ndes in sized chunks, traversal cst is extremely small. The jin prcedure never has t check bunding rectangles f ndes t decide whether r nt they may 11

12 Ttal number f pints T Range f pints 0 t 1 R + tree -kd tree Average number f pints per leaf nde N r N e Number f dimensins used fr splitting D r D e Table 1: Ntatin cntain pints within distance. Build time. The build time is small because we d nt have cmplex splitting algrithms, r splits that prpagate upwards. Skewed data. data reasnably. Since splitting a nde des nt aect ther ndes, the -kd tree will handle skewed 4 Analysis In this sectin, we analyze the number f jin tests fr the -kd tree, and the number f jin and screen tests fr the R + tree. The gal is t understand the behavir f the indices as the number f dimensins and number f pints vary. Fr simplicity, we assume that the MBRs f the R + tree cver the whle space. (That is, we really use a K-D-B tree as a prxy fr the R+ in this analysis. Fr datasets with a lt f pints, the gap between the MBRs will be small; hence this is a reasnable apprximatin.) In this analysis, we nly cnsider unifrm distributin f pints. We als assume that the R + tree will always split a rectangle n a dimensin which has nt been used fr splitting (if such a dimensin is available), 3 and that the bunding rectangle is split at its midpint. Further, we assume that the set f free dimensins is the same fr all leaf pages. This issue is similar t the issue f glbal vs. lcal splitting fr the -kd tree; hence this assumptin is favrable t the R + tree. We rst cnsider the cases where bth the R + tree and the -kd tree have unsplit dimensins left, and then extraplate t the case where either the R + tree, r bth indices, have n unsplit dimensins. Table 1 summarizes the ntatin we will use in this sectin. 4.1 R + tree: analysis Recall that in the, the jin test is perfrmed fr pints in each leaf nde, as well as between pairs f leaf ndes that have an verlap when their MBRs are extended by. A naive apprach t perfrming the jin test wuld be t use the nested-lp jin. In ther wrds, fr each pint in 3 The traditinal splitting heuristic which tries t minimize bth the vlume and perimeter f the hyper-rectangles will result in the R + tree usually splitting the MBR n a dimensin which has nt been used fr splitting. 12

13 (a) 1-dimensinal bundary (b) 0-dimensinal bundary Figure 9: Eect f Screening left leaf nde, we examine all pints in right leaf nde. Hwever, as shwn earlier in Figure 2, the algrithm can rst screen the pints in each leaf ndes t see if they t within the extended MBR f the ther leaf nde. Let D r be the number f split dimensins used by the R + tree. Then the R + tree will have 2 D r pages, with T=2 D r pints in each. Let L m be the maximum number f pints in a leaf page. Since T=2 D r < L m, D r = dlg 2 (T=L m )e. Nte every leaf page falls within the extended MBR f every ther leaf nde, since they all meet at the mid-pint f the space. Thus fr each leaf page, the pints in every ther leaf page have t be screened. Since there are T=N r pages, and T pints have t be screened fr each page, the ttal number f screen tests is given by # Screen Tests T 2 =N r (1) Next, we lk at the number f jin tests. If tw leaf ndes have a (D r?k)-dimensinal cmmn bundary (amng the dimensins used fr splitting), the expected number f pints in each nde left after screening is N r (2) k, where N r is the average number f pints in a leaf nde. (Fr each dimensin which is nt a bundary, 2 f the remaining pints are left after screening, since the size f the page in each dimensin is 1/2.) This is illustrated in Figure 9. Fr the tw leaf ndes with a 1-dimensinal (0-dimensinal) cmmn bundary, there are 2 (4 2 ) f the pints left after screening n each nde. The number f jin tests fr tw leaf pages with a (D r?k)-dimensinal cmmn bundary wuld be (N r (2) k ) 2, using a nested-lp jin. If we rst srt the pints n ne f the dimensins, and use a srt-merge jin, the cst drps t (N r (2) k ) 2 2 = N 2 r (2) 2k+1 (2) We d nt cunt the time fr the srt, since we can srt all the pints n ne f the unsplit dimensins at the time the tree is built. Hence, if we knw the number f neighbr leaf ndes that have a cmmn (D r?k)-dimensinal bundary, we can estimate the number f jin tests. We can represent the psitin f each leaf page in D r -dimensinal space as a D r -dimensinal blean array, where \True" in the k-th dimensin wuld crrespnd t the nde being abve the mid-pint f the space in the k-th dimensin. If tw leaf pages had the same value fr k dimensins, they wuld have a k-dimensinal cmmn 13

14 bundary. Since there are has (2).! D r k! ways f chsing k dimensins that are the same, each leaf page D r neighbrs that have a k-dimensinal cmmn bundary. k We can nw cmpute the ttal number f jin tests, fr all (T=N r ) leaf pages, by plugging in # Jin Tests = (T=N r ) XD r i=1 D r i! N 2 r (2) 2i+1 = T N r XD r i=1 D r i! (2) 2i+1 Since is typically quite small, 2 wuld be cnsiderably smaller than D r. Hence we can ignre terms with 5 r higher pwers f, resulting in T N r D r 8 3 (3) The abve frmula des nt include the cst f a self-jin n the pints in each leaf page. Since this requires N 2 r 2 cmparisns per leaf page, and there are (T=N r ) pages, the ttal number f jin tests fr these self-jins is Cmbining (3) and (4), we get T N r 2 (4) # Jin Tests T N r 2 (1 + D r 4 2 ) (5) Since typically N r < (T=N r ) (the number f pints per pages times is less than the number f leaf pages), and D r 4 2 < 1, we get T N r 2 < T T=N r That is, the number f jin tests (Equatin 5) is less than the number f screen tests (Equatin 1). Smene might argue that nt perfrming screen test may imprve the cst f prximity jin. Assume that we d nt perfrm screen and perfrm srt-merge jin nly. Then, the number f jin tests between a pair f leaf ndes becmes N r N r 2. The cst f screening pints fr a pair f leaf pages is 2N r. Since N r > 1, the jin cst withut screening is mre expensive than the cst f screening kd tree: analysis Let the -kd tree have a depth f D e. Each leaf has N e T D e pints n average. Recall that there is n screen cst fr the -kd. If we use a srt-merge jin, the jin cst between a pair f leaf ndes is N e N e 2 14

15 We d nt cunt the time fr the srt since that can be dne at the time the tree is built (that is, nce per leaf nde, rather than nce per pair f leaf ndes within distance). A leaf nde in the -kd has at mst 3 D e? 1 neighbrs within distance. Nw, 3 D e? 1 (T=N e ) (3) D e, since (T=N e ) is the number f leaf ndes, and (3) D e the fractin f leaf ndes within distance. Thus the ttal number f jin tests is (T=N e ) [(T=N e ) (3) D e ] N e 2 2 = T 2 2 (3) D e We can simplify the frmula by multiplying by a fudge factr f 1.5 (at the cst f penalizing the -kd tree a little). Thus we get # Jin Tests T 2 (3) D e+1 (6) 4.3 Cmparisn f csts Since there are n screen csts fr the -kd tree, Equatin 6 shws the dminant factr behind the perfrmance f the -kd tree. We can nw cmpare this with the dminant factr fr the R + tree, the number f screen tests given in Equatin 1. Frm the tw equatins, the -kd tree will be faster than the R + tree when (3) De+1 < (1=N r ) (7) Plugging in sme typical values, N r = 50; = 0:05 int Equatin 7, the number f tests fr the -kd tree is cnsiderably smaller even fr D e = 2. Nte that the perfrmance f the R + tree cannt be imprved by increasing the size f leaf pages (thus decreasing 1=N r ), since the jin cst wuld becme dminant as the size increased. Equatin 7 ignres traversal cst, which is much lwer fr the -kd tree than the R + tree since the -kd des nt have t check fr intersectin f MBRs. Further, Equatin 7 ignres the number f jin tests fr the R + tree cmpletely and just cnsiders screen csts. On the ther hand, since the R + tree des nt partitin the space, but uses MBRS, the number f screens fr the R + tree is likely t be smewhat less than suggested by the abve frmulae. In Sectin 5.5, we give empirical results fr the number f jin tests fr the -kd tree, and the number f jin and screen tests fr the R + tree, while varying several parameters. These results crrelate well with the abve analysis. 4.4 Other Cases We nw cnsider the case where bth indices have n unsplit dimensins, fllwed by the case where nly the R + tree has unsplit dimensins. (Since the R + tree des unbiased splitting, it will always run ut f dimensins befre the -kd tree.) 15

16 If the -kd tree has n unsplit dimensins left, we expect the number f jin tests t be similar fr the -kd tree and the R + tree, since bth f them will partitin the space in a similar manner. Fr high-dimensinal data, we expect this case t be very rare. Fr example, with a f 0.1, and dimensinal data, there have t be mre than 11 pints befre this ccurs. If nly the R + tree has n unsplit dimensins left, the perfrmance gap will narrw cmpared t the case where bth trees have unsplit dimensins. As the R + tree lls the space, the screen cst becmes less dminant cmpared t the jin cst. Hwever, the sum f the csts is still higher than fr the -kd tree, since the R + tree still perates in a higher dimensinal space than the -kd tree. Anther way t lk at this case is t cnsider it being in between the case where bth trees having unsplit dimensins and case where bth trees having n unsplit dimensins. Since there is a gradual transitin, the perfrmance gap narrws till the number f jin plus screen tests becme cmparable when bth trees have n unsplit dimensins left. 5 Perfrmance Evaluatin We empirically cmpared the perfrmance f the -kd tree with bth and a srt-merge algrithm. The experiments were perfrmed n an IBM RS/ wrkstatin with a CPU clck rate f 66 MHz, 128 MB f main memry, and running AIX Data was stred n a lcal disk, with measured thrughput f abut 1.5 MB/sec. We rst describe the algrithms cmpared in Sectin 5.1, and the datasets used in experiments in Sectin 5.2. Next, we shw the perfrmance f the algrithms n synthetic and real-life datasets in Sectins 5.3 and 5.4 respectively. Finally, we explain the bserved perfrmance by lking at the number f jin tests and screen cunts in Sectin Algrithms -kd tree. We implemented the -kd tree algrithm described in Sectin 3.1. A leaf nde was cnverted t an internal nde (i.e. split) if its memry usage exceeded 4096 bytes. Hwever, if there were n dimensins left fr splitting, the leaf nde was allwed t exceed this limit. The executin times fr the -kd tree include the I/O cst f reading an external srted le cntaining the data pints, as well as the cst f building the data structure. Since the external le can be generated nce and reused fr dierent value f, the executin times d nt include the time t srt the external le. R + tree. Our experiments indicated that the R + tree was faster than the R tree fr prximity jins n a set f high-dimensinal pints. (Recall that the dierence between R + tree and R? tree is that R + tree des nt allw verlap between minimum bunding rectangles. Hence it reduces the number f verlapping leaf ndes t be cnsidered fr the spatial prximity jin, resulting in faster executin time.) We therefre used R + tree fr ur experiments. We used a page size f

17 -kd R + tree Srt-Merge Jin Cst Yes Yes Yes Build Cst Yes N { Srt Cst (rst dim.) N { N Table 2: Csts included in the executin times. Parameter Default Value Range f Values Number f Pints,000,000 t 1 millin Number f Dimensins 4 t 28 (jin distance) t 0.2 Range f Pints -1 t +1 -same- Distance Metric L 2 -nrm L 1, L 2, L1 nrms Table 3: Synthetic Data Parameters bytes. In ur experiments, we ensured that the R + tree always t in memry and a built R + tree was available in memry befre the jin executin began. Thus, the executin time fr R + tree des nt include any build time it nly includes CPU time fr main-memry jin. (Althugh this gives the R + tree an unfair advantage, we err n the cnservative side.) 2-level Srt-Merge. Cnsider a simple srt-merge algrithm, which reads the data frm a le srted n ne f the dimensins and perfrms the jin test n all pairs f pints whse values in the srt dimensin are clser than. We implemented a mre sphisticated versin f this algrithm, which reads a 2 chunk f the srted data int memry, further srts in memry this data n a secnd dimensin, and then perfrms the jin test n pairs f pints whse values in the secnd srt dimensin are clse than. The algrithm then drps the rst chunk frm memry and reads the next chunk, and s n. The executin times reprted fr this algrithm als d nt include the external srt time. Table 2 summarizes the csts included in the executin times fr each algrithm. 5.2 Data Sets and Perfrmance Metrics Synthetic Datasets. We generated tw types f synthetic datasets: unifrm and gaussian. The values in each dimensin were randmly generated in the range?1:0 t 1:0 with either unifrm r gaussian distributin. Fr the Gaussian distributin, the mean and the standard deviatin were 0 and 0.25 respectively. Table 3 shws the parameters fr the datasets, alng with their default values and the range f values fr which we cnducted experiments. Distance Functins We used L 1, L 2 and L1 as distance functins in ur experiments. The extended bunding rectangles btained by extending MBRs by dier slightly in R + tree depending n distance functins. Figure shws the extended bunding regins fr the L 1, L 2 and L1 nrms. 17

18 (a) L 1 (b) L 2 (c) L1 Figure : Bunding Regins extended by Executin Time (sec.) 0 Unifrm Distributin 2-level Srt-merge Executin Time (sec.) 00 0 Gaussian Distributin 2-level Srt-merge Epsiln Epsiln Figure 11: Perfrmance n Synthetic Data: Value The rectangles with slid line represents the MBR f a leaf nde and the dashed lines the extended bunding regins. This dierence in the regins cvered by the extended regins may result in a slightly dierent number f intersecting leaf ndes fr a given a leaf nde. Hwever, in the R-tree family f spatial indices, the selectin query is usually represented by rectangles t reduce the cst f traversing the index. Thus, the extended bunding rectangles t be used t traverse the index fr bth L 1 and L 2 becme the same as that fr L Results n Synthetic Data value. Figure 11 shws the results f varying frm 0.01 t 0.2, fr bth unifrm and gaussian data distributins. L 2 is used as distance metric. We did nt explre the behavir f the algrithms fr greater than 0.2 since the jin result becmes t large t be meaningful. Nte that the executin times are shwn n a lg scale. The -kd tree algrithm is typically arund 2 t 20 times faster than the ther algrithms. Fr lw values f (0.01), the 2-level srt-merge algrithm is quite eective. In fact, the srt-merge algrithm and the -kd algrithm d almst the same actins, since the -kd will nly have arund 2 levels (excluding the rt). Fr the gaussian distributin, the perfrmance gap between the -kd tree and the R + tree narrws fr high values f because the jin result is very large. 18

19 Executin Time (sec.) 00 0 L 1 -nrm 2-level Srt-merge Executin Time (sec.) 00 0 L1-nrm 2-level Srt-merge Epsiln Epsiln Figure 12: Perfrmance: Distance Metrics (Gaussian Distributin) Distance Metric. Figure 12 shws the results f varying fr the L 1 and L1 nrms fr the gaussian distributin. The results fr the same datasets fr the L 2 nrm were shwn in Figure 11. The relative perfrmance f the algrithms is almst identical fr the three distance metrics. Althugh nt shwn, we btained similar results fr the unifrm distributin, and in ur ther experiments as well. Hence we nly shw the results fr the L 2 -nrm in the remaining experiments. Number f Dimensins. Figure 13 shws the results f increasing the number f dimensins frm 4 t 28. Again, the executin times are shwn using a lg scale. The -kd algrithm is arund 5 t 19 times faster than the srt-merge algrithm. Fr 8 dimensins r higher, it is arund 3 t 47 times faster than the R + tree, the perfrmance gap increasing with the number f dimensins. Fr 4 dimensins, it is nly slightly faster, since there are enugh pints fr the -kd tree t be lled in all dimensins. Fr the R + tree, increasing the number f dimensins increases the verhead f traversing the index, as well as the number f neighbring leaf ndes and the cst f screening them. Hence the time increases dramatically when ging frm 4 t 28 dimensins. 4 Even the srt-merge algrithm perfrms better than the R + tree at higher dimensins. In cntrast, the executin time fr the -kd remains rughly cnstant as the number f dimensins increases. Number f Pints. T see the scale up f -kd tree, we varied the number f pints frm,000 t 1,000,000. The results are shwn in Figure 14. Fr R + tree, we d nt shw results fr 1,000,000 pints because the tree n lnger t in main memry. Nne f the algrithms have linear scale-up; but the srt-merge algrithms has smewhat wrse scaleup than the ther tw algrithms. Fr the gaussian distributin, the perfrmance advantage f the -kd tree cmpared t the R + tree remains fairly cnstant (as a percentage). Fr the unifrm distributin, the relative perfrmance advantage 4 The dip in the R + tree executin time when ging frm 4 t 8 dimensin fr the gaussian distributin is because f the decrease in jin result size. This eect is als nticeable fr the -kd tree, fr bth distributins. 19

20 0 Unifrm Distributin 2-level Srt-merge 00 Gaussian Distributin 2-level Srt-merge Executin Time (sec.) Executin Time (sec.) Dimensin Dimensin Figure 13: Perfrmance n Synthetic Data: Number f Dimensins Unifrm Distributin 2-level Srt-merge Gaussian Distributin 2-level Srt-merge Executin Time (sec.) 0 Executin Time (sec.) Number f Pints ( 000s) Number f Pints ( 000s) Figure 14: Perfrmance n Synthetic Data: Number f Pints f the -kd tree varies since the average depth f the -kd tree des nt increase gradually as the number f pints increases. Rather, it jumps suddenly, frm arund 3 t arund 4, etc. These transitins ccur between 20,000 and 50,000 pints, and between 500,000 and 750,000 pints. Nn-self-jins. Figure 15 shws the executin times fr a prximity jin between tw dierent datasets (generated with dierent randm seeds). The size f ne f the datasets was xed at,000 pints, and the size f the ther dataset was varied frm,000 pints dwn t 5,000 pints. Fr experiments where the secnd dataset had,000 pints r fewer, each experiment was run 5 times with dierent randm seeds fr the secnd dataset and the results averaged. With bth datasets at,000 pints, the perfrmance gap between the R + tree and the -kd tree is similar t that n a self-jin with 200,000 pints. As the size f the secnd dataset decreases, the perfrmance gap als decreases. The reasn is that the time t build the index is included fr the -kd tree, but nt fr the R + tree. 20

21 0 Unifrm Distributin 0 Gaussian Distributin Executin Time Executin Time Rati f Size f Tw Data Set Rati f Size f Tw Data Set Figure 15: Nn-self-jins 00 2-level Srt-merge 0 2-level Srt-merge Executin Time 0 Executin Time Epsiln Dimensin Figure 16: Perfrmance n Mutual Fund Data 5.4 Experiment with a Real-life Data Set We experimented with the fllwing real-life dataset. Similar Time Sequences Cnsider the prblem f nding similar time sequences. The algrithm prpsed in [ALSS95] rst nds similar \atmic" subsequences, and then stitches tgether the atmic subsequence matches t get similar subsequences r similar sequences. Each sequence is brken int atmic subsequences by using a sliding windw f size w. The atmic subsequences are then mapped t pints in a w-dimensinal space. The prblem f nding similar atmic subsequences nw crrespnds t the prblem f nding pairs f w-dimensinal pints within distance f each ther, using the L1 nrm. (See [ALSS95] fr the ratinale behind this apprach.) The time sequences in ur experiment were the daily clsing prices f 795 U.S. mutual funds, frm Jan 4, 1993 t March 3, There were arund 400,000 pints fr the experiment (since each sequence is brken using a sliding windw). The data was btained frm the MIT AI Labratries' Experimental Stck Market Data Server ( We varied the 21

22 windw size (i.e. dimensin) frm 8 t 16 and frm 0.05 t 0.2. Figure 16 shws the resulting executin times fr the three algrithms. The results are quite similar t thse btained n the synthetic dataset, with the -kd tree utperfrming the ther tw algrithms. 5.5 Number f jin and screen tests Figure 17 shws the number f jin tests and screen cunts fr the 3 algrithms. In general, the R + tree has fewer jin tests than the -kd tree, but cnsiderably mre screen tests. Ntice that the relative curves fr the jin tests fr the -kd and srt-merge, and the screen tests fr the R + tree, are very similar t the executin times shwn in Sectin 5.4. The relative number f jin and screen tests fr the R + tree, and the jin tests -kd tree are als as predicted by the analysis in Sectin 4. In particular, cnsider the graphs shwing the numbers f tests while varying the number f dimensins. At higher dimensins, the R + tree has a lt mre screen tests than jin tests. The gap decreases at lwer dimensins as the R + trees \lls ut" the space. Further, the number f jin tests fr the -kd tree is independent f the number f dimensins. As the number f pints increases, the number f jin tests fr the -kd tree increases smthly fr the gaussian data, since the average height f the tree increases smthly. Fr unifrm data, the number f jin tests actually decreases when ging frm 25,000 t 50,000 pints and frm 500,000 t 750,000 pints. The reasn is that the average depth f the -kd tree jumps frm arund 3 (excluding the rt) t arund 4 and frm 4 t 5, decreasing the number f jin tests. 5.6 Summary The -kd tree was typically 2 t 47 times faster than the R + tree n self-jins, with the perfrmance gap increasing with the number f dimensins. It was typically 2 t 20 times faster than the srtmerge. The 2-level srt-merge was usually slwer than R + tree. But fr high dimensins (> 15) r lw values f (0.01), it was faster than the R + tree. Fr nn-self-jins, the results were similar when the datasets being jined were nt f very dierent sizes. Fr datasets with dierent sizes (e.g. 1: rati), the -kd tree was still faster than the R + tree. But the perfrmance gap narrwed since we include the build time fr the -kd tree, but nt fr the R + tree. The distance metric did nt signicantly aect the results: the relative perfrmance f the algrithms was almst identical fr the L 1, L 2 and L1 nrms. 6 Biased R-trees We shwed that the -kd is a fast data structure fr high-dimensinal prximity jins. Since the R-tree family is a very ppular index structure, and many cmmercial systems have implemented 22

23 Jin Test and Screen Cunts 1e+09 1e+08 1e+07 1e Unifrm Distributin 2-level Srt-merge : # f Jin Test : # f Jin Test R+tree : # f Screen Test R+tree : # f Jin Test Jin Test and Screen Cunts 1e+ 1e+09 1e+08 1e+07 1e+06 Gaussian Distributin 2-level Srt-merge : # f Jin Test : # f Jin Test R+tree : # f Screen Test R+tree : # f Jin Test Epsiln Epsiln Jin Test and Screen Cunts 1e+ 1e+09 1e+08 1e+07 1e level Srt-merge : # f Jin Test : # f Jin Test R+tree : # f Screen Test R+tree : # f Jin Test Jin Test and Screen Cunts 1e+ 1e+09 1e+08 1e+07 1e+06 2-level Srt-merge : # f Jin Test : # f Jin Test R+tree : # f Screen Test R+tree : # f Jin Test Number f Pints ( 000s) Number f Pints ( 000s) Jin Test and Screen Cunts 1e+09 1e+08 1e+07 1e+06 2-level Srt-merge : # f Jin Test : # f Jin Test R+tree : # f Screen Test R+tree : # f Jin Test Jin Test and Screen Cunts 1e+09 1e+08 1e+07 1e+06 2-level Srt-merge : # f Jin Test : # f Jin Test R+tree : # f Screen Test R+tree : # f Jin Test Dimensin Dimensin Figure 17: Jin Test and Screen Cunts 23

24 120 Unifrm Distributin Gaussian Distributin Executin Time Executin Time Build Epsiln Build Epsiln Figure 18: Eect f using dierent build s R trees, we decided t explre the pssibility f incrprating sme the ideas used in the -kd tree t the R tree family. Althugh we lk at the R + tree in this sectin, we expect the results t als apply t ther members f the R tree family. S far, we built the -kd tree n the y, as required. Hwever, this is nt an ptin fr the R + tree since its build time is much higher. We rst examine the perfrmance f the -kd tree when built with a dierent value than the value used fr the prximity jin, t see if it still maintains its perfrmance advantage. 6.1 Using dierent build and jin values fr the -kd tree We lked at the result f building the -kd tree with ne value, say the build, and ding the jin with a dierent, say the jin. Figure 18 shws the executin time fr a xed jin f 0.1, fr dierent build values. The hrizntal line shws the time fr the fr the same jin. As expected, the best perfrmance fr the -kd tree ccurs when the build is the same as the jin. Hwever, its perfrmance is better than that fr the thrughut a wide range f build values. There are several eects which impact the shape f the graph. Build > 0.1 As the build increases, the number f jin test increases, since the vlume f the space in adjacent buckets increases. This results in a gradual increase in the verall executin time, fr bth distributins. Build < 0.1 Fr build frm 0.1 t arund 0.025, the dminant factr is again the number f jin tests. Assume the depth each leaf nde is k. When bth build and jin are 0.1, each leaf nde has 3 k?1 neighbrs within distance. Hwever, when the build value is a bit smaller than 0.1 (e.g ), there are 5 k?1 neighbrs within distance fr each leaf nde. Since the number f pints in each bucket is almst the same whether the build is 0.1 r 0.099, the ttal number 24

25 f jin tests shts up when the build is decreased slightly frm 0.1. As the build decreases further, the number f pints in each leaf nde decreases, while the number f neighbrs stays the same. Hence the number f jin tests, and the ttal executin time decrease frm build f thrugh When the build decreases frm 0.05 t 0.049, the number f neighbrs jumps again, frm 5 k?1 t 7 k?1, and the cycle cntinues. Fr the unifrm distributin, the average height f the tree decreases abruptly by arund 1 level when the build is arund Hence the number f jin tests increases. Fr the gaussian distributin, there are n abrupt changes in the level f the tree. Hence this pattern is clearly visible frm 0.1 t arund Build 0.1 Fr build belw 0.02, the dminant factr is the traversal cst. Fr the unifrm distributin, the number f leaf ndes increases threefld as the build drps frm 0.02 t Mre imprtantly, the number f neighbrs t lk at increases frm arund 3 t Thugh the number f jin tests des nt change dramatically, the verhead f making functin calls while traversing the tree increases the verall time. Fr the unifrm distributin, this eect is mitigated by the fact that the average height f the tree cmes dwn as the build decreases frm 0.02 t Hence the verall time actually cmes dwn befre increasing again. 6.2 Biased Splitting fr the R + tree Since the -kd tree retains its perfrmance advantage even if it is built with a dierent than the jin, we explre using as a parameter fr building the R + tree. We apply a biased splitting heuristic t the R + tree: the same dimensin is selected repeatedly fr splitting as lng as the length f this dimensin in the MBR is at least 2. Traditinal heuristics are used t decide the split pint in the split dimensin. If the length f the current split dimensin is less than 2, the split heuristic mves t the next dimensin cyclically. We call this mdied R + tree the biased R + tree. We cmpared tw versins f R + -tree, ne with a traditinal splitting algrithm (that is, unbiased splitting) and the ther with biased splitting algrithm. Figure 19 shws that the average number f leaf ndes within distance fr the R + tree and the biased R + tree, as the number f dimensins increases. Biased splitting results in arund 5 t 25 times fewer intersecting pages, with the rati increasing with the number f dimensins. Figure 20 shws the executin times fr the biased R + tree. As expected, the perfrmance is mid-way between the perfrmance f the R + tree and the -kd tree. There are tw main reasns fr the biased R + tree remaining slwer than the -kd tree. First, the traversal cst is much higher fr the biased R + tree since the extended regins fr each leaf page still have t be checked against the MBRs in internal ndes. Secnd, the biased splitting heuristic stps splitting at 2, cmpared t fr the -kd tree. We btained similar results when varying and the number f pints. Since the biased R + tree nly uses a few dimensins fr splitting at each level, it need nt stre 25

High-dimensional Similarity Joins

High-dimensional Similarity Joins High-dimensinal Similarity Jins Kyusek Shim æ Ramakrishnan Srikant Rakesh Agrawal IBM Almaden Research Center 650 Harry Rad, San Jse, CA 95120 Abstract Many emerging data mining applicatins require a similarity

More information

High-dimensional Proximity Joins. Kyuseok Shim æ Ramakrishnan Srikant Rakesh Agrawal. IBM Almaden Research Center. Abstract

High-dimensional Proximity Joins. Kyuseok Shim æ Ramakrishnan Srikant Rakesh Agrawal. IBM Almaden Research Center. Abstract High-dimensinal Prximity Jins Kyusek Shim æ Ramakrishnan Srikant Rakesh Agrawal IBM Almaden Research Center 650 Harry Rad, San Jse, CA 95120 Abstract Many emerging data mining applicatins require a prximity

More information

Using SPLAY Tree s for state-full packet classification

Using SPLAY Tree s for state-full packet classification Curse Prject Using SPLAY Tree s fr state-full packet classificatin 1- What is a Splay Tree? These ntes discuss the splay tree, a frm f self-adjusting search tree in which the amrtized time fr an access,

More information

1 Version Spaces. CS 478 Homework 1 SOLUTION

1 Version Spaces. CS 478 Homework 1 SOLUTION CS 478 Hmewrk SOLUTION This is a pssible slutin t the hmewrk, althugh there may be ther crrect respnses t sme f the questins. The questins are repeated in this fnt, while answers are in a mnspaced fnt.

More information

A Fast Algorithm for high-dimensional Similarity Joins. Kyuseok Shim Ramakrishnan Srikant Rakesh Agrawal. IBM Almaden Research Center.

A Fast Algorithm for high-dimensional Similarity Joins. Kyuseok Shim Ramakrishnan Srikant Rakesh Agrawal. IBM Almaden Research Center. A Fast Algrithm fr high-dimensinal Similarity Jins Kyusek Shim Ramakrishnan Srikant Rakesh Agrawal IBM Almaden Research Center 650 Harry Rad, San Jse, CA 95120 Abstract Many emerging data mining applicatins

More information

Instance Based Learning

Instance Based Learning Instance Based Learning Vibhav Ggate The University f Texas at Dallas Readings: Mitchell, Chapter 8 surces: curse slides are based n material frm a variety f surces, including Tm Dietterich, Carls Guestrin,

More information

INSTALLING CCRQINVOICE

INSTALLING CCRQINVOICE INSTALLING CCRQINVOICE Thank yu fr selecting CCRQInvice. This dcument prvides a quick review f hw t install CCRQInvice. Detailed instructins can be fund in the prgram manual. While this may seem like a

More information

Computer Organization and Architecture

Computer Organization and Architecture Campus de Gualtar 4710-057 Braga UNIVERSIDADE DO MINHO ESCOLA DE ENGENHARIA Departament de Infrmática Cmputer Organizatin and Architecture 5th Editin, 2000 by William Stallings Table f Cntents I. OVERVIEW.

More information

1 Binary Trees and Adaptive Data Compression

1 Binary Trees and Adaptive Data Compression University f Illinis at Chicag CS 202: Data Structures and Discrete Mathematics II Handut 5 Prfessr Rbert H. Slan September 18, 2002 A Little Bttle... with the wrds DRINK ME, (r Adaptive data cmpressin

More information

Scatter Search And Bionomic Algorithms For The Aircraft Landing Problem

Scatter Search And Bionomic Algorithms For The Aircraft Landing Problem Scatter Search And Binmic Algrithms Fr The Aircraft Landing Prblem J. E. Beasley Mathematical Sciences Brunel University Uxbridge UB8 3PH United Kingdm http://peple.brunel.ac.uk/~mastjjb/jeb/jeb.html Abstract:

More information

Last time: search strategies

Last time: search strategies Last time: search strategies Uninfrmed: Use nly infrmatin available in the prblem frmulatin Breadth-first Unifrm-cst Depth-first Depth-limited Iterative deepening Infrmed: Use heuristics t guide the search

More information

$ARCSIGHT_HOME/current/user/agent/map. The files are named in sequential order such as:

$ARCSIGHT_HOME/current/user/agent/map. The files are named in sequential order such as: Lcatin f the map.x.prperties files $ARCSIGHT_HOME/current/user/agent/map File naming cnventin The files are named in sequential rder such as: Sme examples: 1. map.1.prperties 2. map.2.prperties 3. map.3.prperties

More information

CS510 Concurrent Systems Class 2. A Lock-Free Multiprocessor OS Kernel

CS510 Concurrent Systems Class 2. A Lock-Free Multiprocessor OS Kernel CS510 Cncurrent Systems Class 2 A Lck-Free Multiprcessr OS Kernel The Synthesis kernel A research prject at Clumbia University Synthesis V.0 ( 68020 Uniprcessr (Mtrla N virtual memry 1991 - Synthesis V.1

More information

Tutorial 5: Retention time scheduling

Tutorial 5: Retention time scheduling SRM Curse 2014 Tutrial 5 - Scheduling Tutrial 5: Retentin time scheduling The term scheduled SRM refers t measuring SRM transitins nt ver the whle chrmatgraphic gradient but nly fr a shrt time windw arund

More information

Hierarchical Classification of Amazon Products

Hierarchical Classification of Amazon Products Hierarchical Classificatin f Amazn Prducts Bin Wang Stanfrd University, bwang4@stanfrd.edu Shaming Feng Stanfrd University, superfsm@ stanfrd.edu Abstract - This prjects prpsed a hierarchical classificatin

More information

Administrativia. Assignment 1 due tuesday 9/23/2003 BEFORE midnight. Midterm exam 10/09/2003. CS 561, Sessions 8-9 1

Administrativia. Assignment 1 due tuesday 9/23/2003 BEFORE midnight. Midterm exam 10/09/2003. CS 561, Sessions 8-9 1 Administrativia Assignment 1 due tuesday 9/23/2003 BEFORE midnight Midterm eam 10/09/2003 CS 561, Sessins 8-9 1 Last time: search strategies Uninfrmed: Use nly infrmatin available in the prblem frmulatin

More information

Lab 5 Sorting with Linked Lists

Lab 5 Sorting with Linked Lists UNIVERSITY OF CALIFORNIA, SANTA CRUZ BOARD OF STUDIES IN COMPUTER ENGINEERING CMPE13/L: INTRODUCTION TO PROGRAMMING IN C WINTER 2013 Lab 5 Srting with Linked Lists Intrductin Reading This lab intrduces

More information

Please contact technical support if you have questions about the directory that your organization uses for user management.

Please contact technical support if you have questions about the directory that your organization uses for user management. Overview ACTIVE DATA CALENDAR LDAP/AD IMPLEMENTATION GUIDE Active Data Calendar allws fr the use f single authenticatin fr users lgging int the administrative area f the applicatin thrugh LDAP/AD. LDAP

More information

A FRAMEWORK FOR PROCESSING K-BEST SITE QUERY

A FRAMEWORK FOR PROCESSING K-BEST SITE QUERY Internatinal Jurnal f Database Management Systems ( IJDMS ) Vl., N., Octber 0 A FRAMEWORK FOR PROCESSING K-BEST SITE QUERY Yuan-K Huang * and Lien-Fa Lin Department f Infrmatin Cmmunicatin Ka-Yuan University;

More information

Due Date: Lab report is due on Mar 6 (PRA 01) or Mar 7 (PRA 02)

Due Date: Lab report is due on Mar 6 (PRA 01) or Mar 7 (PRA 02) Lab 3 Packet Scheduling Due Date: Lab reprt is due n Mar 6 (PRA 01) r Mar 7 (PRA 02) Teams: This lab may be cmpleted in teams f 2 students (Teams f three r mre are nt permitted. All members receive the

More information

3.1 QUADRATIC FUNCTIONS IN VERTEX FORM

3.1 QUADRATIC FUNCTIONS IN VERTEX FORM 3.1 QUADRATIC FUNCTIONS IN VERTEX FORM PC0 T determine the crdinates f the vertex, the dmain and range, the axis f symmetry, the x and y intercepts and the directin f pening f the graph f f(x)=a(x p) +

More information

You need to be able to define the following terms and answer basic questions about them:

You need to be able to define the following terms and answer basic questions about them: CS440/ECE448 Fall 2016 Midterm Review Yu need t be able t define the fllwing terms and answer basic questins abut them: Intr t AI, agents and envirnments Pssible definitins f AI, prs and cns f each Turing

More information

In-Class Exercise. Hashing Used in: Hashing Algorithm

In-Class Exercise. Hashing Used in: Hashing Algorithm In-Class Exercise Hashing Used in: Encryptin fr authenticatin Hash a digital signature, get the value assciated with the digital signature,and bth are sent separately t receiver. The receiver then uses

More information

Studio Software Update 7.7 Release Notes

Studio Software Update 7.7 Release Notes Studi Sftware Update 7.7 Release Ntes Summary: Previus Studi Release: 2013.10.17/2015.01.07 All included Studi applicatins have been validated fr cmpatibility with previusly created Akrmetrix Studi file

More information

PAGE NAMING STRATEGIES

PAGE NAMING STRATEGIES PAGE NAMING STRATEGIES Naming Yur Pages in SiteCatalyst May 14, 2007 Versin 1.1 CHAPTER 1 1 Page Naming The pagename variable is used t identify each page that will be tracked n the web site. If the pagename

More information

Computer Organization and Architecture

Computer Organization and Architecture Campus de Gualtar 4710-057 Braga UNIVERSIDADE DO MINHO ESCOLA DE ENGENHARIA Departament de Infrmática Cmputer Organizatin and Architecture 5th Editin, 2000 by William Stallings Table f Cntents I. OVERVIEW.

More information

UiPath Automation. Walkthrough. Walkthrough Calculate Client Security Hash

UiPath Automation. Walkthrough. Walkthrough Calculate Client Security Hash UiPath Autmatin Walkthrugh Walkthrugh Calculate Client Security Hash Walkthrugh Calculate Client Security Hash Start with the REFramewrk template. We start ff with a simple implementatin t demnstrate the

More information

TRAINING GUIDE. Overview of Lucity Spatial

TRAINING GUIDE. Overview of Lucity Spatial TRAINING GUIDE Overview f Lucity Spatial Overview f Lucity Spatial In this sessin, we ll cver the key cmpnents f Lucity Spatial. Table f Cntents Lucity Spatial... 2 Requirements... 2 Setup... 3 Assign

More information

CSE 361S Intro to Systems Software Lab #2

CSE 361S Intro to Systems Software Lab #2 Due: Thursday, September 22, 2011 CSE 361S Intr t Systems Sftware Lab #2 Intrductin This lab will intrduce yu t the GNU tls in the Linux prgramming envirnment we will be using fr CSE 361S this semester,

More information

Design Patterns. Collectional Patterns. Session objectives 11/06/2012. Introduction. Composite pattern. Iterator pattern

Design Patterns. Collectional Patterns. Session objectives 11/06/2012. Introduction. Composite pattern. Iterator pattern Design Patterns By Võ Văn Hải Faculty f Infrmatin Technlgies HUI Cllectinal Patterns Sessin bjectives Intrductin Cmpsite pattern Iteratr pattern 2 1 Intrductin Cllectinal patterns primarily: Deal with

More information

BI Publisher TEMPLATE Tutorial

BI Publisher TEMPLATE Tutorial PepleSft Campus Slutins 9.0 BI Publisher TEMPLATE Tutrial Lessn T2 Create, Frmat and View a Simple Reprt Using an Existing Query with Real Data This tutrial assumes that yu have cmpleted BI Publisher Tutrial:

More information

Contents: Module. Objectives. Lesson 1: Lesson 2: appropriately. As benefit of good. with almost any planning. it places on the.

Contents: Module. Objectives. Lesson 1: Lesson 2: appropriately. As benefit of good. with almost any planning. it places on the. 1 f 22 26/09/2016 15:58 Mdule Cnsideratins Cntents: Lessn 1: Lessn 2: Mdule Befre yu start with almst any planning. apprpriately. As benefit f gd T appreciate architecture. it places n the understanding

More information

Chapter 6: Lgic Based Testing LOGIC BASED TESTING: This unit gives an indepth verview f lgic based testing and its implementatin. At the end f this unit, the student will be able t: Understand the cncept

More information

Project #1 - Fraction Calculator

Project #1 - Fraction Calculator AP Cmputer Science Liberty High Schl Prject #1 - Fractin Calculatr Students will implement a basic calculatr that handles fractins. 1. Required Behavir and Grading Scheme (100 pints ttal) Criteria Pints

More information

ClassFlow Administrator User Guide

ClassFlow Administrator User Guide ClassFlw Administratr User Guide ClassFlw User Engagement Team April 2017 www.classflw.cm 1 Cntents Overview... 3 User Management... 3 Manual Entry via the User Management Page... 4 Creating Individual

More information

TL 9000 Quality Management System. Measurements Handbook. SFQ Examples

TL 9000 Quality Management System. Measurements Handbook. SFQ Examples Quality Excellence fr Suppliers f Telecmmunicatins Frum (QuEST Frum) TL 9000 Quality Management System Measurements Handbk Cpyright QuEST Frum Sftware Fix Quality (SFQ) Examples 8.1 8.1.1 SFQ Example The

More information

Stealing passwords via browser refresh

Stealing passwords via browser refresh Stealing passwrds via brwser refresh Authr: Karmendra Khli [karmendra.khli@paladin.net] Date: August 07, 2004 Versin: 1.1 The brwser s back and refresh features can be used t steal passwrds frm insecurely

More information

Exercise 4: Working with tabular data Exploring infant mortality in the 1900s

Exercise 4: Working with tabular data Exploring infant mortality in the 1900s Exercise 4: Wrking with tabular data Explring infant mrtality in the 1900s Backgrund Althugh peple tend t think abut GIS as being primarily cncerned with mapping. It is better thught f as a type f database

More information

McGill University School of Computer Science COMP-206. Software Systems. Due: September 29, 2008 on WEB CT at 23:55.

McGill University School of Computer Science COMP-206. Software Systems. Due: September 29, 2008 on WEB CT at 23:55. Schl f Cmputer Science McGill University Schl f Cmputer Science COMP-206 Sftware Systems Due: September 29, 2008 n WEB CT at 23:55 Operating Systems This assignment explres the Unix perating system and

More information

B Tech Project First Stage Report on

B Tech Project First Stage Report on B Tech Prject First Stage Reprt n GPU Based Image Prcessing Submitted by Sumit Shekhar (05007028) Under the guidance f Prf Subhasis Chaudhari 1. Intrductin 1.1 Graphic Prcessr Units A graphic prcessr unit

More information

UiPath Automation. Walkthrough. Walkthrough Calculate Client Security Hash

UiPath Automation. Walkthrough. Walkthrough Calculate Client Security Hash UiPath Autmatin Walkthrugh Walkthrugh Calculate Client Security Hash Walkthrugh Calculate Client Security Hash Start with the REFramewrk template. We start ff with a simple implementatin t demnstrate the

More information

Eastern Mediterranean University School of Computing and Technology Information Technology Lecture2 Functions

Eastern Mediterranean University School of Computing and Technology Information Technology Lecture2 Functions Eastern Mediterranean University Schl f Cmputing and Technlgy Infrmatin Technlgy Lecture2 Functins User Defined Functins Why d we need functins? T make yur prgram readable and rganized T reduce repeated

More information

Performance of VSA in VMware vsphere 5

Performance of VSA in VMware vsphere 5 Perfrmance f VSA in VMware vsphere 5 Perfrmance Study TECHNICAL WHITE PAPER Table f Cntents Intrductin... 3 Executive Summary... 3 Test Envirnment... 3 Key Factrs f VSA Perfrmance... 4 Cmmn Strage Perfrmance

More information

Retrieval Effectiveness Measures. Overview

Retrieval Effectiveness Measures. Overview Retrieval Effectiveness Measures Vasu Sathu 25th March 2001 Overview Evaluatin in IR Types f Evaluatin Retrieval Perfrmance Evaluatin Measures f Retrieval Effectiveness Single Valued Measures Alternative

More information

Two Dimensional Truss

Two Dimensional Truss Tw Dimensinal Truss Intrductin This tutrial was created using ANSYS 7.0 t slve a simple 2D Truss prblem. This is the first f fur intrductry ANSYS tutrials. Prblem Descriptin Determine the ndal deflectins,

More information

Lab 4. Name: Checked: Objectives:

Lab 4. Name: Checked: Objectives: Lab 4 Name: Checked: Objectives: Learn hw t test cde snippets interactively. Learn abut the Java API Practice using Randm, Math, and String methds and assrted ther methds frm the Java API Part A. Use jgrasp

More information

Iteration Part 2. Review: Iteration [Part 1] Flow charts for two loop constructs. Review: Syntax of loops. while continuation_condition : statement1

Iteration Part 2. Review: Iteration [Part 1] Flow charts for two loop constructs. Review: Syntax of loops. while continuation_condition : statement1 Review: Iteratin [Part 1] Iteratin Part 2 CS111 Cmputer Prgramming Department f Cmputer Science Wellesley Cllege Iteratin is the repeated executin f a set f statements until a stpping cnditin is reached.

More information

User Manual for. Version: copyright by PHOENIX Showcontroller GmbH & Co.KG - Boris Bollinger GERMANY

User Manual for. Version: copyright by PHOENIX Showcontroller GmbH & Co.KG - Boris Bollinger GERMANY User Manual fr Versin: 4.0 17.10.2011 cpyright by PHOENIX Shwcntrller GmbH & C.KG - Bris Bllinger GERMANY What s New Versin 2.0 This versin mainly aims t imprve the wrkflw and the utput f the raster image

More information

WEB LAB - Subset Extraction

WEB LAB - Subset Extraction WEB LAB - Subset Extractin Fall 2005 Authrs: Megha Siddavanahalli Swati Singhal Table f Cntents: Sl. N. Tpic Page N. 1 Abstract 2 2 Intrductin 2 3 Backgrund 2 4 Scpe and Cnstraints 3 5 Basic Subset Extractin

More information

TRAINING GUIDE. Lucity Mobile

TRAINING GUIDE. Lucity Mobile TRAINING GUIDE The Lucity mbile app gives users the pwer f the Lucity tls while in the field. They can lkup asset infrmatin, review and create wrk rders, create inspectins, and many mre things. This manual

More information

24-4 Image Formation by Thin Lenses

24-4 Image Formation by Thin Lenses 24-4 Image Frmatin by Thin Lenses Lenses, which are imprtant fr crrecting visin, fr micrscpes, and fr many telescpes, rely n the refractin f light t frm images. As with mirrrs, we draw ray agrams t help

More information

FIREWALL RULE SET OPTIMIZATION

FIREWALL RULE SET OPTIMIZATION Authr Name: Mungle Mukupa Supervisr : Mr Barry Irwin Date : 25 th Octber 2010 Security and Netwrks Research Grup Department f Cmputer Science Rhdes University Intrductin Firewalls have been and cntinue

More information

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Yu will learn the fllwing in this lab: The UNIVERSITY f NORTH CAROLINA at CHAPEL HILL Cmp 541 Digital Lgic and Cmputer Design Spring 2016 Lab Prject (PART A): A Full Cmputer! Issued Fri 4/8/16; Suggested

More information

Adverse Action Letters

Adverse Action Letters Adverse Actin Letters Setup and Usage Instructins The FRS Adverse Actin Letter mdule was designed t prvide yu with a very elabrate and sphisticated slutin t help autmate and handle all f yur Adverse Actin

More information

STUDIO DESIGNER. Design Projects Basic Participant

STUDIO DESIGNER. Design Projects Basic Participant Design Prjects Basic Participant Thank yu fr enrlling in Design Prjects 2 fr Studi Designer. Please feel free t ask questins as they arise. If we start running shrt n time, we may hld ff n sme f them and

More information

If you have any questions that are not covered in this manual, we encourage you to contact us at or send an to

If you have any questions that are not covered in this manual, we encourage you to contact us at or send an  to Overview Welcme t Vercity, the ESS web management system fr rdering backgrund screens and managing the results. Frm any cmputer, yu can lg in and access yur applicants securely, rder a new reprt, and even

More information

of Prolog An Overview 1.1 An example program: defining family relations

of Prolog An Overview 1.1 An example program: defining family relations An Overview f Prlg This chaptereviews basic mechanisms f Prlg thrugh an example prgram. Althugh the treatment is largely infrmal many imprtant cncepts are intrduced. 1.1 An example prgram: defining family

More information

Data Miner Platinum. DataMinerPlatinum allows you to build custom reports with advanced queries. Reports > DataMinerPlatinum

Data Miner Platinum. DataMinerPlatinum allows you to build custom reports with advanced queries. Reports > DataMinerPlatinum Data Miner Platinum DataMinerPlatinum allws yu t build custm reprts with advanced queries. Reprts > DataMinerPlatinum Click Add New Recrd. Mve thrugh the tabs alng the tp t build yur reprt, with the end

More information

On the road again. The network layer. Data and control planes. Router forwarding tables. The network layer data plane. CS242 Computer Networks

On the road again. The network layer. Data and control planes. Router forwarding tables. The network layer data plane. CS242 Computer Networks On the rad again The netwrk layer data plane CS242 Cmputer Netwrks The netwrk layer The transprt layer is respnsible fr applicatin t applicatin transprt. The netwrk layer is respnsible fr hst t hst transprt.

More information

Outlook Web Application (OWA) Basic Training

Outlook Web Application (OWA) Basic Training Outlk Web Applicatin (OWA) Basic Training Requirements t use OWA Full Versin: Yu must use at least versin 7 f Internet Explrer, Safari n Mac, and Firefx 3.X. (Ggle Chrme r Internet Explrer versin 6, yu

More information

In Java, we can use Comparable and Comparator to compare objects.

In Java, we can use Comparable and Comparator to compare objects. Pririty Queues CS231 - Fall 2017 Pririty Queues In a pririty queue, things get inserted int the queue in rder f pririty Pririty queues cntain entries = {keys, values /** Interface fr a key- value pair

More information

Memory Hierarchy. Goal of a memory hierarchy. Typical numbers. Processor-Memory Performance Gap. Principle of locality. Caches

Memory Hierarchy. Goal of a memory hierarchy. Typical numbers. Processor-Memory Performance Gap. Principle of locality. Caches Memry Hierarchy Gal f a memry hierarchy Memry: hierarchy f cmpnents f varius speeds and capacities Hierarchy driven by cst and perfrmance In early days Primary memry = main memry Secndary memry = disks

More information

IBM Cognos TM1 Web Tips and Techniques

IBM Cognos TM1 Web Tips and Techniques Tip r Technique IBM Cgns TM1 Web Tips and Prduct(s): IBM Cgns TM1 Area f Interest: Develpment IBM Cgns TM1 Web Tips and 2 Cpyright Cpyright 2008 Cgns ULC (frmerly Cgns Incrprated). Cgns ULC is an IBM Cmpany.

More information

UNIT IV. 1. We have been looking mostly at the higher-level models of a database. At the conceptual or logical level the database was viewed as

UNIT IV. 1. We have been looking mostly at the higher-level models of a database. At the conceptual or logical level the database was viewed as UNIT IV Strage and File Structures: Overview f Physical Strage Media Magnetic Disks RAID Tertiary Strage Strage Access File Organizatin. Indexing and Hashing: Basic Cncepts Static Hashing Dynamic Hashing.

More information

- Replacement of a single statement with a sequence of statements(promotes regularity)

- Replacement of a single statement with a sequence of statements(promotes regularity) ALGOL - Java and C built using ALGOL 60 - Simple and cncise and elegance - Universal - Clse as pssible t mathematical ntatin - Language can describe the algrithms - Mechanically translatable t machine

More information

TN How to configure servers to use Optimise2 (ERO) when using Oracle

TN How to configure servers to use Optimise2 (ERO) when using Oracle TN 1498843- Hw t cnfigure servers t use Optimise2 (ERO) when using Oracle Overview Enhanced Reprting Optimisatin (als knwn as ERO and Optimise2 ) is a feature f Cntrller which is t speed up certain types

More information

Reporting Requirements Specification

Reporting Requirements Specification Cmmunity Mental Health Cmmn Assessment Prject OCAN 2.0 - ing Requirements Specificatin May 4, 2010 Versin 2.0.2 SECURITY NOTICE This material and the infrmatin cntained herein are prprietary t Cmmunity

More information

Lab 1 - Calculator. K&R All of Chapter 1, 7.4, and Appendix B1.2

Lab 1 - Calculator. K&R All of Chapter 1, 7.4, and Appendix B1.2 UNIVERSITY OF CALIFORNIA, SANTA CRUZ BOARD OF STUDIES IN COMPUTER ENGINEERING CMPE13/L: INTRODUCTION TO PROGRAMMING IN C SPRING 2012 Lab 1 - Calculatr Intrductin In this lab yu will be writing yur first

More information

Data Structure Interview Questions

Data Structure Interview Questions Data Structure Interview Questins A list f tp frequently asked Data Structure interview questins and answers are given belw. 1) What is Data Structure? Explain. Data structure is a way that specifies hw

More information

Distributed Data Structures xfs: Serverless Network File System

Distributed Data Structures xfs: Serverless Network File System Advanced Tpics in Cmputer Systems, CS262B Prf Eric A. Brewer Distributed Data Structures xfs: Serverless Netwrk File System February 3, 2004 I. DDS Gal: new persistent strage layer that is a better fit

More information

Using the Swiftpage Connect List Manager

Using the Swiftpage Connect List Manager Quick Start Guide T: Using the Swiftpage Cnnect List Manager The Swiftpage Cnnect List Manager can be used t imprt yur cntacts, mdify cntact infrmatin, create grups ut f thse cntacts, filter yur cntacts

More information

QUICK START GUIDE FOR THE TREB CONNECT INTERFACE

QUICK START GUIDE FOR THE TREB CONNECT INTERFACE QUICK START GUIDE FOR THE TREB CONNECT INTERFACE CONNECT is a jint venture f the Trnt Real Estate Bard, REALTORS Assciatin f Hamiltn-Burlingtn, the Lndn and St. Thmas Assciatin f REALTORS and the Ottawa

More information

Implementation of Authentication Mechanism for a Virtual File System

Implementation of Authentication Mechanism for a Virtual File System Implementatin f Authenticatin Mechanism fr a Virtual File System Prject fr Operating Systems Curse (CS 5204) Implemented by- Vinth Jagannathan Abhishek Ram Under the guidance f Dr Dennis Kafura Abstract

More information

High-dimensional Proximity Joins. IBM Almaden Research Center. the particular data-mining problem of ænding similar

High-dimensional Proximity Joins. IBM Almaden Research Center. the particular data-mining problem of ænding similar Parallel Algrithms fr High-dimensinal Prximity Jins Jhn C. Shafer æ Rakesh Agrawal IBM Almaden Research Center 650 Harry Rad, San Jse, CA 95120 Abstract We cnsider the prblem f parallelizing highdimensinal

More information

Overview of Supervised Learning

Overview of Supervised Learning ESL Chap2 Overview f Supervised Learning Overview f Supervised Learning Ntatin X: inputs, feature vectr, predictrs, independent variables. Generally X will be a vectr f p real values. Qualitative features

More information

InformationNOW Standardized Tests

InformationNOW Standardized Tests InfrmatinNOW Standardized Tests Abut this Guide This Quick Reference Guide prvides an verview f the ptins available fr tracking standardized tests in InfrmatinNOW. Creating Standardized Tests T create

More information

THE CASE FOR MOVING TO DISK-BASED ARCHIVES FOR ENTERPRISE IT

THE CASE FOR MOVING TO DISK-BASED ARCHIVES FOR ENTERPRISE IT THE CASE FOR MOVING TO DISK-BASED ARCHIVES FOR ENTERPRISE IT ABSTRACT This white paper discusses the ideas behind upgrading frm tape strage t disk-based archiving. Octber, 2016 WHITE PAPER The infrmatin

More information

CMU 15-7/381 CSPs. Teachers: Ariel Procaccia Emma Brunskill (THIS TIME) With thanks to Ariel Procaccia and other prior instructions for slides

CMU 15-7/381 CSPs. Teachers: Ariel Procaccia Emma Brunskill (THIS TIME) With thanks to Ariel Procaccia and other prior instructions for slides CMU 15-7/381 CSPs Teachers: Ariel Prcaccia Emma Brunskill (THIS TIME) With thanks t Ariel Prcaccia and ther prir instructins fr slides Class Scheduling Wes 4 mre required classes t graduate A: Algrithms

More information

02/02/2011. Chapter 3: Low Level Stuff: Storage. Hard Disk 101. Hard Disk 101 cont d: Significant Times

02/02/2011. Chapter 3: Low Level Stuff: Storage. Hard Disk 101. Hard Disk 101 cont d: Significant Times Chapter 3: Lw Level Stuff: Strage Hard Disk Operatin Index Files Hash Addressing Disk Pack cmprises Platters Platters always spinning 400 rpm head reads/writes at any time Track is made f Sectrs Term Blck

More information

EUROPEAN IP NETWORK NUMBER APPLICATION FORM & SUPPORTING NOTES

EUROPEAN IP NETWORK NUMBER APPLICATION FORM & SUPPORTING NOTES EUROPEAN IP NETWORK NUMBER APPLICATION FORM & SUPPORTING NOTES T whm it may cncern. Thank yu fr yur request fr an IP netwrk number. Please ensure that yu read the infrmatin belw carefully befre submitting

More information

A Framework for Local Supervised Dimensionality Reduction of High Dimensional Data

A Framework for Local Supervised Dimensionality Reduction of High Dimensional Data A Framewrk fr Lcal Supervised Dimensinality Reductin f High Dimensinal Data Charu C. Aggarwal IBM T. J. Watsn Research Center charu@us.ibm.cm Abstract High dimensinal data presents a challenge t the classificatin

More information

Using CppSim to Generate Neural Network Modules in Simulink using the simulink_neural_net_gen command

Using CppSim to Generate Neural Network Modules in Simulink using the simulink_neural_net_gen command Using CppSim t Generate Neural Netwrk Mdules in Simulink using the simulink_neural_net_gen cmmand Michael H. Perrtt http://www.cppsim.cm June 24, 2008 Cpyright 2008 by Michael H. Perrtt All rights reserved.

More information

Colored Image Watermarking Technique Based on HVS using HSV Color Model Piyush Kapoor 1, Krishna Kumar Sharma 1, S.S. Bedi 2, Ashwani Kumar 3, 1

Colored Image Watermarking Technique Based on HVS using HSV Color Model Piyush Kapoor 1, Krishna Kumar Sharma 1, S.S. Bedi 2, Ashwani Kumar 3, 1 ACEEE Int. J. n Netwrk Security, Vl. 02, N. 03, July 2011 Clred Image Watermarking Technique Based n HVS using HSV Clr Mdel Piyush Kapr 1, Krishna Kumar Sharma 1, S.S. Bedi 2, Ashwani Kumar 3, 1 Indian

More information

Extensible Query Processing in Starburst

Extensible Query Processing in Starburst Extensible Query Prcessing in Starburst Laura M. Haas, J.C. Freytag, G.M. Lhman, and H.Pirahesh IBM Almaden Research Center CS848 Instructr: David Tman Presented By Yunpeng James Liu Outline Intrductin

More information

CS4500/5500 Operating Systems Page Replacement Algorithms and Segmentation

CS4500/5500 Operating Systems Page Replacement Algorithms and Segmentation Operating Systems Page Replacement Algrithms and Segmentatin Yanyan Zhuang Department f Cmputer Science http://www.cs.uccs.edu/~yzhuang UC. Clrad Springs Ref. MOSE, OS@Austin, Clumbia, Rchester Recap f

More information

Test Pilot User Guide

Test Pilot User Guide Test Pilt User Guide Adapted frm http://www.clearlearning.cm Accessing Assessments and Surveys Test Pilt assessments and surveys are designed t be delivered t anyne using a standard web brwser and thus

More information

It has hardware. It has application software.

It has hardware. It has application software. Q.1 What is System? Explain with an example A system is an arrangement in which all its unit assemble wrk tgether accrding t a set f rules. It can als be defined as a way f wrking, rganizing r ding ne

More information

Simple Regression in Minitab 1

Simple Regression in Minitab 1 Simple Regressin in Minitab 1 Belw is a sample data set that we will be using fr tday s exercise. It lists the heights & weights fr 10 men and 12 wmen. Male Female Height (in) 69 70 65 72 76 70 70 66 68

More information

* The mode WheelWork starts in can be changed using command line options.

* The mode WheelWork starts in can be changed using command line options. OperatiOns Manual Overview Yur muse mst likely has a wheel n it. If yu lk at yur muse frm the side, that wheel rtates clckwise and cunterclckwise, like a knb. If yu lk at the muse frm the tp, the wheel

More information

Report Customization. Course Outline. Notes. Report Designer Tour. Report Modification

Report Customization. Course Outline. Notes. Report Designer Tour. Report Modification 1 8 5 0 G a t e w a y B u l e v a r d S u i t e 1 0 6 0 C n c r d, C A 9 4 5 2 0 T e l 9 2 5-681- 2 3 2 6 F a x 9 2 5-682- 2900 Reprt Custmizatin This curse cvers the prcess fr custmizing reprts using

More information

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Yu will learn the fllwing in this lab: The UNIVERSITY f NORTH CAROLINA at CHAPEL HILL Designing a mdule with multiple memries Designing and using a bitmap fnt Designing a memry-mapped display Cmp 541 Digital

More information

SW-G using new DryadLINQ(Argentia)

SW-G using new DryadLINQ(Argentia) SW-G using new DryadLINQ(Argentia) DRYADLINQ: Dryad is a high-perfrmance, general-purpse distributed cmputing engine that is designed t manage executin f large-scale applicatins n varius cluster technlgies,

More information

LAB 0 MATLAB INTRODUCTION

LAB 0 MATLAB INTRODUCTION Befre lab: LAB 0 MATLAB INTRODUCTION It is recmmended that yu d the secnd prblem n the hmewrk befre cming t lab, althugh that wn t be required r cllected in lab. It is als recmmended that yu read the Intrductin

More information

CSE 3320 Operating Systems Deadlock Jia Rao

CSE 3320 Operating Systems Deadlock Jia Rao CSE 3320 Operating Systems Deadlck Jia Ra Department f Cmputer Science and Engineering http://ranger.uta.edu/~jra Recap f the Last Class Race cnditins Mutual exclusin and critical regins Tw simple appraches

More information

Exercises: Plotting Complex Figures Using R

Exercises: Plotting Complex Figures Using R Exercises: Pltting Cmplex Figures Using R Versin 2017-11 Exercises: Pltting Cmplex Figures in R 2 Licence This manual is 2016-17, Simn Andrews. This manual is distributed under the creative cmmns Attributin-Nn-Cmmercial-Share

More information

Relius Documents ASP Checklist Entry

Relius Documents ASP Checklist Entry Relius Dcuments ASP Checklist Entry Overview Checklist Entry is the main data entry interface fr the Relius Dcuments ASP system. The data that is cllected within this prgram is used primarily t build dcuments,

More information

CSE 3320 Operating Systems Page Replacement Algorithms and Segmentation Jia Rao

CSE 3320 Operating Systems Page Replacement Algorithms and Segmentation Jia Rao CSE 0 Operating Systems Page Replacement Algrithms and Segmentatin Jia Ra Department f Cmputer Science and Engineering http://ranger.uta.edu/~jra Recap f last Class Virtual memry Memry verlad What if the

More information

Automatic imposition version 5

Automatic imposition version 5 Autmatic impsitin v.5 Page 1/9 Autmatic impsitin versin 5 Descriptin Autmatic impsitin will d the mst cmmn impsitins fr yur digital printer. It will autmatically d flders fr A3, A4, A5 r US Letter page

More information

Vijaya Nallari -Math 8 SOL TEST STUDY GUIDE

Vijaya Nallari -Math 8 SOL TEST STUDY GUIDE Name Perid SOL Test Date Vijaya Nallari -Math 8 SOL TEST STUDY GUIDE Highlighted with RED is Semester 1 and BLUE is Semester 2 8.1- Simplifying Expressins and Fractins, Decimals, Percents, and Scientific

More information

Datacenter Traffic Measurement and Classification

Datacenter Traffic Measurement and Classification Datacenter Traffic Measurement and Classificatin Speaker: Lin Wang Research Advisr: Biswanath Mukherjee Grup meeting 6/15/2017 Datacenter Traffic Measurement and Analysis Data Cllectin Cllect netwrk events

More information