Efficient Estimation of Inclusion Coefficient using HyperLogLog Sketches
|
|
- Gyles Flynn
- 5 years ago
- Views:
Transcription
1 Efficient Estiation of Inclusion Coefficient using HyperLogLog Sketches Azade Nazi, Bolin Ding, Vivek Narasayya, Surajit Chaudhuri Microsoft Research {aznazi, bolind, viveknar, ABSTRACT Efficiently estiating the inclusion coefficient the fraction of values of one colun that are contained in another colun is useful for tasks such as data profiling and foreign-key detection. We present a new estiator,, for inclusion coefficient based on Hyperloglog sketches that results in significantly lower error copared to the state-of-the art approach that uses Botto-k sketches. We evaluate the error of the estiator using experients on industry bencharks such as TPC-H and TPC-DS, and several realworld databases. As an independent contribution, we show how Hyperloglog sketches can be aintained increentally with data deletions using only a constant aount of additional eory. PVLDB Reference Forat: Azade Nazi, Bolin Ding, Vivek Narasayya, Surajit Chaudhuri. Efficient Estiation of Inclusion Coefficient using HyperLogLog Sketches. PVLDB, (): 97-9, 28. DOI: INTRODUCTION The discovery of all inclusion dependencies in a dataset is an iportant part of data profiling efforts. It is useful for tasks such as foreign-key detection and data integration [7, 28, 4, 22, 2]. However, due to issues in data quality such as issing values or ultiple representations of the sae value, it becoes iportant to relax the requireent of exact containent. Thus, coputing the fraction of values of one colun that are contained in another colun inclusion coefficient is of interest to these applications. When the database schea and data sizes are large, coputing the inclusion coefficient for any pair of coluns in the database can be both coputationally expensive and eory intensive. One approach for addressing this challenge is to estiate the inclusion coefficient using only bounded-eory sketches of the data. Given a fixed budget of eory per colun, these techniques scan the data once, and copute a data sketch that fits within the eory budget. For a given pair of coluns X and Y, the inclusion coefficient (X, Y ) is then estiated using only the sketches on coluns X and Y. One such approach, adopted in Zhang et al. [3] is to build a Botto-k sketch [2] on each colun, and develop an inclusion Perission to ake digital or hard copies of all or part of this work for personal or classroo use is granted without fee provided that copies are not ade or distributed for profit or coercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific perission and/or a fee. Articles fro this volue were invited to present their results at The 44th International Conference on Very Large Data Bases, August 28, Rio de Janeiro, Brazil. Proceedings of the VLDB Endowent, Vol., No. Copyright 28 VLDB Endowent /8/6... $.. DOI: coefficient estiator using these sketches. Intuitively, Botto-k sketches have high accuracy for inclusion coefficient estiation when the nuber of distinct values in both X and Y are saller than k since the sketches effectively behave like hash tables of the respective coluns. However, as we show epirically in this paper, one of the liitations of using Botto-k sketches for estiating inclusion coefficient is that for given eory budget (of k values), as the cardinality (i.e. nuber of distinct values) of colun X or Y increases beyond k, the estiation error becoes large. Additionally, Botto-k sketches are not aenable to increental aintenance in situations where data is deleted. For instance, in data warehousing scenarios, it is not uncoon for recent data to be added and older data to be reoved fro the database. Developing an estiator with low error for inclusion coefficient using bounded-eory sketches is challenging. We first establish a hardness result showing that any estiator that relies only on sketches with bounded eory ust incur unbounded error in certain cases. Intuitively, the difficult cases are when colun X has sall cardinality and colun Y has large cardinality or vice-versa. Furtherore, as we report in our epirical evaluation, these difficult cases appear to be quite coon in the real-world databases that we have studied. We develop a new estiator for inclusion coefficient (Binoial Mean Lookup) based on Hyperloglog sketches [4]. Hyperloglog (HLL) sketches can be coputed efficiently and within a given eory budget, requiring invocation of only a single hash function for each value in the colun. HLL sketches are becoing increasingly popular for estiating cardinality in different applications, and are even being adopted in coercial database engines e.g., [6] uses Hyperloglog to support approxiate distinct count functionality. The ain idea behind the estiator is a theoretical result that establishes a apping fro the inclusion coefficient (X, Y ) to the probability that the value of a bucket in the HLL sketch of colun Y is greater than the value of the corresponding bucket in the HLL sketch of colun X., is based on axiu likelihood estiation (MLE) ethod. It observes the nuber of buckets in the HLL sketch of colun Y whose HLL value is greater than the HLL value of the corresponding bucket in colun X, and it returns the value of (X, Y ) that axiizes the likelihood of this observation. As a by-product of our estiation technique, we are also able to provide a bound on the error. This error bound is data-dependent, i.e. it is specific to the pair of coluns X and Y. Such an error bound can be valuable to applications that consue the estiates, and is not provided by prior techniques. Hyperloglog sketches can be aintained increentally in the presence of insertions. An independent contribution of this paper is a technique for increentally aintaining a Hyperloglog sketch in the presence of data deletions with a constant eory overhead. 97
2 The key idea is that each bucket in an HLL sketch always holds an integer value less than a constant `, where ` is the nuber of bits of the hash value. By aintaining a ax-heap of constant size (at ost `) for each bucket we are able to support increental deletion. We show through experients on real-world databases and industry benchark TPC-H and TPC-DS databases that has significantly lower overall error than the approach using Bottok sketches [3], particularly in cases where at least one of the coluns has large cardinality. For cases where both coluns have sall cardinality, the accuracy of both estiators are siilar. For instance, in two real-world databases where there are any coluns with sall and large cardinality, the average error using Botto-k sketches is.3 and.59 respectively, whereas the corresponding errors for are only. and.4. For the other two realworld databases where ost coluns have sall cardinality the average error using Botto-k sketches is.5 and.2 respectively, whereas the corresponding errors for are.6 and.4. Finally, we consider one iportant application of inclusion coefficients, naely the proble of foreign-key (FK) detection in a database. All prior work on FK detection [29, 3, ] relies on exact inclusion coefficients to prune the FK candidates. We show epirically on several real-world and benchark databases that the estiation error of is acceptable for these FK detection techniques. In other word, replacing the exact inclusion coefficient with an estiate obtained via the estiator has no noticeable ipact on the precision and recall of these FK detection algoriths. In suary, this paper akes the following contributions: We establish a hardness result for inclusion coefficient estiation using bounded-eory sketches. We develop an MLE-based estiation algorith called (Binoial Mean Lookup) for inclusion coefficient estiation based on Hyperloglog sketches [4]. We prove the correctness of our algorith and the error bound of our estiator. We show how, with a constant eory overhead, Hyperloglog sketches can be extended to support increental deletion. We ipleent our inclusion coefficient estiator () on Cosos [], a distributed, Big Data engine that is extensively used within Microsoft for analyzing large data sets; and evaluate the effectiveness of using several synthetic and real-world datasets. We easure the precision and recall of two existing foreign-key detection techniques when using inclusion coefficient estiates rather than the exact inclusion coefficients. The rest of the paper is organized as follows. We present a hardness result for inclusion coefficient estiation using sketches and introduce background of Hyperloglog sketch construction in 2. We describe the estiator for inclusion coefficient and its error analysis in 3. The extension to support increental deletion is presented in 4. We describe the results of our experiental evaluation of inclusion coefficient estiation and its application to FK detection in 5. We discuss related work in 6 and conclude in PROBLEM DEFINITION AND BACKGROUND Let database D be the collection of tables T, where the set of all coluns in tables T are C. Let n be the total nuber of coluns ( C = n). For each colun X 2C, the set of all its possible values is called the doain of X, denoted by do(x). We use X[i] as the value of colun X for tuple i. We forally define the inclusion coefficient estiation proble and we discuss the hardness result. 2. Inclusion Coefficient Estiation and Hardness Result Inclusion coefficient is defined between two sets of values, and is used to easure the fraction of values of one set that are contained in the other set. DEFINITION. (Inclusion Coefficient) Given two coluns/sets, X and Y, the inclusion coefficient of X and Y is defined as: (X, Y )= X \ Y, X 6= () X where represents the nuber of distinct values in a set. If X is fully covered by Y (X Y ), (X, Y )=; otherwise apple (X, Y ) <. Note that (X, Y ) is asyetric: in general, (X, Y ) is not equal to (Y,X). In tasks such as foreign key detection and data profiling, we need to calculate inclusion coefficients for any pairs of coluns, which could be too expensive (in ters of both tie and eory) for large datasets. As a trade-off between accuracy and perforance, the idea is to construct a sketch (or a copact data structure) for each colun C 2Cby scanning the data once, and estiate the inclusion coefficient between any pair of coluns using their sketches. Such sketching and estiation techniques are useful in two scenarios: i) coluns are too large to fit into eory; and ii) we want to copute inclusion coefficients for any pairs of coluns (e.g., there are n coluns and inclusion coefficients need to be calculated for all the n(n )/2 pairs). PROBLEM. (Estiating Inclusion Coefficient using sketches) For each colun C 2C, we construct a sketch S C by scanning C once. Then for any two coluns X and Y, we want to derive an estiator ˆ(S X, S Y ) to (X, Y ), by accessing only the two sketches. In the rest part, we write ˆ(S X, S Y ) as ˆ if X and Y are clear fro the context. Table shows frequently used notations. 2.. Estiation and Sketch Size Suppose we estiate the inclusion coefficient (X, Y ) of two coluns X and Y as ˆ, using their sketches. The estiation error (X, Y ) ˆ ranges fro to. Considering the randoness in the sketch construction, ideally, we would like the estiation error to be bounded with high probability, i.e., (X, Y ) ˆ apple with probability at least, for any given two coluns X and Y. Unfortunately, we can show that, unless the sketch size is linear in the nuber of distinct values (which could be equal to the nuber of rows), there is no sketch based on which the worst-case estiation error can be bounded with <and <. A lower bound of sketch size: The hardness can be observed even in a very siple case when X = {x} contains only one eleent, and Y is large. In this case, (X, Y ) takes value either (if x /2 Y ) or (otherwise). Therefore, to bound the estiation error below any constant less than, fro the sketches of Y, we need to distinguish two cases, i) x /2 Y or ii) x 2 Y, with high probability this is exactly the approxiate ebership proble [9]. More forally, we can prove the following hardness result. THEOREM. We pre-copute a sketch for each colun in a database to estiate the inclusion coefficient for pairs of coluns. If we require that, for any two given coluns X and Y 2 C, (X, Y ) ˆ apple < with probability at least >, then any sketch ust use space at least (n C log(/ )) bits per colun C 2C, where n C is the nuber of distinct values in C. PROOF. Fro the above discussion, we can reduce approxiate ebership proble to our inclusion coefficient estiation proble: for each colun Y of size n, we want to construct a sketch 98
3 to support ebership queries, i.e., check whether x 2 Y for any given x, with a false positive rate at ost. Suppose we want to estiate the inclusion coefficient (X, Y ) using sketches of X and Y. Consider the case when X = {x}. Indeed, (X, Y )=if x 2 Y, or if x /2 Y. Therefore, to ensure that the estiation error is less than, there ust be no false positive to the ebership query x 2 Y (with probability less than ) [9] shows that, for this purpose, any sketch ust use space at least (n log(/ )) bits. So the proof is copleted. Reark: [25] gives an even tighter lower bound of sketch sizes when the nuber of distinct values in each colun is unknown. 2.2 HLL Sketch Construction As shown in Theore, the worst-case error cannot be bounded based on sublinear-size sketches. However, the hope is that, for particular instances of X and Y, the estiation errors still could be uch better than the worst-case error. Hyperloglog (HLL) sketch [4, 5] provides a near-optial way to estiate cardinality (i.e. the nuber of distinct values in a set). In 3, we will show how to use HLL sketches to estiate inclusion coefficients. We first review how to construct the HLL sketch of a colun. Let h : do(x) {, }` be a hash function that returns ` bits for each value X[i] 2 do(x). For each hash value s i = h(x[i]), we find the position of the leftost represented by (s i) and the axiu of { (s i) : s i = h(x[i])} is the HLL sketch of the colun X which is used for cardinality estiation [4, 5]. One way to reduce the variance of the estiation is to use ultiple hash functions [5]; however, [5] proposed stochastic averaging that eploying only a single hash function eulates the effect of using different hash functions. They use one hash function but take the first bits to bucketize hash values to iic 2 hash functions and reduce the variance. Then the reaining ` bits of each hash value s i = h(x[i]) are used to find the position of the leftost as (s i). Algorith 3 in Appendix A shows the steps to construct such sketches for a set of coluns C. In 3.3, we discuss how to choose the paraeter for given two coluns X, and Y. X Buckets i j (X[i]) s i : (X[j]) s j : (s i ) = 3 (s j ) = 4 Max( (s i ), (s j )) b b 2^ = 4 Figure : Constructing the HLL sketch of the colun X as S X. For exaple, as shown in Figure, one hash function h is used for all values in colun X, and the first bits are used to bucketize hash values. As a result there are 2 buckets (b,b 2,,b 2 ) in HLL sketch of colun X represented by S X. Specifically, in this figure s i = h(x[i]) and s j = h(x[j]) are assigned to bucket b because their first bits are the binary representation of the value one. Moreover, (s i)=3and (s j)=4because the position of the leftost in the reaining ` bits of the hash values s i and s j are 3, and 4 respectively. If the s i and s j are the only hash values assigned to b, the final value in b is V X = ax(3, 4) = 4. The foral definition of Vi X is given by Definition 2. DEFINITION 2. (Vi X : HLL value of colun X in bucket b i). Let s j be the hash value of the tuple j in X whose first bits (s j[,...,]) indicate it belongs to bucket b i, i.e., s j[,...,] is the binary representation of the value i ((s j[...]) 2 = i), and (s j) be the leftost one in the reaining ` bits of the hash value s j (s j[ +,...,`]). The HLL value of colun X for the bucket b i is defined as: Table : Suary of frequently used notations n Total nuber of coluns nx Nuber of distinct values in colun X S X Sketch of colun X (X, Y ) Inclusion coefficient of coluns X and Y ˆ Estiation of (X, Y ) h ` bits Hash function Nuber of bits assigned for bucketization 2 Nuber of buckets in HLL sketch Vi X HLL value of colun X in bucket b i V X HLL value of colun X when there is only one bucket ˆP Estiated value of pr(v X apple V Y ) e p Estiation error of pr(v X apple V Y ) e Estiation error of (X, Y ) Vi X = ax (s j ) (2) s j :(s j [...]) =i 2 Note that when there is only one bucket we ignore index i and denote the HLL value of colun X by V X. Space coplexity: In the HLL sketch of the colun X (S X) the HLL value of each bucket is an integer nuber ( apple Vi X apple ` ). Thus each bucket needs log(` ) bits to store Vi X, i.e., total eory to store the sketch S X is O(2 log(` )). 3. EFFICIENT ESTIMATION OF INCLU- SION COEFFICIENT In this section, we describe a technique to estiate the inclusion coefficient using Hyperloglog (HLL) sketches. In a preprocessing step, a HLL sketch [4, 5] is constructed for all coluns by scanning the data only once in practice these sketches are copact enough to be able to fit into eory. Our algorith ai to estiate the inclusion coefficient of two coluns X and Y using pre-coputed HLL sketches of X and Y. We also produce an error bound for the estiate. 3. Overview of our Approach The ain idea of our algorith is to produce an estiate of the inclusion coefficient between two coluns X and Y by coparing their HLL values (Definition 2). To develop intuition on why coparing the HLL values of X and Y can help in finding the inclusion coefficient, we start with the case of a HLL sketch with a single bucket, and suppose that X and Y have the sae nuber of distinct values. We exaine the following two extree cases. i) if X = Y or (X, Y )=, we have pr(v X apple V Y )=pr(v X = V Y )=, because the hash function h in HLL is applied on the sae set of values for both X and Y ; and ii) if X \ Y =? or (X, Y )=, pr(v X apple V Y ) is at least.5. A useful observation here is that pr(v X apple V Y ) increases onotonically as (X, Y ) increases (we prove it forally later). For exaple, in Figure 2, we plot pr(v X apple V Y ) as a function of (X, Y ) when X = Y = introduces how to derive this function for general cases. Clearly given bucket i for each coluns X and Y, the event Vi X <= Vi Y is a Bernoulli trial. When there are ultiple buckets since buckets are independent, the events Vi X <= Vi Y are independent Bernoulli trials [5]. The reason that for a given colun the buckets are independent is the result of the HLL construction 2.2. fro property of the universal hashing, first bits of a hash value is independent of the rest ` bits. The intuition behind our algorith is that, using ultiple buckets in the HLL sketches of X and Y, we first estiate pr(v X apple V Y ) given the fact that the event Vi Y is an independent Bernoulli trial for each bucket b i (e.g., estiating pr(v X apple V Y ).8 in Figure 2). Then, since pr(v X apple V Y ) can be written as a function of the inclusion coefficient, we effectively lookup the value of 99
4 (X, Y ) that produces the estiated pr(v X apple V Y ). For exaple, in Figure 2 by looking up we get (X, Y ).62). Maxiizing the likelihood. Our algorith is based on the axiu likelihood estiation (MLE). More forally, let Z = {i Vi Y } be the nuber of buckets (aong all 2 buckets) where Vi Y. The rando variable Z follows a distribution paraeterized by X, Y, and (X, Y ). We observe Z = z fro HLL sketches of X and Y, and we want to choose (X, Y ) to axiize the likelihood of our observation. le = argax pr (Z = z (X, Y )= ). (3) Next we introduce our MLE-based estiation algorith called (Binoial Mean-Lookup). Suppose X and Y are known (or estiated fro their HLL sketches), there are still two reaining questions. First, we need to characterize the distribution of Z with (X, Y ) as a paraeter ( 3.2.). Second, we need an efficient algorith to axiize the likelihood as in Equation 3 ( 3.2.2). We prove the correctness of our algorith, i.e., show that our algorith indeed gives a solution to Equation 3, and analyze its estiation error in Finally, we briefly discuss how can leverage additional eory if available to iprove accuracy ( 3.4). 3.2 : Binoial Mean-Lookup Estiator As shown in Figure 2, pr(v X apple V Y ) increases onotonically as (X, Y ) increases. We first show how to derive the closed for of pr(v X apple V Y ) as a function of (X, Y ) then we discuss the detail of our inclusion coefficient estiator Calculating pr(v X apple V Y ) Given coluns X and Y, V X and V Y are both rando variables. Let us first consider a sipler case where there is only one rando variable V X and discuss how to find pr(v X k), where k is constant. Then we show how to use this siple case to derive the general case pr(v X apple V Y ). Given colun X with n X distinct values, when there is only one bucket ( =), based on the Definition 2, the HLL value of the colun X is in fact the axiu over n X independent rando variables. As we discussed in 2.2, for each hash value s i, we find the position of the leftost represented by (s i) and the axiu of { (s i):s i = h(x[i])} is the HLL sketch of the colun X. Obviously every bit in the s i is a Bernoulli trial given that it is a rando experient with exactly two possible outcoes, and and fro property of the universal hashing every bit of a hash value is independent fro each other. Thus, the bits in s i are independent Bernoulli trials. Each rando variable (s i) represents the leftost one in s i, i.e, first one after (s i) zeros. Thus as also pointed out in [4] each rando variable is geoetrically distributed and pr(v X apple k) is: pr(v X apple k) = nx i= ( Pr(first k bits are zero)) = ( (4) By Equation 4, pr(v X = k) is pr(v X apple k) pr(v X apple k ). pr(v X = k) =( ( 2 k )nx (5) Next we show how to use these equations to derive pr(v X apple V Y ), where both V X and V Y are rando variables. When the intersection of X and Y is non epty V X and V Y are not independent. Let T be X \ Y. To resolve the dependency of X and Y, we consider three disjoint sets: T, X = X \T, and Y = Y \T. As shown in Figure 3 based on the cardinality of these sets there are three different cases. () X and Y are disjoint, i.e. n T =, n X = n X, n Y = n Y. (2) Y is subset of X, i.e., Y = Y \T is epty (n T = n Y, n X = n X n Y, n Y =). (3) X and Y are partially overlapped Y = Y \T is not epty (n T 6=,n X 6=, n Y 6= ). Next we show how to use Equations 4, and 5 to derive the pr(v X apple V Y ) for each case. Case: Clearly, if T = X \ Y is epty (n T =), then V X and V Y are independent. As shown in Figure 3, since we are interested in V X apple V Y if X and Y are disjoint and V Y = k, then V X should be at ost k. Based on Definition 2 we have apple k apple `. Thus the pr(v X apple V Y ) with the help of Equations 4 and 5 can be calculated by Equation 6. Note that since n T =, n X = n X and n Y = n Y. pr(v X apple V Y )= pr(v X apple k) ^ pr(v Y = k) = ( k= k= ( ( 2 k )n Y For exaple, by Equation 6 when X = Y = 4, n T =, (X, Y ) is the pr(v X apple V Y ).58 (Figure 2). Case2: If Y X, then T = Y and Y is epty (n Y =). Siilar to case, if V T = k, then V X should be at ost k, where k can be any value in [,` ]. So pr(v X apple V Y ) is as follows: pr(v X apple V Y )= pr(v X apple k) ^ pr(v T = k) = ( k= k= ( 2 k )nt ( 2 k )nt For exaple, when X = Y = 4, and n T = 4 the (X, Y ) is and the pr(v X apple V Y ) (Figure 2). Case3: Finally, if X and Y are partially overlapped, as shown in Figure 3, given k there three scenarios ) V X apple k, V T apple k,v Y = k, 2) V X apple k, V T = k, V Y apple k, and, and 3) V X apple k, V T = k, V Y = k. Thus the pr(v X apple V Y ) can be derived by Equation 8. pr(v X apple V Y )= pr(v X apple k) ^ pr(v T apple k ) ^ pr(v Y = k) k= + pr(v X apple k) ^ pr(v T = k) ^ pr(v Y apple k ) + pr(v X apple k) ^ pr(v T = k) ^ pr(v Y = k) ( = ( k= +( +( ( ( ( ( 2 k )nt 2 k )nt ( 2 k )nt ( ( ( 2 k )nt 2 k )nt 2 k )n Y ( (6) (7) 2 k )n Y 2 k )n Y (8) For exaple, when X = Y = 4, and n T = 62 the (X, Y ) is.62 and the pr(v X apple V Y ).8 (Figure 2). Multiple buckets: When there are 2 buckets ( >), inspired by stochastic averaging [5] we consider the average case where the HLL value of colun X in Definition 2 will be the axiu over on average nx independent rando variables for each bucket 2 i and Equations 4 and 5 are updated as: pr(vi X apple k) =( 2 k ) nx 2 (9) pr(vi X = k) =( 2 k ) nx 2 ( 2 k ) nx 2 () Later in 3.3, we discuss how considering the average case affects the nuber of buckets.
5 = Lookup = e P e P e e (.62)= e P e = Figure 2: X = Y = 4, Lookup approach Figure 3: For coluns X and Y ( X Y ) cases are () disjoint, (2) overlapped and (3) partially overlapped, where T = X \ Y, X = X \T, Y = Y \T Figure 4: e p and e are the error of estiating pr(v X apple V Y ) and inclusion coefficient (X, Y ) Efficient Algorith to Maxiize the Likelihood Recall fro Equation (3) that we forulate the proble of estiating inclusion coefficient as a axiu likelihood estiation proble, where we observe the nuber of buckets (aong all 2 buckets) where Vi Y is z and the goal is to choose (X, Y ) to axiize the likelihood of our observation. Here we propose our estiator,, to efficiently solve the MLE. has two ain steps. Step one is to estiate pr(v X apple V Y ) using only S X and S Y. Step two is to use lookup approach to ap the estiated probability into the inclusion coefficient. Before describing the details of these two steps, we first provide an overview of how works. As shown in Figure 2, pr(v X apple V Y ) increases by increasing the (X, Y ) (proof in theore 2). Clearly, if estiates pr(v X apple V Y ), it can be used to find the (X, Y ). For exaple, let us assue the estiated value of pr(v X apple V Y ) be.8 ( ˆP =.8). As shown in Figure 2, after projection to blue line ( 3.2.) estiates (X, Y ) to be.62. Algorith shows the detail of the two ain steps of. The input to this algorith is the HLL sketches of colun X and Y (S X, S Y ). In 3.3, we discuss how to choose nuber of buckets for these sketches. In the first step, in order to calculate ˆP, since the event Vi X apple in each bucket b i is an independent Bernoulli trial ( 3.). Thus ˆP is the ratio of the nuber of buckets where Vi Y to the total nuber of buckets (2 ) (lines 3-7). Later in Theore 4, we use Hoeffding inequality to provide the error bound of the ˆP. Vi Y Algorith : // inclusion coefficient estiator : Input: S X, S Y 2: //Step : Estiate pr(v X apple V Y ) using S X, S Y 3: z = 4: for b i, where i 2 [, 2 ] 5: if Vi Y then z = z + 6: ˆP = z 2 7: //Step 2: Lookup step 8: n X = Estiated nuber of distinct values fro S X 9: n Y = Estiated nuber of distinct values fro S Y : return ˆ(X, Y )=Lookup( ˆP,, in(n X,n Y ),n X,n Y ) In the second step, (Algorith ) calls Lookup function (Algorith 2) to estiate inclusion coefficient (X, Y ) as the one that produces the ˆP. The key idea is that since pr(v X apple V Y ) in an increasing function of (X, Y ) (Theore 2), given an estiated probability ˆP we can use binary search to estiate (X, Y ). Algorith 2 shows the pseudo code of the lookup approach. The input to this algorith are ˆP, ininc, axinc, n X, n Y, and where ˆP is the estiation of the pr(v X apple V Y ), ininc and axinc are the boundary of the search, n X, n Y are the estiated cardinality of the X and Y ( [4]), and is the error or tolerance. Algorith 2 is doing binary search over the possible intersection size n T and at each iteration based on the value of n T depends on which case it is, it uses the suitable Equation fro 3.2. (lines 5-8). For exaple, if X \ Y = ;, it uses Equation 6. Such siple bisection procedure for iteratively converging on a solution which is known to lie inside soe interval [a, b] has been introduced in [8] for root finding. They showed the nuber of iterations required to obtain an error saller than is. In our proble since ln(a b) ln ln 2 ln apple (X, Y ) apple, the nuber of iteration would be. For ln 2 exaple, Algorith 2 only needs 7 iterations to obtain an error saller than = 5. It worth entioning that the cost of each iteration is very cheap (analysis in 3.2.3). Algorith 2: Lookup // Map ˆP into ˆ(X, Y ) : Input: ˆP, ininc, axinc, nx,n Y, 2: Outputs: ˆ(X, Y ) 3: n T = ininc+axinc ; ˆ = nt 2 n X 4: if (n T = n X & n T = n Y ) then Prob =. //X = Y 5: else if (n T =) then Prob = Equation 6 //X \ Y = ; 6: else if (n X >n Y & n T = n Y ) then Prob = Equation 7 7: else Prob= Equation 8 8: if Prob ˆP apple return ˆ 9: if Prob > ˆP return Lookup( ˆP,n T,axInc,n X,n Y ) : if Prob < ˆP return Lookup( ˆP,inInc,n T,n X,n Y ) Algorith Correctness and Analysis We provide the following analysis to show the correctness, efficiency, and the error bound of. We prove pr(v X apple V Y ) is an increasing function of (X, Y ), i.e., in lookup step there is a one to one apping between the probability pr(v X apple V Y ) and the inclusion coefficient. In (3), we forulated the proble of estiating inclusion coefficient as axiu likelihood proble. We prove that the results of and the MLE forulation (3) are identical. We also provide the error bound of the for estiation of the pr(v X apple V Y ) and inclusion coefficient. Finally, we show the tie coplexity of the Algorith. Algorith Correctness: uses binary search in order to find the apping between the probability ˆP and the inclusion coefficient (Algorith 2). This approach only works if there is a one to one apping fro pr(v X apple V Y ) into (X, Y ). Theore 2 shows that probability pr(v X apple V Y ) ( 3.2.) is an increasing function of (X, Y ). Clearly an increasing function is a one to one function, hence is invertible.
6 THEOREM 2. Given two coluns X, and Y with intersection T, where T = n T, probability pr(v X apple V Y ) is an increasing function of (X, Y ). PROOF. Please find the proof in Appendix B. In Theore 3 we prove that the results of Algorith and the MLE forulation (3) are identical. THEOREM 3. The inclusion coefficient estiate ˆ fro Algorith and the inclusion coefficient estiate le fro MLE forulation (3) are identical. PROOF. Please find the proof in Appendix C. Bound of the Probability Estiation: Let P and ˆP be the exact and estiated value of the pr(v X apple V Y ) respectively. (Algorith ) calculates ˆP as the ratio of the nuber of buckets where Vi Y to the total nuber of buckets 2. As shown in Figure 4, this estiation ay produce error e p, e.g., ˆP =.8 while P =.8 ± e p. This figure also show that when ˆP =.8, the lookup step in returns.62 as the estiated inclusion coefficient while the actual one is.62 ± e (Figure 4). Thus, estiation error e p result in e, i.e., estiation error of the inclusion coefficient (X, Y ). Here we first show that given an error e p ( apple e p apple ), what is the probability that the estiation error of Algorith be at ost e p. Next we show how to use the e p in order to bound estiation error of the inclusion coefficient. THEOREM 4. Probability that estiation error of ˆP be at ost e p is: pr ˆP P applee p 2exp 2 + e 2 p PROOF. Please find the proof in Appendix D. For exaple, when =7and estiation error e p is.4, the pr( ˆP P applee p) is at least.95, i.e., with 95% confidence the estiation error is at ost.4%. Bound of the Inclusion Coefficient Estiation: Fro Figure 4 one can see that the slope of P (the blue line) at any point represent the ratio of e p to e, e.g., when ˆP =.8, the estiated inclusion coefficient is.62 and the slope of P at.62, P (.62), is ep. One can nuerically find this slope of P for a given a point e, e.g., P (.62) =.7. Moreover, as we discussed in theores 4, with 95% confidence the estiation error e p is.4. Thus e at this point can be calculated as.4 =.2. More forally at any.7 point we have: e = ep P ( ) () So far we have shown how to find e for a given point. Clearly in Figure 4 the slope of P at different points are different. Thus in order to bound e of two coluns X and Y the key observation is to first find the iniu and axiu slopes of P and then the ratio of e p to those slopes will be the bound of e. Table 2 shows the iniu and axiu slopes of the P and the e p bound for coluns X and Y with different cardinalities. For exaple, when the cardinality of both X and Y is 4, the iniu and axiu slopes of P, in Figure 4 are.24 and.79 and the iniu and axiu e are.4.4 =.2 and =.5, where the ep is.4. Figures 6 to 8 shows how the slope of P varies for the coluns with different cardinalities. One can see that the slope of P reduces as the difference of the cardinalities of the X and Y increases e.g., when X =, Y = 4 the iniu and axiu slopes of P are.62 and.72 respectively. Clearly, when the slope of P reduces, e will increase (Table 2). Tie coplexity of the Algorith: In Algorith, the estiation of pr(v X apple V Y ) using S X, S Y takes 2 steps Table 2: The iniu and axiu e for synthetic data (X, Y ) X Y in( ep ep ) ax( ) ax(e ) in(e ) e e (lines 4-6), where 2 is the nuber of buckets. As we discussed in ln the binary search in the lookup step takes iterations to ln 2 obtain an error saller than. The cost of each iteration is in O(` ) due to the fact that the Equations 6, 7, 8 used in Algorith 2 ( lines 6-8) is the su of ` values. Thus, the tie coplexity of Algorith is O(2 + ln (` )) which is linear in the ln 2 nuber of buckets. 3.3 Choosing Nuber of Buckets Algorith 3 (Appendix A) shows the steps to construct a HLL sketch for a fixed. In this section, we first discuss given two coluns X, and Y how to choose the paraeter (nuber of bits for the buckets). Next, we show how this algorith changes when we consider all pair of coluns since paraeter needs to be varied. Paraeter for a given colun pair: The HLL construction of colun X can be viewed as bins and balls proble [2], where buckets are the bins and distinct values in colun X are balls. As we discussed in 3.2., given colun X with n X distinct values, when there is only one bucket the HLL value of colun X is in fact the axiu over n X independent rando variables (Definition 2). When there are 2 buckets ( >), the HLL value of n colun X will be the axiu over, on average, X 2 independent rando variables for each bucket, i.e, balanced load in each bucket. In bins and balls proble [2], it is shown that as the nuber of bins increases the probability of balanced load will decrease. In other words, given n X balls and 2 bins the probability that all bins contains exactly nx 2 balls reduces as the nuber of bins increases. Thus in HLL construction of colun X, we should also expect that as the nuber of buckets increases the probability that all buckets have the sae load decreases. However, it is also shown in [5, 4] that having large nuber of buckets reduces the variance of cardinality estiation using a HLL sketch. Thus in one hand, having less nuber of buckets increases the probability of balanced load ( nx ), but in the other hand ore nuber of buckets reduces the variance of estiation. For the perfect 2 balanced load, there should be only one bucket, and for the lowest variance nuber of buckets should be n X ( =log(n X)). Given two coluns X, and Y, in this paper, as a trade-off between the variance and the balanced load we heuristically pick a id point between the best variance and the best balanced load by considering the nuber of buckets as equation 2. Note to say that it is a heuristic and it is not the optial choice. log(ax(nx,ny )) = (2) 2 Paraeter for ultiple pairs of coluns: Our goal is to efficiently estiate inclusion coefficient for all colun pairs in a database. When we consider ultiple pairs of coluns Equation 2 ight returns different for a given colun. For exaple, consider three coluns X, Y, and Z with cardinality n X,n Y, and n Z where cardinality of X is saller than Y and Z (n X <n Y, n X <n Z) and cardinality of Y and Z are not equal (n Y 6= n Z). In this exaple, for colun pair X and Y, paraeter is log(ny ), 2 2
7 Figure 5: X =5, Y = 4 Figure 6: X = 4, Y =5 Figure 7: X = 3, Y = 4 Figure 8: X = 4, Y = 3 while for colun pair X and Z paraeter is log(nz ). Thus, for 2 colun X the has two values log(ny ) and log(nz ). 2 2 To address this proble, we change Algorith 3 such that it still reads data only once but it keeps all the sketches for different 2{,,l} (value of k can be deterined by Theore 5). More specifically, after reading the data in line 4, we iterate over different values of fro to l and the rest of the algorith is the sae. Observe that even if we keep all sketches for fro to l, the eory required only doubles because P k = 2 log(`) = 2 l+ log(`), where 2 log(`) is the size of sketch for each colun ( 2.2). After constructing the sketches, given two coluns X and Y, we decide which produces better estiation of inclusion coefficient (Equation 2) and we pass those sketches to Algorith. Finally, we discuss how to set the paraeter l. Suppose the eory bound for each colun is M. In Theore 5, we prove that given colun X, bounded eory M, and ` bits hash function h, the paraeter l should be at ost ln( M log(`) ). THEOREM 5. Given colun X, bounded eory M, and ` bits hash function h, l apple ln( M log(`) ). PROOF. We hash each value X[i] 2 X only once but we generate l +sketches h for each 2{,,l}. Since we only keep the axiu for each bucket b j, the aount of eory for each bucket is log(`). Each sketch has 2 buckets so in total it requires P l i= 2i log(`) =2 l+ log(`) bits. Since M is the bounded eory, 2 l+ should be at ost gives us l apple ln( M log(`) ). M log(`),i.e. (2l+ apple M log(`) ) which 3.4 Discussion: Leveraging ore eory We briefly discuss how can leverage additional eory, when available, to iprove its accuracy. It is shown in [4] that increasing the nuber of buckets for HLL sketches reduces the variance of cardinality estiation. In other words, the bucketization (with stochastic averaging) eulates the effect of n hash functions with only one single hash function [5]. However, as discussed in 3.3, increasing the nuber of buckets reduces the probability of balanced load ( nx 2 ) which can ultiately reduce the accuracy. Thus, leveraging additional eory by increasing the nuber of buckets is not adequate for our proble. One approach is to cobine the use of ultiple hash functions and stochastic averaging in order to take advantage of additional eory. In other words, given two coluns X, and Y we fix nuber of buckets to 2, where can be find by Equation 2. Then we use ultiple hash functions in HLL construction. uses the sketches built by each hash function in order to estiate inclusion coefficient (Algorith ), and the final estiation of inclusion coefficient is the average of those results. A thorough epirical study of the trade-offs between estiation error and the above ethod of leveraging additional eory is an interesting area of future work. 4. EXTENSION OF HLL CONSTRUCTION TO SUPPORT DELETION HLL sketches are becoing increasingly popular for estiating cardinality in different applications, and are even being adopted in coercial database engines, e.g., [6] uses Hyperloglog to support approxiate distinct count functionality. However, in data warehousing scenarios, it is not uncoon for recent data to be added and older data to be reoved fro the database. Clearly HLL sketches can be aintained increentally in the presence of insertions. When a new data ite X[k] inserted to colun X, the sae Algorith 3 finds the hash value of the X[k] (s k = h(x[k])) and the affected bucket b j can be identified by the leftost bits of s k. It then finds (s k ) which is the position of the leftost in the l bits of s k and it updates the value of bucket j as Vj X =ax(s X[b j], (s k )) (Definition 2). For exaple, Figure shows the HLL sketch of the colun X, i.e., S X. let us assue a new value X[k] is added to colun X such that the first bits of the s k = h(x[k]) represents the first bucket (b ) in S X and (s k ) be 5. Thus the value of the b will be updated to ax(4, 5) = 5. On the other hand, when X[i] is deleted, siilar to insertion we can find which bucket is affected. Lets b j be the affected bucket. Since X[i] exists in database, (s i) apples X[b j]. If (s i) < S X[b j], no update is required but if (s i)=s X[b j], it eans (s i) is the largest and should be deleted. Since we do not know what the second largest value for that bucket, we cannot handle deletion. We can odify Algorith 3 such that for each bucket we keep track of all (s i)s in order to support deletion. Since Vj X is the ax over all those value (Definition 2), and when a deletion happens knowing the second largest for each bucket is crucial, as shown in Figure 9, rather than only keeping the axiu value in each bucket (Figure ), we keep all (s i)s in a ax-heap. Interestingly, we can show that by aintaining a heap of constant size (at ost `) for each bucket we are able to support increental deletion. Recall that during HLL sketch construction, we apply a hash function h : do(x) {, }` on each X[i] 2 do(x) which returns ` bits s i, and if is the nuber of bits for the buckets, (s i) is always an integer nuber saller than equal ` ( apple (s i) apple ` ). For exaple, if it uses 64 bits hash function and =, then apple (s i) apple 64. In the worst case, if we keep all distinct (s i)s, we are required to keep only ` values, which requires (` )log(` ) bits. This explains why with only a heap of constant size (at ost ` ) increental deletion can be supported. One issue we need to consider is that it is possible that X[i] and X[k] are assigned to sae bucket b j and (s i) is equal to (s j). In this case, if Vj X = (s i) and X[i] is deleted, the value of bucket b j (Vj X ) should still be (s i) because X[k] 2 do(x) still exists and (s j)= (s i). To handle this scenario and keep the ax heap size still liited to `, we keep a counter for each node in heap. For 3
8 Cardinality Log scale i j X Buckets (X[i]) s i : (X[j]) (s i ) = 3 s j : (s j ) = 4 b b 2^ Max-Heap 4 c 3 <4 Figure 9: HLL sketch of the colun X for deletion support. TPCH- TPCDS Coluns Figure : Cardinality of the coluns in TPCH- and TPCDS- 3 Cardinality Log scale Real-3 Real Coluns Figure : Cardinality of the coluns in Real-3 and Real-4 exaple in Figure 9, since there is only one X[i] that its (s i)=4, value is attached to node 4. So, a node in heap is deleted if the counter is one; otherwise we just reduce the counter by one. Space coplexity: For each bucket b i the heap size is at ost ` and each value in the heap is an integer nuber at ost `. Thus each bucket only needs (` ) log(` ) bits. Thus total eory to store the sketch S X is O(2 (` )log(` )). Since the space coplexity of the original HLL is O(2 log(` )) ( 2.2), with a constant overhead we are able to support deletion. 5. EXPERIMENTS The ain goals of our experiental evaluation are: (a) Evaluate the accuracy of the estiator for inclusion coefficient against the existing approach using Botto-k sketches [3]. (b) Evaluate the ipact of inclusion coefficient estiates instead of exact inclusion coefficient on precision and recall of two existing foreign detection techniques [, 29]. The key takeaways fro the experients are: On databases containing coluns with large cardinality results in substantially saller estiation error (ranging fro. to.5) copared to the Botto-k sketch approach (where errors range fro.3 to.59). As expected, Botto-k perfors well when the cardinality of the coluns are sall (less than k). In databases where ost coluns have cardinality saller than k, the estiation error of the range between.4 and.7 and the error of Botto-k ranges between.3 and.6. The running tie of the estiator is coparable to the approach that uses Botto-k sketches. When we use instead of the exact inclusion coefficients, it has only a sall ipact on the F easure of the two foreign-key algoriths. In contrast, using the estiator based on Botto-k sketches significantly reduces F easure in soe databases. 5. Setup Hardware and Platfor: We ipleented algorith for constructing the sketches on Microsoft s big data syste Cosos []. Cosos is designed to run on large clusters consisting of thousands of coodity servers and it is based on ap-reduce odel to achieve parallelis. The rest of our experients, including the estiation of inclusion coefficient between pairs of coluns using sketches, were perfored on a 6-core, 2.2 GHz, Intel Xeon E achine with 384 GB of RAM. Datasets: Table 3 shows the details of the datasets we used in our experients. Specifically, we used TPC-H with GB scaling factor and TPCDS with 3GB, 5GB, and 2GB(2TB) scaling factors as benchark databases. We also evaluated our technique on four real-world databases, Real- and Real-2 are publicly available in [23] and Real-3 and Real-4 are real-world custoer databases. Figures and show the distribution over the cardinality of the coluns over the synthetic (TPCH-, TPCDS-3) and the largest real databases (Real-3, Real-4). As these figures show, in TPCDS-3, Real-3 and Real-4 there are any coluns with sall and large cardinality. Table 3: Databases used in experiental evaluation. Database #of tables # of coluns # of FKs Size TPCH GB TPCDS GB TPCDS GB TPCDS TB Geeea(Real-) MB Mondial(Real-2) MB Real unknown 8GB Real unknown 7GB Botto-k: Since the inclusion coefficient estiator in [3] used Botto-k sketches, we refer to it as Botto-k in the rest of this section. In brief, the inclusion coefficient of X and Y ( (X, Y )= X\Y ) can be calculated using the Jaccard coefficient, i.e., (X, Y )= X '(X,Y ) '({X[Y },Y ). They use Botto-k sketches to estiate Jaccard coefficient, '(X, Y )= X\Y. Another approach to estiate (X, Y ) X[Y is to divide the estiated value of X \ Y [5] by the estiated value of X. The authors in [3] have shown that the Botto-k approach is ore accurate than the estiator based on the [5] so we only copare the accuracy of against the estiator in [3]. Setting k in Botto-k: We used a 64-bit hash function (MururHash ) for the sketch construction of both HLL and Botto-k. For the fair coparison we fixed the aount of eory per colun used by both estiators ( and Botto-k). In 3.3 we discussed how to pick nuber of buckets. For exaple, in TPCDS-3 the axiu nuber of buckets used by our algorith is 2 3, and since we used 64-bits hash function and HLL keeps the position of the leftost, so each bucket only requires log 64 = 6 bits. Since we keep all the sketches for 2{,,...,3}, total eory is bits (2.28 KB). On the other hand, Botto-k keeps the k sallest hash values, i.e, k 64 bits. Thus, to ensure equal eory, we configure the k in Botto-k as k = 24 6 =536. Table 4 64 shows the and k paraeters for each database. 5.2 Coparing and Botto-k 5.2. Execution Tie Figure 2 shows the execution tie of estiating (X, Y ) for all pair of coluns belonging to different tables using and Botto-k. The total execution tie depends on the nuber of such colun-pairs, and hence varies across the different databases. For exaple, coputes the inclusion coefficient of the 7,635,8 colun-pairs in Real-3 which takes roughly 6 seconds, i.e.,.3 s per colun-pair. Not surprisingly, for each database, the execution ties of the and Botto-k are quite close to each other ( is slightly ore expensive coputationally). 4
9 Es#a#on Execu#on #e in second Real- Real-2 TPCH- Real-3 Real-4 TPCDS-3 TPCDS-5 TPCDS-2 Bo2o-K Real- Bo(o-K Real-2 TPCH- Real-3 TPCDS-3 Real-4 TPCDS-5 TPCDS Botto-k Exact Inclusion Coefficient Botto-k Exact Inclusion Coefficient Figure 2: Execution tie of estiating (X, Y ) for all pair of coluns Figure 3: Average of estiated (X, Y ) Figure 4: Average of estiated (X, Y ) in TPCDS-3: all pair of coluns, k = 536 Figure 5: Average of estiated (X, Y ) in TPCDS-3: X applet, Y applet, t = 5, k = Botto-k Botto-k Botto-k Botto-k Exact Inclusion Coefficient Exact Inclusion Coefficient k* k*2 k*3 k*4 k*5 k*6 k*7 k*8 k*9 Cardinality Threshold (t) k* k*2 k*3 k*4 k*5 k*6 k*7 k*8 k*9 Cardinality Threshold (t) Figure 6: Average of estiated (X, Y ) in TPCDS-3: X >t, Y >t, t = 5, k = 536 Figure 7: Average of estiated (X, Y ) in TPCDS-3: X > t, Y apple t or X apple t, Y >t, t = 5, k = 536 Figure 8: Average of estiated (X, Y ) in TPCDS-3: X >t, Y >t, k = 536 Figure 9: Average of estiated (X, Y ) in TPCDS-3: X > t, Y apple t or X apple t, Y >t, k = 536 Table 4: Paraeter is the nuber of bits that define a bucket in HLL, and k and nuber hash values kept for Botto-k. TPCH- TPCDS- TPCDS- TPCDS- Real- Real-2 Real-3 Real k Estiation Next we copare the average error of the two estiators. The estiation error is the difference between the estiated inclusion coefficient ˆ and its actual value (X, Y ), i.e., (X, Y ) ˆ. Recall that both estiators rely on sketches that use an equal aount of eory ( 5.). As shown in Figure 3, the estiation error in the databases where ost coluns have sall cardinality (Real-, Real-2, and TPCH-) is alost siilar for the and Bottok. However, results in substantially saller error for the other databases (TPCDS-3, Real-3, and Real-4) where any coluns have large cardinality (significantly larger than k). To better understand these results, Figures 4-7 breakdown the result for TPCDS-3 database. In Figure 4, the x-axis shows the buckets that each bucket (lets say a b) contains those colun pairs whose exact inclusion coefficients are in [a, b), and the y-axis represents the absolute error of and Botto-k for those colun pairs. For exaple,.2.3 in x-axis is the bucket contains those coluns pairs whose their exact inclusion coefficients are in [.2,.3) and the error of both and Botto-k (k=536 in Table 4) is around.2. The results show that in TPCDS-3 when exact inclusion coefficient is saller than.5, Botto-k is slightly better than, however, perfors better when inclusion coefficient is large. In applications like foreign-key detection where the high containent (e.g., inclusion coefficient.8) is an iportant signal an estiator with low error for large coefficients is preferable. Next, we show that, the reason is the cardinality of the coluns. As we discussed in 2.., the error of the inclusion coefficient estiators depends on the cardinality of the coluns. Analyzing error based on cardinality threshold t: In Figures 5-7, we evaluate the error of the estiators over three different sets of colun-pairs in TPCDS-3 based on a cardinality threshold t =5(we later vary t in Figures 8, 9). In particularly, Figure 5 only considers the colun-pairs (X,Y ) where both coluns have cardinality less than 5. Figure 6 considers the colunpairs where both coluns have cardinality ore than 5. Figure 7 considers pairs of coluns where the cardinality of one colun is saller than 5 and the cardinality of the other colun is ore than 5. When the cardinality of the coluns are sall, Botto-K perfors better than (Figure 5). In effect, for coluns with cardinality less than k, if a good hash function is used, Botto-k keeps the hashes of all distinct values, thereby behaving siilar to full hash table of the colun. On the other hand, for large coluns (Figure 6) and the one that one colun is large and the other colun is sall (Figure 7), outperfors Botto-k significantly. Figures 8, and 9 show the error of against Botto-K when the cardinality threshold t in above experients is varied. We show the cardinality threshold t as a factor of k in Botto-k. Figures 8 shows the error for those coluns whose cardinalities are ore than t, and Figure 9 shows the error when the cardinality of one colun is saller than t and the cardinality of the other colun is ore than t. Figure 8 shows that for the large coluns significantly outperfors Botto-K. In the unbalanced case (Figure 9), when t is exactly k, Botto-k perfors slightly better than but as t grows outperfors Botto-K. Effect of deletion: Next we evaluate the error and scalability of the version of that uses a HLL sketch that supports data deletions (discussed in 4) vs. Botto-k. Recall, that we keep a constant size 5
Clustering. Cluster Analysis of Microarray Data. Microarray Data for Clustering. Data for Clustering
Clustering Cluster Analysis of Microarray Data 4/3/009 Copyright 009 Dan Nettleton Group obects that are siilar to one another together in a cluster. Separate obects that are dissiilar fro each other into
More informationA Trajectory Splitting Model for Efficient Spatio-Temporal Indexing
A Trajectory Splitting Model for Efficient Spatio-Teporal Indexing Slobodan Rasetic Jörg Sander Jaes Elding Mario A. Nasciento Departent of Coputing Science University of Alberta Edonton, Alberta, Canada
More informationOblivious Routing for Fat-Tree Based System Area Networks with Uncertain Traffic Demands
Oblivious Routing for Fat-Tree Based Syste Area Networks with Uncertain Traffic Deands Xin Yuan Wickus Nienaber Zhenhai Duan Departent of Coputer Science Florida State University Tallahassee, FL 3306 {xyuan,nienaber,duan}@cs.fsu.edu
More informationCollaborative Web Caching Based on Proxy Affinities
Collaborative Web Caching Based on Proxy Affinities Jiong Yang T J Watson Research Center IBM jiyang@usibco Wei Wang T J Watson Research Center IBM ww1@usibco Richard Muntz Coputer Science Departent UCLA
More informationMapping Data in Peer-to-Peer Systems: Semantics and Algorithmic Issues
Mapping Data in Peer-to-Peer Systes: Seantics and Algorithic Issues Anastasios Keentsietsidis Marcelo Arenas Renée J. Miller Departent of Coputer Science University of Toronto {tasos,arenas,iller}@cs.toronto.edu
More informationλ-harmonious Graph Colouring Lauren DeDieu
λ-haronious Graph Colouring Lauren DeDieu June 12, 2012 ABSTRACT In 198, Hopcroft and Krishnaoorthy defined a new type of graph colouring called haronious colouring. Haronious colouring is a proper vertex
More informationPerformance Analysis of RAID in Different Workload
Send Orders for Reprints to reprints@benthascience.ae 324 The Open Cybernetics & Systeics Journal, 2015, 9, 324-328 Perforance Analysis of RAID in Different Workload Open Access Zhang Dule *, Ji Xiaoyun,
More informationTheoretical Analysis of Local Search and Simple Evolutionary Algorithms for the Generalized Travelling Salesperson Problem
Theoretical Analysis of Local Search and Siple Evolutionary Algoriths for the Generalized Travelling Salesperson Proble Mojgan Pourhassan ojgan.pourhassan@adelaide.edu.au Optiisation and Logistics, The
More informationIdentifying Converging Pairs of Nodes on a Budget
Identifying Converging Pairs of Nodes on a Budget Konstantina Lazaridou Departent of Inforatics Aristotle University, Thessaloniki, Greece konlaznik@csd.auth.gr Evaggelia Pitoura Coputer Science and Engineering
More informationEE 364B Convex Optimization An ADMM Solution to the Sparse Coding Problem. Sonia Bhaskar, Will Zou Final Project Spring 2011
EE 364B Convex Optiization An ADMM Solution to the Sparse Coding Proble Sonia Bhaskar, Will Zou Final Project Spring 20 I. INTRODUCTION For our project, we apply the ethod of the alternating direction
More informationAn Efficient Approach for Content Delivery in Overlay Networks
An Efficient Approach for Content Delivery in Overlay Networks Mohaad Malli, Chadi Barakat, Walid Dabbous Projet Planète, INRIA-Sophia Antipolis, France E-ail:{alli, cbarakat, dabbous}@sophia.inria.fr
More informationA Novel Fast Constructive Algorithm for Neural Classifier
A Novel Fast Constructive Algorith for Neural Classifier Xudong Jiang Centre for Signal Processing, School of Electrical and Electronic Engineering Nanyang Technological University Nanyang Avenue, Singapore
More informationApproaching the Skyline in Z Order
Approaching the Skyline in Z Order KenC.K.Lee Baihua Zheng Huajing Li Wang-Chien Lee cklee@cse.psu.edu bhzheng@su.edu.sg huali@cse.psu.edu wlee@cse.psu.edu Pennsylvania State University, University Park,
More informationA Measurement-Based Model for Parallel Real-Time Tasks
A Measureent-Based Model for Parallel Real-Tie Tasks Kunal Agrawal 1 Washington University in St. Louis St. Louis, MO, USA kunal@wustl.edu https://orcid.org/0000-0001-5882-6647 Sanjoy Baruah 2 Washington
More informationFair Energy-Efficient Sensing Task Allocation in Participatory Sensing with Smartphones
Advance Access publication on 24 February 2017 The British Coputer Society 2017. All rights reserved. For Perissions, please eail: journals.perissions@oup.co doi:10.1093/cojnl/bxx015 Fair Energy-Efficient
More informationShortest Path Determination in a Wireless Packet Switch Network System in University of Calabar Using a Modified Dijkstra s Algorithm
International Journal of Engineering and Technical Research (IJETR) ISSN: 31-869 (O) 454-4698 (P), Volue-5, Issue-1, May 16 Shortest Path Deterination in a Wireless Packet Switch Network Syste in University
More informationDesigning High Performance Web-Based Computing Services to Promote Telemedicine Database Management System
Designing High Perforance Web-Based Coputing Services to Proote Teleedicine Database Manageent Syste Isail Hababeh 1, Issa Khalil 2, and Abdallah Khreishah 3 1: Coputer Engineering & Inforation Technology,
More informationGeo-activity Recommendations by using Improved Feature Combination
Geo-activity Recoendations by using Iproved Feature Cobination Masoud Sattari Middle East Technical University Ankara, Turkey e76326@ceng.etu.edu.tr Murat Manguoglu Middle East Technical University Ankara,
More informationData & Knowledge Engineering
Data & Knowledge Engineering 7 (211) 17 187 Contents lists available at ScienceDirect Data & Knowledge Engineering journal hoepage: www.elsevier.co/locate/datak An approxiate duplicate eliination in RFID
More informationEnergy-Efficient Disk Replacement and File Placement Techniques for Mobile Systems with Hard Disks
Energy-Efficient Disk Replaceent and File Placeent Techniques for Mobile Systes with Hard Disks Young-Jin Ki School of Coputer Science & Engineering Seoul National University Seoul 151-742, KOREA youngjk@davinci.snu.ac.kr
More informationThe Boundary Between Privacy and Utility in Data Publishing
The Boundary Between Privacy and Utility in Data Publishing Vibhor Rastogi Dan Suciu Sungho Hong ABSTRACT We consider the privacy proble in data publishing: given a database instance containing sensitive
More informationThe optimization design of microphone array layout for wideband noise sources
PROCEEDINGS of the 22 nd International Congress on Acoustics Acoustic Array Systes: Paper ICA2016-903 The optiization design of icrophone array layout for wideband noise sources Pengxiao Teng (a), Jun
More informationOptimal Route Queries with Arbitrary Order Constraints
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL.?, NO.?,? 20?? 1 Optial Route Queries with Arbitrary Order Constraints Jing Li, Yin Yang, Nikos Maoulis Abstract Given a set of spatial points DS,
More informationTHE rapid growth and continuous change of the real
IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 8, NO. 1, JANUARY/FEBRUARY 2015 47 Designing High Perforance Web-Based Coputing Services to Proote Teleedicine Database Manageent Syste Isail Hababeh, Issa
More informationMULTI-INDEX VOTING FOR ASYMMETRIC DISTANCE COMPUTATION IN A LARGE-SCALE BINARY CODES. Chih-Yi Chiu, Yu-Cyuan Liou, and Sheng-Hao Chou
MULTI-INDEX VOTING FOR ASYMMETRIC DISTANCE COMPUTATION IN A LARGE-SCALE BINARY CODES Chih-Yi Chiu, Yu-Cyuan Liou, and Sheng-Hao Chou Departent of Coputer Science and Inforation Engineering, National Chiayi
More informationDetection of Outliers and Reduction of their Undesirable Effects for Improving the Accuracy of K-means Clustering Algorithm
Detection of Outliers and Reduction of their Undesirable Effects for Iproving the Accuracy of K-eans Clustering Algorith Bahan Askari Departent of Coputer Science and Research Branch, Islaic Azad University,
More informationScheduling Parallel Real-Time Recurrent Tasks on Multicore Platforms
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL., NO., NOV 27 Scheduling Parallel Real-Tie Recurrent Tasks on Multicore Platfors Risat Pathan, Petros Voudouris, and Per Stenströ Abstract We
More informationPrivacy-preserving String-Matching With PRAM Algorithms
Privacy-preserving String-Matching With PRAM Algoriths Report in MTAT.07.022 Research Seinar in Cryptography, Fall 2014 Author: Sander Sii Supervisor: Peeter Laud Deceber 14, 2014 Abstract In this report,
More informationEvaluation of a multi-frame blind deconvolution algorithm using Cramér-Rao bounds
Evaluation of a ulti-frae blind deconvolution algorith using Craér-Rao bounds Charles C. Beckner, Jr. Air Force Research Laboratory, 3550 Aberdeen Ave SE, Kirtland AFB, New Mexico, USA 87117-5776 Charles
More informationImage Filter Using with Gaussian Curvature and Total Variation Model
IJECT Vo l. 7, Is s u e 3, Ju l y - Se p t 016 ISSN : 30-7109 (Online) ISSN : 30-9543 (Print) Iage Using with Gaussian Curvature and Total Variation Model 1 Deepak Kuar Gour, Sanjay Kuar Shara 1, Dept.
More informationDesign Optimization of Mixed Time/Event-Triggered Distributed Embedded Systems
Design Optiization of Mixed Tie/Event-Triggered Distributed Ebedded Systes Traian Pop, Petru Eles, Zebo Peng Dept. of Coputer and Inforation Science, Linköping University {trapo, petel, zebpe}@ida.liu.se
More informationIN many interactive multimedia streaming applications, such
840 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 8, NO. 5, MAY 06 A Geoetric Approach to Server Selection for Interactive Video Streaing Yaochen Hu, Student Meber, IEEE, DiNiu, Meber, IEEE, and Zongpeng Li, Senior
More informationCarving Differential Unit Test Cases from System Test Cases
Carving Differential Unit Test Cases fro Syste Test Cases Sebastian Elbau, Hui Nee Chin, Matthew B. Dwyer, Jonathan Dokulil Departent of Coputer Science and Engineering University of Nebraska - Lincoln
More informationInvestigation of The Time-Offset-Based QoS Support with Optical Burst Switching in WDM Networks
Investigation of The Tie-Offset-Based QoS Support with Optical Burst Switching in WDM Networks Pingyi Fan, Chongxi Feng,Yichao Wang, Ning Ge State Key Laboratory on Microwave and Digital Counications,
More informationPredicting x86 Program Runtime for Intel Processor
Predicting x86 Progra Runtie for Intel Processor Behra Mistree, Haidreza Haki Javadi, Oid Mashayekhi Stanford University bistree,hrhaki,oid@stanford.edu 1 Introduction Progras can be equivalent: given
More informationSolving the Damage Localization Problem in Structural Health Monitoring Using Techniques in Pattern Classification
Solving the Daage Localization Proble in Structural Health Monitoring Using Techniques in Pattern Classification CS 9 Final Project Due Dec. 4, 007 Hae Young Noh, Allen Cheung, Daxia Ge Introduction Structural
More informationSmarter Balanced Assessment Consortium Claims, Targets, and Standard Alignment for Math
Sarter Balanced Assessent Consortiu s, s, Stard Alignent for Math The Sarter Balanced Assessent Consortiu (SBAC) has created a hierarchy coprised of clais targets that together can be used to ake stateents
More informationComputer Aided Drafting, Design and Manufacturing Volume 26, Number 2, June 2016, Page 13
Coputer Aided Drafting, Design and Manufacturing Volue 26, uber 2, June 2016, Page 13 CADDM 3D reconstruction of coplex curved objects fro line drawings Sun Yanling, Dong Lijun Institute of Mechanical
More information1 Extended Boolean Model
1 EXTENDED BOOLEAN MODEL It has been well-known that the Boolean odel is too inflexible, requiring skilful use of Boolean operators to obtain good results. On the other hand, the vector space odel is flexible
More informationModeling Parallel Applications Performance on Heterogeneous Systems
Modeling Parallel Applications Perforance on Heterogeneous Systes Jaeela Al-Jaroodi, Nader Mohaed, Hong Jiang and David Swanson Departent of Coputer Science and Engineering University of Nebraska Lincoln
More informationOPTIMAL COMPLEX SERVICES COMPOSITION IN SOA SYSTEMS
Key words SOA, optial, coplex service, coposition, Quality of Service Piotr RYGIELSKI*, Paweł ŚWIĄTEK* OPTIMAL COMPLEX SERVICES COMPOSITION IN SOA SYSTEMS One of the ost iportant tasks in service oriented
More informationAnalysing Real-Time Communications: Controller Area Network (CAN) *
Analysing Real-Tie Counications: Controller Area Network (CAN) * Abstract The increasing use of counication networks in tie critical applications presents engineers with fundaental probles with the deterination
More informationA Low-Cost Multi-Failure Resilient Replication Scheme for High Data Availability in Cloud Storage
216 IEEE 23rd International Conference on High Perforance Coputing A Low-Cost Multi-Failure Resilient Replication Schee for High Data Availability in Cloud Storage Jinwei Liu* and Haiying Shen *Departent
More informationStructural Balance in Networks. An Optimizational Approach. Andrej Mrvar. Faculty of Social Sciences. University of Ljubljana. Kardeljeva pl.
Structural Balance in Networks An Optiizational Approach Andrej Mrvar Faculty of Social Sciences University of Ljubljana Kardeljeva pl. 5 61109 Ljubljana March 23 1994 Contents 1 Balanced and clusterable
More informationBoosted Detection of Objects and Attributes
L M M Boosted Detection of Objects and Attributes Abstract We present a new fraework for detection of object and attributes in iages based on boosted cobination of priitive classifiers. The fraework directly
More informationUsing Imperialist Competitive Algorithm in Optimization of Nonlinear Multiple Responses
International Journal of Industrial Engineering & Production Research Septeber 03, Volue 4, Nuber 3 pp. 9-35 ISSN: 008-4889 http://ijiepr.iust.ac.ir/ Using Iperialist Copetitive Algorith in Optiization
More informationELEVATION SURFACE INTERPOLATION OF POINT DATA USING DIFFERENT TECHNIQUES A GIS APPROACH
ELEVATION SURFACE INTERPOLATION OF POINT DATA USING DIFFERENT TECHNIQUES A GIS APPROACH Kulapraote Prathuchai Geoinforatics Center, Asian Institute of Technology, 58 Moo9, Klong Luang, Pathuthani, Thailand.
More informationUtility-Based Resource Allocation for Mixed Traffic in Wireless Networks
IEEE IFOCO 2 International Workshop on Future edia etworks and IP-based TV Utility-Based Resource Allocation for ixed Traffic in Wireless etworks Li Chen, Bin Wang, Xiaohang Chen, Xin Zhang, and Dacheng
More informationHIGH PERFORMANCE PRE-SEGMENTATION ALGORITHM FOR SONAR IMAGES
HIGH PERFORMANCE PRE-SEGMENTATION ALGORITHM FOR SONAR IMAGES Benjain Lehann*, Konstantinos Siantidis*, Dieter Kraus** *ATLAS ELEKTRONIK GbH Sebaldsbrücker Heerstraße 235 D-28309 Breen, GERMANY Eail: benjain.lehann@atlas-elektronik.co
More informationScheduling Parallel Task Graphs on (Almost) Homogeneous Multi-cluster Platforms
1 Scheduling Parallel Task Graphs on (Alost) Hoogeneous Multi-cluster Platfors Pierre-François Dutot, Tchiou N Takpé, Frédéric Suter and Henri Casanova Abstract Applications structured as parallel task
More informationEfficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression Sha M. Kakade Microsoft Research and Wharton, U Penn skakade@icrosoft.co Varun Kanade SEAS, Harvard University
More informationA CRYPTANALYTIC ATTACK ON RC4 STREAM CIPHER
A CRYPTANALYTIC ATTACK ON RC4 STREAM CIPHER VIOLETA TOMAŠEVIĆ, SLOBODAN BOJANIĆ 2 and OCTAVIO NIETO-TALADRIZ 2 The Mihajlo Pupin Institute, Volgina 5, 000 Belgrade, SERBIA AND MONTENEGRO 2 Technical University
More informationI ve Seen Enough : Incrementally Improving Visualizations to Support Rapid Decision Making
I ve Seen Enough : Increentally Iproving Visualizations to Support Rapid Decision Making Sajjadur Rahan Marya Aliakbarpour 2 Ha Kyung Kong Eric Blais 3 Karrie Karahalios,4 Aditya Paraeswaran Ronitt Rubinfield
More informationA Hybrid Network Architecture for File Transfers
JOURNAL OF IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 6, NO., JANUARY 9 A Hybrid Network Architecture for File Transfers Xiuduan Fang, Meber, IEEE, Malathi Veeraraghavan, Senior Meber,
More informationDifferent criteria of dynamic routing
Procedia Coputer Science Volue 66, 2015, Pages 166 173 YSC 2015. 4th International Young Scientists Conference on Coputational Science Different criteria of dynaic routing Kurochkin 1*, Grinberg 1 1 Kharkevich
More informationGromov-Hausdorff Distance Between Metric Graphs
Groov-Hausdorff Distance Between Metric Graphs Jiwon Choi St Mark s School January, 019 Abstract In this paper we study the Groov-Hausdorff distance between two etric graphs We copute the precise value
More informationRanking Spatial Data by Quality Preferences
1 Ranking Spatial Data by Quality Preferences Man Lung Yiu, Hua Lu, Nikos Maoulis, and Michail Vaitis Abstract A spatial preference query ranks objects based on the qualities of features in their spatial
More informationAscending order sort Descending order sort
Scalable Binary Sorting Architecture Based on Rank Ordering With Linear Area-Tie Coplexity. Hatrnaz and Y. Leblebici Departent of Electrical and Coputer Engineering Worcester Polytechnic Institute Abstract
More informationReconstruction of Time Series using Optimal Ordering of ICA Components
Reconstruction of Tie Series using Optial Ordering of ICA Coponents Ar Goneid and Abear Kael Departent of Coputer Science & Engineering, The Aerican University in Cairo, Cairo, Egypt e-ail: goneid@aucegypt.edu
More informationApproximate String Matching with Reduced Alphabet
Approxiate String Matching with Reduced Alphabet Leena Salela 1 and Jora Tarhio 2 1 University of Helsinki, Departent of Coputer Science leena.salela@cs.helsinki.fi 2 Aalto University Deptartent of Coputer
More informationThe Application of Bandwidth Optimization Technique in SLA Negotiation Process
The Application of Bandwidth Optiization Technique in SLA egotiation Process Srećko Krile University of Dubrovnik Departent of Electrical Engineering and Coputing Cira Carica 4, 20000 Dubrovnik, Croatia
More information3 Conference on Inforation Sciences and Systes, The Johns Hopkins University, March, 3 Sensitivity Characteristics of Cross-Correlation Distance Metric and Model Function F. Porikli Mitsubishi Electric
More informationSurvivability Function A Measure of Disaster-Based Routing Performance
Survivability Function A Measure of Disaster-Based Routing Perforance Journal Club Presentation on W. Molisz. Survivability function-a easure of disaster-based routing perforance. IEEE Journal on Selected
More informationThe Unit Bar Visibility Number of a Graph
The Unit Bar Visibility Nuber of a Graph Eily Gaub 1,4, Michelle Rose 2,4, and Paul S. Wenger,4 August 20, 2015 Abstract A t-unit-bar representation of a graph G is an assignent of sets of at ost t horizontal
More information6.1 Topological relations between two simple geometric objects
Chapter 5 proposed a spatial odel to represent the spatial extent of objects in urban areas. The purpose of the odel, as was clarified in Chapter 3, is ultifunctional, i.e. it has to be capable of supplying
More informationA Beam Search Method to Solve the Problem of Assignment Cells to Switches in a Cellular Mobile Network
A Bea Search Method to Solve the Proble of Assignent Cells to Switches in a Cellular Mobile Networ Cassilda Maria Ribeiro Faculdade de Engenharia de Guaratinguetá - DMA UNESP - São Paulo State University
More informationBrian Noguchi CS 229 (Fall 05) Project Final Writeup A Hierarchical Application of ICA-based Feature Extraction to Image Classification Brian Noguchi
A Hierarchical Application of ICA-based Feature Etraction to Iage Classification Introduction Iage classification poses one of the greatest challenges in the achine vision and achine learning counities.
More informationAn Integrated Processing Method for Multiple Large-scale Point-Clouds Captured from Different Viewpoints
519 An Integrated Processing Method for Multiple Large-scale Point-Clouds Captured fro Different Viewpoints Yousuke Kawauchi 1, Shin Usuki, Kenjiro T. Miura 3, Hiroshi Masuda 4 and Ichiro Tanaka 5 1 Shizuoka
More informationPlanet Scale Software Updates
Planet Scale Software Updates Christos Gkantsidis 1, Thoas Karagiannis 2, Pablo Rodriguez 1, and Milan Vojnović 1 1 Microsoft Research 2 UC Riverside Cabridge, UK Riverside, CA, USA {chrisgk,pablo,ilanv}@icrosoft.co
More informationNON-RIGID OBJECT TRACKING: A PREDICTIVE VECTORIAL MODEL APPROACH
NON-RIGID OBJECT TRACKING: A PREDICTIVE VECTORIAL MODEL APPROACH V. Atienza; J.M. Valiente and G. Andreu Departaento de Ingeniería de Sisteas, Coputadores y Autoática Universidad Politécnica de Valencia.
More informationGeometry. The Method of the Center of Mass (mass points): Solving problems using the Law of Lever (mass points). Menelaus theorem. Pappus theorem.
Noveber 13, 2016 Geoetry. The Method of the enter of Mass (ass points): Solving probles using the Law of Lever (ass points). Menelaus theore. Pappus theore. M d Theore (Law of Lever). Masses (weights)
More informationDebiasing Crowdsourced Batches
Debiasing Crowdsourced Batches Honglei Zhuang, Aditya Paraeswaran, Dan Roth and Jiawei Han Departent of Coputer Science University of Illinois at Urbana-Chapaign {hzhuang3, adityagp, danr, hanj}@illinois.edu
More informationCS 361 Meeting 8 9/24/18
CS 36 Meeting 8 9/4/8 Announceents. Hoework 3 due Friday. Review. The closure properties of regular languages provide a way to describe regular languages by building the out of sipler regular languages
More information2013 IEEE Conference on Computer Vision and Pattern Recognition. Compressed Hashing
203 IEEE Conference on Coputer Vision and Pattern Recognition Copressed Hashing Yue Lin Rong Jin Deng Cai Shuicheng Yan Xuelong Li State Key Lab of CAD&CG, College of Coputer Science, Zhejiang University,
More informationThe Internal Conflict of a Belief Function
The Internal Conflict of a Belief Function Johan Schubert Abstract In this paper we define and derive an internal conflict of a belief function We decopose the belief function in question into a set of
More informationCOLOR HISTOGRAM AND DISCRETE COSINE TRANSFORM FOR COLOR IMAGE RETRIEVAL
COLOR HISTOGRAM AND DISCRETE COSINE TRANSFORM FOR COLOR IMAGE RETRIEVAL 1 Te-Wei Chiang ( 蔣德威 ), 2 Tienwei Tsai ( 蔡殿偉 ), 3 Jeng-Ping Lin ( 林正平 ) 1 Dept. of Accounting Inforation Systes, Chilee Institute
More informationKS3 Maths Progress Delta 2-year Scheme of Work
KS3 Maths Progress Delta 2-year Schee of Work See the spreadsheet for the detail. Essentially this is an express route through our higher-ability KS3 Maths Progress student books Delta 1, 2 and 3 as follows:
More informationHandling Indivisibilities. Notes for AGEC 622. Bruce McCarl Regents Professor of Agricultural Economics Texas A&M University
Notes for AGEC 6 Bruce McCarl Regents Professor of Agricultural Econoics Texas A&M University All or None -- Integer Prograing McCarl and Spreen Chapter 5 Many investent and probles involve cases where
More informationData Caching for Enhancing Anonymity
Data Caching for Enhancing Anonyity Rajiv Bagai and Bin Tang Departent of Electrical Engineering and Coputer Science Wichita State University Wichita, Kansas 67260 0083, USA Eail: {rajiv.bagai, bin.tang}@wichita.edu
More informationDistributed Multicast Tree Construction in Wireless Sensor Networks
Distributed Multicast Tree Construction in Wireless Sensor Networks Hongyu Gong, Luoyi Fu, Xinzhe Fu, Lutian Zhao 3, Kainan Wang, and Xinbing Wang Dept. of Electronic Engineering, Shanghai Jiao Tong University,
More informationA GRAPH-PLANARIZATION ALGORITHM AND ITS APPLICATION TO RANDOM GRAPHS
A GRAPH-PLANARIZATION ALGORITHM AND ITS APPLICATION TO RANDOM GRAPHS T. Ozawa and H. Takahashi Departent of Electrical Engineering Faculty of Engineering, Kyoto University Kyoto, Japan 606 Abstract. In
More informationSupplementary Section. A. Algorithm. B. Datasets. C. More on Labeling Functions
Suppleentary Section In this section, we include ore details about our algorith, labeling functions, datasets as well as additional results and coparisons, which could not be included in the ain paper
More informationDefining and Surveying Wireless Link Virtualization and Wireless Network Virtualization
1 Defining and Surveying Wireless Link Virtualization and Wireless Network Virtualization Jonathan van de Belt, Haed Ahadi, and Linda E. Doyle The Centre for Future Networks and Counications - CONNECT,
More informationA Study of the Relationship Between Support Vector Machine and Gabriel Graph
A Study of the Relationship Between Support Vector Machine and Gabriel Graph Wan Zhang and Irwin King {wzhang, king}@cse.cuhk.edu.hk Departent of Coputer Science & Engineering The Chinese University of
More informationDesign and Implementation of Business Logic Layer Object-Oriented Design versus Relational Design
Design and Ipleentation of Business Logic Layer Object-Oriented Design versus Relational Design Ali Alharthy Faculty of Engineering and IT University of Technology, Sydney Sydney, Australia Eail: Ali.a.alharthy@student.uts.edu.au
More informationAn Ad Hoc Adaptive Hashing Technique for Non-Uniformly Distributed IP Address Lookup in Computer Networks
An Ad Hoc Adaptive Hashing Technique for Non-Uniforly Distributed IP Address Lookup in Coputer Networks Christopher Martinez Departent of Electrical and Coputer Engineering The University of Texas at San
More informationInternational Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN
International Journal of Scientific & Engineering Research, Volue 4, Issue 0, October-203 483 Design an Encoding Technique Using Forbidden Transition free Algorith to Reduce Cross-Talk for On-Chip VLSI
More informationNovel Image Representation and Description Technique using Density Histogram of Feature Points
Novel Iage Representation and Description Technique using Density Histogra of Feature Points Keneilwe ZUVA Departent of Coputer Science, University of Botswana, P/Bag 00704 UB, Gaborone, Botswana and Tranos
More informationA Periodic Dynamic Load Balancing Method
2016 3 rd International Conference on Engineering Technology and Application (ICETA 2016) ISBN: 978-1-60595-383-0 A Periodic Dynaic Load Balancing Method Taotao Fan* State Key Laboratory of Inforation
More informationA Comparative Study of Two-phase Heuristic Approaches to General Job Shop Scheduling Problem
IE Vol. 7, No. 2, pp. 84-92, Septeber 2008. A Coparative Study of Two-phase Heuristic Approaches to General Job Shop Scheduling Proble Ji Ung Sun School of Industrial and Managent Engineering Hankuk University
More informationReal-Time Detection of Invisible Spreaders
Real-Tie Detection of Invisible Spreaders MyungKeun Yoon Shigang Chen Departent of Coputer & Inforation Science & Engineering University of Florida, Gainesville, FL 3, USA {yoon, sgchen}@cise.ufl.edu Abstract
More informationEFFICIENT VIDEO SEARCH USING IMAGE QUERIES A. Araujo1, M. Makar2, V. Chandrasekhar3, D. Chen1, S. Tsai1, H. Chen1, R. Angst1 and B.
EFFICIENT VIDEO SEARCH USING IMAGE QUERIES A. Araujo1, M. Makar2, V. Chandrasekhar3, D. Chen1, S. Tsai1, H. Chen1, R. Angst1 and B. Girod1 1 Stanford University, USA 2 Qualco Inc., USA ABSTRACT We study
More informationOn the Computation and Application of Prototype Point Patterns
On the Coputation and Application of Prototype Point Patterns Katherine E. Tranbarger Freier 1 and Frederic Paik Schoenberg 2 Abstract This work addresses coputational probles related to the ipleentation
More informationKeyword Search in Spatial Databases: Towards Searching by Document
IEEE International Conference on Data Engineering Keyword Search in Spatial Databases: Towards Searching by Docuent Dongxiang Zhang #1, Yeow Meng Chee 2, Anirban Mondal 3, Anthony K. H. Tung #4, Masaru
More informationA Model Free Automatic Tuning Method for a Restricted Structured Controller. by using Simultaneous Perturbation Stochastic Approximation
28 Aerican Control Conference Westin Seattle Hotel, Seattle, Washington, USA June -3, 28 WeC9.5 A Model Free Autoatic Tuning Method for a Restricted Structured Controller by Using Siultaneous Perturbation
More informationDeterministic Voting in Distributed Systems Using Error-Correcting Codes
IEEE TRASACTIOS O PARALLEL AD DISTRIBUTED SYSTEMS, VOL. 9, O. 8, AUGUST 1998 813 Deterinistic Voting in Distributed Systes Using Error-Correcting Codes Lihao Xu and Jehoshua Bruck, Senior Meber, IEEE Abstract
More informationEnhancing Real-Time CAN Communications by the Prioritization of Urgent Messages at the Outgoing Queue
Enhancing Real-Tie CAN Counications by the Prioritization of Urgent Messages at the Outgoing Queue ANTÓNIO J. PIRES (1), JOÃO P. SOUSA (), FRANCISCO VASQUES (3) 1,,3 Faculdade de Engenharia da Universidade
More informationImprove Peer Cooperation using Social Networks
Iprove Peer Cooperation using Social Networks Victor Ponce, Jie Wu, and Xiuqi Li Departent of Coputer Science and Engineering Florida Atlantic University Boca Raton, FL 33431 Noveber 5, 2007 Corresponding
More informationMinimax Sensor Location to Monitor a Piecewise Linear Curve
NSF GRANT #040040 NSF PROGRAM NAME: Operations Research Miniax Sensor Location to Monitor a Piecewise Linear Curve To M. Cavalier The Pennsylvania State University University Par, PA 680 Whitney A. Conner
More informationJoint Measurement- and Traffic Descriptor-based Admission Control at Real-Time Traffic Aggregation Points
Joint Measureent- and Traffic Descriptor-based Adission Control at Real-Tie Traffic Aggregation Points Stylianos Georgoulas, Panos Triintzios and George Pavlou Centre for Counication Systes Research, University
More information