Distinct Sampling on Streaming Data with Near-Duplicates*


Jiecao Chen
Indiana University Bloomington
Bloomington, IN, USA

ABSTRACT

In this paper we study how to perform distinct sampling in the streaming model where data contain near-duplicates. The goal of distinct sampling is to return a distinct element uniformly at random from the universe of elements, given that all the near-duplicates are treated as the same element. We also extend the result to the sliding window cases in which we are only interested in the most recent items. We present algorithms with provable theoretical guarantees for datasets in the Euclidean space, and also verify their effectiveness via an extensive set of experiments.

1 INTRODUCTION

Real-world datasets are often noisy; imprecise references to the same real-world entities are ubiquitous in business and scientific databases. For example, YouTube contains many videos of almost the same content; they appear to be slightly different due to cuts, compression and changes of resolution. A large number of webpages on the Internet are near-duplicates of each other. Numerous tweets and WhatsApp/WeChat messages are re-sent with small edits. This phenomenon makes data analytics more difficult. It is clear that direct statistical analysis on such noisy datasets will be erroneous. For instance, if we perform standard distinct sampling, then the sampling will be biased towards those elements that have a large number of near-duplicates. On the other hand, due to the sheer size of the data it becomes infeasible to perform a comprehensive data cleaning step before the actual analytic phase. In this paper we study how to process datasets containing near-duplicates in the data stream model [4, 23], where we can only make a sequential scan of data items using a small memory space before the query-answering phase. When answering queries we need to treat all the near-duplicates as the same universe element.
This general problem was recently proposed in [9], where the authors studied the estimation of the number of distinct elements of the data stream (also called $F_0$). In this paper we extend this line of research by studying another fundamental problem in the data stream literature: distinct sampling (a.k.a. $\ell_0$-sampling), where at the time of query we need to output a random sample among all the distinct elements of the dataset. $\ell_0$-sampling has many applications that we shall mention shortly. We remark, as also pointed out in [9], that we cannot place our hope on a magic hash function that maps all the near-duplicates into the same element and otherwise into different elements, simply because such a magic hash function, if it exists, needs a lot of bits to describe.

[*Both authors are supported by NSF CCF and IIS. Second author: Qin Zhang, Indiana University Bloomington, Bloomington, IN, USA, qzhangcs@indiana.edu. Published at PODS '18, June 10-15, 2018, Houston, TX, USA.]

The Noisy Data Model and Problems. Let us formally define the noisy data model and the problems we shall study. In this paper we will focus on points in the Euclidean space. More complicated data objects such as documents and images can be mapped to points in their feature spaces. We first introduce a few concepts (first introduced in [9]) to facilitate our discussion.
Let $d(\cdot, \cdot)$ be the distance function of the Euclidean space, and let $\alpha$ be a parameter (distance threshold) representing the maximum distance between any two points in the same group.

Definition 1.1 (data sparsity). We say a dataset $S$ is $(\alpha, \beta)$-sparse in the Euclidean space for some $\beta \ge \alpha$ if for any $u, v \in S$ we have either $d(u, v) \le \alpha$ or $d(u, v) > \beta$. We call $\max_{\beta} \beta/\alpha$ the separation ratio.

Definition 1.2 (well-separated dataset). We say a dataset $S$ is well-separated if the separation ratio of $S$ is larger than 2.

Definition 1.3 (natural partition; $F_0$ of well-separated dataset). We can naturally partition a well-separated dataset $S$ into a set of groups such that the intra-group distance is at most $\alpha$, and the inter-group distance is more than $2\alpha$. We call this the unique natural partition of $S$. Define the number of distinct elements of a well-separated dataset w.r.t. $\alpha$, denoted as $F_0(S, \alpha)$, to be the number of groups in the natural partition.

We will assume that $\alpha$ is given as a user-chosen input to our algorithms. In practice, $\alpha$ can be obtained for example by sampling a small number of items of the dataset and then comparing their labels.

For a general dataset, we need to define the number of distinct elements as an optimization problem as follows.

Definition 1.4 ($F_0$ of general dataset). The number of distinct elements of $S$ given a distance threshold $\alpha$, denoted by $F_0(S, \alpha)$, is defined to be the size of the minimum cardinality partition $\mathcal{G} = \{G_1, G_2, \ldots, G_n\}$ of $S$ such that for any $i = 1, \ldots, n$, and for any pair of points $u, v \in G_i$, we have $d(u, v) \le \alpha$.

Note that the definition for general datasets is consistent with the one for well-separated datasets.

We next define $\ell_0$-sampling for noisy datasets. To differentiate it from the standard $\ell_0$-sampling we will call it robust $\ell_0$-sampling; but we may omit the word robust in the rest of the paper when it is clear from the context. We start with well-separated datasets.

Definition 1.5 (robust $\ell_0$-sampling on well-separated dataset). Let $S$ be a well-separated dataset with natural partition $\mathcal{G} = \{G_1, G_2, \ldots, G_n\}$. The robust $\ell_0$-sampling on $S$ outputs a point $u \in S$ such that

$$\forall i \in [n], \quad \Pr[u \in G_i] = 1/n. \qquad (1)$$

That is, we output a point from each group with equal probability; we call the outputted point the robust $\ell_0$-sample.

It is a little more subtle to define robust $\ell_0$-sampling on general datasets, since there could be multiple minimum cardinality partitions, and without fixing a particular partition we cannot define $\ell_0$-sampling. We will circumvent this issue by targeting a slightly weaker sampling goal.

Definition 1.6 (robust $\ell_0$-sampling on general dataset). Let $S$ be a dataset and let $n = F_0(S, \alpha)$. The robust $\ell_0$-sampling on $S$ outputs a point $q$ such that

$$\forall p \in S, \quad \Pr[q \in \mathrm{Ball}(p, \alpha) \cap S] = \Theta(1/n), \qquad (2)$$

where $\mathrm{Ball}(p, \alpha)$ is the ball centered at $p$ with radius $\alpha$.

Let us compare Equations (1) and (2). It is easy to see that when $S$ is well-separated, letting $G(p)$ denote the group that $p$ belongs to in the natural partition of $S$, we have $G(p) = \mathrm{Ball}(p, \alpha) \cap S$, and thus we can rewrite (1) as

$$\forall p \in S, \quad \Pr[q \in \mathrm{Ball}(p, \alpha) \cap S] = 1/n. \qquad (3)$$

Comparing (2) and (3), one can see that the definition of robust $\ell_0$-sampling on general datasets is consistent with that on well-separated datasets, except that we have relaxed the sample probability by a constant factor.

Computational Models. We study robust $\ell_0$-sampling in the standard streaming model, where the points $p_1, \ldots, p_m \in S$ come one by one in order, and we need to maintain a sketch of $S_t = \{p_1, \ldots, p_t\}$ (denoted by $sk(S_t)$) such that at any time $t$ we can output an $\ell_0$-sample of $S_t$ using $sk(S_t)$. The goal is to minimize the size of the sketch $sk(S_t)$ (that is, the memory space usage) and the processing time per point under certain accuracy/approximation guarantees.

We also study the sliding window models. Let $w$ be the window size.
In the sequence-based sliding window model, at any time step $t$ we should be able to output an $\ell_0$-sample of $\{p_{\ell-w+1}, \ldots, p_{\ell}\}$ where $p_{\ell}$ is the latest point that we receive by the time $t$. In the time-based sliding window model, we should be able to output an $\ell_0$-sample of $\{p_{\ell'}, \ldots, p_{\ell}\}$ where $p_{\ell'}, \ldots, p_{\ell}$ are the points received in the last $w$ time steps $t - w + 1, \ldots, t$. The sliding window models are generalizations of the standard streaming model (which we call the infinite window model), and are very useful when we are only interested in the most recent items. Our algorithms for sliding windows work for both sequence-based and time-based cases. The only difference is that the definitions of the expiration of a point are different in the two cases.

Our Contributions. This paper makes the following theoretical contributions.

(1) We propose a robust $\ell_0$-sampling algorithm for well-separated datasets in the streaming model in constant dimensional Euclidean spaces; the algorithm uses $O(\log m)$ words of space ($m$ is the length of the stream) and $O(\log m)$ processing time per point, and succeeds with probability $(1 - 1/m)$ during the whole streaming process. This result matches the one in the corresponding noiseless data setting. See Section 2.1.

(2) We next design an algorithm for sliding windows under the same setting. The algorithm works for both sequence-based and time-based sliding windows, using $O(\log n \log w)$ words of space and $O(\log n \log w)$ processing time per point, with success probability $(1 - 1/m)$ during the whole streaming process. We comment that the sliding window algorithm is much more complicated than the one for the infinite window, and is our main technical contribution. See Section 2.2.

(3) For general datasets, we show that the proposed $\ell_0$-sampling algorithms for well-separated datasets still produce almost uniform samples on general datasets. More precisely, they achieve the guarantee (2). See Section 3.
(4) We further show that our algorithms can also handle datasets in high dimensional Euclidean spaces given sufficiently large separation ratios. See Section 4.

(5) Finally, we show that our $\ell_0$-sampling algorithms can be used to efficiently estimate $F_0$ in both the standard streaming model and the sliding window models. See Section 5.

We have also implemented and tested our $\ell_0$-sampling algorithm for the infinite window case, and verified its effectiveness on various datasets. See Section A.

Related Work. We now briefly survey related work on distinct sampling, and previous work dealing with datasets with near-duplicates.

The problem of $\ell_0$-sampling is among the most well-studied problems in the data stream literature. It was first investigated in [14, 24, 26], and the current best result is due to Jowhari et al. [28]. We refer readers to [13] for an overview of a number of $\ell_0$-samplers under a unified framework. Besides being used in various statistical estimations [14], $\ell_0$-sampling finds applications in dynamic geometric problems (e.g., $\epsilon$-approximation, minimum spanning tree [24]), and dynamic graph streaming algorithms (e.g., connectivity [1], graph sparsifiers [2, 3], vertex cover [10, 11], maximum matching [1, 5, 10, 30], etc.; see [32] for a survey). However, all the algorithms for $\ell_0$-sampling proposed in the literature only work for noiseless streaming datasets.

$\ell_0$-sampling in sliding windows on noiseless datasets can be done by running the algorithm in [6] with the rank of each item generated by a random hash function. As before, this approach cannot work for datasets with near-duplicates, simply because the hash values assigned to near-duplicates will be different.

$\ell_0$-sampling has also been studied in the distributed streaming setting [12], where there are multiple streams and we want to maintain a distinct sample over the union of the streams.
The sampling algorithm in [12] is essentially an extension of the random sampling algorithms in [15, 34] that uses a hash function to generate random ranks for items, and is thus again unsuitable for datasets with near-duplicates.

The list of works on $F_0$ estimation is even longer (e.g., [7, 19, 22, 23, 25, 29], to mention just a few). Estimating $F_0$ in the sliding window model was studied in [37]. Again, all these works target noiseless data.

The general problem of processing noisy data streams without a comprehensive data cleaning step was only studied fairly recently [9], for the $F_0$ problem. A number of statistical problems ($F_0$, $\ell_0$-sampling, heavy hitters, etc.) were studied in the distributed model under the same noisy data model [36]. Unfortunately, the multi-round algorithms designed in the distributed model cannot be used in the data stream model, because on data streams we can only scan the whole dataset once without looking back.

This line of research is closely related to entity resolution (also called data deduplication, record linkage, etc.); see, e.g., [17, 20, 27, 31]. However, all these works target finding and merging all the near-duplicates, and thus cannot be applied to the data stream model where we only have a small memory space and cannot store all the items.

Techniques Overview. The high level idea of our algorithm for the infinite window is very simple. Suppose we can modify the stream by keeping only one representative point (e.g., the first point according to the order of the data stream) of each group. Then we can just perform a uniform random sampling on the representative points, which can be done for example by the following folklore algorithm: we assign each point a random rank in $(0, 1)$, and maintain the point with the minimum rank as the sample during the streaming process.

Now the question becomes: can we identify (not necessarily store) the first point of each group space-efficiently? Unfortunately, we need $\Omega(n)$ space ($n$ is the number of groups) to identify the first point of each group for a noisy streaming dataset, since we have to store at least 1 bit per group to record its first point in order to avoid selecting later-coming points of the same group. One way to deal with this challenge is to subsample a set of groups in advance, and then only focus on the first points of this set of groups.
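
The folklore minimum-rank sampler mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch for a stream that already consists of one representative per group; the function name `min_rank_sample` and the use of Python's `random.Random` as the source of ranks are assumptions made for this example, not part of the paper.

```python
import random


def min_rank_sample(stream, rng):
    """Return one item chosen uniformly at random from a stream, in a single pass.

    Each item receives an independent random rank in [0, 1); the item with the
    smallest rank seen so far is the current sample.
    """
    best_item, best_rank = None, 2.0  # ranks are below 1, so 2.0 acts as +infinity
    for item in stream:
        rank = rng.random()
        if rank < best_rank:
            best_item, best_rank = item, rank
    return best_item
```

Only the current minimum needs to be stored, which is why this step is cheap once the representatives are known; the difficulty discussed in the text lies entirely in identifying those representatives.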
Two issues remain to be dealt with: (1) How do we sample a set of groups in advance? (2) How do we determine the sample rate? Note that before we have seen all points in a group, the group itself is not well-defined, and thus it is difficult to assign an ID to a group at the beginning and perform the subsampling. Moreover, the number of groups keeps increasing as we see more points; we therefore have to decrease the sample rate along the way to keep the space usage small.

For the first question, the idea is to place a random grid of side length $\Theta(\alpha)$ ($\alpha$ is the group distance threshold) upon the point set, and then sample cells of the grid instead of groups using a hash function. We then say a group
(1) $G$ is sampled if and only if $G$'s first point falls into a sampled cell;
(2) $G$ is rejected if $G$ has a point in a sampled cell, but $G$'s first point is not in a sampled cell;
(3) $G$ is ignored if $G$ has no point in a sampled cell.
We note that the second item is critical since we want to judge a group only by its first point; even if there is another point in the group that is sampled, if it is not the first point of the group, then we still consider the group as rejected. On the other hand, we do not need to worry about the ignored groups since they are not considered from the very beginning.

To guarantee that our decision is consistent on each group, we have to keep some neighborhood information on each rejected group as well to avoid double-counting, which seems to be space-expensive at first glance. Fortunately, for constant dimensional Euclidean space, we can show that if grid cells are randomly sampled, then the number of rejected groups is within a constant factor of the number of sampled groups. We can thus control the space usage of the algorithm by dynamically decreasing the sample rate for grid cells. More precisely, we try to maintain a sample rate as low as possible while guaranteeing that there is at least one group that is sampled. This answers the second question.
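
The three-way classification of groups above can be illustrated with a small sketch. It assumes 2-dimensional points, a grid anchored at the origin, and a set of sampled cells that is simply handed in; the helper names `cell_of` and `classify_group` are hypothetical and chosen only for this example.

```python
import math


def cell_of(point, side):
    """Grid cell (column, row) containing a 2-D point, for a grid with the given side length."""
    x, y = point
    return (math.floor(x / side), math.floor(y / side))


def classify_group(points, sampled_cells, side):
    """Classify a group, given its points in arrival order.

    Returns "sampled", "rejected" or "ignored" following items (1)-(3) above.
    """
    cells = [cell_of(p, side) for p in points]
    if cells[0] in sampled_cells:
        return "sampled"    # the first point falls into a sampled cell
    if any(c in sampled_cells for c in cells[1:]):
        return "rejected"   # a later point hits a sampled cell, the first one does not
    return "ignored"        # no point of the group touches a sampled cell
```

Note that the decision depends on the arrival order: the same two points can form a sampled group or a rejected group depending on which of them arrives first.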
The situation in the sliding window case becomes complicated because points will expire, and consequently we cannot keep decreasing the grid cell sample rate. In fact, we have to increase the cell sample rate when there are not enough groups being sampled. However, if we increase the cell sample rate in the middle of the process, then the neighborhood information of those previously ignored groups has already been lost. To handle this dilemma we maintain a hierarchical sampling structure. We describe the high level ideas as well as the actual algorithm in Section 2.2, after some basic algorithms and concepts have been introduced.

For general datasets, we show that our algorithms for well-separated datasets can still return an almost uniform random distinct sample. We first relate our robust $\ell_0$-sampling algorithm to a greedy partition process, and show that our algorithm returns a random group among the groups generated by that greedy partition. We then compare that particular greedy partition with the minimum cardinality partition, and show that the numbers of groups produced by the two partitions are within a constant factor of each other.

Comparison with [9]. We note that although this work follows the noisy data model of [9] and the roadmap of this paper is similar to that of [9] (which we think is the natural way for the presentation), the contents of this paper, namely the ideas, proposed algorithms, and analysis techniques, are all very different from those in [9]. After all, the $\ell_0$-sampling problem studied in this paper is different from the $F_0$ estimation studied in [9]. We note, however, that there are natural connections between distinct elements and distinct sampling, and thus would like to mention a few points.

(1) In the infinite window case, we can easily use our robust $\ell_0$-sampling algorithm to get an algorithm for $(1+\epsilon)$-approximating robust $F_0$ using the same amount of space as that in [9] (see Section 5).
We note that in the noiseless data setting, the problems of $\ell_0$-sampling and $F_0$ estimation can be reduced to each other by easy reductions. However, it is not clear how to straightforwardly use $F_0$ estimation to perform $\ell_0$-sampling in the noisy data setting using the same amount of space as we have achieved. We believe that since there is no magic hash function, a procedure similar to finding the representative point of each group is necessary in any $\ell_0$-sampling algorithm in the noisy data setting.

(2) Our sliding window $\ell_0$-sampling algorithm can also be used to obtain a sliding window algorithm for $(1+\epsilon)$-approximating $F_0$ (also see Section 5). However, it is not clear how to extend the $F_0$ algorithm in [9] to the sliding window case, which was not studied in [9].

(3) In order to deal with general datasets, the authors of [9] introduced a concept called $F_0$-ambiguity and used it as a parameter in the analysis. Intuitively, $F_0$-ambiguity measures the least fraction of points that we need to remove in order to make the dataset well-separated. This definition works for problems whose answer is a single number, which does not depend on the actual group partition. However, different group partitions do affect the result of $\ell_0$-sampling, even if all those partitions have the minimum cardinality. In Section 3 we show that by introducing a relaxed version of random sampling we can bypass the issue of data ambiguity.

Preliminaries. In Table 1 we summarize the main notations used in this paper. We use $[n]$ to denote $\{1, 2, \ldots, n\}$. We say $x$ is a $(1+\epsilon)$-approximation of $y$ if $x \in [(1-\epsilon)y, (1+\epsilon)y]$.

Table 1: Notations

Notation                      Definition
$S$                           stream of points
$m$                           length of the stream
$w$                           length of the sliding window
$n = F_0(S)$                  number of groups
$\mathcal{G}$ / $G$           set of groups / a group
$G(p)$                        group containing point $p$
$\alpha$                      threshold of group diameter
$\mathbb{G}$ / $C$            grid / a grid cell
CELL($p$)                     cell containing point $p$
ADJ($p$)                      set of cells adjacent to CELL($p$)
$\mathrm{Ball}(p, \alpha)$    $\{q \mid d(p, q) \le \alpha\}$
$\epsilon$                    approximation ratio for $F_0$

We need the following versions of the Chernoff bound.

LEMMA 1.7 (STANDARD CHERNOFF BOUND). Let $X_1, \ldots, X_n$ be independent Bernoulli random variables such that $\Pr[X_i = 1] = p_i$. Let $X = \sum_{i \in [n]} X_i$. Let $\mu = E[X]$. It holds that $\Pr[X \ge (1+\delta)\mu] \le e^{-\delta^2 \mu/3}$ and $\Pr[X \le (1-\delta)\mu] \le e^{-\delta^2 \mu/2}$ for any $\delta \in (0, 1)$.

LEMMA 1.8 (VARIANT OF CHERNOFF BOUND). Let $Y_1, \ldots, Y_n$ be $n$ independent random variables such that $Y_i \in [0, T]$ for some $T > 0$. Let $\mu = E[\sum_i Y_i]$. Then for any $a > 0$, we have

$$\Pr\Big[\sum_{i \in [n]} Y_i > a\Big] \le e^{-(a - 2\mu)/T}.$$

2 WELL-SEPARATED DATASETS IN CONSTANT DIMENSIONS

We start with the discussion of $\ell_0$-sampling on well-separated datasets in constant dimensional Euclidean space.

2.1 Infinite Window

We first consider the infinite window case.
We present our algorithm for 2-dimensional Euclidean space, but it can be trivially extended to $O(1)$ dimensions by appropriately changing the constant parameters.

Let $\mathcal{G} = \{G_1, \ldots, G_n\}$ be the natural group partition of the well-separated stream of points $S$. We place a random grid $\mathbb{G}$ with side length $\alpha/2$ on $\mathbb{R}^2$, and call each grid square a cell. For a point $p$, define CELL($p$) to be the cell $C \in \mathbb{G}$ containing $p$. Let ADJ($p$) $= \{C \in \mathbb{G} \mid d(p, C) \le \alpha\}$, where $d(p, C)$ is defined to be the minimum distance between $p$ and a point in $C$. We say a group $G$ intersects a cell $C$ if $G \cap C \ne \emptyset$.

Assume that all points have $x$ and $y$ coordinates in the range $[0, M]$ for a large enough value $M$. Let $\Delta = \lceil 2M/\alpha \rceil + 1$. We assign the cell on the $i$-th row and the $j$-th column of the grid $\mathbb{G} \cap ([0, M] \times [0, M])$ a numerical identification (ID) $(i - 1) \cdot \Delta + j$. For convenience we will use a cell and its ID interchangeably throughout the paper when there is no confusion.

For ease of presentation, we will assume that we can use fully random hash functions for free. In fact, by Chernoff-Hoeffding bounds for limited independence [18, 33], all our analysis still holds when we adopt $\Theta(\log m)$-wise independent hash functions, which does not affect the asymptotic space and time costs of our algorithms.

Let $h : [\Delta^2] \to \{0, 1, \ldots, 2^{\lceil 2 \log \Delta \rceil} - 1\}$ be a fully random hash function, and define $h_R$ for a given parameter $R = 2^k$ ($k \in \mathbb{N}$) to be $h_R(x) = h(x) \bmod R$. We will use $h_R$ to perform sampling. In particular, given a set of IDs $Y = \{y_1, \ldots, y_t\}$, we call $\{y \in Y \mid h_R(y) = 0\}$ the set of sampled IDs of $Y$ by $h_R$. We also call $1/R$ the sample rate of $h_R$.

As discussed in the techniques overview in the introduction, our main idea is to sample cells instead of groups in advance using a hash function.

Definition 2.1 (sampled cell). A cell $C$ is sampled by $h_R$ if and only if $h_R(C) = 0$.

By our choices of the grid cell size and the hash function we have:

FACT 1. (a) Each cell will intersect at most one group, and each group will intersect at most $O(1)$ cells.
(b) For any set of points $P = \{p_1, \ldots, p_t\}$, $\{p \in P \mid h_{2R}(\mathrm{CELL}(p)) = 0\} \subseteq \{p \in P \mid h_R(\mathrm{CELL}(p)) = 0\}$.

In the infinite window case (this section) we choose the representative point of each group to be the first point of the group. We note that the representative points are fully determined by the input stream, and are independent of the hash function. We will define the representative point slightly differently in the sliding window case (next section).

We define a few sets which we will maintain in our algorithms.

Definition 2.2. Let $S^{rep}$ be the set of representative points of all groups. Define the accept set to be

$$S^{acc} = \{p \in S^{rep} \mid h_R(\mathrm{CELL}(p)) = 0\},$$

and the reject set to be

$$S^{rej} = \{p \in S^{rep} \setminus S^{acc} \mid \exists C \in \mathrm{ADJ}(p) \text{ s.t. } h_R(C) = 0\}.$$

For convenience we also introduce the following concepts.

Definition 2.3 (sampled, rejected, candidate group). We say a group $G$ is a sampled group if $G \cap S^{acc} \ne \emptyset$, a rejected group if $G \cap S^{rej} \ne \emptyset$, and a candidate group if $G \cap (S^{acc} \cup S^{rej}) \ne \emptyset$.
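
The nesting property in Fact 1(b) is what allows the sample rate to be halved without revisiting discarded points, so it may be worth seeing concretely. The sketch below is an illustration only: it replaces the fully random hash function $h$ by a SHA-256 based stand-in, and the names `h`, `h_R` and `sampled_ids` are assumptions of this example.

```python
import hashlib


def h(cell_id, bits=32):
    """Stand-in for the random hash h: maps a cell ID to an integer in [0, 2**bits)."""
    digest = hashlib.sha256(str(cell_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


def h_R(cell_id, R):
    """h_R(x) = h(x) mod R; an ID is sampled when this value is 0 (sample rate 1/R)."""
    return h(cell_id) % R


def sampled_ids(ids, R):
    """The set of IDs sampled by h_R."""
    return {y for y in ids if h_R(y, R) == 0}
```

Because $R$ divides $2R$, any ID with $h(y) \bmod 2R = 0$ also satisfies $h(y) \bmod R = 0$, so the sampled sets shrink monotonically as $R$ doubles.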

Figure 1: Each square is a cell; each light blue square is a sampled cell. Each gray dashed circle stands for a group. Red points ($p_1$, $p_2$ and $p_3$) are representative points; $p_1$ is in the accept set and $p_2$ is in the reject set. Gray cells form ADJ($p_3$). $\alpha = 2$ in this illustration.

Figure 1 illustrates some of the concepts we have introduced so far. Obviously, the set of sampled groups and the set of rejected groups are disjoint, and their union is the set of candidate groups. Also note that $S^{acc}$ is the set of representative points of the sampled groups, and $S^{rej}$ is the set of representative points of the rejected groups.

We comment that it is important to keep the set $S^{rej}$, even though at the end we will only sample a point from $S^{acc}$. Otherwise we would have no information regarding the first points of those groups that have points other than the first ones falling into a sampled cell, and consequently points in $S \setminus S^{rep}$ could also be included in $S^{acc}$, which would make the final sampling non-uniform among the groups. One may wonder whether this additional storage costs too much space. Fortunately, since each group has diameter at most $\alpha$, we only need to monitor groups that are at a distance of at most $\alpha$ away from sampled cells, and the number of such groups can be shown to be small. More precisely, for a group $G$, letting $p$ be its representative point, we monitor $G$ if and only if there exists a sampled cell $C$ such that $C \in \mathrm{ADJ}(p)$. The set of representative points of such groups is precisely $S^{acc} \cup S^{rej}$.

Our algorithm for $\ell_0$-sampling in the infinite window case is presented in Algorithm 1. We control the sample rate by doubling the range $R$ of the hash function when the number of points in $S^{acc}$ exceeds a threshold $\Theta(\log m)$ (Line 10 of Algorithm 1). We also update $S^{acc}$ and $S^{rej}$ accordingly to maintain Definition 2.2.
When a new point $p$ comes, if CELL($p$) is sampled and $p$ is the first point in $G(p)$ (Line 6), we add $p$ to $S^{acc}$; that is, we make $p$ the representative point of the sampled group $G(p)$. Otherwise, if CELL($p$) is not sampled but there is at least one sampled cell in ADJ($p$), and $p$ is the first point in $G(p)$ (Line 8), then we add $p$ to $S^{rej}$; that is, we make $p$ the representative point of the rejected group $G(p)$.

Algorithm 1: ROBUST $\ell_0$-SAMPLING-IW
 1  $R \leftarrow 1$; $S^{acc} \leftarrow \emptyset$; $S^{rej} \leftarrow \emptyset$
 2  $\kappa_0$ is chosen to be a large enough constant
    /* dataset is fed as a stream of points */
 3  for each arriving point $p$ do
        /* if p is not the first point of a candidate group, skip it */
 4      if $\exists u \in S^{acc} \cup S^{rej}$ s.t. $d(u, p) \le \alpha$ then
 5          continue
        /* if p is the first point of a candidate group */
 6      if $h_R(\mathrm{CELL}(p)) = 0$ then
 7          $S^{acc} \leftarrow S^{acc} \cup \{p\}$
 8      else if $\exists C \in \mathrm{ADJ}(p)$ s.t. $h_R(C) = 0$ then
 9          $S^{rej} \leftarrow S^{rej} \cup \{p\}$
10      if $|S^{acc}| > \kappa_0 \log m$ then
11          $R \leftarrow 2R$
12          update $S^{acc}$ and $S^{rej}$ according to the updated hash function $h_R$
    /* at the time of query: */
13  return a random point in $S^{acc}$

On the other hand, if there is at least one sampled cell in ADJ($p$) (i.e., $G(p)$ is a candidate group) and $p$ is not the first point (Line 4), then we simply discard $p$. Note that we can test this since we have already stored the representative points of all candidate groups. In the remaining case, in which $G(p)$ is not a candidate group, we also discard $p$. At the time of query, we return a random point in $S^{acc}$.

Correctness and Complexity. We show the following theorem.

THEOREM 2.4. In constant dimensional Euclidean space, for a well-separated dataset, there exists a streaming algorithm (Algorithm 1) that, with probability $1 - 1/m$, outputs a robust $\ell_0$-sample at any time step. The algorithm uses $O(\log m)$ words of space and $O(\log m)$ processing time per point.

The correctness of the algorithm is straightforward. First, $S^{acc}$ is a random subset of $S^{rep}$ since each point $p \in S^{rep}$ is included in $S^{acc}$ if and only if $h_R(\mathrm{CELL}(p)) = 0$. Second, the outputted point is a random point in $S^{acc}$.
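
To make the steps of Algorithm 1 concrete, here is a minimal Python sketch for 2-dimensional points. It is not the authors' implementation: the class name `RobustL0SamplerIW`, the SHA-256 based stand-in for the random hash function, hashing a cell by its integer coordinates instead of a numerical ID, and the explicit `threshold` argument (playing the role of $\kappa_0 \log m$) are all assumptions of this example.

```python
import hashlib
import math
import random


class RobustL0SamplerIW:
    """Sketch of Algorithm 1 (infinite window) for 2-D points stored as tuples."""

    def __init__(self, alpha, threshold, seed=0):
        self.alpha = alpha
        self.side = alpha / 2.0        # grid side length alpha / 2
        self.threshold = threshold     # assumed stand-in for kappa_0 * log m
        self.seed = seed
        self.R = 1                     # current sample rate is 1 / R
        self.acc, self.rej = [], []    # accept set and reject set
        self.rng = random.Random(seed)

    def _sampled(self, cell):
        data = f"{self.seed}:{cell[0]}:{cell[1]}".encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % self.R == 0

    def _cell(self, p):
        return (math.floor(p[0] / self.side), math.floor(p[1] / self.side))

    def _adj(self, p):
        """Cells whose minimum distance to p is at most alpha."""
        cx, cy = self._cell(p)
        cells = []
        for i in range(cx - 3, cx + 4):
            for j in range(cy - 3, cy + 4):
                dx = max(i * self.side - p[0], 0.0, p[0] - (i + 1) * self.side)
                dy = max(j * self.side - p[1], 0.0, p[1] - (j + 1) * self.side)
                if math.hypot(dx, dy) <= self.alpha:
                    cells.append((i, j))
        return cells

    def _near_sampled(self, p):
        return any(self._sampled(c) for c in self._adj(p))

    def process(self, p):
        # Lines 4-5: p belongs to a group that already has a stored representative.
        if any(math.dist(u, p) <= self.alpha for u in self.acc + self.rej):
            return
        # Lines 6-9: p becomes the representative of a sampled or a rejected group.
        if self._sampled(self._cell(p)):
            self.acc.append(p)
        elif self._near_sampled(p):
            self.rej.append(p)
        # Lines 10-12: halve the sample rate and refilter both sets.
        if len(self.acc) > self.threshold:
            self.R *= 2
            stored = self.acc + self.rej
            self.acc = [u for u in stored if self._sampled(self._cell(u))]
            self.rej = [u for u in stored
                        if not self._sampled(self._cell(u)) and self._near_sampled(u)]

    def query(self):
        # Line 13: a uniformly random point of the accept set (None if it is empty).
        return self.rng.choice(self.acc) if self.acc else None
```

A caller would feed points one at a time with `process` and call `query` whenever a sample is needed. Refiltering the stored points after doubling $R$ relies on the nesting property of $h_R$: a cell that is not sampled at rate $1/R$ cannot become sampled at rate $1/(2R)$.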
The only thing left to be shown is that we have $|S^{acc}| > 0$ at any time step.

LEMMA 2.5. With probability $1 - 1/m$, we have $|S^{acc}| > 0$ throughout the execution of the algorithm.

PROOF. At the first time step of the streaming process, $p_1$ is added into $S^{acc}$ with certainty since $R$ is initialized to be 1. Then $S^{acc}$ keeps growing. At the moment when $|S^{acc}| > \kappa_0 \log m$, $R$ is doubled so that each point in $S^{acc}$ is resampled with probability $1/2$. After the resampling,

$$\Pr[|S^{acc}| = 0] \le \left(\tfrac{1}{2}\right)^{\kappa_0 \log m} \le \frac{1}{m^2}. \qquad (4)$$

By a union bound over at most $m$ resample steps, we conclude that with probability $1 - 1/m$, $|S^{acc}| > 0$ throughout the execution of the algorithm.

We next analyze the space and time complexities of Algorithm 1.

LEMMA 2.6. With probability $(1 - 1/m)$ we have $|S^{rej}| = O(\log m)$ throughout the execution of the algorithm.

PROOF. Consider a fixed time step. Let $S = S^{acc} \cup S^{rej}$. For a fixed $p \in S^{rep}$, since $|\mathrm{ADJ}(p)| \le 25$ (we are in the 2-dimensional Euclidean space), and each cell is sampled randomly, we have

$$\Pr[p \in S^{rej}] \le \frac{24}{25} \Pr[p \in S]. \qquad (5)$$

We only need to prove the lemma for the case $\Pr[p \in S^{rej}] = \frac{24}{25} \Pr[p \in S]$; the case $\Pr[p \in S^{rej}] < \frac{24}{25} \Pr[p \in S]$ follows directly since $p$ is less likely to be included in $S^{rej}$.

For each $p \in S$, define $X_p$ to be a random variable such that $X_p = 1$ if $p \in S^{rej}$, and $X_p = 0$ otherwise. Let $X = \sum_{p \in S} X_p$. We have $X = |S^{rej}|$ and $E[X] = \frac{24}{25} |S|$. By a Chernoff bound (Lemma 1.7 with $\delta = 0.01$), we have

$$\Pr[X - E[X] > 0.01\, E[X]] \le e^{-0.01^2 E[X]/3}. \qquad (6)$$

If $|S| = O(\log m)$ then we immediately have $|S^{rej}| \le |S| = O(\log m)$. Otherwise (that is, when $|S|$ exceeds a sufficiently large constant multiple of $\log m$), by (6) we have $\Pr[X > 1.01\, E[X]] < 1/m^2$. We thus have

$$1/m^2 > \Pr[X > 1.01\, E[X]] = \Pr[X > 0.9696\, |S|] = \Pr[X > 0.9696\, (X + |S^{acc}|)] = \Pr[0.0304\, X > 0.9696\, |S^{acc}|].$$

According to Algorithm 1 it always holds that $|S^{acc}| = O(\log m)$. Therefore $|S^{rej}| = X = O(\log m)$ with probability at least $1 - 1/m^2$. The lemma follows by a union bound over the $m$ time steps of the streaming process.

By Lemma 2.6 the space used by the algorithm can be bounded by $O(|S^{acc}| + |S^{rej}|) = O(\log m)$ words. The processing time per point is also bounded by $O(|S^{acc}| + |S^{rej}|)$.

2.2 Sliding Windows

We now consider the sliding window case. Let $w$ be the window size. We first present an algorithm that maintains a set of sampled points in $S^{acc}$ with a fixed sample rate $1/R$; it will be used as a subroutine in our final sliding window algorithm (Section 2.2.2).

2.2.1 A Sliding Window Algorithm with Fixed Sample Rate. We describe the algorithm in Algorithm 2, and explain it in words below.
Besides maintaining the accept set and the reject set as in the infinite window case, Algorithm 2 also maintains a set $A$ consisting of key-value pairs $(u, p)$, where $u$ is the representative point of a candidate group ($u$ can be a point outside the sliding window as long as the group has at least one point inside the sliding window), and $p$ is the latest point of the same group (thus $p$ must be in the sliding window). Define $A(S^{acc}) = \{p \mid \exists u \in S^{acc} \text{ s.t. } (u, p) \in A\}$.

Algorithm 2: SW WITH FIXED SAMPLE RATE $1/R$
 1  for each expired point $p$ do
 2      if $\exists (u, p) \in A$ then
 3          delete $(u, p)$ from $A$, delete $u$ from $S^{acc} \cup S^{rej}$
 4  for each arriving point $p$ do
        /* if there already exists a point of the same group in S^acc ∪ S^rej */
 5      if $\exists u \in S^{acc} \cup S^{rej}$ s.t. $d(u, p) \le \alpha$ then
 6          $A \leftarrow (u, p) \cup A \setminus (u, \cdot)$
        /* otherwise we set p as a representative of its group */
 7      else if $h_R(\mathrm{CELL}(p)) = 0$ then
 8          $S^{acc} \leftarrow S^{acc} \cup \{p\}$, $A \leftarrow A \cup (p, p)$
 9      else if $\exists C \in \mathrm{ADJ}(p)$ s.t. $h_R(C) = 0$ then
10          $S^{rej} \leftarrow S^{rej} \cup \{p\}$, $A \leftarrow A \cup (p, p)$

Figure 2: Representative points in sliding windows. There are two different groups, and the red window is the current sliding window (of size $w = 5$). Point $c$ is not the representative point of Group 1 because the window right before it (inclusive) contains point $b$, which is also in Group 1. Point $b$ is the representative point because it is the latest point such that there is no other point of Group 1 in the window right before $b$.

For each sliding window, we guarantee that each candidate group $G$ has exactly one representative point. This is achieved by the following process: for each candidate group $G$, if there is no maintained representative point, then we take the first point $u$ as the representative point (Line 8 and Line 10). When the last point $p$ of the group expires, we delete the maintained representative point $u$ from $S^{acc} \cup S^{rej}$, and delete $(u, p)$ from $A$ (Line 3).

For a newly arriving point $p$, if there already exists a point $u \in S^{acc} \cup S^{rej}$ in the same group $G$, then we simply update the last point in the pair $(u, \cdot)$ we maintain for $G$ (Line 6).
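
The bookkeeping of Algorithm 2 can be sketched as follows. This is an illustrative sketch under several assumptions that are not in the paper: points are 2-dimensional tuples with integer timestamps, expiration is handled lazily at each arrival (a group is dropped once its latest timestamp is at most $t - w$), the store $A$ keeps the timestamp of the latest point of each group rather than the point itself, and the class name `FixedRateSlidingWindow` and the SHA-256 based hash are stand-ins chosen for this example.

```python
import hashlib
import math


class FixedRateSlidingWindow:
    """Sketch of Algorithm 2: accept/reject sets at a fixed sample rate 1/R (2-D points)."""

    def __init__(self, alpha, R, window, seed=0):
        self.alpha, self.side = alpha, alpha / 2.0
        self.R, self.window, self.seed = R, window, seed
        self.acc, self.rej = set(), set()
        self.latest = {}  # store A: representative u -> timestamp of its group's latest point

    def _sampled(self, cell):
        data = f"{self.seed}:{cell[0]}:{cell[1]}".encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % self.R == 0

    def _cell(self, p):
        return (math.floor(p[0] / self.side), math.floor(p[1] / self.side))

    def _adj(self, p):
        """Cells whose minimum distance to p is at most alpha."""
        cx, cy = self._cell(p)
        cells = []
        for i in range(cx - 3, cx + 4):
            for j in range(cy - 3, cy + 4):
                dx = max(i * self.side - p[0], 0.0, p[0] - (i + 1) * self.side)
                dy = max(j * self.side - p[1], 0.0, p[1] - (j + 1) * self.side)
                if math.hypot(dx, dy) <= self.alpha:
                    cells.append((i, j))
        return cells

    def arrive(self, t, p):
        # Lines 1-3: drop every group whose latest point has left the window.
        for u in [u for u, last in self.latest.items() if last <= t - self.window]:
            del self.latest[u]
            self.acc.discard(u)
            self.rej.discard(u)
        # Lines 5-6: p joins an existing candidate group; only refresh its latest timestamp.
        for u in self.latest:
            if math.dist(u, p) <= self.alpha:
                self.latest[u] = t
                return
        # Lines 7-10: p becomes the representative of a sampled or a rejected group.
        if self._sampled(self._cell(p)):
            self.acc.add(p)
            self.latest[p] = t
        elif any(self._sampled(c) for c in self._adj(p)):
            self.rej.add(p)
            self.latest[p] = t
```

With `R = 1` every cell is sampled, so the accept set contains exactly one representative per group that still has a point inside the window; this makes the expiration logic easy to check by hand.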
Otherwise $p$ is the first point of $G(p)$ in the sliding window. If $G(p)$ is a sampled group, then we add $p$ to $S^{acc}$ and $(p, p)$ to $A$ (Line 8); else if $G(p)$ is a rejected group, then we add $p$ to $S^{rej}$ and $(p, p)$ to $A$ (Line 10).

The following observation is a direct consequence of Algorithm 2. It follows from the discussion above and the test at Line 7 of Algorithm 2.

OBSERVATION 1. In Algorithm 2, at any time for the current sliding window, we have:
(1) Each group has exactly one representative point, which is fully determined by the stream and is independent of the hash function. More precisely, a point $p$ becomes the representative point of group $G$ in the current window if $p$ is the latest point in $G$ such that the window right before $p$ (inclusive) has no point in $G$. See Figure 2 for an illustration.
(2) The representative point of each group in the current window is included in $S^{acc}$ with probability $1/R$.

2.2.2 A Space-Efficient Algorithm for Sliding Windows. We now present our space-efficient sliding window algorithm. Note that the algorithm presented in Section 2.2.1, though able to produce a random sample in the sliding window setting, does not have a good space usage guarantee; it may use space up to $w/R$, where $w$ is the window size.

The sliding window algorithm presented in this section works simultaneously for both sequence-based and time-based sliding windows.

High Level Ideas. As mentioned, the main challenge of the sliding window algorithm design is that points will expire, and thus we cannot keep decreasing the sample rate. On the contrary, if at some point there are too few non-expired sampled points, then we have to increase the sample rate to guarantee that there is at least one point in the sliding window belonging to $S^{acc}$. However, if we increase the sample rate in the middle of the streaming process, then the neighborhood information of a newly sampled group may already have been lost. In other words, we cannot properly maintain $S^{rej}$ when the sample rate increases.

To resolve this issue we have to prepare for such a decrease of $|S^{acc}|$ in advance. To this end, we maintain a hierarchical set of instances of Algorithm 2, with sample rates $1/R$ being $1, 1/2, 1/4, \ldots$ respectively. We can thus guarantee that in the lowest level (the one with sample rate 1) we must have at least one sampled point.

Of course, to achieve a good space usage we cannot endlessly insert points into all the Algorithm 2 instances. We instead make sure that each level $\ell$ stores at most $|S_\ell^{acc} \cup S_\ell^{rej}| = O(\log m)$ points, where $S_\ell^{acc}$ and $S_\ell^{rej}$ are the accept set and reject set respectively in the run of the Algorithm 2 instance at level $\ell$.
We achieve this by maintaining a dynamic partition of the sliding window. In the $\ell$-th subwindow we run an instance of Algorithm 2 with sample rate $1/2^\ell$. For each incoming point, we accept it at the highest level $\ell$ in which the point falls into $S_\ell^{acc}$, and then delete all points in the accept and reject sets in all the lower levels. Whenever the number of points in $S_\ell^{acc}$ at some level $\ell$ exceeds the threshold $c \log m$ for some constant $c$, we promote most of its points to level $\ell + 1$. The process may cascade to the top level. At the time of query we properly resample the points maintained at each $S_\ell^{acc}$ ($\ell = 0, 1, \dots$) to unify their sample probabilities, and then merge them into $S^{acc}$. In order to guarantee that, if the sliding window is not empty, we always have at least one sampled point in $S^{acc}$, during the algorithm (in particular, the promotion procedure) we make sure that the last point of each level $\ell$ is always in the accept set $S_\ell^{acc}$.

REMARK 1. The hierarchical set of windows is reminiscent of the exponential histogram technique by Datar et al. [16] for basic counting in the sliding window model. However, a careful look shows that our algorithm is very different from the exponential histogram, and is (naturally) more complicated since we need to deal with both distinct elements and near-duplicates. For example, the exponential histogram algorithm in [16] partitions the sliding window deterministically into subwindows of size $1, 2, 4, \dots$. If we are only interested in the representative point of each group, we basically need to delete all the other points of each group in the sliding window, which will change the sizes of the subwindows. Handling near-duplicates adds another layer of difficulty to the algorithm design; we handle this by employing Algorithm 2 (which is a variant of the algorithm for the infinite window in Section 2.1) at each of the subwindows with different sample rates. The interplay between these components makes the overall algorithm involved.

Algorithm 3: Robust $\ell_0$-SAMPLING-SW
 1  $R_\ell \leftarrow 2^\ell$ for all $\ell = 0, 1, \dots, L$.
 2  for $\ell \leftarrow 0$ to $L$ do
      /* create an algorithm instance according to Algorithm 2 with fixed sample rate $1/R_\ell$ */
 3      $ALG_\ell \leftarrow (\emptyset, \emptyset, \emptyset, R_\ell)$
 4  for each arriving point $p$ do
 5      for $\ell \leftarrow L$ downto $0$ do
 6          $ALG_\ell(p)$   /* feed $p$ to the instance $ALG_\ell$ */
 7          if $\exists (u, p) \in A_\ell$ then   /* prune all subsequent levels */
 8              for $j \leftarrow \ell - 1$ downto $0$ do
 9                  $ALG_j \leftarrow (\emptyset, \emptyset, \emptyset, R_j)$
10              if $|S_\ell^{acc}| > \kappa_0 \log m$ then
11                  $j \leftarrow \ell$
12                  create a temporary instance $ALG$
13                  while ($|S_j^{acc}| > \kappa_0 \log m$) do
14                      $(ALG, ALG_j) \leftarrow$ SPLIT($ALG_j$)
15                      $ALG_{j+1} \leftarrow$ MERGE($ALG$, $ALG_{j+1}$)
16                      $j \leftarrow j + 1$
17                      if $j > L$ then return error
18              break
    /* at the time of query: */
19  $S \leftarrow \emptyset$
20  Let $c$ be the maximum index such that $S_c^{acc} \neq \emptyset$
21  for $\ell \leftarrow 1$ to $c$ do
22      include each point in the set $\{p \mid \exists (\cdot, p) \in A_\ell\}$ into $S$ with probability $R_\ell / R_c$
23  return a random point from $S$

The Algorithm. We present our sliding window algorithm in Algorithm 3, using Algorithm 4 and Algorithm 5 as subroutines. We use ALG to represent an instance of Algorithm 2. For convenience, we also use ALG to represent all the data structures that are maintained during the run of Algorithm 2, and write $ALG = (S^{acc}, S^{rej}, A, R)$, where $S^{acc}$ and $S^{rej}$ are the accept and reject sets respectively, $A$ is the key-value pair store, and $R$ is the reciprocal of the sample rate.
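The tuple $ALG = (S^{acc}, S^{rej}, A, R)$ and the query step of Algorithm 3 (Lines 19 to 23) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Instance` and `query` are hypothetical, `A` is modeled as a dictionary from a group's representative to the latest point of that group, only representatives in the accept set are resampled, and levels $0$ through $c$ are merged (as the proof of Theorem 2.7 suggests).

```python
import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One per-level instance, ALG = (S_acc, S_rej, A, R)."""
    R: int                                       # reciprocal of the sample rate
    S_acc: set = field(default_factory=set)      # accepted representatives
    S_rej: set = field(default_factory=set)      # rejected representatives
    A: dict = field(default_factory=dict)        # representative -> latest point (assumed layout)


def query(levels, rng=random):
    """Merge the per-level samples into one sample and return a random point.

    levels[l] is the level-l instance with levels[l].R == 2**l.
    Returns None when every accept set is empty.
    """
    nonempty = [l for l, alg in enumerate(levels) if alg.S_acc]
    if not nonempty:
        return None
    c = max(nonempty)                 # largest level with a non-empty accept set
    R_c = levels[c].R
    S = []
    for alg in levels[: c + 1]:
        keep_prob = alg.R / R_c       # equals 1 at level c, smaller below it
        for rep in alg.S_acc:
            # A point kept at rate 1/R_l and then with probability R_l/R_c
            # ends up in S with overall probability 1/R_c.
            if rng.random() < keep_prob:
                S.append(alg.A[rep])
    return rng.choice(S)              # S is non-empty: level c is kept in full
```

Because level $c$ is always kept with probability $R_c / R_c = 1$, the merged set is never empty when some accept set is non-empty, which mirrors the role of Lemma 2.10.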

Algorithm 4: SPLIT($ALG_\ell$)
1  create instances $ALG_a = (S_a^{acc}, S_a^{rej}, A_a, R_a)$ and $ALG_b = (S_b^{acc}, S_b^{rej}, A_b, R_b)$
2  $t \leftarrow \max\{t' \mid (p_{t'} \in S_\ell^{acc}) \wedge (h_{R_{\ell+1}}(\mathrm{CELL}(p_{t'})) = 0)\}$
3  $S_a^{acc} \leftarrow \{p_k \in S_\ell^{acc} \mid (k \le t) \wedge (h_{R_{\ell+1}}(\mathrm{CELL}(p_k)) = 0)\}$; $S_a^{rej} \leftarrow \{p_k \in S_\ell^{rej} \mid (k \le t) \wedge (h_{R_{\ell+1}}(\mathrm{CELL}(p_k)) = 0)\}$; $A_a \leftarrow \{(u, \cdot) \in A_\ell \mid u \in S_a^{acc}\}$; $R_a \leftarrow R_{\ell+1}$
4  $S_b^{acc} \leftarrow \{p_k \in S_\ell^{acc} \mid k > t\}$; $S_b^{rej} \leftarrow \{p_k \in S_\ell^{rej} \mid k > t\}$; $A_b \leftarrow \{(u, \cdot) \in A_\ell \mid u \in S_b^{acc}\}$; $R_b \leftarrow R_\ell$
5  return $(ALG_a, ALG_b)$

Algorithm 5: MERGE($ALG_a$, $ALG_b$)
1  create a temporary instance $ALG = (S^{acc}, S^{rej}, A, R)$
2  $S^{acc} \leftarrow S_a^{acc} \cup S_b^{acc}$; $S^{rej} \leftarrow S_a^{rej} \cup S_b^{rej}$; $A \leftarrow A_a \cup A_b$; $R \leftarrow R_a$
3  return $ALG$

Set $R_\ell = 2^\ell$ for $\ell = 0, 1, \dots, L = \log w$. In Algorithm 3 we create $L + 1$ instances of Algorithm 2 with empty $S^{acc}$, $S^{rej}$, $A$ (denoted by $\emptyset$), and sample rates $1/R_\ell$ respectively. We call the instance with $R_\ell = 2^\ell$ the level $\ell$ instance.

When a new point $p$ comes, we first find the highest level $\ell$ such that $p$ is sampled by $ALG_\ell$ (i.e., $p \in S_\ell^{acc}$), and then delete all the structures of $ALG_j$ ($j < \ell$), keeping only their sample rates $1/R_j$ (Line 5 to Line 9). If after including $p$ the size of $S_\ell^{acc}$ becomes more than $\kappa_0 \log m$, we have to do a series of updates to restore the invariant that the accept set of each level contains at most $\kappa_0 \log m$ points at any time step (Line 10 to Line 16). To do this, we first split the instance $ALG_\ell$ into two instances (Algorithm 4). Let $p_t$ be the last point in $S_\ell^{acc}$ which is sampled by the hash function $h_{R_{\ell+1}}$. We promote all the points in $S_\ell^{acc} \cup S_\ell^{rej}$ arriving before (and including) $p_t$ to level $\ell + 1$ by resampling them using $h_{R_{\ell+1}}$, which gives a new level $\ell + 1$ instance $ALG$. We next merge $ALG$ with $ALG_{\ell+1}$, which has the same sample rate, by merging their accept/reject sets and their sets of key-value pairs (Algorithm 5). The merge may result in $|S_{\ell+1}^{acc}| > \kappa_0 \log m$, in which case we have to perform the split and merge again. These operations may propagate to the upper levels until we reach a level $\ell'$ in which we have $|S_{\ell'}^{acc}| \le \kappa_0 \log m$ after the merge.

At the time of query, we have to integrate the maintained samples in all $L$ levels.
Since at each level we sample points using different sample rates $1/R_\ell$, we have to resample each point in $S_\ell^{acc}$ with probability $R_\ell / R_c$, where $c$ is the largest level that has a non-empty accept set (Line 20 to Line 22).

Correctness and Complexity. In this section we prove the following theorem.

THEOREM 2.7. In constant dimensional Euclidean space, for a well-separated dataset, there exists a sliding window algorithm (Algorithm 3) that with probability $1 - 1/m$, at any time step, outputs a robust $\ell_0$-sample. The algorithm uses $O(\log w \log m)$ words of space and $O(\log w \log m)$ amortized processing time per point.

[Figure 3: An illustration of subwindows of a sliding window, with levels 4 through 0; the subwindow at level 0 is an empty set.]

First, it is easy to show that the probability that Algorithm 3 outputs error is negligible.

LEMMA 2.8. With probability $1 - 1/m^2$, Algorithm 3 will not output error at Line 17 during the whole streaming process.

PROOF. Fix a sliding window $W$. Let $G_1, \dots, G_k$ ($k \le w$) be the groups in $W$. The sample rate at level $L$ is $1/R_L = 1/2^L = 1/w$. Let $X_\ell$ be a random variable such that $X_\ell = 1$ if the $\ell$-th group is sampled, and $X_\ell = 0$ otherwise. Let $X = \sum_{\ell=1}^{k} X_\ell$. Thus $E[X] = k \cdot 1/R_L \le w \cdot 1/w = 1$. By a Chernoff bound (Lemma 1.8), with probability $1 - 1/m^3$ we have $X \le \kappa_0 \log m$ for a large enough constant $\kappa_0$. The lemma then follows by a union bound over at most $m$ sampling steps.

The following definition is useful for the analysis.

Definition 2.9 (subwindows). For a fixed sliding window $W$, we define a subwindow $W_\ell$ for each instance $ALG_\ell$ ($\ell = 0, 1, \dots, L$) as follows. $W_L$ starts from the first point in the sliding window and ends at the last point (denoted by $p_{t_L}$) in $A(S_L^{acc})$. For $\ell = L - 1, \dots, 1$, $W_\ell$ starts from $p_{t_{\ell+1}+1}$ and ends at the last point (denoted by $p_{t_\ell}$) in $A(S_\ell^{acc})$. $W_0$ starts from $p_{t_1+1}$ and ends at the last point in the window $W$.

See Figure 3 for an illustration of subwindows. We note that a subwindow can be empty. We also note the following immediate facts, which follow from the definition of subwindows.

FACT 2. $W_0 \cup W_1 \cup \dots \cup W_L$ covers the whole window $W$.

FACT 3. Each subwindow $W_\ell$ ($\ell = 1, \dots, L$) ends with a point in $A(S_\ell^{acc})$.

For $\ell = 0, 1, \dots, L$, let $\mathcal{G}_\ell$ be the set of groups whose last points lie in $W_\ell$, and let $S_\ell^{rep}$ be the set of their representative points. From Algorithms 3, 4 and 5 it is easy to see that the following is maintained during the whole streaming process.

FACT 4. During the run of Algorithm 3, at any time step, $S_\ell^{acc}$ is formed by sampling each point in $S_\ell^{rep}$ with probability $1/R_\ell$.

The following lemma guarantees that at the time of query we can always output a sample.

LEMMA 2.10. During the run of Algorithm 3, at any time step, if the sliding window contains at least one point, then when querying we can always return a sample, i.e., $|S| \ge 1$.

PROOF. The lemma follows from Fact 3, and the fact that $ALG_0$ includes every point in $S_0^{rep}$ ($R_0 = 1$).

Now we are ready to prove the theorem.

PROOF (FOR THEOREM 2.7). We have the following facts:
(1) $S_0^{rep}, S_1^{rep}, \dots, S_L^{rep}$ are sets of representatives of disjoint sets of groups $\mathcal{G}_0, \mathcal{G}_1, \dots, \mathcal{G}_L$, and $\bigcup_{\ell=0}^{L} \mathcal{G}_\ell$ is the set of all groups whose last points are inside the sliding window.
(2) By Fact 4, each $S_\ell^{acc}$ is formed by sampling each point in $S_\ell^{rep}$ with probability $1/R_\ell$.
(3) Each point in $S_\ell^{rep}$ is included in $S$ with probability $R_\ell / R_c$ (Line 22 of Algorithm 3).
(4) By Lemma 2.10, $|S| \ge 1$ if the sliding window is not empty.
(5) The final sample returned is a random sample of $S$.
(6) By Lemma 2.8, with probability $(1 - 1/m)$ the algorithm will not output error.

By the first three items we know that $S$ is a random sample of the last points of all groups within the sliding window, which, combined with Items 4, 5 and 6, gives the correctness of the theorem.

We now analyze the space and time complexities. The space usage at each level can be bounded by $O(\log m)$ words. This is due to the fact that $|S_j^{acc}|$ is always at most $\kappa_0 \log m$, and consequently $A_j$ has $O(\log m)$ key-value pairs. Using Lemma 2.6 we have that with probability $1 - 1/m^2$, $|S_j^{rej}| = O(\log m)$ (see footnote 1). Thus by a union bound, with probability $(1 - O(\log w / m^2))$, the total space is bounded by $O(\log w \log m)$ words since we have $O(\log w)$ levels.

For the time cost, simply note that the time spent on each point at each level during the whole streaming process can be bounded by $O(\log m)$, and thus the amortized processing time per item can be bounded by $O(\log w \log m)$.

2.3 Discussions

We conclude the section with some discussions and easy extensions.

Sampling k Points with/without Replacement. Sampling $k$ groups with replacement can be trivially achieved by running $k$ instances of the algorithm for sampling one group (Algorithm 1 or Algorithm 3) in parallel.
For sampling $k$ groups without replacement, we can increase the threshold at Line 10 of Algorithm 1 to $\kappa_0 k \log m$, by which we can show, using exactly the same analysis as in Section 2.1, that with probability $1 - 1/m$ we have $|S^{acc}| \ge k$. Similarly, for the sliding window case we can increase the threshold at Line 10 of Algorithm 3 to $\kappa_0 k \log m$.

Random Point As Group Representative. We can easily augment our algorithms such that instead of always returning the (fixed) representative point of a randomly sampled group, we return a random point of the group. In other words, we want to return each point $p \in G$ with equal probability $1/n_G$.

For the infinite window case we can simply plug the classical Reservoir sampling [35] into Algorithm 1. We can implement this as follows. For each group $G$ that has a point stored in $S^{acc} \cup S^{rej}$, we maintain a pair $e_G = (v, ct)$, where $ct$ is a counter counting the number of points of this group, and $v$ is the random representative point. At the beginning (when the first point $u$ of group $G$ comes) we set $e_G = (u, 1)$. When a new point $p$ is inserted, if there exists $u \in S^{acc}$ such that $d(u, p) \le \alpha$ (i.e., $u$ and $p$ are in the same group), we increment the counter $ct$ for group $G(u)$, and reset $e_G = (p, ct)$ with probability $1/ct$. For the sliding window case, we can just replace Reservoir sampling with a random sampling algorithm for sliding windows (e.g., the one in [8]).

Footnote 1: We can reduce the failure probability $1/m$ to $1/m^2$ by appropriately changing the constants in the proof.

3 GENERAL DATASETS

In this section we consider general datasets which may not be well-separated, and for which there is consequently no natural partition into groups. However, we show that Algorithm 1 still gives the following guarantee.

THEOREM 3.1. For a general dataset $S$ in constant dimensional Euclidean space, there exists a streaming algorithm (Algorithm 1) that with probability $1 - 1/m$, at any time step, outputs a point $q$ satisfying Equality (2), that is, $\forall p \in S$, $\Pr[q \in \mathrm{Ball}(p, \alpha) \cap S] = \Theta(1/F_0(S, \alpha))$, where $\mathrm{Ball}(p, \alpha)$ is the ball centered at $p$ with radius $\alpha$.

Before proving the theorem, we first study group partitions generated by a greedy process.

Definition 3.2 (Greedy Partition). Given a dataset $S$, a greedy partition is generated by the following process: pick an arbitrary point $p \in S$, create a new group $G(p) \leftarrow \mathrm{Ball}(p, \alpha) \cap S$ and update $S \leftarrow S \setminus G(p)$; repeat this process until $S = \emptyset$.

LEMMA 3.3. Given a dataset $S$, let $n_{opt}$ be the number of groups in the minimum cardinality partition of $S$, and $n_{gdy}$ be the number of groups in an arbitrary greedy partition. We always have $n_{opt} = \Theta(n_{gdy})$.

PROOF. We first show $n_{gdy} \le n_{opt}$. Let $G(p_1), \dots, G(p_{n_{gdy}})$ be the groups in the greedy partition in the order they were created, and let $H_1, \dots, H_{n_{opt}}$ be the minimum partition. We prove the claim by induction. First it is easy to see that $G(p_1)$ must cover the group containing $p_1$ in the minimum partition (w.l.o.g. denote that group by $H_1$). Suppose that $\bigcup_{j=1}^{i} G(p_j)$ covers $i$ groups $H_1, \dots, H_i$ in the minimum partition, that is, $\bigcup_{j=1}^{i} H_j \subseteq \bigcup_{j=1}^{i} G(p_j)$. We can show that there must be a new group $H_{i+1}$ in the minimum partition such that $\bigcup_{j=1}^{i+1} H_j \subseteq \bigcup_{j=1}^{i+1} G(p_j)$, which gives $n_{gdy} \le n_{opt}$. The induction step follows from the following facts.
(1) $p_{i+1} \notin \bigcup_{j=1}^{i} G(p_j)$.
(2) $\mathrm{Ball}(p_{i+1}, \alpha) \subseteq \bigcup_{j=1}^{i+1} G(p_j)$.
(3) The diameter of each group in the minimum partition is at most $\alpha$.
Indeed, by (1) and the induction hypothesis we have $p_{i+1} \notin \bigcup_{j=1}^{i} H_j$. Let $H_{i+1}$ be the group containing $p_{i+1}$ in the minimum partition. Then by (2) and (3) we must have $H_{i+1} \subseteq \mathrm{Ball}(p_{i+1}, \alpha) \subseteq \bigcup_{j=1}^{i+1} G(p_j)$.
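The greedy process of Definition 3.2 can be sketched in Python as follows. This is only an illustrative, quadratic-time sketch: the function name `greedy_partition`, the choice to always pick the earliest remaining point (the definition allows an arbitrary one), and the use of Euclidean distance via `math.dist` are assumptions made here rather than details taken from the paper.

```python
import math


def greedy_partition(points, alpha):
    """Return a greedy partition of `points` as a list of (center, group) pairs.

    Each round picks a remaining point p, forms the group of all remaining
    points within distance alpha of p, and removes that group.
    """
    remaining = list(points)
    groups = []
    while remaining:
        p = remaining[0]  # "arbitrary" pick; here: earliest remaining point
        group = [q for q in remaining if math.dist(p, q) <= alpha]
        groups.append((p, group))
        remaining = [q for q in remaining if math.dist(p, q) > alpha]
    return groups
```

Every point is within distance $\alpha$ of its group's center, so a group produced this way has diameter at most $2\alpha$, which is the gap to the minimum partition that the second half of the proof has to handle.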

We next show $n_{opt} = O(n_{gdy})$. This is not obvious since the diameter of a group in the greedy partition may be larger than $\alpha$ (but is at most $2\alpha$), while groups in the minimum partition have diameter at most $\alpha$. However, in constant dimensional Euclidean space, each group in a greedy partition can intersect at most $O(1)$ groups in the minimum cardinality partition. We thus still have $n_{opt} = O(n_{gdy})$.

Now we are ready to prove the theorem.

PROOF (FOR THEOREM 3.1). We can think of the group partition in Algorithm 1 as a greedy process. Let $(q_1, \dots, q_z)$ be the sequence of points that are included in $S^{acc}$, in the order of their arrival in the stream. We can generate a greedy group partition on $\bigcup_{i=1}^{z} \mathrm{Ball}(q_i, \alpha)$ as follows: for $i = 1, \dots, z$, create a new group $G(q_i) \leftarrow \mathrm{Ball}(q_i, \alpha) \cap S$ and update $S \leftarrow S \setminus G(q_i)$. Let $\mathcal{G}_{sub} = \{G(q_1), \dots, G(q_z)\}$. We then apply the greedy partition process on the remaining points in $S$, again in the order of their arrival in the stream. Let $q_{z+1}, \dots, q_{n_{gdy}}$ be the representative points of the remaining groups. Let $\mathcal{G} = \{G(q_1), \dots, G(q_{n_{gdy}})\}$ be the final group partition of $S$. We have the following facts.
(1) Each group in $\mathcal{G}$ intersects $\Theta(1)$ grid cells in the grid $\mathbb{G}$.
(2) Each grid cell in $\mathbb{G}$ is sampled by the hash function $h_R$ with equal probability.
(3) $q_1, \dots, q_z$ are the representative points of their groups in $\mathcal{G}_{sub}$.
(4) Algorithm 1 returns a sample chosen randomly from $q_1, \dots, q_z$.

By Items 1 and 2, we know that each group in $\mathcal{G}$ is included in $\mathcal{G}_{sub}$ with probability $\Theta(|\mathcal{G}_{sub}| / |\mathcal{G}|)$. By Items 3 and 4, we know that Algorithm 1 returns a random group from $\mathcal{G}_{sub}$. Therefore each group $G \in \mathcal{G}$ is sampled by Algorithm 1 with probability $\Theta(1/n_{gdy}) = \Theta(1/n_{opt})$, where the last equation is due to Lemma 3.3.

Now for any $p \in S$, according to the greedy process and Algorithm 1, there must be some $q \in S$ such that $G(p) \subseteq \mathrm{Ball}(q, \alpha)$, and if $G(p)$ is sampled then $q$ is the sampled point. So the probability that $q$ is sampled is at least the probability that $G(p)$ is sampled.
Finally, note that if $p \in \mathrm{Ball}(q, \alpha)$ then we also have $q \in \mathrm{Ball}(p, \alpha)$. We thus have

$$\Pr[\exists\, q \in \mathrm{Ball}(p, \alpha) \text{ s.t. } q \text{ is sampled}] = \Omega(1/n_{opt}). \quad (7)$$

On the other hand, in constant dimensional Euclidean space $\mathrm{Ball}(p, \alpha)$ can only intersect $O(1)$ groups in the greedy partition. We thus also have

$$\Pr[\exists\, q \in \mathrm{Ball}(p, \alpha) \text{ s.t. } q \text{ is sampled}] = O(1/n_{opt}). \quad (8)$$

The theorem follows from (7) and (8).

It is easy to see that the above arguments can also be applied to the sliding window case with respect to Algorithm 3.

COROLLARY 3.4. For a general dataset in constant dimensional Euclidean space, there exists a sliding window algorithm (Algorithm 3) that with probability $1 - 1/m$, at any time step, outputs a point $q$ such that $\forall p \in S$, $\Pr[q \in \mathrm{Ball}(p, \alpha)] = \Theta(1/n_{opt})$, where $S$ is the set of all the points in the sliding window, and $n_{opt}$ is the size of the minimum cardinality partition of $S$ with group radius $\alpha$.

4 HIGH DIMENSIONS

In this section we consider datasets in $d$-dimensional Euclidean space for general $d$. We show that Algorithm 1, with some small modifications, can handle $(\alpha, \beta)$-sparse datasets in $d$-dimensional Euclidean space with $\beta > d^{1.5} \alpha$ as well.

THEOREM 4.1. In $d$-dimensional Euclidean space, for an $(\alpha, \beta)$-sparse dataset with $\beta > d^{1.5} \alpha$, there is a streaming algorithm that with probability $1 - 1/m$, at any time step, outputs a robust $\ell_0$-sample. The algorithm uses $O(d \log m)$ words of space and $O(d \log m)$ processing time per item.

REMARK 2. We can use Johnson-Lindenstrauss dimension reduction to weaken the sparsity assumption to $\beta \ge c_\alpha \log^{1.5} m \cdot \alpha$ for some large enough constant $c_\alpha$.

We place a random grid $\mathbb{G}$ with side length $d\alpha$. Since the dataset is $(\alpha, \beta)$-sparse with $\beta > d^{1.5} \alpha$, each grid cell can intersect at most one group. However, in $d$-dimensional space a group can intersect $2^d$ grid cells in the worst case, which may make it difficult to maintain $S^{rej}$ in small space: in the worst case we would have $|S^{rej}| = \Omega(2^d)$ while $|S^{acc}|$ is still small.
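To make the grid construction concrete, the sketch below maps a point to its cell in a randomly shifted grid of side length $d\alpha$. The uniform random shift is one standard way to realize a random grid and is an assumption here, since the text does not spell out how the grid is randomized; `make_grid` and `cell` are hypothetical names. A cell of side $d\alpha$ has diagonal $\sqrt{d} \cdot d\alpha = d^{1.5}\alpha$, which is presumably why the condition $\beta > d^{1.5}\alpha$ keeps points of two different groups out of the same cell.

```python
import math
import random


def make_grid(d, alpha, rng=random):
    """Return (side, shift) for a randomly shifted grid in d dimensions.

    The side length is d * alpha; the per-coordinate shift is drawn
    uniformly from [0, side) (an assumed way to randomize the grid).
    """
    side = d * alpha
    shift = [rng.uniform(0.0, side) for _ in range(d)]
    return side, shift


def cell(point, side, shift):
    """Return the integer index vector of the grid cell containing `point`."""
    return tuple(math.floor((x - s) / side) for x, s in zip(point, shift))
```

The cell index is a hashable tuple, so it can be fed directly to a hash function that decides whether the cell is sampled.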
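Fortunately, in the following lemma we show that for any $p \in S^{rep}$, the probability that $p \in S^{rej}$ will not be too large compared with the probability that $p \in S^{acc}$.

LEMMA 4.2. For any fixed $p \in S^{rep}$, we have $\Pr[p \in S^{rej}] \le \kappa_1 \Pr[p \in S^{acc} \cup S^{rej}]$, where $\kappa_1 \in (0, 1)$ is a constant.

PROOF. For a group $G$, let $\mathrm{Ball}(G, \alpha) = \{p \mid d(p, G) \le \alpha\}$ where $d(p, G) = \min_{q \in G} d(p, q)$. It is easy to see that $\mathrm{Ball}(G, \alpha)$ has a diameter of at most $3\alpha$ because the diameter of $G$ is at most $\alpha$. Since the random grid has side length $d\alpha$, the probability that $\mathrm{Ball}(G, \alpha)$ is cut by the boundaries of cells in each dimension is at most $\mu = 3/d$. If $\mathrm{Ball}(G, \alpha)$ is cut in $i$ dimensions, the number of cells it intersects is at most $2^i$, and consequently $|ADJ(p)| \le 2^i$ for each $p \in G$. Recall that each cell is sampled with probability $1/R$; we thus have

$$\begin{aligned}
\Pr[p \in S^{rej} \cup S^{acc}] &= \sum_{i \ge 1} \Pr\big[p \in S^{rej} \cup S^{acc} \,\big|\, |ADJ(p)| = i\big] \cdot \Pr\big[|ADJ(p)| = i\big] \\
&\le \sum_{i=0}^{d} \binom{d}{i} \mu^i (1 - \mu)^{d - i} \cdot \frac{2^i}{R} \\
&= \frac{(2\mu + 1 - \mu)^d}{R} = \frac{(1 + 3/d)^d}{R} = O\!\left(\frac{1}{R}\right).
\end{aligned}$$

Since $S^{acc} \cap S^{rej} = \emptyset$, we have

$$\Pr[p \in S^{rej}] = \Pr[p \in S^{rej} \cup S^{acc}] - \Pr[p \in S^{acc}] \le \kappa_1 \Pr[p \in S^{acc} \cup S^{rej}]$$

for some constant $\kappa_1 \in (0, 1)$. Indeed, $(1 + 3/d)^d \le e^3$ and $\Pr[p \in S^{acc}] = 1/R$, so one may take $\kappa_1 = 1 - e^{-3}$.

By Lemma 4.2, and basically the same analysis as that in Lemma 2.6, we can bound the space usage of Algorithm 1 by $O(d \log m)$ ($O(\log m)$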

Solutions to the Final Exam

Solutions to the Final Exam CS/Math 24: Intro to Discrete Math 5//2 Instructor: Dieter van Mekebeek Soutions to the Fina Exam Probem Let D be the set of a peope. From the definition of R we see that (x, y) R if and ony if x is a

More information

Space-Time Trade-offs.

Space-Time Trade-offs. Space-Time Trade-offs. Chethan Kamath 03.07.2017 1 Motivation An important question in the study of computation is how to best use the registers in a CPU. In most cases, the amount of registers avaiabe

More information

Lecture outline Graphics and Interaction Scan Converting Polygons and Lines. Inside or outside a polygon? Scan conversion.

Lecture outline Graphics and Interaction Scan Converting Polygons and Lines. Inside or outside a polygon? Scan conversion. Lecture outine 433-324 Graphics and Interaction Scan Converting Poygons and Lines Department of Computer Science and Software Engineering The Introduction Scan conversion Scan-ine agorithm Edge coherence

More information

Nearest Neighbor Learning

Nearest Neighbor Learning Nearest Neighbor Learning Cassify based on oca simiarity Ranges from simpe nearest neighbor to case-based and anaogica reasoning Use oca information near the current query instance to decide the cassification

More information

Priority Queueing for Packets with Two Characteristics

Priority Queueing for Packets with Two Characteristics 1 Priority Queueing for Packets with Two Characteristics Pave Chuprikov, Sergey I. Nikoenko, Aex Davydow, Kiri Kogan Abstract Modern network eements are increasingy required to dea with heterogeneous traffic.

More information

Language Identification for Texts Written in Transliteration

Language Identification for Texts Written in Transliteration Language Identification for Texts Written in Transiteration Andrey Chepovskiy, Sergey Gusev, Margarita Kurbatova Higher Schoo of Economics, Data Anaysis and Artificia Inteigence Department, Pokrovskiy

More information

On Upper Bounds for Assortment Optimization under the Mixture of Multinomial Logit Models

On Upper Bounds for Assortment Optimization under the Mixture of Multinomial Logit Models On Upper Bounds for Assortment Optimization under the Mixture of Mutinomia Logit Modes Sumit Kunnumka September 30, 2014 Abstract The assortment optimization probem under the mixture of mutinomia ogit

More information

Special Edition Using Microsoft Excel Selecting and Naming Cells and Ranges

Special Edition Using Microsoft Excel Selecting and Naming Cells and Ranges Specia Edition Using Microsoft Exce 2000 - Lesson 3 - Seecting and Naming Ces and.. Page 1 of 8 [Figures are not incuded in this sampe chapter] Specia Edition Using Microsoft Exce 2000-3 - Seecting and

More information

Lecture Notes for Chapter 4 Part III. Introduction to Data Mining

Lecture Notes for Chapter 4 Part III. Introduction to Data Mining Data Mining Cassification: Basic Concepts, Decision Trees, and Mode Evauation Lecture Notes for Chapter 4 Part III Introduction to Data Mining by Tan, Steinbach, Kumar Adapted by Qiang Yang (2010) Tan,Steinbach,

More information

Mobile App Recommendation: Maximize the Total App Downloads

Mobile App Recommendation: Maximize the Total App Downloads Mobie App Recommendation: Maximize the Tota App Downoads Zhuohua Chen Schoo of Economics and Management Tsinghua University chenzhh3.12@sem.tsinghua.edu.cn Yinghui (Catherine) Yang Graduate Schoo of Management

More information

ACTIVE LEARNING ON WEIGHTED GRAPHS USING ADAPTIVE AND NON-ADAPTIVE APPROACHES. Eyal En Gad, Akshay Gadde, A. Salman Avestimehr and Antonio Ortega

ACTIVE LEARNING ON WEIGHTED GRAPHS USING ADAPTIVE AND NON-ADAPTIVE APPROACHES. Eyal En Gad, Akshay Gadde, A. Salman Avestimehr and Antonio Ortega ACTIVE LEARNING ON WEIGHTED GRAPHS USING ADAPTIVE AND NON-ADAPTIVE APPROACHES Eya En Gad, Akshay Gadde, A. Saman Avestimehr and Antonio Ortega Department of Eectrica Engineering University of Southern

More information

Alpha labelings of straight simple polyominal caterpillars

Alpha labelings of straight simple polyominal caterpillars Apha abeings of straight simpe poyomina caterpiars Daibor Froncek, O Nei Kingston, Kye Vezina Department of Mathematics and Statistics University of Minnesota Duuth University Drive Duuth, MN 82-3, U.S.A.

More information

A Fast Block Matching Algorithm Based on the Winner-Update Strategy

A Fast Block Matching Algorithm Based on the Winner-Update Strategy In Proceedings of the Fourth Asian Conference on Computer Vision, Taipei, Taiwan, Jan. 000, Voume, pages 977 98 A Fast Bock Matching Agorithm Based on the Winner-Update Strategy Yong-Sheng Chenyz Yi-Ping

More information

Solving Large Double Digestion Problems for DNA Restriction Mapping by Using Branch-and-Bound Integer Linear Programming

Solving Large Double Digestion Problems for DNA Restriction Mapping by Using Branch-and-Bound Integer Linear Programming The First Internationa Symposium on Optimization and Systems Bioogy (OSB 07) Beijing, China, August 8 10, 2007 Copyright 2007 ORSC & APORC pp. 267 279 Soving Large Doube Digestion Probems for DNA Restriction

More information

Outline. Parallel Numerical Algorithms. Forward Substitution. Triangular Matrices. Solving Triangular Systems. Back Substitution. Parallel Algorithm

Outline. Parallel Numerical Algorithms. Forward Substitution. Triangular Matrices. Solving Triangular Systems. Back Substitution. Parallel Algorithm Outine Parae Numerica Agorithms Chapter 8 Prof. Michae T. Heath Department of Computer Science University of Iinois at Urbana-Champaign CS 554 / CSE 512 1 2 3 4 Trianguar Matrices Michae T. Heath Parae

More information

A Petrel Plugin for Surface Modeling

A Petrel Plugin for Surface Modeling A Petre Pugin for Surface Modeing R. M. Hassanpour, S. H. Derakhshan and C. V. Deutsch Structure and thickness uncertainty are important components of any uncertainty study. The exact ocations of the geoogica

More information

M. Badent 1, E. Di Giacomo 2, G. Liotta 2

M. Badent 1, E. Di Giacomo 2, G. Liotta 2 DIEI Dipartimento di Ingegneria Eettronica e de informazione RT 005-06 Drawing Coored Graphs on Coored Points M. Badent 1, E. Di Giacomo 2, G. Liotta 2 1 University of Konstanz 2 Università di Perugia

More information

A Design Method for Optimal Truss Structures with Certain Redundancy Based on Combinatorial Rigidity Theory

A Design Method for Optimal Truss Structures with Certain Redundancy Based on Combinatorial Rigidity Theory 0 th Word Congress on Structura and Mutidiscipinary Optimization May 9 -, 03, Orando, Forida, USA A Design Method for Optima Truss Structures with Certain Redundancy Based on Combinatoria Rigidity Theory

More information

A METHOD FOR GRIDLESS ROUTING OF PRINTED CIRCUIT BOARDS. A. C. Finch, K. J. Mackenzie, G. J. Balsdon, G. Symonds

A METHOD FOR GRIDLESS ROUTING OF PRINTED CIRCUIT BOARDS. A. C. Finch, K. J. Mackenzie, G. J. Balsdon, G. Symonds A METHOD FOR GRIDLESS ROUTING OF PRINTED CIRCUIT BOARDS A C Finch K J Mackenzie G J Basdon G Symonds Raca-Redac Ltd Newtown Tewkesbury Gos Engand ABSTRACT The introduction of fine-ine technoogies to printed

More information

file://j:\macmillancomputerpublishing\chapters\in073.html 3/22/01

file://j:\macmillancomputerpublishing\chapters\in073.html 3/22/01 Page 1 of 15 Chapter 9 Chapter 9: Deveoping the Logica Data Mode The information requirements and business rues provide the information to produce the entities, attributes, and reationships in ogica mode.

More information

Arithmetic Coding. Prof. Ja-Ling Wu. Department of Computer Science and Information Engineering National Taiwan University

Arithmetic Coding. Prof. Ja-Ling Wu. Department of Computer Science and Information Engineering National Taiwan University Arithmetic Coding Prof. Ja-Ling Wu Department of Computer Science and Information Engineering Nationa Taiwan University F(X) Shannon-Fano-Eias Coding W..o.g. we can take X={,,,m}. Assume p()>0 for a. The

More information

An Exponential Time 2-Approximation Algorithm for Bandwidth

An Exponential Time 2-Approximation Algorithm for Bandwidth An Exponentia Time 2-Approximation Agorithm for Bandwidth Martin Fürer 1, Serge Gaspers 2, Shiva Prasad Kasiviswanathan 3 1 Computer Science and Engineering, Pennsyvania State University, furer@cse.psu.edu

More information

A Comparison of a Second-Order versus a Fourth- Order Laplacian Operator in the Multigrid Algorithm

A Comparison of a Second-Order versus a Fourth- Order Laplacian Operator in the Multigrid Algorithm A Comparison of a Second-Order versus a Fourth- Order Lapacian Operator in the Mutigrid Agorithm Kaushik Datta (kdatta@cs.berkeey.edu Math Project May 9, 003 Abstract In this paper, the mutigrid agorithm

More information

Providing Hop-by-Hop Authentication and Source Privacy in Wireless Sensor Networks

Providing Hop-by-Hop Authentication and Source Privacy in Wireless Sensor Networks The 31st Annua IEEE Internationa Conference on Computer Communications: Mini-Conference Providing Hop-by-Hop Authentication and Source Privacy in Wireess Sensor Networks Yun Li Jian Li Jian Ren Department

More information

Hiding secrete data in compressed images using histogram analysis

Hiding secrete data in compressed images using histogram analysis University of Woongong Research Onine University of Woongong in Dubai - Papers University of Woongong in Dubai 2 iding secrete data in compressed images using histogram anaysis Farhad Keissarian University

More information

Automatic Grouping for Social Networks CS229 Project Report

Automatic Grouping for Social Networks CS229 Project Report Automatic Grouping for Socia Networks CS229 Project Report Xiaoying Tian Ya Le Yangru Fang Abstract Socia networking sites aow users to manuay categorize their friends, but it is aborious to construct

More information

Distance Weighted Discrimination and Second Order Cone Programming

Distance Weighted Discrimination and Second Order Cone Programming Distance Weighted Discrimination and Second Order Cone Programming Hanwen Huang, Xiaosun Lu, Yufeng Liu, J. S. Marron, Perry Haaand Apri 3, 2012 1 Introduction This vignette demonstrates the utiity and

More information

As Michi Henning and Steve Vinoski showed 1, calling a remote

As Michi Henning and Steve Vinoski showed 1, calling a remote Reducing CORBA Ca Latency by Caching and Prefetching Bernd Brügge and Christoph Vismeier Technische Universität München Method ca atency is a major probem in approaches based on object-oriented middeware

More information

Planar Graphs of Bounded Degree have Constant Queue Number

Planar Graphs of Bounded Degree have Constant Queue Number Panar Graphs of Bounded Degree have Constant Queue Number Michae A. Bekos, Henry Förster, Martin Gronemann, Tamara Mchedidze 3 Fabrizio Montecchiani 4, Chrysanthi Raftopouou 5, Torsten Ueckerdt 3 Institute

More information

Chapter Multidimensional Direct Search Method

Chapter Multidimensional Direct Search Method Chapter 09.03 Mutidimensiona Direct Search Method After reading this chapter, you shoud be abe to:. Understand the fundamentas of the mutidimensiona direct search methods. Understand how the coordinate

More information

Utility-based Camera Assignment in a Video Network: A Game Theoretic Framework

Utility-based Camera Assignment in a Video Network: A Game Theoretic Framework This artice has been accepted for pubication in a future issue of this journa, but has not been fuy edited. Content may change prior to fina pubication. Y.LI AND B.BHANU CAMERA ASSIGNMENT: A GAME-THEORETIC

More information

Further Optimization of the Decoding Method for Shortened Binary Cyclic Fire Code



