Inverted Index Compression via Online Document Routing

Gal Lavee, Ronny Lempel, Edo Liberty, and Oren Somekh
Yahoo! Labs, Haifa, Israel
Email: {gallavee, rlempel, edo,

ABSTRACT

Modern search engines are expected to make documents searchable shortly after they appear on the ever changing Web. To satisfy this requirement, the Web is frequently crawled. Due to the sheer size of their indexes, search engines distribute the crawled documents among thousands of servers in a scheme called local index-partitioning, such that each server indexes only several million pages. To ensure that documents from the same host are distributed uniformly over the servers, for load balancing purposes, random routing of documents to servers is common. To expedite the time documents become searchable after being crawled, documents may be simply appended to the existing index partitions. However, indexing by merely appending documents results in larger index sizes, since document reordering for index compactness is no longer performed. This, in turn, degrades search query processing performance, which depends heavily on index sizes. A possible way to balance quick document indexing with efficient query processing is to deploy online document routing strategies that are designed to reduce index sizes.

This work considers the effects of several online document routing strategies on the aggregated partitioned index size. We show that there exists a tradeoff between the compression of a partitioned index and the distribution of documents from the same host across the index partitions (i.e., host distribution). We suggest and evaluate several online routing strategies with regard to their compression, host distribution, and complexity. In particular, we present a term based routing algorithm which is shown analytically to provide better compression results than the industry standard random routing scheme.
In addition, our algorithm demonstrates comparable compression performance and host distribution while having much better running time complexity than other document routing heuristics. Our findings are validated by experimental evaluation performed on a large benchmark collection of Web pages.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WWW2011, Mar. to Apr. 1, Hyderabad, India. Copyright 2011 ACM X-XXXXX-XX-X/XX/XX.

Categories and Subject Descriptors
H.3 [Information Systems]: Information Storage and Retrieval

General Terms
Algorithms

Keywords
Inverted Index, Index Compression, Index Partitioning, Document Routing, Online Algorithm

1. INTRODUCTION

Modern search engines serve enormous query loads over tens of billions of Web pages, and are expected to deliver fresh and relevant results to users in sub-second time. Due to the sheer size of the Web, search engines partition their index over thousands of servers, where each server typically processes only several million documents. At query time, the top results retrieved from all servers are merged to produce the final results, which are returned to the user.

The main data structure used by search engines to efficiently map queries to Web pages is the inverted index [2, 23]. The inverted index contains a postings list for each unique term appearing in the corpus. The postings list of a term consists of the list of document identifiers¹ (docids) containing it, which are typically sorted by increasing docid values. The list is represented by encoding the gaps (called dgaps) between successive docids.
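To make the dgap representation concrete, the following is a minimal sketch in Python. The function names `to_dgaps` and `from_dgaps` are illustrative and do not come from the paper; the first entry of the output is the first docid itself, and every later entry is the gap to the preceding docid.

```python
from itertools import accumulate


def to_dgaps(docids):
    """Convert a strictly increasing list of positive docids into dgaps.

    The first element of the result is the first docid; each following
    element is the difference between successive docids.
    """
    gaps = []
    prev = 0
    for d in docids:
        if d <= prev:
            raise ValueError("docids must be positive and strictly increasing")
        gaps.append(d - prev)
        prev = d
    return gaps


def from_dgaps(gaps):
    """Recover the docids by taking running sums of the dgaps."""
    return list(accumulate(gaps))


postings = [3, 4, 9, 10, 11, 27]
gaps = to_dgaps(postings)
print(gaps)              # the small gaps are what a dgap code compresses well
print(from_dgaps(gaps))  # round-trips back to the original docids
```

Clustered docids (such as 9, 10, 11 above) turn into runs of very small gaps, which is why the assignment of docids matters for compression.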
Another data structure in an inverted index is the lexicon, or dictionary, which is a lookup table that, for each term in the corpus, points to the postings list corresponding to it.

It is well known that index size has a major effect on query processing throughput. In addition to the direct reduction in memory and disk space, more compact indexes lead to savings in data transfers and increase the hit rate of memory caches [22, 21]. Therefore, a large body of work has focused on index compaction and compression methods. The structure described above leaves two main degrees of freedom for compression optimization. The first is the assignment of docids to documents (also referred to as document reordering). The second is the actual encoding of the dgaps into bits (also referred to as dgap compression).

¹ Although term frequencies and offsets within the document occupy a major portion of modern inverted indexes, we focus here on the document identifiers only.

The basic idea behind an effective docid assignment is to assign similar documents with close docids, hence potentially reducing the dgaps, since similar documents contain many common terms. Alas, the problem of finding the optimal docid assignment is NP-hard, and so various heuristics have been proposed in the literature. However, previous work has focused on compacting the inverted index of a single server, whereas large corpora are indexed over thousands of servers. A standard approach distributes the corpus in a random fashion across the servers, which in particular routes (with high probability) similar documents to different servers. While this adversely affects document reordering schemes, it creates a more balanced query processing workload across the servers and eases communication bottlenecks, as pages of popular Web sites are scattered about evenly across the servers.

This work considers a real-time partitioned indexing scenario where newly arriving content must be indexed on some server and made available for search immediately upon arrival. This prohibits the application of time consuming state-of-the-art document reordering schemes. However, the partitioned setting allows a degree of freedom in the form of the policy that routes incoming documents to the various partitions. We thus apply online document routing schemes instead of the industry standard random routing scheme to reduce the aggregated size of the partitioned index. We further monitor and constrain the distribution of documents belonging to the same hosts across the partitions (referred to as host distribution hereafter). We tested our algorithms on the 25 million Web page TREC .gov2 collection.

Our main contributions are the following:

We introduce the idea of using online document routing for aggregated index size compression in locally partitioned indexes. As far as we are aware, this novel approach has not been studied previously.

We propose and evaluate several online routing strategies with regard to their compression, host distribution, and complexity.
We show that there exists a tradeoff between the compression of a partitioned index and host distribution across the partitions.

In particular, we present a term-based routing algorithm. Adhering to a simple document generating model presented in [11], we show analytically that the algorithm provides better compression results than the industry standard random routing scheme. In addition, our algorithm demonstrates comparable compression performance and host distribution while having much better running time complexity than other document routing heuristics.

The rest of this work is organized as follows. Section 2 surveys related work. Section 3 formally defines our model, problem and metrics. Algorithms for document routing are presented in Section 4, with the term-based algorithm analyzed in Section 5. Our experimental results are reported in Section 6. We conclude in Section 7.

2. RELATED WORK

2.1 Index Partitioning

The sheer size of the Web, the enormous number of search queries, and the required low latency enforce a distributed inverted index architecture [2, 4, 5, 23]. To support these requirements, both distribution and replication principles are applied. Replication (also known as mirroring) means making enough identical copies of the system so that the required query load can be served, and is beyond the scope of this work. Distribution means the way the inverted index is partitioned across a collection of nodes.

The two main strategies of partitioning an inverted index are local index-partitioning and global index-partitioning [4]. According to the local index-partitioning strategy (or document based partition), each node is responsible for a disjoint subset of documents in the collection. In the global index-partitioning strategy (or term based partition), terms are divided into subsets, such that each node stores postings lists only for a subset of the terms in the collection.
Due to various theoretical and practical considerations, modern large-scale search engines follow the local inverted index-partitioning strategy and distribute the documents across the nodes [4, 5]. Documents can be assigned to nodes using different policies. For example, the hash distribution policy allocates documents to nodes in a random fashion by hashing the documents' URLs to yield a node identifier [2]. Various other distribution policies such as round-robin distribution are also possible [14].

2.2 Inverted Index Compression

Here we focus on a single node of a local index-partitioning architecture. We consider a simplified model of an inverted index in which the postings list of term $t$ holds the list of docids containing $t$, sorted by increasing value. Denote the list by $d^t_1, d^t_2, \ldots, d^t_{n_t}$, where $d^t_i$ denotes the docid of the $i$th document containing $t$ out of $n_t$ such documents. The list is actually represented by encoding the first docid and the sequence of gaps (dgaps) between successive identifiers thereafter, i.e. $d^t_1,\ d^t_2 - d^t_1,\ \ldots,\ d^t_{n_t} - d^t_{n_t - 1}$. The two degrees of freedom available for compressing the size of the lists are (a) docid assignment; and (b) dgap encoding. As we focus on the former, we start by briefly reviewing the latter.

dgap encoding techniques aim to compress a sequence of integers. The literature contains schemes that encode each gap individually, e.g. Gamma, Delta, Golomb-Rice [20] and Zeta [9] encodings, as well as schemes that encode certain blocks of gaps, e.g. PForDelta [22, 13] and Simple9 [1]. Additionally, the Interpolative Encoding scheme [15] is applied directly on the docids rather than their dgaps, and works well for clustered term occurrences.

In general, the docid assignment problem seeks a permutation of the documents that minimizes the inverted-index size under a specific dgap encoding scheme. As shown in [6], this problem is NP-hard and various heuristics are used to provide approximations. The size of an inverted index is a function of the dgaps.
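The dependence of index size on docid assignment can be illustrated with a small sketch. It is not taken from the paper: the four toy documents, the helper names `gamma_bits` and `index_bits`, and the choice of the Gamma code (whose codeword for an integer k ≥ 1 is 2⌊log₂ k⌋ + 1 bits long) are assumptions made purely for illustration.

```python
def gamma_bits(k):
    """Length in bits of the Elias Gamma code of an integer k >= 1."""
    if k < 1:
        raise ValueError("Gamma codes are defined for integers >= 1")
    return 2 * (k.bit_length() - 1) + 1  # bit_length() - 1 == floor(log2 k)


def index_bits(docs_in_order):
    """Total postings size when the i-th document in the list gets docid i.

    docs_in_order is a list of term sets. For every term we charge the
    Gamma length of the gap to the previous docid containing that term
    (or of the docid itself on the term's first occurrence).
    """
    last_docid = {}
    total = 0
    for docid, terms in enumerate(docs_in_order, start=1):
        for t in terms:
            total += gamma_bits(docid - last_docid.get(t, 0))
            last_docid[t] = docid
    return total


a = {"tax", "form"}
b = {"tax", "form"}
c = {"mars", "rover"}
d = {"mars", "rover"}

print(index_bits([a, c, b, d]))  # similar documents interleaved
print(index_bits([a, b, c, d]))  # similar documents adjacent
```

Placing the two pairs of similar documents next to each other yields gaps of 1 where the interleaved order yields gaps of 2, so the same eight postings need fewer bits.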
All effective dgap encoding techniques represent smaller numbers with fewer bits (about logarithmic in the number value). Hence, assigning docids in a way which results in smaller dgaps is the key for better compression. This principle drives most works dealing with docid assignment, and they strive to assign close docids to similar documents, i.e. documents that share many terms. Technically, most works define a graph $G = (D, E)$, where $D$ is the set of documents, and $E$ is a set of edges representing the similarity between two documents $d_i, d_j \in D$.

One line of work started by [16] traverses the graph $G$ to find the maximal weight path connecting all the nodes, assigning docids accordingly. This is equivalent to the NP-hard traveling salesman problem (TSP). Several TSP approximations were applied for docid assignment in [16, 7, 12]. In [16], a simple greedy nearest neighbors (GNN) approach is used to add one edge at a time. To reduce the computational load, [7] uses singular value decomposition (SVD) to reduce the dimensionality of the term-document matrix. To scale up TSP-based schemes, [12] proposes a new framework based on computing TSP on a reduced sparse graph obtained through locality sensitive hashing.

In yet another line of work, the nodes of $G$ are clustered according to their similarity and close docids are assigned to the nodes (documents) within each cluster. A top-down approach is used in [8], where the whole collection is recursively split into sub-collections, inserting similar nodes into the same sub-collections. Then, the sub-collections are merged into an ordered group of nodes. A bottom-up approach called k-scan was proposed in [18]. A hybrid method which combines k-scan clustering and TSP for intra-cluster docid assignment is proposed by [6].

A different approach, which is both highly scalable and highly effective, was proposed for Web collections in [17]. It assigns docids according to the lexicographically sorted order of the documents' URLs, utilizing the fact that URL similarity is a strong indicator of document similarity. The scheme was found to perform remarkably well on various Web collections indexed as a whole.

In all the aforementioned works, a heuristic of docid assignment or an encoding of dgaps was empirically tested against several collections and compared to the results of other works. In contrast, [11] analyzes the compressibility of a collection whose documents are generated by a simple probabilistic model in which terms are chosen independently from a given distribution.
2.3 Online Problems

Online algorithms propose solutions to problems where the input is not known in advance and must be processed as it arrives in a sequential online fashion [10]. At first glance, the document routing problem tackled in this paper resembles two classical online problems: the Metrical Task System (MTS) and the Load Balancing (a.k.a. Job Scheduling) problems.

MTS is a general framework for online problems that generalizes several known online problems such as the paging problem and the k-servers problem [10]. An MTS is composed of two components. The first is a metric space $M = \langle S, d \rangle$, where $S$ is a set of points in the space and $d : S \times S \to \mathbb{R}$ is a metric over $S$. The second component of an MTS is a set of tasks $R$. Each task $r \in R$ may be seen as a vector $\langle r_1, r_2, \ldots, r_{|S|} \rangle$ of costs, where each entry $r_i$ is the cost of processing the request in state $i \in S$. Since the requests must be processed in the order in which they arrive, the cost of an MTS algorithm is the cost of moving from one state to another plus the cost of processing each request in the current state, given by the formula:

$$ALG(\sigma) = \sum_{i=1}^{n} d\big(ALG(i-1), ALG(i)\big) + \sum_{i=1}^{n} r_i\big(ALG(i)\big).$$

Here $\sigma$ denotes a particular request sequence of size $n$ and $ALG(i)$ denotes the state of the algorithm immediately before processing request $i$.

The Load Balancing problem seeks to assign arriving job requests to $m$ possible machines (one machine per job). Each of the jobs is associated with a cost, and a load balancing algorithm's objective is to balance the cumulative cost of jobs assigned to each machine. We may think of a job $j$ as a vector $r_j = \langle r_{j,1}, r_{j,2}, \ldots, r_{j,m} \rangle$, where $r_{j,i}$ is the cost of job $j$ on machine $i$ [3].

The major difference between the document routing problem and either MTS or Load Balancing stems from the fact that the cost associated with appending document $d$ to index partition $j$ depends on the sequence of documents previously routed to $j$.
This doesn't fit at all within the Load Balancing framework, and causes a factorial explosion of states in MTS that renders its computation infeasible and its competitiveness non-existent. Therefore, we cannot transfer the analytical results of MTS or Load Balancing to our setting. Nevertheless, load balancing approaches and intuitions may still be applied and assessed empirically (see Section 4).

3. FORMAL PROBLEM DEFINITION

The online problem setting we tackle is the following. The system consists of a document dispatcher and $m$ (initially empty) index partitions. Documents, modeled as bags of words, arrive in some arbitrary sequence to the dispatcher (e.g. as the result of a large scale distributed crawl policy). Each document belongs to some Web host. The dispatcher must immediately route each document to one of the $m$ partitions, whereby that partition simply appends the document to its existing inverted index. Each index consists of (1) a dictionary, and (2) inverted lists per term. The inverted lists encode document IDs only, i.e. no intra-document offsets or any other payload.

For example, consider a document $d$ containing terms $t_1, \ldots, t_k$ that is appended as the $j$th document of partition $l$. $d$ will be assigned docid $j$ on $l$. Any of $t_1, \ldots, t_k$ that has not appeared in any of the previous $j-1$ documents routed to $l$ will be added to $l$'s dictionary, and docid $j$ will then be appended, by dgap encoding, to the $k$ postings lists corresponding to the terms.

The dispatcher attempts to minimize the overall index size, i.e. the sum of index sizes on all $m$ partitions, while maintaining some balance between the number of pages from each host residing on each partition. Specifically, our algorithms will typically attempt to minimize the overall index size subject to some constraint on the skewness allowed in the distribution of same-host pages across the partitions.
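The append-only behavior of a single partition described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the class name `Partition` and its attributes are hypothetical, and the postings lists hold the raw dgap integers instead of their bit encodings.

```python
class Partition:
    """One append-only index partition (illustrative sketch).

    postings maps each term to its list of dgaps; the first entry of each
    list is the first docid. last_docid doubles as the dictionary: a term
    is present in it exactly when it has a postings list.
    """

    def __init__(self):
        self.num_docs = 0
        self.postings = {}
        self.last_docid = {}

    def append(self, terms):
        """Append a document (a bag of words) and return its local docid."""
        self.num_docs += 1
        docid = self.num_docs
        for t in set(terms):  # docids only: repeated terms count once
            gap = docid - self.last_docid.get(t, 0)
            self.postings.setdefault(t, []).append(gap)
            self.last_docid[t] = docid
        return docid


p = Partition()
p.append(["budget", "tax"])
p.append(["tax", "form"])
p.append(["budget"])
print(p.postings)
```

Because a docid is simply the arrival position on the partition, the gap a new document adds to each list depends on everything routed there before it, which is the history dependence noted in Section 2.3.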
Since documents are routed and indexed one at a time (no buffering is allowed, either at the dispatcher or on the partitions), only single dgap encoding schemes apply. Specifically, our experiments use Delta encoding. Block-based dgap encoding schemes such as PForDelta and Simple9 are not applicable. The following subsections formally define our metrics for index size and host distribution across partitions.

3.1 The Bits per Posting Metric

Let a corpus with $N$ overall postings be indexed across $m$ partitions, and let $T_i$ denote the set of distinct terms on the $i$th partition. Let $t$ be a term appearing in $n_t$ documents in some partition, and denote those docids by $d^t_1 < d^t_2 < \ldots < d^t_{n_t}$. Assume all postings lists are encoded using Delta encoding. Then, the overall size of all postings lists on the $i$th partition, $P_i$, is given by

$$P_i = \sum_{t \in T_i} \Big( \delta(d^t_1) + \sum_{j=2}^{n_t} \delta\big(d^t_j - d^t_{j-1}\big) \Big),$$

where $\delta(k)$ is the length (in bits) of the Delta encoding of the integer $k$:

$$\delta(k) = 1 + \lfloor \log_2 k \rfloor + 2 \lfloor \log_2 (1 + \lfloor \log_2 k \rfloor) \rfloor. \qquad (1)$$

The overall size of the postings across the partitions, $P$, is defined as

$$P = \sum_{i=1}^{m} P_i.$$

We further define the overhead $OH$ of a partitioned index as the space taken by the dictionaries of the individual partitions. Each entry of the $i$th dictionary is a pointer into the sequence of postings lists on the $i$th server, and hence requires $\log_2 P_i$ bits. Overall,

$$OH = \sum_{i=1}^{m} |T_i| \log_2 P_i.$$

Finally, the bits per posting metric comes in two flavors, with and without overhead. Those are simply $(P + OH)/N$ and $P/N$, respectively.

3.2 Host-Distribution Measure

We evaluate the distribution of pages belonging to $N_h$ hosts across $m$ partitions using the Chi-squared ($\chi^2$) distribution [19]. Let $N_{h,i}$ denote the number of documents from host $h$ on partition $i$, and denote by $N_i$ the total number of documents on partition $i$. Also, let $p_h$ denote the proportion of the documents belonging to host $h$ out of the total number of documents. The following host-balancing value $B$ is distributed $\chi^2$ with $(m-1)(N_h-1)$ degrees of freedom:

$$B = \sum_{i=1}^{m} \sum_{h=1}^{N_h} \frac{(N_{h,i} - N_i p_h)^2}{N_i p_h}.$$

Essentially, this expression checks the null hypothesis that the host distribution resulted from a random routing of documents to the partitions. Intuitively, the value corresponding to a document routing scheme that approximately preserves the overall host distribution on each of the servers will be low. When the number of degrees of freedom is large, the exact $\chi^2$ distribution is difficult to compute. Hence we use a standard Normal approximation, where the mean is the number of degrees of freedom and the variance is twice this number. We will thus report the value

$$\frac{B - (m-1)(N_h-1)}{\sqrt{2(m-1)(N_h-1)}},$$

which is approximately distributed $N(0, 1)$.

4. ALGORITHMS

This section describes several algorithms for online document routing in the framework defined in Section 3.
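Before turning to the individual algorithms, the two metrics of Section 3 can be made concrete with a short sketch. It assumes the usual floor brackets in the Delta code length of equation (1), represents each partition as a plain dictionary from term to increasing docid list, and represents host counts as a nested list; the function names and these data layouts are illustrative choices, not part of the paper.

```python
import math


def delta_bits(k):
    """Delta code length: 1 + floor(log2 k) + 2*floor(log2(1 + floor(log2 k)))."""
    if k < 1:
        raise ValueError("Delta codes are defined for integers >= 1")
    floor_log = k.bit_length() - 1  # floor(log2 k)
    return 1 + floor_log + 2 * ((1 + floor_log).bit_length() - 1)


def postings_bits(partition):
    """P_i for one partition given as {term: increasing list of docids}."""
    total = 0
    for docids in partition.values():
        prev = 0
        for d in docids:
            total += delta_bits(d - prev)  # first docid, then the dgaps
            prev = d
    return total


def bits_per_posting(partitions, with_overhead=False):
    """P/N, or (P + OH)/N when with_overhead is True."""
    n_postings = sum(len(ids) for p in partitions for ids in p.values())
    sizes = [postings_bits(p) for p in partitions]
    total = float(sum(sizes))
    if with_overhead:
        # OH: one pointer of log2(P_i) bits per dictionary entry.
        total += sum(len(p) * math.log2(s)
                     for p, s in zip(partitions, sizes) if s > 0)
    return total / n_postings


def host_balance_z(counts):
    """Normal approximation of B, where counts[h][i] = N_{h,i}.

    Assumes at least two hosts and two partitions, and that every host
    and every partition holds at least one document.
    """
    n_hosts, m = len(counts), len(counts[0])
    part_totals = [sum(row[i] for row in counts) for i in range(m)]
    n_docs = sum(part_totals)
    b = 0.0
    for row in counts:
        p_h = sum(row) / n_docs
        for i in range(m):
            expected = part_totals[i] * p_h
            b += (row[i] - expected) ** 2 / expected
    dof = (m - 1) * (n_hosts - 1)
    return (b - dof) / math.sqrt(2 * dof)


print(bits_per_posting([{"a": [1, 2, 3], "b": [2]}]))
print(host_balance_z([[5, 5], [10, 10]]))  # hosts spread proportionally
print(host_balance_z([[10, 0], [0, 10]]))  # each host on its own partition
```

A routing that keeps every host proportionally spread gives a value near or below zero, while concentrating hosts on single partitions pushes the value far above what random routing would produce.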
Random document routing. The random algorithm, commonly used in search engines, simply routes each incoming document to a partition chosen uniformly and independently at random.

Greedy document routing. The greedy algorithm routes each document $d$ to the partition whose inverted lists will grow by the least amount when appending $d$ to its existing index. Let $d$ contain the set of terms $T_d$ and denote by $\phi(j,t)$ the dgap created in the postings list of term $t \in T_d$ on partition $j$, $j \in \{1, 2, \ldots, m\}$, upon appending $d$ to the current state of partition $j$. Note that $\phi(j,t)$ ranges from 1 to the number of documents already routed to partition $j$ (in case term $t$ has not yet appeared there). The change in the size of the inverted lists of partition $j$ upon appending document $d$, $\Delta(j,d)$, is thus given by:

$$\Delta(j,d) = \sum_{t \in T_d} \delta\big(\phi(j,t)\big), \qquad (2)$$

where $\delta(\cdot)$ denotes the Delta dgap encoding function (1).

Load-balancing document routing. The load balancing algorithm is inspired by the relationship of our problem to the online load balancing problem of permanent jobs on unrelated machines. In this algorithm, which is based on [3], we start with an initial estimate $L$ of the maximum load (i.e., the maximum aggregated index size). We then calculate the normalized load $\tilde{l}_{j,d} = l_{j,d}/L$ and the normalized cost of adding the next document $d$, $C(j,d) = \Delta(j,d)/L$, for each partition $j = 1, \ldots, m$. Here $\Delta(j,d)$ denotes the cost of adding document $d$ to partition $j$ (see equation (2)), and $l_{j,d}$ denotes the size of partition $j$ assuming document $d$ will be routed to it. Next, we compute the following cost function for all partitions:

$$a^{\tilde{l}_{j,d} + C(j,d)} - a^{\tilde{l}_{j,d}},$$

where $a$ is a constant, and route the document to the partition of lowest cost. As more documents are added to the index, it is possible that the initial guess of the maximum load, $L$, will be exceeded. In this case we simply update our maximum load estimate to $2L$. More precisely, we check if $l_{j,d} + \Delta(j,d) > \beta L$, where $\beta$ is a parameter (see [3]). For the unrelated machines load balancing problem, this algorithm promises an $O(\log m)$ competitive solution.
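The Greedy rule of equation (2) can be sketched as a centralized dispatcher that remembers, per partition, the number of documents routed so far and the last docid of every term. The class `GreedyRouter`, the tie-breaking toward the lowest partition index, and the choice to charge the full docid on a term's first occurrence (as in Section 3.1) are assumptions of this sketch rather than details stated in the text.

```python
def delta_bits(k):
    """Delta code length of an integer k >= 1, as in equation (1)."""
    floor_log = k.bit_length() - 1
    return 1 + floor_log + 2 * ((1 + floor_log).bit_length() - 1)


class GreedyRouter:
    """Centralized Greedy dispatcher over m partitions (illustrative sketch)."""

    def __init__(self, m):
        self.num_docs = [0] * m                    # documents per partition
        self.last_docid = [{} for _ in range(m)]   # per partition: term -> docid

    def cost(self, j, terms):
        """Growth in bits of partition j's postings if the document goes there."""
        docid = self.num_docs[j] + 1
        last = self.last_docid[j]
        return sum(delta_bits(docid - last.get(t, 0)) for t in terms)

    def route(self, terms):
        """Send the document to the cheapest partition and update the state."""
        terms = set(terms)
        # min() keeps the first minimum, so ties go to the lowest index.
        j = min(range(len(self.num_docs)), key=lambda p: self.cost(p, terms))
        self.num_docs[j] += 1
        for t in terms:
            self.last_docid[j][t] = self.num_docs[j]
        return j


router = GreedyRouter(m=2)
docs = [{"tax", "form"}, {"mars", "rover"}, {"tax", "form"}, {"mars", "rover"}]
print([router.route(d) for d in docs])
```

In this toy run the documents sharing terms end up on the same partition, since a gap of 1 costs a single bit while a term new to a non-empty partition costs the Delta length of a larger number.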
Of course, as our document routing algorithm is different, this analytical guarantee does not hold.

Term based document routing. The term-based algorithm associates a disjoint set of representing terms with every partition. Each new document is routed to the partition with which it has the most overlapping terms. Section 5 proves that this algorithm achieves better compression than random routing, the current de facto standard.

The algorithm requires a strategy for associating the terms in such a way that the probability of a particular term to represent each partition is about equal, as is the total probability mass of terms assigned to each partition. Due to the power-law nature of term popularities, random association of terms to partitions creates partitions with imbalanced probability mass of their assigned terms. We therefore used a heuristic approach which first sorts the terms in the corpus according to frequency², and then associates them in a zig-zag pattern, i.e. the $m$ highest frequency terms are associated with partitions $1, \ldots, m$ in ascending order; the next $m$ highest frequency terms are associated with partitions $m, \ldots, 1$ in descending order; and so on. After this initial phase, we use iterative swapping of the highest frequency term from the partition with the maximum cumulative frequency (over all terms associated with it) with the lowest frequency term from the partition with the minimum total frequency. This process is allowed to iterate until convergence of the difference between the maximal cumulative frequency and the minimal cumulative frequency.

Note that in practice there's no need to associate all terms in the corpus among the servers. Our experiments will leave most terms unassigned, effectively ignoring them when making routing decisions.

4.1 Adding Host-Balancing Constraints

We extend both the Greedy and Term-based routing algorithms by limiting the number of documents from each host that are assigned to any single partition³. The resulting routing schemes are called Constrained Greedy and Constrained Term-based, respectively.

For each host $h$ of (estimated) size $n_h$, we bound the number of its documents which may be routed to any single partition. Then, upon arrival of a document of host $h$, the constrained algorithm (either Greedy or Term-based) simply considers only partitions on which the number of already-routed documents belonging to host $h$ is below the bound. We apply two flavors of bounds, denoted $b_1(h)$ and $b_2(h)$, to each host:

$$b_1(h) = \max\Big\{ \alpha \frac{n_h}{m},\ 3 \Big\}, \qquad (3)$$

$$b_2(h) = \max\Big\{ \frac{n_h}{m} + \alpha \sqrt{\frac{n_h}{m}},\ 3 \Big\}. \qquad (4)$$

The first bound flavor requires a slack value of $\alpha > 1$, to allow each partition to hold some multiplicative factor of its fair share of documents from host $h$. Our experiments use $\alpha \in \{1.05, 1.1, 1.2\}$.
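Returning to the Term-based algorithm, its zig-zag association phase and routing rule can be sketched as below. The sketch covers only the initial zig-zag pass and omits the iterative swapping refinement; the function names, the alphabetical tie-break among equally frequent terms, the routing tie-break toward the lowest partition index, and the toy frequencies are all assumptions made for illustration.

```python
def zigzag_assign(term_freq, m):
    """Associate terms with partitions 0..m-1 in a zig-zag (snake) order.

    term_freq maps term -> frequency. Terms are visited from most to least
    frequent; even rounds go through partitions 0..m-1 and odd rounds go
    back through m-1..0.
    """
    ordered = sorted(term_freq, key=lambda t: (-term_freq[t], t))
    assignment = {}
    for rank, term in enumerate(ordered):
        round_no, pos = divmod(rank, m)
        assignment[term] = pos if round_no % 2 == 0 else m - 1 - pos
    return assignment


def route_term_based(doc_terms, assignment, m):
    """Route to the partition whose representing terms overlap the document most.

    Terms without an association are ignored.
    """
    overlap = [0] * m
    for t in set(doc_terms):
        if t in assignment:
            overlap[assignment[t]] += 1
    return max(range(m), key=lambda j: overlap[j])  # first maximum wins ties


freq = {"the": 100, "of": 90, "tax": 50, "mars": 40, "form": 20, "rover": 10}
assignment = zigzag_assign(freq, m=2)
print(assignment)
print(route_term_based({"tax", "rover", "form", "unseen"}, assignment, m=2))
```

With these toy frequencies the two partitions receive cumulative frequencies of 160 and 150, which is the kind of balance the zig-zag order is meant to approximate before any swapping takes place.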
The second bound defines the slack in terms of the (approximate) standard deviation of the random routing of $h$'s documents across the partitions. Our experiments use $\alpha \in \{1, 2, 3\}$. With either flavor, we always allow partitions to hold 3 documents of each host. This relaxes our constraints over hosts with very few pages relative to the number of partitions.

4.2 Implementation Notes

The Random routing algorithm is obviously trivial to implement, with each routing decision being made by the dispatcher in $O(1)$. All other algorithms compute some value for routing document $d$ to each of the $m$ partitions, where the value depends on the set of terms $T_d$ contained in $d$.

The Greedy and Load Balancing algorithms each compute a cost, which requires examining all $|T_d|$ terms in the document to be appended, for each of the $m$ partitions. These computations can be done in a centralized fashion in the dispatcher, or in a distributed fashion with an extra round of communication between the dispatcher and each of the partitions.

For the centralized computation, the dispatcher must keep track of the number of documents routed to each partition, as well as the serial number of the last document routed to each partition for each term in the corpus. Effectively, this means keeping a global dictionary with $m$ entries per term. Let $T$ denote the number of distinct terms in the corpus. The centralized computation requires $O(Tm)$ memory at the dispatcher and $O(|T_d| m)$ time upon dispatching $d$.

In the distributed computation, the dispatcher sends $d$ to each partition, which computes its local cost and sends it back to the dispatcher, who then sends an indexing request to the cheapest partition. Each local cost computation requires $O(|T_d|)$ time and requires $O(T)$ extra memory per partition, as the last document ID per term needs to be kept in the local dictionary⁴.

² We assume that approximate term statistics are known from previous crawls.

³ We assume here that the sizes of the hosts are approximately known due to previous crawls.
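The host-balancing constraints of Section 4.1 can be sketched as follows. The two bound formulas are read here as $\max\{\alpha n_h/m, 3\}$ and $\max\{n_h/m + \alpha\sqrt{n_h/m}, 3\}$, which is an interpretation of the text (fair share times a slack factor, and fair share plus a multiple of the approximate standard deviation) rather than a verified transcription; the function names are hypothetical.

```python
import math


def bound_multiplicative(n_h, m, alpha):
    """First flavor: alpha times the fair share n_h / m, but at least 3."""
    return max(alpha * n_h / m, 3)


def bound_stddev(n_h, m, alpha):
    """Second flavor: fair share plus alpha approximate standard deviations."""
    fair_share = n_h / m
    return max(fair_share + alpha * math.sqrt(fair_share), 3)


def allowed_partitions(host_counts, bound):
    """Partitions still below the bound for this host.

    host_counts[j] is the number of documents of the host already routed
    to partition j. A constrained router picks only among these.
    """
    return [j for j, count in enumerate(host_counts) if count < bound]


b = bound_stddev(n_h=100, m=4, alpha=2)
print(b)
print(allowed_partitions([35, 12, 40, 34], b))
print(bound_multiplicative(n_h=5, m=100, alpha=1.2))  # tiny host: floor of 3
```

A Constrained Greedy or Constrained Term-based router would evaluate its usual cost or overlap only over the list returned by `allowed_partitions`, which requires the per-host, per-partition counters discussed in the implementation notes.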
Turning to the Term-based algorithm, the centralized computation requires the dispatcher to keep track of the terms associated with each partition. This requires $O(T)$ space. The routing decision can then be achieved in $O(|T_d| + m)$ time while maintaining $m$ additional counters, for overall space complexity of $O(T + m)$. Therefore, comparing centralized implementations, this algorithm requires about $m$ times less time and space than the Greedy and Load Balancing algorithms. This becomes significant as $m$ grows. In the distributed computation, each partition can track its own set of associated terms at the cost of $O(T)$ space across all partitions, or roughly $O(T/m)$ per partition.

In addition to the above resources, the Constrained versions of the Greedy and Term-based algorithms must also store the $N_h$ host size estimates (or bounds), as well as the number of pages per host that have been routed to each partition. This entails some extra $O(N_h m)$ memory. This memory can be kept at the dispatcher, or (in case of a distributed implementation) each partition may maintain its own counters for the number of pages per host it holds.

⁴ While this ID is already encoded in the inverted index, decoding it requires decoding a long sequence of dgaps.

5. TERM BASED DOCUMENT ROUTING ALGORITHM - ANALYSIS

This section shows that the Term-based routing algorithm results in a smaller aggregated index size than that of the industry standard random routing scheme under the simplified document generation model of [11].

According to the model of [11], the $N$ documents of the corpus may each include up to $L$ unique terms. Documents are independently generated as follows: each of the $L$ terms is chosen independently with replacement, according to some frequency rank distribution $\{p_i\}$ (e.g., power-law or double-Pareto distributions) defined over the vocabulary $T$. The resulting corpus can be described by a $|T| \times N$ boolean term-doc matrix that has a 1 in entry $(i,j)$ if term $i$ is included in document $j$. Let $S_i$ denote the r.v. corresponding to the size of the postings list that encodes the $i$th row of the term-doc matrix (i.e. term $t_i$), when applying some single dgap encoding (e.g., Delta encoding). In addition, let $P_i \triangleq \Pr(t_i \in d) = 1 - (1 - p_i)^L$ denote the probability that term $t_i$ is included in document $d$. Moreover, for $1 \le g \le N$, let $X_{g,i}$ be the r.v. denoting the number of dgaps of length $g$ in row $i$. According to [11, Th. 2],

$$S(N, P_i) \triangleq E[S_i] = \sum_{g=1}^{N} \delta(g)\, E[X_{g,i}], \qquad (5)$$

where

$$E[X_{g,i}] = P_i (1 - P_i)^{g-1} + P_i^2 (1 - P_i)^{g-1} (N - g), \qquad (6)$$

and $\delta(g)$ is the number of bits needed to encode a dgap of length $g$. In particular, [11, Sec. 7] shows that $S(N, P_i)$ achieves the entropy bound for every term $t_i$ that satisfies⁵ $\omega\big(\frac{\log N}{N}\big) = P_i = o(1)$. Hence, for these terms the expected size can be approximated by⁶ $S(N, P_i) \approx N \cdot H(P_i)$.

Let $\hat{T}$ denote the subset of terms that satisfy the conditions of [11, Sec. 7] and randomly divide it into $m$ disjoint sets of size $|\hat{T}|/m$ (a set for each partition). We represent the sets by vectors $\{V_l\}_{l=1}^{m}$ with $V_l(i) = 1$ if $t_i$ is included in the $l$th set and $V_l(i) = 0$ otherwise. A document $d$ is assigned to the $k$th partition if it maximizes the dot product between the document term vector and the corresponding partition-defining term vector, i.e.

$$k = \arg\max_{l} \{ V_l \cdot d \}. \qquad (7)$$

We note that since the terms of $\hat{T}$ are randomly and evenly divided between the partitions, the dot products $\{V_l \cdot d\}_{l=1}^{m}$ are identically distributed random variables.

Proposition 1. For the above document generating model, the Term-based document routing algorithm achieves smaller expected aggregated index size than that of the Random document routing scheme.

Proof. We start by examining how the probability of a term being included in a document changes given that the document was assigned to a certain partition according to the Term-based routing algorithm. Focusing on the $l$th partition,

$$P_{i,l} \triangleq \Pr(t_i \in d \mid d \to l) \overset{(a)}{=} \frac{\Pr(d \to l \mid t_i \in d)\, \Pr(t_i \in d)}{\Pr(d \to l)} = \beta_{i,l} P_i, \qquad (8)$$

where (a) is achieved by applying Bayes' theorem; $P_i \triangleq \Pr(t_i \in d)$, and $\beta_{i,l} \triangleq \Pr(d \to l \mid t_i \in d) / \Pr(d \to l)$. We note that since the partition representing terms were divided randomly and equally among the partitions (hence, the decision dot products are identically distributed r.v.'s), the probability that a document $d$ is routed to the $l$th partition is $\Pr(d \to l) \approx 1/m$.
Examining the definition of $\beta_{i,l}$ we observe that three cases should be considered: (1) the term $t_i$ is a member of the set which represents the $l$th partition, i.e., $V_l(i) = 1$; in this case it is easily verified that $\beta_{i,l} > 1$; (2) the term $t_i$ is a member of the set which represents some other partition $l' \neq l$; in this case it is easily verified that $\beta_{i,l} < 1$; and (3) the term $t_i$ is not representing any partition, $t_i \notin \hat{T}$; in this case $\beta_{i,l} = 1$. Hence, the probability that a term $t_i$ appears in a document which was routed to partition $l$ increases in case $t_i$ represents the $l$th partition, decreases in case $t_i$ represents some other partition, and remains unchanged otherwise.

Moreover, since every document is always routed to one of the servers, we have that

$$\sum_{l=1}^{m} \Pr(d \to l)\, P_{i,l} = P_i \approx \frac{1}{m} \sum_{l=1}^{m} P_{i,l}. \qquad (9)$$

Intuitively, this states that the overall probability of a term in the corpus remains unchanged regardless of the partitioning.

We complete the proof by showing that the expected aggregated size difference between the Random routing and the Term-based routing setups is positive:

$$
\begin{aligned}
E[S_{\text{Random}}] - E[S_{\text{Term-based}}]
&\overset{(a)}{\approx} \sum_{l=1}^{m} \sum_{t_i \in T} \big( S(N/m, P_i) - S(N/m, P_{i,l}) \big) \\
&\overset{(b)}{=} \sum_{l=1}^{m} \sum_{t_i \in \hat{T}} \big( S(N/m, P_i) - S(N/m, P_{i,l}) \big) \\
&\overset{(c)}{\approx} \sum_{t_i \in \hat{T}} \sum_{l=1}^{m} \frac{N}{m} \big( H(P_i) - H(P_{i,l}) \big) \\
&= N \sum_{t_i \in \hat{T}} \Big( H(P_i) - \frac{1}{m} \sum_{l=1}^{m} H(P_{i,l}) \Big) \\
&\overset{(d)}{>} N \sum_{t_i \in \hat{T}} \Big( H(P_i) - H\Big( \frac{1}{m} \sum_{l=1}^{m} P_{i,l} \Big) \Big) = 0,
\end{aligned}
\qquad (10)
$$

where (a) holds assuming each partition for both setups includes approximately $N/m$ documents due to the strong law of large numbers and since partitions are homogeneous; (b) is achieved since we show that $P_{i,l} = P_i$ for all terms $t_i \notin \hat{T}$; (c) is achieved since $S(N, P_i) \approx N \cdot H(P_i)$ for $t_i \in \hat{T}$; (d) holds since $H(\cdot)$ is a strictly concave function; and the last equality is due to (9).

⁵ Practically, this means terms with not too high and not too low probability.

⁶ Here $H(q)$ denotes the binary entropy function $H(q) = -q \log_2 q - (1-q) \log_2 (1-q)$.
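The concavity step (d) of the proof can be checked numerically for a single term. The sketch below evaluates the per-term quantity $N \big( H(P_i) - \frac{1}{m} \sum_l H(P_{i,l}) \big)$, taking $P_i$ to be the average of the per-partition probabilities as in (9). The probabilities used are invented for illustration and the function names are hypothetical.

```python
import math


def binary_entropy(q):
    """H(q) = -q log2 q - (1 - q) log2 (1 - q), with H(0) = H(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def expected_saving_bits(n_docs, per_partition_probs):
    """Entropy-based estimate of the bits saved for one term.

    per_partition_probs holds P_{i,l} for l = 1..m. Their average plays
    the role of P_i, so the result is N * (H(P_i) - mean of H(P_{i,l})).
    """
    m = len(per_partition_probs)
    p_i = sum(per_partition_probs) / m
    mean_entropy = sum(binary_entropy(p) for p in per_partition_probs) / m
    return n_docs * (binary_entropy(p_i) - mean_entropy)


# A term that represents partition 1 of 4: more likely there, less elsewhere.
print(expected_saving_bits(1000, [0.16, 0.08, 0.08, 0.08]))
# Random routing leaves the probability equal on all partitions: no saving.
print(expected_saving_bits(1000, [0.10, 0.10, 0.10, 0.10]))
```

Any skew among the per-partition probabilities yields a strictly positive value by Jensen's inequality, while equal probabilities yield zero up to floating-point rounding.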
To corroborate the results of Proposition 1, we generated a corpus using the document generation model of [11], selected the partition-representing term sets, applied Term-based document routing, and appended the routed documents to their partitions using Delta encoding. The resulting aggregated bits per posting were compared to those calculated for an identical system using the Random document routing scheme. The comparison shows a consistent although minor reduction in the aggregated index size when Term-based routing is used. This small improvement, while obeying Proposition 1, is far from the 20% improvement achieved by Term-based routing when applied to the .gov2 corpus (see Section 6). We conjecture that the performance gap stems from the simplified nature of the document generation model used to produce the synthetic corpus, as assuming independent documents of independent terms does not capture the inherent similarity between real documents belonging to the same host (the property which makes URL document ordering so successful [17]). Hence, the probability of .gov2 documents of the same host ending up in the same partition is much higher than that of two arbitrary documents generated by the model of [11]. Accordingly, as indicated by our experiments, the difference (10) is expected to be much larger for real-life documents.
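The routing rule (7) can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: documents are treated as sets of terms (binary term vectors), ties between partitions are broken uniformly at random (the text does not specify a tie-breaking rule), and the names `assign_terms` and `route` are hypothetical.

```python
import random
from collections import Counter


def assign_terms(terms, m, seed=0):
    """Randomly split `terms` into m near-equal disjoint sets.

    Returns a dict mapping each term to the partition it represents.
    """
    rng = random.Random(seed)
    shuffled = list(terms)
    rng.shuffle(shuffled)
    return {term: i % m for i, term in enumerate(shuffled)}


def route(doc_terms, term_to_partition, m, rng):
    """Return the partition l maximizing the dot product V_l . d.

    With binary vectors, V_l . d is the number of distinct document terms
    that belong to partition l's representing set. Terms outside the
    representing sets contribute nothing. Ties are broken at random,
    which is an assumption of this sketch.
    """
    scores = Counter(
        term_to_partition[t] for t in set(doc_terms) if t in term_to_partition
    )
    best = max(scores.values(), default=0)
    candidates = [l for l in range(m) if scores[l] == best]
    return rng.choice(candidates)


if __name__ == "__main__":
    mapping = {"tax": 0, "form": 0, "park": 1, "trail": 1}
    rng = random.Random(42)
    print(route(["tax", "form", "park"], mapping, 2, rng))  # partition 0 wins 2 to 1
```

Per document, the work is one dictionary lookup per distinct term plus a scan over the $m$ scores, which matches the lightweight nature of the scheme; a document with no representing terms is routed at random in this sketch.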

6. EXPERIMENTAL RESULTS

6.1 Experimental Setup

We use the TREC .gov2 Web corpus, a collection of about 25.2 million pages crawled from the .gov domain, for the experiments$^{7}$. After parsing, tokenizing, and removing all empty documents, we are left with 24.9 million documents across 17,000 hosts, 74.5 million distinct terms, and 5,705.2 million postings (distinct term appearances in documents). Our experiments apply a random order of document arrival and measure the resulting aggregated bits per posting metric for every algorithm (see Section 3.1). As mentioned earlier, we use the Delta dgap encoding scheme throughout our experiments.

6.2 Index Compression Results

Figure 1 plots the Bits per Posting measure achieved by the different online algorithms as functions of the number of servers. The left plot focuses on the inverted lists alone (without the overhead incurred by the dictionaries), and shows a downward trend in the number of bits per posting needed to encode the partitioned index as the number of servers increases. This trend was observed in all algorithms and is mainly an artifact of the additional degrees of freedom present when there are more partitions to choose from when routing. However, Random routing also benefits from the increasing number of machines. This can be understood intuitively by considering the extreme case where there are as many servers as documents and each server uses only 1 bit to encode each posting. Turning to the right side of the figure, which takes into account the dictionary-induced overhead, there is an upward trend in the number of bits per posting needed to encode the partitioned index. This is caused by overlapping term sets across the various partitions: overlapping terms require multiple dictionary entries, up to $m$ entries in the worst (and, for many words, realistic) case. The relative performance of the various algorithms is the same in both plots, and so the following discussion is applicable to both.
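The bits-per-posting measure for the inverted lists alone (the left plot) can be illustrated with a short sketch. It assumes that "Delta" refers to the Elias-delta code, that document ids within a partition are 1-based, and that the first dgap of a list is the first document id itself; the definition in Section 3.1 may differ in such details, and dictionary overhead is not modelled. The function names are illustrative.

```python
def elias_delta_bits(gap: int) -> int:
    """Number of bits the Elias-delta code uses for a positive integer."""
    if gap < 1:
        raise ValueError("dgaps must be positive")
    n = gap.bit_length() - 1  # floor(log2(gap))
    return n + 2 * ((n + 1).bit_length() - 1) + 1


def bits_per_posting(partitions) -> float:
    """Aggregated bits per posting over all partitions.

    `partitions` is a list of per-partition indexes, each a dict mapping a
    term to its strictly increasing list of local document ids (1-based,
    an assumption of this sketch).
    """
    total_bits = 0
    total_postings = 0
    for index in partitions:
        for doc_ids in index.values():
            previous = 0
            for doc_id in doc_ids:
                total_bits += elias_delta_bits(doc_id - previous)
                previous = doc_id
            total_postings += len(doc_ids)
    if total_postings == 0:
        raise ValueError("no postings to measure")
    return total_bits / total_postings


if __name__ == "__main__":
    # One document per partition: every dgap is 1 and costs a single bit.
    print(bits_per_posting([{"a": [1]}, {"a": [1], "b": [1]}]))
    # Sparser lists cost more per posting.
    print(bits_per_posting([{"a": [1, 5]}]))
```

The first call mirrors the extreme case mentioned above: with as many servers as documents, every dgap equals 1, and under Elias-delta a gap of 1 is encoded in one bit.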
At the two extremes, Random routing upper bounds the compression results, whereas Greedy routing trumps all algorithms and saves about a third of the space as compared with Random. It manages to decrease the overall index size (i.e. including overhead) for most of the growth in the number of partitions. The reason for this is that the Greedy algorithm manages to assign documents with similar term sets to the same server, keeping the overlap in term sets across the partitions relatively small. As we further discuss in Section 6.3, Greedy and Random routing also perform at the two extremes in terms of host distribution, but Random performs best (quite intuitively) while Greedy performs worst, as same-host documents tend to share many terms and will thus tend to be routed by Greedy to the same small set of partitions, rather than uniformly across all servers.

The Load Balancing algorithm shows results very similar to those of the Random algorithm for a low number of partitions ($m < 200$). Around 200 servers, its curve starts to improve on that of Random, achieving fewer bits per posting. For $m = 1000$, Load Balancing beats out the constrained versions of the Greedy and Term-based routing schemes, but still loses to the unconstrained versions.

The Term-based algorithm whose curve is plotted in Figure 1 is based on the association of about 8 million terms to the various partitions. The terms are those whose frequencies in the corpus range from 5 to [...]. It achieves about 20% improvement in index size over the Random algorithm at $m$ = [...].

The constrained versions of both the Greedy and Term-based schemes attempt to decrease index size while not deviating too much from a balanced allocation of each host's pages across the partitions. Qualitatively, they achieve about the average compression of Random and their non-constrained version.

$^{7}$ We note that since it was crawled almost to exhaustion, .gov2 is very dense and includes almost all relevant pages [12].
Note that the curves shown in Figure 1 use bound flavor $b_1(h)$ with $\alpha = 1.2$, as defined in Section 4.1. Section 6.3 will further analyze the constrained versions of the algorithms.

Other results (not presented here) reveal that Term-based routing is not very sensitive to the number of terms associated with the various partitions. Experiments with 6 and 4 million terms for the unconstrained and constrained Term-based routing yield index sizes similar to those presented here for the 8 million terms version.

6.3 Host Distribution Results

This section further studies the trade-off between the quality of compression and the balanced routing of same-host documents across the different partitions. The conclusion from the data points given below is one of "no free lunch". Most of the redundancy in the document stream that is exploited by the better-compressing routing schemes stems from the ability to route same-host documents to a small set of partitions, rather than scattering them evenly across the partitions. This is yet another indication of why reordering Web collections by URL sorting [17] is so successful.

Table 1 shows the normalized Chi-squared host distribution values, as defined in Section 3.2, for the routing schemes presented in Figure 1, for several values of $m$. The schemes are listed in the order of their compression effectiveness, from worst (Random, left) to best (Greedy, right). Observe the following:

- Random routing, as expected, achieves values that are typical of a Normal(0, 1) random variable.
- The tradeoff between compression (Figure 1) and host-balancing (Table 1) is near-perfect. The relative order of the schemes in terms of the two measures is almost entirely reversed. The sole exception is that the Constrained Greedy scheme outperforms the Constrained Term-based scheme in both measures$^{8}$.
- Whenever the Load Balancing scheme achieves random-level host-balancing ($m = 10, 40, 100$), it is basically similar to Random routing in terms of compression.
- Other than those three points, all other schemes distribute same-host pages with an imbalance that is statistically impossible to achieve with random routing.
- The constrained versions of the Greedy and Term-based schemes achieve a host-balancing value that is orders of magnitude lower (i.e. more balanced) than the corresponding non-constrained schemes, while sacrificing about half of the compression improvement over Random routing. While they still distribute same-host pages in a manner that is statistically very far from random, they are also very far from the skewed distribution of the non-constrained versions.

$^{8}$ The difference between Random and Load Balancing for $m = 10, 40$ is just a random fluctuation and does not contradict the above statement.

[Figure 1 plots. Panels: (a) Aggregated index size without overhead; (b) Aggregated index size with overhead. x-axis: Number of partitions; y-axis: Bits per posting. Curves: Random, Load balancing, Constrained term based (8M terms), Constrained greedy, Term based (8M terms), Greedy.]

Figure 1: Bits per posting as function of the number of partitions for different document routing strategies and Delta encoding applied to .gov2 corpus, (a) without and (b) with overhead.

[Table 1 columns: Scheme | Random | Load Balancing | Constrained Term-based (8M) | Constrained Greedy | Term-based (8M) | Greedy. Table values are missing.]

Table 1: Normalized host-distribution values per scheme, as functions of the number of partitions.

Next, Table 2 checks the sensitivity of both measures, Bits per Posting and host-balancing, to the bound flavor and value of $\alpha$ as defined in Section 4.1, with the Constrained Term-based scheme over $m = 400$ partitions. The sensitivity is quite minor$^{9}$. Still, it preserves the trend of the normalized host-distribution value being larger when the Bits per Posting value is lower.

$^{9}$ This was also true for the Constrained Greedy scheme, whose results are not shown.

[Table 2 columns: Constraint type | Bits per Posting | Normalized host-distribution. Table rows are missing.]

Table 2: Normalized host-distribution and bits per posting (without overhead) values, with the Constrained Term-based scheme (8 million terms) over $m = 400$ partitions, as functions of the constraint parameter $\alpha$.

7. CONCLUSIONS

This work introduced the problem of reducing a partitioned index size in a low-latency indexing setting via document routing. In such settings, incoming documents must be made searchable quickly, and time-consuming document reordering algorithms cannot be applied. The industry standard routes documents randomly to the partitions, evenly distributing same-host pages across the partitions, which is advantageous for query-time performance. However, this results in indexes which do not compress well. We frame document routing as an online problem, and present document routing schemes that result in indexes that are up to 30% more compact. These algorithms can be run in both centralized and distributed fashions.

We prove that one such scheme, Term-based routing, yields better compression than Random routing over a corpus model appearing in the literature. Its advantages are accentuated on real-life corpora, where same-host documents exhibit high similarity between them. Term-based routing is also lightweight in terms of time and space complexity as compared with the other non-random routing schemes.

For each routing scheme, we explored how same-host pages are spread across the partitions, and found a clear tradeoff between compression and a balanced host distribution. This holds also when the routing algorithms are constrained to not exceed certain imbalance criteria. Essentially, this shows that much of the redundancy that routing algorithms exploit in Web collections stems from routing each host's pages unevenly, to a small set of partitions.

We believe that low-latency partitioned index systems offer additional trade-offs between issues that are well understood in batch indexing settings, and plan to pursue such trade-offs in future work. In addition, we plan to investigate weighted versions of the Term-based routing scheme. These weights, which will be functions of the terms' frequencies, will refine the current intersection-based score of a document with respect to each partition.

8. REFERENCES

[1] V. Anh and A. Moffat. Inverted index compression using word-aligned binary codes. In Proc. Information Retrieval, 8(1), Jan.
[2] A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, and S. Raghavan. Searching the web. ACM Trans. Internet Technol., 1(1):2-43.
[3] J. Aspnes, Y. Azar, A. Fiat, S. Plotkin, and O. Waarts. On-line routing of virtual circuits with applications to load balancing and machine scheduling. J. ACM, 44(3).
[4] C. Badue, R. Baeza-Yates, B. Ribeiro-Neto, and N. Ziviani. Distributed query processing using partitioned inverted files. In Proc. of the 9th String Processing and Information Retrieval Symposium (SPIRE '01). IEEE CS Press.
[5] L. A. Barroso, J. Dean, and U. Hölzle. Web search for a planet: The Google cluster architecture. IEEE Micro, 23(2):22-28, April.
[6] R. Blanco. Index Compression for Information Retrieval Systems. PhD thesis, University of A Coruña.
[7] R. Blanco and R. Barreiro. TSP and cluster-based solutions to the reassignment of document identifiers. Journal of Information Retrieval (IR), 9(4), Sep.
[8] D. Blandford and G. Blelloch. Index compression through document reordering. Data Compression Conference, 0:0342.
[9] P. Boldi and S. Vigna. Codes for the world wide web. Internet Mathematics, 2(4).
[10] A. Borodin and R. El-Yaniv. Online computation and competitive analysis. Cambridge University Press, New York, NY, USA.
[11] F. Chierichetti, R. Kumar, and P. Raghavan. Compressed web indexes. In Proc. WWW 2009, North-Carolina, USA, Apr.
[12] S. Ding, J. Attenberg, and T. Suel. Scalable techniques for document identifier assignment. In Proc. WWW 2010, North-Carolina, USA, Apr.
[13] S. Héman. Super-scalar database compression between RAM and CPU-cache. Master's thesis, Centrum voor Wiskunde en Informatica (CWI).
[14] J. Hirai, S. Raghavan, H. Garcia-Molina, and A. Paepcke. Webbase: A repository of web pages. In Proc. of the Ninth International World-Wide Web Conference (WWW '00), May.
[15] A. Moffat and L. Stuiver. Binary interpolative coding for effective index compression. Inf. Retr., 3(1):25-47.
[16] W. Shieh, T. Chen, J. Shann, and C. Chung. Inverted file compression through document identifier reassignment. Inf. Processing and Management, 39(1):117-131.
[17] F. Silvestri. Sorting out the document identifier assignment problem. In Proc. of 29th European Conference on IR Research (ECIR '07).
[18] F. Silvestri, R. Perego, and S. Orlando. Assigning document identifiers to enhance compressibility of web search engines indexes. In Proc. of the 2004 ACM Symposium on Applied Computing (SAC '04). ACM Press.
[19] G. Snedecor and W. Cochran. Statistical Methods. Iowa State University Press, eighth edition.
[20] I. Witten, A. Moffat, and T. Bell. Managing Gigabytes. Morgan Kaufmann.
[21] H. Yan, S. Ding, and T. Suel. Inverted index compression and query processing with optimized document ordering. In Proc. WWW 2009, Madrid, Spain, Apr.
[22] J. Zhang, X. Long, and T. Suel. Performance of compressed inverted list caching in search engines. In Proc. WWW 2008, Beijing, China.
[23] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2), Jul.


More information

Enhancing Real-Time CAN Communications by the Prioritization of Urgent Messages at the Outgoing Queue

Enhancing Real-Time CAN Communications by the Prioritization of Urgent Messages at the Outgoing Queue Enhancing Real-Tie CAN Counications by the Prioritization of Urgent Messages at the Outgoing Queue ANTÓNIO J. PIRES (1), JOÃO P. SOUSA (), FRANCISCO VASQUES (3) 1,,3 Faculdade de Engenharia da Universidade

More information

Adaptive Parameter Estimation Based Congestion Avoidance Strategy for DTN

Adaptive Parameter Estimation Based Congestion Avoidance Strategy for DTN Proceedings of the nd International onference on oputer Science and Electronics Engineering (ISEE 3) Adaptive Paraeter Estiation Based ongestion Avoidance Strategy for DTN Qicai Yang, Futong Qin, Jianquan

More information

Derivation of an Analytical Model for Evaluating the Performance of a Multi- Queue Nodes Network Router

Derivation of an Analytical Model for Evaluating the Performance of a Multi- Queue Nodes Network Router Derivation of an Analytical Model for Evaluating the Perforance of a Multi- Queue Nodes Network Router 1 Hussein Al-Bahadili, 1 Jafar Ababneh, and 2 Fadi Thabtah 1 Coputer Inforation Systes Faculty of

More information

Closing The Performance Gap between Causal Consistency and Eventual Consistency

Closing The Performance Gap between Causal Consistency and Eventual Consistency Closing The Perforance Gap between Causal Consistency and Eventual Consistency Jiaqing Du Călin Iorgulescu Aitabha Roy Willy Zwaenepoel EPFL ABSTRACT It is well known that causal consistency is ore expensive

More information

A Measurement-Based Model for Parallel Real-Time Tasks

A Measurement-Based Model for Parallel Real-Time Tasks A Measureent-Based Model for Parallel Real-Tie Tasks Kunal Agrawal 1 Washington University in St. Louis St. Louis, MO, USA kunal@wustl.edu https://orcid.org/0000-0001-5882-6647 Sanjoy Baruah 2 Washington

More information

I ve Seen Enough : Incrementally Improving Visualizations to Support Rapid Decision Making

I ve Seen Enough : Incrementally Improving Visualizations to Support Rapid Decision Making I ve Seen Enough : Increentally Iproving Visualizations to Support Rapid Decision Making Sajjadur Rahan Marya Aliakbarpour 2 Ha Kyung Kong Eric Blais 3 Karrie Karahalios,4 Aditya Paraeswaran Ronitt Rubinfield

More information

Design Optimization of Mixed Time/Event-Triggered Distributed Embedded Systems

Design Optimization of Mixed Time/Event-Triggered Distributed Embedded Systems Design Optiization of Mixed Tie/Event-Triggered Distributed Ebedded Systes Traian Pop, Petru Eles, Zebo Peng Dept. of Coputer and Inforation Science, Linköping University {trapo, petel, zebpe}@ida.liu.se

More information

Modeling Parallel Applications Performance on Heterogeneous Systems

Modeling Parallel Applications Performance on Heterogeneous Systems Modeling Parallel Applications Perforance on Heterogeneous Systes Jaeela Al-Jaroodi, Nader Mohaed, Hong Jiang and David Swanson Departent of Coputer Science and Engineering University of Nebraska Lincoln

More information

Collection Selection Based on Historical Performance for Efficient Processing

Collection Selection Based on Historical Performance for Efficient Processing Collection Selection Based on Historical Perforance for Efficient Processing Christopher T. Fallen and Gregory B. Newby Arctic Region Supercoputing Center University of Alaska Fairbanks Fairbanks, Alaska

More information

Improve Peer Cooperation using Social Networks

Improve Peer Cooperation using Social Networks Iprove Peer Cooperation using Social Networks Victor Ponce, Jie Wu, and Xiuqi Li Departent of Coputer Science and Engineering Florida Atlantic University Boca Raton, FL 33431 Noveber 5, 2007 Corresponding

More information

The Boundary Between Privacy and Utility in Data Publishing

The Boundary Between Privacy and Utility in Data Publishing The Boundary Between Privacy and Utility in Data Publishing Vibhor Rastogi Dan Suciu Sungho Hong ABSTRACT We consider the privacy proble in data publishing: given a database instance containing sensitive

More information

Fair Resource Allocation for Heterogeneous Tasks

Fair Resource Allocation for Heterogeneous Tasks Fair Resource Allocation for Heterogeneous Tasks Koyel Mukherjee, Partha utta, Gurulingesh Raravi, Thangaraj Balasubraania, Koustuv asgupta, Atul Singh Xerox Research Center India, Bangalore, India 560105

More information

Adaptive Holistic Scheduling for In-Network Sensor Query Processing

Adaptive Holistic Scheduling for In-Network Sensor Query Processing Adaptive Holistic Scheduling for In-Network Sensor Query Processing Hejun Wu and Qiong Luo Departent of Coputer Science and Engineering Hong Kong University of Science & Technology Clear Water Bay, Kowloon,

More information

Predicting x86 Program Runtime for Intel Processor

Predicting x86 Program Runtime for Intel Processor Predicting x86 Progra Runtie for Intel Processor Behra Mistree, Haidreza Haki Javadi, Oid Mashayekhi Stanford University bistree,hrhaki,oid@stanford.edu 1 Introduction Progras can be equivalent: given

More information

QUERY ROUTING OPTIMIZATION IN SENSOR COMMUNICATION NETWORKS

QUERY ROUTING OPTIMIZATION IN SENSOR COMMUNICATION NETWORKS QUERY ROUTING OPTIMIZATION IN SENSOR COMMUNICATION NETWORKS Guofei Jiang and George Cybenko Institute for Security Technology Studies and Thayer School of Engineering Dartouth College, Hanover NH 03755

More information

Summary. Reconstruction of data from non-uniformly spaced samples

Summary. Reconstruction of data from non-uniformly spaced samples Is there always extra bandwidth in non-unifor spatial sapling? Ralf Ferber* and Massiiliano Vassallo, WesternGeco London Technology Center; Jon-Fredrik Hopperstad and Ali Özbek, Schluberger Cabridge Research

More information

Deterministic Voting in Distributed Systems Using Error-Correcting Codes

Deterministic Voting in Distributed Systems Using Error-Correcting Codes IEEE TRASACTIOS O PARALLEL AD DISTRIBUTED SYSTEMS, VOL. 9, O. 8, AUGUST 1998 813 Deterinistic Voting in Distributed Systes Using Error-Correcting Codes Lihao Xu and Jehoshua Bruck, Senior Meber, IEEE Abstract

More information

PERFORMANCE MEASURES FOR INTERNET SERVER BY USING M/M/m QUEUEING MODEL

PERFORMANCE MEASURES FOR INTERNET SERVER BY USING M/M/m QUEUEING MODEL IJRET: International Journal of Research in Engineering and Technology ISSN: 239-63 PERFORMANCE MEASURES FOR INTERNET SERVER BY USING M/M/ QUEUEING MODEL Raghunath Y. T. N. V, A. S. Sravani 2 Assistant

More information

Joint Measurement- and Traffic Descriptor-based Admission Control at Real-Time Traffic Aggregation Points

Joint Measurement- and Traffic Descriptor-based Admission Control at Real-Time Traffic Aggregation Points Joint Measureent- and Traffic Descriptor-based Adission Control at Real-Tie Traffic Aggregation Points Stylianos Georgoulas, Panos Triintzios and George Pavlou Centre for Counication Systes Research, University

More information

Generating Mechanisms for Evolving Software Mirror Graph

Generating Mechanisms for Evolving Software Mirror Graph Journal of Modern Physics, 2012, 3, 1050-1059 http://dx.doi.org/10.4236/jp.2012.39139 Published Online Septeber 2012 (http://www.scirp.org/journal/jp) Generating Mechaniss for Evolving Software Mirror

More information

COMPUTER GENERATED HOLOGRAMS Optical Sciences 627 W.J. Dallas (Monday, August 23, 2004, 12:38 PM) PART III: CHAPTER ONE DIFFUSERS FOR CGH S

COMPUTER GENERATED HOLOGRAMS Optical Sciences 627 W.J. Dallas (Monday, August 23, 2004, 12:38 PM) PART III: CHAPTER ONE DIFFUSERS FOR CGH S COPUTER GEERATED HOLOGRAS Optical Sciences 67 W.J. Dallas (onday, August 3, 004, 1:38 P) PART III: CHAPTER OE DIFFUSERS FOR CGH S Part III: Chapter One Page 1 of 8 Introduction Hologras for display purposes

More information

Analysing Real-Time Communications: Controller Area Network (CAN) *

Analysing Real-Time Communications: Controller Area Network (CAN) * Analysing Real-Tie Counications: Controller Area Network (CAN) * Abstract The increasing use of counication networks in tie critical applications presents engineers with fundaental probles with the deterination

More information

A Beam Search Method to Solve the Problem of Assignment Cells to Switches in a Cellular Mobile Network

A Beam Search Method to Solve the Problem of Assignment Cells to Switches in a Cellular Mobile Network A Bea Search Method to Solve the Proble of Assignent Cells to Switches in a Cellular Mobile Networ Cassilda Maria Ribeiro Faculdade de Engenharia de Guaratinguetá - DMA UNESP - São Paulo State University

More information

Feature Selection to Relate Words and Images

Feature Selection to Relate Words and Images The Open Inforation Systes Journal, 2009, 3, 9-13 9 Feature Selection to Relate Words and Iages Wei-Chao Lin 1 and Chih-Fong Tsai*,2 Open Access 1 Departent of Coputing, Engineering and Technology, University

More information

Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression

Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression Sha M. Kakade Microsoft Research and Wharton, U Penn skakade@icrosoft.co Varun Kanade SEAS, Harvard University

More information

The Flaw Attack to the RTS/CTS Handshake Mechanism in Cluster-based Battlefield Self-organizing Network

The Flaw Attack to the RTS/CTS Handshake Mechanism in Cluster-based Battlefield Self-organizing Network The Flaw Attack to the RTS/CTS Handshake Mechanis in Cluster-based Battlefield Self-organizing Network Zeao Zhao College of Counication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China National

More information

2013 IEEE Conference on Computer Vision and Pattern Recognition. Compressed Hashing

2013 IEEE Conference on Computer Vision and Pattern Recognition. Compressed Hashing 203 IEEE Conference on Coputer Vision and Pattern Recognition Copressed Hashing Yue Lin Rong Jin Deng Cai Shuicheng Yan Xuelong Li State Key Lab of CAD&CG, College of Coputer Science, Zhejiang University,

More information
