Efficient and scalable trie-based algorithms for computing set containment relations


Yongming Luo #1, George H. L. Fletcher #2, Jan Hidders *3, Paul De Bra #4
# Eindhoven University of Technology, The Netherlands
1 y.luo@tue.nl, 2 g.h.l.fletcher@tue.nl, 4 debra@win.tue.nl
* Delft University of Technology, The Netherlands
3 a.j.h.hidders@tudelft.nl

Abstract

Computing containment relations between massive collections of sets is a fundamental operation in data management, for example in graph analytics and data mining applications. Motivated by recent hardware trends, in this paper we present two novel solutions for computing set-containment joins over massive sets: the Patricia Trie-based Signature Join (PTSJ) and PRETTI+, a Patricia trie enhanced extension of the state-of-the-art PRETTI join. The compact trie structure not only enables efficient use of main memory, but also significantly boosts the performance of both approaches. By carefully analyzing the algorithms and conducting extensive experiments with various synthetic and real-world datasets, we show that, in many practical cases, our algorithms are an order of magnitude faster than the state-of-the-art.

I. INTRODUCTION

Sets are ubiquitous in data processing and analytics. A fundamental operation on massive collections of sets is computing containment relations. Indeed, bulk comparison of sets finds many practical applications in domains ranging from graph analytical tasks (e.g., [1]-[3]) and query optimization [4] to OLAP (e.g., [5], [6]) and data mining systems [7].

As a simple example, consider an online dating website where each user has an associated profile set listing their characteristics such as hobbies, interests, and so forth. User dating preferences are also indicated by a set of such characteristics. By executing a set-containment join of the set of user preferences with the set of user profiles, the dating website can determine all potential dating matches for users, pairing each preference set with all users whose profiles contain all desired characteristics.
A concrete illustration can be found in Table I.

In this paper we consider efficient and scalable solutions to the following formalization of this common problem. Consider two relations R and S, each having a set-valued attribute set. The set containment join of R and S ($R \bowtie_{\supseteq} S$) is defined as

$$R \bowtie_{\supseteq} S = \{(r, s) \mid r \in R \wedge s \in S \wedge r.set \supseteq s.set\}.$$

State of the art: Due to its fundamental nature, the theory and engineering of set containment joins have been intensively studied (e.g., [8]-[18]). Existing solutions fall into two general categories: signature-based and information-retrieval-based (IR) methods. Signature-based methods (e.g., [8]-[12]) encode set information into fixed-length bit strings (called signatures), and perform a containment check on the signatures as an initial filter, followed by a validation of the resulting pairs using actual set comparisons. IR-based methods (e.g., [13]-[16]) build inverted indexes upon sets, storing tuple IDs in the inverted lists. A merge join between inverted lists will produce tuples that contain all such set elements. Typically auxiliary indexes are created to accelerate inverted index entry look-ups and joins.

TABLE I: Example of set-containment join. If we perform a set-containment join ($\bowtie_{\supseteq}$) between user profiles and user preferences, we retrieve matching pairs $\{(u_1, p_1), (u_1, p_2), (u_2, p_3)\}$.

(a) user profiles
id      set             signature
$u_1$   {b, d, f, g}    0111
$u_2$   {a, c, h}       1011
$u_3$   {a, c, d}       1011

(b) user preferences
id      set             signature
$p_1$   {b, d}          0101
$p_2$   {b, f, g}       0110
$p_3$   {a, c, h}       1011

Most of the focus of the state-of-the-art algorithms has been on disk-based algorithms (e.g., [11]-[13], [15], [16]). Though these algorithms have proven quite effective for joining massive set collections, the performance of these solutions is bounded by their underlying in-memory processing strategies, where less work has been done (see Section II).
For example, PSJ [11] and APSJ [12], two advanced disk-based algorithms, share the same in-memory processing strategy with the main-memory algorithm SHJ [8], which we'll discuss in detail in Section II-A.

To keep up with ever-increasing data volumes and modern hardware trends we need to push the performance of set-containment join to the next level. Therefore, it is essential to revisit (and develop new) in-memory set-containment join algorithms. Such algorithms will serve both as an essential component for main memory databases [19] and as building blocks and inspiration for external memory and other computation models and platforms. This is challenging because existing work has already investigated many possible optimization techniques, such as bitwise operations [8], caching [13], reusing result sets [14] and so on.

Contributions: Nonetheless, by carefully analyzing the existing solutions and bringing in new data structures, in this research we propose two novel in-memory set-containment join algorithms that are in many cases an order of magnitude faster than the previous state-of-the-art. In our study, we scale the relations to be joined along three basic dimensions: set

cardinality, domain cardinality, and relation size. Here, set cardinality is the size of set values in the relations; domain cardinality is the size of the underlying domain from which set elements are chosen; and relation size is the number of tuples in each relation. The contributions of our study are as follows:

• We propose two novel algorithms for set-containment join. One is for the low set cardinality, high domain cardinality setting (PRETTI+); the other is for the remaining scenarios (PTSJ). Both algorithms make use of the compact Patricia trie data structure.
• Our PTSJ proposal is a signature-based method. Hence, the length of the signature is a critical parameter for the algorithm's performance. Therefore, we perform a detailed analysis on PTSJ for determining the proper signature length.
• We also detail how PTSJ can (1) be easily extended to answer other set-oriented queries, such as set-similarity joins, and (2) be efficiently adapted to a disk-based environment.
• We present the results of an extensive empirical study of our solutions on a variety of massive real-world and synthetic datasets, which demonstrate that our algorithms in many cases perform an order of magnitude faster than the previous state-of-the-art and scale well with relation size, set cardinality, and domain cardinality.

The rest of the paper is organized as follows. In the next section, we introduce the state-of-the-art solutions for set-containment join. In Sections III and IV we propose PTSJ and PRETTI+, our two new algorithms. Section V presents the results of our empirical study of all algorithms. We then conclude in Section VI with a discussion of future directions for research.

II. STATE-OF-THE-ART ALGORITHMS

In this section we describe two efficient in-memory set-containment join algorithms, SHJ and PRETTI. These solutions are representative of the state-of-the-art, and serve as baseline solutions in our later development and experiments.
For simplicity we assume in the following that domain values and tuple IDs are represented as integers.

A. Signature Hash Join

We first introduce the definition of signature [8]. A signature of tuple t (t.sig) can be seen as an output of some hash function h (i.e., t.sig = h(t.set)) such that $t_1.set \supseteq t_2.set \Rightarrow h(t_1.set) \sqsupseteq h(t_2.set)$. Here the containment relation between two hash values is defined as $sig_1 \sqsupseteq sig_2 \Leftrightarrow \neg sig_1 \mathbin{\&} sig_2 = 0$, where $\&$ and $\neg$ denote bitwise AND and NOT operations. We will also refer to the $\sqsupseteq$ relation as subset containment when there is no possibility of confusion.

A straightforward implementation of a signature hash function is as follows: assume the signature length $|t.sig|$ is b bits, all initially set to zero. If integer x is in t.set, we set the $(x \bmod b)$th higher-order bit of t.sig to 1. The resulting signature is essentially a compressed bitmap representation of t.set. In the signature column of Table I we show the 4-bit signature for each set in our example relations. Letters are mapped to integers starting from 1, in alphabetical order (i.e., a is mapped to 1, b to 2, and so forth). Note that tuples $u_2$ and $u_3$ have the same signature, but different set values.

The Signature Hash Join (SHJ) was proposed by Helmer and Moerkotte [8]. SHJ uses the signature structure as a concise representation for sets, and uses signature comparisons as filtering operations before performing real set comparisons. In the spirit of hash join, SHJ works as follows: (1) for each tuple s in S, compute s.sig, and insert (s.sig, s) into a hash map (idx); (2) for each tuple r in R, compute r.sig, enumerate all subsets of r.sig, and examine all tuples with such signatures in the hash map (hence in S), comparing them with r. Pseudo code of this approach can be found in Algorithm 1 and Algorithm 2.
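The two SHJ steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `enc`, `signature` and `shj_join`, the dictionary-based relations, and the exact bit-position convention (chosen here so that the 4-bit signatures of Table I are reproduced) are assumptions made for this example.

```python
from collections import defaultdict


def enc(letters):
    """Map letters to integers starting from 1 (a -> 1, b -> 2, ...)."""
    return {ord(ch) - ord("a") + 1 for ch in letters}


def signature(s, b):
    """b-bit signature of an integer set.

    Assumed convention: x mod b = 1 sets the highest-order bit,
    x mod b = 0 sets the lowest-order bit (this matches Table I).
    """
    sig = 0
    for x in s:
        sig |= 1 << ((-x) % b)
    return sig


def shj_join(R, S, b):
    """Return pairs (r_id, s_id) with R[r_id] a superset of S[s_id]."""
    idx = defaultdict(list)              # step (1): hash map on S signatures
    for sid, sset in S.items():
        idx[signature(sset, b)].append(sid)

    out = []
    for rid, rset in R.items():          # step (2): probe with subsets of r.sig
        rsig = signature(rset, b)
        sub = rsig
        while True:
            for sid in idx.get(sub, ()):
                if S[sid] <= rset:       # validate with a real set comparison
                    out.append((rid, sid))
            if sub == 0:
                break
            sub = (sub - 1) & rsig       # next smaller submask of rsig
    return out


profiles = {"u1": enc("bdfg"), "u2": enc("ach"), "u3": enc("acd")}
prefs = {"p1": enc("bd"), "p2": enc("bfg"), "p3": enc("ach")}
matches = sorted(shj_join(profiles, prefs, 4))
```

The loop `sub = (sub - 1) & rsig` visits every submask of `rsig` exactly once, so the probing cost grows exponentially with the number of bits set in the signature. That is the scaling issue discussed in the remainder of this section.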
Here we split SHJ into two parts: a generalized signature join framework (Algorithm 1) that can be reused for other algorithms; and an enumeration algorithm used in SHJ (Algorithm 2) that can be replaced with more efficient algorithms (e.g., Algorithms 4 and 5 below).

Algorithm 1: SIGNATURE_JOIN() signature join framework
Input: relations S and R
Output: pairs of tuple IDs that have the set containment relation
 1  create index idx                      // e.g., in SHJ idx is a hash map
 2  for each s ∈ S do
 3      insert (s.sig, s) into idx
 4  for each r ∈ R do
 5      subset ← call subset enumeration algorithm   // e.g., SHJ_ENUM(r.sig, idx)
 6      for each s ∈ subset do
 7          if r.set ⊇ s.set then
 8              output (s, r)

Algorithm 2: SHJ_ENUM() subset enumeration of SHJ
Input: signature, hash table
Output: tuple IDs that have signature containment relation
 1  create list
 2  for each subset ⊑ signature do        // enumerate all
 3      if subset ∈ hash table then
 4          for each tuple in hash table[subset] do
 5              add tuple to list
 6  return list

SHJ inspired other algorithms (e.g., PSJ [11] and APSJ [12]). It is one of the most efficient in-memory solutions for computing set-containment join. One drawback of SHJ comes from line 2 of Algorithm 2, where all subsets of a given signature are enumerated and validated in the hash map. Though the authors provide a very efficient procedure (with bitwise operations) to perform this enumeration, such a mechanism cannot scale with respect to signature length, and

therefore cannot scale with relation size and set cardinality. Consequently, all algorithms using this mechanism suffer from the same problem. In Section III, we provide a solution to this problem, with the introduction of an alternative data structure.

B. PRETTI Join

To the best of our knowledge, PRETTI (PREfix Tree based seT joIn) [14] is the most recent and efficient in-memory set-containment join algorithm. In contrast with SHJ, PRETTI operates on the space of set elements instead of on the space of signatures. In particular, PRETTI works as follows: given relations S and R, first build a prefix tree (trie) based on the ordered set elements of tuples in S; then build an inverted file based on set elements of tuples in R. On the same root-to-leaf path of the trie, tuples of the descendants contain tuples of the ancestors. Then, when traversing the trie from root to leaf, at each node a list of containment tuples can be generated by joining the tuples in the node and in the inverted list. The list is passed down the trie for further refinement. A sketch of the PRETTI join can be found in Algorithm 3. The recursive call operates on each child of the root node and goes down the tree in a depth-first-search manner. Figure 1 illustrates the trie structure after inserting sets in user preferences from Table I.

Fig. 1: Trie example for PRETTI, after inserting sets from user preferences (Table I)

Algorithm 3: PRETTI_JOIN() recursively join and output
Input: subtree root node, current_list, inverted index idx
Output: pairs of tuple IDs that have signature containment relation
// Initially, current_list ← idx[node.label]
 1  for each s in node.tuples do
 2      for each r in current_list do
 3          output (s, r)
 4  for each child c of node do
 5      child_list = current_list ∩ idx[c.label]
 6      PRETTI_JOIN(c, child_list, idx)

Assume we have an inverted index created for user profiles from Table I as follows: $\{a:\{u_2, u_3\},\ b:\{u_1\},\ c:\{u_2, u_3\},\ d:\{u_1, u_3\}, \ldots\}$.
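The trie, the inverted index and the recursive join of Algorithm 3 can be sketched as follows. This is an illustrative reading of the description above, not the PRETTI implementation from [14]: the class and function names are hypothetical, the recursion starts at the root with all tuple IDs of R rather than at the root's children, and the early exit on an empty list is an added shortcut.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # element -> TrieNode
        self.tuples = []     # ids of S-tuples whose sorted set ends at this node


def build_trie(S):
    """Insert every set of S, as a sorted element sequence, into a prefix tree."""
    root = TrieNode()
    for sid, sset in S.items():
        node = root
        for x in sorted(sset):
            node = node.children.setdefault(x, TrieNode())
        node.tuples.append(sid)
    return root


def build_inverted(R):
    """Map each element to the set of R-tuple ids containing it."""
    idx = {}
    for rid, rset in R.items():
        for x in rset:
            idx.setdefault(x, set()).add(rid)
    return idx


def pretti_join(node, current, idx, out):
    # Every R-tuple still in `current` contains all elements on the path so far.
    for sid in node.tuples:
        for rid in current:
            out.append((rid, sid))
    for label, child in node.children.items():
        child_list = current & idx.get(label, set())
        if child_list:       # added shortcut: nothing below can match otherwise
            pretti_join(child, child_list, idx, out)


def pretti(R, S):
    out = []
    pretti_join(build_trie(S), set(R), build_inverted(R), out)
    return out


def enc(letters):
    return {ord(ch) - ord("a") + 1 for ch in letters}


profiles = {"u1": enc("bdfg"), "u2": enc("ach"), "u3": enc("acd")}
prefs = {"p1": enc("bd"), "p2": enc("bfg"), "p3": enc("ach")}
matches = sorted(pretti(profiles, prefs))
```

On the Table I data the list passed down the branch b, d shrinks to the single profile u1, which is why p1 is paired only with u1.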
When PRETTI executes on the trie in Figure 1 with this inverted index, it first finds all tuples that contain element b by probing the inverted index, which yields $\{u_1\}$. Then the list is carried to b's children nodes. At node d, for instance, the list is joined with the inverted list containing element d, which is $\{u_1, u_3\}$. Since we see one tuple $p_1$ on the current node, and only $u_1$ is left in the list, we can conclude that $u_1 \supseteq p_1$. Such actions are performed on all nodes in the trie, and thereby PRETTI finds all containment relations.

PRETTI is a very efficient algorithm. It only traverses the trie once to generate all results. Set comparisons are naturally performed while traversing, and most interestingly, early containment results are reused for further comparisons.

PRETTI has two main weak points. First, many auxiliary data structures such as the trie and the inverted index are built for the algorithm, which can consume too much space if set cardinality is high. Second, varied-length set comparisons can be time consuming in comparison with fixed-length signature comparisons, especially when set cardinality is high. In our later empirical evaluation we will see that PRETTI can perform quite well for low set cardinality datasets. However, due to excessive main memory consumption and element comparisons, it cannot scale with either larger relations or higher set cardinalities. Later in this paper, we develop extensions to PRETTI to overcome this main-memory consumption problem.

III. PATRICIA TRIE-BASED SIGNATURE JOIN (PTSJ)

Let's reconsider SHJ from Section II-A. After all signatures are computed, given one signature r.sig, SHJ needs two steps to get its subset results: (1) enumerate all subsets of r.sig; (2) check whether some subset exists in the hash map entry and perform set comparison afterwards. It is difficult for this mechanism to scale to longer signatures, because the number of possible subsets of a given signature is exponential ($2^b$) in the signature length.
Therefore in real cases, only part of the signature is used for enumeration purposes (and for creating hash map entries). Based on our experience, this partial signature length cannot even reach 20 bits due to its exponential time complexity. This mechanism essentially limits the possible performance gain of SHJ.

However, it is not necessary to enumerate all possible subsets, but rather only those that actually exist in a relation. Hence, we only need $O(|S|)$ time to enumerate subsets of r.sig (that exist in S). This is the core idea of our initial algorithm. We will first introduce our algorithm using a simple binary trie, and later with a Patricia trie.

A. Trie-based Signature Join

Recall that a trie is a basic tree data structure for storing strings. One property of tries is that strings within a subtree share the same path (prefix) from the root to the subtree. Here we use a binary trie, which stores binary strings (i.e., signatures) and tuples associated with a given signature. After we insert all signatures into the trie, since signatures have the same length, we get a trie with the height of the signature length. From the root, each level of trie nodes represents one bit position in signatures. Tuple IDs and set values are stored in the leaves of the trie. An example of a binary trie can be found in Figure 2.

When performing a breadth-first search on a trie, in the end we enumerate all existing signatures by visiting the leaves. If we restrict our search at each level of the trie using some given

signature as guidance, we get the subset enumeration algorithm TRIE_ENUM(), given in Algorithm 4. The basic idea is that, while traversing the trie level by level, we are examining all signatures bit by bit. Then if we take the input signature into consideration, the search space shrinks every time a bit 0 is encountered. We use a queue to hold nodes whose prefixes are subsets of the input signature. When Algorithm 4 finishes, all bits of the input signature are examined, and all signatures that are a subset of the input signature are in the queue. We can then directly perform a set comparison of these tuples with the input tuple, by simply plugging TRIE_ENUM() into line 5 of Algorithm 1.

Fig. 2: Trie example, after inserting signatures 0101, 0110, 1011 from user preferences (Table I) into an initially empty trie. Here we let left branches store signatures with prefix bit 0 and right branches store signatures with prefix bit 1.

Algorithm 4: TRIE_ENUM() subset enumeration using trie
Input: signature, trie
Output: tuple IDs that have signature containment relation
 1  create queue q
 2  i ← 0
 3  current_bit ← signature[i++]
 4  enqueue trie.root on q
 5  while q.top has children do
 6      node ← dequeue from q
 7      if current_bit = 0 then           // if node.left exists
 8          enqueue node.left on q
 9      else                              // if node.left and node.right exist
10          enqueue node.left and node.right on q
11      current_bit ← signature[i++]
12  return q

For example, if we want to find containment relations for $u_1$ in Table I, we first get its signature 0111. Then while we run Algorithm 4, all nodes in the left branch of Figure 2 are visited and placed on the queue. In the end, $p_1$ and $p_2$ at leaf nodes are returned.

A limitation of this approach is that there are many unnecessary nodes that only have one child in the trie (which we later refer to as single-branch nodes). We also see this in Figure 2. For k signatures (with b bits each), if there are no single-branch nodes, ideally the trie should have around 2k nodes. But instead, it will in the worst case need $k(b - \log_2 k) + 2k$ nodes. The longer the signature, the more single-branch nodes it has. Moreover, these nodes all need to be enqueued and visited. In an empirical study, we witnessed that Algorithm 4 performs slower than SHJ. Therefore we claim that Algorithm 4 is not practical to use, and exclude it from the later empirical study (Section V).

B. Introduce Patricia Trie

Knowing where the weakness lies, we can improve the design accordingly. To avoid single-branch nodes, we adopt a data structure called Patricia trie [20], [21], which is specifically designed for this purpose. Essentially, a Patricia trie merges single-branch nodes into one node in a trie, so it can guarantee that all nodes have full branches (in our case two-way branches). Of course in the worst case a Patricia trie is not better than a regular trie, but as we'll see in the experiments, that rarely happens for randomly-generated and real-world datasets.

Figure 3 shows what a Patricia trie would look like if we insert the same signatures as in Figure 2. First, because there is a bit difference on position 0, one node is created on this position. Here, the right branch has no more splitting points, so it points directly to 1011. For the left branch, there is another splitting point on position 2, so another node is created accordingly, and each signature belongs to one of the branches.
Overall, 2 extra nodes are created, and there is no single-branch node in the trie.

Fig. 3: Patricia trie example, inserting the same signatures as in Figure 2 into a Patricia trie

In this paper we apply a slight modification to the original Patricia trie. In our version of a Patricia trie node, we store (1) pointers to the left and right nodes, (2) the indexes at which the prefix starts and splits, and (3) the common prefix from the last split point to the current split point. We define a subset generation procedure on Patricia tries in Algorithm 5. It is similar to Algorithm 4, with the only difference being that, instead of comparing one bit at a time, segments of bits (which come from merged single-branch nodes) are compared at each node. In the end, signatures that have a containment relation are stored in the result list instead of queue q. Naturally, we can again reuse Algorithm 1 (by calling PATRICIA_ENUM at line 5) to perform the join. We call this approach Patricia Trie-based Signature Join (PTSJ).

To continue our example, if we run the same query $u_1$ (0111) on Figure 3 using Algorithm 5, we still need to visit the left branch of the trie. Only this time, three instead of six nodes need to be traversed. In practice, signatures can be

much longer and sparse (see Section V-B), therefore more node visits are saved compared to Algorithm 4.

Algorithm 5: PATRICIA_ENUM() subset enumeration using Patricia trie
Input: signature, patricia_trie
Output: tuple IDs that have signature containment relation
 1  create queue q
 2  create list result
 3  enqueue patricia_trie.root on q
 4  while q ≠ ∅ do
 5      node ← dequeue from q
 6      if node.prefix ⊑ signature.prefix then
 7          if node.split = |signature| then
 8              add node to result
 9          else
10              split_bit ← signature[node.split]
11              if split_bit = 0 then
12                  enqueue node.left on q
13              else
14                  enqueue node.left and node.right on q
15  return result

C. Cost analysis of PTSJ

In this section we give some cost estimation of PTSJ under simple conditions. The notation we use is given in Table II. The cost of PTSJ ($C_{PTSJ}$) can be broken down to

$$C_{PTSJ} = C_{\text{create PT}} + C_{\text{query PT}} + C_{\text{compare set}},$$

where $C_{\text{create PT}}$ is the cost to build the Patricia trie on relation S, $C_{\text{query PT}}$ is the cost to compare signatures on the trie, and $C_{\text{compare set}}$ is the cost to actually perform set comparisons. We first identify that $C_{\text{create PT}}$ and $C_{\text{compare set}}$ are not the major cost of PTSJ. Then we dig deeper into $C_{\text{query PT}}$, giving an estimation of how many integer comparisons it will cost. We find that under simple natural assumptions, $C_{\text{query PT}}$ is mostly influenced by set cardinality and signature length. In the end, based on these analyses, we propose a strategy to choose a good signature length for PTSJ.

1) $C_{\text{create PT}}$ and $C_{\text{compare set}}$:

$C_{\text{create PT}}$: During Patricia trie creation, at most $2|S| - 1$ nodes are created in total. Even in the worst case, b nodes are visited during each signature insertion. Obviously, $C_{\text{create PT}}$ does not take the major part of PTSJ's running time.

$C_{\text{compare set}}$: Assume that on average N tuples remain for set comparison for each tuple in R. Then $C_{\text{compare set}} = N \cdot |R|$. It is easy to see that N decreases when signature length grows, and increases when $|R|$ increases.
In general this is a small value (from 10's to 100's), proportional to the result output size (see below). Therefore $C_{\text{compare set}}$ is also not the major cost of PTSJ.

TABLE II: Notation for cost analysis
Notation   Explanation
b          Signature length in bits
c          Average set cardinality
d          Domain cardinality, set element domain
Int        Integer size in bits
|X|        Size of relation X
H          Average height of Patricia trie
N          Number of tuples in S that have the signature containment relation with some tuple in R
V          Number of trie nodes one query has to visit

Estimation of N: To estimate N, we start with a rather simple situation. Consider two signatures d and q, with set cardinalities (and hence number of bits set to 1 in the signature) $|d|$ and $|q|$, resp., and with signature length b. We want to know the probability that $d \sqsubseteq q$. For each element in a set, the probability that it appears on each bit is $\frac{1}{b}$. For $d \sqsubseteq q$ to happen, d should have 1's only on the positions where q has 1's. Each element in d has $|q|$ positions to choose from, so each element has the probability $\frac{|q|}{b}$ to be a subset. In total, the probability is $\left(\frac{|q|}{b}\right)^{|d|}$, and

$$N = |S| \cdot \left(\frac{|q|}{b}\right)^{|d|}.$$

We next consider a more complicated scenario. For example, if d's set cardinality is uniformly distributed between 1 and $|d|$, then the estimated probability of $d \sqsubseteq q$ would be

$$\frac{p^1 + p^2 + \cdots + p^{|d|}}{|d|} = \frac{p\,(1 - p^{|d|})}{|d|\,(1 - p)} \le \frac{p}{|d|\,(1 - p)}, \quad \text{where } p = \frac{|q|}{b}.$$

In general, N gets smaller when the signature length (b) grows. A high set cardinality query ($|q|$) tends to have more results, while low set cardinality data ($|d|$) tend to produce more results. All these intuitions are confirmed by our formula. The main take-home message here is that N is a small value, so that set comparisons do not take a significant part of the overall running time.

2) $C_{\text{query PT}}$: Let's assume that the number of trie nodes each tuple in R has to visit is V. Then the number of comparisons to be done on the trie is

$$C_{\text{query PT}} = |R| \cdot V \cdot \left\lceil \frac{b}{H \cdot Int} \right\rceil.$$

Here each node on average compares $\frac{b}{H}$ bits, which costs $\left\lceil \frac{b}{H \cdot Int} \right\rceil$ actual integer comparisons. We know that $\left\lceil \frac{x}{y} \right\rceil \le \frac{x}{y} + 1$, so we get the upper bound

$$C_{\text{query PT}} \le |R| \cdot V \cdot \left( \frac{b}{H \cdot Int} + 1 \right). \quad (1)$$

We first examine the integer comparisons. For low cardinality settings, signatures are sparse, so two signatures are more likely to share longer prefixes. In the extreme case, all nodes share one path (skewed trie), therefore the average trie height H can be as high as $\frac{b}{2}$. So it is rare for a single node to take more than two integer comparisons. For higher cardinalities, the trie tends to be more balanced, and H is a smaller value closer to $\log_2(2|S|)$, but still grows with respect to b. Then we can expect a small but slowly increasing value for comparisons per node. The more important factor however is V.
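To make V concrete, the following sketch builds a small Patricia trie over fixed-length signatures and counts the nodes that a subset enumeration in the style of Algorithm 5 visits. It is only an illustration under stated assumptions: signatures are kept as Python strings of '0' and '1', the node layout and the names `Node`, `build` and `patricia_enum` are hypothetical, the trie is built in one pass from a non-empty map of distinct signatures rather than by repeated insertion, and the visit count includes the root.

```python
from collections import deque


class Node:
    """Patricia trie node over fixed-length bit strings (hypothetical layout)."""

    def __init__(self, start, split, segment, left=None, right=None, tuples=()):
        self.start = start        # index where this node's segment begins
        self.split = split        # index of the branching bit (== b for a leaf)
        self.segment = segment    # merged bits sig[start:split]
        self.left, self.right = left, right
        self.tuples = list(tuples)


def build(groups, start, b):
    """groups maps each distinct signature to its tuple ids (non-empty);
    all keys are assumed to agree on positions [0, start)."""
    sigs = sorted(groups)
    first, last = sigs[0], sigs[-1]
    split = start
    # The common prefix of a sorted group equals that of its two extremes.
    while split < b and first[split] == last[split]:
        split += 1
    segment = first[start:split]
    if split == b:                # a single signature is left: leaf
        return Node(start, b, segment, tuples=groups[first])
    left = {s: t for s, t in groups.items() if s[split] == "0"}
    right = {s: t for s, t in groups.items() if s[split] == "1"}
    return Node(start, split, segment,
                build(left, split + 1, b), build(right, split + 1, b))


def patricia_enum(root, sig):
    """Return (tuple ids whose signature is a subset of sig, nodes visited)."""
    result, visited = [], 0
    q = deque([root])
    while q:
        node = q.popleft()
        visited += 1
        query_seg = sig[node.start:node.split]
        # Prune when the stored segment has a 1 where the query has a 0.
        if any(d == "1" and s == "0" for d, s in zip(node.segment, query_seg)):
            continue
        if node.split == len(sig):
            result.extend(node.tuples)
        else:
            q.append(node.left)            # a 0 branch is always compatible
            if sig[node.split] == "1":     # a 1 branch needs a 1 in the query
                q.append(node.right)
    return result, visited


prefs = {"0101": ["p1"], "0110": ["p2"], "1011": ["p3"]}
root = build(prefs, 0, 4)
found, visited = patricia_enum(root, "0111")   # query signature of u1
```

For the query 0111 the right subtree under the root is never enqueued, because the query's first bit is 0. Every such skipped subtree reduces V, which is the quantity analyzed next.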

Estimation of V: There are $\binom{b}{c}$ possible signatures with c bits set to 1. When set cardinality is small (i.e., when $\binom{b}{c} \ll |S|$), it is highly probable that all possible signatures exist in the trie. For example, in the extreme case that set cardinality is 1, there are only $2b$ possible nodes in the trie. Since $2b \ll 2|S|$, the trie is likely to be full. In such cases, V tends to reach the maximum possible, i.e., $2H$. Here H is approximately $\frac{b}{2}$.

This becomes less obvious when b and c grow to larger values. In such cases, the trie will not contain all possible cases, and the average height usually does not reach $\frac{b}{2}$. If we have an all-one signature as the query, all nodes ($2|S|$) will be visited. Therefore $2^H = |S|$ (assume a balanced trie). If on the lowest level only one branch is included, the number of nodes to visit becomes $2^{H-1} + 2^{H-1}$. Similarly, if single branches happen for the lowest x levels (which yields the largest number of nodes), we get $2^{H-x} + x \cdot 2^{H-x} = (1 + x) \cdot 2^{H-x}$. Furthermore, if we assume the number of single-branch nodes in a result is proportional to the number of zeros in a signature ($1 - \frac{c}{b}$), so that $x = (1 - \frac{c}{b}) \cdot H$, then the number of visited nodes is estimated to be

$$V = \left(1 + H\left(1 - \frac{c}{b}\right)\right) \cdot 2^{H \cdot \frac{c}{b}} \le (1 + H) \cdot |S|^{\frac{c}{b}}. \quad (2)$$

Here, we see that with the increase of $|S|$, the number of visited nodes increases. Bigger set cardinality also indicates more visited nodes, while longer signatures reduce the number of visited nodes. As we'll see later, we usually select b between $\frac{c}{2} \cdot Int$ and $c \cdot Int$, so $|S|^{\frac{c}{b}}$ is around 2 even for a million tuples. In such a case we say that V is bounded by $O(H)$. And if we bring formula (2) into formula (1), we get that $C_{\text{query PT}}$ is bounded by $O(|R|)$.

3) Space complexity of PTSJ: Since only $2|S|$ nodes are created to build a Patricia trie for some relation S, and for each tuple the signature size is usually no more than its set values, the space complexity of PTSJ is $O(|S|)$.

D. Choosing the signature length for PTSJ

Because there is no need for exhaustive subset generation, in practice, signature length can be set to thousands of bits in PTSJ without any problem.
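To get a feel for how the signature length enters the estimates of Section III-C, the sketch below evaluates them numerically. It assumes the reading $N = |S| \cdot (|q|/b)^{|d|}$ for the candidate estimate and $V \le (1 + H) \cdot |S|^{c/b}$ with $2^H = |S|$ for the bound of formula (2). The function names and the sample parameters (one million tuples, cardinality 16, 32-bit integers) are hypothetical choices for illustration, not values taken from the experiments.

```python
import math


def estimate_n(size_s, card_d, card_q, b):
    """Expected number of S-tuples whose signature is a subset of the query's,
    assuming N = |S| * (|q| / b) ** |d|."""
    return size_s * (card_q / b) ** card_d


def bound_v(size_s, c, b):
    """Upper bound (1 + H) * |S| ** (c / b) on visited nodes,
    under the balanced-trie assumption 2 ** H = |S|."""
    h = math.log2(size_s)
    return (1 + h) * size_s ** (c / b)


size_s, c, int_bits = 10**6, 16, 32
for b in (c * int_bits // 2, c * int_bits):      # b = c/2 * Int and b = c * Int
    factor = size_s ** (c / b)
    print(b, round(factor, 2),
          round(bound_v(size_s, c, b), 1),
          estimate_n(size_s, c, c, b))
```

With these sample values the factor $|S|^{c/b}$ stays between 1 and 3 for both signature lengths, so under these assumptions the bound on V is dominated by the $(1 + H)$ term.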
Generally, longer signatures provide more effective filtering, but bring more signature comparisons and higher main memory consumption. So there is a need for finding the balance point for signature length.

First of all, there is an absolute upper bound for signature length, which is the domain cardinality d. Letting b = d essentially makes the signature a bitmap representation of the sets. This number can be achieved in many cases. For example, for a domain that has 1024 elements, the maximum signature length is $\frac{1024}{Int}$ integers. It is obvious that there is a lower bound for b as well, which is c. If b < c, there is a high chance that all bits in a signature are set to 1, which is not useful anymore.

Apart from these two bounds, we find that the optimum signature length depends on many properties of the input relations, such as set cardinality, domain cardinality, relation size, and data distributions. Among these, we notice both from formula (2) and from the empirical study (Section V-B) that the set cardinality has a bigger impact on signature length selection, and usually $\frac{c}{2} \cdot Int \le b \le c \cdot Int$ can yield a good result. This also prevents the algorithm from using more signature comparisons than set comparisons. If not specified otherwise, we use the lower bound of the range ($\frac{c}{2} \cdot Int$). Finally, we can set a maximum length in the algorithm, to prevent the signature from becoming extremely long. In our experiments, this limit is set to 256 integers. Overall, our signature length is set to $\min\{d,\ \frac{c}{2} \cdot Int,\ 256 \cdot Int\}$. An empirical validation for this strategy is presented in Section V-B.

E. Extensions to PTSJ

1) Merge identical sets: With the help of the trie, tuples of the same signature are naturally grouped together. If we go one step further, maintaining a mapping list of tuples that have the same set elements and taking them into consideration during output, we save the cost of comparing duplicates over time. This strategy is applied in our PTSJ implementation.
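The signature-length rule of Section III-D and the grouping of identical sets can be sketched with two small helpers. Both are illustrations under assumptions: the names `choose_signature_length` and `merge_identical` are hypothetical, the 32-bit integer size is an assumed default, and the rule is read as $\min\{d, \frac{c}{2} \cdot Int, 256 \cdot Int\}$ measured in bits.

```python
def choose_signature_length(d, c, int_bits=32, max_ints=256):
    """Signature length in bits: min{d, c/2 * Int, 256 * Int}.

    d is the domain cardinality and c the average set cardinality.
    """
    return min(d, (c * int_bits) // 2, max_ints * int_bits)


def merge_identical(S):
    """Group tuple ids by set value, so each distinct set is compared once."""
    groups = {}
    for sid, sset in S.items():
        groups.setdefault(frozenset(sset), []).append(sid)
    return groups


length_small_domain = choose_signature_length(d=100, c=16)
length_typical = choose_signature_length(d=1024, c=16)
length_capped = choose_signature_length(d=10**6, c=1000)
groups = merge_identical({"p1": {2, 4}, "p2": {4, 2}, "p3": {1}})
```

A join can then run over the keys of `groups` and expand each match to all tuple ids in the group at output time.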
Merging identical sets works well without introducing noticeable overhead while creating the trie, and saves quite a few comparisons while performing joins, especially for real-world datasets.

2) Superset and set-equality joins: While our algorithms are designed for $R \bowtie_{\supseteq} S$, they can be easily modified to perform $R \bowtie_{\subseteq} S$, in case we want to reuse the existing index on S. Here we take Algorithm 4 as an example to illustrate; Algorithm 5 can be changed in a similar manner. The only place that needs to be touched is the if-else statements (lines 7 to 10). The two case handling statements should be switched, as given in Algorithm 6. Furthermore, in Algorithm 1 the set value containment check (line 7) will change accordingly, to "if r.set ⊆ s.set".

Algorithm 6: Replace Algorithm 4 line 7 to 10 for superset join
 7      if current_bit = 0 then
 8          enqueue node.left and node.right on q
 9      else
10          enqueue node.left on q

Set-equality joins ($R \bowtie_{=} S$) can be answered efficiently as well. In this case, a simple search on the trie will return a list of tuples with the same signature. Further set comparisons are needed to validate the search results. Since we already merge tuples with the same set values, as discussed above in Section III-E1, many set comparisons are saved.

3) Set similarity joins: Apart from being used for set containment computations, a Patricia trie can be (re)used to answer set similarity join [22] queries as well. Set similarity join has been well-studied in the literature [23]. Solutions that make use of a trie have been proposed as well (e.g., [24], [25]), but these do not operate on (and cannot be easily adapted to) the signature space as PTSJ does. For instance, given query signature q, we want to find signatures within Hamming distance k. We can use Algorithm 7 to achieve this

goal, where we extend Algorithm 4 for illustration purposes. In particular, we attach a counter (dist) to each queued node to remember the Hamming distance between its prefix and our query. In the end, all signatures (and therefore tuples) that are within the distance are in the queue, waiting for other operations (validation, output) to take action. Systems such as OLAP can benefit greatly by reusing one index for different purposes.

Algorithm 7: TRIE_SSJ() hamming distance set similarity join using trie
Input: signature, trie, threshold k
Output: tuple IDs that have similar signature within hamming distance k
 1  create queue q
 2  i ← 0
 3  current_bit ← signature[i++]
 4  enqueue (trie.root, 0) on q
 5  while q.top has children do
 6      (node, dist) ← dequeue from q
 7      if dist ≤ k then
 8          if current_bit = 0 then
 9              enqueue (node.left, dist) on q
10              enqueue (node.right, dist+1) on q
11          else
12              enqueue (node.left, dist+1) on q
13              enqueue (node.right, dist) on q
14      current_bit ← signature[i++]
15  return q

4) Disk-based algorithm: PTSJ can be easily extended to an external memory setting. A straightforward implementation is to perform a nested-loop join over partitions of the data. Here we partition both relations until one pair of partitions can fit into main memory. Then for each pair of partitions from both relations, we load them into main memory and perform the join. In this case, the algorithm will have a quadratic behavior with respect to the number of partitions. Similar techniques have been applied to other algorithms such as PRETTI. However, as we discussed, PTSJ has a much smaller memory footprint than PRETTI, which makes it more suitable for this strategy. Smarter partitioning techniques (e.g., [11], [12]) can be integrated into PTSJ as well.

F. Discussion

SHJ can be viewed as a one-level, multi-way trie, where each branch starts with a different prefix. PTSJ, on the other hand, is a multi-level, binary trie. The main benefits of PTSJ over SHJ come from longer signatures, which can filter out more unnecessary set comparisons.
Furthermore, the trie structure guarantees that only interesting subset prefixes are visited, instead of the whole exponential space. PRETTI, on the other hand, does make use of a trie structure, but it operates on the set element space instead of the signature space. The benefit is that its results do not need to be validated twice. The downside, however, is that the trie height is as high as the set cardinality, making it suitable only for low set cardinality settings. This brings us to an advanced version of PRETTI, using a Patricia trie.

IV. PRETTI+

Since the Patricia trie is so useful for PTSJ, it is natural to ask whether this data structure can be used to advantage with PRETTI. We have integrated a Patricia trie with PRETTI, calling this new join algorithm PRETTI+. Modifications have to be made both to trie construction and to the join procedures. Inserting sets into the trie can be a bit trickier than with PTSJ, since sets are not necessarily of the same size. In Algorithm 8, we show the trie construction function for PRETTI+. Here we assume each node maintains a prefix, a set of related tuples, and a set of children nodes. The main idea is that, depending on the common prefix between a trie node and the newly arrived set (tuple), the new tuple may be inserted at different positions with respect to the given node. Specifically, the tuple may be inserted into (1) the current root, or (2) some subtree of the current root, or (3) a newly created node that becomes a parent of the current root, or (4) a newly created node that is a sibling of the current root. The core of Algorithm 8 is then to find the correct insertion position.

Fig. 4: Trie example for PRETTI+, after inserting sets from user preferences (Table I)

Algorithm 8: PRETTI+ INSERT(), trie construction for PRETTI+
Input: subtree root node, tuple s, cursor on s.set: from
Output: root for the subtree
// insert s.set[from:] into subtree node; here we treat s.set as a string
len ← |common prefix of node.prefix and s.set[from:]|
nlen ← |node.prefix|
tlen ← |s.set[from:]|
if len = nlen then
    if len < tlen then
        c ← some child of node that matches s.set[(from+len):]
        call PRETTI+ INSERT(c, s, from + len)
    else // len = tlen
        put s into node
    return node
else // len < nlen
    if len = tlen then
        create new node for s, insert new node between node and its parent
    else
        create new node as parent for node and tuple
    return new node

The join operation is almost the same as for PRETTI,

except that lists of tuples from the inverted index have to be joined several times in each node, since each node holds several set elements. By replacing a standard trie with a Patricia trie, PRETTI+ consumes much less main memory than PRETTI. However, set comparisons and tuple list joins still take place, the same as in PRETTI. As we will see in our empirical study, PRETTI+ is always a better choice than PRETTI.

V. EMPIRICAL STUDY

In this section we empirically compare the performance of SHJ, PRETTI, PTSJ, and PRETTI+. We first introduce the experiment settings. Then we validate the signature length selection strategy discussed above in Section III-D. After that we conduct the main comparison of the four algorithms on a variety of synthetic and real-world datasets.

A. Experiment setting

1) Synthetic datasets: We created a data generator to generate synthetic relations. The generator can generate relations with varying sizes, set cardinalities, domain cardinalities, and so on. The distribution of data can vary on both set cardinality and set elements. The distributions are generated using Apache Commons Math¹, a robust mathematics and statistics package. We start with a simple setting, with a uniform distribution on different set cardinalities and set elements. Later we test the algorithms' performance on relations with Zipf and Poisson distributions, which are commonly found in real-world scenarios.

2) Real-world datasets: We experiment with four representative real-world datasets, covering the scenarios of low, medium and high set cardinalities. Some statistics of the datasets² are shown in Table III.

TABLE III: Statistics for real-world datasets (columns: data, |R|, avg. c, median c, d; rows: flickr, orkut, twitter, webbase)

Flickr-3.5M (flickr): The flickr dataset³ associates photos with tags [26]. Naturally, here we treat tags as sets, to perform a set-containment join on photo ids. In this way, we create the containment relation between photos. Further operations such as recommendation can be investigated upon such relations. This is a low set-cardinality scenario.
Orkut community (orkut): The Orkut dataset⁴ contains relations of people from an online social network and the communities they belong to [27]. Here we treat each person as a tuple and the communities they belong to as a set. A set-containment join in this case can help people discover new communities and new friends with similar hobbies. Set cardinality for this dataset is higher than for Flickr, and we further keep tuples with c ≥ 10 to exhibit a low-to-medium set cardinality scenario.

Twitter k-bisimulation (twitter): We derive this dataset from paper [28]. Bisimulation is a method to partition the nodes in a graph, based on the neighborhood information of nodes. In this dataset, tuples are the partitions of the graph, and sets are the encoded neighborhood information each partition represents. Here we define the neighborhood of each node to be within 5 steps from the node. On such a dataset, a set-containment join could be used for graph similarity detection and graph query answering. For this dataset, we select tuples with c ≥ 30, to exhibit a medium set-cardinality scenario.

WebBase Outlinks-200 (webbase): This dataset is a web graph from the Stanford WebBase project [29]. We extract the data⁵ using tools from the WebGraph project [30]. We only keep pages that have more than 200 outlinks, following Melnik et al. [12], to exhibit a high set-cardinality scenario.

³ Can be downloaded at xirong/index.php?n=dataset.flikr3m

3) Implementation details: We implement all algorithms in Java. The signature length of SHJ is set to the optimum according to paper [8]. The signature length of PTSJ is set as suggested in Section III-D. For PRETTI and PRETTI+, we maintain a hash map in each trie node to enable fast access to children while traversing. This is costly but necessary for the algorithm to reach its best performance.
Note that here we tried various efficient implementations of hash map (e.g., Fastutil⁶, CompactCollections⁷, Trove⁸), and we find that the HashMap implementation from JDK 7 itself has both the best performance and the lowest main memory consumption. The open-source code of all implemented algorithms is available online⁹.

4) Test environment: All experiments are executed on a single machine (Intel Xeon 2.27 GHz processor, 12GB main memory, Fedora Linux, JDK 7). The JVM maximum heap size is set to 5GB, which we think is a decent setting even for today's computers. In the experiments we run each algorithm ten times, and record the average, standard deviation and median of the running times. We observe in our measurements that the average gives a good estimate of the running time, and the standard deviation is not significant when compared with the overall time. Hence in the following we only show the average running time. We tend to test with bigger relations when possible, since larger relations and longer running times eliminate the random behavior introduced by OS scheduling. We run programs with the taskset command, to restrict the execution to one CPU core.

The running times we present later include the time to build indexes (e.g., the hash map for SHJ and the trie structures for the other algorithms). We notice a trend that, as set cardinality increases, the percentage of index build time over running time decreases. This is due to the fact that bigger set cardinality leads to more set element comparisons, which take a correspondingly larger portion of the running time. In general, the index build times of SHJ and PTSJ are less than 1% and 5% of the overall running time, respectively; PRETTI and PRETTI+, on the other hand, take more than 70% and 20% of the running time to build indexes.
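Since index construction accounts for a noticeable share of the PRETTI+ running time, it may help to see the insertion logic of Algorithm 8 spelled out. The following is a minimal sketch under stated assumptions, not the authors' Java implementation: sets are represented as sorted tuples of elements, each node keeps its children in a dictionary keyed by the first element of the child's prefix (mirroring the per-node hash map mentioned in the implementation details), and the names `PNode`, `common_prefix_len`, and `insert` are hypothetical.

```python
class PNode:
    """Patricia-trie node: a run of set elements, tuple IDs, and children."""

    def __init__(self, prefix=(), tuples=None):
        self.prefix = tuple(prefix)
        self.tuples = list(tuples or [])
        self.children = {}   # first element of child's prefix -> child node


def common_prefix_len(a, b):
    """Length of the longest common prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def insert(node, elems, start, tuple_id):
    """Insert elems[start:] below `node`; return the (possibly new) subtree root."""
    rest = elems[start:]
    ln = common_prefix_len(node.prefix, rest)
    if ln == len(node.prefix):
        if ln < len(rest):
            # Descend into (or create) the child that continues the set.
            key = rest[ln]
            child = node.children.get(key)
            if child is None:
                node.children[key] = PNode(rest[ln:], [tuple_id])
            else:
                node.children[key] = insert(child, elems, start + ln, tuple_id)
        else:
            node.tuples.append(tuple_id)   # the set ends exactly at this node
        return node
    # Only part of node.prefix matches: split it at position ln.
    shared = PNode(node.prefix[:ln])
    node.prefix = node.prefix[ln:]
    shared.children[node.prefix[0]] = node
    if ln == len(rest):
        shared.tuples.append(tuple_id)     # new set is a prefix of the old run
    else:
        shared.children[rest[ln]] = PNode(rest[ln:], [tuple_id])
    return shared


root = PNode()   # the root carries an empty prefix and is never split
for tid, s in [("p1", ("a", "d", "f")), ("p2", ("a", "d", "g")), ("p3", ("a", "d"))]:
    root = insert(root, s, 0, tid)
print(root.children["a"].prefix)   # ('a', 'd')
print(root.children["a"].tuples)   # ['p3']
```

The three example sets are made up for illustration. After the second insertion the node holding ('a', 'd', 'f') is split into a shared node ('a', 'd') with children ('f',) and ('g',); the third set then lands in the shared node itself.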

[Fig. 5: Performance of PTSJ with different signature length settings. Panels: (a) impact of domain cardinality (series D = 2^10 to 2^14); (b) impact of set cardinality (series c = 2^2 to 2^10); (c) impact of relation size (series |R| = 2^15 to 2^19).]

For PRETTI and PRETTI+, certain datasets are too big to run in the given memory. In such cases we switch the algorithms to the nested-loop on-disk versions. We notice that PRETTI and PRETTI+ may gain some efficiency from this approach, since the in-memory trie of a partition can be shallower than the global trie. This is more noticeable for high set cardinality scenarios. Overall, when switched to disk-based versions, the differences in behavior of PRETTI and PRETTI+ are insignificant, since the algorithms' running times are dominated by computation instead of disk I/O.

B. The optimal signature length of PTSJ

As we discussed, the signature length has a huge impact on PTSJ's performance, sometimes making an order of magnitude difference. In Section III-D, we gave some suggestions on how to choose the signature length. In this section, we want to validate these suggestions empirically. A dataset has three main properties: the relation size, the set cardinality, and the domain cardinality. We want to know how these properties affect the behavior of PTSJ. The strategy of this investigation is to change one property while keeping the other two fixed. By examining the performance under different signature lengths, we can then clearly see whether there is a correlation between a certain property and the signature length. Table IV summarizes the settings for this investigation.

TABLE IV: Dataset configurations
fixed parameters          changing parameter
|R| = 2^17, c = 2^4       d ∈ {2^10, 2^11, 2^12, 2^13, 2^14}
|R| = 2^17, d = 2^14      c ∈ {2^2, 2^4, 2^6, 2^8, 2^10}
c = 2^4, d = 2^14         |R| ∈ {2^15, 2^16, 2^17, 2^18, 2^19}

Figure 5 shows the performance results of PTSJ, where the x-axis is the ratio between signature length and set cardinality.
The strategy given in Section III-D suggests that a ratio between 16 and 32 is sufficient. In Figure 5a, we see that a ratio between 16 and 32 indeed gives the best performance. Domain cardinality does not have a big impact on the signature selection. In Figure 5b we show how the algorithm performs under different set cardinality settings. Again PTSJ finds its best performance point between 16 and 32. We notice that for some high cardinality settings (c = 2^8, 2^10), comparing the signatures themselves becomes an expensive operation. In these cases shorter signatures are preferred in general. Figure 5c shows the impact of relation size on signature length selection. We see a slow trend that, when relations grow in size, the optimal signature length tends to move to larger values. This is indicated by Formula 2, where |R| is part of the factor. But as we observe, a ratio between 16 and 32 can already give a good result. Overall, these experiments support our signature selection strategy of Section III-D: a ratio of signature length to set cardinality between 16 and 32 is usually a good selection.

C. Comparison of algorithms

In this section we discuss the experimental results of the four algorithms on various synthetic datasets. We test different settings to show the scalability of all algorithms. Figure 6 shows experiments on uniformly distributed datasets. Figure 7 further shows performance on Poisson and Zipf distributions. The dataset configuration is the same as in Table IV.

1) Space efficiency for different algorithms: Main-memory consumption is an essential factor for evaluating main-memory algorithms. Low main-memory consumption indicates better scalability of the algorithm with respect to larger datasets. It is not difficult to get a rough estimate of memory consumption for the algorithms mentioned in this paper. The main differences come from the different data structures (indexes) each algorithm uses. For instance, for SHJ, a hash table has to be built; for PRETTI and PRETTI+, a prefix tree and an inverted index; for PTSJ, a Patricia trie.
In general, two factors influence memory consumption: (1) relation size |R| and (2) set cardinality c. The influence of relation size is obvious: the number of hash table entries grows linearly with relation size, and so do the sizes of the prefix tree, the inverted index, and the Patricia trie. Set cardinality, on the other hand, has a larger impact on PRETTI and PRETTI+, while SHJ and PTSJ are not so sensitive to it.

[Fig. 6: Comparison of different algorithms (SHJ, PRETTI, PTSJ, PRETTI+) for uniformly distributed data. Panels: (a) memory consumption, in bytes per tuple; (b) scalability w.r.t. domain cardinality; (c) scalability w.r.t. set cardinality; (d) scalability w.r.t. relation size (c = 2^4); (e) scalability w.r.t. relation size (c = 2^6); (f) scalability w.r.t. relation size (c = 2^8).]

We can clearly see this in our experiments. In Figure 6a, we plot the main memory consumption per tuple for each join algorithm under different set cardinality settings. Here we note that, though the experiment runs with 2^17 tuples, the result stays the same for much larger relations. This means that we can estimate how much memory we need, given information about relation size and set cardinality. We see that memory consumption basically has a linear relationship with set cardinality. SHJ, PTSJ and PRETTI+ vary by a constant factor, which is basically the cost of longer signatures (PTSJ), the Patricia trie (PTSJ and PRETTI+) and the inverted index (PRETTI+). PRETTI, on the other hand, needs around ten times more main memory than the others. For a relation with set cardinality 2^6, it needs more than 10KB per tuple, which means 10GB for just one million tuples. This empirically substantiates our remarks on PRETTI.

2) Scalability with different domain cardinality settings: Figure 6b depicts performance with different domain cardinality settings. We see that the signature-based solutions (SHJ and PTSJ) are not sensitive to changes in domain cardinality, since they operate on the signature space instead of on the set element space. PRETTI and PRETTI+, on the other hand, operate directly on the set element space. Larger domain cardinality implies more entries in the inverted index, and shorter inverted lists (and therefore faster merge joins on the lists). So PRETTI and PRETTI+ perform better when domain cardinality is high.
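To make the dependence on inverted-list length concrete, the sketch below builds an inverted index and intersects its lists with a merge-style scan. It is only an illustration under assumptions not taken from the paper: tuple IDs are comparable values kept in sorted lists, the relation is a small in-memory dictionary, and the names `build_inverted_index`, `intersect_sorted`, and `supersets_of` are hypothetical. The cost of each intersection is linear in the lengths of the two lists, so shorter lists translate directly into less work.

```python
def build_inverted_index(relation):
    """Map each element to the sorted list of tuple IDs whose set contains it."""
    index = {}
    for tuple_id in sorted(relation):
        for element in relation[tuple_id]:
            index.setdefault(element, []).append(tuple_id)
    return index


def intersect_sorted(a, b):
    """Merge-style intersection of two sorted ID lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def supersets_of(index, query):
    """IDs of tuples whose set contains every element of the non-empty `query`."""
    # Start from the shortest list so intermediate results stay small.
    lists = sorted((index.get(e, []) for e in query), key=len)
    result = lists[0]
    for lst in lists[1:]:
        result = intersect_sorted(result, lst)
    return result


relation = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}}
index = build_inverted_index(relation)
print(supersets_of(index, {"a", "b"}))   # [1, 3]
print(supersets_of(index, {"b", "c"}))   # [3]
```

With a larger element domain and the same number of tuples, each element appears in fewer sets, so the lists passed to `intersect_sorted` shrink.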
3) Scalability with different set cardinality settings: In order to determine the scalability of the algorithms with respect to set cardinality, we set the relation size to 2^17, with average set cardinality varying from 2^2 to 2^10. The very high set cardinality scenarios (2^10) are not uncommon, especially in the context of graph analytics. We will see more data of this kind in the experiments with real data. In Figure 6c, we see that PRETTI and PRETTI+ are both more sensitive to set cardinality than the signature-based solutions. When set cardinality is low (below 2^5), PRETTI+ is a better choice than the alternatives; beyond that point, PTSJ is the better choice. In each case, one of our new algorithms achieves nearly an order of magnitude performance gain over the best of SHJ and PRETTI.

4) Scalability with different relation sizes: Algorithm scalability with respect to relation size may be the most important factor in practice. In Figures 6d to 6f, we show performance in different set cardinality scenarios (c = 2^4, 2^6, 2^8). Just as we saw earlier, for low cardinality settings (Figure 6d), PRETTI+ is a clear winner, followed by PTSJ, PRETTI and SHJ. When set cardinality grows, the advantages of signature-based solutions start to show, and PTSJ becomes a better choice than the others. The difference becomes more significant with larger relation sizes. In Figure 6f we see that in many cases in-memory PRETTI (and PRETTI+) cannot finish the experiments, so we switch the algorithm to a disk-based nested-loop version.

5) Poisson distribution and Zipf distribution: Here we want to determine whether different distributions on the set cardinality and set elements have an impact on performance. We test datasets (|R| = 2^17) with two distributions that are widely found in real-world datasets: the Poisson distribution and the Zipf distribution. Distributions are applied to either set cardinality or set

elements.

[Fig. 7: Comparison of different algorithms (SHJ, PRETTI, PTSJ, PRETTI+) for skewed distributions. Panels: (a) scalability w.r.t. set cardinality, with Poisson distribution on set cardinality; (b) scalability w.r.t. set cardinality, with Poisson distribution on set elements; (c) scalability w.r.t. set cardinality (x-axis: max c), with Zipf distribution on set cardinality; (d) scalability w.r.t. set cardinality, with Zipf distribution on set elements.]

We expect that the distribution on set cardinality will have a greater impact, as shown previously. Unless specified otherwise, the x-axis shows the average set cardinality. In Figure 7a we show datasets with a Poisson distribution on set cardinalities. This setting is bad news for PRETTI and PRETTI+, because the set cardinality can then be potentially large. We see that, even when c = 2^3, PRETTI and PRETTI+ are not competitive with PTSJ. Indeed, PTSJ performs the best in all cases. Figure 7b shows a Poisson distribution on set elements. This distribution does not make a significant difference for any of the algorithms, which behave as in Figure 6. A Zipf distribution on set cardinality favors PRETTI and PRETTI+. In Figure 7c, we see that PRETTI+ becomes the best solution in all settings. Note that in this case the x-axis is the maximum set cardinality instead of the average. Since c follows a Zipf distribution, many sets have a small c and only a few have larger ones. In fact, the median set cardinality for the dataset with max c = 2^9 is only 17. This explains why PRETTI+ performs so well. A Zipf distribution on set elements, as in Figure 7d, does not have a huge impact on performance differences. PRETTI and PRETTI+ perform slightly better than under the uniform distribution, since they can produce results earlier due to the nature of the Zipf distribution (frequent elements are placed near the trie root). Overall, our observation is that distributions on set cardinality have a large impact on performance.
In such cases, we need to examine not only the average set cardinality but also the median set cardinality of the data when choosing the right algorithm. Nonetheless, either PTSJ or PRETTI+ will be the best choice, sometimes with a 10-fold speedup compared with the current state-of-the-art.

D. Experiments on real-world datasets

Figure 8 summarizes performance on various real-world datasets, where we plot the ratio of each algorithm's running time over that of the best algorithm for the dataset. We see that performance can vary by an order of magnitude for many algorithms. In low-to-medium set cardinality settings (flickr, orkut), PRETTI+ is the clear winner, and signature-based methods, even PTSJ, are at least three times slower. SHJ in these two cases runs longer than a day. When it comes to medium-to-high set cardinality settings (twitter), however, the benefit of signatures starts to appear: PTSJ makes the computation 3.6 times faster than the second best (SHJ). For webbase, PTSJ again is at least 8 times faster than the state-of-the-art, 2.6 times faster than PRETTI

[Fig. 8: Algorithm performance comparison (SHJ, PRETTI, PTSJ, PRETTI+) for different real-world datasets (flickr, orkut, twitter, webbase).]

VI. CONCLUSION AND FUTURE WORK

Motivated by recent hardware trends and practical applications from graph analytics, query processing, OLAP systems, and data mining tasks, in this paper we proposed and studied two efficient and scalable set-containment join algorithms: PTSJ and PRETTI+. The latter is suitable for low set cardinality, high domain cardinality settings, while the former is a more general algorithm suitable for the other scenarios. As shown in the experiments, these two new algorithms can in many cases be remarkably faster than the existing state-of-the-art, and scale gracefully with set cardinality, domain cardinality,


More information

Machine Vision. Laboratory Exercise Name: Student ID: S

Machine Vision. Laboratory Exercise Name: Student ID: S Mahine Vision 521466S Laoratory Eerise 2011 Name: Student D: General nformation To pass these laoratory works, you should answer all questions (Q.y) with an understandale handwriting either in English

More information

Algorithms, Mechanisms and Procedures for the Computer-aided Project Generation System

Algorithms, Mechanisms and Procedures for the Computer-aided Project Generation System Algorithms, Mehanisms and Proedures for the Computer-aided Projet Generation System Anton O. Butko 1*, Aleksandr P. Briukhovetskii 2, Dmitry E. Grigoriev 2# and Konstantin S. Kalashnikov 3 1 Department

More information

Tracking Table Tennis Balls in Real Match Scenes for Umpiring Applications

Tracking Table Tennis Balls in Real Match Scenes for Umpiring Applications British Journal of Mathematis & Computer Siene 1(4): 228-241, 2011 SCIENCEDOMAIN international www.sienedomain.org Traking Tale Tennis Balls in Real Math Senes for Umpiring Appliations K. C. P. Wong 1*

More information

Improved Circuit-to-CNF Transformation for SAT-based ATPG

Improved Circuit-to-CNF Transformation for SAT-based ATPG Improved Ciruit-to-CNF Transformation for SAT-based ATPG Daniel Tille 1 René Krenz-Bååth 2 Juergen Shloeffel 2 Rolf Drehsler 1 1 Institute of Computer Siene, University of Bremen, 28359 Bremen, Germany

More information

Outline: Software Design

Outline: Software Design Outline: Software Design. Goals History of software design ideas Design priniples Design methods Life belt or leg iron? (Budgen) Copyright Nany Leveson, Sept. 1999 A Little History... At first, struggling

More information

5.2.1 Ant, indispensable Ant

5.2.1 Ant, indispensable Ant 5.2.1 Ant, indispensale Ant Apahe s Ant produt (http://ant.apahe.org/) is a uild tool that lets you easily ompile and test appliations (among other things). It is the de fato standard for uilding Java

More information

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application World Aademy of Siene, Engineering and Tehnology 8 009 Performane of Histogram-Based Skin Colour Segmentation for Arms Detetion in Human Motion Analysis Appliation Rosalyn R. Porle, Ali Chekima, Farrah

More information

Outline. CS38 Introduction to Algorithms. Administrative Stuff. Administrative Stuff. Motivation/Overview. Administrative Stuff

Outline. CS38 Introduction to Algorithms. Administrative Stuff. Administrative Stuff. Motivation/Overview. Administrative Stuff Outline CS38 Introdution to Algorithms Leture 1 April 1, 2014 administrative stuff motivation and overview of the ourse stale mathings example graphs, representing graphs graph traversals (BFS, DFS) onnetivity,

More information

Multi-Channel Wireless Networks: Capacity and Protocols

Multi-Channel Wireless Networks: Capacity and Protocols Multi-Channel Wireless Networks: Capaity and Protools Tehnial Report April 2005 Pradeep Kyasanur Dept. of Computer Siene, and Coordinated Siene Laboratory, University of Illinois at Urbana-Champaign Email:

More information

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index IJCSES International Journal of Computer Sienes and Engineering Systems, ol., No.4, Otober 2007 CSES International 2007 ISSN 0973-4406 253 An Optimized Approah on Applying Geneti Algorithm to Adaptive

More information

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating Capturing Large Intra-lass Variations of Biometri Data by Template Co-updating Ajita Rattani University of Cagliari Piazza d'armi, Cagliari, Italy ajita.rattani@diee.unia.it Gian Lua Marialis University

More information

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup Parallelizing Frequent Web Aess Pattern Mining with Partial Enumeration for High Peiyi Tang Markus P. Turkia Department of Computer Siene Department of Computer Siene University of Arkansas at Little Rok

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 3, March-2014 ISSN

International Journal of Advancements in Research & Technology, Volume 3, Issue 3, March-2014 ISSN International Journal of Advanements in Researh & Tehnology, Volume 3, Issue 3, Marh-204 ISSN 2278-773 47 Phrase Based Doument Retrieving y Comining Suffix Tree index data struture and Boyer- Moore faster

More information
