A Combination of Trie-trees and Inverted Files for the Indexing of Set-valued Attributes

Manolis Terrovitis, Nat. Technical Univ. Athens, mter@dblab.ece.ntua.gr
Spyros Passas, Nat. Technical Univ. Athens, spas@dblab.ece.ntua.gr
Panos Vassiliadis, Univ. of Ioannina, pvassil@cs.uoi.gr
Timos Sellis, Nat. Technical Univ. Athens, timos@dblab.ece.ntua.gr

ABSTRACT
Set-valued attributes frequently occur in contexts like market-basket analysis and stock market trends. Recent research literature has mainly focused on set containment joins and data mining without considering simple queries on set-valued attributes. In this paper we address superset, subset and equality queries and we propose a novel indexing scheme for answering them on set-valued attributes. The proposed index superimposes a trie-tree on top of an inverted file that indexes a relation with set-valued data. We show that we can efficiently answer the aforementioned queries by indexing only a subset of the most frequent of the items that occur in the indexed relation. Finally, we show through extensive experiments that our approach outperforms the state of the art mechanisms and scales gracefully as database size grows.

Categories and Subject Descriptors
H.2.2 [Database Management]: Physical Design - Access Methods

General Terms
Algorithms, Performance

Keywords
HTI, inverted files, tries, containment queries

1. INTRODUCTION
Containment queries on set-values emerge in a variety of application areas ranging from scientific databases to XML documents. Examples of set-valued data can be found in market basket analysis, production models, image and molecular databases [7]. Containment queries span a wide range of query families, ranging from simple existence queries to composite similarity, pattern matching, or graph isomorphism queries.
Naturally, the fundamental set-containment operators are typical for a large number of situations (e.g., "Give me all photographs whose annotation contains the terms galaxy and red giant", or "Give me all protein sequences that contain either G or T or a combination of them, but nothing else"). Moreover, set-containment operators can be used in other query classes where a pruning of the candidate sets to be processed takes place (e.g., "Give me all medicine sequences that are similar to my XYZ test medicine and their X component contains either G or T or a combination of them, but nothing else"). Another important application area for containment queries is the evaluation of path expressions in XML data, which partially resolves to keyword searching [9]. As RDBMSs and IR come closer, often in the interest of storing and handling XML [2] and web data [4], containment queries on set values become a more and more significant use case for an RDBMS.

A natural way of modelling and storing set-values in a modern RDBMS is by using set-valued attributes. Set-valued attributes are an integral part of the object-relational model and they are supported by most modern RDBMSs [17]. In this context, we are interested in containment queries over the set-valued attributes of a relation. More specifically, assuming a relation D(id, set_values) and a set of interesting items qs = {i_1, ..., i_n}, we would be interested to ask queries of the form {t | t ∈ D ∧ qs θ t.set_values}, where θ ∈ {⊆, =, ⊇}.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIKM'06, November 5-11, 2006, Arlington, Virginia, USA. Copyright 2006 ACM.
The problem of efficiently computing the result set of these operations is challenging, mainly due to the vastness of the underlying data volumes and the particularities of the queries. The problem with set values is that the space of potentially indexed values is enormous (2^n, for n items) and the resulting index would be huge as well. Moreover, the query semantics are quite different: whereas simple subset queries retrieve the tuples that contain a certain set of items, superset queries require that some (but not necessarily all) of these items are contained in the result tuples, and nothing else. Therefore, an efficient indexing scheme is necessary that can (a) support the coexistence of multiple items in the same query set and (b) adequately support different classes of containment queries by exploiting their characteristics.

To this day, the database and the information retrieval (IR) research communities are mainly the ones having studied set-values in depth. From the database perspective, there is a need to efficiently handle huge volumes of small sets, usually taking values from a limited domain. So far, database research has mostly focused on similarity and join queries. Similarity queries [3, 12] retrieve the set values that are most similar to the one provided in the query. Join queries, which are classified as similarity joins [15] or as set containment joins [11, 13], focus on intersecting two different relations based on their set-valued attributes. Research on access methods for basic containment queries is very limited. To the best of our knowledge, only access methods based on signature files [2] and inverted file indices [1, 22] have been used in the database research literature for supporting containment queries on set-valued attributes. A recent survey [7] has shown that inverted files clearly outperform signature-based methods for containment queries on low cardinality set values. The same holds for text documents, as Zobel et al. showed in [21]. Moreover, Zhang et al. studied in [2] how inverted file indices compare to traditional relational methods for containment queries, motivated by the integration of IR functionality in RDBMSs. Using traditional relational indices like B-trees for containment queries was shown to have significantly inferior performance in most cases. Considering inverted files as the state-of-the-art mechanism for set containment is also supported by the fact that they are used by all WWW search engines [19].

Still, the performance of inverted files suffers when the domain of the distinct items of the database is small or when the distribution of the items is skewed and few items dominate the dataset. This is due to their internal structure: inverted files contain a header list with all the items of the vocabulary; for each item, an inverted list with pointers to the transactions that contain this item is maintained. Thus, if some items appear in many set values, their inverted lists become very long. Since containment queries usually require scanning the entire inverted lists of the query items, long lists degrade query evaluation. This is often the case in the real world. A characteristic case of numerous records of set values from a limited domain are the real datasets from the UCI KDD archive [8] that we use in our experimental evaluation. These datasets are logs that trace the behavior of users in large web portals, which is a common source of data that are analyzed by using containment queries (e.g., "Which users downloaded only drivers and patches from our website and did not visit any other page?").
Moreover, highly skewed data is a common case for retail transactions, where some basic products dominate the transactional logs.

In this paper, we focus on the efficient evaluation of containment queries on large collections of low cardinality sets with exact query semantics. The query classes under investigation include subset, superset and set equality queries. These queries test a set of items, a.k.a. the query set, over a set-valued attribute of a set of records, for the fulfillment of the query's selection condition (subset, superset or equality). The exact set of transactions that fulfill the selection condition is returned. To efficiently answer these classes of queries, we propose a novel indexing scheme, the Hybrid Trie-Inverted file (HTI) index. The HTI index superimposes a trie structure, the access tree, over an inverted file index. The access tree offers pointers to the inverted lists of the most frequent items, thus leveraging the performance of inverted files. In the HTI index, queries over the frequent items are evaluated by the access tree. At the same time, the memory requirements remain low, since the information for the vast majority of the data is kept in the inverted file. This evaluation mechanism has a significant impact on query answering efficiency in the average case, since we expect items to be queried according to their frequency of appearance. In short, our contribution comprises the following:

1. We propose a novel indexing scheme, the HTI index, that combines a trie with an inverted file, for large collections of low cardinality sets. The main idea is that the trie is placed in main memory, indexing the top-k most frequent items of the data set, whereas the inverted file is placed in secondary storage, associating each item with all the transactions that contain it. The index is particularly fit for data from a limited domain or skewed data, which is a very common real world case.

2. We present efficient evaluation algorithms for set containment queries that utilize the proposed index.
For all types of queries we quickly identify the set of frequent items that participate in the query by exploiting the main memory part of HTI and complement the answer by testing the infrequent items through the inverted file.

3. We demonstrate the superiority of our proposal over the state of the art access methods, by extensive experiments. We evaluate the HTI index on real and synthetic data. We assess the number of I/Os performed by the HTI index as a function of the domain of items, the database size and the size of the query set. On all occasions, HTI significantly outperforms a competitor inverted file, and scales gracefully, especially in the cases of large database and query sizes (as opposed to the inverted file, which fails to scale similarly). In the case of the real datasets, which involve 32k and 1M transactions, the HTI index performs an order of magnitude fewer I/Os with a memory overhead of less than 0.5 MB. Our experiments with synthetic data show that even for large domains, keeping a low threshold for the top-k items held in the trie is sufficient for achieving high performance with minimum memory expenses.

The rest of the paper is organized as follows: In Section 2 we formulate the problem and in Section 3 we present the proposed HTI index. Section 4 describes the query evaluation algorithms and in Section 5 we demonstrate the results of the experimental comparison of our proposal against the inverted file index. Finally, Section 6 concludes the paper.

2. PROBLEM FORMULATION
For reasons of simplicity we assume that the data are organized under a simple object-relational schema D with each tuple t = [id, s] having two attributes; id is a unique identifier of the transaction and s is a set (not a bag or a list) of objects from an infinitely countable domain of distinct items. We refer to the active domain of D with the term vocabulary and denote it as I. Thus, every t.s ⊆ I. Moreover, throughout the paper we consider the id as adequate information to allow us to retrieve the whole transaction from the hard disk in one step (i.e., in one page access).

Queries.
In queries on set-valued data, the user specifies the query predicate and the query set qs. The query set is a set of items from the domain of I. The queries we are interested in are defined as follows:

Subset queries. In subset queries the user asks for all transactions t that contain the query set qs, i.e., {t | t ∈ D ∧ qs ⊆ t.s}.

Equality queries. In equality queries the user asks for all transactions that contain exactly the query set, i.e., {t | t ∈ D ∧ qs = t.s}.

Superset queries. In superset queries the user asks for all transactions whose items are contained in the query set, i.e., {t | t ∈ D ∧ qs ⊇ t.s}.

ID   Items bought
1    {f, a, c}
2    {c, b, d}
3    {f, a}
4    {a, c}
5    {f, d}
6    {f, c}
7    {f}
Figure 1: Example relation D of customer transactions

Vocabulary (I)   Inverted list of transaction ids
f                1, 3, 5, 6, 7
c                1, 2, 4, 6
a                1, 3, 4
d                2, 5
b                2
Figure 2: A simple inverted file index scheme for the example of Figure 1.

3. INDEX STRUCTURE
Tries and inverted files have been extensively used for text indexing; still, the former have not been employed for indexing set-valued attributes in object-relational databases. In this section, we introduce the HTI index, which combines a main memory trie with an inverted file residing in secondary storage. First, we give background information for inverted files and tries and we explain their benefits and drawbacks. Then, we show how these indexing schemes are combined in the HTI index. Finally, we also discuss issues concerning updates, compression and caching.

3.1 The inverted file
The inverted file index has two major components: (a) the vocabulary and (b) the inverted lists. The vocabulary is a list of all the distinct items appearing in the database, i.e., it is the same as the database vocabulary I of Section 2. Each list node has a label indicating the item it represents and a pointer to the head of the inverted list. The inverted list contains information about all the transactions in which the item appears. In our case, this information comprises the transaction id alongside its length. The length of the transaction is required in order to efficiently execute equality and superset queries. Figure 2 depicts an inverted file index for the relation of Figure 1. The vocabulary includes all the distinct items that appear in the transactions.

The inverted lists may be huge for large databases; the id and the length of a transaction are inserted as many times as the number of items it contains. This means that, theoretically, the size of the inverted file could be similar to the size of the transaction collection or even larger.
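The structure described in Section 3.1 can be sketched in a few lines of Python. This is only an illustrative in-memory sketch, not the authors' disk-based implementation: the dictionary `D` encodes the example relation of Figure 1, and the name `build_inverted_file` is a hypothetical helper introduced here. Each posting carries the transaction id together with the transaction length, as the text requires for equality and superset queries.

```python
from collections import defaultdict

# Example relation D of Figure 1: transaction id -> set of items.
D = {
    1: {"f", "a", "c"},
    2: {"c", "b", "d"},
    3: {"f", "a"},
    4: {"a", "c"},
    5: {"f", "d"},
    6: {"f", "c"},
    7: {"f"},
}


def build_inverted_file(relation):
    """Map every item to its inverted list of (tid, transaction length) postings.

    Hypothetical helper: transactions are visited in id order, so every
    inverted list ends up sorted by transaction id.
    """
    lists = defaultdict(list)
    for tid in sorted(relation):
        items = relation[tid]
        for item in items:
            lists[item].append((tid, len(items)))
    return dict(lists)


inverted = build_inverted_file(D)
total_postings = sum(len(postings) for postings in inverted.values())
```

Every transaction contributes one posting per item it contains, so `total_postings` equals the sum of the transaction lengths, which is why the inverted file can grow as large as the collection itself.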
Uncompressed inverted files for text documents typically consume around 3% of the space required for the uncompressed database [16]. In the cases that we are mostly interested in, where there are no repetitions and the vocabulary is significantly smaller than the number of transactions (|I| ≪ |D|), the inverted file can be equal to or larger than the database, since the t.id requires more bits than the items of I.

We can trace the answer to subset, superset and equality queries by using set operations on the inverted lists. Due to their size, the inverted lists are stored in secondary storage. Therefore, the larger these lists are, the more memory pages have to be retrieved from the disk for evaluating a query. This means that the most frequent items, which have the larger inverted lists, are the most expensive to process. This is an important weakness when dealing with set values in databases, considering that the most frequent items are usually the ones most frequently queried.

3.2 Tries in the context of set-values
Tries are multiway tree structures for storing string keys which enable retrieval in time proportional to the string length [1]. Unlike inverted files, tries are letter oriented and each string corresponds to a path in the tree. Consequently, common prefixes in strings correspond to common prefix paths in the tree. Leaf nodes include either the documents themselves, or links to the documents that contain the string that corresponds to the path. Since strings are words of some language, the maximum number of children for a node is limited by the number of letters of the alphabet of the documents' language. The way tries are created allows for prefix (or suffix, if strings are inverted before being mapped to paths) search, i.e., they provide a kind of range search, based on the first letters of the string.

A significant difference between set values and text documents is that, unlike words (which are composed of letters), the items of a set are not further decomposable to smaller units.
Even if the items are alphanumeric values themselves, this is simply a coding scheme of the database that eventually has no relationship to the user queries. Therefore, it is meaningless to exploit the alphanumeric value of the items for indexing purposes; rather, we need to use the set of all items I as the vocabulary of the index. As a result, each node might have |I| children. This makes the potential size of the trie very large, and thus the space gain achieved from common prefixes is a lot smaller compared to the one in the text document case. Practically, even for a moderately large |I|, e.g., 2k, the maximum space of the trie is so big that it grows almost linearly with the number of transactions.

In our following deliberations, we need to define the fundamental notion of item frequency ordering, which concerns the ordering of the items of a vocabulary.

Item frequency ordering. The item frequency ordering of the items of a vocabulary I (over a database D) is the total ordering of the items according to their frequency of appearance in the underlying database. In our reference example, the item frequency ordering is <_I = [f, c, a, d, b].

To construct a trie for set values, we follow the approach of Han et al. in [5, 6]. First, each transaction is transformed from an unordered set to an ordered sequence based on the item frequency ordering of the vocabulary. An item x precedes another item y in an ordered transaction if x is more frequent than y in the whole database D. The ordered transaction is subsequently mapped to a path starting from the trie tree root. If some nodes already exist, due to a common prefix with a previously inserted transaction, we only add the new nodes.
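The two steps above (ordering every transaction by item frequency, then inserting it as a path) can be illustrated with the following minimal Python sketch. The names `frequency_order`, `TrieNode` and `build_trie` are hypothetical, the alphabetical tie-break between equally frequent items is an assumption of this sketch, and the transaction ids attached to each node are kept only for illustration.

```python
from collections import Counter

# Example relation D of Figure 1: transaction id -> set of items.
D = {
    1: {"f", "a", "c"}, 2: {"c", "b", "d"}, 3: {"f", "a"}, 4: {"a", "c"},
    5: {"f", "d"}, 6: {"f", "c"}, 7: {"f"},
}


def frequency_order(relation):
    """Return the items sorted by decreasing frequency of appearance."""
    counts = Counter(item for items in relation.values() for item in items)
    # Assumption: ties are broken alphabetically to make the order total.
    return sorted(counts, key=lambda item: (-counts[item], item))


class TrieNode:
    def __init__(self, label=None):
        self.label = label      # item represented by the node (None for the root)
        self.children = {}      # item -> child TrieNode
        self.tids = []          # ids of the transactions whose path crosses the node


def build_trie(relation, order):
    """Insert every transaction as a path of its items in frequency order."""
    rank = {item: position for position, item in enumerate(order)}
    root = TrieNode()
    for tid in sorted(relation):
        node = root
        for item in sorted(relation[tid], key=rank.__getitem__):
            if item not in node.children:       # only add the missing nodes
                node.children[item] = TrieNode(item)
            node = node.children[item]
            node.tids.append(tid)
    return root


order = frequency_order(D)
trie = build_trie(D, order)
```

On the example relation, `order` comes out as the ordering [f, c, a, d, b] quoted in the text, and transaction 1 contributes the path f → c → a.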

Ordered transactions:
1 {f, c, a}   2 {c, d, b}   3 {f, a}   4 {c, a}   5 {f, d}   6 {f, c}   7 {f}

Trie (each node is annotated with its transaction ids):
Null (root)
  f   tids: 1, 3, 5, 6, 7
    c   tids: 1, 6
      a   tids: 1
    a   tids: 3
    d   tids: 5
  c   tids: 2, 4
    d   tids: 2
      b   tids: 2
    a   tids: 4
Figure 3: An abstract form of a trie tree for the example of Figure 1. It shows how the full trie would ideally be; the shaded area (the nodes of the infrequent items a, d and b) is to be excluded in the access tree of HTI.

An abstract form of the trie tree for the database of Figure 1 is depicted in Figure 3. The transaction with id = 1 and set value s = {a, c, f} is ordered according to the frequency of its items in the database. Since f occurs 5 times, c occurs 4 times and a occurs 3 times, the transaction's set is transformed to a sequence s = {f, c, a} that subsequently contributes the path f → c → a in the trie. Unlike typical tries, in Figure 3 we annotate each node with the list of transaction ids that correspond to it (without implying that they are actually kept in main memory along with the trie). Note that, depending on its prefix, a transaction might belong to the list of more than one node. For example, the transaction with id = 1 belongs to the lists of all the nodes of its prefix, i.e., all the nodes of the path f → c → a.

Finally, there is a difference between the transactions that pertain solely to a node and the transactions that also pertain to its descendants. Observe the node c of the path f → c → a. The transaction with id = 1 is the transaction {f, c, a}, which also belongs to the node a of the same path. On the contrary, the transaction with id = 6 refers exactly to the path f → c. The distinction will be very useful later, for equality and superset queries.

The potentially very large number of descendants that a node might have and the fact that tries are unbalanced do not make the trie a good candidate for secondary memory storage. Therefore, we choose to use it as a main memory structure offering alternative access to the data, on top of the inverted file.

3.3 The HTI index
As we have explained in Section 3.1, the performance of the inverted file suffers when very long inverted lists have to be processed.
The issues involved in the processing of inverted files are (a) the IO cost of transferring the disk pages with the inverted lists to main memory and (b) the CPU cost of intersecting inverted lists of different items that participate in the same query set. To counter this effect we propose the HTI index, which uses a relatively small main memory trie to offer additional access points to the inverted lists of the most frequent items (that also have the longest lists). The basic idea of the HTI index is to split the vocabulary of the database into (a) a small set of frequent items I_fr and (b) a large set of infrequent items I \ I_fr. Then, a trie is used for the former, in order to speed up the access to the inverted lists that pertain only to the combinations of frequent items, whereas the latter are treated as usual, through an inverted file. The HTI index has three major components: a vocabulary, an access tree and a set of inverted lists. An HTI index is schematically depicted in Figure 4.

The vocabulary. Like inverted files, the HTI has a list of all the distinct items of the database, which offers access to the inverted lists. The items in the vocabulary are divided in two classes: (a) the frequent items I_fr, I_fr ⊆ I, whose vocabulary entries point to the access tree in main memory, and (b) the infrequent items, I_infr = I \ I_fr, whose vocabulary entries lead directly to their inverted lists in secondary storage, exactly like in inverted files. The vocabulary is kept as an array in main memory and together with the access tree root they comprise the initial access points to the inverted lists. The array is implemented as a hash table.

The access tree. The access tree is a trie structure that offers access points to blocks of transactions that share the same access prefix paths (app). The app of a transaction can easily be computed if we order its items according to the item frequency ordering of I. Then, we define as access prefix path the sequence prefix path whose items all lie in I_fr, i.e., the ordered sequence of the frequent items of the transaction. For example, the app of {f, a} is {f}.
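As a small illustration of the definition above, the access prefix path of a transaction can be computed by filtering the item frequency ordering. The sketch below assumes the reference example, with the ordering [f, c, a, d, b] and the two most frequent items chosen as I_fr; the function name `access_prefix_path` and the constants are hypothetical.

```python
# Item frequency ordering of the reference example and the assumed frequent set I_fr.
ORDER = ["f", "c", "a", "d", "b"]
FREQUENT = set(ORDER[:2])  # assumption: the top-2 items form I_fr


def access_prefix_path(items, order=ORDER, frequent=FREQUENT):
    """Return the frequent items of a transaction in item frequency order.

    Because I_fr holds the most frequent items, these items always form a
    prefix of the ordered transaction.
    """
    return [item for item in order if item in items and item in frequent]
```

For instance, `access_prefix_path({"a", "c", "f"})` yields `["f", "c"]`, while a transaction without frequent items, such as `{"a", "d"}`, has an empty app and is reachable only through the inverted file.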
We store the app of each transaction in the access tree, by putting the first and most frequent element as a direct child of the root (see also the next section for a detailed discussion on the creation of the access tree). The access tree has two kinds of nodes: (a) the root, which does not correspond to any item in I_fr, and (b) information nodes, which are all the other nodes of the trie. Each such node holds the following information:

- A label indicating the item of I_fr that corresponds to the node.
- A link to the sublist of the transactions that contribute to the path from the root to the node. These are all the transactions whose prefix is the same as the path from the root to the current node.
- Navigational links to the children nodes, the parent node and the rest of the nodes with the same label.

It is important to stress here that, due to the vast volume of the full-fledged trie presented in the previous section, the access tree is a subset of it, concerning only its most frequent items I_fr. The vocabulary entries concerning these frequent items point to lists that comprise all the access tree nodes that are labelled with the respective item. In turn, these nodes point to the respective inverted lists, stored in secondary storage.

In Figure 4 we depict an example HTI index for the relation of Figure 1. We choose as frequent items I_fr = {f, c} (having a frequency greater than 3), and we create the access tree considering only them. Observe that in Figure 3, these were also the items with the longest transaction lists. The shaded area in Figure 3 concerns the infrequent items that were subsequently dropped from the access tree of Figure 4. Item f is more frequent than c, thus it precedes it in access tree paths. Assuming this I_fr set, all the transactions of Figure 1 contribute to three paths: root → f, root → f → c and root → c. Observe, also, how the nodes labeled c are linked to each other.

Main memory:
  Vocabulary: frequent items f, c; infrequent items a, d, b
  Access tree: Null → f → c and Null → c
Secondary storage (inverted lists, sublists separated by "|"):
  f: 3, 5, 7 | 1, 6
  c: 1, 6 | 2, 4
  a: 1, 3, 4
  d: 2, 5
  b: 2
Figure 4: HTI index for the relation of Figure 1.

Sublist of node f: header [5, 3], followed by the disk pages [3, 5] [7, 1] [6]. The header holds the total number of transactions that contain the item (5) and the number of transactions whose app ends at the current node (3, i.e., transactions 3, 5, 7). In the original figure, the dark shaded box stands for the list header and the light shaded boxes stand for HD pages.
Figure 5: The transaction list corresponding to the node f, assuming two ids per disk page.

The inverted lists. There are two cases for the inverted lists of the vocabulary items: (a) inverted lists of non-frequent items and (b) inverted lists of frequent items. Concerning the non-frequent items, their inverted lists are exactly the same as those of a regular inverted file (i.e., sorted lists containing the ids of all the transactions that contain the respective item). The case of frequent items belonging to I_fr, on the other hand, involves inverted lists made up of many smaller sorted sublists, each of them corresponding to an access tree information node labelled with the respective item. To enhance the evaluation of equality and superset queries, we further divide the sublists of the access tree into two parts, as depicted in Figure 5 for the case of f: (a) the ids of the transactions whose app ends at this node; these are transaction ids 3, 5 and 7, and (b) the ids of the rest of the transactions that contribute to the current node; these are ids 1 and 6. In the beginning of each information node sublist, we store the number of transactions of case (a) alongside the total number of the transactions that contribute to the current node, so that we can retrieve the right block from the disk each time.

Example. As shown in Figure 4, the access tree and the vocabulary are kept in main memory, whereas the inverted lists reside in secondary storage. Transactions 1, 3, 5, 6, 7 contribute to the path root → f, thus they are stored at the sublist of node f. Observe that f, being the most frequent item, has exactly one sublist, i.e., its inverted list comprises a single sublist, corresponding to its single appearance in the access tree. This is not necessarily the case for all the items, though. For example, the item c has two sublists. Two of the transactions of f, 1 and 6, also contribute to the path root → f → c, and they are stored in the first sublist. At the same time, transactions 2 and 4 contribute to the path root → c and they are stored at the second sublist. For storage efficiency, the individual sublists of all the different nodes of c are stored contiguously, one after the other. The nodes of the access tree point to the offset of the inverted list where their corresponding sublist begins. Note that in Figure 4 we depict only the ids that are contained in the sublists and not the labels that mark each sublist, for reasons of readability. The real structure of the sublist for the case of item f is depicted in Figure 5. The rest of the items are indexed by an inverted file and the ids of the transactions that include them are stored in the respective inverted lists. Note that, concerning the infrequent items a, d, b, their vocabulary entries point directly to the inverted lists in secondary storage without any interference with the access tree.

Updates in HTI index. When a transaction is to be inserted into or deleted from the HTI index, we practically have to perform two different updates: one to the inverted file component and one to the access tree. If the items and the order of I_fr are not modified, the case is straightforward [18]. Still, it is also possible that the order and the member items of I_fr should be changed, due to changes in the items' appearance frequency. In all such cases, the query evaluation algorithms remain correct. Considering also that, in most related application areas, the relative frequencies of the items change slowly or remain stable, the rebuilding of the index is not necessary. In any case, the frequency ordering reflects a heuristic for keeping the size of the access tree small, as reported in [5]; other orderings could also apply.
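The three components of the HTI index described in this section can be sketched as in-memory Python structures. This is a simplified illustration under stated assumptions, not the authors' implementation: the sublists live in Python lists rather than disk pages, the top k = 2 items of the example ordering are taken as I_fr, and the names `AccessNode`, `build_hti`, `ending` and `passing` are hypothetical.

```python
# Example relation D of Figure 1 and its item frequency ordering.
D = {
    1: {"f", "a", "c"}, 2: {"c", "b", "d"}, 3: {"f", "a"}, 4: {"a", "c"},
    5: {"f", "d"}, 6: {"f", "c"}, 7: {"f"},
}
ORDER = ["f", "c", "a", "d", "b"]


class AccessNode:
    def __init__(self, label, parent):
        self.label = label      # item of I_fr (None for the root)
        self.parent = parent    # navigational link to the parent node
        self.children = {}      # navigational links to the children nodes
        self.ending = []        # part (a): tids whose app ends at this node
        self.passing = []       # part (b): tids whose app continues below


def build_hti(relation, order, k):
    """Build the access tree, the same-label node lists and the inverted lists."""
    frequent = order[:k]
    root = AccessNode(None, None)
    same_label = {item: [] for item in frequent}    # vocabulary entry -> nodes
    inverted = {item: [] for item in order[k:]}     # infrequent item -> postings
    for tid in sorted(relation):
        items = relation[tid]
        app = [item for item in frequent if item in items]
        node = root
        for depth, item in enumerate(app):
            if item not in node.children:
                node.children[item] = AccessNode(item, node)
                same_label[item].append(node.children[item])
            node = node.children[item]
            part = node.ending if depth == len(app) - 1 else node.passing
            part.append(tid)
        for item in items:
            if item in inverted:
                inverted[item].append((tid, len(items)))
    return root, same_label, inverted


root, same_label, inverted = build_hti(D, ORDER, k=2)
f_node = root.children["f"]
```

In this sketch `f_node.ending` and `f_node.passing` play the role of the two parts of the sublist of Figure 5, and `same_label["c"]` holds the two linked nodes labelled c, whose sublists would be stored contiguously on disk.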
For more details on the creation and maintenance of the HTI index we refer the interested reader to the long version of the paper [18].

Compression and caching. There is a question of how the HTI index compares to inverted files when compression techniques are applied [16, 14] or a cache equal to the access tree size is given to the inverted file. As far as the former is concerned, the HTI index is complementary to compression and not competitive to it. If the inverted lists become smaller, then we can reduce the size of the HTI by using a smaller threshold. Giving cache to the inverted lists, on the other hand, may be a good solution for uniform distributions with large vocabularies. Still, the effectiveness of the cache depends on how big it is compared with the total inverted file, and it will be reduced as the size of the inverted file grows. On the contrary, the main memory requirements of the HTI index depend mostly on the size of the vocabulary, since duplicate or similar transactions do not affect its size and effectiveness. Thus, for small vocabularies and especially for skewed distributions, the HTI index is a better choice.

4. QUERY EVALUATION
In this section, we present the evaluation algorithms for the three types of queries that we are interested in: subset, equality and superset. The evaluation algorithms for all types of queries have two main stages: (a) evaluation in the access tree, and (b) evaluation in the inverted file. The evaluation in the access tree concerns the frequent items of the query set, and the evaluation in the inverted file the rest of the items. The basic idea is that we use the access points to the inverted lists offered by the trie to quickly trace the final or a candidate answer to the query. The benefit is quite significant, since the access points are given for the largest inverted lists, which correspond to the items of I_fr. This way we avoid expensive union or intersection operations between the inverted lists indexed by the access tree, and instead we implicitly perform these operations in the tree itself.

For all three cases of queries, we assume a query set of the form qs = {f_1, ..., f_k, i_{k+1}, ..., i_n}, where the first k items f_i are the frequent items of the query set, belonging to the access tree, and the next n − k items i_j are the infrequent items that are only indexed by the inverted file. In the following, we detail the evaluation techniques for each type of queries.

4.1 Subset queries
Subset queries are the most common queries executed against transaction and text collections and the most broadly studied in the research literature. Furthermore, the evaluation of many query classes, including ranking ones, partially resolves to the evaluation of subset queries.

The main idea around evaluating subset queries is that the transactions that contain the app part of the qs can easily be identified by using the access tree, without merging the respective inverted lists. This is efficiently done by tracing all the appearances of the last element of the app, f_k (which is also the least frequent in app), and then identifying which paths from the root to the f_k nodes contain the app of the qs. These paths possibly contain other frequent items too, but they necessarily contain the app of the query set. We call the set of the retrieved transaction ids candidateids. Possibly, apart from the frequent items, there are also infrequent items in the query set. The only way to access these infrequent items i_{k+1}, ..., i_n is through the inverted file. Therefore, to compute the final query answer we must find the intersection of the inverted lists of transaction ids that correspond to the infrequent items i_{k+1}, ..., i_n with the list of the already retrieved candidateids. Any transaction id that belongs to this result contains both the frequent items of the app and the infrequent items i_{k+1}, ..., i_n. The algorithm in pseudo-code is depicted in Figure 6.

Algorithm SubsetQueries
Input: An HTI index H over a dataset D, a query set qs = {f_1, ..., f_k, i_{k+1}, ..., i_n} and a query Q = {t | qs ⊆ t.s}.
Output: the t.ids of the transactions that contain qs
Method:
1. Determine the app = {f_1, ..., f_k} of the query set.
2. If app is not empty, use subsettrie(app) to retrieve the candidateids from the trie.
3. If {i_{k+1}, ..., i_n} is not empty in the query set:
4.     result = merge-join the candidateids with the inverted lists of {i_{k+1}, ..., i_n}
5. else
6.     result = candidateids
7. return result

Function subsettrie(app)
Input: An HTI index H over a dataset D, the app of the qs
Output: The candidateids, i.e., the t.ids of the transactions that contain the items of app
Method:
1. Let f_k be the last item (least frequent) of app
2. For every appearance of f_k in the trie
3.     if every item in app appears in the path from the root to the current node
4.         add the t.ids of the sublists of the current node to the candidateids
5. return candidateids

Figure 6: Algorithm for determining subset queries

Assume for example that the user asks for all transactions that contain the {f, c, a} items from the relation D depicted in Figure 1. If we evaluate the query against the inverted file depicted in Figure 2, we would have to perform a merge-join of the inverted lists of all the items in the query set. That would require six I/Os and we would only have one answer, that is t.id = 1. If, instead, we evaluate the query against the HTI index, the disk page accesses are much fewer. First, we have to identify the app of the qs, which is f → c. Then, we must trace all the c nodes of the access tree and identify the paths from root to c which contain the rest of the items of app, i.e., f. This results in only one path: root → f → c. Now we can directly retrieve the transactions that contain f and c, which are 1 and 6, by performing only 1 page access. Subsequently we can merge-join {1, 6} with the inverted list of a to retrieve the final answer. The total number of I/Os we encounter in this case is two.

In general, if the inverted lists of the items of the qs (ordered by frequency) cover l_1, ..., l_n disk pages, the worst case evaluation will require l_1 + ... + l_n I/Os.
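The algorithm of Figure 6 can be sketched in Python as follows. This is a hedged in-memory illustration rather than the authors' implementation: the sublists are plain Python lists, the merge-join of sorted lists is replaced by set intersection for brevity, an empty query set is rejected by assumption, and all names (`Node`, `build`, `subset_trie`, `subset_query`) are hypothetical. The example relation of Figure 1 is used with I_fr taken as the two most frequent items.

```python
D = {
    1: {"f", "a", "c"}, 2: {"c", "b", "d"}, 3: {"f", "a"}, 4: {"a", "c"},
    5: {"f", "d"}, 6: {"f", "c"}, 7: {"f"},
}
ORDER = ["f", "c", "a", "d", "b"]   # item frequency ordering
FREQUENT = ORDER[:2]                # assumed I_fr


class Node:
    def __init__(self, label, parent):
        self.label, self.parent = label, parent
        self.children = {}
        self.tids = []              # all transactions contributing to the node


def build(relation):
    """Build the same-label node lists and the inverted lists of infrequent items."""
    root = Node(None, None)
    by_label = {item: [] for item in FREQUENT}
    inverted = {item: [] for item in ORDER if item not in FREQUENT}
    for tid in sorted(relation):
        node = root
        for item in (x for x in FREQUENT if x in relation[tid]):
            if item not in node.children:
                node.children[item] = Node(item, node)
                by_label[item].append(node.children[item])
            node = node.children[item]
            node.tids.append(tid)
        for item in relation[tid]:
            if item in inverted:
                inverted[item].append(tid)
    return by_label, inverted


def subset_trie(app, by_label):
    """Collect the tids under every node of the last app item whose path holds app."""
    candidates = []
    for node in by_label[app[-1]]:
        path, current = set(), node
        while current.label is not None:    # walk up to the root
            path.add(current.label)
            current = current.parent
        if set(app) <= path:
            candidates.extend(node.tids)
    return sorted(candidates)


def subset_query(qs, by_label, inverted):
    if not qs:
        raise ValueError("this sketch assumes a non-empty query set")
    app = [item for item in FREQUENT if item in qs]
    rest = [item for item in qs if item not in FREQUENT]
    if any(item not in inverted for item in rest):
        return []                           # item outside the vocabulary
    result = set(subset_trie(app, by_label)) if app else None
    for item in rest:                       # set intersection instead of merge-join
        postings = set(inverted[item])
        result = postings if result is None else result & postings
    return sorted(result)


by_label, inverted = build(D)
answer = subset_query({"f", "c", "a"}, by_label, inverted)
```

For the query set {f, c, a} of the running example, the trie stage keeps only the node of the path root → f → c, and intersecting its transactions with the inverted list of a leaves transaction 1 in `answer`.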
This holds for both the inverted file and the HTI index, but as the experiments in Section 5 show, the average cases clearly favor the HTI index. The benefit from using the access tree comes from the fact that we avoid performing intersections between the largest lists. This benefit can potentially be very significant, especially if the frequent items are not correlated. Moreover, the larger the lists are and the greater the skewness of the items distribution is, the greater the benefit we gain from using the access tree.

Some more technical notes should also be made for the algorithm of Figure 6. Whereas the simplified form of the algorithm implies that we use the access tree to actually retrieve the t.ids from the disk and put them in the candidateids, this is not the most effective implementation in most cases. Instead, we return the links to the sublists in the disk, which are then merge-joined with the lists of the {i_{k+1}, ..., i_n} items. Furthermore, the merge-join is performed by starting from the least frequent item, thus it is not always necessary to use the access tree. In some cases, we can quickly decide that there is no solution, by intersecting the smaller lists, and avoid any further computation. Practically, the algorithm first traverses the access tree and decides if there is a solution for the app items, and how many disk page accesses it will need to retrieve them. Depending on how many it will need, the algorithm decides the order of the merge-joins, i.e., whether it will start from the trie or the inverted file.

4.2 Equality queries

Employing the HTI index for equality queries leads to very efficient evaluations. For each query, only one path of the access tree has to be identified. This is the path which is identical to the app of the query set. Assuming that nodes are organized in some efficient data structure, like hash arrays, the evaluation on the trie can be done in time O(|app|), that is, proportional to the size of the app of the query set. After identifying the single sublist that possibly satisfies the query, it has to be intersected with the lists of the non-frequent items.
In the process of the merge-join, the transactions are filtered according to their length, which must be equal to |qs|. We refer the interested reader to the long version of the paper [18] for the pseudocode of the evaluation algorithm. The worst case in terms of page accesses is again the same as for subset queries. Still, experiments show that whereas evaluating equality queries in the inverted file requires as many disk page accesses as the respective subset queries did, the results with the HTI index are a lot better in this case.

4.3 Superset queries

Superset queries are by far the most expensive queries we study. In a sense, a superset query is equivalent to 2^n equality queries, one for each of its subsets. The evaluation algorithms, even those that work only in the inverted file, require significantly fewer disk page accesses than 2^n equality queries, but still the number is high. If the lists of the items of the qs (ordered by frequency) need l_1, ..., l_n disk pages respectively, evaluating a superset query solely in the inverted file, with the algorithm presented in Figure 7, requires in the worst case l_1 + 2 l_2 + ... + n l_n disk page accesses.

As in the case of equality, the access tree can drastically boost the efficiency of the query evaluation. The basic idea is to find all the paths in the trie which are solely constructed by items from the app of the query. Then we can safely add to candidateids the ids of all the transactions that end in any node of these paths. For these transactions we know that they do not contain any other item of I_r, except for f_1, ..., f_k. If the qs has non-frequent items too, then we have to check in the inverted file if the remaining items of the transactions of candidateids contain only items from i_{k+1}, ..., i_n. If the qs does not contain any other items, we filter the candidateids using their length and the length of the path that led to them as pruning criteria. If their length is greater than the length of their app, which can be inferred from the trie without examining the transaction itself, the transaction is dropped, since it must have more items that are not contained in qs. The algorithm for evaluating the superset query is presented in Figure 7.

The reduction of the disk pages accessed, when using the HTI index for superset queries, is not only attributed to the access points offered by the trie.
It is also a result of the possibility of identifying exactly the transactions whose app ends at the access tree nodes, as opposed to the rest of the transactions within the same sublist.

Algorithm SupersetQueries
Input: An HTI index H over a dataset D, a query set qs = {f_1, ..., f_k, i_{k+1}, ..., i_n} and a query Q = {t | qs ⊇ t.s}.
Output: the t.ids of the transactions t where t.s ⊆ qs
Method:
1. Determine the app = {f_1, ..., f_k} of the query set.
2. If app is not empty, use supersettrie(app, root) to retrieve the candidateids from the trie.
3. Let il_1 ... il_m be the lists of all the non-frequent items of the qs and the candidateids, ordered according to the number of memory pages
4. for (i = 1; i ≤ n; i++)
5.    for each entry t of il_i
6.       unmatched = t.length − 1
7.       if (unmatched == 0) add t to result and break
8.       for (j = i + 1; j ≤ n; j++)
9.          if (unmatched > n − j) break
10.         if (unmatched == 0) add t to result and break
11.         scan forward il_j
12.         if t found in il_j, unmatched = unmatched − 1
13. return result

Function supersettrie(app, currentnode)
Input: An HTI index H over a dataset D, the app of the qs, the root of the trie as currentnode
Output: The candidateids, i.e. the t.ids of the transactions whose items are contained in app
Method:
1. while (app not empty)
2.    newcnode = pop(app)
3.    if newcnode is child of currentnode
4.       add the sublist of newcnode to candidateids
5.       supersettrie(app, newcnode)
6. return candidateids

Figure 7: Algorithm for determining superset queries

5. EXPERIMENTAL STUDY

As several surveys and previous research have demonstrated, inverted files, although a simple technique, offer better performance than signature based methods for low cardinality set values [7] and for document indexing [21]. Moreover, they outperform traditional indices like B-trees for containment queries in RDBMSs [20]. For the aforementioned reasons, we chose inverted files as the main point of reference for the evaluation of the HTI index.

5.1 Methodology

HTI index. We have implemented a prototype of the HTI index according to the description we gave in Section 3. Since query evaluation performance is dominated by disk accesses, our implementation is aimed at providing accurate results on the number of disk page accesses during query evaluation on the HTI index. Some aspects of the index functionality were simulated; disk pages are 4k arrays in main memory, and sibling nodes are stored in linked lists instead of arrays. This implementation provides accurate results both on the page accesses and on the size of the access tree in main memory. The former are explicitly counted by the program and the latter can be computed by ignoring the links between sibling nodes.

Inverted files. We have implemented a basic version of the inverted file index. The vocabulary is kept in a hash table and the lists in 4k arrays corresponding to disk pages. Each entry in the inverted file comprises the id and the length of each transaction. The size of each entry is e_s = sizeof(long int) + sizeof(short int), which is 6 bytes in our case.

Real data. We have evaluated HTI on two real datasets from the UCI KDD archive [8]. Both of them are logs of user behavior on web portals. The first one, denoted as msweb, is a one-week log tracing the virtual areas that users visited in the web portal. Each record corresponds to a user session and the set value comprises the areas she/he visited. There are 32k records and the vocabulary of the dataset contains 294 distinct items (areas). The distribution of the items in the records is skewed and the average size of the record is 3 items. Since the dataset is small, to better illustrate the performance of the two indices, we created a new one by duplicating the records by a factor of 10, which resulted in a dataset of 320k records. This multiplication is reasonable, since it simply corresponds to a 10-week log. The second dataset, denoted msnbc, is again a log of users' behavior on the web portal of msnbc.com, taken from the UCI KDD archive as well. The vocabulary here is very limited, comprising only 17 distinct items and, unlike the previous one, the distribution of the items is relatively uniform. The average size of the record is 5.7 items.

Synthetic data. To investigate how HTI behaves for datasets and domains larger than the ones we had from real sources, we used synthetic data, with a skewed zipfian distribution of order 1 (as in [7]). Duplicates in each transaction were dropped and we ended up with transactions with lengths from 2 to 22 items, uniformly distributed.

Query generation. We created query sets for all three types of queries. As in other approaches [7], we consider the evaluation of the proposed method on queries that always have a solution as more informative. We created such queries by randomly selecting existing transactions from D. For the synthetic data, we ranged the number of items in the query set, |qs|, from 2 to 22 and we created 5 queries of each type. For the real data, we ranged |qs| from 2 to 7, since their domain and the average record length are a lot smaller. The selectivities of the subset queries are less than 3%, with the highest appearing for queries with |qs| = 2. The most common case for larger |qs| and for equality queries is that there are fewer than 5 answers. On the other hand, the selectivity of superset queries can surpass 3% for large |qs| on the real data.

Evaluation metrics. We evaluate the HTI index by considering two main factors: (a) the benefit it provides to query evaluation, compared to regular inverted files, and (b) the main memory requirements it imposes. We evaluate the benefit to query evaluation by counting disk page accesses as the dominating factor of the problem. We show how main memory requirements are affected for the different D parameters by providing the number of access tree nodes.

Experimental setup. We implemented both methods in C, on a Linux platform (Suse 9.3), and compiled them with gcc. Our experiments were performed on an AMD Sempron 2800+ with 2G of main memory.
The disk page accesses were directly counted by the program, by tracing how many of the 4k arrays were accessed.

5.2 Performance of the HTI index

Real data. To measure the benefit on query evaluation provided by the HTI on real data, we evaluated subset, equality and superset queries against the inverted file and the HTI index. For the case of the HTI index we varied the threshold, i.e., the percentage of items that comprise the I_r. The results are depicted in Figures 8 and 9. For the case of msweb data, which are skewed but have a larger vocabulary than msnbc data, we used as thresholds 5%, 20% and 40%. The size of the access tree that must be kept in main memory is small in all cases, with the biggest being around 35k, for threshold 40%. For the case of msnbc data, where the vocabulary is very small, we used the thresholds 20%, 60% and 100%. The largest access tree in this case is around 2k, for threshold 100%. Note that for a threshold of 100%, all items of I are indexed by the access tree, thus for all types of queries no false positives are retrieved from the disk (we can infer the length of a transaction by the length of the access tree path if all items are indexed by the access tree). As we can see, the HTI index outperforms the inverted file in all cases. Moreover, it scales a lot better as the size of the query grows. For the larger queries, the performance of HTI (with a suitable threshold) is at least an order of magnitude better for all types of queries.

Synthetic data. By using synthetic data we are able to trace the impact of the vocabulary size |I|, the size of the dataset |D| and the size of the query set |qs| on the HTI index. In the following we investigate how each of the query types we introduced is affected by these factors.

Subset. In Figure 10 we see how the inverted file and the HTI index perform for subset queries. We compare three versions of the HTI index with the inverted file, each time varying the threshold. Consider the first variant of the HTI index with an I_r of only the top 0.5% of the total items.
In all three experiments of Figure 10, we count the average number of page accesses performed by all our queries on all our datasets as a function of (a) the size of the vocabulary, |I| (left); (b) the size of the underlying database, |D| (center); and (c) the number of items belonging to the query set, |qs| (right). In all three cases, results are given for the average value of all parameters that do not appear in each figure. Thus, when varying |D|, we present the average of the results for all |I| and |qs|; when we vary |I|, we present the average of the results for all |D| and |qs|; and when we vary |qs|, we present the average of the results for all |D| and |I|. Individual results obey the general trend and are omitted in the interest of space.

In all cases, the HTI index outperforms the inverted file by a significant factor. It is important to note that the HTI seems to scale a lot better for large databases and large queries; whereas in the average case the increase of |D| seems to have a linear impact on the disk page accesses for both methods, the gradient of the HTI index performance is significantly smaller. The larger the threshold is, the smaller the disk page access increase is. Furthermore, the increase of |qs| has a diverging impact on the performance of the inverted file and the HTI index. In the former case it is followed by a proportional increase in disk page accesses, whereas in the latter case the required number of page accesses is reduced. This is due to the fact that when dealing with large queries, the chance of having more items from I_r is greater, thus the chance of performing a more effective pruning in the access tree is greater. The increase of the vocabulary size seems beneficial both for the HTI index and the inverted file, but as we show in the experiments for the HTI size, it significantly augments the memory requirements for the access tree.

Equality. Equality queries favor the HTI index even more. In Figure 11 we assess the number of disk page accesses for equality queries as a function of (a) the vocabulary size, |I| (left), (b) the size of the underlying database, |D| (center) and (c) the number of items of the query set, |qs| (right).
The evaluation in the inverted file requires exactly the same disk page accesses for equality queries as it did for subset queries. On the other hand, evaluating equality queries in the HTI requires less than half of the disk page accesses it did for the respective subset ones. This effect is even greater for queries with low cardinality qs. The main reason that makes equality queries behave better with the HTI index is that each query requires retrieving one list from the access tree at most.
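To make the contrast between the two evaluations concrete, the following is a minimal in-memory sketch in Python, not the authors' C implementation. It assumes a simplified access tree in which every node keeps the ids of all transactions whose frequent-item path passes through it (standing in for the on-disk sublists), and a plain dictionary for the lists of the infrequent items. The names `Node`, `build_index`, `subset_query` and `equality_query`, as well as the toy transactions, are hypothetical. The subset query has to visit every occurrence of the least frequent app item, whereas the equality query follows a single root-to-node path.

```python
from collections import defaultdict


class Node:
    """Access-tree node (simplified, in-memory stand-in for the paper's trie)."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.parent = parent
        self.children = {}   # item -> child Node
        self.through = []    # ids of transactions whose path passes through here


def build_index(transactions, frequent):
    """Build the toy index; `frequent` lists the indexed items, most frequent first."""
    rank = {item: r for r, item in enumerate(frequent)}
    root = Node()
    occurrences = defaultdict(list)  # frequent item -> every node labelled with it
    inverted = defaultdict(list)     # infrequent item -> transaction ids
    lengths = {}                     # transaction id -> number of items
    for tid, items in sorted(transactions.items()):
        lengths[tid] = len(items)
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                occurrences[item].append(node.children[item])
            node = node.children[item]
            node.through.append(tid)
        for item in items:
            if item not in rank:
                inverted[item].append(tid)
    return root, occurrences, inverted, lengths, rank


def _split(qs, rank):
    """Split a query set into its app (ordered by frequency) and infrequent items."""
    app = sorted((i for i in qs if i in rank), key=rank.get)
    return app, [i for i in qs if i not in rank]


def _intersect(candidates, rest, inverted):
    """Intersect the trie candidates with the lists of the infrequent items.
    An empty query set yields no candidates in this sketch."""
    for item in rest:
        tids = set(inverted.get(item, ()))
        candidates = tids if candidates is None else candidates & tids
    return candidates if candidates is not None else set()


def subset_query(index, qs):
    """Ids of transactions containing qs: check every occurrence of the last app item."""
    root, occurrences, inverted, lengths, rank = index
    app, rest = _split(qs, rank)
    candidates = None
    if app:
        candidates, needed = set(), set(app[:-1])
        for node in occurrences.get(app[-1], ()):
            seen, cur = set(), node.parent
            while cur is not None and cur.item is not None:
                seen.add(cur.item)
                cur = cur.parent
            if needed <= seen:
                candidates.update(node.through)
    return sorted(_intersect(candidates, rest, inverted))


def equality_query(index, qs):
    """Ids of transactions equal to qs: follow the single path spelled by the app."""
    root, occurrences, inverted, lengths, rank = index
    app, rest = _split(qs, rank)
    node = root
    for item in app:
        node = node.children.get(item)
        if node is None:
            return []
    candidates = set(node.through) if app else None
    matches = _intersect(candidates, rest, inverted)
    return sorted(t for t in matches if lengths[t] == len(qs))


# Hypothetical toy data: only "f" and "c" are indexed by the access tree.
transactions = {1: {"f", "c", "a"}, 2: {"f", "c"}, 3: {"f", "b"},
                4: {"c", "a"}, 5: {"f", "c", "a", "m"}}
index = build_index(transactions, frequent=["f", "c"])
print(subset_query(index, {"f", "c", "a"}))    # [1, 5]
print(equality_query(index, {"f", "c", "a"}))  # [1]
```

In this toy setting the subset query inspects both nodes labelled "c" before intersecting with the list of "a", while the equality query descends root, "f", "c" once and then filters by transaction length.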

Figure 8: Average performance of queries on msweb data (panels: Subset, Equality, Superset; one series per threshold)
Figure 9: Average performance of queries on msnbc data (panels: Subset, Equality, Superset; one series per threshold)
Figure 10: Average performance of subset queries
Figure 11: Average performance of equality queries
Figure 12: Average performance of superset queries
Figure 13: Effect of |I| (number of tree nodes versus |I|)
Figure 14: Effect of |D| (number of tree nodes versus |D|)
Figure 15: |I| = 5k, 0.5% (number of tree nodes versus |D| in millions of transactions)
Figure 16: Effect of k (average page accesses and number of tree nodes versus threshold)

Superset. As can be inferred from Figure 12, in superset queries the HTI index clearly outperforms the inverted file index. The inverted file performs very poorly, since it requires multiple scans of many lists. Note that the disk page accesses performed in the evaluation of the superset queries surpass those needed by subset and equality queries by almost an order of magnitude.

5.3 Memory requirements of the HTI index

The size of the access tree of the HTI index for the real datasets we used is very small; for the case of the msweb data it has only 1857 nodes (around 33kb) for a threshold of 5%, and in the worst case (threshold 40%) it has 2569 nodes (around 369kb). For the case of msnbc data, it has only 7 nodes for a threshold of 20% and in the worst case (threshold 100%) it has nodes (26kb). The size of the access tree is important, since it has to be resident in main memory; therefore, we investigated how it scales for larger |D| and |I| by using synthetic data. Figures 13 and 14 show how the access tree is affected by the vocabulary size |I| and the size of the database |D|. An interesting observation is that for smaller vocabularies, where the queries take longer to evaluate due to the existence of larger lists, the size of the access tree is smaller, too. This means that we can create HTI indices with larger thresholds to counter this effect. As the vocabulary increases, the maximum size of the trie grows superlinearly; thus, for large vocabularies the access tree tends to increase in proportion to the database size. For small vocabularies, the size of the access tree grows sublinearly (or remains stable if the maximum size has been reached) with respect to the database size. This is evident in Figure 15, where we vary the size of the database while keeping the vocabulary cardinality at 5k and the HTI threshold at 0.5%. In the respective experiment with |I| = 1k the tree reaches its maximum size (31 nodes) very soon and remains invariant to the size of D.
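The saturation reported above can be related to a simple counting bound. The sketch below assumes that the frequent items of each transaction are inserted into the trie in one fixed global order; under that assumption every trie node corresponds to a distinct non-empty subset of the k indexed items, so the tree can never exceed 2^k − 1 nodes (excluding the root), regardless of the database size. If a 0.5% threshold over a 1k-item vocabulary selects k = 5 items, which is our reading of the setting rather than something the text states explicitly, the bound is 31, which agrees with the maximum size quoted above. The function name `count_trie_nodes` and the item labels are hypothetical.

```python
from itertools import combinations


def count_trie_nodes(transactions, frequent):
    """Insert the frequent items of each transaction, in a fixed order,
    into a trie of nested dicts and return the number of non-root nodes."""
    rank = {item: r for r, item in enumerate(frequent)}
    root, count = {}, 0
    for items in transactions:
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.get):
            if item not in node:
                node[item] = {}
                count += 1
            node = node[item]
    return count


frequent = ["i1", "i2", "i3", "i4", "i5"]             # k = 5 indexed items
every_subset = [set(c) for r in range(1, 6)
                for c in combinations(frequent, r)]    # all 31 non-empty subsets

small_db = every_subset                       # each combination appears once
large_db = every_subset * 1000                # same combinations, many more records
noisy_db = [s | {"rare"} for s in large_db]   # infrequent items add no trie nodes

print(count_trie_nodes(small_db, frequent))   # 31
print(count_trie_nodes(large_db, frequent))   # 31
print(2 ** len(frequent) - 1)                 # 31
```

Once every combination of indexed items has been seen, adding further transactions only lengthens the sublists, not the tree, which is one way to read the invariance to the size of D.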
5.4 Threshold choice

Whereas the vocabulary and the database size depend on the data we have, the threshold for the HTI index is a choice we must make according to the speed requirements and the memory we have at our disposal. To highlight its effect we created several HTI indices for different thresholds and we show their performance in Figure 16 by varying the threshold from 0.2% to 1%. We depict simultaneously how the access tree grows (in number of nodes) and how the average disk page accesses for the three types of queries fall as the threshold grows. After a certain threshold the average disk page accesses are not significantly reduced, whereas the size of the access tree continues to grow, even if not as fast as for very low thresholds.

6. CONCLUSIONS

In this paper we have tackled the problem of containment queries on large collections of low cardinality set-valued attributes. We have proposed a novel indexing scheme, the HTI index, which superimposes a trie tree (kept in main memory) over an inverted file (kept in secondary storage) to efficiently answer subset, superset and set-equality queries. We have introduced novel evaluation algorithms for these classes of queries that use the HTI index and experimentally demonstrated that the HTI clearly outperforms the state-of-the-art organization scheme, i.e., the inverted file, with reasonable main-memory overhead. Our experiments have shown that our approach scales a lot more smoothly than inverted files and that in certain cases, for large database or query-set sizes, we can reduce the disk page accesses by orders of magnitude, with a small overhead of main memory. Future work comprises further investigations on how to reduce the size of the access tree and how to exploit the HTI index to efficiently support other kinds of queries, for example set intersections or similarity queries.

7. REFERENCES

[1] R. A. Baeza-Yates and B. A. Ribeiro-Neto. Modern Information Retrieval. ACM Press / Addison-Wesley.
[2] C. Faloutsos. Signature files. In Information Retrieval: Data Structures & Algorithms.
[3] A. Gionis, D. Gunopulos, and N. Koudas. Efficient and tunable similar set retrieval. In SIGMOD, 2001.
[4] R. Goldman and J. Widom. WSQ/DSQ: A practical approach for combined querying of databases and the web. In SIGMOD, 2000.
[5] J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without candidate generation. In SIGMOD, 2000.
[6] J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1):53-87, 2004.
[7] S. Helmer and G. Moerkotte. A performance study of four index structures for set-valued attributes of low cardinality. VLDBJ, 12(3), 2003.
[8] S. Hettich and S. D. Bay. The UCI KDD Archive. University of California, Department of Information and Computer Science.
[9] R. Kaushik, R. Krishnamurthy, J. F. Naughton, and R. Ramakrishnan. On the integration of structure indexes and inverted lists. In SIGMOD, 2004.
[10] D. E. Knuth. The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley.
[11] N. Mamoulis. Efficient processing of joins on set-valued attributes. In SIGMOD, 2003.
[12] N. Mamoulis, D. W. Cheung, and W. Lian. Similarity search in sets and categorical data using the signature tree. In ICDE, 2003.
[13] S. Melnik and H. Garcia-Molina. Adaptive algorithms for set containment joins. ACM TODS, 28(1):56-99, 2003.
[14] A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM TOIS, 14(4), Oct.
[15] S. Sarawagi and A. Kirpal. Efficient set joins on similarity predicates. In SIGMOD, 2004.
[16] F. Scholer, H. E. Williams, J. Yiannis, and J. Zobel. Compression of inverted indexes for fast query evaluation. In ACM SIGIR, Aug. 2002.
[17] M. Stonebraker and D. Moore. Object-Relational DBMSs: The Next Great Wave. Morgan Kaufmann.
[18] M. Terrovitis, S. Passas, P. Vassiliadis, and T. Sellis. HTI technical report. mter/papers/ TR-HTI-1.pdf, 2006.
[19] I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, 2nd edition.
[20] C. Zhang, J. F. Naughton, D. J. DeWitt, Q. Luo, and G. M. Lohman. On supporting containment queries in relational database management systems. In SIGMOD, 2001.
[21] J. Zobel, A. Moffat, and K. Ramamohanarao. Inverted files versus signature files for text indexing. ACM TODS, 23(4):453-490.
[22] J. Zobel, A. Moffat, and R. Sacks-Davis. An efficient indexing technique for full text databases. In VLDB.


An Efficient and Scalable Approach to CNN Queries in a Road Network An Effiient and Salable Approah to CNN Queries in a Road Network Hyung-Ju Cho Chin-Wan Chung Dept. of Eletrial Engineering & Computer Siene Korea Advaned Institute of Siene and Tehnology 373- Kusong-dong,

More information

Path Sharing and Predicate Evaluation for High-Performance XML Filtering*

Path Sharing and Predicate Evaluation for High-Performance XML Filtering* Path Sharing and Prediate Evaluation for High-Performane XML Filtering Yanlei Diao, Mihael J. Franklin, Hao Zhang, Peter Fisher EECS, University of California, Berkeley {diaoyl, franklin, nhz, fisherp}@s.erkeley.edu

More information

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index IJCSES International Journal of Computer Sienes and Engineering Systems, ol., No.4, Otober 2007 CSES International 2007 ISSN 0973-4406 253 An Optimized Approah on Applying Geneti Algorithm to Adaptive

More information

Tree Awareness for Relational DBMS Kernels: Staircase Join

Tree Awareness for Relational DBMS Kernels: Staircase Join Tree Awareness for Relational DBMS Kernels: Stairase Join Torsten Grust 1 and Maurie van Keulen 2 1 Department of Computer and Information Siene, University of Konstanz, P.O. Box D188, 78457 Konstanz,

More information

A {k, n}-secret Sharing Scheme for Color Images

A {k, n}-secret Sharing Scheme for Color Images A {k, n}-seret Sharing Sheme for Color Images Rastislav Luka, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos The Edward S. Rogers Sr. Dept. of Eletrial and Computer Engineering, University

More information

Mining effective design solutions based on a model-driven approach

Mining effective design solutions based on a model-driven approach ata Mining VI 463 Mining effetive design solutions based on a model-driven approah T. Katsimpa 2, S. Sirmakessis 1,. Tsakalidis 1,2 & G. Tzimas 1,2 1 Researh ademi omputer Tehnology Institute, Hellas 2

More information

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating Capturing Large Intra-lass Variations of Biometri Data by Template Co-updating Ajita Rattani University of Cagliari Piazza d'armi, Cagliari, Italy ajita.rattani@diee.unia.it Gian Lua Marialis University

More information

OvidSP Quick Reference Card

OvidSP Quick Reference Card OvidSP Quik Referene Card Searh in any of several dynami modes, ombine results, apply limits, use improved researh tools, develop strategies, save searhes, set automati alerts and RSS feeds, share results...

More information

Parametric Abstract Domains for Shape Analysis

Parametric Abstract Domains for Shape Analysis Parametri Abstrat Domains for Shape Analysis Xavier RIVAL (INRIA & Éole Normale Supérieure) Joint work with Bor-Yuh Evan CHANG (University of Maryland U University of Colorado) and George NECULA (University

More information

A Novel Validity Index for Determination of the Optimal Number of Clusters

A Novel Validity Index for Determination of the Optimal Number of Clusters IEICE TRANS. INF. & SYST., VOL.E84 D, NO.2 FEBRUARY 2001 281 LETTER A Novel Validity Index for Determination of the Optimal Number of Clusters Do-Jong KIM, Yong-Woon PARK, and Dong-Jo PARK, Nonmembers

More information

P-admissible Solution Space

P-admissible Solution Space P-admissible Solution Spae P-admissible solution spae or Problem P: 1. the solution spae is inite, 2. every solution is easible, 3. evaluation or eah oniguration is possible in polynomial time and so is

More information

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks International Journal of Advanes in Computer Networks and Its Seurity IJCNS A Load-Balaned Clustering Protool for Hierarhial Wireless Sensor Networks Mehdi Tarhani, Yousef S. Kavian, Saman Siavoshi, Ali

More information

Optimizing Correlated Path Queries in XML Languages. Technical Report CS November 2002

Optimizing Correlated Path Queries in XML Languages. Technical Report CS November 2002 Optimizing Correlated Path Queries in XML Languages Ning Zhang and M. Tamer Özsu Tehnial Report CS-2002-36 November 2002 Shool Of Computer Siene, University of Waterloo, {nzhang,tozsu}@uwaterloo.a 1 Abstrat

More information

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 9, September 2013 ISSN: 2277 128X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om A New-Fangled Algorithm

More information

