Discovering Relational Patterns across Multiple Databases




Xingquan Zhu (1, 3) and Xindong Wu (2)
(1) Dept. of Computer Science & Eng., Florida Atlantic University, Boca Raton, FL 33431, USA
(2) Dept. of Computer Science, University of Vermont, Burlington, VT 05405, USA
(3) Graduate University, Chinese Academy of Sciences, Beijing 100080, China
xqzhu@cse.fau.edu; xwu@cs.uvm.edu

Relational patterns across multiple databases can reveal special pattern relationships hidden inside data collections. Existing research in data mining has made significant efforts in discovering different types of patterns from single or multiple databases, but how to find patterns that have a higher support in database A than in database B with a given support threshold α is still an open problem. We propose in this paper DRAMA, a systematic framework for Discovering Relational patterns Across Multiple databases. More specifically, given a series of data collections, we try to discover patterns from different databases with pattern relationships satisfying the user-specified constraints. Our method seeks to build a Hybrid Frequent Pattern tree (HFP-tree) from multiple databases, and mine patterns from the HFP-tree by integrating users' constraints into the pattern mining process.

1. Introduction

Many real-world applications involve the collection and management of multiple databases. Examples include market basket transaction data from different branches of a wholesale store, data collections of a particular branch in different time periods, census data of different states in a particular year, and data of a certain state in different years. For years, knowledge discovery and data mining (also referred to as KDD) [-] has been proven to be an effective tool to search for novel and actionable patterns and relationships that exist in the data.
When patterns take the form of association rules, existing research in the area has made significant efforts in discovering patterns (frequent itemsets, closed patterns or sequential patterns) from different types of data environments, with solutions roughly falling into the following three categories: (1) finding patterns from a single (large volume) database; (2) finding patterns from multiple databases; and (3) finding patterns from continuous data streams. The essential goal is to enhance mining algorithms such that they can scale up well to large volumes of (centralized, distributed or continuous) data.

To find patterns from multiple databases, a common concern is to discover knowledge which does not exist unless one unifies all data collections into a single view. For this purpose, existing research has been mainly targeted at discovering global patterns, with the assistance of a local data mining process. Collective data mining [3] is one of the most representative efforts in the area, with the objective of unambiguous local analysis that can be used as a building block for generating the correct global results. A common practice is to conduct data mining on each single database, and then forward promising meta patterns to a central place for analysis [4].

This research has been supported by the US National Science Foundation (NSF) under Grant No. CCF and the National Science Foundation of China (NSFC) under Grant No

The problem of finding global patterns is surely important in reality, as it reveals knowledge which is unavailable from each single database's point of view. There is, however, another problem involved in pattern mining from multiple databases: discovering relational patterns and their relationships across databases.
Taking a retail store with two branches A and B as an example, if a store manager were organizing data from these two branches for intelligent analysis, he/she may easily raise concerns like (1) what are the frequent patterns in both A and B, i.e., (A ≥ α) & (B ≥ α), where α is the threshold in finding frequent patterns, and A ≥ α means that a pattern's support value in database A should be no less than the value α; (2) what are the frequent patterns which appear more often in A than in B, i.e., A > B ≥ α; and (3) what are the patterns whose support differences in these two stores are no less than the value α, i.e., |A-B| ≥ α. There are possibly many other concerns in this regard, but unfortunately, no systematic solution has been proposed to address this issue in an effective way, such that the discovered relational patterns can be used to support efficient and effective data and knowledge management.

In reality, when users are exposed to data collected from multiple sources, it is natural to turn to a contrast study for knowledge and pattern discovery. Examples include national census data analysis, network intrusion detection, and molecular genetic data analysis. We list here two motivating examples.

Example 1: Consider a data expert who is interested in studying residents of the north-eastern states of America (i.e., the so-called New England area, including the states of Connecticut (CT), Maine (ME), Massachusetts (MA), New Hampshire (NH), Rhode Island (RI), and Vermont (VT)). This expert may also be interested in finding the similarities/differences between residents in this area and the residents on the West Coast, say California (CA). For these purposes, the following queries are likely to be raised by the expert.

Query 1. Finding patterns that are frequent with a support level of α in all of the New England states, but significantly infrequent with a support level of β in California, i.e., {(CT ≥ α) & (ME ≥ α) & (MA ≥ α) & (NH ≥ α) & (RI ≥ α) & (VT ≥ α)} & {CA < β}.

Query 2. Finding patterns that are frequent with a support level of α in the New England area, w.r.t.
all states, i.e., {(CT+ME+MA+NH+RI+VT) ≥ α}.

Query 3. Finding patterns that are frequent with a support level of α in all New England states, but with their supports declining from northern to southern states, i.e., {ME > (NH | VT) > MA > (CT | RI) ≥ α}.

Example 2: Recent developments in microbiology and bioinformatics have made it possible to extract gene expression data for molecular genetic analysis. One of the most important applications is to use such gene expression data for genetic disease profiling, for example, the molecular cancer classification
/07/$ IEEE.

[5]. In order to detect signature patterns for Leukemia, say Acute Myeloid Leukemia (AML) and Acute Lymphoblastic Leukemia (ALL), a microbiologist can split the underlying data into four datasets, with D1 containing gene expression data of normal tissues, D2 containing data of AML tissues, D3 containing all ALL tissues, and D4 containing all other cancer tissues. Queries of the following types can then be used to capture the signature patterns for cancer classification.

Query 1: Finding the patterns that are frequent with a support level of α in any of the cancer datasets D2, D3, or D4, but are significantly infrequent in D1, i.e., {(D2 | D3 | D4) ≥ α} & {(D1 < β)}.

Query 2: Finding the patterns that are frequent with a support level of α in all cancer datasets, but with support in Leukemia tissues higher than in other cancer tissues, i.e., {(D2 | D3) ≥ D4 ≥ α}.

There are many other applications, aside from the above two examples, in which users have to deal with data from different sources. In addition, it is often the case that users know some basic features of these data collections, such as the date and time each database was collected, or the region or entity each database may represent. What remains unclear is the relationship of the patterns hidden across multiple data collections. As a result, the needs of comparing patterns from different datasets and understanding their relationships are emerging. For example, store managers may want to find gradually increasing shopping patterns of their customers in a certain period of time, or a microbiologist may want to find patterns of the diseases along an evolving order. For these purposes, discovering relational patterns across multiple databases can be a very important part of the KDD process. Although well motivated, a solution to this end requires an efficient mechanism for complex querying and mining on multiple databases.
2. Simple Solutions and Challenges

In a naïve sense, the problem of discovering relational patterns across multiple databases can be solved by three simple solutions: (1) Sequential Pattern Verification (SPV); (2) Parallel Pattern Mining (PPM); and (3) Collaborative Pattern Mining (CPM).

SPV starts pattern mining from a seed database (which can be a subset of a database in the query) and then passes on the discovered patterns to the second database for verification. Such a sequential process repeats until patterns have been verified by all the databases involved in the query. For example, to answer Query 1 in Example 1, SPV may start from the CT database to find frequent patterns, then pass on the patterns to database ME to find patterns frequent in both CT and ME. Any patterns which do not satisfy the query are pruned out immediately. This process repeats until all the databases in the query have verified the patterns.

Instead of verifying patterns in a sequential way, PPM concurrently discovers patterns from each single database, and then forwards all frequent patterns (from each single database's point of view) to a central place to find the ones which satisfy the query constraints. For example, to answer Query 1 in Example 1, PPM concurrently discovers patterns from each single database (CT, ME, .., and VT), and then checks whether a pattern satisfies the query or not. One should be aware that it is technically infeasible to find patterns which satisfy CA < β by using database CA only, because no deterministic pruning rule holds and one would have to list all the candidates. Therefore, PPM concurrently mines patterns from all other parts (CT, ME, .., and VT), and then passes on the patterns to CA to verify whether they satisfy CA < β.

Both SPV and PPM rely on the results discovered from a single database for pattern verification, where the mining process (candidate generation and pruning) at each single site does not consider the existence of other databases at all (unless the patterns were forwarded to other databases for verification).
As we will discuss later, this single-database-based framework prevents both SPV and PPM from answering some complex queries. This disadvantage can be overcome by CPM, which unifies all databases in the query into one view for candidate generation and verification. The theme of CPM is to generate length-l candidates from each single database, with all candidates forwarded to a central place for candidate justification, such that only candidates satisfying certain conditions are redispatched to each database for the next round of pattern growing (length-(l+1)). This procedure repeats until no more candidates can be generated.

All three methods above can somewhat fulfill the goal of finding relational patterns across multiple databases, although not necessarily for all types of queries. For example, SPV and PPM cannot possibly answer Query 2 in Example 1, because a pattern satisfying {(CT+ME+MA+NH+RI+VT) ≥ α} can take any support value in each single database. For example, if a pattern's support was 0 in CT, ME, MA, NH, and RI, but α in VT, it would still satisfy the query. To find such patterns, SPV and PPM have to list all possible candidates (by setting each database's threshold value to 0), which is technically infeasible.

In fact, the most serious disadvantage of all three methods lies in the fact that they are all Apriori-based, where pattern generation and database rescanning for verification significantly reduce their speed in finding relational patterns. It is commonly recognized that database rescanning for pattern verification can be very time consuming, especially when the underlying data volumes are large. Therefore, we need a fundamentally different design which takes the following concerns into consideration in discovering relational patterns.

(1) Being able to unify all databases in the query to fulfill the pattern discovery process. In other words, conducting pattern mining from a single database without considering all other databases is not an option for us.

(2) Being able to meet all queries listed in the above two examples.
In Section 4, we will formally define our problem and queries, which should also be addressed by our solutions.

(3) Being able to scale well to large data volumes, and being easily extended to discover types of relational patterns other than frequent itemsets.

In this paper, we take the above concerns into consideration and propose a hybrid frequent pattern (HFP) tree based solution. Our method seeks to build a single HFP-tree for each query, where pattern generation and verification unify the underlying databases to speed up the pruning process. Experimental comparisons on both synthetic and real-world databases will demonstrate that this framework can significantly enhance the speed of finding relational patterns, where the improvement can be more than two orders of magnitude over the simple solutions.

3. Related Work

The problem of handling data from multiple databases is a nontrivial task in reality, and it often raises concerns like how to compare or unify different parts of data to achieve a common goal. Domains of applications include classification [6], frequent itemset mining [7-8], clustering [9], and OLAP [7]. For example, Yin et al. [6] have previously proposed CrossMiner for classification from multiple databases. The problem of association rule mining from distributed databases has also been well studied [0-4], where count distribution, data distribution, and candidate distribution are three basic mechanisms for effective mining from multiple databases [4]. However, among all these research activities, the focus has typically been on mining a single database (whether it is distributed or centralized), with the objective of unifying patterns discovered from each single database into new knowledge and patterns. In comparison, our research focuses on finding patterns and their relationships across multiple databases.

When the underlying data involve multiple (distributed/centralized) sources, one of the most important tasks is to assess the similarity between the datasets, such that the structural information among the databases can be provided for analysis such as clustering. [5] and [6] have previously addressed the problem of database similarity assessment by comparing association rules from each component database, e.g., how many of those rules are identical, and how many instances are covered by those identical rules. In comparison, we are interested in finding patterns across multiple databases.

The importance of finding differences between databases has been noticed by many researchers in the area [7-0], with the main focus on exploring differences between two databases at a time. Webb et al. [8] proposed a rule-based method to explore a contrast set between two databases. Ji et al.
[0] have proposed methods to explore minimal distinguishing subsequence patterns between two datasets, where the patterns take the form of being frequent in database A, but significantly less frequent in database B, i.e., {(A ≥ α) & (B ≤ β)}. All those methods are interested in finding differences (in terms of data items or patterns) between two datasets, but cannot support the complex queries mentioned in our motivating examples.

The research in database queries has made significant efforts in supporting data mining operations [-3], with extensions of the database query languages to support mining tasks, but most of these efforts focus on a single database with relatively simple query conditions. Among them, the work most relevant to this research is the complex mining optimization system proposed by Jin and Agrawal []. They presented an SQL-based mechanism for querying frequent patterns across multiple databases, with the objective of optimizing the users' queries to find qualified patterns. There are, however, essential differences between their work and what we propose here.

(1) The efforts in [] only focus on the problem of enumerating different query plans and choosing the one with the least cost. The pattern mining methods they adopted are actually the simple solutions discussed in Section 2. Instead of optimizing queries, our research proposes a data mining framework supporting users' queries to find relational patterns.

(2) Because of the limitations of their pattern mining framework (relying on each single database), the solution in [] can only answer simple queries like {(A ≥ α1) & (B ≥ α2) & (C ≤ β)}, i.e., each element of such a query must explicitly specify one single database and its corresponding threshold value. Their method, however, cannot answer a complex query like Queries 2 and 3 in Example 1, and therefore its applicability is limited in reality.
Table 1. Two toy datasets D1 and D2

    Database D1                    Database D2
    Trans ID   Items               Trans ID   Items
    1          {a, b, d}           1          {c, f, g}
    2          {a, d, f, g}        2          {a, b, d, g}
    3          {a, b, c, d}        3          {a, b, c}
    4          {a, c, d, g}        4          {a, b, d}
    5          {b, d, f}           5          {a, c}
    6          {a, b, d, g}        6          {e, c, d}
    7          {e, f, d}           7          {a, c, d, f, g}
    8          {a, b, c, e, g}

4. Problem Definition

A pattern, P, discussed in this paper takes the form of an itemset, i.e., a set of items which satisfy the user-specified constraint(s). The support of the pattern P in database D, denoted by Sup_D^P, represents the ratio between the number of appearances of P in D and the total number of transactions in D. Unless specified otherwise, we always use this ratio to denote a pattern's support.

The users' constraints specify the patterns they intend to discover from the databases. For example, a user can specify {D ≥ α} to indicate that he/she intends to find patterns from database D, with the support of all qualified patterns no less than the given threshold α. A user can specify multiple databases in their constraints, for example {A ≥ B ≥ α}, which indicates a pattern with its support values in A and B both no less than α, and in addition, the pattern's support in A should be no less than its support in B. In this paper, we define the following two types of relationship factors and four operators to describe a user's constraints.

Relationship factors:
    X ≥ α (X > α) indicates that X is no less than α (X is larger than α).
    X ≤ α (X < α) indicates that X is no larger than α (X is less than α).

Operators:
    X + Y indicates the operation of summing up the support values in both X and Y.
    X - Y indicates the operation of subtracting the support in Y from the support in X.
    X & Y (X | Y) indicates the operation of X and Y (X or Y).
    |X| indicates the absolute support value in X.

Notice that + directly sums up support values from the participant databases. The results from this operator do not reveal patterns' support values in the union of the participant databases. This operator is helpful when a data manager intends to find the average support of the patterns from multiple databases.
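As a minimal sketch of these definitions (not the authors' code), the following Python computes the support ratio on the Table 1 data and checks one pattern against a query of the form {A ≥ B ≥ α}. The function names and the choice of the pattern {a, d} are illustrative assumptions.

```python
# Toy databases from Table 1 (each transaction is a set of items).
D1 = [set("abd"), set("adfg"), set("abcd"), set("acdg"),
      set("bdf"), set("abdg"), set("efd"), set("abceg")]
D2 = [set("cfg"), set("abdg"), set("abc"), set("abd"),
      set("ac"), set("ecd"), set("acdfg")]


def support(pattern, database):
    """Ratio of transactions in `database` that contain every item of `pattern`."""
    pattern = set(pattern)
    return sum(pattern <= transaction for transaction in database) / len(database)


def satisfies_chain(pattern, a, b, alpha):
    """Check the query {A >= B >= alpha} for a single pattern."""
    return support(pattern, a) >= support(pattern, b) >= alpha


def summed_support(pattern, databases):
    """The '+' operator: add up per-database supports.

    This is not the support in the union of the databases.
    """
    return sum(support(pattern, db) for db in databases)


sup_d1 = support("ad", D1)            # 5 of 8 transactions
sup_d2 = support("ad", D2)            # 3 of 7 transactions
union_support = support("ad", D1 + D2)  # 8 of 15 transactions
```

For the pattern {a, d}, the supports are 5/8 in D1 and 3/7 in D2, so it satisfies {D1 ≥ D2 ≥ 0.25} but not {D1 ≥ D2 ≥ 0.5}. The summed support (about 1.05) differs from the support in the union of the two databases (8/15), which illustrates the remark about the + operator.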
A user's query is simply the user's constraints, taking the form of a combination of the above relationship factors and operators, in finding relational patterns across multiple databases. More specifically, a query should involve at least one database and one relationship factor, say {A ≥ α}. A query may also involve multiple relationship factors and multiple operators, which is often the case in reality, such as the query {ME > (NH | VT) >

MA > (CT | RI) ≥ α} in Example 1. A pattern which satisfies the user's query is called a relational pattern.

Due to the limitations of the pattern mining process, a user's query cannot take an arbitrary form. Instead, we require that a query must involve at least one relationship factor (≥ or >) with a numerical threshold value immediately following this factor. A query which complies with this requirement is called a valid query. For example, {A ≥ B ≥ C} is not a valid query, whereas {A ≥ B ≥ C ≥ α} is. The reason we define a valid query is that without a specific threshold α, it is technically infeasible to find all patterns satisfying {A ≥ B ≥ C}.

The procedure of discovering relational patterns across multiple databases is an interactive process, where a user provides a query and the system finds all patterns satisfying the query in an effective way. In this paper, we only deal with the problem of pattern discovery. We assume that users' queries and the underlying databases are immediately available. The problems of effective/efficient user interactions and data privacy/security are not of our concern at this stage.

5. Hybrid Frequent Pattern Tree Construction

The frequent pattern tree (FP-tree) [7] is a well-known data structure in mining frequent itemsets. The merits of the FP-tree lie in the fact that it stores the set of frequent items of each transaction in a compact structure, which avoids repeatedly scanning the original database during the mining process. In this section, we propose a solution that joins multiple databases together to build a single Hybrid Frequent Pattern tree (HFP-tree), which will be used to discover relational patterns at a later stage. Different from the traditional FP-tree, which works on a single database, the purpose of an HFP-tree is to find the set of frequent itemsets from transactions in all databases. For this purpose, changes and extensions have been made accordingly.
As one of the major changes, each node of the HFP-tree takes the form {x | y1 : y2 : ... : yn}, where x is the name of the item stored at the current node (denoted by item_name), and y1, y2, .., yn are the numbers of times that a particular itemset has appeared in databases D1, D2, .., Dn respectively. Take D1 and D2 in Table 1 as our example databases. Assuming they are joined together to construct an HFP-tree, each node in the tree will take the form {x | y1 : y2}, with y1 and y2 denoting the numbers of times that the itemset, with items starting from the Root and ending at the current node x, has appeared in databases D1 and D2 respectively.

If D1 and D2 are joined together to build a tree, they must agree, in advance, on the order of the items listed in the tree. Here, we assume D1 and D2 agreed to list their items according to the alphabetic order (we will discuss the generation of this list in Section 5.1). We also discard any threshold value at this stage, and therefore all items will be added into the HFP-tree.

Given a transaction in D1, say Trans T1 = {a, b, d}, where items have been sorted according to the alphabetic order, the HFP-tree construction starts from the first item, a, and checks whether any child node of the Root has the same item_name. Since the HFP-tree is empty at this stage, a is not a child of Root. As a result, we construct a new child node ϑ1 = {a | 1:0} for Root, which specifies that a is a child of Root with a appearing once in D1 and zero times in D2. After that, we move to the second item b in T1, and check whether b is a child of the recently built node ϑ1 = {a | 1:0}. ϑ1 currently has no child, so we build another node ϑ2 = {b | 1:0}, and set this node as the child of ϑ1. It means that itemset {ab} has appeared once in D1 but zero times in D2. Finally, we move to the third item d in T1. We find that d is not a child of the recently built node ϑ2, so we build another new node ϑ3 = {d | 1:0} and set it as the child of ϑ2. Again, it means that itemset {abd} has appeared once in D1 but still zero times in D2.
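The insertion walk described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the class and function names are hypothetical, items are sorted alphabetically as in the text, and no support threshold is applied.

```python
class HFPNode:
    """One HFP-tree node: an item name plus one counter per database."""

    def __init__(self, item_name, n_databases):
        self.item_name = item_name
        self.counts = [0] * n_databases   # y1 : y2 : ... : yn
        self.children = {}                # item_name -> HFPNode


def insert_transaction(root, sorted_items, db_index):
    """Walk down from the root, creating missing nodes, and increment
    the counter that belongs to database `db_index` on every node visited."""
    node = root
    for item in sorted_items:
        child = node.children.get(item)
        if child is None:
            child = HFPNode(item, len(root.counts))
            node.children[item] = child
        child.counts[db_index] += 1
        node = child


def build_hfp_tree(databases):
    """Build one tree from all databases, using alphabetic item order."""
    root = HFPNode(None, len(databases))
    for db_index, transactions in enumerate(databases):
        for transaction in transactions:
            insert_transaction(root, sorted(transaction), db_index)
    return root


# Table 1 data.
D1 = [set("abd"), set("adfg"), set("abcd"), set("acdg"),
      set("bdf"), set("abdg"), set("efd"), set("abceg")]
D2 = [set("cfg"), set("abdg"), set("abc"), set("abd"),
      set("ac"), set("ecd"), set("acdfg")]

# After inserting only T1 = {a, b, d} from D1, the path a-b-d carries 1:0.
first = HFPNode(None, 2)
insert_transaction(first, ["a", "b", "d"], 0)

tree = build_hfp_tree([D1, D2])
node_a = tree.children["a"]
node_ab = node_a.children["b"]
```

With the full Table 1 data, the node for a under the Root carries the counts 6:5 and its child b carries 4:3, since a appears in six transactions of D1 and five of D2, and the prefix {ab} in four and three of them respectively.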
For any other transaction in D1 or D2, we repeat the same procedure. Take the third transaction in D2, T3 = {a, b, c}, as an example. We first check whether Root has any child node named a, and since we have previously constructed such a node, we know that it does exist. Denoting this node by x, we increase x's frequency count for database D2 by 1, then we check whether x has any child node named b, i.e., the second item of T3. We increase the frequency count for D2 by 1 if such a node indeed exists; otherwise, we simply add a new node {b | 0:1} as the child node of x. We recursively repeat the above procedure until we finish the last item in T3. The constructed HFP-tree for D1 and D2 is shown in Figure 1.

To speed up the tree traversal, a header table is built for all items ever listed in the HFP-tree, as shown in Figure 1 (for a clearer presentation, we only list the header table for items a, c, e, and g). For each item, say g, its list records all the locations where g has ever appeared in the HFP-tree. The purpose of the header table is to facilitate access to the itemsets ending with the same item. For example, in Figure 1, if we want to find the itemsets ending with item c, we may simply go through all records of c's header list, and at each location, tracking upwards to the Root will produce an itemset associated with item c. In Figure 2, we list the detailed steps of building an HFP-tree from multiple databases. But before we go any further, we would like to solve a particular issue raised by multiple databases.

[Figure 1. Example of the constructed HFP-tree for D1 and D2 in Table 1, together with its header table (item, head of node link).]

5.1 Joint Ranking List

In the above example, we assume that all parties participating in the HFP-tree construction use the same predefined item list (the alphabetic order of the items).
In reality, the order of the list plays an important role in building a compact HFP-tree. Take a dataset containing four transactions {d}, {c, d}, {b, c, d}, and {a, b, c, d} as an example. A frequent pattern tree built by using the items' alphabetic order, i.e., a, b, c, and d, will have 10 interior nodes (excluding the Root). On the other hand, if items were previously ranked in the descending order of their frequency, i.e., d, c, b, and a,

the corresponding FP-tree will have 4 interior nodes only, which is a reduction of about 60% in tree size. Reducing the tree size will eventually lead to dramatic time savings in building the pattern tree. To solve the problem, the original FP-tree algorithm [7] scans the database beforehand to produce the ranking list, and then uses this list to build the FP-tree.

When several databases join together to build an HFP-tree, a simple solution is to use a predefined item list to build the HFP-tree. This, however, will significantly deteriorate the system performance, because any list that does not take item frequency into consideration will lead to an inferior solution and eventually raise the cost of tree construction. For this purpose, we propose a rank-join based ranking mechanism. Given M databases D1, D2, .., DM for HFP-tree construction, assume I1, I2, .., and IN are the union of the items in the databases. For any database Di, we scan it and rank all items in Di in descending order of their frequency. Denoting by R_i^j the ranking order of item Ij in database Di (with the first item in the list denoted by 1), Eq. (1) represents the average ranking order of each item Ij. The final ranking list for all items is constructed by ranking \bar{R}^j, j = 1, .., N, in ascending order, where items with the least average ranking are listed at the top.

    \bar{R}^{j} = \frac{1}{M} \sum_{i=1}^{M} R_{i}^{j}    (1)

The above mechanism joins the ranks of each item in all databases together to produce the final ranking. By doing so, we assume that databases are equally weighted, and the rank in all databases plays an equal role in deciding an item's final ranking. In reality, the size (number of transactions) of the databases involved in the query may vary significantly, where a database containing more transactions should carry more weight in deciding the final ranking of a particular item. For this purpose, we revise Eq. (1) by taking the size of each database into consideration. Assume S_i is the number of transactions in Di; then S = S_1 + S_2 + .. + S_M denotes the total number of transactions. The weighted average ranking order is then represented in Eq. (2).

    \bar{R}^{j} = \frac{1}{S} \sum_{i=1}^{M} S_{i} R_{i}^{j}    (2)
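A hedged Python sketch of the joint ranking list follows. It computes the per-database ranks and then the plain or size-weighted average rank described by Eq. (1) and its weighted revision. The alphabetic tie-breaking and the treatment of items missing from a database (frequency 0, so ranked last) are assumptions of this sketch, not details given in the text.

```python
from collections import Counter


def rank_items(transactions, all_items):
    """Rank items by descending frequency in one database (1 = most frequent).

    Assumptions of this sketch: ties are broken alphabetically, and items
    absent from the database are ranked with frequency 0.
    """
    freq = Counter(item for t in transactions for item in set(t))
    ordered = sorted(all_items, key=lambda item: (-freq[item], item))
    return {item: position for position, item in enumerate(ordered, start=1)}


def joint_ranking(databases, weighted=False):
    """Order items by their average rank across databases.

    weighted=False is the plain average of ranks; weighted=True weights
    each database's rank by its number of transactions.
    """
    all_items = sorted({item for db in databases for t in db for item in t})
    ranks = [rank_items(db, all_items) for db in databases]
    sizes = [len(db) for db in databases]
    total = sum(sizes)

    def average_rank(item):
        if weighted:
            return sum(s * r[item] for s, r in zip(sizes, ranks)) / total
        return sum(r[item] for r in ranks) / len(ranks)

    return sorted(all_items, key=lambda item: (average_rank(item), item))


# The four-transaction example from the text: frequency order is d, c, b, a.
small = [set("d"), set("cd"), set("bcd"), set("abcd")]
small_order = joint_ranking([small])

# Table 1 data.
D1 = [set("abd"), set("adfg"), set("abcd"), set("acdg"),
      set("bdf"), set("abdg"), set("efd"), set("abceg")]
D2 = [set("cfg"), set("abdg"), set("abc"), set("abd"),
      set("ac"), set("ecd"), set("acdfg")]
plain_order = joint_ranking([D1, D2])
weighted_order = joint_ranking([D1, D2], weighted=True)
```

On the Table 1 data, both variants happen to produce the same list (a, d, b, c, g, f, e), because the two databases are of similar size. The weighted form matters more when database sizes differ substantially.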
Input: Databases D1, .., DM, and their minimal support thresholds α1, .., αM.
Output: Hybrid Frequent Pattern tree, HFP-tree.
1. Initialize an empty HFP-tree with node Root only.
2. Scan each database D1, .., DM once, and calculate the ranking order of each item I1, .., IN in each single database (items with their support less than the corresponding threshold αi are eliminated).
3. Use Eq. (2) to produce a joint ranking list L.
4. For each transaction Tk in Di, sort the items in Tk according to the list L. Denote the sorted Tk by T'k, with the items in T'k denoted by I'1, .., I'K.
5. ϑ ← Root; κ ← 1.
6. For all children x of ϑ:
   a. If x.item_name = I'κ.item_name, increase the corresponding frequency count by 1 (the one corresponding to Di). ϑ ← x, κ ← κ+1. Repeat step 6 until all K items have been processed.
   b. If no child of ϑ has item_name I'κ.item_name, create a new node y with y.item_name = I'κ.item_name. Initialize y's frequency counts to zero except for Di, which is set to 1. Insert y as ϑ's child. ϑ ← y, κ ← κ+1. Repeat step 6 until all K items have been processed.
7. Repeat step 4 for all databases D1, .., DM, then return the constructed HFP-tree.

Figure 2. Hybrid Frequent Pattern tree construction

5.2 HFP-tree Construction

Figure 2 lists the algorithm details of building an HFP-tree from multiple databases D1, .., DM. We assume here that each database Di comes with a minimal support threshold αi for determining its frequent items. In the next section, we explain how to parse a user's query to generate such threshold values.

6. Discovering Relational Patterns Using HFP-Tree

6.1 User Query Decomposition

As we have discussed in Section 4, a user's query may involve multiple relationship factors and operators. When submitting such a complex query to the data mining model, it is often the case that not all parts of the query comply with the down closure property, i.e., every subset of a frequent itemset must also be frequent. For example, the ≤ and < relationship factors normally do not comply with the down closure property. Even if a pattern in B, say {abc}, does not satisfy B ≤ β, its superset, say {abcd}, may still comply with B ≤ β.
Therefore, the mining process must preprocess a user's query and explicitly decompose it into a set of subqueries which do comply with the down closure property, such that the mining model can use these subqueries to facilitate the candidate pruning process. For this purpose, we list five properties here, and use these properties to decompose each query before it is submitted to the data mining model. All decomposed subqueries (which comply with the down closure property) are placed into a Down Closure (DC) subset, and meanwhile the original query is still kept to check a pattern's validity at the final stage.

Property 6.1.1 If a subquery has a single database and a threshold value α listed on the left and right side of the relationship factor ≥ or > respectively, then this subquery complies with the down closure property.

This property is based directly on the Apriori rule in frequent itemset mining. If a pattern P's support in a database is less than a given threshold α, then any superset of P (the patterns growing from P) will also have its support less than α. Therefore, if a query involves multiple databases, factors ≥ or >, and a single threshold value α, we may decompose this query into a set of subqueries with each single database and the threshold value α listed on the left and right sides of the factor ≥. For example, the query {A ≥ B ≥ C ≥ α} can be decomposed into three subqueries (A ≥ α), (B ≥ α), and (C ≥ α), which are placed into the DC set. If a pattern P violates any one of these three subqueries, there is no way for P, or any of P's supersets, to be a qualified pattern.

It is worth noting that subqueries in the DC set are merely for pattern pruning purposes, and one should not use them to replace the original query. The original query will still be used to verify the patterns at the final stage (as we will discuss in the next subsection).

Property 6.1.2 If a subquery has the sum (+) of multiple databases and a threshold value α listed on the left and right side of factor ≥ or > respectively, then this subquery complies with the down closure property.

For example, a subquery like {(A+B+C) ≥ α} complies with the down closure property, and can be directly put into the DC set. The proof of this property is straightforward. Given a pattern P and any of its subpatterns Q, assuming P's and Q's supports in A, B and C are P1, P2, P3 and Q1, Q2, Q3 respectively, we have Q1 ≥ P1, Q2 ≥ P2, and Q3 ≥ P3. If (P1+P2+P3) ≥ α, then (Q1+Q2+Q3) ≥ (P1+P2+P3) ≥ α. Therefore, the property is true. This property states that if a subquery sums up multiple databases and is followed by factor ≥ or > and a threshold value α, then it should be placed into the DC set for pattern pruning.

Property 6.1.3 If a subquery has the support difference of two databases, say (A-B), and a threshold value α listed on the left and right side of factor ≥ or > respectively, then this subquery can be further transformed into a subquery like A ≥ α, which still complies with the down closure property.

If (A-B) ≥ α, then A ≥ (B+α). Since a pattern's support in a database cannot be negative, we have A ≥ α.

Property 6.1.4 If a subquery has the absolute support difference of two databases, say |A-B|, and a threshold value α listed on the left and right side of factor ≥ or > respectively, then this query can be transformed into a subquery like {(A ≥ α) | (B ≥ α)}, which still complies with the down closure property.

If |A-B| ≥ α, then we have (A-B) ≥ α or (A-B) ≤ -α, which leads to the inequalities A ≥ (B+α) or B ≥ (A+α), i.e., {(A ≥ α) | (B ≥ α)}. For any pattern P, if its supports in A and B are both less than α, there is no way for any superset of P to have a support of α or more in either database. Therefore, the transformed subquery still complies with the down closure property.

Property 6.1.5 A subquery involving relationship factors ≤ or < will most likely not comply with the down closure property, and therefore cannot be placed into the DC set.

With the above five properties, we can decompose most complex queries into a set of subqueries which comply with the down closure property, and use the DC set to support efficient pattern pruning.
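The five properties can be sketched as a small Python routine that turns each subquery into a pruning test over a pattern's per-database supports. The tuple encoding of subqueries (kind, operands, alpha) and all names below are hypothetical conventions of this sketch; the source does not prescribe a query representation.

```python
def pruning_predicate(kind, operands, alpha):
    """Return a down-closure-safe test on a support dict, or None.

    `kind` is a hypothetical tag for the subquery shape:
      "chain"   {G1 >= G2 >= ... >= alpha}; each group is a tuple of
                database names joined by "|" (Property 6.1.1)
      "sum"     {(A + B + ...) >= alpha}      (Property 6.1.2)
      "diff"    {(A - B) >= alpha}            (Property 6.1.3)
      "absdiff" {|A - B| >= alpha}            (Property 6.1.4)
      "upper"   any "<=" or "<" subquery      (Property 6.1.5)
    """
    if kind == "chain":
        return lambda sup: all(
            any(sup[db] >= alpha for db in group) for group in operands)
    if kind == "sum":
        return lambda sup: sum(sup[db] for db in operands) >= alpha
    if kind == "diff":
        a, _b = operands
        return lambda sup: sup[a] >= alpha
    if kind == "absdiff":
        a, b = operands
        return lambda sup: sup[a] >= alpha or sup[b] >= alpha
    if kind == "upper":
        return None  # not usable for pruning
    raise ValueError(f"unknown subquery kind: {kind}")


def build_dc_set(subqueries):
    """Keep only the subqueries that yield a usable pruning test."""
    predicates = (pruning_predicate(*q) for q in subqueries)
    return [p for p in predicates if p is not None]


def survives(sup, dc_set):
    """A candidate is kept only if it passes every test in the DC set."""
    return all(test(sup) for test in dc_set)


# {ME > (NH | VT) > MA > (CT | RI) >= 0.3} plus an upper bound on CA.
query3 = ("chain", [("ME",), ("NH", "VT"), ("MA",), ("CT", "RI")], 0.3)
dc = build_dc_set([query3, ("upper", ("CA",), 0.1)])
sup = {"ME": 0.5, "NH": 0.1, "VT": 0.4, "MA": 0.35, "CT": 0.3, "RI": 0.0}
```

As the text notes for the DC set, these tests are only for pruning candidates during pattern growth. A surviving pattern would still have to be checked against the original query, including the ordering constraints and any ≤ or < parts.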
For example, Query 3 of the earlier example, {ME > (NH VT) > MA > (CT RI) ≥ α}, can be decomposed into a set of subqueries like ME ≥ α, (NH VT) ≥ α, MA ≥ α, and (CT RI) ≥ α, which will be used to check all candidates during the pattern growing process.

[Figure 3 A running example of hybrid prefix paths and a meta HFP-tree for item g: (a) the hybrid prefix paths of g; (b) the meta HFP-tree for T={g}; (c) the meta HFP-tree for T={gd}]

6.2 Relational Pattern Discovery Using HFP-tree

The construction of the HFP-tree ensures that the set of frequent itemsets for transactions in all databases can be enclosed into a compact tree structure, but this does not automatically produce the relational patterns that meet our needs. In this subsection, we introduce the HFP-tree based mining process for discovering relational patterns. Figure 4 gives the pseudo code of the mining process, which mainly consists of two procedures: HFP-mining and HFP-growth. In the main procedure, HFP-mining, an input query Q is first decomposed into a set of subqueries (DC). Then the system recursively calls HFP-growth to discover relational patterns from the HFP-tree, where the DC set is used to prune out unnecessary candidates on the fly, and the query Q is used at the final stage to assert the validity of the patterns.

Given an HFP-tree built from the multiple databases, HFP-growth first checks each node a_i in the header table of the tree. Because the header table has recorded the locations where a_i has ever appeared in the tree, we can start from each of a_i's locations l_j and track upwards towards the Root, which will produce a hybrid prefix path HPP_j for a_i (w.r.t. the current location l_j). Figure 3 pictorially demonstrates the concept of a hybrid prefix path for item g of the HFP-tree shown earlier (for simplicity, we only show branches involving g).
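The upward tracking step can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the authors' C++ code: the `Node` class and the `hybrid_prefix_path` function are hypothetical names, the root is assumed to carry the item `None`, and the counts given to a, b, c, and e are made up. Following the definition of a hybrid prefix path, every collected item takes the per-database counts of the starting node rather than its own.

```python
class Node:
    """One HFP-tree node: an item, per-database counts, and a parent link."""

    def __init__(self, item, counts, parent=None):
        self.item = item
        self.counts = tuple(counts)   # (count in D1, count in D2)
        self.parent = parent


def hybrid_prefix_path(node):
    """Walk from `node` up to the root and collect the items above it.

    Each collected item is paired with the counts of the starting node.
    """
    path = []
    cur = node.parent
    while cur is not None and cur.item is not None:   # the root has item None
        path.append((cur.item, node.counts))
        cur = cur.parent
    return path


# One branch Root -> a -> b -> c -> e -> g; counts other than g's are illustrative.
root = Node(None, (0, 0))
a = Node("a", (6, 5), root)
b = Node("b", (4, 3), a)
c = Node("c", (2, 1), b)
e = Node("e", (1, 0), c)
g = Node("g", (1, 0), e)

print(hybrid_prefix_path(g))
# [('e', (1, 0)), ('c', (1, 0)), ('b', (1, 0)), ('a', (1, 0))]
```

Calling the function once per location recorded in the header table yields all hybrid prefix paths of an item.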
In Figure 3, g's header table has recorded six locations (denoted by the six numbers from 1 to 6). For each location, say location 1, tracking from g upwards towards the Root will produce a set {ecba}. We replace the support of each item in the set by the current support of g, which produces a path {e 1:0, c 1:0, b 1:0, a 1:0}, called a hybrid prefix path (HPP) for g. An HPP records the items (and their frequencies w.r.t. each database) which co-occur with g and have a higher rank than g in the list L. Parsing all the HPPs of g should be able to produce the frequent itemsets associated with g (HFP-growth starts from the item with the lowest rank for pattern growth). For this purpose, for any item in the hybrid prefix paths of g, we sum up its frequencies (w.r.t. each database) from all locations, which directly indicates whether this item is frequently associated with g or not. For example, the other five hybrid prefix paths in Figure 3 are {d 1:1, b 1:1, a 1:1}, {f 0:1, d 0:1, c 0:1, a 0:1}, {d 1:0, c 1:0, a 1:0}, {f 1:0, d 1:0, a 1:0}, and {f 0:1, c 0:1}. The total frequencies of items in g's hybrid prefix paths are Freq_g={a 4:2, b 2:1, c 2:2, d 3:2, e 1:0, f 1:2}. Dividing all the frequency values by the total number of transactions in each database (D1=8 and D2=7) produces the support values of each item, Sup_g={a 0.5:0.29, b 0.25:0.14, c 0.25:0.29, d 0.38:0.29, e 0.13:0, f 0.13:0.29}.

Given a query Q={D1 ≥ D2 ≥ 0.25}, the query decomposition process will produce a DC set like DC={(D1 ≥ 0.25) AND (D2 ≥ 0.25)}. Comparing all items' support values in Sup_g with the DC set explicitly indicates that none of the items {b 0.25:0.14}, {e 0.13:0}, and {f 0.13:0.29} can form an itemset with g that satisfies the query Q. Therefore, we can prune out those unqualified items directly, with the filtered HPPs of g denoted by {c 1:0, a 1:0}, {d 1:1, a 1:1}, {d 0:1, c 0:1, a 0:1}, {d 1:0, c 1:0, a 1:0}, {d 1:0, a 1:0}, and {c 0:1}. After that, we take each filtered HPP as a meta-transaction, and build a meta HFP-tree for g, as shown in Figure 3(b).
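The arithmetic of this example can be replayed with a few lines of Python. This is a sketch only: it assumes the six hybrid prefix paths of g carry the per-database counts 1:0, 1:1, 0:1, 1:0, 1:0, and 0:1, that the databases hold 8 and 7 transactions, and that the DC set requires a support of at least 0.25 in both databases. The variable names are introduced here for illustration.

```python
from collections import defaultdict

# Hybrid prefix paths of g; each item is tagged with (count in D1, count in D2).
hpps = [
    {"e": (1, 0), "c": (1, 0), "b": (1, 0), "a": (1, 0)},
    {"d": (1, 1), "b": (1, 1), "a": (1, 1)},
    {"f": (0, 1), "d": (0, 1), "c": (0, 1), "a": (0, 1)},
    {"d": (1, 0), "c": (1, 0), "a": (1, 0)},
    {"f": (1, 0), "d": (1, 0), "a": (1, 0)},
    {"f": (0, 1), "c": (0, 1)},
]
sizes = (8, 7)   # number of transactions in D1 and D2
alpha = 0.25     # threshold used by both DC subqueries

# Sum each item's counts over all hybrid prefix paths.
freq = defaultdict(lambda: [0, 0])
for path in hpps:
    for item, (c1, c2) in path.items():
        freq[item][0] += c1
        freq[item][1] += c2

# Convert counts to supports and keep items that satisfy both DC subqueries.
support = {i: (f[0] / sizes[0], f[1] / sizes[1]) for i, f in freq.items()}
kept = sorted(i for i, s in support.items() if s[0] >= alpha and s[1] >= alpha)

# Drop the pruned items from every path; the results are the meta-transactions.
filtered = [{i: c for i, c in p.items() if i in kept} for p in hpps]

print({i: tuple(f) for i, f in sorted(freq.items())})
# {'a': (4, 2), 'b': (2, 1), 'c': (2, 2), 'd': (3, 2), 'e': (1, 0), 'f': (1, 2)}
print(kept)
# ['a', 'c', 'd']
```

Items b, e, and f are removed because at least one of their two supports falls below 0.25, and the six entries of `filtered` are the meta-transactions from which the meta HFP-tree of g is built.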
At any stage, if a meta HFP-tree, hfp_T, has more than one path, we have to recursively call the HFP-growth procedure to check each node in the header table of hfp_T, and build a meta HFP-tree for the node. The mining process recursively calls the HFP-growth procedure until the meta HFP-tree eventually contains one path only. In Figure 3(b), because the meta HFP-tree of g, hfp_g, contains more than one path, we recursively call HFP-growth to build a meta HFP-tree for each of the nodes in hfp_g (i.e., item d). For this purpose, HFP-growth pushes the current item g into a base set T={g 4:3} (which records the frequent items so far), and conducts recursive pattern growth.

The recursive HFP-growth process will eventually lead to a meta HFP-tree containing one or zero paths. At this stage, there is no need to grow patterns any further; instead, we can directly produce patterns by enumerating all the combinations of the nodes in the tree and appending any of the combinations to the underlying base set T to generate a pattern P, as indicated in step e of the HFP-growth procedure (Figure 4). Meanwhile, P's final supports are the minimal supports of all involved items (w.r.t. each database), i.e.,

$P_{Sup} = \min_{k=1,\dots,K}\{Sup_P^{1}[k]\} : \min_{k=1,\dots,K}\{Sup_P^{2}[k]\}$,

where $Sup_P^{i}[k]$ means the support value of the k-th item in P (w.r.t. database D_i) and K is the number of items in P. For example, Figure 3(c) shows a one-path (actually one-node) meta HFP-tree built for base set T={g 4:3, d 3:2}. Appending this only node to the current base set T produces a pattern P=T ∪ {a 3:2}. The final supports of P are the minimal support values of the items in P, which are {3:2}, i.e., P_Sup={0.38:0.28}.

As we have analyzed in Section 6.1, the DC set is not equivalent to the original query, but rather serves pattern pruning purposes only. Therefore, a pattern P which is generated by using the down closure rule in the DC set does not necessarily comply with the original query Q. A validity check must be conducted to assert whether P indeed complies with the query Q or not. This can be easily achieved by comparing pattern P's support P_Sup with the original query Q.
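The support computation and the final validity check can be sketched in Python as below. The function names `pattern_support` and `satisfies_chain` are hypothetical, the example counts are those of the pattern built from g, d, and a with database sizes 8 and 7, and the check handles only a simple chain query of the form D1 ≥ D2 ≥ ... ≥ α.

```python
def pattern_support(item_counts, sizes):
    """Per-database support of a pattern.

    For each database, take the minimum count over the pattern's items and
    divide it by that database's number of transactions.
    """
    return tuple(
        min(counts[i] for counts in item_counts.values()) / sizes[i]
        for i in range(len(sizes))
    )


def satisfies_chain(sup, alpha):
    """Check a chain query D1 >= D2 >= ... >= alpha on a support tuple."""
    chain = list(sup) + [alpha]
    return all(x >= y for x, y in zip(chain, chain[1:]))


# Pattern P = {g, d, a} with (count in D1, count in D2) for each item.
P = {"g": (4, 3), "d": (3, 2), "a": (3, 2)}
sup = pattern_support(P, sizes=(8, 7))   # (3/8, 2/7)

print(satisfies_chain(sup, 0.25))  # True
```

Here the minimum counts are 3 and 2, so the supports are 3/8 = 0.375 and 2/7 ≈ 0.286, and the chain 0.375 ≥ 0.286 ≥ 0.25 holds. A pattern that passed the DC pruning but failed this check would simply not be added to the result set.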
The supports of P={g 4:3, d 3:2, a 3:2} are P_Sup={0.38:0.28}, which satisfy Q={D1 ≥ D2 ≥ 0.25}, so pattern P is appended to the relational-pattern set RP.

Input: an HFP-tree hfp built from M databases, ranking list L, and the original query Q
Output: Relational-pattern set, RP

Procedure HFP-Mining (HFP-tree, Q)
  1. Down Closure Set (DC) ← Query-Decomposition (Q)
  2. RP ← ∅, T ← ∅
  3. HFP-growth (HFP-tree, T, RP, DC, Q, L)

Procedure HFP-growth (hfp, T, RP, DC, Q, L)
  For each node n_i in the header table of hfp (in inverse order of the ranking list L)
    a. S ← ∅; T ← T ∪ n_i. The supports of T are the minimal support values of all the nodes in T (w.r.t. each database)
    b. For each of n_i's locations a_j in the header table of hfp
       i. Build a hybrid prefix path, HPP_j, for a_j
       ii. S ← S ∪ HPP_j
    c. Prune items in S based on the down closure rule in the DC set
    d. Build a meta HFP-tree, hfp_T, based on the remaining items in S and ranking list L
    e. If hfp_T contains a single path PS
       i. For each combination (denoted by π) of the nodes in the path PS
          1. Generate pattern P ← T ∪ π; the supports of P are the minimal support values of the nodes in π (w.r.t. each database), $P_{Sup} = \min_{k=1,\dots,K}\{Sup_P^{1}[k]\} : \dots : \min_{k=1,\dots,K}\{Sup_P^{M}[k]\}$, where $Sup_P^{i}[k]$ means the support value of the k-th item in P (w.r.t. database D_i) and K is the number of items in P
          2. Check whether P complies with the query Q; if it does, RP ← RP ∪ P
    f. Else
       i. HFP-growth (hfp_T, T, RP, DC, Q, L)

Figure 4 Relational-pattern mining using HFP-tree

7. Experimental Evaluation

In this section, we report experimental evaluations and a comparative study with two simple solution based relational-pattern discovery mechanisms. Our test datasets are collected from two sources: (1) synthetic databases generated by using the IBM Quest data generator [8][4]; and (2) the IPUMS (Integrated Public Use Microdata Series) 2000 USA census micro-data with a 1% sampling rate [5]. All experiments are performed on a .0 GHz Pentium PC machine with 5MB main memory. All the programs are written in C++, with the integration of an STL-like C++ tree class [6] to fulfill the tree construction and access.

Although it is possible for DRAMA to reuse a previously constructed HFP-tree to answer multiple queries, for fairness in comparison, DRAMA initiates HFP-tree construction and HFP-mining for each query. In the following tables and figures, unless specified otherwise, the runtime always means the total execution time, i.e., the tree construction time plus the mining time. For a comparative study, we implement the two simple solutions discussed earlier, SPV and CPM. While SPV sequentially mines and verifies patterns from each database, CPM generates candidates from each component database, and refers to the collaborative mining process for candidate pruning. For SPV, we use FP-tree instead of the Apriori algorithm to mine patterns from the first database. Because CPM needs candidates generated at each single database for collaborative mining, we apply the traditional Apriori algorithm on each database. The runtime of CPM is the pattern mining time of the database with the largest time expense plus the time for collaborative mining and pattern verification.

Because real-world databases can vary significantly in size, we generate four synthetic databases with different sizes, as shown in Table 2. The explanations of the database description can be found in [8]. In short, in a description such as T0.I6.D300k.N000.L000, the D field gives the number of transactions (D300k means 300,000 transactions), the N field gives the number of items, the T field gives the average number of items per transaction, and the I field gives the average number of items per pattern (I6 means 6 items). The runtime of the systems relies crucially on the underlying queries. For an objective assessment, we define five queries, as shown in Table 3, and report the average system runtime performance in answering these queries.

Table 2 Synthetic database characteristics
Database    Database description
D1          T0.I6.D300k.N000.L000
D2          T0.I6.D00k.N000.L000
D3          T0.I6.D00k.N000.L000
D4          T0.I6.D50k.N000.L000

Table 3 Query plan description
Query    Query constraints
Q1       {D1 ≥ D2 ≥ D3 ≥ α}
Q2       {(D1 + D2) ≥ α} or {(D3 + D4) ≥ α}
Q3       {(D D ) (D3 D4) α}
Q4       {D (D D3) α} & {D4 β}
Q5       { D D (D3 + D4) α}

7.1 HFP-tree Construction Results

In Section 5 we proposed a joint ranking list which ranks items from different databases for HFP-tree construction. We report in this section the performance of this ranking mechanism in facilitating the tree construction and pattern growth processes. We apply Q1 in Table 3 on the synthetic databases, and use both the joint ranking list and the fixed ranking list to build HFP-trees. We report the results in Figure 6, where Figure 6(a) compares the HFP-tree construction time, Figure 6(b) compares the total number of HFP-tree interior nodes, and Figure 6(c) compares the HFP-growth time. In all figures, the x-axis denotes the support threshold α in Q1, and the y-axis denotes the results of the different measures. The meaning of each curve in Figure 6 is explained in Figure 5.

As shown in Figure 6(a), the proposed joint ranking list can dramatically reduce the time for building an HFP-tree from multiple databases, and the lower the support threshold α, the more significant the improvement. When α=%, it costs the fixed ranking list and the joint ranking list about 98 seconds and 60 seconds respectively to build the HFP-tree; on the other hand, when α becomes significantly low, say 0.0%, the cost of the joint ranking list increases to about 98.5 seconds, which is about 3.5 times less than the time of the fixed ranking list (364.8 seconds). A low α value makes most items in the database frequent, so they are added into the HFP-tree. This can be very time consuming if the inserting process does not take item frequency information into consideration, because each item needs to be checked against the existing HFP-tree to find whether the current path already contains this item or not. The more frequent items there are, the wider the HFP-tree becomes, and the more time is spent on this process.
On the other hand, a ranking order which unifies the item frequency information from all databases can significantly reduce the time for inserting each transaction into the HFP-tree, because each item a has a smaller search space when verifying whether the current node (of the HFP-tree) already contains a or not. In addition, since the joint ranking list has items sorted by their frequencies before they are inserted into the HFP-tree, it has a better chance, compared to the fixed ranking list, of forcing items in a frequent itemset to follow a single path, and consequently of reducing the size of the constructed HFP-tree. As shown in Figure 6(b), the interior node number of the HFP-tree built from the joint ranking list is about % to 0% less than that of the tree built from the fixed ranking list. Because of the HFP-tree quality improvement (more compact, with fewer interior nodes), the HFP-growth process consequently finds frequent patterns faster, as shown in Figure 6(c).

Since the joint ranking list unifies the ranking order of each item from different databases, one may ask why we do not simply treat all items as if they were from one single database, e.g., D=D1+D2+D3, and then rank the items according to their total frequencies (with infrequent items in each database removed beforehand), just like the traditional FP-tree method does. However, such a global ranking list views items as if they come from a single database without considering their frequencies in each single database, which may produce a list inferior to the joint ranking list. For example, suppose the frequencies of items {a, b, c} in D1 and D2 are {3000, 000, 900} and {00, 000, 000} respectively. The global ranking list will sum up each item's frequency and produce the list L=abc; on the other hand, the joint ranking list will produce the list L=bac. Considering that the most likely frequent itemset in D1 and D2 is {bc} instead of {ac} or {ab}, the joint ranking list may lead to better results in practice. Figure 6(a) also reports the HFP-tree construction time of the global ranking list, which further supports our analysis.
The HFP-growth on the tree built from the global ranking list also needs more time than the one built from the joint ranking list, and we therefore omit the results of this mechanism in Figures 6(b) and 6(c).

[Figure 5 The meanings of curves in Figure 6: joint ranking list, fixed ranking list, global ranking list]

[Figure 6 HFP-tree construction comparisons on Q1 in Table 3, plotted against the support threshold (%): (a) the HFP-tree construction time (s); (b) the total number of HFP-tree interior nodes; and (c) the HFP-growth time (s)]

7.2 Query Runtime Comparison

Figure 7 reports a detailed runtime performance comparison between DRAMA and the two simple solutions (SPV and CPM) on Q1 in Table 3, where the x-axis denotes the support threshold value α and the y-axis represents the system runtime in seconds. For a detailed comparison, we also list the actual value of each method in the figure. When the threshold value is relatively small, say 0.05% or 0.0%, the runtimes of SPV and CPM are too large for a meaningful comparison (the empty cells). Overall, DRAMA responds linearly to the threshold value α and does an excellent job in answering the query Q1. When the value of α is larger than .5%, we notice that DRAMA is inferior to both SPV and CPM. A further study shows that for large α values, the time for HFP-tree construction becomes significant compared to the time for HFP-growth. For example, when α=.5%, DRAMA spends about 68 seconds on building the HFP-tree; however, it only costs about 9 seconds for HFP-growth to mine the patterns. At this support value level, SPV applies an FP-tree based algorithm on D1, which outputs only 96 patterns for D2 to verify. So the performance of SPV at α=.5% is really just the runtime of the FP-tree mining on D1. On the other hand, when the threshold value decreases, the number of patterns generated from D1 can increase significantly, which leads to a huge runtime expense for D2 to verify these patterns (notice that database scanning for pattern verification can be very expensive, especially for large databases). For example, when α=0.%, D1 generates about eighty thousand patterns which need to be verified by D2, among which about ten thousand patterns further need to be verified by D3. As shown in Figure 7, the sequential verification mechanism of SPV needs more than ten thousand seconds to check all those patterns. For DRAMA, although the tree construction at this level (α=0.%) costs about 96 seconds, the integrated pattern pruning mechanism reduces the HFP-growth time to only about 10 seconds. So in total, DRAMA can answer Q1 in about 106 seconds, which is a huge improvement compared to SPV.

Although our earlier analysis suggests that Collaborative Parallel Mining (CPM) may possibly outperform SPV because of the underlying collaborative mining process in candidate pruning, the results in Figure 7 indicate that this is not the case. Because CPM needs multiple databases to forward their candidates to a central place for collaborative mining (by pruning unqualified candidates), we can only apply Apriori on each single database. So the system performance of CPM is crucially bounded by the poor performance of Apriori based algorithms. When the support value α is large, say %, the performance of Apriori and FP-tree is almost identical (since not many items can be frequent). However, for small α values, the situation can be totally different.
For example, when α=0.%, about 680 items in D1 are frequent, which produces more than 230 thousand length-2 patterns from D1 (although collaborative pattern pruning can remove some candidates, it still leaves a large number of candidates for D1 to evaluate). This huge burden significantly slows down CPM, and makes it almost unbearable in answering many queries.

However, being worse than SPV does not necessarily mean CPM is useless. As we analyzed earlier, some queries like Q2 in Table 3 cannot be answered by SPV, because no mining from a single database can produce answers for Q2. For such situations, CPM becomes useful. To answer a query like Q2, we need a mining process which is able to unify multiple databases into one view. Both DRAMA and CPM can attain this by using their collaborative mining and pattern pruning processes, where only patterns with their support satisfying (D1 + D2) ≥ α or (D3 + D4) ≥ α are kept for further actions. For DRAMA, instead of prefiltering any single infrequent items before the HFP-tree construction, we build an HFP-tree by using all items in the transactions, and then let HFP-growth prune out the candidates on the fly. This mechanism turns out to be very efficient in practice, as the HFP-tree construction in this case spends only 105 seconds (which is about 7 seconds more than for α=0.0%). As shown in Table 4 (where the value of α is fixed to 0.5%), the runtime performance of DRAMA is much better than that of CPM in answering Q2. Table 4 further lists a runtime comparison between DRAMA and CPM in answering the other queries in Table 3, with the performance of DRAMA consistently and significantly better than that of CPM for all the queries.
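The candidate explosion that slows down the Apriori-based CPM can be checked with a one-line count. The sketch below assumes 680 frequent single items, as reported above for D1, and counts the length-2 candidates that a level-wise algorithm would form from them before any pruning; the variable names are introduced here for illustration.

```python
from math import comb

frequent_items = 680                      # frequent single items reported for D1
length2_candidates = comb(frequent_items, 2)   # unordered pairs of frequent items

print(length2_candidates)  # 230860
```

Because every pair of frequent items is a length-2 candidate, the count grows quadratically with the number of frequent items, which is why a lower support threshold hurts Apriori far more than it hurts a tree-based pattern-growth method.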
[Figure 7 Query runtime comparison on Q1 in Table 3: query runtime in seconds against the support threshold (%) for DRAMA, SPV, and CPM]

Table 4 Query runtime comparison on Q2, Q3, Q4, and Q5 in Table 3 (α=0.5%, β=0.0%)
Algorithm    Q2    Q3    Q4    Q5
DRAMA
CPM

7.3 Case Study on a Real-world Dataset

To further assess the system performance of our proposed effort on real-world datasets, we download the US 2000 census micro-data from the IPUMS [6], which provides census information about US residents (individuals and households). We use a 1% sample of the year 2000 census data with forty-seven attributes. Those attributes cover age, household/personal income, education, race, citizenship, poverty, family relationship, etc. Because many attributes contain multiple attribute values, and some attributes are numerical, we further discretize each continuous attribute and extend the total attributes to 587 distinct items. We intentionally collect the data from four states (California, New York, Florida, and Vermont), corresponding to datasets CA, NY, FL, and VT. Depending on the population of each state, the size of the dataset varies from 6000 (Vermont) to over 330,000 records (California).

Table 5 reports a runtime performance comparison among DRAMA, CPM, and SPV, with dataset settings D1=CA, D2=NY, D3=VT, and D4=FL, and two sets of support threshold values. Because SPV is not able to answer Q2 and Q5, its results in the corresponding cells are set to N/A. Because census data are not randomly generated (unlike the synthetic data), item frequencies are not random, with many items' frequencies significantly higher than others. So the support threshold values (α and β) we choose are relatively high. Even so, DRAMA consistently outperforms both SPV and CPM with a significant runtime improvement. The results in Table 5 indicate that, unlike on the synthetic data, CPM actually has a much better performance than SPV in answering some queries.
In fact, when α=40% and β=5%, although it costs FP-tree mining about 5 seconds to mine patterns from D1, there are over ten thousand patterns generated from D1, with the longest pattern containing 3 items. All these patterns need to be verified by D2, which increases the runtime significantly. On the other hand, at the


Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

UNIT 2 : INEQUALITIES AND CONVEX SETS

UNIT 2 : INEQUALITIES AND CONVEX SETS UNT 2 : NEQUALTES AND CONVEX SETS ' Structure 2. ntroducton Objectves, nequaltes and ther Graphs Convex Sets and ther Geometry Noton of Convex Sets Extreme Ponts of Convex Set Hyper Planes and Half Spaces

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

AP PHYSICS B 2008 SCORING GUIDELINES

AP PHYSICS B 2008 SCORING GUIDELINES AP PHYSICS B 2008 SCORING GUIDELINES General Notes About 2008 AP Physcs Scorng Gudelnes 1. The solutons contan the most common method of solvng the free-response questons and the allocaton of ponts for

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

Learning from Multiple Related Data Streams with Asynchronous Flowing Speeds

Learning from Multiple Related Data Streams with Asynchronous Flowing Speeds Learnng from Multple Related Data Streams wth Asynchronous Flowng Speeds Zh Qao, Peng Zhang, Jng He, Jnghua Yan, L Guo Insttute of Computng Technology, Chnese Academy of Scences, Bejng, 100190, Chna. School

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r

More information

A User Selection Method in Advertising System

A User Selection Method in Advertising System Int. J. Communcatons, etwork and System Scences, 2010, 3, 54-58 do:10.4236/jcns.2010.31007 Publshed Onlne January 2010 (http://www.scrp.org/journal/jcns/). A User Selecton Method n Advertsng System Shy

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

Clustering Algorithm of Similarity Segmentation based on Point Sorting

Clustering Algorithm of Similarity Segmentation based on Point Sorting Internatonal onference on Logstcs Engneerng, Management and omputer Scence (LEMS 2015) lusterng Algorthm of Smlarty Segmentaton based on Pont Sortng Hanbng L, Yan Wang*, Lan Huang, Mngda L, Yng Sun, Hanyuan

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

Priority queues and heaps Professors Clark F. Olson and Carol Zander

Priority queues and heaps Professors Clark F. Olson and Carol Zander Prorty queues and eaps Professors Clark F. Olson and Carol Zander Prorty queues A common abstract data type (ADT) n computer scence s te prorty queue. As you mgt expect from te name, eac tem n te prorty

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems Determnng Fuzzy Sets for Quanttatve Attrbutes n Data Mnng Problems ATTILA GYENESEI Turku Centre for Computer Scence (TUCS) Unversty of Turku, Department of Computer Scence Lemmnkäsenkatu 4A, FIN-5 Turku

More information

Ontology Generator from Relational Database Based on Jena

Ontology Generator from Relational Database Based on Jena Computer and Informaton Scence Vol. 3, No. 2; May 2010 Ontology Generator from Relatonal Database Based on Jena Shufeng Zhou (Correspondng author) College of Mathematcs Scence, Laocheng Unversty No.34

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Scheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research

Scheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research Schedulng Remote Access to Scentfc Instruments n Cybernfrastructure for Educaton and Research Je Yn 1, Junwe Cao 2,3,*, Yuexuan Wang 4, Lanchen Lu 1,3 and Cheng Wu 1,3 1 Natonal CIMS Engneerng and Research

More information

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries Run-Tme Operator State Spllng for Memory Intensve Long-Runnng Queres Bn Lu, Yal Zhu, and lke A. Rundenstener epartment of Computer Scence, Worcester Polytechnc Insttute Worcester, Massachusetts, USA {bnlu,

More information

Fuzzy Weighted Association Rule Mining with Weighted Support and Confidence Framework

Fuzzy Weighted Association Rule Mining with Weighted Support and Confidence Framework Fuzzy Weghted Assocaton Rule Mnng wth Weghted Support and Confdence Framework M. Sulaman Khan, Maybn Muyeba, Frans Coenen 2 Lverpool Hope Unversty, School of Computng, Lverpool, UK 2 The Unversty of Lverpool,

More information

Study of Data Stream Clustering Based on Bio-inspired Model

Study of Data Stream Clustering Based on Bio-inspired Model , pp.412-418 http://dx.do.org/10.14257/astl.2014.53.86 Study of Data Stream lusterng Based on Bo-nspred Model Yngme L, Mn L, Jngbo Shao, Gaoyang Wang ollege of omputer Scence and Informaton Engneerng,

More information

Intro. Iterators. 1. Access

Intro. Iterators. 1. Access Intro Ths mornng I d lke to talk a lttle bt about s and s. We wll start out wth smlartes and dfferences, then we wll see how to draw them n envronment dagrams, and we wll fnsh wth some examples. Happy

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Agenda & Reading. Simple If. Decision-Making Statements. COMPSCI 280 S1C Applications Programming. Programming Fundamentals

Agenda & Reading. Simple If. Decision-Making Statements. COMPSCI 280 S1C Applications Programming. Programming Fundamentals Agenda & Readng COMPSCI 8 SC Applcatons Programmng Programmng Fundamentals Control Flow Agenda: Decsonmakng statements: Smple If, Ifelse, nested felse, Select Case s Whle, DoWhle/Untl, For, For Each, Nested

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.15 No.10, October 2015 1 Evaluaton of an Enhanced Scheme for Hgh-level Nested Network Moblty Mohammed Babker Al Mohammed, Asha Hassan.

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information