Efficient Semantically Equal Join on Strings in Practice


Thammasat Int. J. Sc. Tech., Vol. 14, No. 2, April-June 2009

Juggapong Natwichai
Computer Engineering Department, Faculty of Engineering, Chiang Mai University, Chiang Mai, Thailand 50200
juggapong@chiangmai.ac.th

Xingzhi Sun
IBM Research Laboratory, Beijing, China
sunxingz@cn.ibm.com

Maria E. Orlowska
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Qld 4072, Australia
maria@itee.uq.edu.au

Abstract

In general, data integration begins with schema integration and must be followed by detailed data instances resemblance in order to reach data representation unification. In this paper, we address the limits of the data-level reconciliation automation process in cases where the compared data is semantically equivalent, but the data representation of the values of given attributes is different. We assume that a semantic relationship between potentially used terms is established by a human expert prior to the designed computations, and is represented in an auxiliary table built and maintained for each attribute. In such a context, we introduce a notion of semantically equal join (SEJ), which is the join operation based on a pre-defined semantic relationship. Our goal is to propose a solution for SEJ that can be supported by standard SQL. The paper begins with an illustration of the approach for a single attribute join, followed by a generalisation of SEJ for any number of join attributes. The paper continues with performance considerations for the invented method. Finally, this aspect is supported by extensive experimentation based on our implementation of SEJ executed on synthetic datasets.

Keywords: Approximate String Join, Data Integration, Semantically Equal Join.

1. Introduction

The data integration issue [1] is one of the most challenging problems that computer science and IT practitioners have faced in the last decade or so.
This problem originally emerged when demand for data access from multiple sources was observed in the context of distributed multi-database systems; later, it appeared in database federation systems and other versions of data integrated constructions including data warehousing and, very recently, e-research applications. Firstly, integration problems occur at the schema level [2], where the name, meaning and constraint scope of attributes must be addressed. Usually, human involvement in this process is necessary [3]. Subsequently, the data-level integration problems need to be addressed. The problem at this level exists due to: potential mismatch of attribute domains, adopted strings to represent attribute values, as well as conventions to express the data fields. Additionally, the data mismatch may be caused by many other reasons, such as, for instance, typing errors.

In this paper, we consider the data integration where two given relations are joined with each other on some attributes. In this context, we address the synonym mismatch problem, i.e., the values of join attributes are semantically equivalent, but unable to match due to different representation, e.g. different abbreviation standards. Naturally, on such an attribute domain, the semantic equivalence relationship between strings needs to be pre-defined. We call the join operations that are based on this semantic equivalence relationship semantically equal join (SEJ). For example, in Figure 1a, R1 and R2 are two relations which contain the medical records of two different hospitals respectively. Assume that one issues a query to find pairs of patients who have the same type of disease. Because, in the medical domain, the strings "Tumours", "Knub" and "Neoplasms" are semantically equivalent and so are the strings "Mad Cow" and "B.S.E." (Bovine Spongiform Encephalopathy), the result of the query should be the relation given in Figure 1b. We can see that this query result can be obtained by joining the two relations R1 and R2 on their common attribute "Disease" such that the join condition is true if and only if the values of the join attribute are semantically equivalent.

In the rest of this paper, the focus is to formally define the SEJ operation and to propose an efficient approach to compute SEJ in a standard SQL environment. Our discussion begins with an illustration of the approach for 1-attribute SEJ. First, to represent the semantic relationship between potentially used terms, we introduce the auxiliary table, which is an additional relation built and maintained for the pair of attributes on which the join is to be performed.
The design of the auxiliary table is important because it directly affects the SEJ performance. To address this issue, we design the schema of the auxiliary table, based on which the relational algebra for SEJ is proposed. Since, in general, the way that the SQL statement is written will affect the performance of queries in a DBMS, we optimize the query performance for the group-based approach by adjusting its SQL formulation. Particularly, we propose a subquery-based SQL statement for SEJ, which is much more efficient than the SQL that is directly translated from the relational algebra.

a) Example relations

    R1                      R2
    PID   Disease           PID   Disease
    P1    Tumours           Q1    Neoplasms
    P2    Knub              Q2    B.S.E
    P3    Mad Cow           Q3    Bird Flu
    P4    Bird Flu

b) SEJ result

    PID   Disease     PID   Disease
    P1    Tumours     Q1    Neoplasms
    P2    Knub        Q1    Neoplasms
    P3    Mad Cow     Q2    B.S.E
    P4    Bird Flu    Q3    Bird Flu

Figure 1: SEJ Example

Learning from the discussion of 1-attribute SEJ, we generalize the SEJ problem to k attributes, where k auxiliary tables need to be designed and maintained. Based on the sub-query-based SQL statement proposed for 1-attribute SEJ, it is straightforward to get a query for computing k-attributes SEJ. Because the join condition of k-attributes SEJ is the conjunction of k predicates (each of which corresponds to the condition for a 1-attribute SEJ), the order of these k predicates in the SQL statement will significantly affect the cost of the query [4]. To optimize the performance for k-attributes SEJ, we modify the order of these k predicates by referencing the size of each auxiliary table and the join selectivity of each pair of join attributes, such that the total cost of accessing the k auxiliary tables is minimum. Finally, we evaluate the efficiency of the proposed approaches by performing intensive experiments on synthetic datasets with various characteristics.

The paper is organized as follows. Section 2 provides a review of related work and how the SEJ is positioned in this context. Our proposed approach is shown in Section 3. The experiments and results are reported in Section 4. Finally, we provide concluding remarks in Section 5.

2. Related Work and Discussion

The problem of joins with inexact matching of corresponding data, approximate string joins, has already been addressed by many authors experimenting with a variety of approaches. Most of them search for solutions that can be effectively executed by the existing and widely used traditional relational database environments, DBMS and SQL, without any necessity for special purpose programming.

The problem was addressed firstly in [5]. The main concept behind this approach is to decompose a string attribute value into substrings called q-grams [6], which have been used widely in the string matching context. The length of the substrings is specified by the q value. For example, if q is set at 3, a string "John" can be decomposed into the set of substrings {"##J", "#Jo", "Joh", "ohn", "hn$", "n$$"}.
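The q-gram decomposition described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in [5]; the function name `qgrams` and the choice of `#` and `$` as padding characters (taken from the "John" example) are assumptions made here.

```python
def qgrams(text: str, q: int = 3, start_pad: str = "#", end_pad: str = "$") -> list[str]:
    """Return the overlapping substrings of length q of a padded string.

    The string is extended with q-1 start-padding and q-1 end-padding
    characters so that every original character appears in exactly q q-grams.
    """
    padded = start_pad * (q - 1) + text + end_pad * (q - 1)
    # Slide a window of width q over the padded string.
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


if __name__ == "__main__":
    print(qgrams("John"))
```

Two strings that differ by a small edit share most of their q-grams, which is what makes the q-gram sets usable as a cheap filter before an exact edit distance computation.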
When matching two strings, a distance function based on the edit distance is computed on the two sets of their substrings. A number of q-gram related propositions were proposed to reduce the cost of edit distance computation.

In [7], an approximate string join approach based on cosine similarity computation was proposed. In this work, the string data of each attribute (which can be composed of many terms) is represented as a multi-dimensional vector. For each attribute, the number of dimensions is the number of all possible terms used in the attribute (a string may consist of multiple terms). For each dimension, a magnitude is assigned according to the significance of the corresponding term; an infrequent term will have a higher value, while a common term will have a lower value. For example, a string "Information Technology and Electrical Engineering" in an affiliation attribute with eight possible terms can be represented as an 8-dimensional vector (0, 0, 0.5, 0., 0.6, 0.4, 0, 0), where 0.5, 0., 0.6 and 0.4 represent the significance of the terms Information, Technology, Electrical and Engineering respectively. When strings are to be matched, the cosine similarity function is then computed. If the cosine similarity of two strings is higher than a prespecified threshold, they can be matched. For example, if a specified threshold is set at 0.4, the above string and another string "Information Engineering" represented as a vector (0, 0, 0.5, 0, 0, 0.4, 0, 0) can be matched because their cosine similarity is 0.41 (0.5×0.5 + 0.4×0.4). Clearly, one can build many examples where such a matching will not deliver high quality results, mainly due to an evident lack of dependency measures between the terms.
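The threshold test described above can be illustrated with a short Python sketch. It is not the method of [7]: the function names and the threshold default are assumptions, and the sketch normalizes each vector by its length, whereas the worked figure in the text is the plain dot product of the two weight vectors (the two agree only when the vectors already have unit length).

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two term-weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a string with no weighted terms matches nothing
    return dot / (norm_u * norm_v)


def can_match(u: list[float], v: list[float], threshold: float = 0.4) -> bool:
    """Two strings are considered matched if their similarity exceeds the threshold."""
    return cosine_similarity(u, v) > threshold
```

Because the similarity depends only on shared terms and their weights, two synonyms with no term in common always score 0, which is the gap that the semantic approaches discussed next try to close.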

Also, it is intrinsically difficult to allocate the best values for significance measures, and any allocation will be regarded as highly subjective. Also, in the same work, a sampling based computation has been used to reduce the computational cost of the function. Instead of computing a similarity function for every pair of the join candidates, which is an expensive operation, insignificant candidates will be sampled out, with only the promising pairs computed. In the same way as in [5], this work does not require any modification to the DBMS. Only auxiliary tables are required to be created and they are used to drive the matching mechanism.

In [8], web data integration was proposed as another application for approximate string joins. In that work, more extensive theoretical aspects and experimental results were presented. Furthermore, many combinations of distance functions were discussed, for example, edit distance, block edit distance, cosine similarity with words as terms, and cosine similarity with q-grams as terms. As a result, the last type of combination, cosine similarity with q-grams as terms, has been suggested as the best distance function because it can capture all types of mismatch, including spelling errors, insertion and deletion of short or common words, and variations of word order.

An attempt to include semantic knowledge in the approximate string join computation (rather than purely structural string comparison as indicated above) is presented in [9]. To improve similarity-based joins, two terms which share the same semantic meaning, but have too much difference in their representations (e.g. "MCI" and "Worldcom", which refer to the same company), are formed as a pair in an auxiliary table by human experts. Subsequently, in the pre-processing phase of a join, each tuple is extended by associating its value with the pairs in the auxiliary table which have a "high" similarity score. For example, consider a tuple with a value "MCI Corp".
After the pre-processing phase, the value will be extended to ("MCI Corp", "MCI", "Worldcom") by associating the value with the pair ("MCI", "Worldcom"). When answering a matching query, the attribute values of the query string are also extended by using the auxiliary table in the same way as in the pre-processing phase. Suppose that an attribute value "Worldcom Corp" is given in the query; it will be extended to ("Worldcom Corp", "Worldcom", "MCI"). Consequently, it can be matched with the tuple above, with the "MCI Corp" value, by traditional cosine similarity computations. Although only briefly discussed in that paper, this approach can remedy the shortcomings of previous similarity-based approaches by applying semantic knowledge.

Similar to our work, this work also addresses the synonym mismatch problem by applying semantic knowledge. However, unlike our work, which is to efficiently compute SEJ, the focus of that approach is on how to use semantic knowledge to pre-process two relations such that later approximate join approaches can achieve more accurate results without additional operations.

3. Semantically Equal Join Approach

3.1 Preliminaries

Let us first consider two relations $R_1(A_1, \ldots, A_m)$ and $R_2(B_1, \ldots, B_n)$ such that attributes $A_1$ and $B_1$ have the same domain Dom, where Dom is a set of strings.

Definition 1. Semantic equivalence, denoted as $\equiv$, is a relation defined on a given domain Dom such that for every $a, b \in Dom$, $a \equiv b$ iff a is a synonym of b. Also, we define that a and b are strict semantically equivalent, denoted as $a \equiv_s b$, if $a \equiv b$ and $a \neq b$.

a) Join attributes

    A1: USA, United States, GB, France
    B1: The States, United Kingdom, France

b) Semantic equivalence relation

    {(USA, USA), (United States, United States), (The States, The States), (GB, GB), (United Kingdom, United Kingdom), (France, France)}
    ∪
    {(USA, United States), (United States, USA), (USA, The States), (The States, USA), (United States, The States), (The States, United States), (GB, United Kingdom), (United Kingdom, GB)}

Figure 2: Semantic Equivalence

Example 1. Suppose that Figure 2a gives the values of join attributes $A_1$ and $B_1$, which are the results of projecting the relations $R_1$ and $R_2$ respectively. Dom in this example is the string set {USA, United States, The States, GB, United Kingdom, France}. According to the standard convention used for expressing country names, we can naturally define the semantic equivalence relation on Dom as in Figure 2b. Also observe that in this example, the strict semantic equivalence relationship is shown as the right-hand side of the union operator.

Definition 2. Given a relation $\equiv$ of semantic equivalence, semantically equal join (SEJ) is a theta join operator in the relational algebra such that the theta comparison operator is $\equiv$. Further, the semantically equal join operator is denoted as $\bowtie_{\equiv}$.

To perform the semantically equal join between two tables $R_1$ and $R_2$ on attributes $A_1$ and $B_1$, the semantic equivalence relation on domain Dom needs to be predefined. Since it is straightforward to observe that a string can semantically join with itself, we define a relation Au, called the auxiliary table, to only specify the relationship of strict semantic equivalence, i.e., to give the information of synonyms in Dom.

According to Definition 1, it is easy to observe that semantic equivalence, $\equiv$, is an equivalence relation. Formally, given a semantic equivalence relation $\equiv$ on Dom, we can partition Dom into a set of equivalence classes $C_i$, $i = 1..l$, such that for any $C_i \neq C_j$, there do not exist elements $a \in C_i$, $b \in C_j$ satisfying $a \equiv b$. Note that only a class $C_i$ with $|C_i| > 1$ provides the information of synonyms.
We define the set of non-trivial equivalence classes, formally, $\mathcal{C} = \{ C_i \mid |C_i| > 1 \}$ where $i = 1..l$. According to the above discussion, Au can be regarded as the representation of the set $\mathcal{C}$. Let us further consider Example 1. Given the relation $\equiv$, Dom is partitioned into three equivalence classes: $C_1 = \{USA, United States, The States\}$, $C_2 = \{GB, United Kingdom\}$, and $C_3 = \{France\}$. The set of non-trivial equivalence classes is $\{C_1, C_2\}$.

Footnote: As an equivalence relation, $\equiv$ has the following properties: 1) $a \equiv a$ for all $a \in Dom$; 2) $a \equiv b \Rightarrow b \equiv a$ for all $a, b \in Dom$; 3) $a \equiv b \wedge b \equiv c \Rightarrow a \equiv c$ for all $a, b, c \in Dom$.

After introducing the key concepts, we generalize the problem of semantically equal join to k attributes and give the formal problem statement as follows.

Problem statement: Let $R_1(A_1, \ldots, A_m)$ and $R_2(B_1, \ldots, B_n)$ be two relations such that attributes $R_1.A_i$ and $R_2.B_i$ ($i = 1..k$) have the same domain $Dom_i$, where $Dom_i$ is a set of strings. For each $Dom_i$ ($i = 1..k$), let $\equiv_i$ be the equivalence relation defined on $Dom_i$ and $Au_i$ be the auxiliary table that represents the corresponding strict semantically equivalent relationship. The semantically equal join (SEJ) problem is: given relations $R_1$, $R_2$ and auxiliary tables $Au_1, \ldots, Au_k$, express $R_1 \bowtie_{A_1 \equiv_1 B_1, \ldots, A_k \equiv_k B_k} R_2$ by a single SQL statement.

In the rest of Section 3, we will first study how to build an SQL statement to efficiently compute SEJ on one attribute; then the discussion is extended to SEJ on k attributes (k > 1).

3.2 Schema of Auxiliary Table and Semantically Equal Join

Firstly, we propose a structure of auxiliary tables, called the group-based approach, for the schema of the auxiliary table. In this approach, given the set $\mathcal{C}$, the auxiliary table Au is defined as $Au(Gid, P) = \{ (i, a) \mid C_i \in \mathcal{C} \text{ and } a \in C_i \}$. For Example 1, the auxiliary table is given in Figure 3. With this schema, we can build the primary index on the attribute P for efficient access, because there cannot exist a string that belongs to two different synonym groups.

    GID   P
    1     USA
    1     The States
    1     United States
    2     GB
    2     United Kingdom

Figure 3: The example of group-based auxiliary table

Given the schema of Au, the RA expression for the semantically equal join for this schema is:

$$(R_1 \bowtie_{A_1 = B_1} R_2) \cup \big( (R_1 \bowtie_{A_1 = Au.P} Au) \bowtie_{Au.GID = Au'.GID} (R_2 \bowtie_{B_1 = Au'.P} Au') \big) \qquad (1)$$

where $Au'$ is the copy of Au.

3.3 Query for Implementing SEJ

Given the relational algebra operation in Equation (1), there are multiple ways to build an SQL statement to compute the semantically equal join. Although generally, different queries may have the same execution plan in a given DBMS, in many cases, the way a query is written will affect the execution plan and the query performance.
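The group-based auxiliary table of Section 3.2 can be derived mechanically from a list of synonym pairs supplied by an expert. The following Python sketch is one possible way to do so, under assumptions not taken from the paper: the function name `build_auxiliary_table` is hypothetical, group identifiers are assigned in order of first appearance, and a small union-find structure is used to merge pairs into equivalence classes.

```python
def build_auxiliary_table(synonym_pairs):
    """Turn expert-provided synonym pairs into rows (Gid, P) of a group-based table.

    Terms connected through any chain of pairs end up in the same group,
    which mirrors the transitivity of the semantic equivalence relation.
    Terms that never appear in a pair form trivial classes and are omitted.
    """
    parent = {}

    def find(term):
        # Locate the representative of the term's class, compressing the path.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for a, b in synonym_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    groups = {}
    for term in parent:
        groups.setdefault(find(term), []).append(term)

    rows = []
    for gid, members in enumerate(groups.values(), start=1):
        rows.extend((gid, term) for term in sorted(members))
    return rows


if __name__ == "__main__":
    pairs = [("USA", "United States"), ("USA", "The States"), ("GB", "United Kingdom")]
    for row in build_auxiliary_table(pairs):
        print(row)
```

Since every term receives exactly one group identifier, the column P is unique in the output, which is the property that allows a primary index on P.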
In this section, we propose an SQL statement which can efficiently compute the semantically equal join. Rather than considering query optimization techniques which are dependent on the DBMS, we only put our efficiency concern at the SQL level.

SQL Statements

According to Equation (1), we can directly translate the RA into an SQL statement, which is shown in Figure 4a. Since part of this SQL statement requires three join operations to link four tables, we call it the three-join-based SQL statement.

a) Three-join-based

    (SELECT * FROM R1, R2 WHERE A1 = B1)
    UNION
    (SELECT * FROM R1, R2, Au Au1, Au Au2
     WHERE (A1 = Au1.P AND B1 = Au2.P AND Au1.Gid = Au2.Gid))

b) Sub-query-based

    SELECT * FROM R1, R2
    WHERE A1 = B1
       OR ((SELECT Gid FROM Au WHERE P = A1) = (SELECT Gid FROM Au WHERE P = B1))

Figure 4: SQL statements

Intuitively, joining four tables is expensive. Therefore, to improve the efficiency, we propose another SQL statement in Figure 4b, called the sub-query-based SQL statement. The semantic meaning behind this query is that two tuples can be joined together if either the values of the join attributes are the same, or they belong to the same group in the auxiliary table. Note that when both values of A1 and B1 are not in the auxiliary table, the condition after OR will compare two Null values. In standard SQL, the result of Null = Null is unknown, which will not make two tuples join.

Cost Analysis

We briefly discuss the I/O cost for computing the SEJ in this section. Let us first consider the three-join-based SQL. In Figure 4a, we can see that the query consists of three operations: the equal join, the three-join operation, and the union. It is not hard to observe that this query will generate a significant amount of intermediate results from the joins and require an external sort to remove duplicates for the union operation, both of which will lead to high I/O cost. For the sub-query-based approach, the query plan is much simpler: when R1 and R2 are joined, for any pair of tuples which cannot equally join, the auxiliary table is accessed by the primary index (on the attribute P) twice to check the semantic join condition. Note that in practice, an auxiliary table is not very large and often can be cached in the buffer. In this case, the I/O cost of the sub-query-based SQL is close to the cost of an equal join of R1 and R2.
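To make the behaviour of the sub-query-based statement concrete, the sketch below runs it on the hospital example of Figure 1 using Python's built-in `sqlite3` module. This is an illustration only: the paper's experiments use a commercial DBMS, and the table names `R1`, `R2`, the column names, and the function name `run_sej_example` are assumptions chosen here for the example.

```python
import sqlite3


def run_sej_example():
    """Run the sub-query-based SEJ on the Figure 1 data in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE R1 (PID TEXT, Disease TEXT)")
    cur.execute("CREATE TABLE R2 (PID TEXT, Disease TEXT)")
    # P is the primary key: a term belongs to at most one synonym group.
    cur.execute("CREATE TABLE Au (Gid INTEGER, P TEXT PRIMARY KEY)")
    cur.executemany("INSERT INTO R1 VALUES (?, ?)", [
        ("P1", "Tumours"), ("P2", "Knub"), ("P3", "Mad Cow"), ("P4", "Bird Flu")])
    cur.executemany("INSERT INTO R2 VALUES (?, ?)", [
        ("Q1", "Neoplasms"), ("Q2", "B.S.E"), ("Q3", "Bird Flu")])
    cur.executemany("INSERT INTO Au VALUES (?, ?)", [
        (1, "Tumours"), (1, "Knub"), (1, "Neoplasms"), (2, "Mad Cow"), (2, "B.S.E")])
    # Equal values join directly; otherwise both values must map to the same Gid.
    # A value missing from Au yields NULL, and a comparison with NULL is never true.
    cur.execute("""
        SELECT R1.PID, R1.Disease, R2.PID, R2.Disease
        FROM R1, R2
        WHERE R1.Disease = R2.Disease
           OR ((SELECT Gid FROM Au WHERE P = R1.Disease)
               = (SELECT Gid FROM Au WHERE P = R2.Disease))
        ORDER BY R1.PID, R2.PID
    """)
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_sej_example():
        print(row)
```

"Bird Flu" does not appear in the auxiliary table, so the P4 and Q3 tuples join only through the plain equality, while the other pairs join through their shared group identifier.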
From the above discussion, we can see that the sub-query-based approach requires less disk I/O when the size of Au is small. This will also be demonstrated in the experiments reported in Section 4.

3.4 k-attributes Semantically Equal Join

Now, we consider the task of semantically joining two relations on k attributes, as described in the problem statement in Section 3.1. As the join condition of k-attributes SEJ is the conjunction of the conditions for the single attribute joins, we can readily extend the previous SQL statement shown in Figure 4 into the statement in Figure 5. Since the join conditions are linked by AND, if any set of conditions is not satisfied, it is not necessary to check the join conditions of the other attributes. Apparently, we can order the join condition sets to minimize the cost of accessing auxiliary tables. Intuitively, two factors need to be considered: 1) the size of the auxiliary table $|Au|$; 2) the join/SEJ selectivity of each pair of join attributes.

    SELECT * FROM R1, R2
    WHERE (A1 = B1 OR ((SELECT Gid FROM Au1 WHERE P = A1) = (SELECT Gid FROM Au1 WHERE P = B1)))
      ...
      AND (Ak = Bk OR ((SELECT Gid FROM Auk WHERE P = Ak) = (SELECT Gid FROM Auk WHERE P = Bk)))

Figure 5: SQL for k-semantically equal join

It is well known to the database community that query performance can be adjusted by ordering join conditions linked by AND. However, driven by the specialty of a new type of query, we discuss this ordering problem in the context of SEJ with the purpose of minimizing the cost of auxiliary table access.

Let $p_i$ be the number of pages of the auxiliary table $Au_i$, where $i = 1, \ldots, k$. Given relations $R_1$ and $R_2$, let $\hat{s}_{i_1, \ldots, i_l}$ ($i_j$ is the index of the attribute, $1 \le i_j \le k$) be the semantically equal join selectivity of $R_1$ and $R_2$ on l attributes. Formally, $\hat{s}_{i_1, \ldots, i_l}$ is equal to:

$$\hat{s}_{i_1, \ldots, i_l} = \frac{|R_1 \bowtie_{A_{i_1} \equiv B_{i_1}, \ldots, A_{i_l} \equiv B_{i_l}} R_2|}{|R_1| \cdot |R_2|}$$

Similarly, we define the equal join selectivity $s_{i_1, \ldots, i_l}$ as:

$$s_{i_1, \ldots, i_l} = \frac{|R_1 \bowtie_{A_{i_1} = B_{i_1}, \ldots, A_{i_l} = B_{i_l}} R_2|}{|R_1| \cdot |R_2|}$$

Given an order of join attributes $i_1, \ldots, i_k$, under the assumption that the join attributes are independent, the total I/O cost of accessing the k auxiliary tables is:

$$(1 - s_{i_1})\,p_{i_1} + \hat{s}_{i_1}(1 - s_{i_2})\,p_{i_2} + \cdots + \hat{s}_{i_1, \ldots, i_{k-1}}(1 - s_{i_k})\,p_{i_k}$$

Also due to the assumption that the join attributes are independent, we have $\hat{s}_{i_1, \ldots, i_l} = \prod_{j=1}^{l} \hat{s}_{i_j}$. So the ordering task can be presented as: given the sizes of the auxiliary tables $p_1, p_2, \ldots, p_k$, the equal join selectivity of each pair of attributes $s_1, s_2, \ldots, s_k$, and the semantically equal join selectivity of each pair of attributes $\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_k$, find an order $i_1, i_2, \ldots, i_k$ such that the cost function

$$(1 - s_{i_1})\,p_{i_1} + \hat{s}_{i_1}(1 - s_{i_2})\,p_{i_2} + \cdots + \hat{s}_{i_1, \ldots, i_{k-1}}(1 - s_{i_k})\,p_{i_k}$$

is minimized.

To find the order with minimum cost, theoretically, we need to compute the costs for k! orders. However, based on the cost function, we can apply some heuristics to speed up the search. The equal join selectivity or semantically equal join selectivity is often small. Considering the cost function, and assuming that the values of $(1 - s_i)\,p_i$ for $i = 1, \ldots, k$ are not significantly different, we can expect the cost of accessing the i-th auxiliary table to be less than that of accessing the (i−1)-th auxiliary table by an order of magnitude. Intuitively, to minimize the total cost, it could be helpful to first select the index i with minimum $(1 - s_i)\,p_i$.
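The cost function and the two search strategies discussed above (exhaustive search over all k! orders, and sorting by $(1 - s_i)\,p_i$) can be sketched as follows. This is an illustrative reading of the formulas under the independence assumption, not the authors' code; the function names are hypothetical, attribute indices are zero-based, and the numeric values in the usage example are made up for illustration.

```python
from itertools import permutations


def access_cost(order, p, s, s_hat):
    """Expected I/O cost of accessing the auxiliary tables in the given order.

    p[i]     : number of pages of auxiliary table i
    s[i]     : equal join selectivity of attribute pair i
    s_hat[i] : semantically equal join selectivity of attribute pair i
    """
    cost = 0.0
    reach = 1.0  # fraction of tuple pairs that reach the current condition
    for i in order:
        # The table is accessed unless the pair already joins by equality.
        cost += reach * (1.0 - s[i]) * p[i]
        # Only pairs satisfying this condition go on to the next one.
        reach *= s_hat[i]
    return cost


def best_order_exhaustive(p, s, s_hat):
    """Evaluate all k! orders and return one with minimum cost."""
    return min(permutations(range(len(p))),
               key=lambda order: access_cost(order, p, s, s_hat))


def heuristic_order(p, s):
    """Order attributes by increasing (1 - s_i) * p_i, ignoring s_hat."""
    return tuple(sorted(range(len(p)), key=lambda i: (1.0 - s[i]) * p[i]))


if __name__ == "__main__":
    pages = [10, 10, 40]             # hypothetical table sizes
    equal_sel = [0.05, 0.01, 0.02]   # hypothetical equal join selectivities
    sej_sel = [0.08, 0.03, 0.04]     # hypothetical SEJ selectivities
    print(best_order_exhaustive(pages, equal_sel, sej_sel))
    print(heuristic_order(pages, equal_sel))
```

With these made-up numbers the heuristic and the exhaustive search disagree on the first two attributes, because the heuristic deliberately ignores the SEJ selectivity; the cost difference is small, which is the trade-off the heuristic accepts in exchange for avoiding the k! enumeration.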
Based on this idea, we can ignore the semantically equal join selectivity and order the indexes according to the value $(1 - s_i)\,p_i$. Once the optimum order O of indexes on the join attributes is found, the query in Figure 5 can be optimized by re-ordering the join conditions based on O.

4. Experimental Results

In this section, to evaluate the efficiency of SEJ, we report the result of the performance study on synthetic datasets.

4.1 Experimental Configuration

System resource: The experiments are conducted on an 800 MHz Intel Pentium III with 256 megabytes main memory, running a commercial DBMS (Oracle 8). All invented methods are implemented using standard SQL.

Datasets: In our experiments, all the relations (R1, R2, and auxiliary table Au) are generated by different random variables, each of which follows the uniform distribution. For brevity, we only show how to generate relations for 1-attribute SEJ. The discussion can be readily extended to k-attributes SEJ.

There are two controlled parameters for the auxiliary table, the cardinality $|Au|$ and the number g of semantically equivalent groups. Given $|Au|$ and g, we generate the group-based auxiliary table Au(Gid, P) by randomly assigning $|Au|$ distinct strings into g groups. Based on the uniform distribution, $|Au| / g$ gives the expected number of string values in each group, which is denoted as v.

For the two relations R1 and R2, their schemas are designed as $R_1(A_1, A_2)$ and $R_2(B_1, B_2)$ with $A_1$ and $B_1$ as the join attributes. Further, we consider the instances in each relation. Again for simplicity, we set the constraint that the values of attribute $A_1$ are exactly the same as the values of attribute $B_1$. Considering the attribute $A_1$ ($B_1$), its values consist of three parts: the S part, which has the attribute values in the auxiliary table (i.e., values that can semantically match the values in the other relation); the E part, which has the values that can only exactly match the values in the other relation; and the N part, which has the values that cannot match any values in the other relation. Let S%, E%, N% (S% + E% + N% = 1) be the percentages of the values in part S, part E, and part N respectively, and ns, ne, nn be the numbers of distinct values in these three parts. Given the above parameters, the join relations are generated by following the equal distribution. For example, in the S part, we can expect each distinct value to appear $|R_1| \cdot S\% / ns$ times, where $|R_1|$ is the cardinality of R1 (R2).

4.2 Experimental Results and Discussion

1-Attribute SEJ Results and Discussion

Effect of join relation size $|R|$: In this experiment, we increase $|R|$ from 00 records to 4,000 records to examine the performance of two types of query with different sizes of join relation. The other parameters of the join relations are set as: S%=80%, E%=0, N%=0%, ns=0, ne=0, nn=3. For the auxiliary table, we set |Au|=0 and g=3.

The performance of 1-attribute SEJ by different types of queries is presented in Figure 6a. It can be seen that the three-join-based approach is very inefficient. Therefore, it cannot be used for a large amount of data. As discussed in Section 3, the SQL statement for the three-join-based approach requires that four tables are joined, which leads to expensive I/O cost. Therefore, for the rest of this section, we will omit the result of the three-join-based approach and only discuss the group-based sub-query.

Effect of selectivity $\hat{s}$: The second consideration for 1-attribute SEJ is about its performance regarding the semantic join selectivity $\hat{s}$ of R1 and R2, where $\hat{s}$ is composed of two parts, the strict semantic join selectivity $s'$ (i.e. the selectivity for the pairs of tuples that are matched based on the strict semantically equivalent relationship) and the equal join selectivity $s$ (introduced in Section 3.4). By definition, we always have $\hat{s} = s + s'$.

Note here that we slightly change the way of generating synthetic datasets in this part. The records in the S part are allowed to only be joined through an auxiliary table (i.e., no equal join in this part). In this way, due to the equal distribution, the strict semantic join selectivity is $(S\%)^2 / g$, where g is the number of groups in Au. Similarly, given E% and ne, the equal join selectivity can be estimated as $(E\%)^2 / ne$. In the experiments, the strict semantic join selectivity and the equal join selectivity can be adjusted by changing the parameters S%, ns, E%, and ne.

Both experiments use some common parameters, which are the size of the join relations at 0,000 tuples with ne at 5 and nn at 00. Also, for both experiments, we set the size of the auxiliary table Au at 00 with g at 5, which means that v is set at 4.

To evaluate the impact of $s'$, we fix $s$ by always setting E%=0% and ne=5. We increase the strict semantic join selectivity by increasing S% from 0% to 80%. Note that when S% is increased, we also decrease N% to keep the size of the join relation the same. For evaluating the performance of SEJ regarding the equal join selectivity $s$, we fix $s'$ by always setting S%=0% and ns=00 and increase E% from 0% to 80%.

[Figure 6: Performance of SEJ. a) Effect of join relation size; b) Effect of selectivity]

We firstly measure the performance of SEJ regarding $s'$, which is shown in Figure 6b. Obviously, the performance of SEJ is irrelevant to the strict semantic join selectivity. This verifies our thoughts that the amount of time for auxiliary table access only depends on the equal join selectivity $s$. The reason is that for any two records $t_1$ and $t_2$, the auxiliary table needs to be accessed unless $t_1$ and $t_2$ can be equal joined. Also, in Figure 6b, we report the performance of SEJ when the selectivity $s$ is increased. We can see that, for the same reason, the execution time decreases when E% is increased.

k-attributes SEJ Results and Discussion

In this experiment, we discuss the SEJ join in terms of the order of the k attributes in an SQL statement. Particularly, we verify our cost analysis in Section 3.4 and demonstrate how the proposed heuristics help to quickly find an optimal order.
As discussed previously, the following two factors are considered for the ordering: 1) the size of the auxiliary table $|Au|$; 2) the join/SEJ selectivity. We generate join relations with three join attributes (k=3) of size 00,000 records. The characteristics and the estimated selectivities of the relations are shown in Table 1. As shown in Table 1, the first and the second attributes have the same $|Au|$, but they are different in selectivities. On the other hand, even if the third attribute has the lowest selectivity, it has the largest $|Au|$. The performance of all possible orders of join conditions for the dataset is evaluated.

The performance of k-attributes SEJ with all possible join condition orderings is reported in Figure 7. On the x-axis, we put the orderings in ascending order of their theoretical I/O costs computed from the cost function. The result shows that the order of actual cost accords with the order of the theoretically estimated cost. This demonstrates the correctness of our previous analysis. Compared with the worst case order 3--, the best order --3 has better performance, by 7.9%. As mentioned in Section 3.4, we can also compute the best order by computing the estimated cost $(1 - s_i)\,p_i$, for which the result is also the order --3. Another interesting observation is that the order --3 has more cost than the order -3-, which shows that the SEJ selectivity should also be considered, not only the size of the auxiliary tables. On the other hand, ordering only by SEJ selectivity cannot perform very well, as presented by the order 3--.

Table 1: Characteristics and Selectivities of Experimental Dataset

    Columns: Attribute | Auxiliary table (|Au|, g, v) | Join relation (S%, E%, N%, ns, ne, nn) | Selectivity (s', s, ŝ)
    % 60% 30% % 3.6% 3.8%
    % 0% 40% % 0.% 5.%
    % 0% 40% % 0.4% 3.6%

[Figure 7: Performance of k-attribute SEJ]

5. Conclusion

In this paper, we have addressed the problem of synonym mismatch in the context of data integration, where the data from different sources are semantically equivalent, but the data representation is different. We have introduced a new relational algebraic operation, called semantically equal join (SEJ), which can be generally applied for integrating two relations based on predefined equivalence relation(s). The originality and contributions of our work include the following aspects:

1) We have introduced the semantic equivalence relationship to resolve the synonym mismatch problem. In fact, this study can also complement previous work on approximate join that mainly focuses on resolving typo mismatch.
2) We have formally defined the concept of SEJ and proposed an efficient approach to implement the SEJ operator in a standard DBMS by a single SQL statement.
3) By extensive experiments we have demonstrated the performance of our approach for SEJ.

6. References

[1] Halevy, A.Y., Ashish, N., Bitton, D., Carey, M., Draper, D., Pollock, J., Rosenthal, A., Sikka, V., Enterprise Information Integration: Successes, Challenges and Controversies, In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, ACM Press, 2005.
[2] Rahm, E., Bernstein, P.A., A Survey of Approaches to Automatic Schema Matching, The VLDB Journal, Vol. 10, 2001.
[3] Colomb, R.M., Orlowska, M.E., Interoperability in Information Systems, Information Systems Journal, Vol. 5, pp. 37-50, 1994.
[4] Orlowski, M.W., On Optimisation of Joins in Distributed Database System, In: Future Databases, 1992.
[5] Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Srivastava, D., Approximate String Joins in a Database (Almost) for Free, In Proceedings of the 27th International Conference on Very Large Data Bases, Morgan Kaufmann, 2001.
[6] Ullmann, J.R., A Binary n-gram Technique for Automatic Correction of Substitution, Deletion, Insertion and Reversal Errors in Words, The Computer Journal, Vol. 20, pp. 141-147, 1976.
[7] Gravano, L., Ipeirotis, P.G., Koudas, N., Srivastava, D., Text Joins for Data Cleansing and Integration in an RDBMS, In Proceedings of the 19th International Conference on Data Engineering, IEEE Computer Society, pp. 729-731, 2003.
[8] Gravano, L., Ipeirotis, P.G., Koudas, N., Srivastava, D., Text Joins in an RDBMS for Web Data Integration, In Proceedings of the Twelfth International World Wide Web Conference, pp. 90-101, 2003.
[9] Koudas, N., Marathe, A., Srivastava, D., Flexible String Matching Against Large Databases in Practice, In Proceedings of the Thirtieth International Conference on Very Large Data Bases, Morgan Kaufmann.


More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Oracle Database: SQL and PL/SQL Fundamentals Certification Course

Oracle Database: SQL and PL/SQL Fundamentals Certification Course Oracle Database: SQL and PL/SQL Fundamentals Certfcaton Course 1 Duraton: 5 Days (30 hours) What you wll learn: Ths Oracle Database: SQL and PL/SQL Fundamentals tranng delvers the fundamentals of SQL and

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments Comparson of Heurstcs for Schedulng Independent Tasks on Heterogeneous Dstrbuted Envronments Hesam Izakan¹, Ath Abraham², Senor Member, IEEE, Václav Snášel³ ¹ Islamc Azad Unversty, Ramsar Branch, Ramsar,

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

MODULE DESIGN BASED ON INTERFACE INTEGRATION TO MAXIMIZE PRODUCT VARIETY AND MINIMIZE FAMILY COST

MODULE DESIGN BASED ON INTERFACE INTEGRATION TO MAXIMIZE PRODUCT VARIETY AND MINIMIZE FAMILY COST INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN, ICED 07 28-31 AUGUST 2007, CITE DES SCIENCES ET DE L'INDUSTRIE, PARIS, FRANCE MODULE DESIGN BASED ON INTERFACE INTEGRATION TO MAIMIZE PRODUCT VARIETY AND

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

arxiv: v3 [cs.ds] 7 Feb 2017

arxiv: v3 [cs.ds] 7 Feb 2017 : A Two-stage Sketch for Data Streams Tong Yang 1, Lngtong Lu 2, Ybo Yan 1, Muhammad Shahzad 3, Yulong Shen 2 Xaomng L 1, Bn Cu 1, Gaogang Xe 4 1 Pekng Unversty, Chna. 2 Xdan Unversty, Chna. 3 North Carolna

More information

Querying by sketch geographical databases. Yu Han 1, a *

Querying by sketch geographical databases. Yu Han 1, a * 4th Internatonal Conference on Sensors, Measurement and Intellgent Materals (ICSMIM 2015) Queryng by sketch geographcal databases Yu Han 1, a * 1 Department of Basc Courses, Shenyang Insttute of Artllery,

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

A NOTE ON FUZZY CLOSURE OF A FUZZY SET

A NOTE ON FUZZY CLOSURE OF A FUZZY SET (JPMNT) Journal of Process Management New Technologes, Internatonal A NOTE ON FUZZY CLOSURE OF A FUZZY SET Bhmraj Basumatary Department of Mathematcal Scences, Bodoland Unversty, Kokrajhar, Assam, Inda,

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Summarizing Data using Bottom-k Sketches

Summarizing Data using Bottom-k Sketches Summarzng Data usng Bottom-k Sketches Edth Cohen AT&T Labs Research 8 Park Avenue Florham Park, NJ 7932, USA edth@research.att.com Ham Kaplan School of Computer Scence Tel Avv Unversty Tel Avv, Israel

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

Vectorization of Image Outlines Using Rational Spline and Genetic Algorithm

Vectorization of Image Outlines Using Rational Spline and Genetic Algorithm 01 Internatonal Conference on Image, Vson and Computng (ICIVC 01) IPCSIT vol. 50 (01) (01) IACSIT Press, Sngapore DOI: 10.776/IPCSIT.01.V50.4 Vectorzaton of Image Outlnes Usng Ratonal Splne and Genetc

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

Ontology Generator from Relational Database Based on Jena

Ontology Generator from Relational Database Based on Jena Computer and Informaton Scence Vol. 3, No. 2; May 2010 Ontology Generator from Relatonal Database Based on Jena Shufeng Zhou (Correspondng author) College of Mathematcs Scence, Laocheng Unversty No.34

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Scheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research

Scheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research Schedulng Remote Access to Scentfc Instruments n Cybernfrastructure for Educaton and Research Je Yn 1, Junwe Cao 2,3,*, Yuexuan Wang 4, Lanchen Lu 1,3 and Cheng Wu 1,3 1 Natonal CIMS Engneerng and Research

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Positive Semi-definite Programming Localization in Wireless Sensor Networks

Positive Semi-definite Programming Localization in Wireless Sensor Networks Postve Sem-defnte Programmng Localzaton n Wreless Sensor etworks Shengdong Xe 1,, Jn Wang, Aqun Hu 1, Yunl Gu, Jang Xu, 1 School of Informaton Scence and Engneerng, Southeast Unversty, 10096, anjng Computer

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision
