Efficient Semantically Equal Join on Strings in Practice
|
|
- Charleen Barber
- 5 years ago
- Views:
Transcription
1 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 Effcent Semantcally Equal Jon on Strngs n Practce Juggapong Natwcha Computer Engneerng Department, Faculty of Engneerng Chang Ma Unversty, Chang Ma, Thaland 5000 juggapong@changma.ac.th Xngzh Sun IBM esearch Laboratory, Bejng, Chna sunxngz@cn.bm.com Mara E. Orlowska School of Informaton Technology and Electrcal Engneerng The Unversty of Queensland, Brsbane, Qld 407, Australa mara@tee.uq.edu.au Abstract In general, data ntegraton begns wth schema ntegraton and must be followed by detaled data nstances resemblance n order to reach data representaton unfcaton. In ths paper, we address the lmts of the data-level reconclaton automaton process n cases where the compared data s semantcally equvalent, but the data representaton of the values of gven attrbutes s dfferent. We assume that a semantc relatonshp between potentally used terms s establshed by a human expert pror to the desgned computatons, and s represented n an auxlary table bult and mantaned for each attrbute. On such a context, we ntroduce a noton of semantcally equal jon (SEJ), whch s the jon operaton based on a pre-defned semantc relatonshp. Our goal s to propose a soluton for SEJ that can be supported by standard SQL. The paper begns wth an llustraton of the approach for a sngle attrbute jon, followed by a generalsaton of SEJ for any number of jon attrbutes. The paper contnues wth performance consderatons for the nvented method. Fnally, ths aspect s supported by extensve expermentaton based on our mplementaton of SEJ executed on synthetc datasets. Keywords: Approxmate Strng Jon, Data Integraton, Semantcally Equal Jon.. Introducton The data ntegraton ssue [] s one of the most challengng problems that computer scence and IT practtoners have faced n the last decade or so. Ths problem orgnally emerged when demand for data access from multple sources was observed n the context of dstrbuted mult-database systems; later, database federaton systems and other versons of data ntegrated constructons ncludng data warehousng and, very recently, e-research applcatons. Frstly, ntegraton problems occur at the schema level [] where the name, meanng and constrant scope of attrbutes must be addressed. Usually, human nvolvement n ths process s necessary [3]. Subsequently, the data-level ntegraton 54
2 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 problems need to be addressed. The problem at ths level exsts due to: potental msmatch of attrbutes domans, adopted strngs to represent attrbutes values, as well as conventons to express the data felds. Addtonally, the data msmatch may be caused by many other reasons, such as, for nstance, typng errors. In ths paper, we consder the data ntegraton where two gven relatons are joned wth each other on some attrbutes. On ths context, we address the synonym msmatch problem,.e., the values of jon attrbutes are semantcally equvalent, but unable to match due to dfferent representaton, e.g. dfferent abbrevaton standards. Naturally, on such an attrbute doman, the semantc equvalence relatonshp between strngs needs to be pre-defned. We call the jon operatons that are based on ths semantc equvalence relatonshp, semantcally equal jon (SEJ). For example, from Fgure a, and are two relatons whch contan the medcal records of two dfferent hosptals respectvely. Assume that one ssues a query to fnd pars of patents who have the same type of dsease. Due to the fact that n medcal doman, strngs Tumours, Kunb, Neoplasms are semantcally equvalent and so are the strngs Mad Cow" and B.S.E." (Bovne Spongform Encephalopathy), the result of the query should be the relaton gven n Fgure b. We can see that ths query result can be obtaned by jonng two relatons and on ther common attrbute Dsease" such that the jon condton s true f and only f the values of the jon attrbute are semantcally equvalent. In the rest of ths paper, the focus s to formally defne the SEJ operaton and to propose an effcent approach to compute SEJ n a standard SQL envronment. Our dscusson begns wth an llustraton of the approach for -attrbute SEJ. Frst, to represent the semantc relatonshp between potentally used terms, we ntroduce the auxlary table whch s an addtonal relaton bult and mantaned for the par of attrbutes on whch the jon s to be performed. The desgn of the auxlary table s mportant because t drectly affects the SEJ performance. To address ths ssue, we desgn the schema of the auxlary table, based on whch, the relatonal algebra for SEJ s proposed. Snce, n general, the way that the SQL statement s wrtten wll affect the performance of queres n DBMS, we optmze the query performance for the group-based approach by adjustng ts SQL formulaton. Partcularly, we propose a subquery-based SQL statement for SEJ, whch s much more effcent than the SQL, that s drectly translated from the relatonal algebra. PID Dsease PID Dsease P Tumours Q Neoplasms P Knub Q B. S. E P 3 Mad Cow Q 3 Brd Flu P 4 Brd Flu a) Example relaton PID Dsease PID Dsease P Tumours Q Neoplasms P Knub Q Neoplasms P 3 Mad Cow Q B. S. E P 4 Brd Flu Q 3 Brd Flu b) SEJ result Fgure : SEJ Example 55
3 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 Learnng from dscusson of - attrbute SEJ, we generalze the SEJ problem to k-attrbutes, where k auxlary tables need to be desgned and mantaned. Based on the sub-query-based SQL statement proposed for -attrbute SEJ, t s straghtforward to get a query for computng k-attrbutes SEJ. Because the jon condton of k-attrbutes SEJ s the conjuncton of k predcates (each of whch corresponds to the condton for a -attrbute SEJ), the order of these k predcates n the SQL statement wll sgnfcantly affect the cost of the query [4]. To optmze the performance for k-attrbutes SEJ, we modfy the order on these k predcates by referencng the sze of each auxlary table, and the jon selectvty of each par of jon attrbutes, such that the total cost of accessng k auxlary tables s mnmum. Fnally, we evaluate the effcency of the proposed approaches by performng ntensve experments on synthetc datasets wth varous characterstcs. The paper organzaton s gven as follows. Secton provdes a revew of related work and how the SEJ s postoned n ths context. Our proposed approach s shown n Secton 3. The experments and results are reported n Secton 4. Fnally, we provde concludng remarks n Secton 5. elated Work and Dscusson The problem of jons wth nexact matchng of correspondng data, approxmate strng jons, has already been addressed by many authors expermentng wth a varety of approaches. Most of them search for solutons that can be effectvely executed by the exstng and wdely used tradtonal relatonal database envronments, DBMS and SQL, wthout any necessty for any specal purpose programmng. The problem was addressed frstly n [5]. The man concept behnd ths approach s to decompose a strng attrbute value nto substrngs called q-gram [6] whch has been used wdely n strng matchng context. The length of substrngs s specfed by q value. For example, f q s set at 3, a strng John" can be decomposed nto the set of substrngs { ##J", #Jo", Joh", ohn", hn$", n$$"}. When matchng two strngs, a dstance functon based on the edt dstance s computed on the two sets of ther substrngs. A number of q-gram related propostons were proposed to reduce the cost of edt dstance computaton. In [7], an approxmate strng jon approach based on cosne smlarty computaton was proposed. In ths work, strng data of each attrbute (whch can be composed of many terms) s represented as a mult-dmensonal vector. For each attrbute, the number of dmensons s the number of all possble terms used n the attrbute (a strng may consst of multple terms). For each dmenson, a magntude s assgned accordng to the sgnfcance of the correspondng term; an nfrequent term wll have a hgher value, whle a common term wll have a lower value. For example, a strng Informaton Technology and Electrcal Engneerng" n an afflaton attrbute wth eght possble terms can be represented as an 8-dmensonal vector (0, 0, 0.5, 0., 0.6, 0.4, 0, 0), where 0.5, 0., 0.6 and 0.4 represent the sgnfcance of the terms Informaton, Technology, Electrcal and Engneerng respectvely. When strngs are to be matched, the cosne smlarty functon s then computed. If the cosne smlarty of two strngs s hgher than a prespecfed threshold, they can be matched. For example, f a specfed threshold s set at 0.4, the above strng and another strng Informaton Engneerng" represented as a vector (0, 0, 0.5, 0, 0, 0.4, 0, 0) can be matched because ther cosne smlarty s 0.4 (0.5x x0.4). Clearly, one can buld many examples where such a matchng wll not delver hgh qualty results, manly due to an evdent lack of dependency measures between the terms. 56
4 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 Also, t s ntrnscally dffcult to allocate the best values for sgnfcance measures, and any allocaton wll be regarded as hghly subjectve. Also, n the same work, a samplng based computaton has been used to reduce the computatonal cost of the functon. Instead of computng a smlarty functon for every par of the jon canddates, whch s an expensve operaton, nsgnfcant canddates wll be sampled out, wth only the promsng pars computed. In the same way as n [5], ths work does not requre any modfcaton to DBMS. Only auxlary tables are requred to be created and they are used to drve the matchng mechansm. In [8], web data ntegraton was proposed as another applcaton for approxmate strng jons. In that work, more extensve theoretcal aspects and expermental results were presented. Furthermore, many combnatons of dstance functons were dscussed, for example, edt dstance, block edt dstance, cosne smlarty wth words as terms, and cosne smlarty wth q-grams as terms. As a result, the last type of combnaton, cosne smlarty wth q-grams as terms, has been suggested as the best dstance functon because t can capture all types of msmatch, ncludng spellng errors, nserton and deleton of short or common words and varatons of word order. An attempt to nclude semantc knowledge nto the approxmate strng jon computaton (rather than purely structural strng comparson as ndcated above), s presented n [9]. To mprove smlartybased jons, two terms whch share the same semantc meanng, but have too much dfference n ther representatons (e.g. MCI" and Worldcom" whch refer to the same company), are formed as a par n an auxlary table by human experts. Subsequently, n the pre-processng phase of a jon, each tuple s extended by assocatng ts value wth the pars n the auxlary table whch have a hgh" smlarty score. For example, consder a tuple wth a value MCI Corp". After the pre-processng phase, the value wll be extended to ( MCI Corp", MCI", Worldcom") by assocatng the value wth a par of ( MCI", Worldcom"). When answerng a matchng query, the attrbute values of the query strng are also extended by usng the auxlary table n the same way as n the pre-processng phase. Suppose that an attrbute value Worldcom Corp" s gven n the query, t wll be extended to ( Worldcom Corp", Worldcom", MCI"). Consequently, t can be matched wth the tuple above, wth the MCI Corp" value, by tradtonal cosne smlarty computatons. Although only brefly dscussed n ths paper, ths approach can remedy the shortcomngs of prevously smlarty-based approaches by applyng semantc knowledge. Smlar to our work, ths work also addresses the synonym msmatch problem by applyng semantc knowledge. However, unlke our work, whch s to effcently compute SEJ, the focus of that approach s on how to use semantc knowledge to pre-process two relatons such that later approxmate jon approaches can acheve more accurate results wthout addtonal operatons. 3. Semantcally Equal Jon Approach 3. Prelmnares Let us frst consder two relatons ( A,... A ) and ( B,... B ) such that m n attrbutes A and B have the same doman Dom, where Dom s a set of strngs. Defnton. Semantc equvalence, denoted as, s a relaton defned on a gven doman Dom such that for every a, b Dom, a b ff a s a synonym of b. Also, we defne that a and b are strct semantcally equvalent, denoted as a b, f a b and a b. 57
5 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 A B USA Unted States GB France The States Unted Kngdom France a) Jon attrbutes {(USA, USA), (Unted States, Unted States), (The States, The States), (GB, GB), (Unted Kngdom, Unted Kngdom), (France, France)} {(USA, Unted States), (Unted States, USA), (USA, The States), (The States, USA), (Unted States, The States), (The States, Unted States), (GB, Unted Kngdom), (Unted Kngdom, GB)} b) Semantc equvalence relaton Fgure : Semantc Equvalence Example. Suppose that Fgure a gves the values of jon attrbutes A and B whch are the results of projectng the relatons and respectvely. Dom n ths example s the strng set {USA, Unted States, The States, GB, Unted Kngdom, France}. Accordng to the standard conventon used for expressng country names, we can naturally defne the semantc equvalence relaton on Dom as Fgure b. Also observe that n ths example, the strct semantc equvalence relatonshp s shown as the rght-hand-sde of the unon operator. Defnton. Gven a relaton of semantc equvalence, semantcally equal jon (SEJ) s a theta jon operator n the relatonal algebra such that the theta comparson operator s.. Further, the semantcally equal jon operator s denoted as To perform the semantcally equal jon between two tables and on attrbutes A and B, the semantc equvalence relaton on doman Dom needs to be predefned. Snce t s straghtforward to observe that a strng can semantcally jon wth tself, we defne a relaton Au, called auxlary table, to only specfy the relatonshp of strct semantcally equvalent,.e., to gve the nformaton of synonyms n Dom. Accordng to Defnton, t s easy to observe that semantc equvalence,, s an equvalence relaton. Formally, gven a relaton of semantc equvalence relaton on Dom, we can partton Dom nto a set of equvalence classes C, = l such that for any C, there does not exst elements a, b holdng the condton of a b. C Note that only the class C wth C provdes the nformaton of synonyms. We defne the set of non-trval equvalence classes, formally, { C C } where = l. Accordng to the above dscusson, Au can be regarded as the representaton of the set. Let us further consder the Example. Gven the relaton, Dom s parttoned nto three equvalence classes: C { USA, Unted States, The States }, C { GB, Unted Kngdom}, and C3 { France}. The As an equvalence relaton, = has the followng property: ) a a for all a Dom, ) a b b a for all a, b Dom 3) a b b c a c for all a, b, c Dom. 58
6 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 set of non-trval equvalence classes s { C, C }. After ntroducng the key concepts, we generalze the problem of semantcally equal jon to k attrbutes and gve the formal problem statement as follows. Problem statement: Let ( A,... A ) and m ( B,... B ) be two relatons such that n attrbutes.a and.b (= k) have the same doman Dom, where Dom s a set of strngs. For each Dom (= k), let be the equvalence relaton defned on Dom and Au be the auxlary table that represents the correspondng strct semantcally equvalent relatonshp. The semantcally equal jon (SEJ) problem s: gven relatons, and auxlary table Au,..., Au k, express k A B,... A k Bk by a sngle SQL statement. In the rest of Secton 3, we wll frst study how to buld an SQL statement to effcently compute SEJ on one attrbute, then dscusson s extended to SEJ on k- attrbutes (k>). 3. Schema of Auxlary Table and Semantcally Equal Jon Frstly, we propose a structure of auxlary tables, called the group-based approach for the schema of the auxlary table. In ths approach, gven the set, the auxlary table Au s defned as Au ( Gd, P) {(, a) C and a C }. In Example, the auxlary table s gven n Fgure 3. Wth ths schema, we can buld the prmary ndex on the attrbute P for effcent access because there can not exst a strng that belongs to two dfferent synonym groups. GID P USA The States Unted States GB Unted Kngdom Fgure 3: The example of group-based auxlary table Gven the schema of Au, the A expresson for the semantcally equal jon for ths schema s: ( (( Au) P A ) B A Au. GID Au '. GID where ( B Au' )) P Au ' s the copy of Au. () 3.3 Query for Implementng SEJ Gven the relatonal algebra operaton n Equaton (), there are multple ways to buld an SQL statement to compute the semantcally equal jon. Although generally, dfferent queres may have the same executon plan n a gven DBMS, n many cases, the way to wrte a query wll affect the executon plan and the query performance. In ths secton, we propose an SQL statement whch can effcently compute semantcally equal jon. ather than consderng query optmzaton technques whch are dependant on the DBMS, we only put our effcency concern at the SQL level SQL Statements Accordng to Equaton (), we can drectly translate the A nto an SQL 59
7 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 statement, whch s shown n Fgure 4a. Snce part of ths SQL statement requres three jon operatons to lnk four tables, we call t three-jon-based SQL statement. (SELECT * FOM, WHEE A =B ) UNION (SELECT * FOM,, Au Au, Au Au WHEE (A =Au.P AND B =Au.P AND Au.d = Au.d)) a) Three-jon-based SELECT * FOM, WHEE A =B O ((SELECT Gd FOM Au WHEE P=A ) = (SELECT Gd FOM Au WHEE P=B )) b) Sub-query-based Fgure 4: SQL statements Intutvely, the cost of jonng four tables s expensve. Therefore, to mprove the effcency, we propose another SQL statement n Fgure 4b, called sub-querybased SQL statement. The semantc meanng behnd ths query s that two tuples can be joned together f ether of the values of the jon attrbutes are the same, or they belong to the same group n the auxlary table. Note that when both values of A and B are not n the auxlary table, the condton after O wll compare two Null values. In the standard SQL, the result of Null=Null s unknown, whch wll not make two tuples jon Cost Analyss We brefly dscuss the I/O cost for computng the SEJ n ths secton. Let us frst consder the three-jonbased SQL. In Fgure 4a, we can see that the query conssts of three operatons: equal jon, operaton of three-jon, and unon. It s not hard to observe that ths query wll generate a sgnfcant amount of ntermedate results from jons and requre the external sort to remove duplcates for the unon operaton, both of whch wll lead to hgh I/O cost. For the sub-query-based approach, the query plan s much smpler: When and are joned, for any par of tuples whch can not equally jon, the auxlary table s accessed by the prmary ndex (on the attrbute P) twce to check the semantc jon condton. Note that n practce, an auxlary table s not very large and often can be cached n the buffer. In ths case, the I/O cost of sub-query-based SQL s close to the cost of an equal jon of and. For the above dscusson, we can see the sub-query based approach requres less dsk I/O when the sze of Au s small. Ths wll be also demonstrated n the experment at secton. 3.4 k-attrbutes Semantcally Equal Jon Now, we consder the task of semantcally jonng two relatons on k attrbutes, as s descrbed n the problem statement n Secton 3.. As the jon condton of k-attrbutes SEJ s the conjuncton of the condtons for the sngle attrbute jons, we can readly extend the prevous SQL statement shown n Fgure 4 nto the statement n Fgure 5. Snce the jon condtons are lnked by AND, f any set of condton s not satsfed, t s not necessary to check the other jon condtons of other attrbutes. Apparently, we can order the jon condtons sets to mnmze the cost of accessng auxlary tables. Intutvely, two factors need to be consdered: ) the sze of auxlary table Au ) the jon/sej selectvty of each par of jon attrbutes. SELECT * FOM, WHEE (A =B O ((SELECT Gd FOM Au WHEE P=A )=(SELECT Gd FOM Au WHEE P=B ))) 60
8 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 AND(A k =B k O ((SELECT Gd FOM Au k WHEE P=A k )=(SELECT Gd FOM Au k WHEE P=B k ))) Fgure 5: SQL for k-semantcally equal jon It s well-known to the database communty that the query performance can be adjusted by orderng jon condtons lnked by AND. However, drven by the specalty of a new type query, we dscuss ths orderng problem on the context of SEJ wth the purpose of mnmzng the cost of auxlary table access. Let p be the number of pages of the auxlary table Au where =,...,k. Gven relaton and, let ndex of the attrbute s,..., l ( j s the j k ) be the semantcally equal jon selectvty of and on l attrbutes. Formally, A B,..., A l B l to: s,..., l s equal. Smlarly, we defne the equal jon selectvty s,, l as: A B,..., A B l Gven an order of jon attrbutes,..., k, under the assumpton that the jon attrbutes are ndependent, the total I/O cost on access k auxlary tables s: [( s ) p s ( s ) p... s,,... l ( s ) p k k k Also due to the assumpton that the jon attrbutes are ndependent, we have ^ s s,... l. So the orderng task j... l can be presented as: j ]. Gven the szes of auxlary tables p,p,,p k, the equal jon selectvty of each par attrbutes s,s, s k, and the semantcally equal jon selectvty of each par attrbutes s,s, s k, are ordered to fnd an order,,, k cost functon such that the [( s ) p s ( s ) p... s,,... k ( s ) p ] k s mnmzed. k To fnd the order wth mnmum cost, theoretcally, we need to compute the costs for k! orders. However, based on the cost functon, we can apply some heurstcs to speed up the search. The equal jon selectvty or semantcally equal jon selectvty s often small. Consderng the cost functon, assume that the values of (s )p for =,, k are not sgnfcantly dfferent, we can expect that the cost of accessng -th auxlary table s less than that of accessng the ( ) -th auxlary table n a manner of magntude. Intutvely, to mnmze the total cost, t could be helpful to frst select ndex wth mnmum (s )p. Based on ths dea, we can gnore the semantcally equal jon selectvty and order the ndex accordngly accordng to the value (s )p. Once the optmum order O of ndex on the jon attrbutes s found, the query n Fgure 5 can be optmzed by re-orderng the jon condtons based on O. 4. Expermental esults In ths secton, to evaluate the effcency of SEJ, we report the result of the performance study on synthetc datasets. 4. Expermental Confguraton System resource: The experments are conducted on an 800 MHz Intel Pentum III 6
9 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 wth 56 megabytes man memory, runnng a commercal DBMS (Oracle 8). All nvented methods are mplemented usng standard SQL. Datasets: In our experments, all the relatons (,, and auxlary table Au) are generated by dfferent random varables, each of whch follows the unform dstrbuton. For brevty, we only show how to generate relatons for -attrbute SEJ. The dscusson can be readly extended to k- attrbutes SEJ. There are two controlled parameters for the auxlary table, the cardnalty Au and the number g of semantcally equvalent groups. Gven Au and g, we generate the group-based auxlary table Au(Gd,P) by randomly assgnng Au dstnct strngs nto g groups. Based on the unform dstrbuton, Au g gves the expected number of strng values n each group, whch s denoted as v. For the two relatons and, ther schema are desgned as ( A, A ) and ( B, B ) wth A and B to be the jon attrbutes. Further, we consder the nstances n each relaton. Agan for smplcty, we set the constrants that the values of attrbute A are exactly the same as the values of attrbutes B Consderng the attrbute A ( B ), ts values consst of three parts: the S part whch has the attrbute values n the auxlary table (.e., can semantcally match the values n the other relaton), E part whch has the values that can only exact match the values n the other relaton, and the N part, whch has the values that cannot match any values n the other relaton. Let S%,E%,N% (S%+E%+N%=) be the percentages of the values n part S, part E, and part N respectvely, and ns,ne,nn be the numbers of dstnct values n these there parts. Gven the above parameters, the jon relatons are generated by followng the equal dstrbuton. For example, n S part, we can expect each dstnct value appears * S% tmes, where s the ns cardnalty of ( ). 4. Expermental esults and Dscusson -Attrbute SEJ esults and Dscusson Effect of jon relaton sze : In ths experment, we ncrease from 00 records to 4,000 records to examne the performance of two types of query wth dfferent szes of jon relaton. The other parameters of the jon relatons are set as: S%=80%, E%=0, N%=0%, ns=0, ne=0, nn=3. For auxlary table, we set Au =0 and g=3. The performance of -attrbute SEJ by dfferent types of queres s presented n Fgure 6a. It can be seen that the three-jonbased approach s very neffcent. Therefore, t cannot be used for a large amount of data. As dscussed n Secton 3, the SQL statement for three-jon-based approach requres that four tables are joned, whch leads to expensve I/O cost. Therefore, for the rest of ths secton, we wll omt the result of the three-jon-based approach and only dscuss the group-based sub-query. Effect of selectvty s ^ : The second consderaton for -attrbute SEJ s about ts performance regardng the semantc jon ^ ^ selectvty s of and where s s composed of two parts, strct semantc jon selectvty s ' (.e. the selectvty for the pars of tuples that are matched based on strct semantcally equvalent relatonshp) and equal jon selectvty s (ntroduced n Secton 3.4). By defnton, we always have ^ s s s'. Note here that we slghtly change the way of generatng synthetc datasets n ths part. The records n S part are allowed to only be joned through an auxlary table (.e., no equal jon n ths part). In ths way, due to the equal 6
10 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 dstrbuton, the strct semantc jon ( S%) selectvty s, where g s the g number of groups n Au. Smlarly, gven E% and ne, the equal jon selectvty can be ( E%) estmated as. In the experments, ne the strct semantc jon selectvty s and equal jon selectvty can be adjusted by changng parameters S%, ns, E%, and ne. Both experments use some common parameters whch are the sze of jon relatons at 0,000 tuples wth ne at 5 and nn at 00. Also, for both experments, we set the sze of auxlary table Au at 00 wth g at 5, whch means that v s set at 4. To evaluate the mpact of s ', we fx s by always settng E%=0% and ne=5. We ncrease the strct semantc jon selectvty by ncreasng S% from 0% to 80%. Note that when S% s ncreased, we also decrease N% to keep the sze of jon relaton the same. For evaluatng the performance of SEJ regardng to equal jon selectvty s, we fx the s ' by always settng S%=0% and ns=00 and ncrease E% from 0% to 80%. a) Effect of jon relaton b) Effect of selectvty Fgure 6: Performance of SEJ We frstly measure the performance of SEJ regardng to s ', whch s shown n Fgure 6b. Obvously, the performance of SEJ s rrelevant to strct semantc jon selectvty. Ths verfes our thoughts that the amount of tme for auxlary table access only depends on the equal jon selectvty s. The reason s that for any two records t and t the auxlary table needs to be accessed unless t and t can be equal joned. Also, n Fgure 6b, we report the performance of SEJ when the selectvty s s ncreased. We can see that due to the same reason, the executon tme decreases when E% s ncreased. k-attrbutes SEJ esults and Dscusson In ths experment, we dscuss the SEJ jon n terms of the order of k attrbutes n an SQL statement. Partcularly, we verfy our cost analyss n Secton 3.4 and demonstrate how the proposed heurstcs help to quckly fnd an optmal order. As dscussed prevously, the followng two factors are consdered for the orderng: ) sze of auxlary table Au ) jon/sej selectvty. We generate jon relatons wth three jon attrbutes (k=3) of sze 00,000 records. The characterstcs and the estmated selectvtes of the relatons are shown n Table. As shown n Table, the frst and the 63
11 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 second attrbutes have the same Au, but they are dfferent n selectvtes. On the other hand, even f the thrd attrbute has the lowest selectvty, t has the largest Au. The performance of all possble orders of jon condtons for the dataset s evaluated. The performance on k-attrbutes SEJ wth all possble jon condton orderngs s reported n Fgure 7. On the x- axs, we put the ascendng order for all theoretcal I/O costs (for dfferent orderng) computed from the cost functon. The result shows that the order of actual cost s accordant wth the order of estmated cost, theoretcally. Ths demonstrates the correctness of our prevous analyss. Compared wth the worst case order 3--, the best order --3 has better performance, by 7.9%. As mentoned n Secton 3.4, we can also compute the best order by computng estmated cost ( s ) p, n whch the result s also the order --3. Another nterestng observaton s that the order --3 has more cost than the order - 3-, whch shows that the SEJ selectvty should also be consdered, not only the sze of auxlary tables. On the other hand, orderng only by SEJ selectvty can not perform very well, as presented by the order 3--. Table : Characterstcs and Selectvtes of Expermental Dataset Attrbute Auxlary table Jon relaton Selectvty Au g v S% E% N% ns ne nn s ' s % 60% 30% % 3.6% 3.8% % 0% 40% % 0.% 5.% % 0% 40% % 0.4% 3.6% ^ s Fgure 7: Performance of k-attrbute SEJ. 5. Concluson In ths paper, we have addressed the problem of synonym msmatch n the context of data ntegraton, where the data from dfferent sources are semantcally equvalent, but the data representaton s dfferent. We have ntroduced a new relatonal algebrac operaton, called semantcally equal jon (SEJ), whch can be generally appled for ntegratng two relatons based on predefned equvalence relaton(s). The orgnalty and contrbutons of our work nclude the followng aspects: 64
12 Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 ) we have ntroduced the semantc equvalent relatonshp to resolve the synonym msmatch problem. In fact, ths study can also complement prevous work of approxmate jon that manly focuses on resolvng typo msmatch. ) We have formally defned the concept of SEJ and proposed an effcent approach to mplement the SEJ operator n the standard DBMS by a sngle SQL statement. 3) By extensve experments we have demonstrated performance of our approach for SEJ. 6. eferences [] Halevy, A.Y., Ashsh, N., Btton, D., Carey, M., Draper, D., Pollock, J., osenthal, A., Skka, V., Enterprse Informaton Integraton: Successes, Challenges and Controverses, In Proceedngs of the 005 ACM SIGMOD Internatonal Conference on Management of Data, New York, NY, USA, ACM Press, pp , 005. [] ahm, E., Bernsten, P.A., A Survey of Approaches to Automatc Schema Matchng, the VLDB Journal,Vol. 0, pp , 00. [3] Colomb,.M., Orlowska, M.E.: Interoperablty n Informaton Systems. Informaton Systems, Journal,Vol.5, pp.37 50,994. [4] Orlowsk, M.W., On Optmsaton of Jons n Dstrbuted Database System, In: Future Databases, pp.06 4,99. [5] Gravano, L., Iperots, P.G., Jagadsh, H.V., Koudas, N., Muthukrshnan, S., Srvastava, D., Approxmate Strng Jons n a Database (Almost) for Free, In Proceedngs of 7th Internatonal Conference on Very Large Data Bases, Morgan Kaufmann, pp , 00. [6] Ullmann, J.., A Bnary n-gram Technque for Automatc Correcton of Substtuton, Deleton, Inserton and eversal Errors n Words. The Computer Journal, Vol. 0, pp.40 47, 976. [7] Gravano, L., Iperots, P.G., Koudas, N., Srvastava, D., Text Jons for Data Cleansng and Integraton n an dbms, In Proceedngs of the 9th Internatonal Conference on Data Engneerng, IEEE Computer Socety, pp.79 73, 003. [8] Gravano, L., Iperots, P.G., Koudas, N., Srvastava, D., Text Jons n an dbms for Web Data Integraton, In Proceedngs of the Twelfth Internatonal World Wde Web Conference, pp.90 0, 003. [9] Koudas, N., Marathe, A., Srvastava, D., Flexble Strng Matchng Aganst Large Databases n Practce, In Proceedngs of the Thrteth Internatonal Conference on Very Large Data Bases, Morgan Kaufmann, pp ,
Learning-Based Top-N Selection Query Evaluation over Relational Databases
Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **
More informationCan We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search
Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton
More informationFor instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)
Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A
More informationParallelism for Nested Loops with Non-uniform and Flow Dependences
Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr
More informationTsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance
Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for
More informationA Binarization Algorithm specialized on Document Images and Photos
A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a
More informationThe Codesign Challenge
ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.
More informationProblem Definitions and Evaluation Criteria for Computational Expensive Optimization
Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty
More informationAn Optimal Algorithm for Prufer Codes *
J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,
More informationA mathematical programming approach to the analysis, design and scheduling of offshore oilfields
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and
More information6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour
6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the
More informationOracle Database: SQL and PL/SQL Fundamentals Certification Course
Oracle Database: SQL and PL/SQL Fundamentals Certfcaton Course 1 Duraton: 5 Days (30 hours) What you wll learn: Ths Oracle Database: SQL and PL/SQL Fundamentals tranng delvers the fundamentals of SQL and
More informationNUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS
ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana
More informationAnalysis of Continuous Beams in General
Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,
More informationHigh-Boost Mesh Filtering for 3-D Shape Enhancement
Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,
More informationContent Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers
IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth
More informationAn Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices
Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal
More informationSmoothing Spline ANOVA for variable screening
Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory
More informationData Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach
Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer
More informationAn Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed
More informationA Fast Content-Based Multimedia Retrieval Technique Using Compressed Data
A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,
More informationMathematics 256 a course in differential equations for engineering students
Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the
More informationImprovement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration
Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,
More informationA New Approach For the Ranking of Fuzzy Sets With Different Heights
New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays
More informationCMPS 10 Introduction to Computer Science Lecture Notes
CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not
More informationSupport Vector Machines
/9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.
More informationModule Management Tool in Software Development Organizations
Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,
More informationSteps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices
Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between
More informationTPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints
TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process
More informationCompiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz
Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster
More informationCluster Analysis of Electrical Behavior
Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School
More informationHermite Splines in Lie Groups as Products of Geodesics
Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the
More informationA Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems
A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty
More informationMeta-heuristics for Multidimensional Knapsack Problems
2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,
More informationGSLM Operations Research II Fall 13/14
GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are
More informationDetermining the Optimal Bandwidth Based on Multi-criterion Fusion
Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn
More informationConcurrent Apriori Data Mining Algorithms
Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng
More informationOptimal Workload-based Weighted Wavelet Synopses
Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,
More informationComparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments
Comparson of Heurstcs for Schedulng Independent Tasks on Heterogeneous Dstrbuted Envronments Hesam Izakan¹, Ath Abraham², Senor Member, IEEE, Václav Snášel³ ¹ Islamc Azad Unversty, Ramsar Branch, Ramsar,
More informationFINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK
FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,
More informationR s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes
SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges
More informationMODULE DESIGN BASED ON INTERFACE INTEGRATION TO MAXIMIZE PRODUCT VARIETY AND MINIMIZE FAMILY COST
INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN, ICED 07 28-31 AUGUST 2007, CITE DES SCIENCES ET DE L'INDUSTRIE, PARIS, FRANCE MODULE DESIGN BASED ON INTERFACE INTEGRATION TO MAIMIZE PRODUCT VARIETY AND
More informationProper Choice of Data Used for the Estimation of Datum Transformation Parameters
Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and
More informationA Fast Visual Tracking Algorithm Based on Circle Pixels Matching
A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng
More informationarxiv: v3 [cs.ds] 7 Feb 2017
: A Two-stage Sketch for Data Streams Tong Yang 1, Lngtong Lu 2, Ybo Yan 1, Muhammad Shahzad 3, Yulong Shen 2 Xaomng L 1, Bn Cu 1, Gaogang Xe 4 1 Pekng Unversty, Chna. 2 Xdan Unversty, Chna. 3 North Carolna
More informationQuerying by sketch geographical databases. Yu Han 1, a *
4th Internatonal Conference on Sensors, Measurement and Intellgent Materals (ICSMIM 2015) Queryng by sketch geographcal databases Yu Han 1, a * 1 Department of Basc Courses, Shenyang Insttute of Artllery,
More informationX- Chart Using ANOM Approach
ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are
More informationSum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints
Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan
More informationMachine Learning: Algorithms and Applications
14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of
More informationA NOTE ON FUZZY CLOSURE OF A FUZZY SET
(JPMNT) Journal of Process Management New Technologes, Internatonal A NOTE ON FUZZY CLOSURE OF A FUZZY SET Bhmraj Basumatary Department of Mathematcal Scences, Bodoland Unversty, Kokrajhar, Assam, Inda,
More informationLearning the Kernel Parameters in Kernel Minimum Distance Classifier
Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department
More informationHelsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)
Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute
More informationQuality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation
Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on
More informationA MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS
Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung
More informationReducing Frame Rate for Object Tracking
Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg
More informationLoad Balancing for Hex-Cell Interconnection Network
Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,
More informationTerm Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task
Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto
More informationUB at GeoCLEF Department of Geography Abstract
UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department
More informationCombining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval
Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu
More information2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements
Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.
More informationAn Entropy-Based Approach to Integrated Information Needs Assessment
Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology
More informationWishing you all a Total Quality New Year!
Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma
More informationQuery Clustering Using a Hybrid Query Similarity Measure
Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan
More informationSummarizing Data using Bottom-k Sketches
Summarzng Data usng Bottom-k Sketches Edth Cohen AT&T Labs Research 8 Park Avenue Florham Park, NJ 7932, USA edth@research.att.com Ham Kaplan School of Computer Scence Tel Avv Unversty Tel Avv, Israel
More informationLoad-Balanced Anycast Routing
Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance
More informationVirtual Machine Migration based on Trust Measurement of Computer Node
Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on
More informationVectorization of Image Outlines Using Rational Spline and Genetic Algorithm
01 Internatonal Conference on Image, Vson and Computng (ICIVC 01) IPCSIT vol. 50 (01) (01) IACSIT Press, Sngapore DOI: 10.776/IPCSIT.01.V50.4 Vectorzaton of Image Outlnes Usng Ratonal Splne and Genetc
More informationObject-Based Techniques for Image Retrieval
54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the
More informationOntology Generator from Relational Database Based on Jena
Computer and Informaton Scence Vol. 3, No. 2; May 2010 Ontology Generator from Relatonal Database Based on Jena Shufeng Zhou (Correspondng author) College of Mathematcs Scence, Laocheng Unversty No.34
More informationCourse Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms
Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques
More informationAssembler. Building a Modern Computer From First Principles.
Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought
More informationParallel matrix-vector multiplication
Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more
More informationEfficient Distributed File System (EDFS)
Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate
More informationClassifier Selection Based on Data Complexity Measures *
Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.
More informationScheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research
Schedulng Remote Access to Scentfc Instruments n Cybernfrastructure for Educaton and Research Je Yn 1, Junwe Cao 2,3,*, Yuexuan Wang 4, Lanchen Lu 1,3 and Cheng Wu 1,3 1 Natonal CIMS Engneerng and Research
More informationToday s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.
Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:
More informationSkew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach
Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research
More informationType-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data
Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES
More informationProblem Set 3 Solutions
Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,
More informationPositive Semi-definite Programming Localization in Wireless Sensor Networks
Postve Sem-defnte Programmng Localzaton n Wreless Sensor etworks Shengdong Xe 1,, Jn Wang, Aqun Hu 1, Yunl Gu, Jang Xu, 1 School of Informaton Scence and Engneerng, Southeast Unversty, 10096, anjng Computer
More informationPrivate Information Retrieval (PIR)
2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market
More informationBIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING
An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College
More informationSLAM Summer School 2006 Practical 2: SLAM using Monocular Vision
SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,
More informationSAO: A Stream Index for Answering Linear Optimization Queries
SAO: A Stream Index for Answerng near Optmzaton Queres Gang uo Kun-ung Wu Phlp S. Yu IBM T.J. Watson Research Center {luog, klwu, psyu}@us.bm.com Abstract near optmzaton queres retreve the top-k tuples
More informationRelated-Mode Attacks on CTR Encryption Mode
Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory
More informationThe Research of Support Vector Machine in Agricultural Data Classification
The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou
More informationChapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward
More informationHigh resolution 3D Tau-p transform by matching pursuit Weiping Cao* and Warren S. Ross, Shearwater GeoServices
Hgh resoluton 3D Tau-p transform by matchng pursut Wepng Cao* and Warren S. Ross, Shearwater GeoServces Summary The 3D Tau-p transform s of vtal sgnfcance for processng sesmc data acqured wth modern wde
More informationConstructing Minimum Connected Dominating Set: Algorithmic approach
Constructng Mnmum Connected Domnatng Set: Algorthmc approach G.N. Puroht and Usha Sharma Centre for Mathematcal Scences, Banasthal Unversty, Rajasthan 304022 usha.sharma94@yahoo.com Abstract: Connected
More informationPerformance Evaluation of Information Retrieval Systems
Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence
More informationCHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION
24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and
More informationRepeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits
Repeater Inserton for Two-Termnal Nets n Three-Dmensonal Integrated Crcuts Hu Xu, Vasls F. Pavlds, and Govann De Mchel LSI - EPFL, CH-5, Swtzerland, {hu.xu,vasleos.pavlds,govann.demchel}@epfl.ch Abstract.
More informationAvailable online at Available online at Advanced in Control Engineering and Information Science
Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced
More informationSelf-tuning Histograms: Building Histograms Without Looking at Data
Self-tunng Hstograms: Buldng Hstograms Wthout Lookng at Data Ashraf Aboulnaga Computer Scences Department Unversty of Wsconsn - Madson ashraf@cs.wsc.edu Surajt Chaudhur Mcrosoft Research surajtc@mcrosoft.com
More informationA Clustering Algorithm for Chinese Adjectives and Nouns 1
Clusterng lgorthm for Chnese dectves and ouns Yang Wen, Chunfa Yuan, Changnng Huang 2 State Key aboratory of Intellgent Technology and System Deptartment of Computer Scence & Technology, Tsnghua Unversty,
More informationBioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.
[Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented
More informationSubspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;
Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features
More informationAPPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT
3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ
More informationRun-Time Operator State Spilling for Memory Intensive Long-Running Queries
Run-Tme Operator State Spllng for Memory Intensve Long-Runnng Queres Bn Lu, Yal Zhu, and lke A. Rundenstener epartment of Computer Scence, Worcester Polytechnc Insttute Worcester, Massachusetts, USA {bnlu,
More informationVirtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory
Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process
More information