Exploring synonyms within large commercial site search engine queries
Julia Kiseleva, Andrey Simanovsky
HP Labs Russia
HP Laboratories
HPL

Keyword(s): synonym mining, query log analysis

Abstract: We describe results of experiments on extracting synonyms from a large commercial site search engine query log. Our primary object is product search queries. The resulting dictionary of synonyms can be plugged into a search engine in order to improve search result quality. We use a product database to extend the dictionary.

External Posting Date: April 6, 2011
Internal Posting Date: April 6, 2011
Approved for External Publication
Copyright 2011 Hewlett-Packard Development Company, L.P.
1 Introduction

A large commercial site is an information portal for customers where they can find everything about the vendor's products, e.g. manuals, drivers, etc. A large commercial site has a search engine whose main function is to help customers retrieve appropriate information. We regard a user search query as a product query when the user's intent is to retrieve information about HP products and services, including manuals, drivers, and support.

One way to improve search quality is to utilize a dictionary of synonyms that incorporates the vocabulary of the document collection and the vocabulary of different users. We analyze incoming queries for synonym terms which could be included in a thesaurus. We attempted several techniques in order to detect synonymous terms and queries among queries from a large commercial site search engine.

Query expansion is defined as a stage of the information retrieval process during which a user's initial query statement is extended with additional search terms in order to improve retrieval performance. Query expansion is rationalized by the fact that the initial query formulation does not always reflect the exact information need of a
user. The application of thesauri to query expansion and reformulation has become an area of increasing interest.

Three types of query expansion are discussed in the literature: manual, automatic, and interactive (including semi-automatic, user-mediated, and user-assisted). These approaches use different sources of search terms and a variety of expansion techniques. The manual approach does not include any knowledge about a collection, while the interactive approach implies query modification by a feedback process. However, assistance could be sought from other sources, including a dictionary or a thesaurus. In query expansion research, one of the biggest issues is to generate appropriate keywords that represent the user's intention.

Spelling correction [1] is also related to synonym detection as its techniques are applicable, especially for product synonyms which share a lot of common words.

The methods described above use only a query set as input data. But there are a few published approaches which use external data sources for synonym detection to make the technique more robust.

The goal of this project is to detect synonymous terms in search queries which are submitted by users to a large commercial site search engine. We also provide recommendations for enhancing search quality on the large commercial site.

The remainder of the report is organized as follows. We review related work in Section 2. The problem is formulated in Section 3. Section 4 discusses the algorithms that were utilized; in particular, it presents the functions we use as measures of similarity. Section 5 describes our experimental data set and compares experimental results. Finally, Section 6 summarizes our contribution.

2 Related Work

There are a lot of papers related to synonym detection in search queries. Thesauri have been recognized as a useful source for enhancing search-term selection for query formulation and expansion [4], [5]. Terminological assistance may be provided through inclusion of thesauri and classification schemes into the IR system.
In a series of experiments on designing interfaces to the Okapi search engine it was found that both implicit and explicit use of a thesaurus during automatic and interactive query expansion were beneficial. It was also suggested that while the system could find useful thesaurus terms through an automatic query-expansion process, terms explicitly selected by users are of particular value ([4], [6]).
The paper [3] presents a new approach to query expansion. The authors propose the Related Word Extraction Algorithm (RWEA). This algorithm extracts words from texts that are supposed to be strongly related to the initial query. RWEA weights were also used in the weighting scheme of Robertson's Selection Value (RSV), a well-known method for relevance feedback [4]. Query expansion was performed based on the results of each method (RSV, RWEA, and RSV with RWEA weights) and a comparison was made. RWEA evaluates a word in a document and RSV evaluates a word among several documents; consequently, the combination should perform uniformly well. Experimental results corroborated that statement: the combined method works effectively for all queries on average. In particular, when a user inputs initial queries whose results have Average Precision (AP) under 0.6, the method obtains the highest Mean Average Precision (MAP). It also obtains the highest MAP among the three methods in experiments with navigational queries. However, RWEA obtains the highest MAP in experiments with informational queries. Experimental results show that the effectiveness of a method for query expansion depends on the type of queries.

Many research papers about query spelling correction [1] have been published recently. We think that this area is also related to synonym detection as its techniques are applicable. For example, in [7] the authors consider a new class of similarity functions between candidate strings and reference entities. These similarity functions are more accurate than previous string-based similarity functions because they aggregate evidence from multiple documents and exploit web search engines in order to measure similarity. The authors thoroughly evaluate the techniques on real datasets and demonstrate their precision and efficiency.

In [2] the authors present a study on clustering of synonym terms in search queries. The main idea is that if users click on the same web page after submitting different search queries, those queries are synonyms.
3 Problem Statement

Our goal is to build a thesaurus of synonym terms which are related to respective products. We also provide a set of recommendations for enhancing the quality of search results returned by the large commercial site search engine.
4 Algorithms

4.1 Similarity Distance Metrics

We perform experiments with token-based and term-based similarity metrics. We choose these metrics because their efficiency was proved in the literature [11], [13].

Token-based distance

There are a lot of token-based string similarity metrics which are described in the literature.

Levenshtein distance (LD) is a measure of the similarity between two strings, which we will refer to as the source string (s) and the target string (t). The distance is the number of deletions, insertions, or substitutions required to transform s into t. For example:

If s is "test" and t is "test", then LD(s,t) = 0, because no transformations are needed. The strings are already identical.
If s is "test" and t is "tent", then LD(s,t) = 1, because one substitution (change "s" to "n") is sufficient to transform s into t.

The greater the Levenshtein distance, the more different the strings are. Levenshtein distance is also called edit distance.

Smith-Waterman distance [11] is similar to Levenshtein distance. It was developed to identify optimal alignments between related DNA and protein sequences. It has two parameters, a function d and a gap G. The function d maps pairs of alphabet symbols to cost values for substitutions. The gap G allows costs to be attributed to insert and delete operations. The similarity score D is computed with a dynamic programming algorithm described by the equation below:

\[
D(i, j) = \max
\begin{cases}
0 & \text{// start} \\
D(i-1, j-1) - d(s_i, t_j) & \text{// subst / copy} \\
D(i-1, j) - G & \text{// insert} \\
D(i, j-1) - G & \text{// delete}
\end{cases}
\]

The final score is given by the highest valued cell. Table 1 presents the example of score calculation.
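The two token-based measures described above can be sketched in a few lines of Python. This is a minimal illustration and not the simmetrics implementation used in the experiments; the function names and the default scores (a match adds 2, a mismatch subtracts 1, a gap costs 1) are assumptions chosen for the example.

```python
def levenshtein(s: str, t: str) -> int:
    """Number of insertions, deletions, or substitutions turning s into t."""
    prev = list(range(len(t) + 1))          # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete from s
                            curr[j - 1] + 1,      # insert into s
                            prev[j - 1] + cost))  # substitute or copy
        prev = curr
    return prev[-1]


def smith_waterman(s: str, t: str, gap: float = 1.0,
                   match: float = 2.0, mismatch: float = -1.0) -> float:
    """Local alignment score: the highest valued cell of the DP table.

    The default scores are assumptions made for this sketch.
    """
    best = 0.0
    prev = [0.0] * (len(t) + 1)
    for cs in s:
        curr = [0.0]
        for j, ct in enumerate(t, start=1):
            score = match if cs == ct else mismatch
            cell = max(0.0,                    # start a new alignment
                       prev[j - 1] + score,    # substitute or copy
                       prev[j] - gap,          # gap in t
                       curr[j - 1] - gap)      # gap in s
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

With these assumed scores, `levenshtein("test", "tent")` gives 1 and `smith_waterman("cohen", "mccohn")` gives 7.0: the run "coh" scores 6, skipping "e" costs 1, and the final "n" adds 2.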
      C  O  H  E  N
M
C
C
O
H
N

Table 1. Smith-Waterman calculation between strings "cohen" and "mccohn", where G = 1, d(c,c) = -2, d(c,d) = +1.

Smith-Waterman-Gotoh [12] is an extension of Smith-Waterman distance that allows affine gaps within the sequence. The affine gap model includes variable gap costs, typically based upon the length of the gap l (W_l). If two sequences, A (= a_1 a_2 a_3 ... a_n) and B (= b_1 b_2 b_3 ... b_m), are compared, the formula for the dynamic programming algorithm is:

\[
D_{i,j} = \max \left\{ D_{i-1,j-1} + d(a_i, b_j),\; \max_k \{ D_{i-k,j} - W_k \},\; \max_l \{ D_{i,j-l} - W_l \},\; 0 \right\}
\]

where D_{i,j} is in fact the maximum similarity of two segments ending in a_i and b_j respectively. Two affine gap costs are considered: a cost for starting a gap and a cost for continuation of a gap.

Definition: The taxicab distance, d_1, between two vectors p, q in an n-dimensional real vector space with a fixed Cartesian coordinate system is the sum of the lengths of the projections of the line segment between the points onto the coordinate axes:

\[
d_1(p, q) = \| p - q \|_1 = \sum_{i=1}^{n} | p_i - q_i |
\]

where p = (p_1, p_2, ..., p_n) and q = (q_1, q_2, ..., q_n) are the two vectors.

The taxicab metric is also known as rectilinear distance, L_1 distance or l_1 norm, city block distance, Manhattan distance, or Manhattan length.

Term based distance

We choose the cosine similarity metric as a term-based distance. Cosine similarity is a measure of similarity between two vectors which is equal to the cosine of the angle between them. The result of the cosine function is equal to 1 when the vectors are collinear or between 0 and 1 otherwise.
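As an illustration of the two vector measures defined above, the following Python sketch computes the taxicab distance and the cosine similarity of two queries represented as term-count vectors. The helper `query_vector`, the fixed vocabulary, and the whitespace tokenization are assumptions made for the example, not details taken from the experiments.

```python
import math


def taxicab_distance(p, q):
    """Sum of absolute coordinate differences (L1 distance)."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(a - b) for a, b in zip(p, q))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_vector(query, vocabulary):
    """Hypothetical helper: count how often each vocabulary term occurs in the query."""
    tokens = query.lower().split()
    return [tokens.count(term) for term in vocabulary]


vocabulary = ["laptop", "notebook", "4510"]
v1 = query_vector("laptop 4510", vocabulary)     # [1, 0, 1]
v2 = query_vector("notebook 4510", vocabulary)   # [0, 1, 1]
```

Here the two toy queries share one of their two terms, so their taxicab distance is 2 and their cosine similarity is about 0.5.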
The cosine of two vectors can be easily derived by using the Euclidean dot product formula:

\[
a \cdot b = \| a \| \, \| b \| \cos\theta
\]

\[
\text{similarity} = \cos(\theta) = \frac{a \cdot b}{\| a \| \, \| b \|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} (a_i)^2} \, \sqrt{\sum_{i=1}^{n} (b_i)^2}}
\]

As a weighting function we used a tf*idf weight. The tf (term frequency) in the given document is simply the number of times a given term appears in that document:

\[
tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}
\]

where n_{i,j} is the number of occurrences of the considered term t_i in document d_j, and the denominator is the sum of the number of occurrences of all terms in document d_j, that is, the size of the document d_j. The idf (inverse document frequency) is a measure of the general importance of the term:

\[
idf_i = \log \frac{|D|}{|\{ d : t_i \in d \}|}
\]

We selected the tf*idf weight. It combines two aspects of a word: the importance of the word for a document and its discriminative power within the whole collection. Each query was regarded as a document in the collection. Tf is the frequency of a term in a query; it is almost always equal to 1. Idf is the ordinary inverse document frequency.

4.2 Probabilistic Model

Source Channel Model

In paper [1] the authors apply a source channel model to the error correction task. We explore the possibility of applying it to finding synonyms. The source channel model has been widely used for spelling correction. Using the source channel model, we try to solve an equivalent problem by applying Bayes' rule and dropping the constant denominator:
\[
c^{*} = \arg\max_{c \in C} P(q \mid c) \, P(c),
\]

where q is the query and c is a correction candidate. In this approach, two components of the generative model are involved: P(c) characterizes the user's intended query c and P(q | c) models the error. The two components can be estimated independently.

The source model P(c) can be approximated with an n-gram statistical language model. In practice, it is estimated from tokenized query logs for multi-term queries. Consider, for example, a bigram model. If c is a correction candidate containing n terms, c = c_1 c_2 ... c_n, then P(c) can be written as a product of consecutive bigram probabilities:

\[
P(c) = \prod_{i} P(c_i \mid c_{i-1})
\]

Similarly, the error model probability of a query is decomposed into generation probabilities of individual terms, which are assumed to be independent:

\[
P(q \mid c) = \prod_{i} P(q_i \mid c_i)
\]

Now the word synonymy can be assessed via correlation. There are different ways to estimate distributional similarity between two words, and the one we propose to use is confusion probability. Formally, the confusion probability P_c estimates the possibility that a word w_1 could be replaced by another word w_2 [1]:

\[
P_c(w_2 \mid w_1) = \sum_{w} \frac{P(w \mid w_1)}{P(w)} \, P(w \mid w_2) \, P(w_2)
\]

where w belongs to the set of words that co-occur with both w_1 and w_2. For synonym detection we assume that w_1 is an initial word and w_2 is a synonym. The confusion probability P_c(w_2 | w_1) models the probability of w_1 being rephrased as w_2 in query logs.

Utilizing database as external data container

As we mention in Section 2 (Related Work), there is a successful practice of utilizing external sources to discover synonyms. We present a novel method which makes use of a database with product names to enhance the synonym detection described in the previous section. The database provides new ways to detect synonym terms because it contains product names which are related to the queries but could be expressed in
other words. Synonym terms from the database are extremely useful for detecting related products during the search process. We introduce an analog of confusion probability between words in the query and terms in the database.

Figure 1. Metrics inside search query tokens and product names database

Figure 1 shows sets of tokens in a database (D) and in a query log (Q). D ∩ Q is the intersection of terms in the database and the query log; w_i is a token from the intersection. P_c(w_i | w) is the confusion probability from [1]. sim(w_i, w') depicts a similarity function within the space of database terms between the term w' and the term w_i, which we choose to be Manhattan distance because it performed best as a token-based similarity measure. {w_i} is the set of terms which occur in the intersection between the database and the queries (in D ∩ Q).

We extend the notion of confusion probability to a pair w and w', where w is a term which occurs only in queries and w' is a term which occurs only in the database. We propose two ways of introducing the confusion probability extension (in both formulas i indexes words of the intersection):

1. P_max_c(w, w') = C max_i ( P_c(w_i | w) * sim(w_i, w') )
2. P ( w, w ) C P( w ) ( w, w) * P ( w w ) * P( w ) c W c c
Note that the natural desired property P_c(w, w') = P_c(w' | w) if w' ∈ D ∩ Q is not automatically met by the introduced extension. Another possible approach to extend the confusion probability is to introduce P_C(w, w') according to the joint distribution of (w'_i, w''_j), where w'_i ∈ D ∩ Q and w''_j ∈ D ∩ Q.

5 Experiments

In order to perform initial data filtering, we have built basic statistics of the query log and found notable properties of the current large commercial site search engine traffic, which are presented in Section 5.1. Next we evaluated the metrics presented in Section 4. We present the evaluation in subsequent sections together with sample results.

5.1 Data Description

In this section we present a data description and some statistics which will help us to understand the nature of the data. By the nature of the data we mean answers to the following questions: Where have the queries come from? What is the average length of a query? What is the list of stop words for the large commercial site search engine query log?

The query log used for analysis was collected during 8 days. It contains queries, unique queries, and queries which occur more than one time. The average length of the query is words. Table 2 provides a detailed query log description. The log does not contain any additional information about users except IP addresses. They do not uniquely identify users.
IP address (1): *.*
Time: 05/Jun/2010:00:00:
Request: GET /query.html?lang=en&search=++&qt=pavilion+6130+add+replace+expansion&la=en&cc=us&charset=utf-8 HTTP/1.1
Browser information: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv: ) Gecko/ Firefox/
Status:
Status1:
Return page: www.hp.com/

Table 2. Query log description.

The query frequency distribution in the log is presented below, in Figure 2.

Figure 2. Query frequency distribution.

The most frequent queries are given in Table 3. The most popular queries are non-product queries like "google". We think that those queries are the most frequent because they have come from internal corporate users. This probably happens because the commercial site page is by default the start page for the company's employees.

(1) Here and further on, IP addresses are partially obfuscated because of privacy considerations.
Query                                              Frequency
search:                                            1066
Google                                             610
Drivers                                            579
hp officejet j4500 series search                   535
Slate                                              439
hp deskjet f2200 series search                     421
Warranty                                           363
hp business availability center                    354
hp deskjet f4200 series search                     246
Tablet                                             232
go instant                                         214

Table 3. The most frequent queries in the log

There is a parameter "web section" in the request that shows what category on the site was selected by a user. From our point of view, the query distribution by topic could be useful in order to understand user behavior. We built statistics by web section from query URLs. The web section is related to the query topic. The statistics are shown in Table 4. The total number of web section queries is 527, which is 0.35% of the whole number of queries, i.e. web section functionality is not popular with the users.

Web Section Topics                                 Frequency
small & medium business                            153
Home                                               108
compaq.com                                         70
home & home office                                 55
home & home office section only                    42
small & medium business site                       37
hp procurer networking                             27
products and services                              10
home & home office only                            9
hp promotions only                                 6
business technology optimization (bto) software    4
learn about supplies                               3
hp online store                                    2
hp services                                        1
Total                                              527 (0.35%)

Table 4. Distribution of web section queries

5.2 Data Preprocessing

5.2.1 Data Filtering

For some of the approaches that we apply, as well as to distinguish between external and internal use of the site, we need per-user data. To obtain per-user statistics we developed a technique for data filtering. We found that there were IP addresses which sent many requests to the search engine. We give examples of such IP addresses, which had more than 1000 requests, in Table 5. We believe that most of those search queries are sent from company employees' computers through corporate proxies. The corporate IP addresses are marked in bold in Table 5. We called this set of IPs non-confidential and they were removed from the data set.

IP address    Frequency
*.*
*.*
*.*
*.*
*.*
*.*
*.*
*.*
*.*
*.*
*.*
*.*           1200

Table 5. Top non-confidential IP addresses

We calculated statistics of requests from all IP addresses and from non-confidential IP addresses. The statistics are presented in Table 6. We conclude that at least 25% of search queries originate from inside the company.

Date      Number of confidential requests    Number of all requests    Delta
1 June
June
June
June
June
June
June
June
Total

Table 6. Daily query statistics per origin

To make our methodology more robust we built a list of stop words. It contains prepositions and the term "hp". We used this list to clean up queries in the log.
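The two preprocessing steps described above, dropping heavy-traffic IP addresses and removing stop words, can be sketched as follows. This is only an illustration: the stop-word list, the `(ip, query)` record layout, and the function names are assumptions, and the 1000-request threshold is borrowed from the examples in Table 5 rather than from a stated filtering rule.

```python
from collections import Counter

# Hypothetical stop-word list: the report only says it holds prepositions and "hp".
STOP_WORDS = frozenset({"hp", "in", "on", "for", "of", "to", "with", "from", "at", "by"})


def clean_query(query, stop_words=STOP_WORDS):
    """Lowercase the query and drop stop words."""
    tokens = query.lower().split()
    return " ".join(tok for tok in tokens if tok not in stop_words)


def drop_heavy_ips(requests, max_requests=1000):
    """Remove every request sent from an IP address with more than max_requests requests.

    requests is a list of (ip, query) pairs; the layout is an assumption of this sketch.
    """
    per_ip = Counter(ip for ip, _ in requests)
    return [(ip, query) for ip, query in requests if per_ip[ip] <= max_requests]
```

For example, `clean_query("HP office locations in India")` returns `"office locations india"`.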
5.2.2 Identification of user session time

In one session a user may pursue a single information need and reformulate queries until he or she gets the desired result. Thus, analyzing user sessions in order to find synonymous queries seems reasonable. To identify user sessions, we filtered IP addresses from the log according to the algorithm described in Section 5.2.1 (Data Filtering).

Definition 1: Delta is the time in seconds between two contiguous clicks from the same IP.

Definition 2: Delta frequency is the frequency of a delta in the whole query log.

For both cases, with non-confidential IP addresses and without non-confidential IP addresses, we built plots which are presented in Figure 3. We suppose that we should see how a user rephrases the query or expands it. We used Manhattan distance to find synonyms because it has performed well in previous experiments.
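Definitions 1 and 2 translate directly into code. The sketch below assumes the log has already been parsed into `(ip, timestamp)` pairs; the record layout, the function names, and the `max_gap_seconds` session cut-off are hypothetical, since the report does not state the threshold it used to close a session.

```python
from collections import Counter, defaultdict
from datetime import datetime


def deltas_by_ip(records):
    """Definition 1: seconds between contiguous requests from the same IP."""
    stamps_by_ip = defaultdict(list)
    for ip, stamp in records:
        stamps_by_ip[ip].append(stamp)
    deltas = {}
    for ip, stamps in stamps_by_ip.items():
        stamps.sort()
        deltas[ip] = [int((later - earlier).total_seconds())
                      for earlier, later in zip(stamps, stamps[1:])]
    return deltas


def delta_frequency(deltas):
    """Definition 2: how often each delta value occurs in the whole log."""
    return Counter(d for per_ip in deltas.values() for d in per_ip)


def split_sessions(stamps, max_gap_seconds):
    """Group one IP's timestamps into sessions; a gap above the cut-off starts a new one."""
    sessions = []
    for stamp in sorted(stamps):
        if sessions and (stamp - sessions[-1][-1]).total_seconds() <= max_gap_seconds:
            sessions[-1].append(stamp)
        else:
            sessions.append([stamp])
    return sessions
```

A histogram such as the ones in Figure 3 could then be drawn from the counts returned by `delta_frequency`.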
Figure 3. (a) A histogram of deltas, starting from 5 seconds, for all IP addresses; (b) a histogram of deltas, starting from 5 seconds, for the set of IP addresses without non-confidential IPs.

5.3 Evaluation Metrics

We use precision as an evaluation metric for our experiments. Its formula is given below:

\[
\text{Precision} = \frac{\#\,\text{correct\_results}}{\#\,\text{total\_results}}
\]

5.4 Experiments with different token based similarity metrics

The first approach that we considered for finding synonyms originates in the task of matching similar strings (2). To characterize whether or not a candidate string is synonymous with another string, we compute the string similarity score between the candidate and the reference strings [10, 6].

(2) We use the simmetrics library.
Unfortunately, there is no gold standard for evaluating synonym discovery in query logs and we have to build the ground truth ourselves. After performing experiments with different metrics we select the top results and evaluate them manually. We decided not to do general pooling and evaluate the precision at 100 metric instead. We believe that top similar pairs are more stable than pairs similar to a given one. The results of the evaluation and the volume of the gold standard are presented in Table 7.

Token-based Metric               Gold Standard Size    Precision
Levenshtein Distance
Smith-Waterman Distance
Smith-Waterman-Gotoh Distance
Manhattan Distance

Table 7. Results of experiments with the proposed token-based metrics.

Manhattan distance shows the best precision at 100. The main reason for low precision is that string similarity does not imply synonymy. For example, the strings "hp deskjet 960c" and "deskjet 932c" are similar according to the similarity metric, but they represent different printer models and this is not a case of synonymy.

Synonyms detection by using click on the same URL

A hypothesis suggested in [2] claims that if users click on the same search result URL, their queries should be synonyms. We explored that hypothesis on our data. Table 8 shows a few examples that were obtained:

Id    Queries from the same clicked URL
1     hp deskjet 845c; hp deskjet d
2     hp deskjet d1360; hp deskjet 845c
3     hp laserjet 4350tn; hp laserjet
4     hp laserjet 1102; hp laserjet 4350tn
5     hp photosmart c6380; hp photosmart a524; hp photosmart c
6     hp psc 1300; hp psc 1315; hp psc
7     hp psc 1315; hp psc 1300; hp psc
8     hp photosmart a524; hp photosmart c6380; hp photosmart c
9     hp photosmart c4240; hp photosmart c6380; hp photosmart a
10    hp psc 2410; hp psc 1300; hp psc
11    hp pavilion dv6500; hp pavilion dv2000; hp pavilion dv3

Table 8. Examples of synonyms through clicks on the same URL

One can see that we obtained low precision. The reason is that queries which contain different model numbers are regarded as synonyms. We expected that users would reformulate a query by replacing a term, but we found that users mostly replace a model number.

Synonyms detection by using user session

We performed experiments with the purpose of finding similar terms within a query session, using the methodology for detecting a user session that we described in Section 5.2.2. We evaluated 203 queries manually and this set is our gold standard for expert evaluation. We obtained precision equal to . A few examples of synonyms in one user session are given in Table 9. Table 9 also presents a similarity value between queries within the session.

User IP    Query1    Query2    Similarity Value
audio sp27792 sp hpdv6-1153e drivers hp proliant ml350 g6 dv5-1153e drivers 0.5 ml330 g ml330 ml330 g hp officejet j4500 series search hp officejet j4500 series warranty registration

Table 9. Synonymous queries within a user session

5.5 Experiments with term based metric

We inflated the weight of terms that are numbers or contain numbers. This was done in order to avoid regarding queries with different model numbers as synonyms. Candidate pairs of synonymous queries which had cosine similarity less than 0.7 were filtered out. We have evaluated 150 queries and obtained precision of 0.4. Almost all results are synonym expansions. We did not include the term "hp" and prepositions in the feature space because we consider them stop words.

A few examples of synonyms found with cosine similarity are presented in Table 10. The obtained set of synonyms can be divided into two categories: query expansion (pairs 1, 2, and 3) and query rephrasing (pair 4). In this case we can conclude that the terms "laptop" and "notebook" are synonyms.

Id    Initial Query    Query Synonym
photosmart hp laserjet 4250n 4250n 3 rx3715 ipaq rx laptop 4510 notebook

Table 10. Examples of query synonyms obtained with cosine similarity metrics

5.6 Experiments with confusion probability

In this section we apply another approach to synonym detection. This approach detects synonyms on the level of single words rather than whole queries, and it recalls the source channel model.
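To make the word-level approach concrete, the following Python sketch estimates a confusion probability from raw query co-occurrences. It is not the implementation used in the experiments. It relies on two assumptions: probabilities are estimated from counts of terms appearing in the same query, and the sum is taken in the form P_c(w2 | w1) = sum over w of P(w | w1) * P(w2 | w), which is what the confusion probability of [1] reduces to when Bayes' rule is applied to its last two factors. The function names and the toy queries are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_counts(queries):
    """counts[a][b] = number of queries that contain both term a and term b."""
    counts = defaultdict(Counter)
    for query in queries:
        tokens = set(query.lower().split())
        for a, b in permutations(tokens, 2):
            counts[a][b] += 1
    return counts


def confusion_probability(w1, w2, counts):
    """Estimate P_c(w2 | w1) over context words that co-occur with both terms."""
    contexts = counts.get(w1)
    if not contexts:
        return 0.0
    total_w1 = sum(contexts.values())
    prob = 0.0
    for w, n_w_w1 in contexts.items():
        n_w_w2 = counts[w].get(w2, 0)
        if n_w_w2 == 0:
            continue  # w does not co-occur with w2, so it contributes nothing
        p_w_given_w1 = n_w_w1 / total_w1
        p_w2_given_w = n_w_w2 / sum(counts[w].values())
        prob += p_w_given_w1 * p_w2_given_w
    return prob


# Toy query log, invented for illustration only.
toy_log = ["designerjet warranty", "designjet warranty", "designjet drivers"]
counts = cooccurrence_counts(toy_log)
```

On this toy log, "designerjet" only ever co-occurs with "warranty", which is shared with "designjet", so `confusion_probability("designerjet", "designjet", counts)` is 0.5, while the reverse direction is 0.25 because "designjet" also appears with "drivers".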
Some of the top results of the described synonym detection method are presented in Table 11. Most of the presented synonyms can be characterized by the following categories: paronymous terms like "face" and "facial"; misspellings like "Designerjet" and "Designjet"; different forms of the same word like "dv42160us" and "dv4-2164us".

Query term     Query term should be similar    Confusion probability
Designerjet    Designjet                       0.75
Windows
Twain          Twin                            0.2
Michael        Micheal
dv42160us      dv4-2164us
Facial         Face
Vitamine       Vitamin                         0.2
Ms-6390        Ms
Technisch      Farm

Table 11. Synonymous terms in queries detected with confusion probability

6 Conclusion and recommendations

We discovered that all obtained synonyms can be classified into the following groups:

1. Misspellings.
2. Different forms of a word (mostly the plural form).
3. Term and digit. Terms adhering to the following regular expressions: Digit Space* Letter and Letter Space* Digit.
4. Query expansions.
5. Rephrasings. This is the type of synonyms which is the most interesting for us.

Table 12 contains examples of the above categories.
Category                     Initial Query                      Synonym Query
Misspelling                  1) alanta                          1) Atlanta
                             2) laser                           2) Leser
                             3) vdeo                            3) Vdeo
                             4) Designerjet                     4) Designerjet
Different form of the word   Warranties                         Warranty
Term and digit               dv 8                               dv8
Query expansion              hp office locations in india       hp india
Rephrasing                   1) Remove                          1) Uninstall
                             2) Activation                      2) Product key
                             3) How to                          3) Help, not working, support
                             4) Total care                      4) Adviser
                             5) Call center                     5) service center

Table 12. Synonym categories with examples

According to the discovered groups of synonyms we give the following recommendations:

1. Perform spelling correction at run time. We can identify and store a list of the most common misspelled terms. Appendix B demonstrates that currently the search engine at the site cannot detect a misspelling. Figure 5 shows that the search engine does not correct a misspelling and returns irrelevant results. For now we cannot say that we have detected the whole list of misspellings because the current query log does not have enough data.
2. We think that storing different forms of terms will improve search quality.
3. Perform data normalization. Terms adhering to the following regular expressions: Digit Space* Letter and Letter Space* Digit should be normalized. We should normalize incoming queries and data in the database. Appendix C contains two figures, 7 and 8, which show how search results can change depending on the way hard drive capacity is written.
4. We need more data to detect query expansions. The search engine has a query reformulation service but sometimes very weird suggestions are returned. One of the examples is presented in Appendix A, Figure 4. The site should have a product oriented search engine, but suggested queries look like the most frequent queries and are not related to products. An example can be found in Appendix A (Figure 5).
5. We present a novel technique for synonym detection in this report. We need more data to detect a strong list of rephrasing synonyms.
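The normalization proposed in recommendation 3 can be sketched with two regular expressions. Applying "Letter Space* Digit" to every token would also glue a product name to a following number (for example "elitebook 200"), so this sketch only joins a number with a short alphabetic token of at most three letters. That length limit, like the function name, is an assumption of the example and not part of the recommendation.

```python
import re

# Digit Space* Letter, e.g. "200 gb" -> "200gb" (short unit-like tokens only).
_DIGIT_THEN_LETTERS = re.compile(r"\b(\d+)\s+([A-Za-z]{1,3})\b")
# Letter Space* Digit, e.g. "dv 8" -> "dv8" (short prefix-like tokens only).
_LETTERS_THEN_DIGIT = re.compile(r"\b([A-Za-z]{1,3})\s+(\d+)\b")


def normalize_term_digit(query):
    """Remove the spaces between a number and an adjacent short alphabetic token."""
    query = _DIGIT_THEN_LETTERS.sub(r"\1\2", query)
    return _LETTERS_THEN_DIGIT.sub(r"\1\2", query)
```

With this heuristic, `normalize_term_digit("dv 8")` returns `"dv8"` and `normalize_term_digit("hp elitebook 200 gb")` returns `"hp elitebook 200gb"`, so the two spellings compared in Appendix C map to the same query. The same function would have to be applied to the database terms as well.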
We detected two problems with the data set:

- The majority of queries come from internal corporate users and they are not product search queries. We think that this peculiarity is not inherent to the specific query log and reflects general issues with the current search functionality on the site.
- Statistics of the one week log are not enough to detect strong synonym patterns. The total number of extracted synonym pairs is in the tens. We hope that a longer log can increase that number with close to linear dependence on the log size.

7 References

1. Mu Li, Muhua Zhu, Yang Zhang, Ming Zhou. Exploring Distributional Similarity Based Models for Query Spelling Correction. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL.
2. Jeonghee Yi, Farzin Maghoul. Query clustering using click-through graph. In Proceedings of SIGIR.
3. Tetsuya Oishi, Shunsuke Kuramoto, Tsunenori Mine, Ryuzo Hasegawa, Hiroshi Fujita, Miyuki Koshimura. A Method for Query Expansion Using the Related Word Extraction Algorithm. Web Intelligence/IAT Workshops 2008.
4. Beaulieu, M. (1997). Experiments of interfaces to support query expansion. Journal of Documentation, 53(1).
5. Brajnik, G., Mizzaro, S., & Tasso, C. (1996, August). Evaluating user interfaces to information retrieval systems: A case study on user support. Proceedings of the 19th annual conference on Research and Development in Information Retrieval (ACM/SIGIR). Zurich, Switzerland.
6. Jones, S., Gatford, M., Hancock-Beaulieu, M., Robertson, S.E., Walker, W., & Secker, J. (1995). Interactive thesaurus navigation: Intelligence rules OK? Journal of the American Society for Information Science, 46(1).
7. Surajit Chaudhuri, Venkatesh Ganti, Dong Xin. Exploiting Web Search to Generate Synonyms for Entities. WWW.
8. K. Chakrabarti, S. Chaudhuri, V. Ganti, and D. Xin. An efficient filter for approximate membership checking. In SIGMOD Conference.
9. W. W. Cohen and S. Sarawagi. Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods. In KDD, pages 89-98.
10. C. H. Bennett, P. Gács, M. Li, P. M. B. Vitányi, and W. Zurek. Information distance. IEEE Trans. Inform. Theory, vol. 44, July.
11. Smith, T. F. and Waterman, M. S. Identification of common molecular subsequences. J. Mol. Biol.
12. Gotoh, O. "An Improved Algorithm for Matching Biological Sequences". Journal of Molecular Biology, 162.
13. Rishin Haldar, Debajyoti Mukhopadhyay. Levenshtein Distance Technique in Dictionary Lookup Methods: An Improved Approach. CoRR abs/ (2011).

8 Appendix

8.1 A

Figure 4. Controversial query suggestions
8.2 B

Figure 5. Misspelled query "Alanta service"

Figure 6. Search page for query "Atlanta service"
8.3 C

Figure 7. Search page for query "hp elitebook 200 gb".

Figure 8. Search page for query "hp elitebook 200gb".
A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a
More informationDescription of NTU Approach to NTCIR3 Multilingual Information Retrieval
Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan
More informationCSCI 5417 Information Retrieval Systems Jim Martin!
CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne
More informationInformation Retrieval
Anmol Bhasn abhasn[at]cedar.buffalo.edu Moht Devnan mdevnan[at]cse.buffalo.edu Sprng 2005 #$ "% &'" (! Informaton Retreval )" " * + %, ##$ + *--. / "#,0, #'",,,#$ ", # " /,,#,0 1"%,2 '",, Documents are
More informationProgramming in Fortran 90 : 2017/2018
Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values
More informationA Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems
A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty
More informationSubspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;
Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features
More informationParallelism for Nested Loops with Non-uniform and Flow Dependences
Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr
More informationOptimizing Document Scoring for Query Retrieval
Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng
More informationContent Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers
IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth
More informationSteps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices
Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between
More informationLoad Balancing for Hex-Cell Interconnection Network
Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,
More informationCan We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search
Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton
More informationRelated-Mode Attacks on CTR Encryption Mode
Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory
More information6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour
6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the
More informationSLAM Summer School 2006 Practical 2: SLAM using Monocular Vision
SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,
More informationLocal Quaternary Patterns and Feature Local Quaternary Patterns
Local Quaternary Patterns and Feature Local Quaternary Patterns Jayu Gu and Chengjun Lu The Department of Computer Scence, New Jersey Insttute of Technology, Newark, NJ 0102, USA Abstract - Ths paper presents
More informationDetection of an Object by using Principal Component Analysis
Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,
More informationTN348: Openlab Module - Colocalization
TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages
More informationCompiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz
Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster
More information3D vector computer graphics
3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres
More informationFeature Reduction and Selection
Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components
More informationImprovement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration
Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,
More informationA Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment
A Webpage Smlarty Measure for Web Sessons Clusterng Usng Sequence Algnment Mozhgan Azmpour-Kv School of Engneerng and Scence Sharf Unversty of Technology, Internatonal Campus Ksh Island, Iran mogan_az@ksh.sharf.edu
More informationFEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur
FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents
More informationNews. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example
Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt
More informationThe Codesign Challenge
ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.
More informationProper Choice of Data Used for the Estimation of Datum Transformation Parameters
Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and
More informationCS 534: Computer Vision Model Fitting
CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust
More informationAn Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed
More informationMathematics 256 a course in differential equations for engineering students
Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the
More informationCourse Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms
Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques
More informationFINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK
FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,
More informationFor instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)
Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A
More informationSum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints
Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan
More informationImage Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline
mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and
More informationSkew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach
Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research
More informationWishing you all a Total Quality New Year!
Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma
More informationA fast algorithm for color image segmentation
Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au
More informationLobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide
Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.
More informationOutline. Type of Machine Learning. Examples of Application. Unsupervised Learning
Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton
More informationA Multi-step Strategy for Shape Similarity Search In Kamon Image Database
A Mult-step Strategy for Shape Smlarty Search In Kamon Image Database Paul W.H. Kwan, Kazuo Torach 2, Kesuke Kameyama 2, Junbn Gao 3, Nobuyuk Otsu 4 School of Mathematcs, Statstcs and Computer Scence,
More informationEECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science
EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty
More informationProblem Definitions and Evaluation Criteria for Computational Expensive Optimization
Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty
More informationShape Representation Robust to the Sketching Order Using Distance Map and Direction Histogram
Shape Representaton Robust to the Sketchng Order Usng Dstance Map and Drecton Hstogram Department of Computer Scence Yonse Unversty Kwon Yun CONTENTS Revew Topc Proposed Method System Overvew Sketch Normalzaton
More informationSummarizing Data using Bottom-k Sketches
Summarzng Data usng Bottom-k Sketches Edth Cohen AT&T Labs Research 8 Park Avenue Florham Park, NJ 7932, USA edth@research.att.com Ham Kaplan School of Computer Scence Tel Avv Unversty Tel Avv, Israel
More informationUnsupervised Learning
Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and
More informationThe Effect of Similarity Measures on The Quality of Query Clusters
The effect of smlarty measures on the qualty of query clusters. Fu. L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Informaton Scence, 30(5) 396-407 The Effect of Smlarty Measures on The Qualty of
More informationMachine Learning: Algorithms and Applications
14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of
More informationDetermining the Optimal Bandwidth Based on Multi-criterion Fusion
Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn
More informationA Knowledge Management System for Organizing MEDLINE Database
A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson
More informationHigh-Boost Mesh Filtering for 3-D Shape Enhancement
Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,
More informationClassifier Selection Based on Data Complexity Measures *
Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.
More informationClassic Term Weighting Technique for Mining Web Content Outliers
Internatonal Conference on Computatonal Technques and Artfcal Intellgence (ICCTAI'2012) Penang, Malaysa Classc Term Weghtng Technque for Mnng Web Content Outlers W.R. Wan Zulkfel, N. Mustapha, and A. Mustapha
More informationA Method of Hot Topic Detection in Blogs Using N-gram Model
84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna
More informationUSING GRAPHING SKILLS
Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp
More informationOutline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1
4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:
More informationReducing Frame Rate for Object Tracking
Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg
More informationSelecting Query Term Alterations for Web Search by Exploiting Query Contexts
Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer
More informationOnline Text Mining System based on M2VSM
FR-E2-1 SCIS & ISIS 2008 Onlne Text Mnng System based on M2VSM Yasufum Takama 1, Takash Okada 1, Toru Ishbash 2 1. Tokyo Metropoltan Unversty, 2. Tokyo Metropoltan Insttute of Technology 6-6 Asahgaoka,
More informationMining User Similarity Using Spatial-temporal Intersection
www.ijcsi.org 215 Mnng User Smlarty Usng Spatal-temporal Intersecton Ymn Wang 1, Rumn Hu 1, Wenhua Huang 1 and Jun Chen 1 1 Natonal Engneerng Research Center for Multmeda Software, School of Computer,
More informationKeywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines
(IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak
More informationAssignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.
Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton
More informationLinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals
nkselector: A Web Mnng Approach to Hyperlnk Selecton for Web Portals Xao Fang and Olva R. u Sheng Department of Management Informaton Systems Unversty of Arzona, AZ 8572 {xfang,sheng}@bpa.arzona.edu Submtted
More informationIntrinsic Plagiarism Detection Using Character n-gram Profiles
Intrnsc Plagarsm Detecton Usng Character n-gram Profles Efstathos Stamatatos Unversty of the Aegean 83200 - Karlovass, Samos, Greece stamatatos@aegean.gr Abstract: The task of ntrnsc plagarsm detecton
More informationLearning the Kernel Parameters in Kernel Minimum Distance Classifier
Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department
More informationA mathematical programming approach to the analysis, design and scheduling of offshore oilfields
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and
More informationHierarchical clustering for gene expression data analysis
Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally
More informationQuality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation
Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on
More informationCS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15
CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc
More informationHermite Splines in Lie Groups as Products of Geodesics
Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the
More informationNAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics
Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson
More informationCorner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity
Journal of Sgnal and Informaton Processng, 013, 4, 114-119 do:10.436/jsp.013.43b00 Publshed Onlne August 013 (http://www.scrp.org/journal/jsp) Corner-Based Image Algnment usng Pyramd Structure wth Gradent
More informationDetermining Fuzzy Sets for Quantitative Attributes in Data Mining Problems
Determnng Fuzzy Sets for Quanttatve Attrbutes n Data Mnng Problems ATTILA GYENESEI Turku Centre for Computer Scence (TUCS) Unversty of Turku, Department of Computer Scence Lemmnkäsenkatu 4A, FIN-5 Turku
More informationMOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN
MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS by XUNYU PAN (Under the Drecton of Suchendra M. Bhandarkar) ABSTRACT In modern tmes, more and more
More informationThe Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique
//00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy
More information1. Introduction. Abstract
Image Retreval Usng a Herarchy of Clusters Danela Stan & Ishwar K. Seth Intellgent Informaton Engneerng Laboratory, Department of Computer Scence & Engneerng, Oaland Unversty, Rochester, Mchgan 48309-4478
More informationPrivate Information Retrieval (PIR)
2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market
More informationPersonalized Concept-Based Clustering of Search Engine Queries
IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID 1 Personalzed Concept-Based Clusterng of Search Engne Queres Kenneth Wa-Tng Leung, Wlfred Ng, and Dk Lun Lee Abstract The exponental growth of nformaton
More informationStructured Query Suggestion for Specialization and Parallel Movement: Effect on Search Behaviors
Structured Query Suggeston for Specalzaton and Parallel Movement: Effect on Search Behavors Makoto P. Kato Tetsuya Saka Katsum Tanaka Mcrosoft Research Asa, Chna tetsuyasaka@acm.org Kyoto Unversty, Japan
More informationMeta-heuristics for Multidimensional Knapsack Problems
2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,
More informationA Deflected Grid-based Algorithm for Clustering Analysis
A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan
More informationKeyword-based Document Clustering
Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of
More informationOptimal Workload-based Weighted Wavelet Synopses
Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,
More information