Exploring synonyms within large commercial site search engine queries


Julia Kiseleva, Andrey Simanovsky
HP Laboratories
HPL

Keyword(s): synonym mining, query log analysis

Abstract: We describe results of experiments on extracting synonyms from a large commercial site search engine query log. Our primary object is product search queries. The resulting dictionary of synonyms can be plugged into a search engine in order to improve search results quality. We use a product database to extend the dictionary.

External Posting Date: April 6, 2011
Internal Posting Date: April 6, 2011
Approved for External Publication
Copyright 2011 Hewlett-Packard Development Company, L.P.

Julia Kiseleva, Andrey Simanovsky
HP Labs Russia

1 Introduction

A large commercial site is an information portal where customers can find everything about the vendor's products, e.g. manuals, drivers, etc. Such a site has a search engine whose main function is to help customers retrieve appropriate information. We regard a user search query as a product query when the user's intent is to retrieve information about HP products and services, including manuals, drivers, and support.

One way to improve search quality is to utilize a dictionary of synonyms that incorporates the vocabulary of the document collection and the vocabulary of different users. We analyze incoming queries for synonym terms which could be included into a thesaurus. We attempted several techniques in order to detect synonymous terms and queries among the queries from a large commercial site search engine.

Query expansion is defined as a stage of the information retrieval process during which a user's initial query statement is extended with additional search terms in order to improve retrieval performance. Query expansion is rationalized by the fact that the initial query formulation does not always reflect the exact information need of a user. The application of thesauri to query expansion and reformulation has become an area of increasing interest.

Three types of query expansion are discussed in the literature: manual, automatic, and interactive (including semiautomatic, user-mediated, and user-assisted). These approaches use different sources of search terms and a variety of expansion techniques. The manual approach does not include any knowledge about a collection, while the interactive approach implies query modification through a feedback process. However, assistance could be sought from other sources, including a dictionary or a thesaurus. In query expansion research, one of the biggest issues is to generate appropriate keywords that represent the user's intention.

Spelling correction [1] is also related to synonym detection because its techniques are applicable, especially for product synonyms, which share a lot of common words.

The methods described above use only a query set as input data. However, there are a few published approaches which use external data sources for synonym detection to make the technique more robust.

The goal of this project is to detect synonymous terms in search queries which are submitted by users to a large commercial site search engine. We also provide recommendations for enhancing search quality on the large commercial site.

The remainder of the report is organized as follows. We review related work in Section 2. The problem is formulated in Section 3. Section 4 discusses the algorithms that were utilized; in particular, Section 4.1 presents the functions we use as measures of similarity. Section 5 describes our experimental data set and compares experimental results. Finally, Section 6 summarizes our contribution.

2 Related Work

There are a lot of papers related to synonym detection in search queries. Thesauri have been recognized as a useful source for enhancing search-term selection for query formulation and expansion [4], [5]. Terminological assistance may be provided through inclusion of thesauri and classification schemes into the IR system.

In a series of experiments on designing interfaces to the Okapi search engine it was found that both implicit and explicit use of a thesaurus during automatic and interactive query expansion were beneficial. It was also suggested that while the system could find useful thesaurus terms through an automatic query-expansion process, terms explicitly selected by users are of particular value ([4], [6]).

The paper [3] presents a new approach to query expansion. The authors propose the Related Word Extraction Algorithm (RWEA), which extracts words from texts that are supposed to be strongly related to the initial query. RWEA weights were also used in the weighting scheme of Robertson's Selection Value (RSV), a well-known method for relevance feedback [4]. Query expansion was performed based on the results of each method (RSV, RWEA, and RSV with RWEA weights) and a comparison was made. RWEA evaluates a word within a document and RSV evaluates a word among several documents; consequently, the combination should perform uniformly well. Experimental results corroborated that statement: the combined method works effectively for all queries on average. In particular, when a user inputs initial queries whose results have Average Precision (AP) under 0.6, the combined method obtains the highest Mean Average Precision (MAP). It also obtains the highest MAP among the three methods in experiments with navigational queries. However, RWEA obtains the highest MAP in experiments with informational queries. The experimental results show that the effectiveness of a query expansion method depends on the type of queries.

Many research papers about query spelling correction [1] have been published recently. We think that this area is also related to synonym detection because its techniques are applicable. For example, in [7] the authors consider a new class of similarity functions between candidate strings and reference entities. These similarity functions are more accurate than previous string-based similarity functions because they aggregate evidence from multiple documents and exploit web search engines in order to measure similarity. The authors thoroughly evaluate their techniques on real datasets and demonstrate their precision and efficiency.

In [2] the authors present a study on clustering of synonym terms in search queries. The main idea is that if users click on the same web page after submitting different search queries, those queries are synonyms.
3 Problem Statement

Our goal is to build a thesaurus of synonym terms which are related to the respective products. We also provide a set of recommendations for enhancing the quality of search results returned by the large commercial site search engine.

4 Algorithms

4.1 Similarity Distance Metrics

We perform experiments with token-based and term-based similarity metrics. We chose these metrics because their efficiency has been demonstrated in the literature [11], [13].

4.1.1 Token-based distance

Many token-based string similarity metrics are described in the literature.

Levenshtein distance (LD) is a measure of the similarity between two strings, which we will refer to as the source string (s) and the target string (t). The distance is the minimum number of deletions, insertions, or substitutions required to transform s into t. For example:

If s is "test" and t is "test", then LD(s,t) = 0, because no transformations are needed. The strings are already identical.
If s is "test" and t is "tent", then LD(s,t) = 1, because one substitution (change "s" to "n") is sufficient to transform s into t.

The greater the Levenshtein distance, the more different the strings are. Levenshtein distance is also called edit distance.

Smith-Waterman distance [11] is similar to Levenshtein distance. It was developed to identify optimal alignments between related DNA and protein sequences. It has two parameters, a function d and a gap G. The function d maps pairs of alphabet symbols to cost values for substitutions. The gap G allows costs to be attributed to insert and delete operations. The similarity score D is computed with a dynamic programming algorithm described by the equation below:

$$
D(i,j) = \max
\begin{cases}
0 & \text{// start} \\
D(i-1,j-1) - d(s_i,t_j) & \text{// subst / copy} \\
D(i-1,j) - G & \text{// insert} \\
D(i,j-1) - G & \text{// delete}
\end{cases}
$$

The final score is given by the highest valued cell. Table 1 presents an example of the score calculation.
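The two token-based measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments. The function names and defaults are hypothetical. The Smith-Waterman parameters are read as costs that are subtracted in the recurrence, with G = 1, a cost of -2 for a matching pair (so a match raises the score by 2), and a cost of +1 for a mismatch; that reading of the signs is an assumption.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of deletions, insertions, or substitutions turning s into t."""
    prev = list(range(len(t) + 1))           # distances from the empty prefix of s
    for i, sc in enumerate(s, start=1):
        curr = [i]                           # deleting the first i characters of s
        for j, tc in enumerate(t, start=1):
            cost = 0 if sc == tc else 1
            curr.append(min(prev[j] + 1,          # delete sc
                            curr[j - 1] + 1,      # insert tc
                            prev[j - 1] + cost))  # substitute or copy
        prev = curr
    return prev[-1]


def smith_waterman(s: str, t: str, gap: float = 1.0,
                   match_cost: float = -2.0, mismatch_cost: float = 1.0) -> float:
    """Highest-valued cell of the Smith-Waterman matrix D (assumed sign convention)."""
    prev = [0.0] * (len(t) + 1)
    best = 0.0
    for sc in s:
        curr = [0.0]
        for j, tc in enumerate(t, start=1):
            d = match_cost if sc == tc else mismatch_cost
            cell = max(0.0,                 # start
                       prev[j - 1] - d,     # subst / copy
                       prev[j] - gap,       # insert
                       curr[j - 1] - gap)   # delete
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(levenshtein("test", "tent"))
    print(smith_waterman("cohen", "mccohn"))
```

Both functions keep only two rows of the dynamic programming matrix, which is enough when only the final score is needed.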

        c   o   h   e   n
    m   0   0   0   0   0
    c   2   1   0   0   0
    c   2   1   0   0   0
    o   1   4   3   2   1
    h   0   3   6   5   4
    n   0   2   5   5   7

Table 1. Smith-Waterman calculation between strings "cohen" and "mccohn", where G = 1, d(c,c) = -2, d(c,d) = +1.

Smith-Waterman-Gotoh [12] is an extension of Smith-Waterman distance that allows affine gaps within the sequence. The affine gap model includes variable gap costs, typically based upon the length of the gap l ($W_l$). If two sequences, $A = a_1 a_2 a_3 \dots a_n$ and $B = b_1 b_2 b_3 \dots b_m$, are compared, the formula for the dynamic programming algorithm is:

$$
D_{ij} = \max\{\, D_{i-1,j-1} + d(a_i, b_j),\ \max_k \{ D_{i-k,j} - W_k \},\ \max_l \{ D_{i,j-l} - W_l \},\ 0 \,\},
$$

where $D_{ij}$ is in fact the maximum similarity of two segments ending in $a_i$ and $b_j$ respectively. Two affine gap costs are considered: a cost for starting a gap and a cost for continuation of a gap.

Definition: The taxicab distance, $d_1$, between two vectors p, q in an n-dimensional real vector space with a fixed Cartesian coordinate system is the sum of the lengths of the projections of the line segment between the points onto the coordinate axes:

$$
d_1(\mathbf{p}, \mathbf{q}) = \lVert \mathbf{p} - \mathbf{q} \rVert_1 = \sum_{i=1}^{n} |p_i - q_i|,
$$

where $\mathbf{p} = (p_1, p_2, \dots, p_n)$ and $\mathbf{q} = (q_1, q_2, \dots, q_n)$ are the two vectors.

The taxicab metric is also known as rectilinear distance, $L_1$ distance or $\ell_1$ norm, city block distance, Manhattan distance, or Manhattan length.

4.1.2 Term based distance

We chose the cosine similarity metric as a term-based distance. Cosine similarity is a measure of similarity between two vectors which is equal to the cosine of the angle between them. For vectors with non-negative components, such as term weights, the result is equal to 1 when the vectors point in the same direction and lies between 0 and 1 otherwise.
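To apply the taxicab distance to queries, each query has to be turned into a vector first. The sketch below assumes the simplest choice, a vector of token counts over whitespace-separated tokens, and adds a normalization to a 0..1 similarity. Both the vectorization and the normalization are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter


def manhattan_distance(query_a: str, query_b: str) -> int:
    """L1 distance between the token-count vectors of two queries."""
    a = Counter(query_a.lower().split())
    b = Counter(query_b.lower().split())
    # A Counter returns 0 for a missing token, so the union of keys is enough.
    return sum(abs(a[tok] - b[tok]) for tok in a.keys() | b.keys())


def manhattan_similarity(query_a: str, query_b: str) -> float:
    """1 minus the distance divided by the total number of tokens (assumed normalization)."""
    total = len(query_a.split()) + len(query_b.split())
    if total == 0:
        return 1.0
    return 1.0 - manhattan_distance(query_a, query_b) / total


if __name__ == "__main__":
    print(manhattan_distance("hp deskjet 960c", "deskjet 932c"))
    print(manhattan_similarity("hp deskjet 960c", "deskjet 932c"))
```

With this vectorization two queries that differ only in a model number still share most of their tokens, which is one way a high string similarity can arise without synonymy.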

The cosine of two vectors can be easily derived by using the Euclidean dot product formula:

$$
\mathbf{a} \cdot \mathbf{b} = \lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert \cos\theta
$$

$$
\text{similarity} = \cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} (a_i)^2} \, \sqrt{\sum_{i=1}^{n} (b_i)^2}}
$$

As a weighting function we used the tf*idf weight. The tf (term frequency) in the given document is the number of times a given term appears in that document, normalized by the size of the document:

$$
tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}},
$$

where $n_{i,j}$ is the number of occurrences of the considered term $t_i$ in document $d_j$, and the denominator is the sum of the number of occurrences of all terms in document $d_j$, that is, the size of the document $d_j$. The idf (inverse document frequency) is a measure of the general importance of the term:

$$
idf_i = \log \frac{|D|}{|\{d : t_i \in d\}|}
$$

We selected the tf*idf weight because it combines two aspects of a word: the importance of the word for a document and its discriminative power within the whole collection. Each query was regarded as a document in the collection. Tf is the frequency of a term in a query; since a term rarely repeats within a query, its number of occurrences is almost always equal to 1. Idf is the ordinary inverse document frequency.

4.2 Probabilistic Model

4.2.1 Source Channel Model

In paper [1] the authors apply the source channel model to the error correction task. We explore the possibility of applying it to finding synonyms. The source channel model has been widely used for spelling correction. Using the source channel model, we try to solve an equivalent problem by applying Bayes' rule and dropping the constant denominator:
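The tf*idf weighting and the cosine measure described above can be combined as in the following sketch, where every query is treated as one document of the collection. The function names are hypothetical and whitespace tokenization is an assumption; the natural logarithm is used for idf, which only rescales the weights.

```python
import math
from collections import Counter


def tfidf_vectors(queries):
    """Return one sparse tf*idf vector (dict term -> weight) per query."""
    docs = [Counter(q.lower().split()) for q in queries]
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once, so this is the document frequency.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        size = sum(doc.values())  # number of tokens in the query
        vectors.append({t: (n / size) * math.log(n_docs / df[t])
                        for t, n in doc.items()})
    return vectors


def cosine(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    vecs = tfidf_vectors(["laptop 4510 battery", "notebook 4510 battery", "deskjet driver"])
    print(cosine(vecs[0], vecs[1]))
```

A term that occurs in every query gets idf 0 and therefore contributes nothing to the similarity, which is the discriminative-power effect mentioned in the text.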

$$
c^* = \arg\max_{c \in C} P(q \mid c)\, P(c),
$$

where q is the query and c is a correction candidate. In this approach, two components of the generative model are involved: P(c) characterizes the user's intended query c, and P(q|c) models the error. The two components can be estimated independently.

The source model P(c) can be approximated with an n-gram statistical language model. In practice it is estimated from tokenized query logs for multi-term queries. Consider, for example, a bigram model. If c is a correction candidate containing n terms, $c = c_1 c_2 \dots c_n$, then P(c) can be written as a product of consecutive bigram probabilities:

$$
P(c) = \prod_{i} P(c_i \mid c_{i-1})
$$

Similarly, the error model probability of a query is decomposed into generation probabilities of individual terms, which are assumed to be independent:

$$
P(q \mid c) = \prod_{i} P(q_i \mid c_i)
$$

Now word synonymy can be assessed via correlation. There are different ways to estimate distributional similarity between two words, and the one we propose to use is confusion probability. Formally, the confusion probability $P_c$ estimates the possibility that a word $w_1$ can be replaced by another word $w_2$ [1]:

$$
P_c(w_2 \mid w_1) = \sum_{w} \frac{P(w \mid w_1)}{P(w)}\, P(w \mid w_2)\, P(w_2),
$$

where w belongs to the set of words that co-occur with both $w_1$ and $w_2$. For synonym detection we assume that $w_1$ is an initial word and $w_2$ is a synonym. The confusion probability $P_c(w_2 \mid w_1)$ models the probability of $w_1$ being rephrased as $w_2$ in query logs.

4.2.2 Utilizing database as external data container

As we mention in Section 2 (Related Work), there is a successful practice of utilizing external sources to discover synonyms. We present a novel method which makes use of a database with product names to enhance the synonym detection described in the previous section. The database provides new ways to detect synonym terms because it contains product names which are related to the queries but could be expressed in other words. Synonym terms from the database are extremely useful for detecting related products during the search process. We introduce an analog of confusion probability between words in the query and terms in the database.

Figure 1. Metrics inside search query tokens and product names database

Figure 1 shows the sets of tokens in a database (D) and in a query log (Q). $D \cap Q$ is the intersection of terms in the database and the query log; $w_i$ is a token from the intersection. $P_c(w_i \mid w)$ is the confusion probability from [1]. $\rho(w', w_i)$ depicts a similarity function within the space of database terms between the term $w'$ and the term $w_i$, which we choose to be Manhattan distance because it performed best as a token-based similarity measure. $\{w_i\}$ is the set of terms which occur in the intersection between the database and the queries (in $D \cap Q$).

We extend the notion of confusion probability to pairs $w$ and $w'$, where $w$ is a term which occurs only in queries and $w'$ is a term which occurs only in the database. We propose two ways of introducing the confusion probability extension (in both formulas $i$ indexes words of the intersection):

1. $P_C^{\max}(w, w') = \max_i \left( P_c(w_i \mid w) \cdot \rho(w', w_i) \right)$
2. $P_C^{W}(w, w') = \sum_i \rho(w', w_i) \cdot P_c(w_i \mid w) \cdot P(w_i)$
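The confusion probability between two query words can be estimated from co-occurrence counts, as in the following minimal sketch. Several choices here are assumptions rather than details from the report: two words co-occur when they appear in the same query, P(w|w1) is the share of w among all co-occurrences of w1, and P(w) is the unigram relative frequency. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_counts(queries):
    """Unigram counts and per-word co-occurrence counts (same-query co-occurrence)."""
    unigram = Counter()
    cooc = defaultdict(Counter)  # cooc[a][b]: number of queries containing both a and b
    for query in queries:
        tokens = query.lower().split()
        unigram.update(tokens)
        for a, b in permutations(set(tokens), 2):
            cooc[a][b] += 1
    return unigram, cooc


def confusion_probability(w1, w2, unigram, cooc):
    """Estimate P_c(w2 | w1) = sum_w P(w|w1) / P(w) * P(w|w2) * P(w2)."""
    total = sum(unigram.values())
    n1 = sum(cooc[w1].values())
    n2 = sum(cooc[w2].values())
    if n1 == 0 or n2 == 0:
        return 0.0
    p_w2 = unigram[w2] / total
    shared = cooc[w1].keys() & cooc[w2].keys()  # words co-occurring with both
    return sum((cooc[w1][w] / n1) / (unigram[w] / total) * (cooc[w2][w] / n2) * p_w2
               for w in shared)


if __name__ == "__main__":
    log = ["laptop battery", "notebook battery", "laptop charger",
           "notebook charger", "printer ink"]
    unigram, cooc = build_counts(log)
    print(confusion_probability("laptop", "notebook", unigram, cooc))
```

In the toy log, "laptop" and "notebook" never appear together but share all of their contexts, so they receive a high score, while words with no shared context score zero.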

Note that the natural desired property $P_C(w, w') = P_c(w' \mid w)$ if $w' \in D \cap Q$ is not automatically met by the introduced extension. Another possible approach to extending the confusion probability is to introduce $P_C(w, w')$ according to the joint distribution of $(w_i, w_j)$, where $w_i \in D \cap Q$ and $w_j \in D \cap Q$.

5 Experiments

In order to perform initial data filtering, we built basic statistics of the query log and found notable properties of the current search engine traffic of the large commercial site, which are presented in Section 5.1. Next we evaluated the metrics presented in Section 4. We present the evaluation in the subsequent sections together with sample results.

5.1 Data Description

In this section we present a description of the data and some statistics which help us to understand the nature of the data. By data nature we mean answers to the following questions:

Where have the queries come from?
What is the average length of a query?
What is the list of stop words for the query log of the large commercial site search engine?

The query log used for analysis was collected during 8 days. It contains … queries, … unique queries, and … queries which occur more than once. The average length of a query is … words. Table 2 provides a detailed query log description. The log does not contain any additional information about users except IP addresses, which do not uniquely identify users.

Field                  Example value
IP address ¹           *.*
Time                   05/Jun/2010:00:00:…
Request                GET /query.html?lang=en&search=++&qt=pavilion+6130+add+replace+expansion&la=en&cc=us&charset=utf-8 HTTP/1.1
Browser information    Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:…) Gecko/… Firefox/…
Status                 …
Status1                …
Return page            www.hp.com/

Table 2. Query log description.

The query frequency distribution in the log is presented in Figure 2.

Figure 2. Query frequency distribution.

The most frequent queries are given in Table 3. The most popular queries are non-product queries like "google". We think that those queries are the most frequent because they come from internal corporate users, probably because the commercial site page is the default start page for the company's employees.

¹ Here and further on, IP addresses are partially obfuscated because of privacy considerations.
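The search terms can be pulled out of the Request field with the Python standard library. The sketch below assumes that the `qt` parameter carries the query text, which is what the sample row suggests; the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def extract_query(request_line: str, param: str = "qt"):
    """Return the decoded search terms from a request line, or None if absent."""
    parts = request_line.split()
    if len(parts) < 2:
        return None
    # parts[1] is the request target, e.g. /query.html?lang=en&qt=...
    params = parse_qs(urlsplit(parts[1]).query)
    values = params.get(param)
    return values[0].strip() if values else None


if __name__ == "__main__":
    line = ("GET /query.html?lang=en&search=++&qt=pavilion+6130+add+replace+expansion"
            "&la=en&cc=us&charset=utf-8 HTTP/1.1")
    print(extract_query(line))
```

`parse_qs` decodes `+` as a space, so the returned value is the query as the user typed it.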

Query                                   Frequency
search:                                 1066
Google                                  610
Drivers                                 579
hp officejet j4500 series search        535
Slate                                   439
hp deskjet f2200 series search          421
Warranty                                363
hp business availability center         354
hp deskjet f4200 series search          246
Tablet                                  232
go instant                              214

Table 3. The most frequent queries in the log

There is a parameter, web section, in the request that shows which category on the site was selected by the user. From our point of view, the query distribution by topic could be useful for understanding user behavior. We built statistics by web section from the query URLs. The web section is related to the query topic. The statistics are shown in Table 4. The total number of web section queries is 527, which is 0.35% of the whole number of queries, i.e. the web section functionality is not popular with the users.

Web Section Topics                                  Frequency
small & medium business                             153
Home                                                108
compaq.com                                          70
home & home office                                  55
home & home office section only                     42
small & medium business site                        37
hp procurer networking                              27
products and services                               10
home & home office only                             9
hp promotions only                                  6
business technology optimization (bto) software     4
learn about supplies                                3
hp online store                                     2
hp services                                         1
Total                                               527 (0.35%)

Table 4. Distribution of web section queries

5.2 Data Preprocessing

5.2.1 Data Filtering

For some of the approaches that we apply, as well as to distinguish between external and internal use of the site, we need per-user data. To obtain per-user statistics we developed a technique for data filtering. We found that there were IP addresses which sent many requests to the search engine. Examples of such IP addresses, which had more than 1000 requests, are given in Table 5. We believe that most of those search queries were sent from company employees' computers through corporate proxies. The corporate IP addresses are marked in bold in Table 5. We called this set of IPs non-confidential, and they were removed from the data set.

Ip-address    Frequency
*.*           …
*.*           …
*.*           …
*.*           …
*.*           …
*.*           …
*.*           …
*.*           …
*.*           …
*.*           …
*.*           …
*.*           1200

Table 5. Top non-confidential IP addresses

We calculated statistics of requests from all IP addresses and from non-confidential IP addresses. The statistics are presented in Table 6. We conclude that at least 25% of search queries originate from inside the company.

Date      Number of confidential requests    Number of all requests    Delta
1 June    …                                  …                         …
… June    …                                  …                         …
… June    …                                  …                         …
… June    …                                  …                         …
… June    …                                  …                         …
… June    …                                  …                         …
… June    …                                  …                         …
… June    …                                  …                         …
Total     …                                  …                         …

Table 6. Daily query statistics per origin

To make our methodology more robust we built a list of stop words. It contains prepositions and the term "hp". We used this list to clean up the queries in the log.
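The two preprocessing steps, dropping heavy-traffic IP addresses and removing stop words, could look roughly like the sketch below. It simplifies the procedure described above: every IP above a request threshold is dropped, whereas the report removes the addresses identified as corporate. The stop-word list is only an illustrative stand-in for the actual list, and the function names are hypothetical.

```python
from collections import Counter

# Illustrative stand-in: the report only says the list holds prepositions and "hp".
STOP_WORDS = {"hp", "in", "on", "for", "of", "to", "with", "from", "at", "by"}


def drop_heavy_ips(records, max_requests=1000):
    """Keep (ip, query) records whose IP sent at most max_requests requests."""
    per_ip = Counter(ip for ip, _ in records)
    return [(ip, query) for ip, query in records if per_ip[ip] <= max_requests]


def remove_stop_words(query, stop_words=STOP_WORDS):
    """Lower-case the query and drop stop words."""
    return " ".join(tok for tok in query.lower().split() if tok not in stop_words)


if __name__ == "__main__":
    records = [("proxy", "google")] * 3 + [("visitor", "hp drivers for deskjet")]
    kept = drop_heavy_ips(records, max_requests=2)
    print([(ip, remove_stop_words(q)) for ip, q in kept])
```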

5.2.2 Identification of user session time

In one session a user may pursue a single information need and reformulate queries until he or she gets the desired result. Thus, analyzing user sessions in order to find synonymous queries seems reasonable. To identify user sessions, we filtered IP addresses from the log according to the algorithm described in Section 5.2.1 (Data Filtering).

Definition 1: Delta is the time in seconds between two contiguous clicks from the same IP.
Definition 2: Delta frequency is the frequency of a delta in the whole query log.

For both cases, with and without non-confidential IP addresses, we built the plots presented in Figure 3. We expect to see within a session how a user rephrases the query or expands it. We used Manhattan distance to find synonyms because it performed well in previous experiments.

Figure 3. (a) A histogram of deltas which start from 5 seconds for all IP addresses and (b) a histogram of deltas which start from 5 seconds for the set of IP addresses without non-confidential IPs.

5.3 Evaluation Metrics

We use precision as the evaluation metric for our experiments. Its formula is given below:

$$
\text{Precision} = \frac{\#\,\text{correct\_results}}{\#\,\text{total\_results}}
$$

5.4 Experiments with different token-based similarity metrics

The first approach that we considered for finding synonyms originates in the task of matching similar strings ². To characterize whether or not a candidate string is synonymous with another string, we compute the string similarity score between the candidate and the reference strings [10, 6].

² We use the SimMetrics library.
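Given the definition of delta above, sessions can be cut wherever the delta between two contiguous clicks from the same IP exceeds a threshold. The sketch below is one possible reading of that idea, together with the precision formula. The threshold of 300 seconds is a hypothetical value, since no cutoff is stated in the text, and the function names are illustrative.

```python
from collections import defaultdict


def split_sessions(events, max_delta=300):
    """Group (ip, timestamp_seconds, query) events into per-IP sessions.

    A new session starts when the delta between two contiguous clicks
    from the same IP exceeds max_delta (hypothetical threshold).
    """
    by_ip = defaultdict(list)
    for ip, ts, query in events:
        by_ip[ip].append((ts, query))

    sessions = []
    for clicks in by_ip.values():
        clicks.sort()  # order clicks of one IP by time
        current = [clicks[0][1]]
        for (prev_ts, _), (ts, query) in zip(clicks, clicks[1:]):
            if ts - prev_ts > max_delta:
                sessions.append(current)
                current = []
            current.append(query)
        sessions.append(current)
    return sessions


def precision(correct_results: int, total_results: int) -> float:
    """Share of evaluated results that were judged correct."""
    return correct_results / total_results if total_results else 0.0


if __name__ == "__main__":
    events = [("ip1", 0, "ml330"), ("ip1", 20, "ml330 g6"),
              ("ip1", 1000, "drivers"), ("ip2", 5, "warranty")]
    print(split_sessions(events))
```

Pairs of queries inside one session are then the candidates to compare with a similarity metric.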

Unfortunately, there is no gold standard for evaluating synonym discovery in query logs, so we had to build the ground truth ourselves. After performing experiments with different metrics we selected the top results and evaluated them manually. We decided not to do general pooling and evaluated the precision at 100 metric instead. We believe that the top similar pairs are more stable than the pairs similar to a given one. The results of the evaluation and the size of the gold standard are presented in Table 7.

Token-based Metric                Gold Standard Size    Precision
Levenshtein Distance              …                     …
Smith-Waterman Distance           …                     …
Smith-Waterman-Gotoh Distance     …                     …
Manhattan Distance                …                     …

Table 7. Results of experiments with the proposed token-based metrics.

Manhattan distance shows the best precision at 100. The main reason for low precision is that string similarity does not imply synonymy. E.g. the strings "hp deskjet 960c" and "deskjet 932c" are similar according to the similarity metric, but they represent different printer models, and this is not a case of synonymy.

5.4.1 Synonyms detection by using clicks on the same URL

The hypothesis suggested in [2] claims that if users click on the same search result URL, their queries should be synonyms. We explored that hypothesis on our data. Table 8 shows a few examples that were obtained:

Id    Queries from the same clicked URL
1     hp deskjet 845c; hp deskjet d…
2     hp deskjet d1360; hp deskjet 845c
3     hp laserjet 4350tn; hp laserjet …
4     hp laserjet 1102; hp laserjet 4350tn
5     hp photosmart c6380; hp photosmart a524; hp photosmart c…
6     hp psc 1300; hp psc 1315; hp psc …
7     hp psc 1315; hp psc 1300; hp psc …
8     hp photosmart a524; hp photosmart c6380; hp photosmart c…
9     hp photosmart c4240; hp photosmart c6380; hp photosmart a…
10    hp psc 2410; hp psc 1300; hp psc …
11    hp pavilion dv6500; hp pavilion dv2000; hp pavilion dv3

Table 8. Examples of synonyms through clicks on the same URL

One can see that we obtained low precision. The reason is that queries which contain different model numbers are regarded as synonyms. We expected that users would reformulate a query by replacing a term, but we found that users mostly replace a model number.

5.4.2 Synonyms detection by using user session

We performed experiments with the purpose of finding similar terms within a query session, using the methodology for detecting a user session that we described in Section 5.2.2. We evaluated 203 queries manually, and this set is our gold standard for expert evaluation. We obtained precision equal to …. A few examples of synonyms within one user session are given in Table 9. Table 9 also presents the similarity value between the queries within the session.

User IP    Query1    Query2    Similarity Value
audio sp27792 sp hpdv6-1153e drivers hp proliant ml350 g6 dv5-1153e drivers 0.5 ml330 g ml330 ml330 g hp officejet j4500 series search hp officejet j4500 series warranty registration

Table 9. Synonymous queries within a user session

5.5 Experiments with term-based metric

We inflated the weight of terms that are numbers or contain numbers. This was done in order to avoid regarding queries with different model numbers as synonyms. Candidate pairs of synonymous queries which had cosine similarity less than 0.7 were filtered out. We evaluated 150 queries and obtained a precision of 0.4. Almost all results are synonym expansions. We did not include the term "hp" and prepositions in the feature space because we consider them stop words. A few examples of synonyms found with cosine similarity are presented in Table 10. The obtained set of synonyms can be divided into two categories:

query expansion (pairs 1, 2, and 3)
query rephrasing (pair 4). In this case we can conclude that the terms "laptop" and "notebook" are synonyms.

Id    Initial Query    Query Synonym
photosmart hp laserjet 4250n 4250n 3 rx3715 ipaq rx laptop 4510 notebook

Table 10. Examples of query synonyms obtained with the cosine similarity metric

5.6 Experiments with confusion probability

In this section we apply another approach to synonym detection. This approach detects synonyms on the level of single words rather than whole queries, and it draws on the source channel model.
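The effect of inflating the weight of number-bearing terms before applying the 0.7 cosine cutoff can be illustrated with the sketch below. The boost factor of 3.0 is a hypothetical value chosen for the example (the report does not state the factor), plain counts stand in for the tf*idf weights, and the stop-word list and function names are illustrative.

```python
import math
from collections import Counter

STOP_WORDS = {"hp", "for", "of", "in", "on", "to", "with"}  # illustrative list


def weighted_vector(query, number_boost=3.0):
    """Token weights with a larger weight for tokens containing a digit."""
    vec = Counter()
    for tok in query.lower().split():
        if tok in STOP_WORDS:
            continue
        vec[tok] += number_boost if any(ch.isdigit() for ch in tok) else 1.0
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_candidate_pair(q1, q2, threshold=0.7):
    """True if the pair survives the cosine cutoff."""
    return cosine(weighted_vector(q1), weighted_vector(q2)) >= threshold


if __name__ == "__main__":
    print(is_candidate_pair("hp deskjet 960c", "deskjet 932c"))
    print(is_candidate_pair("hp laserjet 4250n", "4250n"))
```

Because the model number dominates each vector, two queries with different model numbers end up nearly orthogonal, while an expansion that keeps the same model number stays close to the original query.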

Some of the top results of the described synonym detection method are presented in Table 11. Most of the presented synonyms fall into the following categories:

paronymous terms like "face" and "facial";
misspellings like "Designerjet" and "Designjet";
different forms of the same word like "dv42160us" and "dv4-2164us".

Query term     Query term should be similar    Confusion probability
Designerjet    Designjet                       0.75
Windows        …                               …
Twain          Twin                            0.2
Michael        Micheal                         …
dv42160us      dv4-2164us                      …
Facial         Face                            …
Vitamine       Vitamin                         0.2
Ms-6390        Ms…                             …
Technisch      Farm                            …

Table 11. Synonymous terms in queries detected with confusion probability

6 Conclusion and recommendations

We discovered that all obtained synonyms can be classified into the following groups:

1. Misspellings.
2. Different forms of a word (mostly plural forms).
3. Term and digit. Terms matching the following regular expressions: Digit Space* Letter and Letter Space* Digit.
4. Query expansions.
5. Rephrasings. This is the type of synonyms which is the most interesting for us.

Table 12 contains examples of the above categories.

Category                      Initial Query                    Synonym Query
Misspelling                   1) alanta                        1) Atlanta
                              2) laser                         2) Leser
                              3) vdeo                          3) Video
                              4) Desgnerjet                    4) Designerjet
Different form of the word    Warranties                       Warranty
Term and digit                dv 8                             dv8
Query expansion               hp office locations in india     hp india
Rephrasing                    1) Remove                        1) Uninstall
                              2) Activation                    2) Product key
                              3) How to                        3) Help, not working, support
                              4) Total care                    4) Adviser
                              5) Call center                   5) service center

Table 12. Synonym categories with examples

According to the discovered groups of synonyms, we give the following recommendations:

1. Perform spelling correction at run time. We can identify and store a list of the most common misspelled terms. Appendix B demonstrates that the current search engine on the site cannot detect a misspelling: Figure 5 shows that the search engine does not correct the misspelling and returns irrelevant results. For now we cannot say that we have detected the whole list of misspellings because the current query log does not have enough data.
2. We think that storing different forms of terms will improve search quality.
3. Perform data normalization. Terms matching the regular expressions Digit Space* Letter and Letter Space* Digit should be normalized. We should normalize both incoming queries and data in the database. Appendix C contains two figures, 7 and 8, which show how search results can change depending on the way the hard drive capacity is written.
4. We need more data to detect query expansions. The search engine has a query reformulation service, but sometimes very weird suggestions are returned. One of the examples is presented in Appendix A, Figure 4. The site should have a product-oriented search engine, but the suggested queries look like the most frequent queries and are not related to products. An example can be found in Appendix A, Figures 5 and 6.
5. We present a novel technique for synonym detection in this report. We need more data to detect a strong list of rephrasing synonyms.
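The normalization recommended in item 3 can be sketched with two regular expressions that remove the whitespace in the patterns Digit Space* Letter and Letter Space* Digit. This is a literal reading of the rule, so the result is meant as a matching key applied to both queries and database entries, not as display text. The function name is hypothetical and only ASCII letters are handled.

```python
import re

# Whitespace between a digit and a following letter, e.g. "200 gb".
_DIGIT_THEN_LETTER = re.compile(r"(?<=\d)\s+(?=[a-z])")
# Whitespace between a letter and a following digit, e.g. "dv 8".
_LETTER_THEN_DIGIT = re.compile(r"(?<=[a-z])\s+(?=\d)")


def normalize_term_digit(text: str) -> str:
    """Lower-case the text and join letter/digit runs separated only by spaces."""
    text = text.lower()
    text = _DIGIT_THEN_LETTER.sub("", text)
    return _LETTER_THEN_DIGIT.sub("", text)


if __name__ == "__main__":
    print(normalize_term_digit("dv 8"))
    print(normalize_term_digit("hp elitebook 200 gb") == normalize_term_digit("hp elitebook 200gb"))
```

Since both spellings of the hard drive capacity map to the same key, a lookup on the normalized form returns the same results for either way of writing the query.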

We detected two problems with the data set:

The majority of queries come from internal corporate users and they are not product search queries. We think that this peculiarity is not inherent to the specific query log and reflects general issues with the current search functionality on the site.
Statistics of the one-week log are not enough to detect strong synonym patterns. The total number of extracted synonym pairs is in the tens. We hope that a longer log can increase that number with close to linear dependence on the log size.

7 References

1. Mu Li, Muhua Zhu, Yang Zhang, Ming Zhou. Exploring Distributional Similarity Based Models for Query Spelling Correction. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL.
2. Jeonghee Yi, Farzin Maghoul. Query clustering using click-through graph. In Proceedings of SIGIR.
3. Tetsuya Oishi, Shunsuke Kuramoto, Tsunenori Mine, Ryuzo Hasegawa, Hiroshi Fujita, Miyuki Koshimura. A Method for Query Expansion Using the Related Word Extraction Algorithm. Web Intelligence/IAT Workshops 2008.
4. Beaulieu, M. (1997). Experiments of interfaces to support query expansion. Journal of Documentation, 53(1).
5. Brajnik, G., Mizzaro, S., & Tasso, C. (1996, August). Evaluating user interfaces to information retrieval systems: A case study on user support. Proceedings of the 19th annual conference on Research and Development in Information Retrieval (ACM/SIGIR). Zurich, Switzerland.
6. Jones, S., Gatford, M., Hancock-Beaulieu, M., Robertson, S.E., Walker, W., & Secker, J. (1995). Interactive thesaurus navigation: Intelligence rules OK? Journal of the American Society for Information Science, 46(1).
7. Surajit Chaudhuri, Venkatesh Ganti, Dong Xin. Exploiting Web Search to Generate Synonyms for Entities. WWW.
8. K. Chakrabarti, S. Chaudhuri, V. Ganti, and D. Xin. An efficient filter for approximate membership checking. In SIGMOD Conference.
9. W. W. Cohen and S. Sarawagi. Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods. In KDD, pages 89-98.
10. C. H. Bennett, P. Gács, M. Li, P. M. B. Vitányi, and W. Zurek. Information distance. IEEE Trans. Inform. Theory, vol. 44, July.
11. Smith, T. F. and Waterman, M. S. Identification of common molecular subsequences. J. Mol. Biol.
12. Gotoh, O. "An Improved Algorithm for Matching Biological Sequences". Journal of Molecular Biology. 162.
13. Rishin Haldar, Debajyoti Mukhopadhyay. Levenshtein Distance Technique in Dictionary Lookup Methods: An Improved Approach. CoRR abs/ (2011).

8 Appendix

8.1 A

Figure 4. Controversial query suggestions


8.2 B

Figure 5. Misspelled query "Alanta service"
Figure 6. Search page for query "Atlanta service"

8.3 C

Figure 7. Search page for query "hp elitebook 200 gb".
Figure 8. Search page for query "hp elitebook 200gb".


More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Information Retrieval

Information Retrieval Anmol Bhasn abhasn[at]cedar.buffalo.edu Moht Devnan mdevnan[at]cse.buffalo.edu Sprng 2005 #$ "% &'" (! Informaton Retreval )" " * + %, ##$ + *--. / "#,0, #'",,,#$ ", # " /,,#,0 1"%,2 '",, Documents are

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Local Quaternary Patterns and Feature Local Quaternary Patterns

Local Quaternary Patterns and Feature Local Quaternary Patterns Local Quaternary Patterns and Feature Local Quaternary Patterns Jayu Gu and Chengjun Lu The Department of Computer Scence, New Jersey Insttute of Technology, Newark, NJ 0102, USA Abstract - Ths paper presents

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

3D vector computer graphics

3D vector computer graphics 3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment A Webpage Smlarty Measure for Web Sessons Clusterng Usng Sequence Algnment Mozhgan Azmpour-Kv School of Engneerng and Scence Sharf Unversty of Technology, Internatonal Campus Ksh Island, Iran mogan_az@ksh.sharf.edu

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

A fast algorithm for color image segmentation

A fast algorithm for color image segmentation Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database A Mult-step Strategy for Shape Smlarty Search In Kamon Image Database Paul W.H. Kwan, Kazuo Torach 2, Kesuke Kameyama 2, Junbn Gao 3, Nobuyuk Otsu 4 School of Mathematcs, Statstcs and Computer Scence,

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Shape Representation Robust to the Sketching Order Using Distance Map and Direction Histogram

Shape Representation Robust to the Sketching Order Using Distance Map and Direction Histogram Shape Representaton Robust to the Sketchng Order Usng Dstance Map and Drecton Hstogram Department of Computer Scence Yonse Unversty Kwon Yun CONTENTS Revew Topc Proposed Method System Overvew Sketch Normalzaton

More information

Summarizing Data using Bottom-k Sketches

Summarizing Data using Bottom-k Sketches Summarzng Data usng Bottom-k Sketches Edth Cohen AT&T Labs Research 8 Park Avenue Florham Park, NJ 7932, USA edth@research.att.com Ham Kaplan School of Computer Scence Tel Avv Unversty Tel Avv, Israel

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

The Effect of Similarity Measures on The Quality of Query Clusters

The Effect of Similarity Measures on The Quality of Query Clusters The effect of smlarty measures on the qualty of query clusters. Fu. L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Informaton Scence, 30(5) 396-407 The Effect of Smlarty Measures on The Qualty of

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Classic Term Weighting Technique for Mining Web Content Outliers

Classic Term Weighting Technique for Mining Web Content Outliers Internatonal Conference on Computatonal Technques and Artfcal Intellgence (ICCTAI'2012) Penang, Malaysa Classc Term Weghtng Technque for Mnng Web Content Outlers W.R. Wan Zulkfel, N. Mustapha, and A. Mustapha

More information

A Method of Hot Topic Detection in Blogs Using N-gram Model

A Method of Hot Topic Detection in Blogs Using N-gram Model 84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

Online Text Mining System based on M2VSM

Online Text Mining System based on M2VSM FR-E2-1 SCIS & ISIS 2008 Onlne Text Mnng System based on M2VSM Yasufum Takama 1, Takash Okada 1, Toru Ishbash 2 1. Tokyo Metropoltan Unversty, 2. Tokyo Metropoltan Insttute of Technology 6-6 Asahgaoka,

More information

Mining User Similarity Using Spatial-temporal Intersection

Mining User Similarity Using Spatial-temporal Intersection www.ijcsi.org 215 Mnng User Smlarty Usng Spatal-temporal Intersecton Ymn Wang 1, Rumn Hu 1, Wenhua Huang 1 and Jun Chen 1 1 Natonal Engneerng Research Center for Multmeda Software, School of Computer,

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals nkselector: A Web Mnng Approach to Hyperlnk Selecton for Web Portals Xao Fang and Olva R. u Sheng Department of Management Informaton Systems Unversty of Arzona, AZ 8572 {xfang,sheng}@bpa.arzona.edu Submtted

More information

Intrinsic Plagiarism Detection Using Character n-gram Profiles

Intrinsic Plagiarism Detection Using Character n-gram Profiles Intrnsc Plagarsm Detecton Usng Character n-gram Profles Efstathos Stamatatos Unversty of the Aegean 83200 - Karlovass, Samos, Greece stamatatos@aegean.gr Abstract: The task of ntrnsc plagarsm detecton

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity Journal of Sgnal and Informaton Processng, 013, 4, 114-119 do:10.436/jsp.013.43b00 Publshed Onlne August 013 (http://www.scrp.org/journal/jsp) Corner-Based Image Algnment usng Pyramd Structure wth Gradent

More information

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems Determnng Fuzzy Sets for Quanttatve Attrbutes n Data Mnng Problems ATTILA GYENESEI Turku Centre for Computer Scence (TUCS) Unversty of Turku, Department of Computer Scence Lemmnkäsenkatu 4A, FIN-5 Turku

More information

MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN

MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS by XUNYU PAN (Under the Drecton of Suchendra M. Bhandarkar) ABSTRACT In modern tmes, more and more

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

1. Introduction. Abstract

1. Introduction. Abstract Image Retreval Usng a Herarchy of Clusters Danela Stan & Ishwar K. Seth Intellgent Informaton Engneerng Laboratory, Department of Computer Scence & Engneerng, Oaland Unversty, Rochester, Mchgan 48309-4478

More information






