Empirical Exploitation of Click Data for Query-Type-Based Ranking

Anlei Dong, Yi Chang, Shihao Ji, Ciya Liao, Xin Li, Zhaohui Zheng
Yahoo! Labs, 701 First Avenue, Sunnyvale, CA
{anlei,yichang,shihao,ciyaliao,xinli,zhaohui}@yahoo-inc.com

Abstract

In machine-learning-based ranking, each category of queries can be applied with a specific ranking function, which is called query-type-based ranking. Such a divide-and-conquer strategy can potentially provide a better ranking function for each query category. A critical problem for query-type-based ranking is training data insufficiency, which may be solved by using the data extracted from click logs. This paper empirically studies how to appropriately exploit click data to improve rank function learning in query-type-based ranking. The main contributions are 1) the exploration of the utilities of two promising approaches for click pair extraction; 2) the analysis of the role played by the noise information which inevitably appears in click data extraction; 3) the appropriate strategy for combining training data and click data; 4) the comparison of click data which are consistent and inconsistent with the baseline function.

1 Introduction

Learning-to-rank approaches (Liu, 2008) have been widely applied in commercial search engines (Brin and Page, 1998), in which ranking models are learned using labeled documents. Significant efforts have been made in an attempt to learn a generic ranking model which can appropriately rank documents for all queries. However, web users' query intentions are extremely heterogeneous, which makes it difficult for a generic ranking model to achieve the best ranking results for all queries. For this reason, ranking problems are studied for some specific categories of queries. A query category can be defined by users' query intentions, such as navigational queries, news queries, product queries, etc.

Figure 1: Framework of incorporating click-through data with training data to improve the dedicated model for query-type-based ranking.

The motivation of this divide-and-conquer strategy is that each query category has its own data distribution and discriminant rank features; thus, a dedicated ranking model should give better ranking results for this query category compared with a generic ranking model. Such a dedicated model can be trained using the labeled data belonging to this query category (which is called dedicated training data). However, the amount of training data dedicated to one query category is usually insufficient because human labeling is expensive and time-consuming, not to mention that there are multiple categories that need to be taken care of. To deal with the training data insufficiency problem for query-type-based ranking, we propose to extract click-through data and incorporate it with dedicated training data to learn a dedicated model. In order to incorporate click information to improve the ranking for a dedicated query category, it is critical to fully exploit the click log. We empirically explore the related approaches for appropriate click data exploitation in query-type-based rank function learning.

Figure 1 illustrates the procedures and critical components to be studied. 1) Click data mining: the purpose is to extract informative and reliable user preference information from the click log. We employ two promising approaches: one is the heuristic rule approach, the other is the sequential supervised learning approach. 2) Sample selection and combination: with labeled training data and unlabeled click data, how do we select and combine them so that the samples have the best utility for learning? As the data distribution within a specific query category is different from the generic data distribution, it is natural to select those labeled training samples and unlabeled click preference pairs which belong to this query category, so that the data distributions of the training set and testing set are consistent for this category. On the other hand, we should keep in mind that: a) non-dedicated data, i.e., the data that does not belong to the specific category, might also have a distribution similar to the dedicated data. Such distribution similarity makes non-dedicated data also useful for query-type-based rank function learning, especially in the scenario where dedicated training samples are insufficient. b) The quality of dedicated click data may not be as reliable as human-labeled training data. In other words, some extracted click preference pairs are inconsistent with human labeling, while we regard human labeling as correct labeling. 3) Rank function learning algorithm: we use the GBrank (Zheng et al., 2007) algorithm for rank function learning, which has proved to be one of the most effective up-to-date learning-to-rank algorithms; furthermore, the GBrank algorithm also takes preference pairs as inputs, which will be illustrated in more detail in the paper.

2 Related work

Learning to rank has been a promising research area which continuously improves web search relevance (Burges et al., 2005) (Zha et al., 2006) (Cao et al., 2007) (Freund et al., 1998) (Friedman, 2001) (Joachims, 2002) (Wang and Zhai, 2007) (Zheng et al., 2007). The ranking problem is usually formulated as learning a ranking function from preference data. The basic idea is to minimize the number of contradicted pairs in the training data, and different algorithms cast the preference learning problem from different points of view; for example, RankSVM (Joachims, 2002) uses support vector machines; RankBoost (Freund et al., 1998) applies the idea of boosting from weak learners; GBrank (Zheng et al., 2007) uses gradient boosting with decision trees; RankNet (Burges et al., 2005) uses gradient boosting with neural networks. In (Zha et al., 2006), query difference is taken into consideration for learning an effective retrieval function, which leads to a multi-task learning problem using a risk minimization framework. There are a few related works that apply multiple ranking models for different query categories. However, none of them takes click-through information into consideration. In (Kang and Kim, 2003), queries are categorized into 3 types, informational, navigational and transactional, and different models are applied on each query category. Geng et al. (Geng et al., 2008) propose a KNN method to employ different ranking models to handle different types of queries. The KNN method is unsupervised, and it targets improving the overall ranking instead of the ranking for a certain query category. In addition, the KNN method requires all feature vectors to be the same. Quite a few research papers explore how to obtain useful information from click-through data, which can benefit search relevance (Carterette et al., 2008) (Fox et al., 2005) (Radlinski and Joachims, 2007) (Wang and Zhai, 2007).
The information can be expressed as pair-wise preferences (Chapelle and Zhang, 2009) (Ji et al., 2009) (Radlinski et al., 2008), or represented as rank features (Agichtein et al., 2006). Topical ranking relies on the accuracy of query classification. Query classification or query intention identification has been extensively studied in (Beitzel et al., 2007) (Lee et al., 2005) (Li et al., 2008) (Rose and Levinson, 2004). How to combine editorial data and click data is well discussed in (Chen et al., 2008) (Zheng et al., 2007). In addition, how to use click data to improve ranking has also been explored in personalized or preference-based search (Coyle and Smyth, 2007) (Glance, 2001) (R. Jin, 2008) (Smyth et al., 2003).

3 Technical approach

This section presents the related approaches in Figure 1. In Section 4, we make a deeper analysis based on experimental results.

3.1 Click data mining

We use two approaches for click data mining, whose outputs are preference pairs. A preference pair is defined as a tuple {<x_q, y_q> | x_q ≻ y_q}, which means that for the query q, the document x_q is more relevant than y_q. We need to extract informative and reliable preference pairs which can be used to improve rank function learning.

3.1.1 Heuristic rule approach

We use heuristic rules to extract skip-above pairs and skip-next pairs, which are similar to Strategy 1 (click > skip above) and Strategy 5 (click > no-click next) proposed in (Joachims et al., 2005). To reduce the misleading effect of an individual click behavior, click information from different query sessions is aggregated before applying heuristic rules. For a tuple (q, url_1, url_2, pos_1, pos_2), where q is the query, url_1 and url_2 are urls representing two documents, and pos_1 and pos_2 are ranking positions for the two documents with pos_1 < pos_2 meaning url_1 has a higher rank than url_2, the statistics for this tuple are listed in Table 1.

Table 1: Statistics of click occurrences for the heuristic rule approach.
imp: impression, the number of occurrences of the tuple
cc: the number of occurrences of the tuple where the two documents both get clicked
ncc: the number of occurrences of the tuple where url_1 is not clicked but url_2 is clicked
cnc: the number of occurrences of the tuple where url_1 is clicked but url_2 is not clicked
ncnc: the number of occurrences of the tuple where url_1 and url_2 are both not clicked

Skip-above pair extraction: if ncc is much larger than cnc, and cc/imp and ncnc/imp are much smaller than 1, that means that when url_1 is ranked higher than url_2 for query q, most users click url_2 but do not click url_1. In this case, we extract a skip-above pair, i.e., url_2 is more relevant than url_1. In order to have highly accurate skip-above pairs, a set of thresholds is applied so that we only extract the pairs that have high impression and whose ncc is sufficiently larger than cnc.

Skip-next pair extraction: if pos_1 = pos_2 - 1, cnc is much larger than ncc, and cc/imp and ncnc/imp are much smaller than 1, that means that in most cases when url_2 is ranked just below url_1 for query q, most users click url_1 but do not click url_2. In this case, we regard this tuple as a skip-next pair.
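As a concrete illustration of these rules, the following sketch extracts skip-above and skip-next pairs from the aggregated statistics of Table 1. The counts follow Table 1, but the specific threshold values (MIN_IMP, RATIO, MAX_FRACTION) and the function name extract_pairs are assumptions made for the example; the paper only states that a set of thresholds is applied.

```python
from collections import namedtuple

# Aggregated click statistics for one (query, url_1, url_2, pos_1, pos_2) tuple,
# following Table 1: imp, cc, ncc, cnc, ncnc.
Stats = namedtuple("Stats", "query url1 url2 pos1 pos2 imp cc ncc cnc ncnc")

# Illustrative thresholds (assumptions, not the values used in the paper).
MIN_IMP = 50          # require enough impressions before trusting the counts
RATIO = 3.0           # ncc (or cnc) must dominate its counterpart by this factor
MAX_FRACTION = 0.2    # cc/imp and ncnc/imp must stay well below 1

def extract_pairs(tuples):
    """Return (skip_above, skip_next) preference pairs from aggregated click stats."""
    skip_above, skip_next = [], []
    for t in tuples:
        if t.imp < MIN_IMP:
            continue
        low_cc = t.cc / t.imp < MAX_FRACTION and t.ncnc / t.imp < MAX_FRACTION
        # Skip-above: url_1 is ranked higher, but users click url_2 and skip url_1,
        # so url_2 is preferred over url_1 (contradicts the current ranking).
        if low_cc and t.ncc > RATIO * max(t.cnc, 1):
            skip_above.append((t.query, t.url2, t.url1))
        # Skip-next: url_2 is shown just below url_1, users click url_1 and skip url_2,
        # so url_1 is preferred over url_2 (consistent with the current ranking).
        elif low_cc and t.pos1 == t.pos2 - 1 and t.cnc > RATIO * max(t.ncc, 1):
            skip_next.append((t.query, t.url1, t.url2))
    return skip_above, skip_next
```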
To test the accuracy of the preference pairs, we ask editors to judge some randomly selected pairs from the skip-above pairs and the skip-next pairs. Editors label each query-url pair using five grades according to relevance: perfect, excellent, good, fair, bad.

Table 2: Skip-above pair counts vs. human judgements (e.g., the element in the third row and second column means we have 40 skip-above pairs with excellent url_1 and perfect url_2). P: perfect; E: excellent; G: good; F: fair; B: bad.

Table 3: Skip-next pair counts vs. human judgements (e.g., the element in the third row and second column means we have 10 skip-next pairs with excellent url_1 and perfect url_2). P: perfect; E: excellent; G: good; F: fair; B: bad.

Table 2 shows the skip-above pair distribution. The diagonal elements have high values, which correspond to pairs labeled as tied by editors but determined as skip-above pairs by the heuristic rules. Higher values appear in the left-bottom triangle than in the right-top triangle, because there are more skip-above preferences that agree with editors than disagree with editors. Summing up the tied, agreed and disagreed pairs, 44% of the skip-above preference judgments agree with editors, 18% disagree with editors, and 38% of the skip-above pairs are judged as tie pairs by editors. Table 3 shows the skip-next pair distribution. Summing up the tied, agreed and disagreed pairs, 70% of the skip-next preference judgments agree with editors, 4% disagree with editors, and 26% of the skip-next pairs are judged as tie pairs by editors.

Therefore, skip-next pairs have much higher accuracy than skip-above pairs. That is because, in a search engine that already has a good ranking function, it is much easier to find a correct skip-next pair which is consistent with the search engine than a correct skip-above pair which is contradictory to the search engine. Skip-above and skip-next preferences provide two kinds of user feedback which are complementary: skip-above preferences provide the feedback that the user's vote is contradictory to the current ranking, which implies the current relative ranking should be reversed; skip-next preferences show that the user's vote is consistent with the current ranking, which implies the current relative ranking should be maintained, with high confidence provided by users' votes.

3.1.2 Sequential supervised learning

Click modeling by sequential supervised learning (SSL) was proposed in (Ji et al., 2009), in which users' sequential click information is exploited to extract relevance information from click logs. This approach is reliable because 1) the sequential click information embedded in an aggregation of user clicks provides substantial relevance information about the documents displayed in the search results, and 2) the SSL is supervised learning (i.e., human judgments provide the relevance labels for training). The SSL is formulated in the framework of global ranking (Qin et al., 2008). Let x^(q) = {x_1^(q), x_2^(q), ..., x_n^(q)} represent the documents retrieved for a query q, and y^(q) = {y_1^(q), y_2^(q), ..., y_n^(q)} represent the relevance labels assigned to the documents. Here n is the number of documents retrieved for q. Without loss of generality, we assume that n is fixed and invariant with respect to different queries. The SSL aims to find a function F of the form y^(q) = F(x^(q)) that takes all the documents as its inputs, exploits both local and global information among the documents, and predicts the relevance labels of all the documents jointly. This is distinct from most learning-to-rank methods, which optimize a ranking model defined on a single document, i.e., of the form y_i^(q) = f(x_i^(q)), i = 1, 2, ..., n. This formulation of the SSL is important for extracting relevance information from user click data, since users' click decisions among different documents displayed in a search session tend to rely not only on the relevance judgment of a single document, but also on the relative relevance comparison among the documents displayed; and the global ranking framework is well formulated to exploit both local and global information from an aggregation of user clicks. The SSL aggregates all the user sessions for the same query into a tuple <query, n-document list, an aggregation of user clicks>. Figure 2 illustrates the process of feature extraction from an aggregated session, where x^(q) = {x_1^(q), x_2^(q), ..., x_n^(q)} denotes a sequence of feature vectors extracted from the aggregated session, with x_i^(q) representing the feature vector extracted for document i. Specifically, to form feature vector x_i^(q), first a feature vector x_{i,j}^(q) is extracted from each user j's click information, j ∈ {1, 2, ...}; then x_i^(q) is formed by averaging over x_{i,j}^(q), j ∈ {1, 2, ...}, i.e., x_i^(q) is actually an aggregated feature vector for document i. Table 4 lists all the features used in the SSL modeling. Note that some features are statistics independent of the temporal information of the clicks, such as Position and Frequency, while other features rely on the surrounding documents and the click sequences. We use 90,000 query-url pairs to train the SSL model, and 10,000 query-url pairs for best model selection.
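As a small illustration of the aggregation step described above, the sketch below averages per-user click feature vectors into the aggregated vector x_i^(q) for one document. The feature names follow Table 4; the dictionary representation, the function name aggregate_features and the numeric values are assumptions made up for the example.

```python
import numpy as np

# Click features per (document, user), following Table 4 of the paper.
FEATURES = ["Position", "ClickRank", "Frequency", "FrequencyRank",
            "IsNextClicked", "IsPreClicked", "IsAboveClicked",
            "IsBelowClicked", "ClickDuration"]

def aggregate_features(per_user_vectors):
    """per_user_vectors: list over users j of feature dicts for one document i.
    Returns the aggregated feature vector x_i^(q) obtained by averaging over users."""
    matrix = np.array([[v[name] for name in FEATURES] for v in per_user_vectors],
                      dtype=float)
    return matrix.mean(axis=0)

# Example: two users' click feature vectors for the same document in the same query.
user1 = {"Position": 2, "ClickRank": 1, "Frequency": 1, "FrequencyRank": 1,
         "IsNextClicked": 0, "IsPreClicked": 1, "IsAboveClicked": 1,
         "IsBelowClicked": 0, "ClickDuration": 35.0}
user2 = {"Position": 2, "ClickRank": 2, "Frequency": 1, "FrequencyRank": 2,
         "IsNextClicked": 1, "IsPreClicked": 0, "IsAboveClicked": 1,
         "IsBelowClicked": 1, "ClickDuration": 12.0}
x_i = aggregate_features([user1, user2])  # aggregated vector for document i
```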
With the sequential click modeling discussed above, several sequential supervised algorithms, including conditional random fields (CRF) (Lafferty et al., 2001), the sliding window method and the recurrent sliding window method (Dietterich, 2002), are explored to find a global ranking function F. We omit the details and refer the reader to (Ji et al., 2009). The emphasis here is on the importance of adapting these algorithms to the ranking problem. After training, the SSL model can be used to predict the relevance labels of all the documents in a new aggregated session, and thus pair-wise preference data can be extracted, with the score difference representing the confidence of the preference prediction. For convenience, we also call the preference pairs contradicting the production ranking skip-above pairs and those consistent with the production ranking skip-next pairs, so that we can analyze these two types of preference pairs separately.
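A minimal sketch of this pair extraction step is shown below: predicted scores are turned into preference pairs whose confidence is the score difference, and each pair is labeled skip-above or skip-next depending on whether it contradicts or agrees with the production ranking. The score-gap threshold and the function name are assumptions, not the exact procedure of (Ji et al., 2009).

```python
from itertools import combinations

def pairs_from_predictions(query, scored_docs, production_rank, min_gap=0.5):
    """scored_docs: dict doc -> SSL-predicted relevance score for one aggregated session.
    production_rank: dict doc -> position in the production ranking (1 = top).
    Returns (skip_above, skip_next) lists of (query, preferred_doc, other_doc, confidence),
    where confidence is the predicted score difference."""
    skip_above, skip_next = [], []
    for a, b in combinations(scored_docs, 2):
        gap = scored_docs[a] - scored_docs[b]
        if abs(gap) < min_gap:          # drop low-confidence pairs
            continue
        preferred, other = (a, b) if gap > 0 else (b, a)
        pair = (query, preferred, other, abs(gap))
        if production_rank[preferred] < production_rank[other]:
            skip_next.append(pair)      # agrees with the production ranking
        else:
            skip_above.append(pair)     # contradicts the production ranking
    return skip_above, skip_next
```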

Figure 2: An illustration of feature extraction for an aggregated session for the SSL approach. x^(q) denotes an extracted sequence of feature vectors, and y^(q) denotes the corresponding label sequence that is assigned by human judges for training.

Table 4: Click features used in the SSL model.
Position: position of the document in the result list
ClickRank: rank of the first click of the document in the click sequence
Frequency: average number of clicks for this document
FrequencyRank: rank in the list sorted by number of clicks
IsNextClicked: 1 if the next position is clicked, 0 otherwise
IsPreClicked: 1 if the previous position is clicked, 0 otherwise
IsAboveClicked: 1 if there is a click above, 0 otherwise
IsBelowClicked: 1 if there is a click below, 0 otherwise
ClickDuration: time spent on the document

3.2 Modeling algorithm

GBrank was proposed in (Zheng et al., 2007). For each preference pair <x_i, y_i> in the available preference set S = {<x_i, y_i> | x_i ≻ y_i, i = 1, 2, ..., N}, x_i should be ranked higher than y_i. In the GBrank algorithm, the problem of learning ranking functions is to compute a ranking function h so that h matches the set of preferences, i.e., h(x_i) ≥ h(y_i) if x_i ≻ y_i, i = 1, 2, ..., N, for as many pairs as possible. The following loss function is used to measure the risk of a given ranking function h:

R(h) = (1/2) Σ_{i=1}^{N} (max{0, h(y_i) - h(x_i) + τ})²,   (1)

where τ is the margin between the two documents in the pair. To minimize the loss function, h(x_i) has to be larger than h(y_i) by the margin τ, which can be chosen as a constant value, or as dynamic values varying with pairs. When pair-wise judgments are extracted from editors' labels with different grades, the pair-wise judgments can include the grade difference, which can further be used as the margin τ. The basic idea of GBrank is that if the ordering of a preference pair by the ranking function is contradictory to this preference, we need to modify the ranking function along the direction of swapping this preference pair. Preference pairs can be generated from editors' labels with absolute relevance grades or extracted from click data. The GBrank algorithm is illustrated in Algorithm 1, and two parameters need to be determined: the shrinkage factor η and the number of iterations.

Algorithm 1 GBrank algorithm.
Start with an initial guess h_0; for k = 1, 2, ...:
1. Using h_{k-1} as the current approximation of h, separate S into two disjoint sets
   S^+ = {<x_i, y_i> ∈ S | h_{k-1}(x_i) ≥ h_{k-1}(y_i) + τ} and
   S^- = {<x_i, y_i> ∈ S | h_{k-1}(x_i) < h_{k-1}(y_i) + τ}.
2. Fit a regression function g_k(x) using gradient boosting trees (Friedman, 2001) and the following training data:
   {(x_i, h_{k-1}(y_i) + τ), (y_i, h_{k-1}(x_i) - τ) | (x_i, y_i) ∈ S^-}.
3. Form (with normalization of the range of h_k)
   h_k(x) = (k · h_{k-1}(x) + η · g_k(x)) / (k + 1),
   where η is a shrinkage factor.
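To make Algorithm 1 concrete, here is a minimal sketch under stated assumptions: h_0 is taken to be 0, the margin τ is a constant, and scikit-learn's GradientBoostingRegressor stands in for the gradient boosting tree of step 2. The class name GBRankSketch and all hyperparameter values are illustrative choices, not the implementation used in the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

class GBRankSketch:
    """Minimal sketch of GBrank (Algorithm 1) on preference pairs (x_i preferred over y_i)."""

    def __init__(self, n_iter=10, eta=0.05, tau=1.0):
        self.n_iter, self.eta, self.tau = n_iter, eta, tau
        self.trees = []

    def _h(self, X, k):
        # h_k built incrementally with h_0 = 0 and h_k = (k*h_{k-1} + eta*g_k) / (k+1).
        h = np.zeros(len(X))
        for i, g in enumerate(self.trees[:k], start=1):
            h = (i * h + self.eta * g.predict(X)) / (i + 1)
        return h

    def fit(self, X_pref, Y_pref):
        """X_pref[i] should be ranked above Y_pref[i]; both are (N, d) arrays."""
        X_pref, Y_pref = np.asarray(X_pref, float), np.asarray(Y_pref, float)
        for k in range(1, self.n_iter + 1):
            hx, hy = self._h(X_pref, k - 1), self._h(Y_pref, k - 1)
            wrong = hx < hy + self.tau            # S^-: contradicted (or within-margin) pairs
            if not wrong.any():
                break
            # Regression targets push contradicted pairs apart by the margin tau.
            X_fit = np.vstack([X_pref[wrong], Y_pref[wrong]])
            y_fit = np.concatenate([hy[wrong] + self.tau, hx[wrong] - self.tau])
            g = GradientBoostingRegressor(n_estimators=30, max_depth=3)
            g.fit(X_fit, y_fit)
            self.trees.append(g)
        return self

    def predict(self, X):
        return self._h(np.asarray(X, float), len(self.trees))
```

A call such as GBRankSketch(n_iter=20).fit(X_preferred, X_other).predict(X_test) would then score unseen documents.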
3.3 Sample selection and combination

We use a straightforward approach to learn a ranking model from the combined data, which is illustrated in Algorithm 2.

Algorithm 2 Learn a ranking model by combining editorial data and click preference pairs.
Input: editorial absolute judgment data; preference pairs from click data.
1. Extract preference pairs from the labeled data with absolute judgments.
2. Select and combine preference pairs from click data and labeled data.
3. Learn a GBrank model from the combined preference pairs.

Absolute judgments on labeled data consist of (query, url) pairs with absolute grade values labeled by humans. In Step 1, for each query with n_q query-url pairs with corresponding grades, {<query, url_i, grade_i> | i = 1, 2, ..., n_q}, its preference pairs are extracted as {<query, url_i, url_j, grade_i - grade_j> | i, j = 1, 2, ..., n_q, i ≠ j}. When combining human-labeled pairs and click preference pairs, we can use different relative weights for these two data sources. The loss function becomes

R(h) = (w / N_l) Σ_{Labeled} (max{0, h(y_i) - h(x_i) + τ})² + ((1 - w) / N_c) Σ_{Click} (max{0, h(y_i) - h(x_i) + τ})²,   (2)

where w is used to control the relative weight between labeled training data and click data, N_l is the number of training data pairs, and N_c is the number of click pairs. The margin τ can be determined as the grade difference for editorial pairs, and set as a constant parameter for click pairs.
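The pair extraction of Step 1 and the per-source weighting of equation (2) can be sketched as follows, assuming simple tuple representations for graded labels and click pairs; the function names and the way a weight is attached to each individual pair are illustrative choices, not the exact implementation.

```python
from itertools import permutations

def editorial_pairs(graded):
    """graded: list of (query, url, grade) with absolute editorial grades.
    Returns preference pairs (query, preferred_url, other_url, margin), where the
    margin is the grade difference, as in Step 1 of Algorithm 2."""
    by_query = {}
    for query, url, grade in graded:
        by_query.setdefault(query, []).append((url, grade))
    pairs = []
    for query, docs in by_query.items():
        for (u_i, g_i), (u_j, g_j) in permutations(docs, 2):
            if g_i > g_j:
                pairs.append((query, u_i, u_j, g_i - g_j))
    return pairs

def combine(editorial, click, w, tau_click=1.0):
    """Attach per-pair weights and margins following equation (2): editorial pairs
    get weight w / N_l and a grade-difference margin, click pairs get weight
    (1 - w) / N_c and a constant margin tau_click. Both lists are assumed non-empty."""
    n_l, n_c = len(editorial), len(click)
    combined = [(q, a, b, margin, w / n_l) for (q, a, b, margin) in editorial]
    combined += [(q, a, b, tau_click, (1 - w) / n_c) for (q, a, b, _conf) in click]
    return combined
```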

Step 2 is critical for the efficacy of the approach. A few factors need to be considered: 1) Data distribution: for the application of query-type-based ranking, our purpose is to improve ranking for the queries belonging to this category. An important observation is that the relevance patterns for ranking within a specific category may have some unique characteristics, which are different from generic relevance ranking. Thus, it is reasonable to consider only using dedicated labeled training data and dedicated click preference data for training. The reality is that dedicated training data is usually insufficient, while it is possible that non-dedicated data can also help the learning. 2) Click pair quality: it is inevitable that some incorrect pairs exist among the click preference pairs. Such incorrect pairs may mislead the learning. So overall, can the click preference pairs still help the learning for query-type-based ranking? By our study, skip-above pairs usually contain more incorrect pairs compared with skip-next pairs. Does this mean skip-next pairs are always more helpful in improving learning than skip-above pairs? 3) Click pair utility: using labeled training data as the baseline, how much complementary information can click pairs bring? This is determined by the methodology of the click data mining approach. While it is possible to achieve some learning improvement for query-type-based ranking by using click pairs with a plausible method, we attempt to empirically explore the above interweaving factors for a deeper understanding, in order to use the most appropriate strategy to exploit click data.

4 Experiments

4.1 Data set

Query category: in the experiments, we use long query ranking as an example of query-type-based ranking, because it is commonly known that long query ranking has some unique relevance patterns compared with generic ranking. We define long queries as the queries containing at least three tokens. The techniques and analysis proposed in this paper can be applied to other query categories, such as product queries, news queries, blog queries, etc.

Labeled training data: we do experiments based on a data set for a commercial search engine, for which there are 16,797 query-url pairs (with 1,123 different queries) that have been labeled by editors. The proportion of long queries is about 35% of all queries. The data distribution of such long queries may be different from the general data distribution, as will be validated in the experiments below. The human-labeled data is randomly split into two sets: a training set (8,831 query-url pairs, 589 queries) and a testing set (7,966 query-url pairs, 534 queries). The training set will be combined with click preference pairs for rank function learning, and the testing set will be used to evaluate the efficacy of the ranking function. In the training set, there are 3,842 long query-url pairs (229 queries). At the testing stage, the learned rank functions are applied only to the long queries in the testing data, as our concern in this paper is how to improve query-type-based ranking, i.e., long query ranking in the experiment. In the testing data, there are 3,210 long query-url pairs (193 queries), which will be used to test the rank functions.
Click preference pairs: using the two approaches, the heuristic rule approach and the sequential supervised approach, we extract click preference pairs from the click log of the search engine. Each approach yields both skip-next and skip-above pairs, which are sorted in descending order of confidence.

4.2 Setup and measurements

We try different sample selection and combination strategies to train rank functions using the GBrank algorithm. For the labeled training data, we either use generic data or dedicated data. For the click preference pairs, we also try these two options. Furthermore, as more click preference pairs may bring more useful information to help the learning, while on the other hand more incorrect pairs may be introduced that mislead the learning, we try different amounts of these preference pairs: 5,000, 10,000, 30,000, 50,000, 70,000 and 100,000 pairs. We use NDCG to evaluate the ranking model, which is defined as

NDCG_n = Z_n Σ_{i=1}^{n} (2^{r(i)} - 1) / log(i + 1),

where i is the position in the document list, r(i) is the score of document i, and Z_n is a normalization factor, which is used to make the NDCG of the ideal list be 1.
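A minimal sketch of this metric, matching the definition above: the function names are my own, the natural logarithm is used because the definition does not specify a base, and the example grade values are made up.

```python
import math

def dcg(scores, n):
    """scores: relevance scores r(i) in ranked order (position 1 first)."""
    return sum((2 ** r - 1) / math.log(i + 1) for i, r in enumerate(scores[:n], start=1))

def ndcg(scores, n):
    """NDCG_n = Z_n * DCG_n, where Z_n normalizes by the DCG of the ideal ordering."""
    ideal = dcg(sorted(scores, reverse=True), n)
    return dcg(scores, n) / ideal if ideal > 0 else 0.0

# Example: grades for five ranked documents (perfect=4 ... bad=0, an assumed mapping).
print(ndcg([3, 4, 2, 0, 1], n=5))
```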

4.3 Results

Tables 5 and 6 show the NDCG_5 results using the heuristic rule approach and the SSL approach, respectively. We do not present NDCG_1 results due to space limitations, but the NDCG_1 results show similar trends to NDCG_5.

Table 5: Use of click data by the heuristic rule approach (Data selection: N: not use; D: use dedicated data; G: use generic data. Data source: T: training data; C: click data). Relative NDCG_5 improvements over the NC+GT baseline are shown in parentheses.
(a) skip-next pairs
      NT    DT        GT
NC    n/a
DC          (1.2%)    (2.4%)
GC          (1.2%)    (1.7%)
(b) skip-above pairs
      NT    DT        GT
NC    n/a
DC          (-1.6%)   (-0.8%)
GC          (-2.0%)   (2.2%)

Table 6: Use of click data by the SSL approach (Data selection: N: not use; D: use dedicated data; G: use generic data. Data source: T: training data; C: click data). Relative NDCG_5 improvements over the NC+GT baseline are shown in parentheses.
(a) skip-next pairs
      NT    DT        GT
NC    n/a
DC          (1.5%)    (1.5%)
GC          (0.4%)    (1.2%)
(b) skip-above pairs
      NT    DT        GT
NC    n/a
DC          (-2.2%)   (-0.3%)
GC          (-1.2%)   (-0.5%)

Figure 3: Incorporating different amounts of skip-next pairs by the heuristic rule approach with generic training data (NDCG_5 vs. number of click pairs ×10^4, for dedicated train + dedicated click, generic train + dedicated click, dedicated click only, and generic click only).

Figure 4: The effects of using different combining weights (NDCG_5 vs. training sample weight, for 5,000, 10,000 and 30,000 dedicated click pairs). Skip-next pairs by the heuristic rule approach are combined with generic training data.

Baseline by training data: there are two baseline functions trained on the labeled data: 1) using only dedicated training data (DT); 2) using generic training data (GT). It is reasonable that using generic training data is better than only using dedicated training data, because the distributions of non-dedicated data and dedicated data share some similarity. As the dedicated training data is insufficient, the adoption of the extra non-dedicated data helps the learning. We compare learning results with Baseline 2) (using generic training data, the slot NC+GT in the tables), which is the higher baseline.

Baseline by click data: we then study the utility of click preference pairs by using them alone for training, without using labeled training data. In Tables 5 and 6, each of the NDCG_5 results using click preference pairs is the highest NDCG_5 value over the cases of using different amounts of pairs (5,000, 10,000, 30,000, 50,000, 70,000 and 100,000 pairs). The results with respect to the amounts of pairs are illustrated in Figure 3, which will help us to analyze the results more deeply.

If we only use click preference pairs for training (the two table slots DC+NT and GC+NT, corresponding to using dedicated click preference pairs and generic click pairs respectively), the best case is using skip-next pairs extracted by the heuristic rule approach (Table 5 (a)). It is not surprising that skip-next pairs outperform skip-above pairs, because there is a significantly lower percentage of incorrect pairs in skip-next pairs compared with skip-above pairs. It is a little surprising that the case of DC+NT has no dominant advantage over GC+NT, contrary to what we expected. For example, in Table 5 (a), the NDCG_5 values of the two cases are very close to each other. However, in Figure 3, we find that with the same amount of pairs, when we use 30,000 or fewer pairs, using dedicated click pairs alone is always better than using generic click pairs alone. With more click pairs being used (> 30,000), the noise rates become higher in the pairs, which makes the distribution factor less important.

Combining training data and click data: we compare the four table slots, DC+DT, GC+DT, DC+GT and GC+GT, in Tables 5 and 6, and there are quite a few interesting observations:

1) Skip-next vs. skip-above: overall, incorporating skip-next pairs with training data is better than incorporating skip-above pairs, because there are more incorrect pairs among skip-above pairs, which may mislead the learning. The only exception is the slot GC+GT in Table 5 (b), whose NDCG_5 improvement is as high as 2.2%. We further track this result, and find that it is the case of using only 5,000 generic skip-above pairs. The noise rate of these 5,000 pairs is low because they have the highest pair extraction confidence values. At the same time, these 5,000 pairs may provide good complementary signals to the generic training data, so that the learning result is good. However, in general, skip-next pairs have better utility than skip-above pairs.

2) Dedicated training data vs. generic training data: using generic training data is generally better than only using dedicated training data. If training data is insufficient, the extra non-dedicated data provides useful information for relevance pattern learning, and the distribution dissimilarity between dedicated data and non-dedicated data is not the most important factor.

3) Dedicated click data vs. generic click data: using dedicated click data is more effective than using generic click data. From Figure 3, we observe that when 30,000 or fewer pairs are incorporated into the training data, using dedicated click pairs is always better than using generic click pairs.

Figure 5: The effects of using different margin values τ for click preference pairs (NDCG_5 vs. τ, for 5,000, 10,000 and 30,000 dedicated click pairs). Skip-next pairs by the heuristic rule approach are incorporated with generic training data.

4) Heuristic rule approach vs. SSL approach: the preference pairs extracted by the heuristic rule approach have better utility than those extracted by the SSL approach.

5) GBrank parameters for combining training data and click pairs: the relative weight w for combining training data and click pairs in (2) may also affect rank function learning. Figure 4 shows the effects of using different combining weights, for which skip-next pairs by the heuristic rule approach are combined with generic training data. We observe that neither over-weighting the training data nor over-weighting the click pairs yields good results, while the two data sources are best exploited at certain weight values where there is a good balance between them. Another concern is the appropriate margin value τ for the click pairs in (2).
Figure 5 shows that τ = 1 consistently yields good learning results, which suggests that a click pair provides good information at τ = 1.

4.4 Discussions

We have analyzed the factors in the related approaches for the task of exploiting click data to improve query-type-based rank learning. The utility of click preference pairs depends on the following factors:

1) Data distribution: if the click pairs have good quality, we should use dedicated click pairs instead of generic click pairs, so that the samples for training have a distribution similar to the task of query-type-based ranking.

2) The amount of dedicated training data: the more dedicated training data, the more reliable the query-type-based rank function is; thus, the less room there is for learning improvement using click data. For the case in the experiment where dedicated training data is insufficient, the non-dedicated training data can also help the learning, as non-dedicated training data shares relevance pattern similarity with the dedicated data distribution.

3) The quality of click pairs: if we can extract a large amount of high-quality click pairs, the learning improvement will be significant. For example, as shown in Figure 3, at the early stage with fewer click pairs (5,000 and 10,000 pairs) being combined with training data, the learning improvement is best. As more click pairs are used, the noise rate in the click pairs becomes higher, so that the learning-misleading factor becomes more important than the information-complementarity factor. Thus, it is important to improve the reliability of the click pairs.

4) The utility of click pairs: by our study, the quality of click pairs extracted by the SSL approach is comparable to that of pairs extracted by the heuristic rule approach. The possible reason that heuristic-rule-based click pairs can bring more benefit is that these pairs provide more complementary information compared with the SSL approach. As the methodologies of these two click data extraction approaches are totally different, in the future we will explore the concrete reason that causes such a utility difference.

5 Conclusions

By empirically exploring the related factors in utilizing click-through data to improve dedicated model learning for query-type-based ranking, we have better understood the principles of using click preference pairs appropriately, which is important for real-world applications in commercial search engines, as using click data can significantly save human labeling costs and make rank function learning more efficient. In the case that dedicated training data is limited, while non-dedicated training data is helpful, using dedicated skip-next pairs is the most effective way to further improve the learning. The heuristic rule approach provides more useful click pairs compared with the sequential supervised learning approach. The quality of click pairs is critical for the efficacy of the approach. Therefore, an interesting topic is how to further reduce the inconsistency between skip-above pairs and human labeling, so that such data may also be useful for query-type-based ranking.

References

E. Agichtein, E. Brill, and S. Dumais. 2006. Improving web search ranking by incorporating user behavior information. Proc. of ACM SIGIR Conference.
S. M. Beitzel, E. C. Jensen, A. Chowdhury, and O. Frieder. 2007. Varying approaches to topical web query classification. Proceedings of ACM SIGIR Conference.
S. Brin and L. Page. 1998. The anatomy of a large-scale hypertextual web search engine. Proceedings of International Conference on World Wide Web.
C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. 2005. Learning to rank using gradient descent. Proc. of Intl. Conf. on Machine Learning.
Z. Cao, T. Qin, T. Liu, M. Tsai, and H. Li. 2007. Learning to rank: from pairwise approach to listwise. Proceedings of ICML Conference.
B. Carterette, P. N. Bennett, D. M. Chickering, and S. T. Dumais. 2008. Here or there: preference judgments for relevance. Proc. of ECIR.
O. Chapelle and Y. Zhang. 2009. A dynamic bayesian network click model for web search ranking. Proceedings of the 18th International World Wide Web Conference.
K. Chen, Y. Zhang, Z. Zheng, H. Zha, and G. Sun. 2008. Adapting ranking functions to user preference. ICDE Workshops.
M. Coyle and B. Smyth. 2007. Supporting intelligent web search. ACM Transactions on Internet Technology, 7(4).
T. G. Dietterich. 2002. Machine learning for sequential data: a review. Lecture Notes in Computer Science, (2396).
S. Fox, K. Karnawat, M. Mydland, S. Dumais, and T. White. 2005. Evaluating implicit measures to improve web search. ACM Trans. on Information Systems, 23(2).
Y. Freund, R. D. Iyer, R. E. Schapire, and Y. Singer. 1998. An efficient boosting algorithm for combining preferences. Proceedings of International Conference on Machine Learning.
J. Friedman. 2001. Greedy function approximation: a gradient boosting machine. Ann. Statist., 29.
X. Geng, T. Liu, T. Qin, A. Arnold, H. Li, and H. Shum. 2008. Query dependent ranking with k nearest neighbor. Proceedings of ACM SIGIR Conference.
N. S. Glance. 2001. Community search assistant. Intelligent User Interfaces.

S. Ji, K. Zhou, C. Liao, Z. Zheng, G. Xue, O. Chapelle, G. Sun, and H. Zha. 2009. Global ranking by exploiting user clicks. In SIGIR '09, Boston, USA, July 2009.
T. Joachims, L. Granka, B. Pan, and G. Gay. 2005. Accurately interpreting clickthrough data as implicit feedback. Proc. of ACM SIGIR Conference.
T. Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD).
I. Kang and G. Kim. 2003. Query type classification for web document retrieval. Proceedings of ACM SIGIR Conference.
J. Lafferty, A. McCallum, and F. Pereira. 2001. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In ICML.
U. Lee, Z. Liu, and J. Cho. 2005. Automatic identification of user goals in web search. Proceedings of International Conference on World Wide Web.
X. Li, Y. Y. Wang, and A. Acero. 2008. Learning query intent from regularized click graphs. Proceedings of ACM SIGIR Conference.
T. Y. Liu. 2008. Learning to rank for information retrieval. SIGIR Tutorial.
T. Qin, T. Liu, X. Zhang, D. Wang, and H. Li. 2008. Global ranking using continuous conditional random fields. In NIPS.
R. Jin, H. Valizadegan, and H. Li. 2008. Ranking refinement and its application to information retrieval. Proceedings of International Conference on World Wide Web.
F. Radlinski and T. Joachims. 2007. Active exploration for learning rankings from clickthrough data. Proc. of ACM SIGKDD Conference.
F. Radlinski, M. Kurup, and T. Joachims. 2008. How does clickthrough data reflect retrieval quality? Proceedings of ACM CIKM Conference.
D. E. Rose and D. Levinson. 2004. Understanding user goals in web search. Proceedings of International Conference on World Wide Web.
B. Smyth, E. Balfe, P. Briggs, M. Coyle, and J. Freyne. 2003. Collaborative web search. Proceedings of IJCAI Conference.
X. Wang and C. Zhai. 2007. Learn from web search logs to organize search results. In Proceedings of the 30th ACM SIGIR.
H. Zha, Z. Zheng, H. Fu, and G. Sun. 2006. Incorporating query difference for learning retrieval functions in world wide web search. Proceedings of the 15th ACM Conference on Information and Knowledge Management.
Z. Zheng, K. Chen, G. Sun, and H. Zha. 2007. A regression framework for learning ranking functions using relative relevance judgments. Proc. of ACM SIGIR Conference.


A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

The Effect of Similarity Measures on The Quality of Query Clusters

The Effect of Similarity Measures on The Quality of Query Clusters The effect of smlarty measures on the qualty of query clusters. Fu. L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Informaton Scence, 30(5) 396-407 The Effect of Smlarty Measures on The Qualty of

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals nkselector: A Web Mnng Approach to Hyperlnk Selecton for Web Portals Xao Fang and Olva R. u Sheng Department of Management Informaton Systems Unversty of Arzona, AZ 8572 {xfang,sheng}@bpa.arzona.edu Submtted

More information

Learning Semantics-Preserving Distance Metrics for Clustering Graphical Data

Learning Semantics-Preserving Distance Metrics for Clustering Graphical Data Learnng Semantcs-Preservng Dstance Metrcs for Clusterng Graphcal Data Aparna S. Varde, Elke A. Rundenstener Carolna Ruz Mohammed Manruzzaman,3 Rchard D. Ssson Jr.,3 Department of Computer Scence Center

More information

CHAPTER 2 DECOMPOSITION OF GRAPHS

CHAPTER 2 DECOMPOSITION OF GRAPHS CHAPTER DECOMPOSITION OF GRAPHS. INTRODUCTION A graph H s called a Supersubdvson of a graph G f H s obtaned from G by replacng every edge uv of G by a bpartte graph,m (m may vary for each edge by dentfyng

More information

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL)

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL) Crcut Analyss I (ENG 405) Chapter Method of Analyss Nodal(KCL) and Mesh(KVL) Nodal Analyss If nstead of focusng on the oltages of the crcut elements, one looks at the oltages at the nodes of the crcut,

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

A Statistical Model Selection Strategy Applied to Neural Networks

A Statistical Model Selection Strategy Applied to Neural Networks A Statstcal Model Selecton Strategy Appled to Neural Networks Joaquín Pzarro Elsa Guerrero Pedro L. Galndo joaqun.pzarro@uca.es elsa.guerrero@uca.es pedro.galndo@uca.es Dpto Lenguajes y Sstemas Informátcos

More information

11. HARMS How To: CSV Import

11. HARMS How To: CSV Import and Rsk System 11. How To: CSV Import Preparng the spreadsheet for CSV Import Refer to the spreadsheet template to ad algnng spreadsheet columns wth Data Felds. The spreadsheet s shown n the Appendx, an

More information

Classification of Face Images Based on Gender using Dimensionality Reduction Techniques and SVM

Classification of Face Images Based on Gender using Dimensionality Reduction Techniques and SVM Classfcaton of Face Images Based on Gender usng Dmensonalty Reducton Technques and SVM Fahm Mannan 260 266 294 School of Computer Scence McGll Unversty Abstract Ths report presents gender classfcaton based

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

Signature and Lexicon Pruning Techniques

Signature and Lexicon Pruning Techniques Sgnature and Lexcon Prunng Technques Srnvas Palla, Hansheng Le, Venu Govndaraju Centre for Unfed Bometrcs and Sensors Unversty at Buffalo {spalla2, hle, govnd}@cedar.buffalo.edu Abstract Handwrtten word

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Application of Maximum Entropy Markov Models on the Protein Secondary Structure Predictions

Application of Maximum Entropy Markov Models on the Protein Secondary Structure Predictions Applcaton of Maxmum Entropy Markov Models on the Proten Secondary Structure Predctons Yohan Km Department of Chemstry and Bochemstry Unversty of Calforna, San Dego La Jolla, CA 92093 ykm@ucsd.edu Abstract

More information

Audio Content Classification Method Research Based on Two-step Strategy

Audio Content Classification Method Research Based on Two-step Strategy (IJACSA) Internatonal Journal of Advanced Computer Scence and Applcatons, Audo Content Classfcaton Method Research Based on Two-step Strategy Sume Lang Department of Computer Scence and Technology Chongqng

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

A Compositional Exemplar-Based Model for Hair Segmentation

A Compositional Exemplar-Based Model for Hair Segmentation A Compostonal Exemplar-Based Model for Har Segmentaton Nan Wang 1, Hazhou A 1, and Shhong Lao 2 1 Computer Scence & Technology Department, Tsnghua Unversty, Bejng, Chna ahz@mal.tsnghua.edu.cn 2 Core Technology

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Behavioral Model Extraction of Search Engines Used in an Intelligent Meta Search Engine

Behavioral Model Extraction of Search Engines Used in an Intelligent Meta Search Engine Behavoral Model Extracton of Search Engnes Used n an Intellgent Meta Search Engne AVEH AVOUSI Computer Department, Azad Unversty, Garmsar Branch BEHZAD MOSHIRI Electrcal and Computer department, Faculty

More information