LinkSelector: A Web Mining Approach to Hyperlink Selection for Web Portals


Xiao Fang and Olivia R. Liu Sheng
Department of Management Information Systems, University of Arizona, AZ 85721
{xfang,sheng}@bpa.arizona.edu

Submitted to ACM Transactions on Internet Technology. Please do not cite or quote without permissions from the authors.

Abstract

As the size and complexity of websites expand dramatically, it has become increasingly challenging to design websites on which web surfers can easily find the information they seek. In this paper, we address the design of the portal page of a website, which serves as the homepage of a website or a default web portal. We define a new and important research problem, hyperlink selection: selecting from a large set of hyperlinks in a given website a limited number of hyperlinks for inclusion in a portal page. The objective of hyperlink selection is to maximize the efficiency, effectiveness and usage of a web site's portal page. We propose a heuristic approach to hyperlink selection, LinkSelector, which is based on relationships among hyperlinks: structural relationships that can be extracted from an existing website and access relationships that can be discovered from a web log. LinkSelector calculates preferences of hyperlinks and preferences of hyperlink sets from these relationships. Using these preferences, we develop and incorporate a clustering algorithm into LinkSelector to extract a limited number of hyperlinks from a large set of hyperlinks. We compared the performance of LinkSelector with that of the current practice of hyperlink selection (i.e., manual hyperlink selection by domain experts) and with data mining methods (classical hierarchical clustering and association rule mining), using data obtained from the University of Arizona website. Results showed that LinkSelector outperformed all of these.

Specifically, the three major contributions of this paper are:
We have introduced and formally defined a new and important research problem: hyperlink selection.
We have proposed and shown that a web mining based hyperlink selection approach, named LinkSelector, outperformed other hyperlink selection approaches.
We have developed a new clustering algorithm and applied it to hyperlink selection. Applications of the algorithm are not limited to hyperlink selection.

1. INTRODUCTION

As the size and complexity of websites expand dramatically, it has become increasingly challenging to design websites on which web surfers can easily find the information they seek. To address this challenge, we introduce a new research problem in this area, hyperlink selection, and present a web mining based approach, LinkSelector, as a solution.

There are two dominant ways through which web surfers find the information they seek [Chakrabarti 2000]: using search engines and clicking on hyperlinks. Research on the former is concerned with improving recall and precision [Lawrence and Giles 1998; 1999; Chakrabarti 2000] of search engines. Our research, however, concentrates on improving the efficiency of the second way of web information searching. As web surfers click on a group of hyperlinks to find the information they seek, placing appropriate hyperlinks in web pages is critical to improving their web information searching efficiency. In particular, this paper focuses on placing appropriate hyperlinks in the portal page of a website, which is the entrance to a website.

The homepage of a website is one type of portal page. Homepages which guide users to locate the information they seek easily create a good first impression and attract more users, while homepages which make information searching difficult result in a bad first impression and corresponding user loss [Nielsen and Wagner 1996]. A default web portal is another type of portal page. Recently, web portals that serve as personal entrances to websites have attracted more and more attention. Universities such as UCLA have built educational web portals (e.g., My UCLA), and corporations such as Yahoo! have developed commercial web portals (e.g., My Yahoo!). For practical purposes, portal service providers (e.g., Yahoo!) provide portal users with a standard default web portal, which the users can personalize (e.g., add or remove hyperlinks from the default web portal). As the first version of a web portal encountered by portal users, the default web portal plays an important role in the

success of a web portal. Moreover, according to My Yahoo!, most users never customize their default web portals [Manber et al. 2000]. This finding makes the default web portal even more critical.

A portal page consists of hyperlinks selected from a hyperlink pool, which is a set of hyperlinks pointing to top-level web pages. Usually, the hyperlink pool of a website consists of hyperlinks listed in the site-index page or the site-directory page. As shown in Fig. 1, hyperlinks in the portal page of the University of Arizona website are selected from its hyperlink pool. The hyperlink pool consists of hyperlinks in its site-index page ( index/webindex.shtml). Hyperlinks in the portal page of My Yahoo! are also selected from its hyperlink pool. The pool, in this case, consists of hyperlinks in its site-directory page.

Web pages in a website are organized in a hierarchy in which a high level web page is an aggregation of its low level web pages [Nielsen 1999]. For example, the web page of the faculty list is one level higher than its corresponding faculty homepages and is an aggregation of them. For a university website, top-level web pages include the web page of the department list, the web page of computing resources, etc.

Fig. 1. (a) The hyperlink pool (top) and the portal page (bottom) of the website of the University of Arizona; (b) the hyperlink pool (top) and the portal page (bottom) of My Yahoo!. In both panels, hyperlink selection leads from the pool to the portal page.

Given the web design principle that scrolling must be avoided in portal pages [Nielsen 1999], a well-designed portal page normally contains several dozen (i.e., usually less than 4 dozen) hyperlinks [footnote 2]. However, the hyperlink pool of a typical website has at least several hundred hyperlinks. For example, the portal page of the University of Arizona website consists of 32 hyperlinks while the hyperlink pool has 743 hyperlinks. It is computationally too expensive to exhaust all combinations of several dozen hyperlinks from a hyperlink pool with several hundred hyperlinks and find the one that is the most efficient in guiding web surfers to find the information they seek. In this particular example, the number of combinations of selecting 32 hyperlinks from 743 hyperlinks is 1.44e+56 (i.e., $C_{743}^{32}$).

Footnote 2: Placing too many hyperlinks in a portal page will cause some hyperlinks to be visible only when scrolling down the window of the page. Unfortunately, according to Nielsen's research [Nielsen 1999], web surfers rarely scroll down the window of a portal page.

Current practice of hyperlink selection relies on domain experts' (e.g., website designers') experiences. Obviously, such selection is subjective. In addition, it reflects only website designers' perspectives on what hyperlinks should be selected, not web surfers' perspectives. The second perspective should be emphasized, as the purpose of hyperlink selection is to reduce web surfers' information searching efforts, not web designers'. In comparison, our hyperlink selection method, LinkSelector, incorporates both patterns extracted from the structure of a website and those discovered from a web log, which records web surfers' behaviors of information searching. LinkSelector first employs web mining techniques [Kosala and Blockeel 2000] to extract the above-mentioned patterns and calculates preferences of hyperlinks and hyperlink sets, defined in this paper, from these patterns. LinkSelector then selects hyperlinks from a given hyperlink pool by running the calculated preferences through a clustering algorithm developed in this paper.

There are three major contributions of this paper. We have proposed and formally defined a new and important research problem: hyperlink selection.

We have proposed and shown that a web mining based hyperlink selection approach outperformed other hyperlink selection approaches. We have developed a new clustering algorithm and applied it to hyperlink selection. Applications of this algorithm are not limited to hyperlink selection. Currently, we are adapting this algorithm to clustering large itemsets discovered via association rule mining.

The rest of the paper is organized as follows. We review related work in section 2. In section 3, we propose metrics to measure the quality of a portal page and formally define the hyperlink selection problem. A web mining based approach for hyperlink selection, LinkSelector, is presented in section 4. In section 5, we evaluate the performance of LinkSelector using data obtained from the University of Arizona website and the metrics proposed in section 3. We conclude the paper in section 6.

2. RELATED WORK

In this section, we review works on web mining on which LinkSelector is based. In [Cooley et al. 1997], web mining is defined as the process of discovering and analyzing useful information from the Web. A good survey on web mining research can be found in [Kosala and Blockeel 2000]. Srivastava et al. classified web data into content, structure and usage [Srivastava et al. 2000]:
Content is the data in web pages. It usually consists of texts and graphics.
Structure is the data describing the organization of the Web, such as hyperlinks.
Usage is the data that describe web surfers' information searching behaviors. Web usage data can be found in web logs.

For different types of web data, corresponding web mining methods are developed. Web content mining is the process of automatically retrieving, filtering and categorizing web documents, a

good survey on which can be found in [Chakrabarti 2000]. As it typically makes use of only texts on web pages, valuable information implicitly contained in hyperlinks is overlooked. As a complement to web content mining, web structure mining [Kleinberg 1998; Brin and Page 1998; Chakrabarti et al. 1999] infers useful patterns from the Web's link topology to help retrieve high quality documents from the Web. Web usage mining [Srivastava et al. 2000] is the process of applying data mining techniques to discover web access patterns from a web log. Due to its greater relevance to this research, we review works on web usage mining in more detail.

The data used for web usage mining are web logs. A web log is a collection of data that explicitly records web surfers' behaviors of information searching in a website. Fig. 2 shows a sample web log collected by a web server at the University of Arizona. Useful attributes for web usage mining in a web log include IP address, time and URL, which explicitly describe who at what time accessed which web page. Additional attributes include the status of an HTTP request and the count of bytes returned by a web server.

Fig. 2. A sample web log collected by a web server at the University of Arizona. Columns: IP Address [footnote 3], Time, Method/URL/Protocol, Status, Size. The sample shows four requests logged at 05:38 on a day in September 2001: "GET /working/index.shtml HTTP/1.0", "GET /working/images/heademploy.gif HTTP/1.0", "GET /working/images/work.jpg HTTP/1.0" and "GET /working/images/staffquicklinks.gif HTTP/1.0".

Footnote 3: To protect the privacy of web users, IP addresses in this table are artificial IP addresses.

Projects of web usage mining are classified into two groups: general-purpose projects and specific-purpose projects [Srivastava et al. 2000]. General-purpose projects, such as [Chen et al. 1996; Cooley et al. 1999], focused on web usage mining in general. Cooley et al. proposed an architecture and specific steps for web usage mining and presented a method to identify

potentially interesting patterns from mining results (e.g., patterns in which unlinked web pages are visited together frequently) [Cooley et al. 1999]. Chen et al. explored a new data mining capability to mine path traversal patterns from web logs [Chen et al. 1996].

Specific-purpose projects focused on applications of web usage mining. Web usage mining can be used to improve the organization of websites. The adaptive website project [Perkowitz and Etzioni 2000] used web visiting patterns learned from web logs to automatically improve the organization and presentation of websites. Spiliopoulou and Pohle exploited web usage mining to measure and improve the success of websites [Spiliopoulou and Pohle 2001]. Lee and Podlaseck presented an interactive visualization system that provides users with abilities to actively interpret and explore web log data of online stores to evaluate the effectiveness of web merchandising [Lee and Podlaseck 2001]. Web usage mining can also be used to personalize users' web surfing experience. In [Yan et al. 1996], clusters of visitors who exhibited similar information needs (e.g., visitors who accessed similar web pages) were discovered via web usage mining. These clusters could be used to classify new visitors and dynamically suggest hyperlinks for them. Mobasher et al. [Mobasher 2001] presented techniques to learn user preferences from web usage data using data mining techniques, such as association rule mining. Based on the learned preferences, dynamic hyperlinks could be recommended for active visiting sessions. Anderson et al. developed MINPATH, an algorithm that automatically suggests useful shortcut links in real time to improve wireless web navigation [Anderson et al. 2001]. A complete survey of web usage mining research up to the year 2000 can be found in [Srivastava et al. 2000].

Research employing only web usage mining, such as the work described in [Chen et al. 1996], extracted web surfers' web visiting patterns from a web log. However, these studies did not consider the information contained in the structure of a website, which is an important complement to web visiting patterns. In [Cooley 1999; Perkowitz and Etzioni 2000], the structure of a website was used to filter out uninteresting web visiting patterns (i.e., patterns in which directly linked web pages are visited together frequently). However, these excluded uninteresting

web visiting patterns provide valuable information for hyperlink selection. We will discuss this in section 4.1. The hyperlink selection approach proposed in this paper is based on both web usage mining and web structure mining and considers both interesting and uninteresting web visiting patterns.

3. PROBLEM DEFINITION: HYPERLINK SELECTION

To define the hyperlink selection problem, we propose three metrics to measure the quality of a portal page: effectiveness, efficiency and usage. All three are calculated from web logs. A web log can be broken down into sessions, with each session representing a sequence of consecutive web accesses by the same visitor. For the convenience of readers, important notations used in this article are summarized in Table I.

Table I. Notation Summary

Notation | Description
$w$ | a website
$wl$ | a web log of $w$
$s$ | the number of sessions in $wl$
$S_i$ | a session of $wl$, for $i = 1, 2, \ldots, s$
$SH(S_i)$ | the set of hyperlinks clicked in $S_i$, for $i = 1, 2, \ldots, s$
$HP$ | the hyperlink pool of $w$
$UH(S_i)$ | $UH(S_i) = SH(S_i) \cap HP$; web pages pointed to by hyperlinks in $UH(S_i)$ are user-sought top-level web pages in $S_i$
$l$ | the number of hyperlinks in $w$
$L_i$ | a hyperlink in $w$, for $i = 1, 2, \ldots, l$
$P_i$ | the set of hyperlinks in the web page pointed to by $L_i$
$PH$ | the set of hyperlinks in the portal page of $w$
$EH$ | $EH = \bigcup_{L_i \in PH} P_i$, the set of hyperlinks that are contained in web pages directly pointed to by the portal page of $w$
$H$ | $H = PH \cup EH$; web pages pointed to by hyperlinks in $H$ can be easily found from the portal page of $w$
$N$ | the number of hyperlinks to be placed in the portal page of $w$
$W$ | the set of web pages in $w$
$k$-HS | a set of $k$ hyperlinks $L_i$, where $L_i \in HP$
$\sigma(k\text{-HS})$ | the support of a $k$-HS
$SR$ | the set of structure relationships between hyperlinks in $HP$
$AR$ | the set of access relationships among hyperlinks in $HP$
$PRE_i$ | the preference of a hyperlink $L_i$, where $L_i \in HP$
$PHP$ | a set of pairs, in which each pair consists of a hyperlink pair with group II relationship and its preference
$PHS$ | a set of pairs, in which each pair consists of a hyperlink set with group II relationship and its preference
$C_i$ | a hyperlink cluster
$sim_{i,j}$ | the similarity between hyperlink clusters $C_i$ and $C_j$
$SIM$ | the similarity matrix

The effectiveness of a portal page is measured as the degree of easiness to find user-sought top-level web pages [footnote 4] (Definition 1) from the portal page, as hyperlinks in a portal page are selected from a pool of hyperlinks pointing to top-level web pages. We denote $w$ as a website, $wl$ as a web log of $w$, $s$ as the number of sessions in $wl$, $S_i$ as a session in $wl$, for $i = 1, 2, \ldots, s$, and $SH(S_i)$ as the set of hyperlinks clicked in $S_i$. We denote the hyperlink pool of $w$ as $HP$.

Footnote 4: We believe that a levelwise approach is appropriate for the design of a website. In this approach, the portal page is designed to find user-sought top-level web pages easily. Top-level web pages then are designed to locate user-sought web pages one level below easily. As the hyperlink selection approach proposed in this paper can be applied to every subsequent level, websites designed in this approach will be easy to navigate to find user-sought information. In this paper, we concentrate on designing the portal page to facilitate the search of top-level web pages.

Definition 1: For a session $S_i$, web pages pointed to by hyperlinks in $UH(S_i)$ are user-sought top-level web pages in $S_i$, where

$$UH(S_i) = SH(S_i) \cap HP, \quad i = 1, 2, \ldots, s \quad (1)$$

Usually, web pages that are 1-2 clicks away [footnote 5] from a portal page can be easily found from the portal page. We denote $l$ as the number of hyperlinks in $w$, $L_i$ as a hyperlink in $w$, for $i = 1, 2, \ldots, l$, $P_i$ as the set of hyperlinks in the web page pointed to by $L_i$, and $PH$ as the set of hyperlinks in the portal page of $w$.

Footnote 5: Web pages that are 1 click away from a portal page refer to web pages that are directly pointed to by hyperlinks in the portal page. Web pages that are 2 clicks away from a portal page are web pages that are pointed to by hyperlinks in web pages 1 click away.

Definition 2: $EH$ is the set of hyperlinks, where

$$EH = \bigcup_{L_i \in PH} P_i \quad (2)$$

$EH$ consists of hyperlinks that are contained in web pages directly pointed to by the portal page of $w$. Web pages pointed to by hyperlinks in $H$ are 1-2 clicks away from the portal page of $w$ and can be easily found from the portal page of $w$, where

$$H = PH \cup EH \quad (3)$$

Example 1: As shown in Fig. 3, the portal page of a web site contains hyperlinks $L_1$, $L_2$ and $L_3$. Web pages pointed to by hyperlinks $L_1$, $L_2$ and $L_3$ contain hyperlinks $L_2$ and $L_4$; hyperlink $L_3$; and hyperlinks $L_5$ and $L_8$ respectively.

Fig. 3. PH and EH (the portal page, and the web pages pointed to by $L_1$, $L_2$ and $L_3$)

In this example, $PH = \{L_1, L_2, L_3\}$, $P_1 = \{L_2, L_4\}$, $P_2 = \{L_3\}$ and $P_3 = \{L_5, L_8\}$. According to (2), $EH = \bigcup_{L_i \in PH} P_i = (P_1 \cup P_2 \cup P_3) = \{L_2, L_3, L_4, L_5, L_8\}$; according to (3), $H = PH \cup EH = \{L_1, L_2, L_3, L_4, L_5, L_8\}$.

The effectiveness of a portal page can be measured in terms of the recall rate of the portal page at two different levels: session level and web log level. For a session $S_i$, the more hyperlinks in $UH(S_i)$ found in $H$, the more user-sought top-level web pages are easily found from the portal page of $w$; hence, the higher the effectiveness of the portal page of $w$.
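Equations (2) and (3) are plain set unions, so they can be checked mechanically. The following is a minimal sketch, assuming hyperlinks are represented by their integer subscripts and that the per-page link sets are available as a dictionary; the function name `easily_found` is hypothetical and the data mirror the portal page of Example 1.

```python
def easily_found(PH, P):
    """Return (EH, H) for a portal page.

    PH: set of hyperlink ids placed in the portal page.
    P:  dict mapping a hyperlink id to the set of hyperlink ids contained
        in the web page it points to (P_i in the text).
    """
    EH = set()
    for link in PH:
        EH |= P.get(link, set())   # Eq. (2): union of P_i over L_i in PH
    H = PH | EH                    # Eq. (3)
    return EH, H


# Illustrative data following Example 1 (integer ids stand for L_1, L_2, ...).
PH = {1, 2, 3}
P = {1: {2, 4}, 2: {3}, 3: {5, 8}}
EH, H = easily_found(PH, P)
print(sorted(EH))  # [2, 3, 4, 5, 8]
print(sorted(H))   # [1, 2, 3, 4, 5, 8]
```

Using sets makes duplicates (such as $L_2$ and $L_3$, which appear both in the portal page and in linked pages) collapse automatically, which is what the definitions require.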

Definition 3: For a session $S_i$, the session level effectiveness of the portal page of $w$ is defined as:

$$effectiveness(S_i) = \frac{|UH(S_i) \cap H|}{|UH(S_i)|} \quad (4)$$

where $i = 1, 2, \ldots, s$ and $|X|$ denotes the cardinality of a set $X$.

Example 2: In a session $S$, web pages pointed to by hyperlinks $L_1$, $L_{10}$, $L_{11}$, $L_{12}$, $L_{13}$, $L_{14}$, $L_5$, $L_9$, $L_7$ and $L_2$ were accessed. Hyperlinks $L_1$, $L_2$, $L_5$ and $L_7$ are also elements of the hyperlink pool $HP$. Hence, $UH(S)$ is $\{L_1, L_2, L_5, L_7\}$, which points to user-sought top-level web pages. For the portal page given in Example 1, its session level effectiveness is

$$effectiveness(S) = \frac{|UH(S) \cap H|}{|UH(S)|} = \frac{|\{L_1, L_2, L_5\}|}{|\{L_1, L_2, L_5, L_7\}|} = \frac{3}{4} = 0.75$$

The result states that 75% of the user-sought top-level web pages in session $S$ can be easily found from the portal page.

Definition 4: For a web log $wl$, the log level effectiveness of the portal page of $w$ is defined as:

$$effectiveness(wl) = \frac{\sum_{i=1}^{s} effectiveness(S_i)}{s} \quad (5)$$

Given the limited number of hyperlinks that can be placed in a portal page, it is desirable to have more user-sought top-level web pages easily found from the portal page (i.e., more hyperlinks in $UH(S_i) \cap H$) with fewer hyperlinks placed in the portal page (i.e., fewer hyperlinks in $PH$).

Definition 5: For a session $S_i$, the session level efficiency of the portal page of $w$ is defined as:

$$efficiency(S_i) = \frac{|UH(S_i) \cap H|}{|PH|} \quad (6)$$

where $i = 1, 2, \ldots, s$ and $|X|$ denotes the cardinality of a set $X$.

Example 3: For the portal page given in Example 1 and the session $S$ given in Example 2, the session level efficiency of the portal page is

$$efficiency(S) = \frac{|UH(S) \cap H|}{|PH|} = \frac{|\{L_1, L_2, L_5\}|}{|\{L_1, L_2, L_3\}|} = \frac{3}{3} = 1$$

Definition 6: For a web log $wl$, the log level efficiency of the portal page of $w$ is defined as:

$$efficiency(wl) = \frac{\sum_{i=1}^{s} efficiency(S_i)}{s} \quad (7)$$

Usage of a portal page measures how often a portal page is visited. As the portal page constructed by LinkSelector has not been used by web surfers, we measure its usage by counting the number of user-sought top-level web pages that can be easily found from the portal page (i.e., the number of hyperlinks in $UH(S_i) \cap H$). It is a proper approximation because ease of finding these web pages from the portal page will attract users to visit the portal page. We define usage measured at the web log level as below.

Definition 7: For a web log $wl$, the usage of the portal page of $w$ is defined as:

$$usage(wl) = \sum_{i=1}^{s} |UH(S_i) \cap H| \quad (8)$$

where $|X|$ denotes the cardinality of a set $X$.
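The three metrics in Definitions 3 through 7 can be computed directly from per-session click sets. The sketch below is illustrative only: function names are hypothetical, hyperlinks are integer ids, the first session follows Example 2, and the second session is invented. One assumption goes beyond the text: a session with an empty $UH(S_i)$, for which Eq. (4) is undefined, is given effectiveness 0.

```python
def session_effectiveness(UH, H):
    """Eq. (4): fraction of user-sought hyperlinks that are easily found."""
    # Assumption (not in the paper): sessions with empty UH score 0.
    return len(UH & H) / len(UH) if UH else 0.0


def session_efficiency(UH, H, PH):
    """Eq. (6): easily found user-sought hyperlinks per portal hyperlink."""
    return len(UH & H) / len(PH)


def log_metrics(session_links, HP, PH, H):
    """Return (effectiveness, efficiency, usage) at web log level."""
    UHs = [SH & HP for SH in session_links]                              # Eq. (1)
    s = len(UHs)
    effectiveness = sum(session_effectiveness(UH, H) for UH in UHs) / s  # Eq. (5)
    efficiency = sum(session_efficiency(UH, H, PH) for UH in UHs) / s    # Eq. (7)
    usage = sum(len(UH & H) for UH in UHs)                               # Eq. (8)
    return effectiveness, efficiency, usage


# Illustrative data: pool, portal page and H as in Examples 1 and 2.
HP = {1, 2, 3, 4, 5, 6, 7, 8}
PH = {1, 2, 3}
H = {1, 2, 3, 4, 5, 8}
sessions = [
    {1, 10, 11, 12, 13, 14, 5, 9, 7, 2},  # session of Example 2
    {3, 8, 20},                           # hypothetical second session
]
print(log_metrics(sessions, HP, PH, H))
```

For the first session alone the functions give 0.75 and 1.0, matching Examples 2 and 3; the log-level values simply average the per-session results, and usage sums the intersection sizes.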

Example 4: Given a web log $wl$ with 5 sessions (i.e., $S_1$, $S_2$, $S_3$, $S_4$ and $S_5$) and $H$ of a portal page, we found that $|UH(S_1) \cap H| = 1$, $|UH(S_2) \cap H| = 2$, $|UH(S_3) \cap H| = 0$, $|UH(S_4) \cap H| = 2$ and $|UH(S_5) \cap H| = 3$. According to (8), the usage of the portal page is

$$usage(wl) = \sum_{i=1}^{5} |UH(S_i) \cap H| = 1 + 2 + 0 + 2 + 3 = 8$$

Based on the three metrics presented above, we formally define the hyperlink selection problem as below.

Definition 8: Given a website $w$, its hyperlink pool $HP$ and the number of hyperlinks to be placed in the portal page of $w$, $N$, where $N < |HP|$, the hyperlink selection problem is to construct the portal page by selecting $N$ hyperlinks from the hyperlink pool $HP$ to maximize the effectiveness, efficiency and usage of the resulting portal page w.r.t. a web log $wl$ of $w$ (i.e., all metrics are measured at the web log level).

4. THE LINKSELECTOR APPROACH

As discussed in section 1, it is computationally too expensive to find the optimal solution for the hyperlink selection problem. In this section, we present a heuristic solution method named LinkSelector. Compared with the current practice of hyperlink selection based on domain experts' experience, our method incorporates patterns extracted from the structure of a website and those extracted from a web log, which records web surfers' behaviors of information searching in the website. Hence, LinkSelector is more objective and reflects web surfers' perspectives on which hyperlinks should be selected, while current practice of hyperlink selection is subjective and reflects only domain experts' perspectives.
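The size of the search space that rules out exhaustive search can be checked with a short calculation. This sketch uses only the pool and portal sizes quoted in section 1 for the University of Arizona website (743 and 32); the variable names are illustrative.

```python
import math

pool_size = 743      # hyperlinks in the hyperlink pool
portal_size = 32     # hyperlinks placed in the portal page

# Number of distinct portal pages an exhaustive search would have to score.
candidates = math.comb(pool_size, portal_size)
print(f"{candidates:.2e}")  # 1.44e+56
```

Each candidate would additionally require a pass over the web log to evaluate the three metrics, which is why a heuristic is used instead.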

For a web site $w$, the input of LinkSelector includes $wl$, a web log of $w$; $W$, the set of web pages in $w$; $HP$, the hyperlink pool of $w$; and $N$, the number of hyperlinks to be placed in the portal page of $w$. The output of LinkSelector is the set of $N$ hyperlinks selected from the hyperlink pool $HP$. We introduce LinkSelector in section 4.1. A detailed description of LinkSelector is given step by step in section 4.2 through section 4.6. We discuss the time complexity of LinkSelector in section 4.7.

4.1 Overview of LinkSelector

Hyperlinks in a hyperlink pool may be directly connected to each other (i.e., one hyperlink is contained in a web page pointed to by another hyperlink) or accessed together in a session. Accordingly, we categorize relationships among hyperlinks in a hyperlink pool into two types: structure relationship and access relationship.

Definition 9: For any pair of hyperlinks $L_i$ and $L_j$, where $L_i \in HP$, $L_j \in HP$ and $i \neq j$, hyperlink $L_i$ has a structure relationship with hyperlink $L_j$, denoted as $L_i \rightarrow L_j$, if and only if $L_j \in P_i$. $L_i$ is the initial hyperlink and $L_j$ is the terminal hyperlink in this structure relationship.

Example 5: As shown in Fig. 4, web page 1 contains hyperlinks $L_1$ and $L_3$; web page 2, which is pointed to by hyperlink $L_1$, contains hyperlinks $L_2$, $L_4$, $L_6$ and $L_8$; and web page 3, which is pointed to by hyperlink $L_3$, contains hyperlinks $L_5$ and $L_7$. All hyperlinks are elements of the hyperlink pool $HP$.

Fig. 4. Structure relationship (web page 1, web page 2 and web page 3)

In this example, $P_1 = \{L_2, L_4, L_6, L_8\}$ and $P_3 = \{L_5, L_7\}$. According to Definition 9, structure relationship $L_1 \rightarrow L_2$ exists, in which $L_1$ is the initial hyperlink and $L_2$ is the terminal hyperlink. Similarly, structure relationships $L_1 \rightarrow L_4$, $L_1 \rightarrow L_6$, $L_1 \rightarrow L_8$, $L_3 \rightarrow L_5$ and $L_3 \rightarrow L_7$ also hold.

For a structure relationship, putting the initial hyperlink in a portal page makes the initial hyperlink an element of $PH$ and the terminal hyperlink an element of $EH$ (i.e., according to Definition 2). If a hyperlink acts as the initial hyperlink in $M$ structure relationships, placing it in a portal page makes it an element of $PH$ and the $M$ terminal hyperlinks in all the structure relationships elements of $EH$. As discussed in section 3, web pages pointed to by hyperlinks in $PH$ or $EH$ can be easily found from a portal page. Therefore, the more structure relationships a hyperlink participates in as the initial hyperlink, the more top-level web pages can be easily found from a portal page if the hyperlink is placed in the portal page. In Example 5, $L_1$ participates in four structure relationships (i.e., $L_1 \rightarrow L_2$, $L_1 \rightarrow L_4$, $L_1 \rightarrow L_6$ and $L_1 \rightarrow L_8$) as the initial hyperlink. Hence, placing $L_1$ in a portal page enables five top-level web pages (i.e., pages pointed to by $L_1$, $L_2$, $L_4$, $L_6$ and $L_8$) to be easily found from the portal page. $L_3$ participates in two structure relationships (i.e., $L_3 \rightarrow L_5$ and $L_3 \rightarrow L_7$) as the initial hyperlink. Hence, placing

$L_3$ in a portal page enables three top-level web pages (i.e., pages pointed to by $L_3$, $L_5$ and $L_7$) to be easily found from the portal page.

A hyperlink set $k$-HS is a set of $k$ hyperlinks $L_i$, where $L_i \in HP$. A $k$-HS is accessed in a session $S_i$ if and only if $k\text{-HS} \subseteq SH(S_i)$. For a web log $wl$, the support of a $k$-HS (denoted as $\sigma(k\text{-HS})$) is the ratio of the number of sessions in which the $k$-HS is accessed over the total number of sessions in $wl$.

Definition 10: For a $k$-HS, where $k \geq 2$, there exists an access relationship among elements in the $k$-HS if and only if its support $\sigma(k\text{-HS})$ is greater than a pre-defined threshold. $\sigma(k\text{-HS})$ is called the support of the access relationship.

Example 6: For a web log with 100 sessions, a 3-HS $\{L_1, L_2, L_3\}$ is accessed in 20 sessions. Hence, the support of the 3-HS, $\sigma(\{L_1, L_2, L_3\})$, is 0.2. If the threshold is set at 0.15, then there exists an access relationship among hyperlinks $L_1$, $L_2$ and $L_3$ and the support of the access relationship is 0.2.

In practice, structure relationships are discovered by parsing an existing website while access relationships are extracted from a web log using data mining algorithms, such as association rule mining [Agrawal et al. 1993]. For hyperlinks in a hyperlink pool, their pairwise relationships can be categorized into four groups as shown in Fig. 5.

Access Relationship | Structure Relationship: YES | Structure Relationship: NO
YES | I | II
NO | III | IV

Fig. 5. Pairwise relationships for hyperlinks in a hyperlink pool
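The support computation behind Definition 10 can be sketched in a few lines. This is an illustration rather than the mining procedure used in the paper: it scores one given hyperlink set by scanning all sessions, the helper names are hypothetical, and the 100 sessions are synthetic data built to match the counts of Example 6.

```python
def support(hyperlink_set, sessions):
    """Fraction of sessions whose clicked hyperlinks contain the whole set."""
    hs = set(hyperlink_set)
    hits = sum(1 for clicked in sessions if hs <= clicked)  # k-HS subset of SH(S_i)
    return hits / len(sessions)


def has_access_relationship(hyperlink_set, sessions, threshold):
    """Definition 10: at least two hyperlinks and support above the threshold."""
    return len(set(hyperlink_set)) >= 2 and support(hyperlink_set, sessions) > threshold


# Synthetic log: 20 sessions click L1, L2, L3 (and L9); 80 sessions do not.
sessions = [{1, 2, 3, 9}] * 20 + [{1, 4}] * 80
print(support({1, 2, 3}, sessions))                        # 0.2
print(has_access_relationship({1, 2, 3}, sessions, 0.15))  # True
```

Scoring every candidate set this way would be too slow for a large pool; frequent-itemset algorithms avoid that by pruning sets whose subsets are already below the threshold.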

Group I relationship indicates that both a structure relationship and an access relationship hold between two hyperlinks. As we have discussed, hyperlinks participating in more structure relationships as initial hyperlinks will be selected for the portal page over other hyperlinks with respect to increasing the number of top-level web pages easily found from the portal page. For a structure relationship $L_i \rightarrow L_j$, the support of the access relationship between $L_i$ and its terminal hyperlink $L_j$, $\sigma(\{L_i, L_j\})$, reflects the quality of the structure relationship in hyperlink selection. The higher the support $\sigma(\{L_i, L_j\})$, the higher the possibility that top-level web pages pointed to by $L_i$ and $L_j$ will be accessed together in future visits [footnote 6]. In this regard, group I relationship, which was regarded as uninteresting in previous research [Perkowitz and Etzioni 2000; Cooley 1999], provides us with two indicators of hyperlink preference in hyperlink selection: the number of structure relationships a hyperlink participates in as the initial hyperlink, and the qualities of these structure relationships measured as supports of access relationships between the hyperlink and its terminal hyperlinks. We describe how to calculate the preference of a hyperlink using these two indicators in section 4.4.

Footnote 6: This is based on an assumption that web visiting patterns are coherent in past and future visits [Perkowitz and Etzioni 2000].

Group II relationship indicates that an access relationship but not a structure relationship exists between two hyperlinks. As hyperlinks with group II relationship are structurally unlinked, in order to navigate from the web page pointed to by one hyperlink to the web page pointed to by the other, web visitors have to explore the website to find the path. This creates inconvenience for web surfing, and the situation becomes worse as these two hyperlinks are accessed together frequently (i.e., there is an access relationship between the hyperlinks). In contrast, if these two hyperlinks were placed together in a portal page, users could navigate between these two hyperlinks without searching the website for a path, as they could easily be found from the portal page. Hence, hyperlink pairs with group II relationship are preferred to hyperlink pairs without it in hyperlink selection. For a hyperlink pair with group II

relationship, the support of the access relationship between its hyperlinks is the determining factor for its preference over other hyperlink pairs with group II relationship in hyperlink selection. We describe how to calculate the preference of a hyperlink pair for group II relationship in section 4.5.

Group III relationship indicates that a structure relationship but not an access relationship exists between two hyperlinks. This relationship reveals that the web page pointed to by the initial hyperlink in a structure relationship contains a rarely visited hyperlink, which is the terminal hyperlink in the structure relationship. As hyperlink selection focuses on choosing hyperlinks for a portal page, we do not discuss group III relationship in this paper.

Group IV relationship does not reveal any patterns between hyperlinks; thus, it is not considered in this research.

LinkSelector employs group I and group II relationships to calculate preferences of hyperlinks and preferences of hyperlink pairs respectively. Based on the preferences calculated, a clustering algorithm is developed to extract $N$ hyperlinks from the given hyperlink pool. We outline LinkSelector in Fig. 6. Steps in this algorithm are described in section 4.2 through section 4.6.

Input:
    wl: a web log of a website w
    W: the set of web pages in a website w
    HP: the hyperlink pool of a website w
    N: the number of hyperlinks to be placed in the portal page of a website w
Output:
    PH: the set of hyperlinks in the portal page of a website w

Discover structure relationships;
Discover access relationships;
Calculate preferences of hyperlinks based on group I relationship;
Calculate preferences of hyperlink pairs based on group II relationship;
Cluster hyperlinks.

Fig. 6. The sketch of LinkSelector
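The four groups of Fig. 5 amount to a two-way lookup on a hyperlink pair. The sketch below is a hedged illustration: it assumes structure relationships are stored as ordered (initial, terminal) tuples and access relationships as unordered pairs, treats a structure relationship in either direction as a structural link between the pair, and uses invented data. The name `classify_pair` is hypothetical.

```python
def classify_pair(a, b, SR, AR_pairs):
    """Return the Fig. 5 group ("I".."IV") of the hyperlink pair (a, b).

    SR:       set of (initial, terminal) tuples, i.e. L_a -> L_b.
    AR_pairs: set of frozensets {a, b} that have an access relationship.
    """
    # Assumption: a link in either direction counts as a structure relationship.
    structure = (a, b) in SR or (b, a) in SR
    access = frozenset((a, b)) in AR_pairs
    if structure and access:
        return "I"
    if access:
        return "II"
    if structure:
        return "III"
    return "IV"


# Invented data for illustration only.
SR = {(1, 2), (1, 3), (3, 6)}
AR_pairs = {frozenset((1, 2)), frozenset((2, 5))}
for pair in [(1, 2), (2, 5), (3, 6), (4, 7)]:
    print(pair, classify_pair(*pair, SR, AR_pairs))
```

Only pairs that come out as group I or group II feed the later preference calculations; groups III and IV are set aside, as stated above.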

4.2 Discover Structure Relationships

LinkSelector discovers structure relationships between hyperlinks in a hyperlink pool by parsing the web pages the hyperlinks point to. As shown in Fig. 7, the web page pointed to by each hyperlink $L_i$ in $HP$ is retrieved and parsed. A structure relationship $L_i \rightarrow L_j$ is added to the set of structure relationships $SR$ if $L_j$ appears in the web page $L_i$ points to, $L_i$ and $L_j$ are different, and $L_j$ is an element of $HP$.

Input:
    HP: the hyperlink pool of a website w
    W: the set of web pages in a website w
Output:
    SR: the set of structure relationships between hyperlinks in HP

SR = ∅;
For each hyperlink L_i, where L_i ∈ HP
    retrieve the web page pointed to by L_i;
    parse the web page pointed to by L_i;
    For every hyperlink L_j, where L_j ∈ P_i
        If (L_j ∈ HP) and (L_j ≠ L_i)
            add the structure relationship L_i → L_j to SR;
        End if
    End for
End for

Fig. 7. An algorithm to discover structure relationships between hyperlinks in HP

4.3 Discover Access Relationships

4.3.1 Web Log Preprocessing

Access relationships among hyperlinks can be extracted from a web log. A raw web log collected from a web server needs to be preprocessed before meaningful access relationships can be

extracted [Cooley et al. 1999]. In this research, we divide the preprocessing task into two steps: web log cleaning and session identification.

In web log cleaning, two types of web log records are removed. First, web log records with the value of the status attribute greater than 400 are deleted because they record failed web accesses. Second, web log records recording accessory requests to a web page request, such as picture requests, are also removed. As a raw web log records every file request sent to a web server, one web access could result in several web log records. For example, an access to a web page with two pictures will result in three web log records, one for the web page and the other two for the pictures. Web log records recording accesses to web pages are sufficient to describe web surfers' information searching behaviors.

The basic processing unit for extracting access relationships is a session. A web log needs to be divided into sessions before the extraction of access relationships. However, in the HTTP protocol, as connections between web clients and a web server are stateless, there is no notion of session in a web log. One method of dividing a web log into sessions is based on timeout. If the time between page requests from the same user exceeds a certain limit, it is assumed that the user has started another session [Cooley et al. 1999]. Some commercial tools use 30 minutes as a default timeout. Catledge and Pitkow calculated a timeout of 25.5 minutes based on empirical data [Catledge and Pitkow 1995]. Another method is to modify a web server to encode session identifiers in web pages transferred between clients and a web server [Yan et al. 1996]. As the second method requires modification of a web server, it is not convenient or practical for many websites. We adopt the timeout method to identify sessions. Web log records are first sorted by client IP address, then by access time. We use a 30-minute timeout to divide web log records generated by the same IP address into sessions.
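The two preprocessing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: log records are taken to be already parsed into (ip, time, url, status) tuples, accessory requests are recognised by a hypothetical list of file suffixes, and the IP addresses, dates and URLs in the sample are invented.

```python
from datetime import datetime, timedelta

# Assumed suffix list for accessory (non-page) requests; adjust per site.
ACCESSORY_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")
TIMEOUT = timedelta(minutes=30)


def clean(records):
    """Drop failed requests (status > 400, as in the text) and accessory requests."""
    return [r for r in records
            if r[3] <= 400 and not r[2].lower().endswith(ACCESSORY_SUFFIXES)]


def sessionize(records, timeout=TIMEOUT):
    """Sort by (IP, time); open a new session on a new IP or a gap above timeout."""
    sessions = []
    prev_ip, prev_time = None, None
    for ip, time, url, _status in sorted(records, key=lambda r: (r[0], r[1])):
        if ip != prev_ip or time - prev_time > timeout:
            sessions.append([])
        sessions[-1].append(url)
        prev_ip, prev_time = ip, time
    return sessions


# Invented sample records: (ip, time, url, status).
raw = [
    ("10.0.0.1", datetime(2001, 9, 10, 5, 38, 0), "/working/index.shtml", 200),
    ("10.0.0.1", datetime(2001, 9, 10, 5, 38, 1), "/working/images/work.jpg", 200),
    ("10.0.0.1", datetime(2001, 9, 10, 5, 50, 0), "/students/index.shtml", 200),
    ("10.0.0.1", datetime(2001, 9, 10, 6, 30, 0), "/library/index.shtml", 200),
    ("10.0.0.2", datetime(2001, 9, 10, 5, 40, 0), "/missing.shtml", 404),
    ("10.0.0.2", datetime(2001, 9, 10, 5, 41, 0), "/working/index.shtml", 200),
]
for session in sessionize(clean(raw)):
    print(session)
```

In the sample, the 40-minute gap before the last request from the first address starts a new session, while the picture request and the failed request are removed during cleaning. Identifying users by IP address alone is the simplification the text adopts; proxies and shared machines can merge distinct visitors.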

4.3.2 Mine Access Relationships From The Preprocessed Web Log

Once a raw web log has been cleaned and divided into sessions, association rule mining [Agrawal et al. 1993] can be applied to extract access relationships from it. Association rule mining is defined on a set of items $I = \{i_1, i_2, \ldots, i_k\}$. Let D be a set of transactions, where each transaction T is a set of items such that $T \subseteq I$. The support of an itemset (i.e., a set of items) in D is the fraction of all transactions containing the itemset. An itemset is called large if its support is greater than a user-specified support threshold. The most important step in association rule mining is to find large itemsets and their supports. Various algorithms for finding large itemsets, such as Apriori [Agrawal and Srikant 1994], are in use.

In the case of mining access relationships, sessions correspond to transactions, hyperlinks correspond to items and hyperlink sets correspond to itemsets. Applying the Apriori algorithm on the preprocessed web log, all hyperlink sets that have access relationships among their elements can be found and their corresponding supports can be calculated. The Apriori algorithm can be found in [Agrawal and Srikant 1994] and is skipped in this paper. The output of mining access relationships is a set of pairs denoted as AR, in which each pair contains a hyperlink set and its corresponding support.

4.4 Calculate Preferences of Hyperlinks

As discussed in section 4.1, in hyperlink selection, the preference of a hyperlink is determined by two factors: the number of structure relationships it participates in as the initial hyperlink, and the qualities of these structure relationships, measured as supports of access relationships between it and its terminal hyperlinks. Based on these two factors, we define the preference of a hyperlink as below.

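Definition 1: For a hyperlink $l_i$, where $l_i \in HP$ and $l_i$ participates in M structure relationships as the initial hyperlink, $(l_i \rightarrow l_{i_m}) \in SR$ for $m = 1, 2, \ldots, M$, the preference of the hyperlink is defined as:

$$PRE_i = \sum_{m=1}^{M} \left( \sigma(\{l_i, l_{i_m}\}) \cdot coeff_m \right) \qquad (9)$$

where

$$coeff_m = \begin{cases} 1 & \text{if } (\{l_i, l_{i_m}\}, \sigma(\{l_i, l_{i_m}\})) \in AR \\ 0 & \text{otherwise} \end{cases}$$

Example 7: Given a hyperlink pool $HP = \{l_1, l_2, l_3, l_4, l_5, l_6, l_7\}$, structure relationships SR and access relationships AR,

$$SR = \{ l_1 \rightarrow l_2,\ l_1 \rightarrow l_3,\ l_1 \rightarrow l_4,\ l_2 \rightarrow l_5,\ l_3 \rightarrow l_4,\ l_3 \rightarrow l_6,\ l_4 \rightarrow l_5,\ l_5 \rightarrow l_6,\ l_5 \rightarrow l_7,\ l_6 \rightarrow l_3,\ l_7 \rightarrow l_3,\ l_7 \rightarrow l_4 \}$$

$$AR = \{ (\{l_1, l_2\}, 0.022), (\{l_1, l_3\}, 0.08), (\{l_1, l_4\}, 0.0), (\{l_1, l_5\}, 0.002), (\{l_1, l_6\}, 0.04), (\{l_1, l_7\}, 0.04), (\{l_2, l_5\}, 0.007), (\{l_2, l_6\}, 0.006), (\{l_2, l_7\}, 0.008), (\{l_3, l_6\}, 0.008), (\{l_3, l_7\}, 0.02), (\{l_4, l_5\}, 0.0), (\{l_5, l_6\}, 0.03), (\{l_6, l_7\}, 0.005), (\{l_1, l_6, l_7\}, 0.003) \}$$

Hyperlink $l_1$ participates in three structure relationships as the initial hyperlink. According to (9),

$$PRE_1 = \sum_{m=1}^{3} \left( \sigma(\{l_1, l_{1_m}\}) \cdot coeff_m \right) = \sigma(\{l_1, l_2\}) \cdot 1 + \sigma(\{l_1, l_3\}) \cdot 1 + \sigma(\{l_1, l_4\}) \cdot 1 = 0.05$$

Similarly, we get $PRE_2 = 0.007$, $PRE_3 = 0.008$, $PRE_4 = 0.0$, $PRE_5 = 0.03$, $PRE_6 = \sigma(\{l_6, l_3\})$ and $PRE_7 = \sigma(\{l_7, l_3\})$.

In this example, hyperlinks with the highest and the second highest preference value represent two types of important hyperlinks: hub hyperlink and hot hyperlink. A hub hyperlink, such as hyperlink $l_1$, is a hyperlink that has structure relationships with many other hyperlinks. A hot hyperlink, such as hyperlink $l_5$, is the starting point of a popularly visited path.

The algorithm to compute preferences of hyperlinks is trivial and skipped in this paper.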
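Since the algorithm is skipped in the paper, the following is one possible sketch of Definition 1 in Python, not the authors' code. The data layout is an assumption: SR is a set of `(initial, terminal)` tuples and AR is a dictionary from `frozenset` hyperlink sets to supports. The supports in the toy data are made-up placeholders, not the values of Example 7.

```python
def preferences(hp, sr, ar):
    """PRE_i = sum of supports of {l_i, l_im} over the structure
    relationships l_i -> l_im in which l_i is the initial hyperlink.
    A pair missing from AR contributes 0 (coeff_m = 0)."""
    pre = {link: 0.0 for link in hp}
    for initial, terminal in sr:
        pre[initial] += ar.get(frozenset((initial, terminal)), 0.0)
    return pre


# Toy data (placeholder supports, hypothetical hyperlink names).
toy_hp = ["a", "b", "c"]
toy_sr = {("a", "b"), ("a", "c"), ("b", "c")}
toy_ar = {
    frozenset(("a", "b")): 0.02,   # a -> b is also accessed together
    frozenset(("b", "c")): 0.01,
    # {a, c} has no access relationship, so a -> c adds nothing
}
toy_pre = preferences(toy_hp, toy_sr, toy_ar)
```

With this layout, a hyperlink scores highly either because it is the initial hyperlink of many structure relationships (a hub) or because a few of its structure relationships have high support (a hot hyperlink).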

4.5 Calculate Preferences of Hyperlink Pairs

As discussed in section 4.1, hyperlink pairs with group II relationship are preferred to hyperlink pairs without it in hyperlink selection. For a hyperlink pair with group II relationship, the support of the access relationship between its hyperlinks is the determining factor for its preference over other hyperlink pairs with group II relationship in hyperlink selection. We set preferences of hyperlink pairs without group II relationship to be 0 and those of hyperlink pairs with group II relationship to be the supports of the access relationships between their hyperlinks.

Both hyperlink pairs with group II relationship and their preferences can be extracted from access relationships AR. For a 2-HS in AR, if there is no structure relationship between its elements, then the 2-HS is a hyperlink pair with group II relationship and its support $\sigma(\text{2-HS})$ is the preference of the hyperlink pair. We denote PHP as a set of pairs, in which each pair consists of a hyperlink pair with group II relationship and the preference of the hyperlink pair.

Example 8: For a 2-HS $\{l_1, l_5\}$ in access relationships AR given in Example 7, since there is no structure relationship between hyperlinks $l_1$ and $l_5$, $\{l_1, l_5\}$ and its support become an element of PHP shown below. Preferences of hyperlink pairs outside PHP are set to be 0.

$$PHP = \{ (\{l_1, l_5\}, 0.002), (\{l_1, l_6\}, 0.04), (\{l_1, l_7\}, 0.04), (\{l_2, l_6\}, 0.006), (\{l_2, l_7\}, 0.008), (\{l_6, l_7\}, 0.005) \}$$

Discussions on hyperlink pairs with group II relationship can be extended to include hyperlink sets with more than 2 elements.

Definition 2: A hyperlink set k-HS, where $k \geq 2$, is a hyperlink set with group II relationship if and only if there exists an access relationship among its hyperlinks, and there is no structure relationship between any pair of its hyperlinks.
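Definition 2 amounts to a filter over AR, which can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' code: the function name `group_two_sets` and the data layout are introduced here, the direction of structure relationships is ignored when testing a pair (as Example 8 implies), and every support is replaced by the placeholder value 0.01 because only the membership of the result is being illustrated.

```python
from itertools import combinations


def group_two_sets(sr, ar):
    """Keep the hyperlink sets in AR (size >= 2) in which no pair of
    hyperlinks is connected by a structure relationship."""
    linked = {frozenset(pair) for pair in sr}   # direction is ignored
    return {
        hs: support
        for hs, support in ar.items()
        if len(hs) >= 2
        and not any(frozenset(p) in linked for p in combinations(hs, 2))
    }


# Structure relationships and hyperlink sets of Example 7, with hyperlink
# l_n written as the integer n. Supports are placeholders (0.01 each).
sr_ex7 = {(1, 2), (1, 3), (1, 4), (2, 5), (3, 4), (3, 6),
          (4, 5), (5, 6), (5, 7), (6, 3), (7, 3), (7, 4)}
ar_sets_ex7 = [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (2, 5),
               (2, 6), (2, 7), (3, 6), (3, 7), (4, 5), (5, 6), (6, 7),
               (1, 6, 7)]
ar_ex7 = {frozenset(s): 0.01 for s in ar_sets_ex7}
group_two_ex7 = group_two_sets(sr_ex7, ar_ex7)
```

Run on the sets of Example 7, the filter keeps the six pairs listed in PHP above plus the three-element set $\{l_1, l_6, l_7\}$, whose three pairs are all free of structure relationships.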

Similarly, we set preferences of hyperlink sets without group II relationship to be 0, and the preference of a hyperlink set with group II relationship is set to be the support of the access relationship among its hyperlinks. We denote PHS as a set of pairs, in which each pair consists of a hyperlink set with group II relationship and the preference of the hyperlink set.

Example 9: PHS extracted from access relationships AR in Example 7 is,

$$PHS = \{ (\{l_1, l_5\}, 0.002), (\{l_1, l_6\}, 0.04), (\{l_1, l_7\}, 0.04), (\{l_2, l_6\}, 0.006), (\{l_2, l_7\}, 0.008), (\{l_6, l_7\}, 0.005), (\{l_1, l_6, l_7\}, 0.003) \}$$

According to Definition 2, hyperlink set $\{l_1, l_6, l_7\}$ is added to PHS. Preferences of hyperlink sets outside PHS are set to be 0.

The algorithm to extract PHS from access relationships AR is trivial and skipped in this paper.

4.6 Cluster Hyperlinks

In this section, a clustering algorithm is developed to extract N hyperlinks from the given hyperlink pool based on preferences of individual hyperlinks (i.e., $PRE_i$) and those of hyperlink sets (i.e., PHS). Clustering is a well researched area. In [Jain and Dubes 1988; Jain et al. 1999], clustering algorithms are divided into two groups: hierarchical approaches and partitional approaches. As partitional approaches need to specify the number of output clusters, they are not suitable for hyperlink clustering. The hyperlink clustering we propose is a hierarchical approach. However, classical hierarchical clustering algorithms [Jain and Dubes 1988; Jain et al. 1999], such as single-link and complete-link algorithms, are not suitable for hyperlink clustering because of the following limitations:

(1) Classical hierarchical clustering algorithms are based on a similarity matrix whose indexes are objects to be clustered and whose elements are pairwise similarities between these objects. In the case of hyperlink clustering, hyperlinks correspond to objects and preferences of hyperlink pairs correspond to similarities between objects. In this classical framework, two additional attributes essential to hyperlink clustering have not been considered: (a) weights of objects, that is, preferences of hyperlinks in hyperlink clustering; (b) similarities among three or more objects, that is, preferences of hyperlink sets with more than two hyperlinks in hyperlink clustering.

(2) Classical hierarchical clustering algorithms pick either the maximum of the similarities between all pairs of objects drawn from a pair of clusters (one element from each cluster) (i.e., single-link algorithm) or the minimum of the similarities between all pairs of objects drawn from a pair of clusters (i.e., complete-link algorithm) as the similarity between the two clusters. However, both minimum and maximum methods could bias the actual similarity between clusters.

To include preferences of hyperlink sets with more than two hyperlinks, our hyperlink clustering algorithm is based on a similarity matrix whose indexes are hyperlink clusters and whose elements are similarities between hyperlink clusters. Initially, each hyperlink in the given hyperlink pool is placed in a unique hyperlink cluster. As hyperlink clusters are merged, the similarity matrix is updated.

To address limitation (1a), we define the similarity between two hyperlink clusters $C_i = \{l_{i_1}, l_{i_2}, \ldots, l_{i_P}\}$ and $C_j = \{l_{j_1}, l_{j_2}, \ldots, l_{j_Q}\}$ to include two components: preferences of individual hyperlinks in $C_i$ and $C_j$, and preferences of hyperlink sets whose elements are selected from $C_i$ and $C_j$. When calculating preferences of individual hyperlinks in $C_i$ and $C_j$, we use
average preference of hyperlinks in $C_i$ and $C_j$, $\frac{\sum_{p=1}^{P} PRE_{i_p} + \sum_{q=1}^{Q} PRE_{j_q}}{P + Q}$, instead of sum of preferences of hyperlinks in $C_i$ and $C_j$, $\sum_{p=1}^{P} PRE_{i_p} + \sum_{q=1}^{Q} PRE_{j_q}$, to avoid the possibility that the number of hyperlinks in clusters could impact the similarity between them.

To address limitation (1b), we consider preferences of hyperlink sets k-HSs (i.e., $k \geq 2$, u elements from $C_i$ and $(k-u)$ elements from $C_j$, where $u \geq 1$ and $(k-u) \geq 1$), when calculating the similarity between hyperlink clusters $C_i$ and $C_j$. These hyperlink sets can be grouped into several categories based on the value of k. For each category, we use the average preference of hyperlink sets, $\mathrm{average}(\sigma(k\text{-HS}))$, instead of the minimum or the maximum preference of hyperlink sets (i.e., limitation 2).

Definition 3: For any two hyperlink clusters $C_i = \{l_{i_1}, l_{i_2}, \ldots, l_{i_P}\}$ and $C_j = \{l_{j_1}, l_{j_2}, \ldots, l_{j_Q}\}$, where $P \geq 1$ and $Q \geq 1$,

$$sim_{C_i, C_j} = \frac{\sum_{p=1}^{P} PRE_{i_p} + \sum_{q=1}^{Q} PRE_{j_q}}{P + Q} + \sum_{k=2}^{P+Q} \mathrm{average}(\sigma(k\text{-HS})) \qquad (10)$$

where k-HS is a hyperlink set with u elements selected from $C_i$ and $(k-u)$ elements selected from $C_j$, $k \geq 2$, $u \geq 1$ and $(k-u) \geq 1$; and $(k\text{-HS}, \sigma(k\text{-HS})) \in PHS$.

Example 10: Given preferences of hyperlinks and hyperlink sets calculated in Example 7 and Example 9, for two hyperlink clusters $C_1 = \{l_1, l_2\}$ and $C_2 = \{l_6, l_7\}$, according to (10), $sim_{C_1, C_2}$ includes two parts:

preferences of hyperlinks in $C_1$ and $C_2$:

$$\frac{PRE_1 + PRE_2 + PRE_6 + PRE_7}{4}$$

preferences of hyperlink sets whose elements are selected from $C_1$ and $C_2$:
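$$\mathrm{average}(\sigma(\text{2-HS})) = \frac{\sigma(\{l_1, l_6\}) + \sigma(\{l_1, l_7\}) + \sigma(\{l_2, l_6\}) + \sigma(\{l_2, l_7\})}{4}$$

$$\mathrm{average}(\sigma(\text{3-HS})) = \frac{\sigma(\{l_1, l_6, l_7\})}{1}$$

Hence,

$$sim_{C_1, C_2} = \frac{PRE_1 + PRE_2 + PRE_6 + PRE_7}{4} + \mathrm{average}(\sigma(\text{2-HS})) + \mathrm{average}(\sigma(\text{3-HS}))$$

For the hyperlink pool given in Example 7, seven hyperlinks are initially placed in seven unique hyperlink clusters. Similarities between hyperlink clusters can be computed and a 7 × 7 initial similarity matrix is created. Indexes of the matrix (i.e., 1, 2, 3, 4, 5, 6, 7) represent hyperlink clusters $\{l_1\}$, $\{l_2\}$, $\{l_3\}$, $\{l_4\}$, $\{l_5\}$, $\{l_6\}$ and $\{l_7\}$ respectively.

[7 × 7 initial similarity matrix]

Once the initial similarity matrix has been created, hyperlink clusters with the highest similarity, $\{l_1\}$ and $\{l_7\}$, are merged into one hyperlink cluster $\{l_1, l_7\}$. The 7 × 7 similarity matrix changes to a 6 × 6 similarity matrix and similarities related to the merged hyperlink cluster (i.e., $\{l_1, l_7\}$) are re-computed. Indexes of the updated similarity matrix (i.e., 1, 2, 3, 4, 5, 6) represent hyperlink clusters $\{l_1, l_7\}$, $\{l_2\}$, $\{l_3\}$, $\{l_4\}$, $\{l_5\}$ and $\{l_6\}$ respectively.

[6 × 6 updated similarity matrix]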
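The similarity of Definition 3 and the repeated merging illustrated above can be sketched in Python as follows. This is a simplified sketch under assumptions, not the authors' implementation: the function names and the `frozenset`-based data layout are introduced here, all pairwise similarities are recomputed in every round rather than only those involving the merged cluster, and the preferences and supports in the toy data are made-up placeholders.

```python
from itertools import combinations


def similarity(ci, cj, pre, phs):
    """Equation (10): average hyperlink preference plus, for each set
    size k, the average support of the PHS sets that lie inside
    ci | cj and take at least one element from each cluster."""
    avg_pre = (sum(pre[l] for l in ci) + sum(pre[l] for l in cj)) / (len(ci) + len(cj))
    by_size = {}
    for hs, support in phs.items():
        if hs <= (ci | cj) and hs & ci and hs & cj:
            by_size.setdefault(len(hs), []).append(support)
    return avg_pre + sum(sum(v) / len(v) for v in by_size.values())


def cluster(hp, pre, phs, n):
    """Merge the most similar pair of clusters until some cluster holds
    at least n hyperlinks; return its n most preferred hyperlinks."""
    clusters = [frozenset([l]) for l in hp]
    largest = clusters[0]
    while len(largest) < n and len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: similarity(pair[0], pair[1], pre, phs))
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        if len(merged) > len(largest):
            largest = merged
    # If the cluster is larger than n, keep the n highest preferences.
    return sorted(largest, key=lambda l: pre[l], reverse=True)[:n]


# Toy data: placeholder values, hypothetical hyperlink names.
toy_pre = {"a": 0.05, "b": 0.01, "c": 0.02, "d": 0.03}
toy_phs = {frozenset("ad"): 0.04, frozenset("bc"): 0.01}
toy_top2 = cluster("abcd", toy_pre, toy_phs, 2)
toy_top3 = cluster("abcd", toy_pre, toy_phs, 3)
```

In the toy data, clusters {a} and {d} have the highest initial similarity (0.04 from the average preference plus 0.04 from the pair support) and are merged first; with N = 3 the next merge adds c, whose average preference with the merged cluster exceeds the similarity of every other pair.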

Based on the updated similarity matrix, hyperlink clusters $\{l_1, l_7\}$ and $\{l_6\}$ can be merged and the similarity matrix is updated again. This process is repeated until the number of hyperlinks in a hyperlink cluster is greater than or equal to the number of hyperlinks to be placed in the portal page, N. This cluster contains all hyperlinks to be placed in the portal page⁷. We summarize the hyperlink clustering algorithm in Fig. 8.

⁷ If the number of hyperlinks in the result hyperlink cluster is larger than the number of hyperlinks to be placed in a portal page (N), then N hyperlinks are selected from the cluster according to their individual preferences, from high to low.

Input:
  HP: the hyperlink pool of a website w
  $PRE_i$: the preference of a hyperlink $l_i$ in HP, for $i = 1, 2, \ldots, |HP|$
  PHS: hyperlink sets and their preferences
  N: the number of hyperlinks to be placed in the portal page of w
Output:
  PH: the set of hyperlinks in the portal page of w

distribute each hyperlink in HP into a unique hyperlink cluster;
max_size = 1; /* max_size: maximum number of hyperlinks in hyperlink clusters */
initialize SIM; /* SIM: similarity matrix */
While (max_size < N)
  merge hyperlink clusters with the highest similarity into C; /* C: merged cluster */
  If (size(C) > max_size) /* size(C): number of hyperlinks in C */
    max_size = size(C);
  End if
  re-compute SIM; /* only need to re-compute similarities related to the merged cluster */
End while
PH = C;

Fig. 8. The hyperlink clustering algorithm

4.7 TIME COMPLEXITY OF LINKSELECTOR

To compute time complexity of LinkSelector, time complexity for each step in the algorithm is considered first. We denote the number of hyperlinks in the given hyperlink pool HP as $n_l$, the number of sessions in a web log as $n_s$, the number of elements in access relationships AR as $n_{AR}$
and the number of elements in structure relationships SR as $n_{SR}$.

Structure relationships are discovered in step 1. In this step, pages pointed to by all hyperlinks in HP are retrieved and parsed to extract structure relationships. If the average number of hyperlinks in all retrieved pages is $\beta$, time complexity for step 1 is $O(n_l \beta)$. Usually, $\beta$ is much less than $n_l$. Hence, time complexity for step 1 is $O(n_l)$.

We employ the Apriori algorithm to discover access relationships AR⁸ in step 2. Apriori goes through all sessions in a web log $\alpha$ rounds to discover access relationships AR. Therefore, time complexity for step 2 is $O(n_s \alpha)$. As $\alpha$ is much less than $n_s$ (e.g., in our experiment discussed in Section 5, $\alpha$ equals 6 while $n_s$ is 262K), time complexity for step 2 is in the order of $O(n_s)$.

Preferences of hyperlinks are computed in step 3. If the average number of structure relationships a hyperlink participates in as the initial hyperlink is $\gamma$, time complexity for step 3 is $O(n_l \gamma)$. Usually, $\gamma$ is much less than $n_l$. Hence, time complexity for step 3 is $O(n_l)$.

In step 4, each hyperlink set in access relationships AR is checked to see whether there is a structure relationship between its elements. Hence, time complexity for step 4 is $O(n_{AR})$.

One $n_l \times n_l$ similarity matrix is created in step 5, which needs $O(n_l^2)$ time. At worst, it also needs $O(n_l^2)$ time to re-compute the similarity matrix. Hence, time complexity for step 5 is in the order of $O(n_l^2)$.

Combining time complexities of all 5 steps, we get the time complexity of LinkSelector as $O(n_l^2 + n_s + n_{AR})$.

⁸ We assume that the web log is cleaned and sessions are identified before applying LinkSelector.

5. EXPERIMENT RESULTS

In this section, we compare LinkSelector with other hyperlink selection approaches. The data is described in section 5.1 and a portal page constructed using LinkSelector is presented in section 5.2. We compare LinkSelector with the current practice of hyperlink selection in section 5.3. In
section 5.4, we also compare LinkSelector with classical hierarchical clustering and with association rule mining, a popular data mining method used in the web usage mining field.

5.1 DATA

We did our experiments on the University of Arizona website because it is large enough and generates enough web logs to permit comparisons of different hyperlink selection approaches. One month's web log recently collected from the university web server was used in our experiments. The raw web log contained 10 million records. After web log cleaning, the cleaned web log had 4.2 million records, enough to extract web visiting patterns from. We used a 30-minute timeout to divide the cleaned web log into 344K sessions. 23 days of the cleaned web log were used as the training data, which contained 262K sessions; 7 days of the cleaned web log were used as the testing data, which contained 82K sessions.

The hyperlink pool consists of 743 hyperlinks in the index page of the university website ( edu/index/webindex.shtml). Among these, 110 hyperlinks pointed to web pages at the university web server and the remaining 633 hyperlinks pointed to web pages at other web servers. As the web log was collected from the university web server, access relationships among the 633 hyperlinks pointing to web pages at other web servers could not be mined from the collected web log. Therefore, the hyperlink pool used in our experiments consisted of the 110 hyperlinks pointing to web pages at the university web server.

5.2 RESULTS OF LINKSELECTOR

Applying LinkSelector to the hyperlink pool and the training data, preferences of hyperlinks and hyperlink sets were calculated. We list hyperlinks with top ten preference values in table II.

Table II. Hyperlinks in the Hyperlink Pool with Top 10 Preference Values

Hyperlink (listed from highest to lowest preference)
/index/alldepts-index.shtml
/shared/sports-entertain.shtml
/working/teaching.shtml
/shared/aboutua.shtml
/newschedule/parse-schedule-new.cgi
/student_link
/spotlight/index.shtml
/shared/getting-around.shtml
/shared/athletics.shtml
/shared/tours.shtml

In table II, the hyperlink with the highest preference value (i.e., /index/alldepts-index.shtml) is a hub hyperlink that has structure relationships with hyperlinks pointing to homepages of departments, programs and colleges at the university. The hyperlink with the second highest preference value (i.e., /shared/sports-entertain.shtml) is a hot hyperlink which is the starting point of a popularly visited path leading to web pages on sports and entertainment at the university. Clustering hyperlinks based on preferences of hyperlinks and hyperlink sets resulted in the 10 hyperlinks selected for the university portal page. These hyperlinks are listed in table III.

Table III. Result of LinkSelector (N=10)

No.  Hyperlink
1    /index/alldepts-index.shtml
2    /shared/sports-entertain.shtml
3    /working/teaching.shtml
4    /shared/aboutua.shtml
5    /shared/getting-around.shtml
6    /spotlight/index.shtml
7    /newschedule/parse-schedule-new.cgi
8    /student_link
9    /phonebook
10   /academic/oncourse/data/interface

5.3 PERFORMANCE COMPARISON WITH EXPERT SELECTION AND TOP-LINK SELECTION

Current practice of hyperlink selection, namely expert selection, relies on domain experts' (e.g., designers of a website) experiences. Hyperlinks contained in the current portal page


More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r

More information

On Some Entertaining Applications of the Concept of Set in Computer Science Course

On Some Entertaining Applications of the Concept of Set in Computer Science Course On Some Entertanng Applcatons of the Concept of Set n Computer Scence Course Krasmr Yordzhev *, Hrstna Kostadnova ** * Assocate Professor Krasmr Yordzhev, Ph.D., Faculty of Mathematcs and Natural Scences,

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

A Heuristic for Mining Association Rules In Polynomial Time

A Heuristic for Mining Association Rules In Polynomial Time A Heurstc for Mnng Assocaton Rules In Polynomal Tme E. YILMAZ General Electrc Card Servces, Inc. A unt of General Electrc Captal Corporaton 6 Summer Street, MS -39C, Stamford, CT, 697, U.S.A. egemen.ylmaz@gecaptal.com

More information

Research on Categorization of Animation Effect Based on Data Mining

Research on Categorization of Animation Effect Based on Data Mining MATEC Web of Conferences 22, 0102 0 ( 2015) DOI: 10.1051/ matecconf/ 2015220102 0 C Owned by the authors, publshed by EDP Scences, 2015 Research on Categorzaton of Anmaton Effect Based on Data Mnng Na

More information

A Heuristic for Mining Association Rules In Polynomial Time*

A Heuristic for Mining Association Rules In Polynomial Time* Complete reference nformaton: Ylmaz, E., E. Trantaphyllou, J. Chen, and T.W. Lao, (3), A Heurstc for Mnng Assocaton Rules In Polynomal Tme, Computer and Mathematcal Modellng, No. 37, pp. 9-33. A Heurstc

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

SAO: A Stream Index for Answering Linear Optimization Queries

SAO: A Stream Index for Answering Linear Optimization Queries SAO: A Stream Index for Answerng near Optmzaton Queres Gang uo Kun-ung Wu Phlp S. Yu IBM T.J. Watson Research Center {luog, klwu, psyu}@us.bm.com Abstract near optmzaton queres retreve the top-k tuples

More information

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Cross-Language Information Retrieval

Cross-Language Information Retrieval Feature Artcle: Cross-Language Informaton Retreval 19 Cross-Language Informaton Retreval Jan-Yun Ne 1 Abstract A research group n Unversty of Montreal has worked on the problem of cross-language nformaton

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Інформаційні технології в освіті ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Some aspects of programmng educaton

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

An Improved Image Segmentation Algorithm Based on the Otsu Method

An Improved Image Segmentation Algorithm Based on the Otsu Method 3th ACIS Internatonal Conference on Software Engneerng, Artfcal Intellgence, Networkng arallel/dstrbuted Computng An Improved Image Segmentaton Algorthm Based on the Otsu Method Mengxng Huang, enjao Yu,

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

A New Token Allocation Algorithm for TCP Traffic in Diffserv Network

A New Token Allocation Algorithm for TCP Traffic in Diffserv Network A New Token Allocaton Algorthm for TCP Traffc n Dffserv Network A New Token Allocaton Algorthm for TCP Traffc n Dffserv Network S. Sudha and N. Ammasagounden Natonal Insttute of Technology, Truchrappall,

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

A Clustering Algorithm for Chinese Adjectives and Nouns 1

A Clustering Algorithm for Chinese Adjectives and Nouns 1 Clusterng lgorthm for Chnese dectves and ouns Yang Wen, Chunfa Yuan, Changnng Huang 2 State Key aboratory of Intellgent Technology and System Deptartment of Computer Scence & Technology, Tsnghua Unversty,

More information

The Effect of Similarity Measures on The Quality of Query Clusters

The Effect of Similarity Measures on The Quality of Query Clusters The effect of smlarty measures on the qualty of query clusters. Fu. L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Informaton Scence, 30(5) 396-407 The Effect of Smlarty Measures on The Qualty of

More information

A Simple Methodology for Database Clustering. Hao Tang 12 Guangdong University of Technology, Guangdong, , China

A Simple Methodology for Database Clustering. Hao Tang 12 Guangdong University of Technology, Guangdong, , China for Database Clusterng Guangdong Unversty of Technology, Guangdong, 0503, Chna E-mal: 6085@qq.com Me Zhang Guangdong Unversty of Technology, Guangdong, 0503, Chna E-mal:64605455@qq.com Database clusterng

More information

Preprocessing of Web Usage Data for Application in Prefetching to Reduce Web Latency

Preprocessing of Web Usage Data for Application in Prefetching to Reduce Web Latency Internatonal Journal of Electrcal& Computer Scences IJECS-IJENS Vol:14 No:04 1 Preprocessng of Web Usage Data for Applcaton n Prefetchng to Reduce Web Latency G T Raju Professor, Department of CSE, RNS

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Behavioral Model Extraction of Search Engines Used in an Intelligent Meta Search Engine

Behavioral Model Extraction of Search Engines Used in an Intelligent Meta Search Engine Behavoral Model Extracton of Search Engnes Used n an Intellgent Meta Search Engne AVEH AVOUSI Computer Department, Azad Unversty, Garmsar Branch BEHZAD MOSHIRI Electrcal and Computer department, Faculty

More information

Innovation Typology. Collaborative Authoritativeness. Focused Web Mining. Text and Data Mining In Innovation. Generational Models

Innovation Typology. Collaborative Authoritativeness. Focused Web Mining. Text and Data Mining In Innovation. Generational Models Text and Data Mnng In Innovaton Joseph Engler Innovaton Typology Generatonal Models 1. Lnear or Push (Baroque) 2. Pull (Romantc) 3. Cyclc (Classcal) 4. Strategc (New Age) 5. Collaboratve (Polyphonc) Collaboratve

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information