Term Ranking for Clustering Web Search Results


Fatih Gelgi, Department of Computer Science and Engineering, Arizona State University, Tempe, AZ, 85287, USA
Hasan Davulcu, Department of Computer Science and Engineering, Arizona State University, Tempe, AZ, 85287, USA
Srinivas Vadrevu, Department of Computer Science and Engineering, Arizona State University, Tempe, AZ, 85287, USA

ABSTRACT
Clustering web search engine results for ambiguous keyword searches poses unique challenges. First, we show that one cannot readily import the frequency based feature ranking to cluster the web search results as in text document clustering. Next, we present TermRank, a variation of the PageRank algorithm based on a relational graph representation of the content of web document collections. TermRank achieves the desirable ranking of discriminative terms higher than the ambiguous terms, and of ambiguous terms higher than common terms. We experiment with two clustering algorithms to demonstrate the efficacy of TermRank. TermRank is shown to perform substantially better than frequency based classical methods.

Keywords: Web page clustering, term ranking, PageRank, random walk.

1. INTRODUCTION
During Web search, when keyword queries are ambiguous, such as apple, which appears in various contexts such as Computers or Fruit, it becomes arduous for a user to identify the distinct senses of their search terms and find contextual search results. This problem is further exacerbated when the user is unable to guess additional discriminative keywords, such as ipod, to filter the matches. For example, there are 544 million results returned by the Google search engine for the keyword query apple. A better way to browse the search results retrieved by a search engine would be to organize them into clusters with their descriptions (like Vivisimo, EigenCluster [3], SnakeT [4] and Grouper [15]) to guide the user during her search. The features in Web document clustering are usually generated from HTML syntax and text in a given collection.
Syntactic features are out of the scope of this work; this paper focuses on exploiting the most useful terms in a given Web document collection retrieved from an ambiguous keyword search such as apple. Clustering has been previously applied on text documents [8]. The common practice is to use all the terms in the document collection as features [13] by generating the term vectors for each document. Especially in large collections, using all terms explodes the feature space and gives rise to a well-known problem, the curse of dimensionality. It slows down clustering and reduces its quality dramatically. To avoid this problem, one usual method is to use the top-k best ranked terms as features, where k is a reasonable dimension for the clustering technique to be used. In that case, the quality of the top-k terms needs to be assured by a robust term ranking method. Broadly used frequency based term ranking methods are TF (term frequency) and TF/IDF (term frequency/inverse document frequency) [13], which penalizes the weights of common keywords that appear in a large number of documents, such as that, about, the, etc. These measures work well on clustering text documents since text document collections usually contain focused and mostly relevant vocabulary for their terms. However, clustering web search engine results for ambiguous keyword searches poses unique challenges. First, we show that we cannot readily import the frequency based features to cluster the web search results.
This is due to the nature of the web search results, which (i) usually contain various kinds of irrelevant terms in their navigational aids, advertisements and links to other pages, and also due to the observation that (ii) Web pages are usually not as comprehensive as text documents and mostly cover only fragments of the relevant context. Hence, we propose to organize terms found in web search results into three categories: discriminative terms that belong to a specific context and are strongly related with a distinct sense of the keyword search term, ambiguous terms that have many senses, and common terms that appear in many distinct contexts of a keyword search term. Adjusting term weights based on their category is critical for building pure clusters of web search results. For example, for the keyword search term apple, the term ipod is discriminative in determining that the web documents matching ipod belong to the computer sense of the word, whereas a common term such as contact or information should not carry much weight during clustering.

In this paper we present TermRank, a variation of the PageRank [11] algorithm based on a relational graph representation of the content of web document collections. TermRank achieves the desirable ranking of discriminative terms higher than the ambiguous terms, and of ambiguous terms higher than common terms. As indicated in [16], traditional clustering algorithms refine terms after clustering instead of filtering carefully before clustering [9, 14, 5, 7]. However, a good term ranking method improves the quality of the features and helps Web page clustering significantly. We use two clustering algorithms to demonstrate the efficacy of TermRank: K-means, a popular and efficient clustering algorithm, and SCuBA [1], a state-of-the-art subspace clustering algorithm. We show that the performances of the classical term ranking methods, TF and TF/IDF, are very close for both K-means and SCuBA. On the other hand, TermRank is shown to perform substantially better than the classical methods for both K-means (purity +9% and F-measure +11%) and SCuBA (purity +4% and F-measure +5%).

2. FREQUENCY BASED TERM RANKING
Term ranking is an essential issue in clustering documents. Ranking distinguishing terms higher yields a better estimation of similarity between documents and hence higher quality clustering. Standard frequency based term ranking methods in Information Retrieval (IR) are:

Term frequency (TF) is the frequency of a term among all the terms in the Web page collection, calculated as $TF(t) = \frac{n_t}{n}$, where $n_t$ is the number of occurrences of $t$ in the collection and $n$ is the total number of terms in the collection.

Term frequency / inverse document frequency (TF/IDF) [13] is a method to reduce the bias of term frequency by penalizing with the document frequency. It is calculated as $TF/IDF(t) = TF(t) \cdot \lg \frac{|W|}{|D(t)|}$, where $W$ is the set of Web pages in the collection and $D(t)$ is the set of Web pages in which $t$ appears.

2.1 Analysis on the Web Data
TF/IDF is known to be an effective method for ranking terms in text document collections. However, Web pages are not composed of only raw text, and the characteristics of Web data are different from those of text documents. We can distinguish between Web and text document collections by making the following key observations:

Observation 1. Web pages are usually not as comprehensive as text documents and mostly cover only fragments of the relevant context.
Observation 2. Almost all terms appearing in a text document can be considered relevant within the document's context. However, one can observe many context-irrelevant terms within Web pages, which might be part of advertisements, headers or footers, and navigation aids such as back, contact and search.

The substantial quality difference between text document clustering and Web page clustering is apparent in the experimental results of [2, 9]. Such results strongly support our observations. We will detail our analysis by investigating the frequency distributions of frequent terms on the Web data. We also observe that all frequent terms can be grouped into three categories:

Discriminative terms: These terms typically belong to a specific context, and they are strongly related with the context. Mac, ipod and recipe are such examples from the apple data.

Ambiguous terms: These terms appear in more than one context and their degree of relatedness might vary depending on the context. For instance, software and computer appear in both the Computers and Video games categories of the apple data. However, their degree of relatedness would not be weak due to the overlap in the context of the categories.

Common terms: These terms appear in many contexts in the data. Unlike the ambiguous terms, they have weak connections with the context in which they appear. Some examples of common terms are contact and search.

In the rest of the paper, important terms refers to both discriminative and ambiguous terms. In Web page clustering, ranking discriminative terms higher than the ambiguous terms, and ranking ambiguous terms higher than the common terms, contributes to high accuracy. Since contextual relatedness does not depend on the term frequencies alone, and in the Web data common and ambiguous terms have higher frequency characteristics, the TF method would mix the ranks of different types of terms. TF/IDF is a good measure for penalizing common and ambiguous terms using inverse document frequency in text collections.
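The two frequency based measures defined in Section 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes every Web page has already been tokenized into a list of terms, it reads lg as the base-2 logarithm, and the function names (term_frequency, tf_idf, top_k_overlap) are hypothetical.

```python
import math
from collections import Counter


def term_frequency(pages):
    """TF(t) = n_t / n, computed over the whole collection.

    pages is a list of pages, each page being a list of terms (assumed input format).
    """
    counts = Counter(term for page in pages for term in page)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def tf_idf(pages):
    """TF/IDF(t) = TF(t) * lg(|W| / |D(t)|), with lg taken as log base 2 (assumption)."""
    tf = term_frequency(pages)
    # |D(t)|: number of pages in which the term appears at least once.
    doc_freq = Counter(term for page in pages for term in set(page))
    return {term: tf[term] * math.log2(len(pages) / doc_freq[term]) for term in tf}


def top_k_overlap(scores_a, scores_b, k):
    """Fraction of terms shared by the top-k lists of two rankings."""
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    return len(top_a & top_b) / k
```

Under this definition a term that occurs in every page receives a TF/IDF score of zero regardless of its TF. The helper top_k_overlap measures the kind of agreement between two top-k lists that is compared for the top-200 terms of the apple data.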
However, TF/IDF becomes ineffective for ranking ambiguous and common terms lower than discriminating terms in Web collections, since, due to Observation 1, these terms would not appear in sufficiently large numbers of pages. As a consequence, TF/IDF gets into the same pitfalls as the other frequency based methods such as TF. To illustrate the similarity of the term rankings based on TF and TF/IDF, we ranked the top-200 (out of 24,455) terms in the apple data. We observed excessive overlap (97%) between their rankings and similar weighting of the terms. Consequently, one cannot merely rely on term frequency based measures to rank among the three groups of terms in the preferred orderings described above. Next, we will present a new ranking method in order to obtain the desired term rankings for clustering Web search results.

Footnote 3: The experimental Web page collection is gathered from [URL missing in source] with the apple keyword search.

3. EXTRACTING RELATIONAL GRAPH
A relational graph is a weighted undirected graph that captures the co-occurrence relationship information between terms in a given Web page collection. The nodes are the terms in the collection, and the weights on the edges represent the association strength between the terms. Formally, we define and initialize a relational graph as follows:

Definition 1 (Relational Graph). A relational graph $G$ is a weighted undirected graph where the nodes are the terms and the edge weights are the counts of the corresponding co-occurrence relations in the collection. Assuming $w_{ij}$ as the weight between the terms $i$ and $j$, $w_{ij}$ is the number of times the edge $(i, j)$ appears in the entire data. The weight of the node $i$ is initialized as the occurrence count of the corresponding term in the collection.
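Definition 1 can be illustrated with a small Python sketch. It is only one possible reading of the definition: it assumes the collection has already been split into text units given as lists of terms, it counts a co-occurrence once per unit for each unordered pair of distinct terms, and the name build_relational_graph is hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_relational_graph(blocks):
    """Build node and edge weights of a relational graph.

    blocks is a list of text units, each a list of terms (assumed input format).
    Returns (node_weights, edge_weights); edges are keyed by sorted term pairs
    so that (i, j) and (j, i) refer to the same undirected edge.
    """
    node_weights = Counter()
    edge_weights = Counter()
    for block in blocks:
        # Node weight: number of occurrences of the term in the collection.
        node_weights.update(block)
        # Edge weight: number of units in which both terms co-occur.
        for i, j in combinations(sorted(set(block)), 2):
            edge_weights[(i, j)] += 1
    return node_weights, edge_weights
```

Storing each edge under a sorted pair keeps the graph undirected without duplicating weights.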

Note that the edges are undirected since the association strength between terms is a bidirectional measure.

Extracting contextually related terms precisely from Web pages requires deep syntactic and semantic analysis. This problem is outside the scope of this paper. Thus we use a simple heuristic that retrieves only the blocks in which the search keyword appears within each Web page. Here, block refers to the text fragments delimited by a set of pre-determined tags such as <div>, <span>, <table>, <p>, <ul> and <ol>. The edges are initialized as the co-occurrence of terms within such blocks. We observed that using the entire page for relation extraction introduces too many irrelevant term associations. For example, terms such as copyright, search and back are almost always irrelevant to the keyword search context. Our block based heuristic reduces their association strengths since they rarely co-occur in the same block with the important terms.

Suppose a relational graph of a Web search result collection is given. We next propose an efficient algorithm that utilizes the relational graph to rank the important terms. We also observe that the different types of terms presented above have the following association characteristics:

A discriminative term does not have many neighbors, but important ones, and its associations are strong.
An ambiguous term has many neighbors and it has strong associations as well as weak ones.
A common term has many neighbors and mostly its associations are weak.

Figure 1: The fragment of the relational graph of apple data. Sizes of nodes and thickness of edges are proportional to their term frequencies and association strengths respectively.

A fragment of a relational graph of the apple data is presented in Figure 1 for illustrative purposes. Discriminative terms such as mac and recipe have neighbors with strong associations. Common terms such as contact have many neighbors with weak connections. Computer and software are examples of ambiguous terms that have many neighbors with strong associations as well as weak ones.
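The block heuristic described in this section can be sketched with Python's standard html.parser module. This is a simplified illustration under stated assumptions, not the extraction code used in the paper: every opening or closing occurrence of one of the listed tags is treated as a block delimiter, the keyword is matched as a whole lower-cased token, and the names BlockExtractor and keyword_blocks are hypothetical.

```python
from html.parser import HTMLParser

# Tags listed in the text as block delimiters.
BLOCK_TAGS = {"div", "span", "table", "p", "ul", "ol"}


class BlockExtractor(HTMLParser):
    """Collects the text fragments found between block-delimiting tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = []

    def flush(self):
        # Close the current fragment, normalizing whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._buffer.append(data)


def keyword_blocks(html, keyword):
    """Return only the blocks in which the search keyword appears."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text after the last delimiter
    keyword = keyword.lower()
    return [block for block in parser.blocks if keyword in block.lower().split()]
```

For a page such as `<div>Apple iPod <b>nano</b></div><p>Contact us</p>`, only the first fragment is kept for the keyword apple, so a navigation term like contact never co-occurs with ipod.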
Let $W = \{W_1, W_2, \ldots, W_{|W|}\}$ be the set of Web pages in the collection. Considering that a Web page is the vector of its terms and all pairwise relations between them, extraction of the relational graph of the collection requires $O\left(\sum_{W_i \in W} |W_i|^2\right)$ time.

4. TERMRANK ALGORITHM
TermRank is a variation of the PageRank algorithm that calculates the ranks of the terms in the relational graph. This section gives a brief background on PageRank and its intuitive explanation. Then, we present the TermRank algorithm alongside its justification for term ranking.

4.1 PageRank
PageRank [11] is a method that calculates the importance of the nodes in a link (citation) graph of Web pages. The idea is based on the probability that a random surfer visits a Web page by following the links of Web pages. As indicated in [11], the importance of a Web page is considered to be proportional to the number of important sources pointing to that page. PageRank computes this recursive definition of importance by utilizing a random walk approach over the Web link graph. The random walk propagates the probability of each page to the pages it links to. Given the link graph $G$, the PageRank score for a node is calculated as:

$$PR(i) = \alpha \sum_{j \in N^-(i)} \frac{PR(j)}{|N^+(j)|} + (1 - \alpha) \frac{1}{|V(G)|} \tag{1}$$

where $V(G)$ is the set of nodes in $G$, and $N^+(\cdot)$ and $N^-(\cdot)$ give the sets of neighbors that are connected to the node with their outgoing and incoming links respectively. $(1 - \alpha)$ is a decay factor to avoid rank sinks. Rank sinks [11] are defined to be a set of nodes which have links between themselves but no links to the other nodes. Hence these nodes generate a loop and accumulate rank, but they never distribute any rank outside since there are no outgoing edges. The decay factor acts as an exit node with a certain probability. An intuitive explanation for the decay factor is the probability of a random surfer jumping to other pages.

4.2 TermRank
Originally PageRank operates on a directed graph, and edges have no weights. TermRank adapts the PageRank algorithm to incorporate undirected edges and edge weights. Hence, all edges are considered to be both incoming and outgoing.
Additionally, the PageRank requirement of jumping to a random page is not necessary for TermRank since there are no rank sinks in undirected graphs. Thus the decay factor is not included in our formula. Given a relational graph $G$, TermRank can be calculated as follows:

$$TR(i) = \sum_{j \in N(i)} \frac{TR(j) \cdot w_{ij}}{\sum_{k \in N(j)} w_{jk}} \tag{2}$$

where $N(\cdot)$ represents the set of neighbors of the node. Similar to [11, 6], to compute TermRank we use a very efficient approximation method which iterates Equation 2:

$$\begin{aligned} TR^{(0)}(i) &= \frac{w_i}{\sum_{j \in V(G)} w_j} = TF(i) \\ TR^{(t+1)}(i) &= \sum_{j \in N(i)} \frac{TR^{(t)}(j) \cdot w_{ij}}{\sum_{k \in N(j)} w_{jk}} \end{aligned} \tag{3}$$
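The iteration above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: node and edge weights are passed as dictionaries, every edge endpoint must also be a key of node_weights, the stopping test uses the summed absolute change between two iterations, and the default values of delta and max_iter as well as the name term_rank are hypothetical.

```python
def term_rank(node_weights, edge_weights, delta=1e-6, max_iter=100):
    """Iterate the TermRank update until the change falls below delta.

    node_weights: {term: occurrence count}
    edge_weights: {(term_i, term_j): co-occurrence count}, undirected
    """
    # Adjacency with weights; each undirected edge is stored in both directions.
    neighbors = {term: {} for term in node_weights}
    for (i, j), weight in edge_weights.items():
        neighbors[i][j] = weight
        neighbors[j][i] = weight

    # Denominator of the update: total edge weight around each node.
    strength = {term: sum(adj.values()) for term, adj in neighbors.items()}

    # Initialization with term frequencies, TR^(0)(i) = w_i / sum_j w_j.
    total = sum(node_weights.values())
    rank = {term: weight / total for term, weight in node_weights.items()}

    for _ in range(max_iter):
        # TR^(t+1)(i) = sum over neighbors j of TR^(t)(j) * w_ij / sum_k w_jk.
        new_rank = {
            i: sum(rank[j] * w_ij / strength[j] for j, w_ij in neighbors[i].items())
            for i in neighbors
        }
        change = sum(abs(new_rank[term] - rank[term]) for term in rank)
        rank = new_rank
        if change < delta:
            break
    return rank
```

When the graph has no isolated nodes, each iteration only redistributes rank along edges, so the ranks keep summing to one. The max_iter cap is a safeguard for graphs on which the change between iterations never drops below delta.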

Table 1: Ranks of terms based on TermRank and TF/IDF. The higher value is the higher rank. Columns: TermRank (iteration 0), TermRank (iteration 20), TF/IDF. Rows: mac, macintosh, game, ipod, video, computer, contact. (The numeric entries of this table did not survive extraction.)

In Equation 3, the ranks of the nodes are initialized with their term frequencies since important terms are assumed to be potentially frequent in the collection. This initialization biases the formula towards the nodes with high term frequencies. Please recall that this set of frequent terms contains important terms as well as common ones. TermRank runs until the difference between two iterations is less than δ, which is a reasonably small value. TermRank satisfies the expander graph property and hence its convergence is guaranteed [10].

In random walks, a high ranking node needs to satisfy two essential factors: important neighbors and strong connections. Please note that the number of links of a node is an important factor in PageRank due to its non-weighted (or uniform) links. However, in weighted graphs the strength of the edges is the determining factor. For instance, a node with one edge which has a weight of 10 will receive more rank than a node with 10 edges each of which has a weight of 1 when the rest of their graphs are identical. The term ranks from higher to lower will be sorted as follows: (i) the nodes with many important neighbors and strong connections, (ii) the ones with some important neighbors and some strong connections, and finally (iii) the ones with many neighbors and many weak connections. Please recall that the term rank orderings presented above would correspond exactly to the desired rankings of discriminative, ambiguous, and common terms.

Figure 2: The fragment of the relational graph of apple data. The numbers under the node labels represent the term and document counts of the corresponding nodes in the Web page collection. Similarly, the numbers on the edges represent the edge counts.

To illustrate TermRank on an example, Figure 2, which is part of the relational graph of the apple data, is adapted from Figure 1.
mac, macintosh, game, ipod and video are discriminative terms, computer is an ambiguous term and contact is a common term. Node weights are initialized as term counts and δ is set to [value missing in source]. As presented in Table 1, the initial ranks of the terms are TF values, which give a mixed ranking in terms of discriminative, ambiguous and common terms. TermRank converges in 20 iterations and results in the desirable ranking of the terms: the first five are the discriminative terms, then the next two are the ambiguous terms and finally the common term. Conversely, computer has the highest rank and contact has the third rank in the TF/IDF ranking.

TermRank retrieves each node and edge in each iteration. That means the total running time for $k$ iterations is $O(k(n + m))$, where $n$ and $m$ are the numbers of nodes and edges, utilizing hash tables for nodes and their edges. In our experiments we found that $k$ is a small constant. Even for 322 million links, the PageRank algorithm converges in about 52 iterations [11].

5. EXPERIMENTS

Table 2: Selected search keywords and their corresponding categories in data sets. (The "# of pages" and "# of terms" columns did not survive extraction.)
Keyword | Categories
apple | Computers(463), Fruit(136), Music(21), Locations(17), Movies(6), Games(5)
dell | Arts(18), Computers(11), Authors(4)
gold | Shopping(471), Mining(151), Movies(28), Motors(11), Games(8), Sports(1)
jaguar | Cars(78), Video games(48), Animals(9), Music(3)
jordan | Country(249), Music(42), Authors(38), City(29), Basketball(10), Genealogy(8), Banks(10), Movies(10), Soccer(1), News(4)
saturn | Cars(22), Planets(21), Anime(19), Video games(9)
tiger | Sports(35), Video games(28), Animals(25), Movies(19), Terrorism(3)

In our experiments, we identified some ambiguous keywords, inspired from [16], on the Open Directory Project (ODP). ODP is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors. We selected the search keywords apple, dell, gold, jaguar, jordan, saturn and tiger for data collection, comprising 2,112 Web pages and 81,422 unique terms in total. These document collection statistics are presented in Table 2.
During the preprocessing step, common data types such as percentages, dates, numbers etc., stop words, and punctuation symbols are filtered using simple regular expressions. We used two clustering algorithms to demonstrate the efficacy of TermRank: K-means and SCuBA [1]. K-means is one of the most common clustering methods, preferred for its speed and quality. In our experiments the actual number of clusters K is provided according to the number of matching ODP categories. For each keyword data set, K-means has been executed 20 times and the results correspond to the average of all runs. Purity and F-measure deviate in the interval of ±5%. SCuBA is a state-of-the-art subspace clustering algorithm that efficiently determines clusters and their related features (subspaces) by analyzing frequent term sets of documents. It is originally part of an article recommendation system for researchers. The experiments ran on an Intel Pentium 4 3GHz CPU with 1GB RAM running the Windows XP operating system.
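The quality measures reported in the experiments (purity, entropy and the overall F-measure, defined in Section 5.1) can be computed with a short Python sketch. It is an illustration under stated assumptions rather than the evaluation code used in the paper: clusters and reference categories are given as two parallel lists of labels, the entropy uses the natural logarithm because the base is not stated, and the name evaluate_clustering is hypothetical.

```python
import math
from collections import Counter


def evaluate_clustering(clusters, categories):
    """Return purity, entropy and overall F-measure of a clustering.

    clusters[p] is the cluster label and categories[p] the reference
    category of page p (assumed input format).
    """
    n = len(clusters)
    n_i = Counter(clusters)                    # cluster sizes
    n_j = Counter(categories)                  # category sizes
    n_ij = Counter(zip(clusters, categories))  # pages shared by cluster i and category j

    entropy = 0.0
    best_f = dict.fromkeys(n_j, 0.0)           # F_j = max_i F_ij
    for (i, j), shared in n_ij.items():
        p = shared / n_i[i]                    # precision p_ij
        r = shared / n_j[j]                    # recall r_ij
        best_f[j] = max(best_f[j], 2 * p * r / (p + r))
        # Contribution of category j to the size-weighted entropy of cluster i.
        entropy -= (n_i[i] / n) * p * math.log(p)

    # Purity: size-weighted share of the majority category in each cluster.
    purity = sum(max(n_ij[(i, j)] for j in n_j) for i in n_i) / n
    f_measure = sum(n_j[j] / n * best_f[j] for j in n_j)
    return {"purity": purity, "entropy": entropy, "f_measure": f_measure}
```

A perfect clustering yields purity 1, entropy 0 and F-measure 1, so higher purity and F-measure and lower entropy indicate better agreement with the reference categories.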

Table 3: Performance comparison of term ranking methods in K-means and SCuBA. P, E and F refer to purity, entropy and F-measure respectively. Columns: P, E and F for each of TF, TF/IDF and TermRank, first under K-means and then under SCuBA. Rows: apple, dell, gold, jaguar, jordan, saturn, tiger, overall. (The numeric entries of this table did not survive extraction.)

Figure 3: The overlap of TF/IDF features with the other methods.

Table 4: Sample subspaces and their related categories. One subspace is shown for each search keyword.
Keyword | Sample features | Rel. Category
apple | cake, pudding, fruit, recipe, pie, bread | Fruit
dell | computer, desktop, business, software | Computers
gold | exploration, mineral, mining, resources, panning, company | Mining
jaguar | classic, british, club, car | Cars
jordan | national, foreign, middle, east | Country
saturn | playstation, game, arcade | Video games
tiger | movie, martial, art, hidden, dragon, crouching, chinese | Movies

5.1 Evaluation Metrics
To evaluate the quality of our results, the computed clusters are compared with the actual categories given in ODP. We use the common evaluation metrics for clustering [12]: precision, recall, F-measure, purity, and entropy. Precision, $p_{ij} = \frac{n_{ij}}{n_i}$, and recall, $r_{ij} = \frac{n_{ij}}{n_j}$, compare each cluster $i$ with each category $j$, where $n_{ij}$ is the number of Web pages that appear in both the cluster $i$ and the category $j$, and $n_i$ and $n_j$ are the numbers of Web pages in the cluster $i$ and in the category $j$ respectively. F-measure, $F_{ij} = \frac{2 p_{ij} r_{ij}}{p_{ij} + r_{ij}}$, is a common metric calculated similarly to the one in IR. The F-measure of a category $j$ is $F_j = \max_i \{F_{ij}\}$, and similarly the overall F-measure is:

$$F = \sum_j \frac{n_j}{n} F_j \tag{4}$$

The quality of each cluster can be calculated by purity and entropy. Purity measures how pure the cluster $i$ is by $\rho_i = \max_j \{p_{ij}\}$. The purity of the entire clustering can be calculated by weighting each cluster proportionally to its size as:

$$\rho = \sum_i \frac{n_i}{n} \rho_i \tag{5}$$

where $n$ is the total number of Web pages. The entropy of a cluster $i$ is $E_i = -\sum_j p_{ij} \log p_{ij}$. Calculating the weighted average over all clusters gives the entropy of the entire clustering:

$$E = \sum_i \frac{n_i}{n} E_i \tag{6}$$

5.2 ODP Results
First we use a wrapper which sends a given search keyword to [URL missing in source] and collects the resulting categories and the Web pages that belong to those categories. Collected pages are categorized by their ODP categories. Next, all blocks are extracted from the collected Web pages. About 5% of the collected Web pages in ODP are defective, for example incorrect or redirected URLs, erroneous HTML code, pages composed of just images, or pages under construction. For all Web search result collections, TermRank is quite fast and it runs in less than a millisecond.

The term overlaps in the top-200 for TF/IDF vs. TF and TermRank are given in Figure 3. TF/IDF and TF have a significant overlap, about 94% on the average, whereas TermRank has an overlap of only 76%. Hence TF and TF/IDF perform very similarly (almost within a ±3% range), as detailed in Table 3. The overall performances of TF and TF/IDF for both K-means and SCuBA are very close. On the other hand, TermRank has performed substantially better for both K-means (purity +9% and F-measure +11%) and SCuBA (purity +4% and F-measure +5%). The overlaps in the terms generated by TermRank are the lowest in the jaguar and jordan data sets, as shown in Figure 3. Discriminating terms of the categories are well identified and properly ranked; hence one can see the significant difference between the F-measures of TF, TF/IDF and TermRank in Table 3.

TermRank is shown to be successful in degrading the common terms. Table 5 presents some obvious common terms and their ranks. Common terms are indeed placed at much lower ranks by TermRank than by TF/IDF. Especially in the dell, jordan, saturn and tiger data sets, TermRank very successfully reduces the ranks of the common terms. That is why TermRank performs better than the other methods in these data sets.

The quality of clustering in a document collection is affected by other factors such as context dominance and context overlap. If most of the Web pages belong to one category, the context of that category dominates the others.
As a consequence, a large amount of the feature terms are generated from the context of the dominating category and their ranks are high. In that case, clustering quality is reduced since there are not sufficient terms to distinguish the remaining small categories. Apple, gold and jordan are such examples in our data sets. Computers, Shopping and Country are the dominating categories in these data sets respectively. In the second case, Web pages in different categories have similar contexts. This is not due to the context ambiguity or the ambiguity of the terms in the context; rather, exactly the same context might appear in more than one category. For instance, many Web pages in the Shopping category have information on gold as a material, which is common in Mining. Another example is the Sports and Video games categories in the tiger data set. Tiger Woods is the well-known golf player and golf is the main context of the Tiger Woods Web pages in the Sports category. However, the context is again golf in the Tiger Woods video game pages in the Video games category. That is the main reason why the purity of the tiger data set is lower, as presented in Table 3.

Table 5: Ranks of some common terms in different data sets. tfidf and tr refer to TF/IDF and TermRank respectively. The top-200 row specifies the number of terms ranked in the top-200. Columns: tfidf and tr for each of apple, dell, gold, jaguar, jordan, saturn and tiger. Rows: search, contact, links, news, online, special, copyright, rights, top-200. (Most numeric entries of this table did not survive extraction; the surviving entries all read ">1000".)

High quality term features generated by TermRank can be very useful in subspace clustering algorithms to produce precise subspaces. To demonstrate the efficacy, some of the subspaces generated by SCuBA are presented in Table 4. Subspaces that serve as the contextual terms of the categories are accurately identified from the terms generated by TermRank. For example, in the tiger data set the subspace refers to the Web pages of Crouching Tiger, Hidden Dragon, the famous movie with 4 Oscar awards. The movie is originally Chinese and about martial arts.

6. CONCLUSION
In this paper, we show the ineffectiveness of frequency based term ranking methods such as TF and TF/IDF for clustering Web search results. Instead, we provide a novel term ranking method, TermRank, which utilizes random walks on a relational graph of the given Web page collection.
Our experimental results illustrate the effectiveness of our algorithm by measuring the purity, entropy and F-measure of the generated clusters based on Open Directory Project (ODP) data. In this work, we ranked the terms without explicitly categorizing them into the three categories: discriminative, ambiguous and common terms. Future work includes a clear separation of terms into these categories. By categorizing the terms, common terms can be easily excluded, whereas discriminative and ambiguous terms can be more effectively used in clustering Web search results.

7. REFERENCES
[1] N. Agarwal, E. Haque, H. Liu, and L. Parsons. A subspace clustering framework for research group collaboration. International Journal of Information Technology and Web Engineering, 1(1):35-38.
[2] F. Beil, M. Ester, and X. Xu. Frequent term-based text clustering. In SIGKDD, New York, NY, USA. ACM Press.
[3] D. Cheng, S. Vempala, R. Kannan, and G. Wang. A divide-and-merge methodology for clustering. In PODS, New York, NY, USA. ACM Press.
[4] P. Ferragina and A. Gulli. A personalized search engine based on web-snippet hierarchical clustering. In WWW, New York, NY, USA. ACM Press.
[5] E. Gabrilovich and S. Markovitch. Feature generation for text categorization using world knowledge. In Proceedings of The Nineteenth International Joint Conference for Artificial Intelligence, Edinburgh, Scotland.
[6] M. Gori and A. Pucci. ItemRank: A random-walk based scoring algorithm for recommender engines. In IJCAI.
[7] S. Huang, Z. Chen, Y. Yu, and W.-Y. Ma. Multitype features coselection for web document clustering. IEEE Transactions on Knowledge and Data Engineering, 18(4).
[8] A. Leouski and W. B. Croft. An evaluation of techniques for clustering search results. Technical Report IR-76, University of Massachusetts, Amherst.
[9] T. Liu, S. Liu, Z. Chen, and W.-Y. Ma. An evaluation on feature selection for text clustering. In ICML.
[10] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press.
[11] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project.
[12] M. Rosell, V. Kann, and J.-E. Litton. Comparing comparisons: Document clustering evaluation using two manual classifications. In ICON.
[13] G. Salton, A. Wong, and C. S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11).
[14] A. Strehl, J. Ghosh, and R. Mooney. Impact of similarity measures on web-page clustering. In AAAI. AAAI, July.
[15] O. Zamir and O. Etzioni. Grouper: a dynamic clustering interface to Web search results. Computer Networks, 31(11-16).
[16] H.-J. Zeng, Q.-C. He, Z. Chen, W.-Y. Ma, and J. Ma. Learning to cluster web search results. In ACM SIGIR, New York, NY, USA. ACM Press.


More information

Scalable Diversified Ranking on Large Graphs

Scalable Diversified Ranking on Large Graphs IEEE TRANSACTIONS ON NOWLEDGE AND DATA ENGINEERING, VOL.XXX, NO. XXX, 22 Scalable Diversified Rakig o Large Graphs Rog-Hua Li ad Jeffery Xu Yu Abstract Ehacig diversity i rakig o graphs has bee idetified

More information

Latent Visual Context Analysis for Image Re-ranking

Latent Visual Context Analysis for Image Re-ranking Latet Visual Cotext Aalysis for Image Re-rakig Wegag Zhou 1, Qi Tia 2, Liju Yag 3, Houqiag Li 1 Dept. of EEIS, Uiversity of Sciece ad Techology of Chia 1, Hefei, P.R. Chia Dept. of Computer Sciece, Texas

More information

Task scenarios Outline. Scenarios in Knowledge Extraction. Proposed Framework for Scenario to Design Diagram Transformation

Task scenarios Outline. Scenarios in Knowledge Extraction. Proposed Framework for Scenario to Design Diagram Transformation 6-0-0 Kowledge Trasformatio from Task Scearios to View-based Desig Diagrams Nima Dezhkam Kamra Sartipi {dezhka, sartipi}@mcmaster.ca Departmet of Computig ad Software McMaster Uiversity CANADA SEKE 08

More information

Searching a Russian Document Collection Using English, Chinese and Japanese Queries

Searching a Russian Document Collection Using English, Chinese and Japanese Queries Searchig a Russia Documet Collectio Usig Eglish, Chiese ad Japaese Queries Fredric C. Gey (gey@ucdata.berkeley.edu) UC Data Archive & Techical Assistace Uiversity of Califoria, Berkeley, CA 94720 USA ABSTRACT.

More information

1 Graph Sparsfication

1 Graph Sparsfication CME 305: Discrete Mathematics ad Algorithms 1 Graph Sparsficatio I this sectio we discuss the approximatio of a graph G(V, E) by a sparse graph H(V, F ) o the same vertex set. I particular, we cosider

More information

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work

More information

Harris Corner Detection Algorithm at Sub-pixel Level and Its Application Yuanfeng Han a, Peijiang Chen b * and Tian Meng c

Harris Corner Detection Algorithm at Sub-pixel Level and Its Application Yuanfeng Han a, Peijiang Chen b * and Tian Meng c Iteratioal Coferece o Computatioal Sciece ad Egieerig (ICCSE 015) Harris Corer Detectio Algorithm at Sub-pixel Level ad Its Applicatio Yuafeg Ha a, Peijiag Che b * ad Tia Meg c School of Automobile, Liyi

More information

Octahedral Graph Scaling

Octahedral Graph Scaling Octahedral Graph Scalig Peter Russell Jauary 1, 2015 Abstract There is presetly o strog iterpretatio for the otio of -vertex graph scalig. This paper presets a ew defiitio for the term i the cotext of

More information

Bayesian Network Structure Learning from Attribute Uncertain Data

Bayesian Network Structure Learning from Attribute Uncertain Data Bayesia Network Structure Learig from Attribute Ucertai Data Wetig Sog 1,2, Jeffrey Xu Yu 3, Hog Cheg 3, Hogya Liu 4, Ju He 1,2,*, ad Xiaoyog Du 1,2 1 Key Labs of Data Egieerig ad Kowledge Egieerig, Miistry

More information

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro

More information

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems

More information

Bayesian approach to reliability modelling for a probability of failure on demand parameter

Bayesian approach to reliability modelling for a probability of failure on demand parameter Bayesia approach to reliability modellig for a probability of failure o demad parameter BÖRCSÖK J., SCHAEFER S. Departmet of Computer Architecture ad System Programmig Uiversity Kassel, Wilhelmshöher Allee

More information

CS 683: Advanced Design and Analysis of Algorithms

CS 683: Advanced Design and Analysis of Algorithms CS 683: Advaced Desig ad Aalysis of Algorithms Lecture 6, February 1, 2008 Lecturer: Joh Hopcroft Scribes: Shaomei Wu, Etha Feldma February 7, 2008 1 Threshold for k CNF Satisfiability I the previous lecture,

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Sprig 2017 A secod course i data miig http://www.it.uu.se/edu/course/homepage/ifoutv2/vt17/ Kjell Orsbor Uppsala Database Laboratory Departmet of Iformatio Techology, Uppsala Uiversity,

More information

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov Sortig i Liear Time Data Structures ad Algorithms Adrei Bulatov Algorithms Sortig i Liear Time 7-2 Compariso Sorts The oly test that all the algorithms we have cosidered so far is compariso The oly iformatio

More information

Study on effective detection method for specific data of large database LI Jin-feng

Study on effective detection method for specific data of large database LI Jin-feng Iteratioal Coferece o Automatio, Mechaical Cotrol ad Computatioal Egieerig (AMCCE 205) Study o effective detectio method for specific data of large database LI Ji-feg (Vocatioal College of DogYig, Shadog

More information

Lecture 5. Counting Sort / Radix Sort

Lecture 5. Counting Sort / Radix Sort Lecture 5. Coutig Sort / Radix Sort T. H. Corme, C. E. Leiserso ad R. L. Rivest Itroductio to Algorithms, 3rd Editio, MIT Press, 2009 Sugkyukwa Uiversity Hyuseug Choo choo@skku.edu Copyright 2000-2018

More information

Web Text Feature Extraction with Particle Swarm Optimization

Web Text Feature Extraction with Particle Swarm Optimization 32 IJCSNS Iteratioal Joural of Computer Sciece ad Network Security, VOL.7 No.6, Jue 2007 Web Text Feature Extractio with Particle Swarm Optimizatio Sog Liagtu,, Zhag Xiaomig Istitute of Itelliget Machies,

More information

Algorithms for Disk Covering Problems with the Most Points

Algorithms for Disk Covering Problems with the Most Points Algorithms for Disk Coverig Problems with the Most Poits Bi Xiao Departmet of Computig Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog csbxiao@comp.polyu.edu.hk Qigfeg Zhuge, Yi He, Zili Shao, Edwi

More information

A Novel Feature Extraction Algorithm for Haar Local Binary Pattern Texture Based on Human Vision System

A Novel Feature Extraction Algorithm for Haar Local Binary Pattern Texture Based on Human Vision System A Novel Feature Extractio Algorithm for Haar Local Biary Patter Texture Based o Huma Visio System Liu Tao 1,* 1 Departmet of Electroic Egieerig Shaaxi Eergy Istitute Xiayag, Shaaxi, Chia Abstract The locality

More information

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein 068.670 Subliear Time Algorithms November, 0 Lecture 6 Lecturer: Roitt Rubifeld Scribes: Che Ziv, Eliav Buchik, Ophir Arie, Joatha Gradstei Lesso overview. Usig the oracle reductio framework for approximatig

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information

Fundamentals of Media Processing. Shin'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dinh Le

Fundamentals of Media Processing. Shin'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dinh Le Fudametals of Media Processig Shi'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dih Le Today's topics Noparametric Methods Parze Widow k-nearest Neighbor Estimatio Clusterig Techiques k-meas Agglomerative Hierarchical

More information

New Results on Energy of Graphs of Small Order

New Results on Energy of Graphs of Small Order Global Joural of Pure ad Applied Mathematics. ISSN 0973-1768 Volume 13, Number 7 (2017), pp. 2837-2848 Research Idia Publicatios http://www.ripublicatio.com New Results o Eergy of Graphs of Small Order

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

BOOLEAN MATHEMATICS: GENERAL THEORY

BOOLEAN MATHEMATICS: GENERAL THEORY CHAPTER 3 BOOLEAN MATHEMATICS: GENERAL THEORY 3.1 ISOMORPHIC PROPERTIES The ame Boolea Arithmetic was chose because it was discovered that literal Boolea Algebra could have a isomorphic umerical aspect.

More information

A Kernel Density Based Approach for Large Scale Image Retrieval

A Kernel Density Based Approach for Large Scale Image Retrieval A Kerel Desity Based Approach for Large Scale Image Retrieval Wei Tog Departmet of Computer Sciece ad Egieerig Michiga State Uiversity East Lasig, MI, USA togwei@cse.msu.edu Rog Ji Departmet of Computer

More information

Text Summarization using Neural Network Theory

Text Summarization using Neural Network Theory Iteratioal Joural of Computer Systems (ISSN: 2394-065), Volume 03 Issue 07, July, 206 Available at http://www.ijcsolie.com/ Simra Kaur Jolly, Wg Cdr Ail Chopra 2 Departmet of CSE, Ligayas Uiversity, Faridabad

More information

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers *

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers * Load balaced Parallel Prime umber Geerator with Sieve of Eratosthees o luster omputers * Soowook Hwag*, Kyusik hug**, ad Dogseug Kim* *Departmet of Electrical Egieerig Korea Uiversity Seoul, -, Rep. of

More information

ENGI 4421 Probability and Statistics Faculty of Engineering and Applied Science Problem Set 1 Descriptive Statistics

ENGI 4421 Probability and Statistics Faculty of Engineering and Applied Science Problem Set 1 Descriptive Statistics ENGI 44 Probability ad Statistics Faculty of Egieerig ad Applied Sciece Problem Set Descriptive Statistics. If, i the set of values {,, 3, 4, 5, 6, 7 } a error causes the value 5 to be replaced by 50,

More information

Journal of Chemical and Pharmaceutical Research, 2013, 5(12): Research Article

Journal of Chemical and Pharmaceutical Research, 2013, 5(12): Research Article Available olie www.jocpr.com Joural of Chemical ad Pharmaceutical Research, 2013, 5(12):745-749 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 K-meas algorithm i the optimal iitial cetroids based

More information

Avid Interplay Bundle

Avid Interplay Bundle Avid Iterplay Budle Versio 2.5 Cofigurator ReadMe Overview This documet provides a overview of Iterplay Budle v2.5 ad describes how to ru the Iterplay Budle cofiguratio tool. Iterplay Budle v2.5 refers

More information

Page Quality: In Search of an Unbiased Web Ranking

Page Quality: In Search of an Unbiased Web Ranking Page Quality: I Search of a Ubiased Web Rakig Jughoo Cho UCLA Computer Sciece Los Ageles, CA 90095 cho@cs.ucla.edu Robert E. Adams UCLA Computer Sciece Los Ageles, CA 90095 robadams@cs.ucla.edu ABSTRACT

More information

On (K t e)-saturated Graphs

On (K t e)-saturated Graphs Noame mauscript No. (will be iserted by the editor O (K t e-saturated Graphs Jessica Fuller Roald J. Gould the date of receipt ad acceptace should be iserted later Abstract Give a graph H, we say a graph

More information

UNIT 4 Section 8 Estimating Population Parameters using Confidence Intervals

UNIT 4 Section 8 Estimating Population Parameters using Confidence Intervals UNIT 4 Sectio 8 Estimatig Populatio Parameters usig Cofidece Itervals To make ifereces about a populatio that caot be surveyed etirely, sample statistics ca be take from a SRS of the populatio ad used

More information

ISSN (Print) Research Article. *Corresponding author Nengfa Hu

ISSN (Print) Research Article. *Corresponding author Nengfa Hu Scholars Joural of Egieerig ad Techology (SJET) Sch. J. Eg. Tech., 2016; 4(5):249-253 Scholars Academic ad Scietific Publisher (A Iteratioal Publisher for Academic ad Scietific Resources) www.saspublisher.com

More information

c-dominating Sets for Families of Graphs

c-dominating Sets for Families of Graphs c-domiatig Sets for Families of Graphs Kelsie Syder Mathematics Uiversity of Mary Washigto April 6, 011 1 Abstract The topic of domiatio i graphs has a rich history, begiig with chess ethusiasts i the

More information

Research on Interest Model of User Behavior

Research on Interest Model of User Behavior 2011 Iteratioal Coferece o Computer Sciece ad Iformatio Techology (ICCSIT 2011) IPCSIT vol. 51 (2012) (2012) IACSIT Press, Sigapore DOI: 10.7763/IPCSIT.2012.V51.94 Research o Iterest Model of User Behavior

More information

Shadow Document Methods of Results Merging

Shadow Document Methods of Results Merging Shadow Documet Methods of Results Mergig Shegli Wu ad Fabio Crestai Departmet of Computer ad Iformatio Scieces Uiversity of Strathclyde, Glasgow, UK {s.wu,f.crestai}@cis.strath.ac.uk ABSTRACT I distributed

More information

Throughput-Delay Scaling in Wireless Networks with Constant-Size Packets

Throughput-Delay Scaling in Wireless Networks with Constant-Size Packets Throughput-Delay Scalig i Wireless Networks with Costat-Size Packets Abbas El Gamal, James Mamme, Balaji Prabhakar, Devavrat Shah Departmets of EE ad CS Staford Uiversity, CA 94305 Email: {abbas, jmamme,

More information

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8)

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8) CIS 11 Data Structures ad Algorithms with Java Fall 017 Big-Oh Notatio Tuesday, September 5 (Make-up Friday, September 8) Learig Goals Review Big-Oh ad lear big/small omega/theta otatios Practice solvig

More information

EFFECT OF QUERY FORMATION ON WEB SEARCH ENGINE RESULTS

EFFECT OF QUERY FORMATION ON WEB SEARCH ENGINE RESULTS Iteratioal Joural o Natural Laguage Computig (IJNLC) Vol. 2, No., February 203 EFFECT OF QUERY FORMATION ON WEB SEARCH ENGINE RESULTS Raj Kishor Bisht ad Ila Pat Bisht 2 Departmet of Computer Sciece &

More information

Lower Bounds for Sorting

Lower Bounds for Sorting Liear Sortig Topics Covered: Lower Bouds for Sortig Coutig Sort Radix Sort Bucket Sort Lower Bouds for Sortig Compariso vs. o-compariso sortig Decisio tree model Worst case lower boud Compariso Sortig

More information

Fast Fourier Transform (FFT) Algorithms

Fast Fourier Transform (FFT) Algorithms Fast Fourier Trasform FFT Algorithms Relatio to the z-trasform elsewhere, ozero, z x z X x [ ] 2 ~ elsewhere,, ~ e j x X x x π j e z z X X π 2 ~ The DFS X represets evely spaced samples of the z- trasform

More information

Hashing Functions Performance in Packet Classification

Hashing Functions Performance in Packet Classification Hashig Fuctios Performace i Packet Classificatio Mahmood Ahmadi ad Stepha Wog Computer Egieerig Laboratory Faculty of Electrical Egieerig, Mathematics ad Computer Sciece Delft Uiversity of Techology {mahmadi,

More information

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem A Improved Shuffled Frog-Leapig Algorithm for Kapsack Problem Zhoufag Li, Ya Zhou, ad Peg Cheg School of Iformatio Sciece ad Egieerig Hea Uiversity of Techology ZhegZhou, Chia lzhf1978@126.com Abstract.

More information

Improving Information Retrieval System Security via an Optimal Maximal Coding Scheme

Improving Information Retrieval System Security via an Optimal Maximal Coding Scheme Improvig Iformatio Retrieval System Security via a Optimal Maximal Codig Scheme Dogyag Log Departmet of Computer Sciece, City Uiversity of Hog Kog, 8 Tat Chee Aveue Kowloo, Hog Kog SAR, PRC dylog@cs.cityu.edu.hk

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

Elementary Educational Computer

Elementary Educational Computer Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified

More information

ECE4050 Data Structures and Algorithms. Lecture 6: Searching

ECE4050 Data Structures and Algorithms. Lecture 6: Searching ECE4050 Data Structures ad Algorithms Lecture 6: Searchig 1 Search Give: Distict keys k 1, k 2,, k ad collectio L of records of the form (k 1, I 1 ), (k 2, I 2 ),, (k, I ) where I j is the iformatio associated

More information

Dynamic Programming and Curve Fitting Based Road Boundary Detection

Dynamic Programming and Curve Fitting Based Road Boundary Detection Dyamic Programmig ad Curve Fittig Based Road Boudary Detectio SHYAM PRASAD ADHIKARI, HYONGSUK KIM, Divisio of Electroics ad Iformatio Egieerig Chobuk Natioal Uiversity 664-4 Ga Deokji-Dog Jeoju-City Jeobuk

More information

Towards Compressing Web Graphs

Towards Compressing Web Graphs Towards Compressig Web Graphs Micah Adler Λ Uiversity of Massachusetts, Amherst Michael Mitzemacher y Harvard Uiversity Abstract We cosider the problem of compressig graphs of the lik structure of the

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations Applied Mathematical Scieces, Vol. 1, 2007, o. 25, 1203-1215 A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045, Oe

More information

Stone Images Retrieval Based on Color Histogram

Stone Images Retrieval Based on Color Histogram Stoe Images Retrieval Based o Color Histogram Qiag Zhao, Jie Yag, Jigyi Yag, Hogxig Liu School of Iformatio Egieerig, Wuha Uiversity of Techology Wuha, Chia Abstract Stoe images color features are chose

More information

Τεχνολογία Λογισμικού

Τεχνολογία Λογισμικού ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών Τεχνολογία Λογισμικού, 7ο/9ο εξάμηνο 2018-2019 Τεχνολογία Λογισμικού Ν.Παπασπύρου, Αν.Καθ. ΣΗΜΜΥ, ickie@softlab.tua,gr

More information

Song Recommendation for Social Singing Community

Song Recommendation for Social Singing Community Sog Recommedatio for Social Sigig Commuity Kuag Mao 1 Ju Fa 2 Lida Shou 1 Gag Che 1 Moha Kakahalli 2 1 College of Computer Sciece, Zhejiag Uiversity, Hagzhou, Chia 2 School of Computig, Natioal Uiversity

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS APPLICATION NOTE PACE175AE BUILT-IN UNCTIONS About This Note This applicatio brief is iteded to explai ad demostrate the use of the special fuctios that are built ito the PACE175AE processor. These powerful

More information

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig

More information

The Magma Database file formats

The Magma Database file formats The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:

More information

Lecture 2: Spectra of Graphs

Lecture 2: Spectra of Graphs Spectral Graph Theory ad Applicatios WS 20/202 Lecture 2: Spectra of Graphs Lecturer: Thomas Sauerwald & He Su Our goal is to use the properties of the adjacecy/laplacia matrix of graphs to first uderstad

More information

arxiv: v2 [cs.ds] 24 Mar 2018

arxiv: v2 [cs.ds] 24 Mar 2018 Similar Elemets ad Metric Labelig o Complete Graphs arxiv:1803.08037v [cs.ds] 4 Mar 018 Pedro F. Felzeszwalb Brow Uiversity Providece, RI, USA pff@brow.edu March 8, 018 We cosider a problem that ivolves

More information

A Note on Least-norm Solution of Global WireWarping

A Note on Least-norm Solution of Global WireWarping A Note o Least-orm Solutio of Global WireWarpig Charlie C. L. Wag Departmet of Mechaical ad Automatio Egieerig The Chiese Uiversity of Hog Kog Shati, N.T., Hog Kog E-mail: cwag@mae.cuhk.edu.hk Abstract

More information

New Fuzzy Color Clustering Algorithm Based on hsl Similarity

New Fuzzy Color Clustering Algorithm Based on hsl Similarity IFSA-EUSFLAT 009 New Fuzzy Color Clusterig Algorithm Based o hsl Similarity Vasile Ptracu Departmet of Iformatics Techology Tarom Compay Bucharest Romaia Email: patrascu.v@gmail.com Abstract I this paper

More information

One advantage that SONAR has over any other music-sequencing product I ve worked

One advantage that SONAR has over any other music-sequencing product I ve worked *gajedra* D:/Thomso_Learig_Projects/Garrigus_163132/z_productio/z_3B2_3D_files/Garrigus_163132_ch17.3d, 14/11/08/16:26:39, 16:26, page: 647 17 CAL 101 Oe advatage that SONAR has over ay other music-sequecig

More information

Solutions for Homework 2

Solutions for Homework 2 Solutios for Homework 2 IIR Book: Exercise.2 (0.5 ) Cosider these documets: Doc breakthrough drug for schizophreia Doc 2 ew schizophreia drug Doc 3 ew approach for treatmet of schizophreia Doc 4 ew hopes

More information

Accuracy Improvement in Camera Calibration

Accuracy Improvement in Camera Calibration Accuracy Improvemet i Camera Calibratio FaJie L Qi Zag ad Reihard Klette CITR, Computer Sciece Departmet The Uiversity of Aucklad Tamaki Campus, Aucklad, New Zealad fli006, qza001@ec.aucklad.ac.z r.klette@aucklad.ac.z

More information

Performance Comparisons of PSO based Clustering

Performance Comparisons of PSO based Clustering Performace Comparisos of PSO based Clusterig Suresh Chadra Satapathy, 2 Guaidhi Pradha, 3 Sabyasachi Pattai, 4 JVR Murthy, 5 PVGD Prasad Reddy Ail Neeruoda Istitute of Techology ad Scieces, Sagivalas,Vishaapatam

More information

Ones Assignment Method for Solving Traveling Salesman Problem

Ones Assignment Method for Solving Traveling Salesman Problem Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:

More information

Performance Plus Software Parameter Definitions

Performance Plus Software Parameter Definitions Performace Plus+ Software Parameter Defiitios/ Performace Plus Software Parameter Defiitios Chapma Techical Note-TG-5 paramete.doc ev-0-03 Performace Plus+ Software Parameter Defiitios/2 Backgroud ad Defiitios

More information

Identification of the Swiss Z24 Highway Bridge by Frequency Domain Decomposition Brincker, Rune; Andersen, P.

Identification of the Swiss Z24 Highway Bridge by Frequency Domain Decomposition Brincker, Rune; Andersen, P. Aalborg Uiversitet Idetificatio of the Swiss Z24 Highway Bridge by Frequecy Domai Decompositio Bricker, Rue; Aderse, P. Published i: Proceedigs of IMAC 2 Publicatio date: 22 Documet Versio Publisher's

More information

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs What are we goig to lear? CSC316-003 Data Structures Aalysis of Algorithms Computer Sciece North Carolia State Uiversity Need to say that some algorithms are better tha others Criteria for evaluatio Structure

More information

Security of Bluetooth: An overview of Bluetooth Security

Security of Bluetooth: An overview of Bluetooth Security Versio 2 Security of Bluetooth: A overview of Bluetooth Security Marjaaa Träskbäck Departmet of Electrical ad Commuicatios Egieerig mtraskba@cc.hut.fi 52655H ABSTRACT The purpose of this paper is to give

More information

Which movie we can suggest to Anne?

Which movie we can suggest to Anne? ECOLE CENTRALE SUPELEC MASTER DSBI DECISION MODELING TUTORIAL COLLABORATIVE FILTERING AS A MODEL OF GROUP DECISION-MAKING You kow that the low-tech way to get recommedatios for products, movies, or etertaiig

More information

VALIDATING DIRECTIONAL EDGE-BASED IMAGE FEATURE REPRESENTATIONS IN FACE RECOGNITION BY SPATIAL CORRELATION-BASED CLUSTERING

VALIDATING DIRECTIONAL EDGE-BASED IMAGE FEATURE REPRESENTATIONS IN FACE RECOGNITION BY SPATIAL CORRELATION-BASED CLUSTERING VALIDATING DIRECTIONAL EDGE-BASED IMAGE FEATURE REPRESENTATIONS IN FACE RECOGNITION BY SPATIAL CORRELATION-BASED CLUSTERING Yasufumi Suzuki ad Tadashi Shibata Departmet of Frotier Iformatics, School of

More information

Unsupervised Discretization Using Kernel Density Estimation

Unsupervised Discretization Using Kernel Density Estimation Usupervised Discretizatio Usig Kerel Desity Estimatio Maregle Biba, Floriaa Esposito, Stefao Ferilli, Nicola Di Mauro, Teresa M.A Basile Departmet of Computer Sciece, Uiversity of Bari Via Oraboa 4, 7025

More information

Clustering and Classifying Diabetic Data Sets Using K-Means Algorithm

Clustering and Classifying Diabetic Data Sets Using K-Means Algorithm Article ca be accessed olie at http://www.publishigidia.com Clusterig ad Classifyig Diabetic Data Sets Usig K-Meas Algorithm M. Kothaiayaki*, P. Thagaraj** Abstract The k-meas algorithm is well kow for

More information

MOTIF XF Extension Owner s Manual

MOTIF XF Extension Owner s Manual MOTIF XF Extesio Ower s Maual Table of Cotets About MOTIF XF Extesio...2 What Extesio ca do...2 Auto settig of Audio Driver... 2 Auto settigs of Remote Device... 2 Project templates with Iput/ Output Bus

More information

Mobile terminal 3D image reconstruction program development based on Android Lin Qinhua

Mobile terminal 3D image reconstruction program development based on Android Lin Qinhua Iteratioal Coferece o Automatio, Mechaical Cotrol ad Computatioal Egieerig (AMCCE 05) Mobile termial 3D image recostructio program developmet based o Adroid Li Qihua Sichua Iformatio Techology College

More information

Social-P2P: An Online Social Network Based P2P File Sharing System

Social-P2P: An Online Social Network Based P2P File Sharing System 1.119/TPDS.214.23592, IEEE Trasactios o Parallel ad Distributed Systems 1 : A Olie Social Network Based P2P File Sharig System Haiyig She*, Seior Member, IEEE, Ze Li, Studet Member, IEEE, Kag Che Abstract

More information

Keywords Software Architecture, Object-oriented metrics, Reliability, Reusability, Coupling evaluator, Cohesion, efficiency

Keywords Software Architecture, Object-oriented metrics, Reliability, Reusability, Coupling evaluator, Cohesion, efficiency Volume 3, Issue 9, September 2013 ISSN: 2277 128X Iteratioal Joural of Advaced Research i Computer Sciece ad Software Egieerig Research Paper Available olie at: www.ijarcsse.com Couplig Evaluator to Ehace

More information

Rapid Frequent Pattern Growth and Possibilistic Fuzzy C-means Algorithms for Improving the User Profiling Personalized Web Page Recommendation System

Rapid Frequent Pattern Growth and Possibilistic Fuzzy C-means Algorithms for Improving the User Profiling Personalized Web Page Recommendation System Received: November 21, 2017 237 Rapid Frequet Patter Growth ad Possibilistic Fuzzy C-meas Algorithms for Improvig the User Profilig Persoalized Web Page Recommedatio System Sipra Sahoo 1 * Bikram Kesari

More information

An Improvement of the Basic El-Gamal Public Key Cryptosystem

An Improvement of the Basic El-Gamal Public Key Cryptosystem Iteratioal Joural of Computer Applicatios Techology ad Research A Improvemet of the Basic El-Gamal Public Key Cryptosystem W.D.M.G.M. Dissaayake (PG/MPhil/2015/09 Departmet of Computer Egieerig Faculty

More information

Improved Random Graph Isomorphism

Improved Random Graph Isomorphism Improved Radom Graph Isomorphism Tomek Czajka Gopal Paduraga Abstract Caoical labelig of a graph cosists of assigig a uique label to each vertex such that the labels are ivariat uder isomorphism. Such

More information

Ontology-based Decision Support System with Analytic Hierarchy Process for Tour Package Selection

Ontology-based Decision Support System with Analytic Hierarchy Process for Tour Package Selection 2017 Asia-Pacific Egieerig ad Techology Coferece (APETC 2017) ISBN: 978-1-60595-443-1 Otology-based Decisio Support System with Aalytic Hierarchy Process for Tour Pacage Selectio Tie-We Sug, Chia-Jug Lee,

More information