Automatic Recommender System over CMS stored content



Erik van Egmond

Bachelor thesis
Credits: 18 EC
Bachelor Opleiding Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park XH Amsterdam

Supervisor
Dr. Evangelos Kanoulas
Institute for Language and Logic
Faculty of Science
University of Amsterdam
Science Park XH Amsterdam

June 25, 2015

Abstract

Visitors browse sites on the World Wide Web looking for content that helps them achieve their task: finding particular information, pages about their interests, or products to buy. Webmasters want visitors to find that content as quickly and easily as possible, which can be achieved by recommending content. When this is done manually, webmasters need to spend a lot of time on the recommendations. This thesis explores several options for automating the process by examining content recommenders offline and in situ. The recommender systems used in this project are a content-based, a visitor-based and a personalized recommender. In online evaluation the content-based recommender performed better than the manually selected recommendations. In offline evaluation the visitor-based recommender was a great improvement over the content-based recommender, and the personalized recommender a small improvement. It can thus be concluded that automated recommender systems can prove useful for webmasters.

Contents

1 Introduction
2 Relevant work
3 Method & Approach
  3.1 Establishing the gold standard
  3.2 Content-based recommendations
  3.3 Visitor-based recommendations
  3.4 Personalized recommendations
4 Results & Evaluation
  4.1 Evaluation methods
    4.1.1 Comparing to the gold standard
    4.1.2 Click Through Rate
    4.1.3 Hypothetical Click Through Rate
    4.1.4 Team-Draft Interleaving
    4.1.5 Personalized recommendations
  4.2 Evaluation of the results
    4.2.1 Performance of the gold standard
    4.2.2 Performance of the content-based recommender
    4.2.3 Performance of the visitor-based recommender
    4.2.4 Performance of the personalized recommender
    4.2.5 Team-Draft Interleaving: Content-Based - Gold Standard
5 Conclusion & Discussion
References

1 Introduction

The World Wide Web is an information system mainly composed of interlinked hypertext documents (called web pages), which are primarily text documents formatted and annotated with the Hypertext Markup Language (HTML). Instead of hand-written and hand-maintained static HTML pages, most websites currently use a Content Management System (CMS), which helps store and manage the content of a website and generates the HTML page when the content is requested. This enables webmasters to maintain a website without experience in building one. By using a CMS, content is deployed to the web and becomes accessible to visitors who come to a site to find specific content matching their interests or information needs. The webmaster can assist visitors by recommending other pages depending on the page they are currently viewing.

Manual recommendations have many shortcomings; in particular, they are inefficient to build. To provide a recommendation, content information and user context are required for all available pages and all visitors of a site. Smaller sites, such as a blog with fewer than a hundred documents, do not encounter this problem. However, many sites have a huge number of pages; for a site with thousands of documents managed by multiple people, it is no longer possible to take all that content into account. Furthermore, hand-picked documents become outdated: as time passes, new content is added to the site that could be more relevant than older documents. If the webmaster wants to account for this, all recommendations have to be updated periodically. Finally, manual recommendations are tailored towards the specific content the webmaster has in mind and are not personalized to the interests of the visitors, so every visitor receives the same recommendations.
As each visitor has different goals on the site, the webmaster cannot ensure that each visitor gets the best possible recommendation.

On the World Wide Web, recommender systems are omnipresent and are encountered on a wide variety of websites. On e-commerce sites such as Amazon and eBay, recommender systems suggest related products with a clear goal: to sell more items. In addition, it is simple to measure whether a suggestion was appreciated: the product is bought. The entertainment industry has had a large influence on recommender systems; sites like Netflix, Spotify and YouTube rely on them to keep visitors consuming their content. Netflix in particular had a big impact with the Netflix Prize between 2006 and 2009, which encouraged research on recommender systems for the benefit of its movie recommendation system. When searching the web with a search engine like Google, Bing or Yahoo, a partially entered query receives several suggestions for what the entire query could be; these are provided by a recommender system.

In this thesis, solutions to the aforementioned shortcomings are explored by automating the recommendation task. To achieve this goal, the following question is examined: how can high quality content recommendations be generated based on available content and user interests within the context of a single site? This question requires the recommendations to be of high quality; to assess this, several evaluation methods are explored. This thesis proposes solutions to this problem by comparing a number of recommender systems and their performance offline and in situ (in the real-world operational setting).

In this research project four methods of recommending pages are proposed, implemented, evaluated, and analysed: (a) manual recommendations composed by a person who manages the site; (b) content-based recommendations generated from document similarity, calculated with the TF-IDF scoring function between the page a visitor is currently looking at and all other available pages; (c) visitor-based recommendations produced on the basis of the behavior and trails of all the visitors coming to the site; and finally (d) personalized recommendations using user profiles to find pages that are of interest to the specific visitor.

Several methods of evaluation are implemented and used to assess the quality of the recommender systems, both offline and online. In the offline evaluation the recommenders are compared to the manual recommendations (the gold standard set) and to each other. Online recommendations are evaluated on the basis of clicks in a within-subject manner. In order to eliminate as much variability as possible from the comparison, the state-of-the-art Team-Draft Interleaving method is used.

In Section 2 this thesis is positioned in the field by examining several related studies. Subsequently several recommender systems are discussed in Section 3. The evaluation of these systems is explained in Section 4.1 and carried out in Section 4.2. Finally the thesis is concluded in Section 5.

2 Relevant work

A considerable number of previous studies have examined how related content can be found for a certain user. These studies cover different approaches; commonly a user profile is combined with a content-based recommender, and the studies differ mainly in the models that are used.

Lops, De Gemmis, and Semeraro (2011) provide a good overview of content-based recommenders. First the content is analyzed, then user profiles are created, and finally a filter uses the user profile to recommend content.
The paper covers two state-of-the-art content-based recommenders: a keyword-based Vector Space Model and semantic analysis using ontologies. For user profiling, probabilistic methods, Naive Bayes and semantic analysis are discussed. These methods provide a good starting point for this research project.

Khribi, Jemni, and Nasraoui (2008) describe how a recommender can be leveraged in an e-learning environment. Initially it creates user and content models offline, based on previously obtained data and the relations between pieces of content. Once the system goes online it can extract the profile of a single user and recommend new content. This approach has problems with a cold start, as data is already required before anything can be recommended. By building a topic model of the documents, recommending can start from the content alone. Such a model can be made using latent Dirichlet allocation (Wilson, Chaudhury, & Lall, 2014).

Temporal changes in interest are an important aspect of recommending content (Blanco-Fernández, López-Nores, Pazos-Arias, & García-Duque, 2011). If a user has recently consumed content on a particular topic, the user might want something different next. Wang and McCallum (2006) discuss an approach that captures the change of topics over time using a non-Markov continuous-time model of topical trends. This could be applied to the documents visited by a user to see how interests change over time. Common shifts in interest found this way can then be used to make recommendations to other users.

The previous works discuss different models for recommending content. Once the models are built and implemented, their quality must be assessed. For this purpose Team-Draft Interleaving was introduced by Radlinski, Kurup, and Joachims (2008); it provides an unbiased method to evaluate the performance of two ranked lists in situ.

3 Method & Approach

The recommender systems that are discussed use a collection of pages, $C$. This collection contains all pages that are available in the CMS. A recommender system is usually a function $\mathrm{Recommender}(d, n) = \{r_1, r_2, \ldots, r_n\}$, where $d$ is a document in $C$, $n$ (optional) is the number of desired recommended pages, and the $r_i$ are pages from $C \setminus \{d\}$: the recommender should not recommend the input page. In the following sections several different recommender systems are discussed: (a) manual recommendations; (b) content-based recommendations; (c) visitor-based recommendations; and finally (d) personalized recommendations, which take the visitor data as input instead of the current page.

3.1 Establishing the gold standard

The gold standard for a $\mathrm{Recommender}(d, n)$ is a manually selected set of pages $\{r_1, r_2, \ldots, r_n\}$. This gold standard is used in the evaluation process. For a page $d$, an expert in the content selects a number of pages from $C \setminus \{d\}$. This results in a data set where, for most documents $d$ in $C$, a number of hand-picked documents exists that are considered to guarantee a good recommendation. To retrieve the manual recommendations only the current page is needed as input: $\mathrm{ManualRecommender}(d) = \{r_1, r_2, \ldots, r_n\}$. The actual number of manual recommendations is decided by the webmaster, so it cannot be set in this function.
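As a minimal sketch of the interface defined above, the snippet below models a manual recommender as a lookup table and an automated recommender as a top-n selection over scores. The function names, page identifiers and scores are hypothetical and only illustrate the contract that a recommender never returns the input page $d$.

```python
from typing import Dict, List


def manual_recommender(d: str, picks: Dict[str, List[str]]) -> List[str]:
    """Return the hand-picked pages for d; the webmaster decides how many."""
    return [r for r in picks.get(d, []) if r != d]


def top_n(d: str, scores: Dict[str, float], n: int = 3) -> List[str]:
    """Return the n highest-scoring pages from C without d itself."""
    candidates = [p for p in scores if p != d]
    # Sort by descending score; ties are broken alphabetically for determinism.
    candidates.sort(key=lambda p: (-scores[p], p))
    return candidates[:n]


# Hypothetical data: page identifiers and scores are made up for illustration.
picks = {"home": ["news", "jobs", "contact"]}
scores = {"home": 1.0, "news": 0.2, "jobs": 0.9, "contact": 0.5}

print(manual_recommender("home", picks))
print(top_n("home", scores, n=2))
```

Any automated recommender in the following sections can be read as a different way of producing the `scores` for a given page.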
Note that the manual recommendations have shortcomings of their own, so they are not necessarily the best recommendations possible.

3.2 Content-based recommendations

When recommending pages the ideal would be to read the mind of the visitor, which is unfortunately impossible. The most direct piece of information available is the content the visitor is currently looking at. The assumption is that a visitor might also be interested in similar content. The content-based recommender uses the content available in the CMS to provide recommendations. It recommends new pages based on their similarity to the document that is currently being viewed, which is defined by the function $\mathrm{ContentRecommender}(d, n) = \{r_1, r_2, \ldots, r_n\}$. To retrieve similar documents TF-IDF is used. The terms used for TF come from all text-based properties of a document, such as the title and body text. The pages with the highest similarity to the current page are recommended to the visitor. The similarity between two documents is calculated using the cosine similarity, which is derived from the angle between the two document vectors. Two documents with similar relative word frequencies have a small angle between them, even if their lengths differ greatly.

3.3 Visitor-based recommendations

Using the content of one page provides only a small insight into what the user might want to read next. To better predict what the current visitor wants, the behavior of all visitors can be analyzed. By analyzing the collective behavior, predictions can be made about what content is related to the current document. Therefore, for the visitor-based recommendations the behavior of all visitors is considered. For each page $d \in C$ a collection is created of all pages that are visited in combination with page $d$. Two pages are visited in combination with each other when they occur in the same session. A session is defined as a set of at least two pages that are visited by one unique visitor with no more than 30 minutes between consecutive requests. Each visitor has a list of pages and timestamps, $[(p_1, t_1), (p_2, t_2), \ldots, (p_n, t_n)]$. Each session is a list of pages where $(t_n - t_{n-1}) < 30$ minutes; a session can last longer than 30 minutes, as long as the visitor keeps requesting pages. To keep bots (visitors that are not human) out of the sessions, the length of a session is limited to 50 requests.

Each page $d \in C$ has a collection of other pages and their number of occurrences. Recommendations can be retrieved from this collection. A straightforward way would be to recommend the page with the highest count, i.e. the page that most visitors visited in combination with the current page. The problem with this is that very general pages have high counts; almost all visitors have visited the home page as well.
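The session rules and co-occurrence collections described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: sessions longer than 50 requests are assumed to be discarded rather than truncated, each pair of pages is counted at most once per session, and all function names and the example log are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=30)   # a gap of 30 minutes or more starts a new session
MIN_LEN, MAX_LEN = 2, 50          # at least two pages; longer than 50 is treated as a bot


def split_sessions(requests):
    """Split one visitor's time-sorted [(page, timestamp), ...] into sessions."""
    sessions, current, last_t = [], [], None
    for page, t in requests:
        if last_t is not None and t - last_t >= MAX_GAP:
            sessions.append(current)
            current = []
        current.append(page)
        last_t = t
    if current:
        sessions.append(current)
    # Assumption: sessions outside the length limits are dropped entirely.
    return [s for s in sessions if MIN_LEN <= len(s) <= MAX_LEN]


def cooccurrence_counts(sessions):
    """For each page d, count in how many sessions every other page co-occurs."""
    counts = defaultdict(Counter)
    for session in sessions:
        pages = set(session)  # assumption: count a pair once per session
        for d in pages:
            for other in pages:
                if other != d:
                    counts[d][other] += 1
    return counts


# Hypothetical request log for a single visitor.
t0 = datetime(2015, 6, 1, 12, 0)
log = [
    ("home", t0),
    ("news", t0 + timedelta(minutes=10)),
    ("jobs", t0 + timedelta(minutes=25)),
    ("home", t0 + timedelta(minutes=120)),
    ("contact", t0 + timedelta(minutes=125)),
    ("faq", t0 + timedelta(minutes=300)),   # single-page session, dropped
]
sessions = split_sessions(log)
counts = cooccurrence_counts(sessions)
print(sessions)
print(dict(counts["home"]))
```

In this toy log the home page already co-occurs with every other page, which is the effect that makes raw counts a poor ranking signal.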
However, recommending the home page is not useful, even if the visitor has not visited it in the session. To prevent this problem TF-IDF is used. In this approach a page together with its collection is treated as a document, and TF-IDF is used to calculate a score for each page in the collection, so that pages that occur in almost every collection, such as the home page, receive a low score. The recommender can then retrieve the pages with the highest similarity score compared to the current page: $\mathrm{VisitorRecommender}(d, n) = \{r_1, r_2, \ldots, r_n\}$.

3.4 Personalized recommendations

Analyzing the collective behavior does not capture individual preferences. A recommendation that is good for one visitor might be uninteresting for another. To improve the individual recommendations the individual behavior must be analyzed. Unlike the previous three recommendation methods, the personalized recommender does not take the current page into account. Instead the known visited pages of the current visitor are considered: for each visitor the visited pages are clustered using k-means. The centroid of a cluster represents the typical distribution of words for documents in that cluster, and this distribution is used to generate a pseudo-document. For these pseudo-documents similar documents from the CMS can be retrieved using the previously discussed content-based recommender.

Pseudo-documents are generated from the distribution rather than by combining all content, because the combination of many documents becomes a very large document that some systems will cut off to save memory. Normally truncation is not a problem, as a document is probably about one subject and the effect on the distribution is therefore acceptable. But if many visited documents are missing after the truncation, the recommender will not work as intended.

The number of clusters is configurable: one cluster results in one pseudo-document for the visited content, while more clusters create a simple form of a topic model. Without any modification, this approach gives the same recommendations on every page as long as the visitor's behavior is the same.

4 Results & Evaluation

4.1 Evaluation methods

Several methods of evaluation are used to analyze the effects of the constructed recommender systems. All recommender systems that use the current page as input are compared to the gold standard. Furthermore, the content-based recommender is evaluated using Team-Draft Interleaving, again by comparison with the gold standard. Besides the comparison with the manual recommendations, the Click Through Rate is analyzed to determine whether the recommendations proved useful. These methods provide some real-world data on the performance of the recommender system. Finally the recommenders are evaluated by a variant of the Click Through Rate. Only pages that occur in a session are considered in the evaluation; it is presumed that if a page is not in a session the visitor had no intention of visiting the site.

4.1.1 Comparing to the gold standard

Both the content-based and the visitor-based recommender systems are evaluated by comparison with the gold standard. For each document $d \in C$ a relatively high number of recommendations (10) is retrieved. Normally ten recommendations would be too many for a user because they clutter the site; the average number of manual recommendations is just over 3.
However, this larger number allows recall and precision to be calculated to a greater depth and gives better insight into the performance of the different systems. Recall is defined by Equation 1 and precision by Equation 2. Precision and recall can be plotted together, with the recall on the x-axis and the precision on the y-axis, as seen in Figure 1. Generally the precision starts high at a low recall and slowly drops while the recall increases. The plot is generated by calculating the precision and recall for one to ten recommendations separately and plotting the results.

$$\mathrm{Recall} = \frac{|\mathrm{ManualRecommender}(d) \cap \mathrm{Recommender}(d)|}{|\mathrm{ManualRecommender}(d)|} \qquad (1)$$

$$\mathrm{Precision} = \frac{|\mathrm{ManualRecommender}(d) \cap \mathrm{Recommender}(d)|}{|\mathrm{Recommender}(d)|} \qquad (2)$$
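Equations 1 and 2 translate directly into set operations. The sketch below computes both measures for the first k recommendations, with k running from one to a maximum depth, which yields the points of a Precision-Recall curve. The function names and the example pages are hypothetical.

```python
def precision_recall(manual, recommended):
    """Precision and recall of a recommendation list against the manual set."""
    manual_set, rec_set = set(manual), set(recommended)
    overlap = len(manual_set & rec_set)
    precision = overlap / len(rec_set) if rec_set else 0.0
    recall = overlap / len(manual_set) if manual_set else 0.0
    return precision, recall


def pr_curve(manual, ranked, max_k=10):
    """(precision, recall) for the top-1, top-2, ..., top-max_k recommendations."""
    return [precision_recall(manual, ranked[:k]) for k in range(1, max_k + 1)]


# Hypothetical example: three manual picks, four ranked automated recommendations.
manual = ["a", "b", "c"]
ranked = ["x", "a", "b", "y"]
for k, (p, r) in enumerate(pr_curve(manual, ranked, max_k=4), start=1):
    print(k, round(p, 2), round(r, 2))
```

In this toy example the top-ranked page is not among the manual picks, so the precision starts at zero and rises before it falls again.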

For all requests with manual recommendations, new recommendations are generated based on the recommendation model. This is done per request rather than per page, so that pages that are visited more often are weighted more heavily. If a page that is visited only once had the same weight as a page that is visited 100 times, the results would not be realistic compared to a real-life setting.

4.1.2 Click Through Rate

Because the manual recommendations themselves might not be optimal, the comparison with the gold standard might not give the best scores to the best recommenders, so other methods of evaluation must be explored. The first of these methods is the Click Through Rate (CTR), which is the percentage of impressions that resulted in at least one click. One impression is one time a set of recommendations is provided to the visitor. A click is determined by a visit to one of the recommended pages after the current page within the same session; so when $p_1$ is recommended and $p_1$ is visited after the current page, a click is registered.

4.1.3 Hypothetical Click Through Rate

As not all recommenders can be evaluated in a live environment with actual visitors, an alternative to the Click Through Rate is proposed. Using a method similar to the one for determining the CTR, a hypothetical CTR (hCTR) can be calculated: instead of actually providing the recommendations to the visitor, recommendations are generated for each page in the request logs that also has manual recommendations, and are evaluated on possible clicks. It is assumed that when a visitor visited a page that would have been suggested, the recommendation would have been a good one. Hence, this method provides a way to evaluate the generated recommendations without the need for a deployment.

4.1.4 Team-Draft Interleaving

The previous methods assume that one recommender provides several pages and that the visitor will pick one of them.
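The click definition used by the CTR and the hCTR can be sketched on logged sessions as follows. This is an illustration under assumptions: `recommend` is a hypothetical stand-in for any recommender, pages for which it returns nothing are skipped (mirroring the restriction to pages that have recommendations), and the example sessions are made up.

```python
def impression_clicked(session, index, recommended):
    """True if a recommended page is visited after position index in the session."""
    later = set(session[index + 1:])
    return any(r in later for r in recommended)


def hypothetical_ctr(sessions, recommend):
    """Share of impressions with at least one (hypothetical) click."""
    impressions = clicks = 0
    for session in sessions:
        for i, page in enumerate(session):
            recs = recommend(page)
            if not recs:
                continue  # no recommendations for this page, so no impression
            impressions += 1
            if impression_clicked(session, i, recs):
                clicks += 1
    return clicks / impressions if impressions else 0.0


# Hypothetical recommendations and sessions.
recs_by_page = {"home": ["news", "jobs"], "news": ["faq"]}
sessions = [
    ["home", "news", "contact"],  # "news" follows "home": one click
    ["news", "home"],
    ["home", "contact"],
]
print(hypothetical_ctr(sessions, lambda page: recs_by_page.get(page, [])))
```

The same counting applies to the online CTR; the only difference is that there the recommendations were actually shown to the visitor.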
Measuring each recommender separately in this way shows how well its pages are received; however, it does not allow recommenders to be compared directly. Even if one recommender performs well, it is unknown whether another recommender can outperform it. To determine which system the user prefers, both systems should be presented at the same time; a state-of-the-art approach for this is Team-Draft Interleaving.

Team-Draft Interleaving is a method that merges two ranked lists based on random coin flips. Each of the two ranked lists is generated by a recommender. Each time a page is clicked, the system that originally recommended this page receives a point. Over time the recommender with the most points has received the most clicks and is perceived as the better recommender of the two.

Suppose there are two generated lists of recommendations, $A = \{p_1, p_2, p_3, p_4\}$ and $B = \{p_4, p_1, p_5, p_6\}$, as seen in Table 1. A simple method of interleaving would be to take a page from A, take a page from B, take a page from A, and so on. If a page was already added, the next page is taken. This would always result in the following list: $L = \{p_1, p_4, p_2, p_5\}$. The problem with this list is that page $A_i$ is always one rank higher in the merged list than page $B_i$, which can result in a bias towards recommender A. To overcome this problem a stochastic variation is introduced. Each round a coin is flipped; if it is heads, list A is merged before B, and if it is tails the reverse. An example sequence of coin tosses would be tails first and heads second, which results in the following merged list: $L = \{p_4, p_1, p_2, p_5\}$. The first item added to the list is $p_4$ from B, followed by $p_1$ from A. As the second coin is heads, $p_2$ is added from A. Now an item from list B must be added; however, the next item is $p_1$, which was added earlier, so the following item from the list is chosen: $p_5$.

List A   List B   Simple Interleave   TDI (TH)   TDI (HT)   TDI (TT)   TDI (HH)
p_1      p_4      p_1                 p_4        p_1        p_4        p_1
p_2      p_1      p_4                 p_1        p_4        p_1        p_4
p_3      p_5      p_2                 p_2        p_5        p_5        p_2
p_4      p_6      p_5                 p_5        p_2        p_2        p_5

Table 1: Example of interleaving with 2 teams

Using this approach page $A_i$ is placed before $B_i$ in 50% of the merges, which cancels the bias towards either one of the lists. With the bias removed from the merged list, the recommenders have to be scored in order to evaluate the results. The results from Team-Draft Interleaving can be scored by simply counting each hit. A click on a link can be counted in several ways: (a) each item is assigned to the team where it ranks highest; (b) each item is assigned to every team where it occurs; (c) each item is assigned to every team where it occurs, and if it occurs in both it counts as half. After evaluation the recommender that is able to recommend the most pages should have the highest score, hence option (b) is chosen.
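The round-based merge and scoring option (b) can be sketched as below. The two example lists are assumed to be those in the List A and List B columns of Table 1, and the coin flips are passed in explicitly so that the example is reproducible (in a live setting they would be drawn at random). The function names are hypothetical, and counting a click for every recommender whose original list contains the page is one reading of option (b), not necessarily the thesis implementation.

```python
def team_draft_interleave(list_a, list_b, flips, length=4):
    """Merge two rankings; per round the coin decides which list picks first."""
    merged, team_of = [], {}

    def pick(ranking, team):
        # Add the highest-ranked page of this list that is not yet in the merge.
        for page in ranking:
            if page not in team_of:
                merged.append(page)
                team_of[page] = team
                return

    for flip in flips:  # "H": A picks first this round, "T": B picks first
        if flip == "H":
            order = (("A", list_a), ("B", list_b))
        else:
            order = (("B", list_b), ("A", list_a))
        for team, ranking in order:
            if len(merged) < length:
                pick(ranking, team)
    return merged, team_of


def score_clicks(clicked, list_a, list_b):
    """Option (b): a click counts for every recommender that listed the page."""
    points = {"A": 0, "B": 0}
    for page in clicked:
        if page in list_a:
            points["A"] += 1
        if page in list_b:
            points["B"] += 1
    return points


A = ["p1", "p2", "p3", "p4"]
B = ["p4", "p1", "p5", "p6"]
print(team_draft_interleave(A, B, "TH")[0])
print(team_draft_interleave(A, B, "HT")[0])
print(score_clicks(["p4", "p2"], A, B))
```

With these inputs the TH and HT merges reproduce the corresponding columns of Table 1.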
Option (c) is not chosen because it devalues pages that are recommended by both recommenders; that distinction is not relevant in this case.

4.1.5 Personalized recommendations

Evaluating the personalized recommender is a challenge, as in the current scope of the project there is no option to test the recommender on actual visitors. Therefore the recommender has to be evaluated on data that has been gathered before. The evaluation verifies whether the recommender can recommend documents that the visitor would have visited on their own. If a visitor visits such a page, the recommender has a hit, otherwise a miss. To verify this, all sessions of all visitors are gathered. The training set consists of the first 70% of the sessions of each user; the test set is the remainder of the sessions. For each visitor a list of recommended pages is generated using the first 70% of the sessions. Subsequently the recommendations are verified using the remaining 30%. For each recommended page a hit is counted if that page was visited in the remaining 30% of the sessions, and a miss is recorded if it was not. In the evaluation of the personalized recommender only visitors with more than 10 requests are considered, so that there is enough data to create the topic models for the user.
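The hold-out procedure above can be sketched as follows. The recommended pages in the example are given directly, because the sketch only illustrates the split and the hit and miss counting, not the k-means recommender itself. The function names and data are hypothetical, and rounding the 70% boundary down is an assumption.

```python
def chronological_split(sessions, train_percent=70):
    """Split one visitor's time-ordered sessions into a training and a test part."""
    cut = len(sessions) * train_percent // 100  # assumption: round down
    return sessions[:cut], sessions[cut:]


def hits_and_misses(recommended, test_sessions):
    """A hit is a recommended page that the visitor visited in the test sessions."""
    visited = {page for session in test_sessions for page in session}
    hits = sum(1 for page in recommended if page in visited)
    return hits, len(recommended) - hits


# Hypothetical visitor with ten sessions; the last three form the test set.
sessions = [["home", "news"]] * 7 + [["jobs", "faq"], ["news", "team"], ["contact", "home"]]
train, test = chronological_split(sessions)

# Hypothetical recommendations that a recommender produced from the training part.
recommended = ["jobs", "team", "press"]
print(len(train), len(test))
print(hits_and_misses(recommended, test))
```

A recommender under test would be called on `train` only, so that the pages in `test` remain unseen when the recommendations are generated.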

4.2 Evaluation of the results

The recommenders examined in this thesis are evaluated following the methods discussed in Section 4.1. For all automated recommenders three recommendations per page are used to calculate the recall of the manual recommendations, the hCTRs and the CTRs, as this is approximately the average number of recommendations provided by editors. Table 2 gives a quick overview of the results; the remainder of this section goes into more depth. As far as applicable, each recommender is evaluated on the comparison against the gold standard (Section 4.1.1), the hypothetical Click Through Rate (Section 4.1.3) and the Click Through Rate (Section 4.1.2). The personalized recommender has a different approach to calculating the CTR, as described in Section 4.1.5. Finally Team-Draft Interleaving is used to compare the content-based recommender directly with the manual recommendations, see Section 4.1.4.

Recommender                 Recall of manual recommendations   Hypothetical Click Through Rate   Click Through Rate online
Manual                      100%                               -                                 14.33%
Content-based               11.2%                              5.1%                              21%
Visitor-based               30.3%                              44.7%
Personalized (1 cluster)                                       9.6%
Personalized (2 clusters)                                      13.9%
Personalized (3 clusters)                                      11.7%
Personalized (4 clusters)                                      11.7%
Personalized (5 clusters)                                      12.3%

Table 2: Summary of the results

4.2.1 Performance of the gold standard

Initially the performance of the gold standard must be established. This performance can later be used as a baseline for the automated recommendations. During a period of one month there were 5340 impressions with a total of pages recommended, that is an average of 3.3 recommendations per page with a recommendation block. This average is used as the default number of recommendations for the other methods. During this period 765 impressions resulted in at least one click, which is 14.33% of the impressions.
In total 966 recommendations were clicked, a click through rate of 5.5%.

4.2.2 Performance of the content-based recommender

The content-based recommender is evaluated both online and offline, which makes it possible to compare online evaluation with offline evaluation.

Comparison against the gold standard: First the content-based recommendations are compared to the manual recommendations. There are 5226 pages with recommendations with a total of recommended pages, of which 1631 pages are in the manually selected pages as well; this is a recall of 11.2%. Using the precision and recall a Precision-Recall curve can be plotted, see Figure 1. This plot shows the curve for the visitor-based recommender as well. The curve for the content-based recommender starts with an increasing slope, because the first pages recommended are usually not among the manually recommended pages, which results in a low precision.

Hypothetical Click Through Rate: For each request with manual recommendations the content-based recommender provides the automated recommendations. There were 4868 such requests; for 247 of those a recommended page was later visited, an hCTR of 5.1%.

Online Click Through Rate: Over a period of 6 days the content-based recommender made 1736 impressions, of which 369 resulted in at least one click; a click through rate of 21%.

Online, the content-based recommender reaches a high CTR of 21%, which is an improvement over the manual recommendations. The recall of the manual recommendations of 11.2% suggests that the manual recommendations might not be the best way to suggest content, as only a small share of them were also recommended by the content-based recommender.

4.2.3 Performance of the visitor-based recommender

Because the visitor-based recommender is trained and tested on the same request data, the data set is split into a training set of 70% and a test set of 30%. The data consists of sessions, as pages are recommended based on their occurrence in the same session. The test set contains 2980 sessions. Using the training set a recommendation model is created following the method described in Section 3.3. Once the model is generated, for each session in the test set the pages that should have recommendations are located, and recommendations for them are generated using the model.

Comparison against the gold standard: There are 5227 requests that had manual recommendations, with a total of recommended pages. For these requests new recommendations are generated based on the recommendation model. Over all impressions, 4092 of the generated recommendations were the same as the manually selected pages, which is a recall of 30.3%. Using the precision and recall a Precision-Recall curve can be plotted, see Figure 1.
This plot shows the curve for the content-based recommender as well.

Hypothetical Click Through Rate: Recommendations are generated for 1405 impressions, of which 628 received at least one click; an hCTR of 44.7%.

Online Click Through Rate: This recommender was not deployed to a live environment.

As the visitor-based recommender was not deployed, only results from offline evaluation are available; these show a very high hypothetical click through rate of 44.7%. The recall of the gold standard was an improvement over the content-based recommender as well. It should be noted that although the hCTR is high, the CTR after deployment will probably be significantly lower; even if the recommendations are perfect, not all visitors will follow them. In Figure 1 the visitor-based curve lies above the content-based curve, indicating a better performance. This is in accordance with the result when only 3 pages are recommended.

Figure 1: Precision-Recall plots for the content- and visitor-based recommenders and their ability to mimic the gold standard

4.2.4 Performance of the personalized recommender

For the personalized evaluation 366 visitors who have more than 10 requests are analyzed. For each of these visitors five topic models, consisting of one to five clusters, are created and used to recommend pages as described in Section 3.4. Subsequently the recommendations are evaluated by determining which of the pages are actually visited in the test set.

Comparison against the gold standard: The personalized recommender provides recommendations based on previous behaviour of the visitor; the current page is not used to provide recommendations. Hence, it does not make sense to compare it to the gold standard.

Hypothetical Click Through Rate: Five different settings are evaluated for the personalized recommendations. The average hCTR is 11.9%; more details can be found in Table 3. The topic model with one cluster scores lower than the others, which are quite close to each other; this might indicate that the previously visited documents usually cover only 2 topics.

Online Click Through Rate: The personalized recommender was not deployed.

Although the hCTR was the only measure used, the results seem to position the performance of this recommender slightly above the content-based recommender.

Table 3: Results for the personalized recommender (columns: K, Impressions, Impressions followed, CTR)

4.2.5 Team-Draft Interleaving: Content-Based - Gold Standard

Lastly the content-based recommender is compared to the gold standard using Team-Draft Interleaving. In a period of 14 days there were 1723 impressions of the recommendations, in which 6104 pages were recommended by the content-based recommender and 4692 pages came from the gold standard, with a total of 8764 pages, as pages can belong to both teams. The large difference between the number of pages from the content-based recommender and from the gold standard is due to the fact that not all pages with recommendations have pages from the gold standard.

After the trial the content-based recommender had won 61% of the impressions and the manual recommendations 39%. Of the recommended pages, 286 were followed from the content-based recommender, 247 from the gold standard and 354 in total; this results in click through rates of 4.7% for the content-based recommender and 5.3% for the gold standard. Most impressions were won by the content-based recommender, which also had more points overall. This suggests that the content-based recommendations are better than the manually selected recommendations, which is in line with the results from the separate trials.

5 Conclusion & Discussion

As seen in the results of the manually selected recommendations, the impact of such systems can be really positive, which underlines the importance of good recommendations and the benefit of automating the process. This automation has been explored and evaluated using several methods.

Firstly the content-based recommender was tested and evaluated, and it yielded promising results. Both in the separate trial and in the Team-Draft Interleaving trial the content-based recommender performed better than the manually selected recommendations. However, in the offline evaluation the recommender performs much worse. This might be because the offline evaluation almost only registers clicks for content that is manually recommended, of which the recall is only 11.2%.

The visitor-based recommender performs really well in the offline evaluation, with a recall of the manually selected recommendations of 30.3% and an hCTR of 44.7%. A reason for this extreme offline performance could be that the hCTR counts every later visit to a recommended page as a click, whereas real visitors might not click all good recommendations.
Very high click-through rates are not attainable in practice, so the actual rates are expected to be lower if the visitor-based recommender were deployed in situ. Regardless, its offline performance was an improvement over the content-based recommender, which indicates that it will be an improvement online as well.

Finally, the personalized recommender was tested and evaluated. In this case only the hCTR was calculated, which was higher than that of the content-based recommender, although not as high as that of the visitor-based recommender.

Following the results of the examined recommender systems, it can be concluded that high-quality content recommendations can indeed be generated using an automated recommender. Both the use of the available content and the use of visitor interest show promising results. A possible improvement for more personalized recommendations would be to shift more toward the visitor-based recommender by clustering similar users and using their collective behaviour to recommend content, instead of the collective behaviour of all visitors or of only one visitor.

To strengthen the conclusions that can be drawn from the offline evaluation, the other recommenders, visitor-based and personalized, should also be operated online. This will provide insight into how hCTRs translate to CTRs. The timespan should also be increased from a few days to a few weeks: the number of visitors tends to fluctuate over time, and the trials should last longer to capture reliable measurements.

The results of this project are likely to be replicable in other settings. Although not all sites rely heavily on written content, for example e-commerce or entertainment sites, the visitor-based recommender can be used in any setting that generates enough web traffic.
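For readers who want to replicate the interleaving trial in another setting, the following is a minimal sketch of Team-Draft Interleaving in its commonly described form. It is not the implementation used in this project. In this form each shown page is assigned to exactly one team, whereas in the trial reported here a page could belong to both teams. Unlike the usual description, the sketch also keeps filling the list from one ranking when the other is exhausted. All function names and page identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def team_draft_interleave(
    ranking_a: Sequence[str],
    ranking_b: Sequence[str],
    length: int,
    rng: random.Random,
) -> Tuple[List[str], Set[str], Set[str]]:
    """Interleave two rankings; return the shown list and each team's pages."""
    interleaved: List[str] = []
    team_a: Set[str] = set()
    team_b: Set[str] = set()
    while len(interleaved) < length:
        # Highest-ranked page of each recommender that is not yet shown.
        next_a = next((p for p in ranking_a if p not in interleaved), None)
        next_b = next((p for p in ranking_b if p not in interleaved), None)
        if next_a is None and next_b is None:
            break
        # The team with fewer picks goes first; a coin flip breaks ties.
        a_turn = len(team_a) < len(team_b) or (
            len(team_a) == len(team_b) and rng.random() < 0.5
        )
        if next_b is None or (a_turn and next_a is not None):
            interleaved.append(next_a)
            team_a.add(next_a)
        else:
            interleaved.append(next_b)
            team_b.add(next_b)
    return interleaved, team_a, team_b


def impression_winner(
    clicked: Sequence[str], team_a: Set[str], team_b: Set[str]
) -> str:
    """Credit each click to the team that contributed the page."""
    clicks_a = sum(1 for page in clicked if page in team_a)
    clicks_b = sum(1 for page in clicked if page in team_b)
    if clicks_a == clicks_b:
        return "tie"
    return "A" if clicks_a > clicks_b else "B"


# Toy example: A stands for the content-based list, B for the manual list.
content_based = ["p1", "p2", "p3", "p4"]
golden = ["p2", "p5", "p1"]
shown, team_cb, team_gs = team_draft_interleave(
    content_based, golden, 4, random.Random(0)
)
print(shown, impression_winner(["p5"], team_cb, team_gs))
```

The share of impressions won by each team is then the number of impressions with that winner divided by the number of impressions, which is the kind of figure reported as 61% against 39% in the trial above.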



More information

The ebuilders Guide to selecting a Web Designer

The ebuilders Guide to selecting a Web Designer The ebuilders Guide to selecting a Web Designer With the following short guide we hope to give you and your business a better grasp of how to select a web designer. We also include a short explanation

More information

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Abstract We intend to show that leveraging semantic features can improve precision and recall of query results in information

More information

EBOOK. On-Site SEO Made MSPeasy Everything you need to know about Onsite SEO

EBOOK. On-Site SEO Made MSPeasy Everything you need to know about Onsite SEO EBOOK On-Site SEO Made MSPeasy Everything you need to know about Onsite SEO K SEO easy ut Onsite SEO What is SEO & How is it Used? SEO stands for Search Engine Optimisation. The idea of SEO is to improve

More information