MovieRec - CS 410 Project Report

Team:
Pattanee Chutipongpattanakul - chutipo2
Swapnil Shah - sshah219

Abstract

MovieRec is a movie search engine that lets users search for any type of movie they want to watch without prior knowledge of films. Rather than relying on pre-existing categories as most websites do, MovieRec relies entirely on the reviews that regular moviegoers voted most helpful on popular sites such as IMDb and Amazon, which is intended to make the results less biased. Users can search with any keyword that describes the kind of movie they want to watch and give feedback on the results to help improve them. In addition, after finding relevant results, the user can click the name of a movie to go to its Amazon Instant Video page, where anyone can watch the trailer or buy or rent the movie.

Introduction

The idea for MovieRec comes from the everyday inconveniences users face with the available movie search engines. Everyone loves watching movies, yet the popular movie search engines offer very limited functionality. IMDb, the most popular movie database, only lets users search by movie title. But what if the user just wants to search for a type of movie, for example a fairytale, or a movie with a twist ending? Existing movie recommendation engines do not give reliable results for such queries, because most of them organize movies into predefined categories such as genres. However, these websites also host many user reviews for each movie, sorted by helpfulness. The most helpful reviews are the ones that most of the general population agrees with. We decided to use them as our dataset because in their reviews people discuss what they liked or didn't like about a movie, spoilers, and important plot points. Using the most helpful reviews lets us find useful keywords describing a movie that cannot be predefined.

Goals and Challenges

MovieRec takes reviews for each movie from multiple websites, such as Amazon and IMDb, and stores them as documents in text files. The system takes queries from the user and, depending on the mode the project is running in, offers a relevance feedback mechanism.

We faced many challenges throughout the project; each phase had problems of its own.

Populating the database: Our first challenge was to find an efficient way to crawl for movies while keeping our database as broad as possible. The initial plan was to crawl the review pages, extract both the reviews and the ratings, and weight the ratings according to the usefulness of the review. This is where we encountered our first problem: the structure of the website we chose made it difficult to extract specific elements such as ratings, since the Ruby crawler we used extracted only the inner text of the page. We did manage to extract the most useful reviews, but we had to adjust our plans for weighting the ratings and for the score band described in the proposal. Another challenge was combining the crawled data from multiple websites without merging the text files manually. After crawling we had two sets of reviews per movie, which is inconvenient because the document structure provided by Lucene requires one text file per document at indexing time. After some effort we managed to populate our database and moved on to our next goal.

Implementing search and ranking function: Before turning to assignment 3 we researched Solr and decided against using it, choosing to build on the resources we already had instead. Implementing the search engine and ranking function went smoothly because we reused the code from assignment 3. We analyzed what each method does and tried modifying it; dissecting the assignment took time, but it helped us understand the toolkit better. We also wanted to contribute something of our own, so we set out to create our own ranking function. This was the challenge: the function used in the assignment was BM25, and we experimented with other known functions, simple TF-IDF and BM25L.
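To make the difference between the ranking functions mentioned here concrete, the following is a minimal Python sketch of BM25 and its BM25L variant. It is not the project's Java code from assignment 3: the naive tokenizer, the parameter defaults (`k1=1.2`, `b=0.75`, `delta=0.5`), the IDF form, and the sample reviews are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def score_documents(query, docs, k1=1.2, b=0.75, delta=0.0):
    """Return one relevance score per document.

    delta=0.0 gives plain BM25; delta>0 gives the BM25L variant, which adds
    a constant shift to the length-normalized term frequency.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        length_norm = 1 - b + b * len(toks) / avg_len
        score = 0.0
        for term in set(tokenize(query)):
            if tf[term] == 0:
                continue
            idf = math.log((n_docs + 1) / (df[term] + 0.5))
            c = tf[term] / length_norm  # length-normalized term frequency
            score += idf * ((k1 + 1) * (c + delta)) / (k1 + c + delta)
        scores.append(score)
    return scores


# Hypothetical review snippets, one per movie.
reviews = [
    "a fairytale with a dragon and a princess",
    "gritty cops drama with a twisted ending",
    "the dragon guards the gold in this long fantasy epic about a dragon",
]
bm25 = score_documents("dragon", reviews)
bm25l = score_documents("dragon", reviews, delta=0.5)
```

With `delta=0.0` the function reduces to plain BM25. A positive `delta` raises the score of every matching document, and the shift matters most for long documents whose normalized term frequency is small.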
After studying multiple functions we tried combining them, which is how we drafted our own function. The results were less than satisfactory, but we used the function anyway, since we plan to build on it as we develop the project further.

Implementing relevance feedback: As mentioned in the proposal, we planned a feature that would differentiate our search engine from the existing projects we came across: adjusting the query according to user feedback so that better results can be returned. We planned three feedback options: helpful, not helpful, and watched. We wanted the third option so that movies the user has already watched are not deemed irrelevant to the query. Our research showed that Lucene does not offer relevance feedback, which presented another challenge. We searched for Lucene-related toolkits and found LucQE, a query expansion toolkit based on Rocchio. We tried to incorporate it into the search engine without success; we returned to LucQE several times during development, but we kept running into bugs and have not been able to use the toolkit so far. For the time being we implemented feedback with Java data structures that keep track of the queries and document scores, find the most frequent words in the results deemed useful, and use those words for query expansion. This had limited success.

Combining the user interface and the search engine: We faced multiple challenges right off the bat, since we simply could not get the interface up and running on our computers. The EWS machines didn't allow us to download certain packages the interface needed, so we ended up installing Linux on our own systems and trying there. There was a lot of confusion about which Apache Java version to use, because the project wasn't compiling. In addition, it wasn't exporting the JAVAPATH correctly. After a lot of research on where to add the JAVAPATH, and some changes to the pom file, the basic interface started running on localhost. Everything went smoothly after that. After researching how to combine the two, we designed our own user interface and joined it with the search engine.

Work Summarization

We built MovieRec as a standard text retrieval engine. Instead of a conventional database search that categorizes movies by tags, as other sites use, we use crawled movie reviews as our dataset and do not tag our movies, which removes the limitation of deciding which category a movie fits into. We used available resources and toolkits, modified parts of them, and researched more efficient implementations. The search results from our site are not as accurate as we hoped, but we plan to expand our dataset by crawling more reviews and to develop our ranking function further to make the returned results more accurate.

Related Work

We are fairly confident that there are websites offering functions similar to ours; however, to the best of our knowledge, none of the existing projects uses feedback from users for query expansion.
Furthermore, we do not know of any website that bases its search on reviews from the general public, which enables searching by the content of a movie instead of general genre filtering or conventional tagging. We did try researching websites that might be similar to our work.

Jinni

Jinni is one of the most popular and highly recommended movie recommendation services that we surveyed on the web. It returned the most relevant results. However, those results were still based on the predefined tags assigned to each movie, so we often got completely irrelevant results for some of the simplest queries. For queries that match no tag, we found that Jinni only searches movie names and returns the movies whose names contain the query. We find this ineffective, because more often than not a movie's name says little about its content.

Suggestmemovie

Suggestmemovie is another popular movie recommendation website. It has very useful filtering features. However, it fails for many queries such as "twisted plot", "gory", or "fairytale": the results were very poor and sometimes nonexistent. When we entered queries to find movies with a certain theme, the results were not accurate, and the movies returned were usually unrelated to that theme. The search and recommendation system fails more often than it succeeds.

IMDb

IMDb is the most popular movie database website on the internet, but its movie search relies entirely on movie titles. However, the website lets users write reviews for movies and upvote or downvote them, and it even collects the most useful reviews on one page. We took advantage of this: to make our database as rich and accurate as possible, we used only the IMDb reviews that the general public deemed most useful.

Methods

Populating database

We listed the movies we wanted in our database and obtained the URLs manually. We used the Ruby crawler and the PhantomJS text scraper to extract the text from those pages.
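Because Lucene is given one text file per movie, reviews crawled from different sites have to be combined by title before indexing. A minimal Python sketch of that merge step follows. It is an illustration rather than the project's actual script: the record layout `(site, title, text)`, the `normalize_title` helper, and the sample reviews are hypothetical.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lowercase and drop punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_reviews(crawled):
    """Group (site, title, text) records into one combined text per movie."""
    merged = defaultdict(list)
    for site, title, text in crawled:
        merged[normalize_title(title)].append(text.strip())
    # One document per movie: reviews from every site, blank-line separated.
    return {title: "\n\n".join(texts) for title, texts in merged.items()}


# Hypothetical crawl output from two sites.
crawled = [
    ("imdb", "How to Train Your Dragon", "Great flying scenes."),
    ("amazon", "How To Train Your Dragon!", "My kids loved the dragon."),
    ("imdb", "The Sixth Sense", "A twist nobody saw coming."),
]
documents = merge_reviews(crawled)
```

Each value in `documents` could then be written to its own text file and indexed as a single document.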
We decided to crawl reviews from multiple websites because we believe that a good description of a movie and a fair dataset require reviews from the general public, Amazon customers, and critics. Only then will the dataset represent the movies fairly enough that the keywords a user searches for, when related to a movie's content, are found in its reviews. The crawled text files are then processed by merging the files with similar titles into one.

Search Engine

We initially planned to use Solr, which we researched heavily; however, during implementation we ran into problems importing the packages, so we decided to use Lucene for the time being. MovieRec therefore uses a backend search engine implemented with Lucene. The text files were indexed into document objects, a class provided by Lucene. Each document stores a name, a URL, and the contents. We modified the program so that the URL does not lead to the page the review was crawled from but to the movie's instant video page on Amazon.com, so that the user can directly buy or rent a movie from the results. The search uses our ranking function and returns the results as an array list of ResultDocs. The result snippets are taken from the most highlighted portion of the review; only a few lines of the review containing the query words are displayed in our interface.

Our ranking function

Relevance feedback and Query expansion

We researched implementations of relevance feedback in Lucene. We studied LucQE and the module it provides (Rocchio Query Expansion) and decided to use it. After a failed attempt, we wrote a Java program that takes the contents of the documents, finds the words with the highest frequencies, and performs normalization and smoothing with a general language model built from all the documents. We use this when the user runs the interactive search in the terminal: the program asks whether each document is relevant, and if it is, the topic language model obtained from the program is used together with the original query to expand the search. We also tried storing the queries in a data structure, together with the documents the user deemed relevant, and keeping scores according to the feedback. Our relevance feedback is far from complete; further work will include going back and trying again to implement LucQE's query expansion module.

Interface

After connecting the back end with the sample interface provided in assignment 3, we designed our own interface.
Initially we planned a feature in which the picture beside each result would change according to the relevance feedback given. However, we could not carry out that plan, since our relevance feedback and query expansion are not yet complete. After finishing the first version of our web interface, we joined it with the search engine backend, yielding MovieRec as shown in the accompanying screenshot.
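The query-expansion procedure described under Relevance feedback and Query expansion can be sketched as follows. This is a simplified Python illustration, not the authors' Java program: the interpolation weight `lam`, the choice to rank words by the ratio of smoothed topic probability to collection probability, and the toy documents are assumptions made for the example. It also assumes the relevant documents are a subset of the collection, so every topic word has a nonzero collection probability.

```python
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram model: p(w) = count(w) / total tokens."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def expansion_terms(query, relevant_docs, all_docs, lam=0.5, k=3):
    """Pick k expansion terms from documents the user marked relevant."""
    topic = unigram_model(relevant_docs)
    background = unigram_model(all_docs)
    query_words = set(query.lower().split())
    scored = {}
    for word, p_topic in topic.items():
        if word in query_words:
            continue  # do not re-add words already in the query
        # Interpolate the topic model with the collection model (smoothing).
        smoothed = (1 - lam) * p_topic + lam * background[word]
        # A ratio above 1 means the word is more typical of the relevant
        # documents than of the collection as a whole.
        scored[word] = smoothed / background[word]
    # Highest ratio first; ties are broken alphabetically.
    return sorted(scored, key=lambda w: (-scored[w], w))[:k]


# Toy collection; the first two documents were marked relevant by the user.
all_docs = [
    "dragon fire dragon knight",
    "dragon egg knight quest",
    "cops chase quest city",
    "city cops drama knight",
]
terms = expansion_terms("dragon", all_docs[:2], all_docs)
expanded_query = "dragon " + " ".join(terms)
```

Here `terms` is `["egg", "fire", "knight"]`: words that appear only in the relevant documents rank above words that are equally common across the whole collection.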

Usage and Evaluation

The main function of MovieRec is that the words searched actually match the content of a movie or accurately describe it. It does not only match the query words against the title; it also finds those words in the reviews that were deemed most useful. We test the website with a few example queries. Feedback was not implemented in the user interface; however, it was used in the terminal interactive search.

For the above example, we entered the query "dragon" in MovieRec and Jinni. Our search engine returns movies that have dragons in them, such as How to Train Your Dragon, The Hobbit: The Desolation of Smaug, and Shrek, while most of Jinni's results have the word "dragon" in the title, such as Green Dragon and The Girl with the Dragon Tattoo. In the screenshot below, we searched for the keyword "twisted plot". MovieRec returns some of the most relevant movies with twisted plots or endings, such as Now You See Me, The Usual Suspects, and The Sixth Sense, which is quite useful, while Suggestmemovie returns nothing for the same query.

While MovieRec returns highly relevant results most of the time, the results sometimes suffer because user reviews can be very ambiguous. When we searched for the word "magic", as seen in the accompanying screenshot, one of the movies returned was Jurassic Park, because of the ambiguous nature of natural language. That is the reason we developed the feedback system: so that we can improve our search results for such queries.
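One way to act on such feedback, for instance a user marking Jurassic Park as not helpful for the query "magic", is to store judgments per query and adjust later rankings. The sketch below is a hypothetical Python illustration of the three feedback options named earlier (helpful, not helpful, watched). The class name `FeedbackStore`, the fixed `boost`, the decision to hide watched movies, and the scores are assumptions, not the project's implementation.

```python
from collections import defaultdict

HELPFUL, NOT_HELPFUL, WATCHED = "helpful", "not helpful", "watched"


class FeedbackStore:
    """Remember per-query judgments and use them to re-rank later results."""

    def __init__(self, boost=1.0):
        self.boost = boost
        # query -> movie title -> latest judgment
        self.judgments = defaultdict(dict)

    def record(self, query, title, judgment):
        self.judgments[query.lower()][title] = judgment

    def rerank(self, query, results):
        """results: list of (title, score) pairs from the search engine."""
        seen = self.judgments.get(query.lower(), {})
        adjusted = []
        for title, score in results:
            judgment = seen.get(title)
            if judgment == WATCHED:
                continue  # hide it, but do not count it as irrelevant
            if judgment == HELPFUL:
                score += self.boost
            elif judgment == NOT_HELPFUL:
                score -= self.boost
            adjusted.append((title, score))
        return sorted(adjusted, key=lambda pair: -pair[1])


store = FeedbackStore()
store.record("magic", "Jurassic Park", NOT_HELPFUL)
store.record("magic", "Now You See Me", HELPFUL)
store.record("magic", "Shrek", WATCHED)

# Hypothetical scores returned by the search engine for the query "magic".
results = [
    ("Jurassic Park", 2.1),
    ("Now You See Me", 1.9),
    ("Shrek", 1.5),
    ("The Sixth Sense", 1.2),
]
reranked = store.rerank("magic", results)
titles = [title for title, _ in reranked]
```

After re-ranking, Now You See Me moves to the top, Jurassic Park drops below The Sixth Sense, and Shrek is left out without being penalized.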

Conclusions and Future works

We ended up diverging from many of our initial plans because of technical difficulties; however, the final product works as we intended. MovieRec is a movie search engine that does not restrict users to the traditional keywords that mostly describe themes, moods, genres, or other filters. It lets the user search by any keyword they can think of, from broad keywords like "gory" to narrow keywords like "cops" or "womanizers", or even names of movie characters or particular elements of a movie. We implemented and tweaked crawling, indexing, searching, and ranking using knowledge gained from class and from the assignments. Our project has flaws, and we intend to address them in further iterations.

Fixing limitations

We can go back and finish a proper implementation of relevance feedback and integrate it into the frontend of the application. We can also improve the accuracy of our ranking function. We can attempt to overcome the limitations we ran into when crawling ratings, implement the rating weighting, and display the range of ratings alongside the movies returned.

Expansions

We can further expand our database by crawling reviews from more websites, and even by allowing users to add their own movie reviews to our dataset. Lastly, we could add functions that summarize the reviews and remove redundant reviews from the dataset.

Individual Contributions

Swapnil Shah (sshah219)
- Research on Solr
- Crawling and populating the database
- Modification of text files
- Frontend user interface and backend search engine connection
- Ranking function
- System setup and installation work

Pattanee Chutipongpattanakul (chutipo2)
- Research on LucQE
- Crawling and populating the database
- Query expansion and relevance feedback using Java data structures
- Java program to process reviews using unigram language models to find topic models
- Smoothing topic models
- Ranking function
- Web application (user interface frontend)


Getting started with Inspirometer A basic guide to managing feedback Getting started with Inspirometer A basic guide to managing feedback W elcome! Inspirometer is a new tool for gathering spontaneous feedback from our customers and colleagues in order that we can improve

More information

In 4 Months. How to Turn Your Ecommerce Business Around A VERTICAL RAIL CASE STUDY

In 4 Months. How to Turn Your Ecommerce Business Around A VERTICAL RAIL CASE STUDY A VERTICAL RAIL CASE STUDY How to Turn Your Ecommerce Business Around In 4 Months. SockSmith.com is a local retail institution, specializing in socks, that is located throughout California. They pride

More information

GOOGLE ANALYTICS 101 INCREASE TRAFFIC AND PROFITS WITH GOOGLE ANALYTICS

GOOGLE ANALYTICS 101 INCREASE TRAFFIC AND PROFITS WITH GOOGLE ANALYTICS GOOGLE ANALYTICS 101 INCREASE TRAFFIC AND PROFITS WITH GOOGLE ANALYTICS page 2 page 3 Copyright All rights reserved worldwide. YOUR RIGHTS: This book is restricted to your personal use only. It does not

More information

Atlassian JIRA Introduction to JIRA Issue and Project Tracking Software Tutorial 1

Atlassian JIRA Introduction to JIRA Issue and Project Tracking Software Tutorial 1 Atlassian JIRA Introduction to JIRA Issue and Project Tracking Software Tutorial 1 Once again, we are back with another tool tutorial. This time it s the Issue and Project Tracking Software Atlassian JIRA.

More information

The ICT4me Curriculum

The ICT4me Curriculum The ICT4me Curriculum About ICT4me ICT4me is an after school and summer curriculum for middle school youth to develop ICT fluency, interest in mathematics, and knowledge of information, communication,

More information

The ICT4me Curriculum

The ICT4me Curriculum The ICT4me Curriculum About ICT4me ICT4me is an after school and summer curriculum for middle school youth to develop ICT fluency, interest in mathematics, and knowledge of information, communication,

More information

CC PROCESAMIENTO MASIVO DE DATOS OTOÑO 2018

CC PROCESAMIENTO MASIVO DE DATOS OTOÑO 2018 CC5212-1 PROCESAMIENTO MASIVO DE DATOS OTOÑO 2018 Lecture 6 Information Retrieval: Crawling & Indexing Aidan Hogan aidhog@gmail.com MANAGING TEXT DATA Information Overload If we didn t have search Contains

More information

Week - 01 Lecture - 04 Downloading and installing Python

Week - 01 Lecture - 04 Downloading and installing Python Programming, Data Structures and Algorithms in Python Prof. Madhavan Mukund Department of Computer Science and Engineering Indian Institute of Technology, Madras Week - 01 Lecture - 04 Downloading and

More information

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012 Python & Web Mining Lecture 6 10-10-12 Old Dominion University Department of Computer Science CS 495 Fall 2012 Hany SalahEldeen Khalil hany@cs.odu.edu Scenario So what did Professor X do when he wanted

More information

Clickbank Domination Presents. A case study by Devin Zander. A look into how absolutely easy internet marketing is. Money Mindset Page 1

Clickbank Domination Presents. A case study by Devin Zander. A look into how absolutely easy internet marketing is. Money Mindset Page 1 Presents A case study by Devin Zander A look into how absolutely easy internet marketing is. Money Mindset Page 1 Hey guys! Quick into I m Devin Zander and today I ve got something everybody loves! Me

More information

CaseComplete Roadmap

CaseComplete Roadmap CaseComplete Roadmap Copyright 2004-2014 Serlio Software Development Corporation Contents Get started... 1 Create a project... 1 Set the vision and scope... 1 Brainstorm for primary actors and their goals...

More information

N N Sudoku Solver. Sequential and Parallel Computing

N N Sudoku Solver. Sequential and Parallel Computing N N Sudoku Solver Sequential and Parallel Computing Abdulaziz Aljohani Computer Science. Rochester Institute of Technology, RIT Rochester, United States aaa4020@rit.edu Abstract 'Sudoku' is a logic-based

More information

MiPhone Phone Usage Tracking

MiPhone Phone Usage Tracking MiPhone Phone Usage Tracking Team Scott Strong Designer Shane Miller Designer Sierra Anderson Designer Problem & Solution This project began as an effort to deter people from using their phones in class.

More information

News English.com Ready-to-use ESL / EFL Lessons 2005 was a second longer than usual

News English.com Ready-to-use ESL / EFL Lessons 2005 was a second longer than usual www.breaking News English.com Ready-to-use ESL / EFL Lessons The Breaking News English.com Resource Book 1,000 Ideas & Activities For Language Teachers http://www.breakingnewsenglish.com/book.html 2005

More information

Creating a Recommender System. An Elasticsearch & Apache Spark approach

Creating a Recommender System. An Elasticsearch & Apache Spark approach Creating a Recommender System An Elasticsearch & Apache Spark approach My Profile SKILLS Álvaro Santos Andrés Big Data & Analytics Solution Architect in Ericsson with more than 12 years of experience focused

More information

What Are CSS and DHTML?

What Are CSS and DHTML? 6/14/01 10:31 AM Page 1 1 What Are CSS and DHTML? c h a p t e r ch01.qxd IN THIS CHAPTER What Is CSS? What Is DHTML? DHTML vs. Flash Browser Hell What You Need to Know Already Welcome to the world of CSS

More information

Seminar report Google App Engine Submitted in partial fulfillment of the requirement for the award of degree Of CSE

Seminar report Google App Engine Submitted in partial fulfillment of the requirement for the award of degree Of CSE A Seminar report On Google App Engine Submitted in partial fulfillment of the requirement for the award of degree Of CSE SUBMITTED TO: SUBMITTED BY: www.studymafia.org www.studymafia.org Acknowledgement

More information

Out for Shopping-Understanding Linear Data Structures English

Out for Shopping-Understanding Linear Data Structures English Out for Shopping-Understanding Linear Data Structures English [MUSIC PLAYING] [MUSIC PLAYING] TANZEELA ALI: Hi, it's Tanzeela Ali. I'm a software engineer, and also a teacher at Superior University, which

More information

Remote Access Synchronization DL Parent

Remote Access Synchronization DL Parent Remote Access Synchronization DL Parent 205 Distance Learning Features Switched-On Schoolhouse 2008 School Edition has two optional distance learning features available: SOS Remote Access and SOS Synchronization.

More information

Novel Cognition RSSPlugIn Disclaimer

Novel Cognition RSSPlugIn  Disclaimer Novel Cognition RSSPlugIn Disclaimer: Although every effort has been made to represent this guide and everything else related to the NovCogRSS plugin, as truthful as possible to the best of our knowledge

More information

Design and Implementation of Agricultural Information Resources Vertical Search Engine Based on Nutch

Design and Implementation of Agricultural Information Resources Vertical Search Engine Based on Nutch 619 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 51, 2016 Guest Editors: Tichun Wang, Hongyang Zhang, Lei Tian Copyright 2016, AIDIC Servizi S.r.l., ISBN 978-88-95608-43-3; ISSN 2283-9216 The

More information

Reading How the Web Works

Reading How the Web Works Reading 1.3 - How the Web Works By Jonathan Lane Introduction Every so often, you get offered a behind-the-scenes look at the cogs and fan belts behind the action. Today is your lucky day. In this article

More information

Content Structure Guidelines

Content Structure Guidelines Content Structure Guidelines Motion Picture Laboratories, Inc. i CONTENTS 1 Scope... 1 1.1 Content Structure... 1 1.2 References... 1 1.3 Comments... 1 2 Tree Structure and Identification... 2 2.1 Content

More information

XP: Backup Your Important Files for Safety

XP: Backup Your Important Files for Safety XP: Backup Your Important Files for Safety X 380 / 1 Protect Your Personal Files Against Accidental Loss with XP s Backup Wizard Your computer contains a great many important files, but when it comes to

More information

TABLE OF CONTENTS CHANGES IN 2.0 FROM 1.O

TABLE OF CONTENTS CHANGES IN 2.0 FROM 1.O TABLE OF CONTENTS CHANGES IN 2.0 FROM 1.0 INTRODUCTION THE BOTTOM LINE ATTACHED FILES FONTS KEYBOARD WORD PROCESSING PROGRAMS INSTALLING FONTS INSTALLING KEYBOARDS MODIFYING KEYBOARDS TO YOUR LIKING OPEN

More information

USING THE MUSICBRAINZ DATABASE IN THE CLASSROOM. Cédric Mesnage Southampton Solent University United Kingdom

USING THE MUSICBRAINZ DATABASE IN THE CLASSROOM. Cédric Mesnage Southampton Solent University United Kingdom USING THE MUSICBRAINZ DATABASE IN THE CLASSROOM Cédric Mesnage Southampton Solent University United Kingdom Abstract Musicbrainz is a crowd-sourced database of music metadata. The level 6 class of Data

More information

MSI Sakib - Blogger, SEO Researcher and Internet Marketer

MSI Sakib - Blogger, SEO Researcher and Internet Marketer About Author: MSI Sakib - Blogger, SEO Researcher and Internet Marketer Hi there, I am the Founder of Techmasi.com blog and CEO of Droid Digger (droiddigger.com) android app development team. I love to

More information

How to use search, recommender systems and online community to help users find what they want. Rashmi Sinha

How to use search, recommender systems and online community to help users find what they want. Rashmi Sinha The Quest for the "right item": How to use search, recommender systems and online community to help users find what they want. Rashmi Sinha Summary of the talk " Users have different types of information

More information

CS 206 Introduction to Computer Science II

CS 206 Introduction to Computer Science II CS 206 Introduction to Computer Science II 03 / 25 / 2013 Instructor: Michael Eckmann Today s Topics Comments/Questions? More on Recursion Including Dynamic Programming technique Divide and Conquer techniques

More information

Personalized Movie Database System

Personalized Movie Database System Grand Valley State University ScholarWorks@GVSU Technical Library School of Computing and Information Systems 2015 Personalized Movie Database System Jayaprakash Garaga Grand Valley State University Follow

More information

Homework: Building an Apache-Solr based Search Engine for DARPA XDATA Employment Data Due: November 10 th, 12pm PT

Homework: Building an Apache-Solr based Search Engine for DARPA XDATA Employment Data Due: November 10 th, 12pm PT Homework: Building an Apache-Solr based Search Engine for DARPA XDATA Employment Data Due: November 10 th, 12pm PT 1. Overview This assignment picks up where the last one left off. You will take your JSON

More information

Project Report. Team 233. Hongnian Yu, Dong Liang, Tianlei Sun, Jian Zhu California Institute of Technology Department of Electrical Engineering

Project Report. Team 233. Hongnian Yu, Dong Liang, Tianlei Sun, Jian Zhu California Institute of Technology Department of Electrical Engineering Project Report Team 233 Hongnian Yu, Dong Liang, Tianlei Sun, Jian Zhu California Institute of Technology Department of Electrical Engineering 1 Team Member & Work Split Group members: Hongnian Yu, Dong

More information

GNU OCTAVE BEGINNER'S GUIDE BY JESPER SCHMIDT HANSEN DOWNLOAD EBOOK : GNU OCTAVE BEGINNER'S GUIDE BY JESPER SCHMIDT HANSEN PDF

GNU OCTAVE BEGINNER'S GUIDE BY JESPER SCHMIDT HANSEN DOWNLOAD EBOOK : GNU OCTAVE BEGINNER'S GUIDE BY JESPER SCHMIDT HANSEN PDF GNU OCTAVE BEGINNER'S GUIDE BY JESPER SCHMIDT HANSEN DOWNLOAD EBOOK : GNU OCTAVE BEGINNER'S GUIDE BY JESPER SCHMIDT HANSEN PDF Click link bellow and free register to download ebook: GNU OCTAVE BEGINNER'S

More information

MoVis Movie Recommendation and Visualization

MoVis Movie Recommendation and Visualization MoVis Movie Recommendation and Visualization Introduction CPSC 547 Infomation Visualization Project Ye Chen clara.yechen@gmail.com Yujie Yang yangyujie.hust@gmail.com Nowadays, movies becomes a popular

More information

Blog Pro for Magento 2 User Guide

Blog Pro for Magento 2 User Guide Blog Pro for Magento 2 User Guide Table of Contents 1. Blog Pro Configuration 1.1. Accessing the Extension Main Setting 1.2. Blog Index Page 1.3. Post List 1.4. Post Author 1.5. Post View (Related Posts,

More information

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015 University of Virginia Department of Computer Science CS 4501: Information Retrieval Fall 2015 2:00pm-3:30pm, Tuesday, December 15th Name: ComputingID: This is a closed book and closed notes exam. No electronic

More information

run your own search engine. today: Cablecar

run your own search engine. today: Cablecar run your own search engine. today: Cablecar Robert Kowalski @robinson_k http://github.com/robertkowalski Search nobody uses that, right? Services on the Market Google Bing Yahoo ask Wolfram Alpha Baidu

More information

Which application/messaging protocol is right for me?

Which application/messaging protocol is right for me? Which application/messaging protocol is right for me? Building a connected device solution calls for several design and architectural decisions. Which protocol(s) should you use to connect your devices

More information

Search Engine Architecture II

Search Engine Architecture II Search Engine Architecture II Primary Goals of Search Engines Effectiveness (quality): to retrieve the most relevant set of documents for a query Process text and store text statistics to improve relevance

More information

» How do I Integrate Excel information and objects in Word documents? How Do I... Page 2 of 10 How do I Integrate Excel information and objects in Word documents? Date: July 16th, 2007 Blogger: Scott Lowe

More information

Information Retrieval

Information Retrieval Information Retrieval Module Introduction CS6200: Information Retrieval Welcome to Information Retrieval. In this class, you ll learn about many of the exciting technologies that define life on the web

More information

HOPE Project AAL Smart Home for Elderly People

HOPE Project AAL Smart Home for Elderly People 1.1.1.1.1 HOPE Project AAL-2008-1-099 Smart Home for Elderly People D10 User Interface Mockup Report Version: 1 1.0 Report Preparation Date: 28.02.2010 Classification: Partner Responsible: Restricted I2S

More information

Graphics Performance Benchmarking Framework ATI. Presented to: Jerry Howard. By: Drew Roberts, Nicholas Tower, Jason Underhill

Graphics Performance Benchmarking Framework ATI. Presented to: Jerry Howard. By: Drew Roberts, Nicholas Tower, Jason Underhill Graphics Performance Benchmarking Framework ATI Presented to: Jerry Howard By: Drew Roberts, Nicholas Tower, Jason Underhill Executive Summary The goal of this project was to create a graphical benchmarking

More information

Blogging in a Hurry July 7, 2005

Blogging in a Hurry July 7, 2005 July 7, 2005 Table of Contents Posting Articles 2 2 Getting to the Post Article page 5 Composing your article 5 Publishing your article 6 Viewing your article Posting Photos 9 9 Photos and photo albums

More information

LAB 7: Search engine: Apache Nutch + Solr + Lucene

LAB 7: Search engine: Apache Nutch + Solr + Lucene LAB 7: Search engine: Apache Nutch + Solr + Lucene Apache Nutch Apache Lucene Apache Solr Crawler + indexer (mainly crawler) indexer + searcher indexer + searcher Lucene vs. Solr? Lucene = library, more

More information

THE ULTIMATE SEO MIGRATION GUIDE

THE ULTIMATE SEO MIGRATION GUIDE THE ULTIMATE SEO MIGRATION GUIDE How to avoid a drop in ranking when launching a new website YOU KNOW THAT IT S TIME TO REBUILD YOUR WEBSITE, THERE IS ALWAYS THAT NAGGING CONCERN THAT THE NEW WEBSITE WILL

More information

How to implement applications for Smart Devices... using GeneXus.

How to implement applications for Smart Devices... using GeneXus. 1. How to implement applications for Smart Devices... using GeneXus. 2. Let s suppose that we need to develop a simplified application for a real estate agency... 1 This real estate agency works with certain

More information

Combining Information Retrieval and Relevance Feedback for Concept Location

Combining Information Retrieval and Relevance Feedback for Concept Location Combining Information Retrieval and Relevance Feedback for Concept Location Sonia Haiduc - WSU CS Graduate Seminar - Jan 19, 2010 1 Software changes Software maintenance: 50-90% of the global costs of

More information

Problem & Solution Overview. Tasks & Final Interface Scenarios

Problem & Solution Overview. Tasks & Final Interface Scenarios Bronwyn E. Manager Clark P. Developer Bernardo V. Designer Hilary S. User Testing/Documentation Problem & Solution Overview Joynus is a mobile and location based way for users to connect through spontaneous

More information

Towards a hybrid approach to Netflix Challenge

Towards a hybrid approach to Netflix Challenge Towards a hybrid approach to Netflix Challenge Abhishek Gupta, Abhijeet Mohapatra, Tejaswi Tenneti March 12, 2009 1 Introduction Today Recommendation systems [3] have become indispensible because of the

More information

Impressory Documentation

Impressory Documentation Impressory Documentation Release 0.2-SNAPSHOT William Billingsley January 10, 2014 Contents 1 Contents: 3 1.1 Courses.................................................. 3 1.2 Enrolling students............................................

More information

Sony s Open Devices Project. Goals Achievements. What went right? What went wrong? Lessons learned

Sony s Open Devices Project. Goals Achievements. What went right? What went wrong? Lessons learned 1 Sony s Open Devices Project Goals Achievements What went right? What went wrong? Lessons learned 2 Ambitious project to support open software on Sony Mobile s phone platforms 2 main areas: Android Open

More information