Computer Science 572 Midterm
Prof. Horowitz
Tuesday, March 12, 2013, 12:30pm-1:45pm

Name:
Student Id Number:

1. This is a closed book exam.
2. Please answer all questions.
3. There are a total of 40 questions. Each question is worth 2½ points.
4. Write your answers in the space provided immediately below the question only.

1. Can a web page author claim his page is copyrighted if he forgets to insert a copyright notice on the page?
2. Name the four types of protection for intellectual property.
3. Google was successfully sued by the United States federal government for offering ads by Canadian pharmacies. What did Google do that was wrong?
4. Define and describe what a term-document incidence matrix is.
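Question 4 concerns the term-document incidence matrix: one row per vocabulary term, one column per document, with a 1 where the term occurs in the document and a 0 where it does not. The following is a minimal Python sketch, assuming lowercasing and whitespace tokenization; the function name `incidence_matrix` and the sample documents are hypothetical.

```python
def incidence_matrix(docs):
    """Return (sorted vocabulary, matrix) where matrix[t][d] is 1 if term t occurs in doc d."""
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    tokenized = [set(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*tokenized))
    # One row per term; each row is a 0/1 vector over the documents.
    matrix = [[1 if term in terms else 0 for terms in tokenized] for term in vocab]
    return vocab, matrix


docs = ["Brutus killed Caesar", "Caesar was ambitious", "Brutus was honorable"]
vocab, matrix = incidence_matrix(docs)
for term, row in zip(vocab, matrix):
    print(f"{term:10s} {row}")
```

Because each row is a bit vector, a Boolean query such as Brutus AND Caesar can be answered by AND-ing two rows: [1, 0, 1] AND [1, 1, 0] gives [1, 0, 0], i.e. only the first document.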

5. What do the terms AdWords and AdCenter refer to?
6. When an advertiser decides to bid on a set of keywords, e.g. "European cars", Google, Bing and Yahoo allow the advertiser to match keywords in several ways. Name two of these methods, and describe each one in a single short sentence.
7. How is the failure of a Map worker handled in the Map/Reduce framework?
8. What is a tracking pixel?

9. State Zipf's Law.
10. When investigating click fraud, there are both online tests and offline tests. Give an example of:
   i) an online test.
   ii) an offline test.
11. What effect does the following line in a web page have?
   <meta name="robots" content="noindex,nofollow">
12. Recall and Precision are two measures of the effectiveness of an Information Retrieval system. If A is the number of relevant records retrieved, B is the number of relevant records not retrieved, and C is the number of irrelevant records retrieved, define Recall and Precision in terms of A, B, and C.
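With the quantities defined in question 12, recall is the fraction of all relevant records that were retrieved, A / (A + B), and precision is the fraction of retrieved records that are relevant, A / (A + C). A minimal Python sketch; the counts below are made up purely for illustration.

```python
def recall(a: int, b: int) -> float:
    """A = relevant records retrieved, B = relevant records not retrieved."""
    return a / (a + b)


def precision(a: int, c: int) -> float:
    """A = relevant records retrieved, C = irrelevant records retrieved."""
    return a / (a + c)


# Illustrative counts only: 30 relevant retrieved, 10 relevant missed, 20 irrelevant retrieved.
print(recall(30, 10))     # 0.75
print(precision(30, 20))  # 0.6
```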

13. The terms TF and IDF are used in information retrieval. What do the terms stand for?
14. A study of how to design a web crawler to locate the best quality pages was done by Cho and Garcia-Molina.
   i) What measure of quality did they use?
   ii) What algorithm did they determine would produce the highest quality pages in the shortest time?
15. Google and Bing both allow advertisers to restrict where their ads will be seen; the restriction can be by country, by state, or by city. Name one way to accomplish this.
16. What is Hadoop?
17. With respect to search engines, what does the term "relevance feedback" refer to?
18. What is the Soundex Algorithm?
19. Suppose there are only two web pages, each with only one link, which points to the other web page. What will be the PageRank of each page?
20. As a website grows and adds more pages with more links to web pages outside of the website, how is the total PageRank of the website affected?
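Question 19 can be checked numerically. The sketch below iterates the original Brin and Page formulation, PR(A) = (1 - d) + d * (sum of PR(T)/C(T) over pages T linking to A), where C(T) is the number of links out of T. The damping factor d = 0.85, the iteration count, and the helper name `pagerank` are assumptions for illustration. Under this non-normalized formula both pages converge to 1; under the normalized variant, which divides (1 - d) by the number of pages, each would converge to 1/2.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / outdegree(q)) over pages q linking to p.

    `links` maps every page to the list of pages it links to (hypothetical input format).
    """
    pages = list(links)
    pr = {p: 0.0 for p in pages}  # deliberately poor starting guess
    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            incoming = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new_pr[p] = (1 - d) + d * incoming
        pr = new_pr
    return pr


ranks = pagerank({"A": ["B"], "B": ["A"]})
print({page: round(value, 4) for page, value in ranks.items()})  # {'A': 1.0, 'B': 1.0}
```

The starting value does not matter here: each step shrinks the distance to the fixed point x = 0.15 + 0.85x, i.e. x = 1, by a factor of 0.85.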

21. True or False? Google, Yahoo, and Bing record all user clicks, both on ads and on organic search results.
22. When Google must decide how to order the ads for a given query phrase, what formula does it use?
23. Suppose one advertiser bids $1.00 for his ad to be displayed and a second advertiser bids $0.50 for his ad to be displayed, and all other factors affecting ads are identical. If the first advertiser's ad is clicked on, how much does he pay Google?
24. Suppose the Pepsi Cola company wants to bid on the words "Coca Cola" whenever they are entered as a query, so a Pepsi Cola ad will appear. Is this legal?
25. What does DMCA stand for?
26. When a search engine gets a query such as "what are the movie times for The Artist", how is it able to identify the local movie theaters?
27. What is a way to guarantee that an advertiser's ad will appear at the top of a Google or Bing results page?
28. Define cloaking.
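Questions 22 and 23 concern the ad auction. A common textbook simplification is a generalized second-price auction: ads are ordered by bid times a quality factor, and a clicked advertiser pays just enough to keep his position, namely the next ad's score divided by his own quality, plus one cent. The sketch below assumes exactly that simplified model; it is not Google's actual pricing code, and the names `gsp_prices` and `minimum` (the charge for the last-ranked ad) are hypothetical.

```python
from decimal import Decimal


def gsp_prices(bids, quality, minimum=Decimal("0.01")):
    """Simplified generalized second-price auction (an assumed model, not Google's code).

    Ads are ranked by bid * quality. Each advertiser pays the next ad's score divided by
    his own quality, plus one cent, but never more than his own bid. The last-ranked ad
    has no competitor below it and pays the hypothetical `minimum`.
    """
    ranked = sorted(bids, key=lambda name: bids[name] * quality[name], reverse=True)
    prices = {}
    for i, name in enumerate(ranked):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            price = bids[nxt] * quality[nxt] / quality[name] + Decimal("0.01")
            prices[name] = min(bids[name], price)
        else:
            prices[name] = minimum
    return ranked, prices


# Question 23 scenario: bids of $1.00 and $0.50 with identical quality.
bids = {"first": Decimal("1.00"), "second": Decimal("0.50")}
quality = {"first": Decimal("1"), "second": Decimal("1")}
ranked, prices = gsp_prices(bids, quality)
print(ranked, prices["first"])  # ['first', 'second'] 0.51
```

`Decimal` is used instead of `float` so that cent amounts are represented exactly.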

29. What is de-duplication? Give two examples of why it needs to be done.
30. What is Google's reason for not telling an advertiser why each and every click was marked as valid?
31. What is a parked domain?
32. Write out all of the 3-grams for the following phrase: "Fourscore and seven years ago our fathers brought forth a nation"
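For question 32, assuming word-level 3-grams (the shingles used for near-duplicate detection, as in question 29), a phrase of 11 words yields 11 - 3 + 1 = 9 of them. A small Python sketch; the helper name `word_ngrams` is hypothetical.

```python
def word_ngrams(text, n=3):
    """Return the contiguous word n-grams of `text`, splitting on whitespace."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


phrase = "Fourscore and seven years ago our fathers brought forth a nation"
for gram in word_ngrams(phrase):
    print(gram)
```

This prints the nine 3-grams from "Fourscore and seven" through "forth a nation". Character 3-grams could be produced the same way by slicing the string instead of the word list.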

33. Google offers a variety of special operators that can be used to narrow a search. Define the following:
   i) filetype:
   ii) site:
   iii) allinanchor:
34. Some browsers now include a feature that prevents third-party cookies from being placed on a browser. Name the three parties involved.
35. The HITS Algorithm developed by Jon Kleinberg identifies two types of web pages that have special significance. Name and describe these two types of web pages.
36. When creating an index of documents, search engines make use of case folding, stemming and stop word removal. Briefly define these three terms in one sentence each.
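Question 35 refers to hubs (pages that point to many good authorities) and authorities (pages pointed to by many good hubs). The mutually reinforcing update can be sketched as below; normalizing by Euclidean length, the fixed iteration count, the name `hits`, and the toy graph are assumptions for illustration.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) scores; `links` maps each page to the pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages that link to it.
        auth = {p: sum(hub[q] for q in links if p in links[q]) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Toy graph (hypothetical): two directory pages both point at two content pages.
hub, auth = hits({"dir1": ["x", "y"], "dir2": ["x", "y"], "x": [], "y": []})
```

In this toy graph dir1 and dir2 end up with all of the hub weight and x and y with all of the authority weight.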

37. List the four main features/functions that Apache Tika provides.
38. Define Kendall's Tau distance in words, i.e. without using mathematical symbols.
39. Given two sequences of length n, what is their maximum Kendall Tau distance?
40. Define Spearman's footrule distance for two lists of n items without using mathematical symbols.
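Questions 38 to 40 concern distances between two rankings of the same n items. Kendall's tau distance counts the pairs of items that the two rankings place in opposite order, so its maximum, reached when one ranking is the reverse of the other, is n(n - 1)/2. Spearman's footrule sums, over all items, the absolute difference between the item's positions in the two rankings. A brute-force Python sketch, assuming both lists are permutations of the same items; the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(a, b):
    """Number of item pairs ordered one way in `a` and the other way in `b` (O(n^2))."""
    pos_b = {item: i for i, item in enumerate(b)}
    # combinations() yields (x, y) with x before y in `a`; the pair is discordant if b flips it.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def spearman_footrule(a, b):
    """Sum over all items of |position in a - position in b|."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(a))


ranking = ["A", "B", "C", "D"]
reversed_ranking = ranking[::-1]
print(kendall_tau_distance(ranking, reversed_ranking))  # 6 = 4 * 3 / 2
print(spearman_footrule(ranking, reversed_ranking))     # 8 = 3 + 1 + 1 + 3
```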