Web Search Basics Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson

Similar documents
Ranking of ads. Sponsored Search

CS47300: Web Information Search and Management

Introduction to Information Retrieval

Web Search Basics. Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University

Web Search Basics Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson

Information retrieval

A Taxonomy of Web Search

Web Search Basics. Berlin Chen Department t of Computer Science & Information Engineering National Taiwan Normal University

Plan for today. CS276B Text Retrieval and Mining Winter Evolution of search engines. Connectivity analysis

CS47300: Web Information Search and Management

Information Retrieval. Lecture 9 - Web search basics

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM

Web Crawling. Introduction to Information Retrieval CS 150 Donald J. Patterson

CS 345A Data Mining Lecture 1. Introduction to Web Mining

Information Retrieval Spring Web retrieval

Marketing & Back Office Management

CS490W. Web Search (I) Luo Si. Department of Computer Science Purdue University. Slides from Manning, C., Raghavan, P. and Schütze, H.

21. Search Models and UIs for IR

This document is for informational purposes only. PowerMapper Software makes no warranties, express or implied in this document.

Web Search (I) Luo Si. Department of Computer Science Purdue University. Slides from Manning, C., Raghavan, P. and Schütze, H.

Identifying Web Spam With User Behavior Analysis

Digital Media & Society CAO: GA884

I Travel on mobile / UK

The Ultimate Digital Marketing Glossary (A-Z) what does it all mean? A-Z of Digital Marketing Translation

Campaign Goals, Objectives and Timeline SEO & Pay Per Click Process SEO Case Studies SEO, SEM, Social Media Strategy On Page SEO Off Page SEO

3 Media Web. Understanding SEO WHITEPAPER

Desktop Crawls. Document Feeds. Document Feeds. Information Retrieval

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues

Who We Are! Natalie Timpone

Breakdown of Some Common Website Components and Their Costs.

Basic techniques. Text processing; term weighting; vector space model; inverted index; Web Search

Uses of web scraping for official statistics

Search With Better Results

THE HISTORY & EVOLUTION OF SEARCH

Information Retrieval

Social Search Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson

All-In-One Cloud-Based Blaster

Information Retrieval May 15. Web retrieval

Outlook Web Access Getting Started

Adding content to your Blackboard 9.1 class

Search With Better Results. by Hewie Poplock

OUR TOP DATA SOURCES AND WHY THEY MATTER

Webscraping at Statistics Netherlands

Once you click on the Enterprise Icon found on your desktop you will be asked for your password. This Default Code Is

Contents. Management. Client. Choosing One 1/20/17

Module 1: Internet Basics for Web Development (II)

Promoting Website CS 4640 Programming Languages for Web Applications

RCA Business & Technical Conference

Before I show you this month's sites, I need to go over a couple of things, so that we are all on the same page.

Search Engines. Charles Severance

A Survey on Web Information Retrieval Technologies

Below is another example, taken from a REAL profile on one of the sites in my packet of someone abusing the sites.

SEO. Definitions/Acronyms. Definitions/Acronyms

Webinar Series. Sign up at February 15 th. Website Optimization - What Does Google Think of Your Website?

Key Largo Wastewater Treatment District Board of Commissioners Meeting Agenda Item Summary

Website Name. Project Code: # SEO Recommendations Report. Version: 1.0

Layer by Layer: Protecting from Attack in Office 365

Automatic Identification of User Goals in Web Search [WWW 05]

HOW DOES A SEARCH ENGINE WORK?

: Semantic Web (2013 Fall)

I Travel on mobile / FR

FAQ: Crawling, indexing & ranking(google Webmaster Help)

SEO Today s Agenda: Introduction What to expect today How search engines work What is SEO? Foundational SEO On and off page basics

Short review of the Coinmama.com. Registration on the Coinmama

THE WEB SEARCH ENGINE

Information Retrieval. Session 11 LBSC 671 Creating Information Infrastructures

CS 160: Lecture 15. Outline. How can we Codify Design Knowledge? Motivation for Design Patterns. Design Patterns. Example from Alexander: Night Life

The online customer experience: researching and planning a web presence MBA 563 WEEK 5

WorkBook release note

CS/INFO 1305 Summer 2009

Full Website Audit. Conducted by Mathew McCorry. Digimush.co.uk

. social? better than. 7 reasons why you should focus on . to GROW YOUR BUSINESS...

1. Conduct an extensive Keyword Research

Information Retrieval

CONTENT CALENDAR USER GUIDE SOCIAL MEDIA TABLE OF CONTENTS. Introduction pg. 3

Registering as a parent

User Interaction: jquery

THE TRUTH ABOUT SEARCH 2.0

Mobile Travel Trends in China. Nov 2013

Search Engine Optimization

Searching the Deep Web

By Paul Botelho WEB HOSTING FALSE POSITIVES AVOIDING SPAM

Exclusive Leads for Attorneys WHY PAY PER CLICK?

SEARCH ENGINE MARKETING (SEM)

THE ULTIMATE SEO MIGRATION GUIDE

Chopra Teachers Directory Listing Manual

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.

How to do an On-Page SEO Analysis Table of Contents

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

Querying Introduction to Information Retrieval INF 141 Donald J. Patterson. Content adapted from Hinrich Schütze

Online Marketing Strategies to Increase Sales & Guest Loyalty. Monday, March 8, 2010

Relevant?!? Algoritmi per IR. Goal of a Search Engine. Prof. Paolo Ferragina, Algoritmi per "Information Retrieval" Web Search

Web Survey. Part 1. Answer the following questions about the department s current Web site.

When it comes to your website redesign, form and function need to be a package deal.

Computer Training Center 1515 SW 10 th Avenue Topeka KS

EECS 282 Information Systems Design and Programming. Atul Prakash Professor, Computer Science and Engineering University of Michigan

TRUECALLER INSIGHTS SPECIAL REPORT: THE TOP 20 COUNTRIES AFFECTED BY SPAM CALLS

Mapping for end users

WHY EVERY ONLINE ECOMMERCE STORE NEEDS MOBILE APP?

How to Manage and Maintain Your Website

Transcription:

Web Search Basics Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org

Web Search Basics The Web as a graph Web pages are nodes Hyperlinks are directed edges

Web Search Basics Characteristics of the web Significant Duplication 30%-40% is some studies [Brod97, Shiv99] www.copyscape.com High linkage more than 8 links per page on average Spam Billions of pages of it.

Web Search Basics The User flickr:crankyt Search Results Web Spider Indexer The Web Indices Ad Indices

Size of the Web How big is the web? What is measured? Number of hosts Number of static html pages Number of hosts - netcraft survey http://news.netcraft.com/archives/web_server_survey.html Monthly report on hosts and servers Number of pages Lots of estimates which warrant further discussion

Size of the Web How big is the web? Netcraft Web Server Survey

Size of the Web Rate of change [Cho00] 720k pages from 270 popular sites sample daily for 5 months in 1999 40% changed weekly, 23% daily [Fett02] Massive study: 151M pages checked over a few months Significant changes 7% weekly Any change 25% weekly

Size of the Web Rate of change [Ntul04] 154 large sites recrawled from scratch weekly 8% had new pages ever week 8% die 5% new content 25% new links per week

Size of the Web Rate of change Fetterly et al. study in 2002 150 million pages over 11 weekly crawls Bucketed into 85 groups according to amount of change

Size of the Web Web Evolution The nature of the web is change Not much work on studying web evolution Exception is Fetterly et. al, 2003 Some effort has been made to extrapolate from small samples using fractal models [Dill et. al. 2001]

Overview Overview Introduction Classic Information Retrieval Web IR Sponsored Search Web Search Basics Size of the Web Web Users Spam

Web Users User Search Needs in Brod02/RL04

Web Users User Search Needs in Brod02/RL04 Informational Want to learn about something (~40%/65%)

Web Users User Search Needs in Brod02/RL04 Informational Want to learn about something (~40%/65%) Navigational Want to go to that page (~25%/15%)

Web Users User Search Needs in Brod02/RL04 Informational Want to learn about something (~40%/65%) Navigational Want to go to that page (~25%/15%) Transactional Want to do something (~35%/20%) Access a service, download, shop

Web Users User Search Needs in Brod02/RL04 Informational Want to learn about something (~40%/65%) Navigational Want to go to that page (~25%/15%) Transactional Want to do something (~35%/20%) Access a service, download, shop Others? Exploration, social, etc...

Web Users Web Users Make ill defined queries Short Average in 2001: 2.54 terms (80% < 3 words) Average in 1998: 2.35 terms (88% < 3 words) [Silv98] Imprecise terms Suboptimal syntax (no operators) Low effort (spelling mistakes)

Web Users Web Users Wide Variance in Needs Expectations Knowledge Bandwidth

Web Users Web Users Behavior 85% look over one result screen only 78% of queries are not modified Follow links ( the scent of information )

Web Users Power law Few popular broad queries Many rare specific queries

Web Users Top queries Most are related to sex 2008 Who What How (Google) http://www.google.com/intl/en/press/zeitgeist2008/mind.html

Web Users Top queries Live demo - WARNING this is not very safe... Is it safe to Is it legal to why does why doesn t why is why isn t americans are

Web Users How far do people look for results?

Web Users True Example * The User flickr:crankyt Task Stop the noisy fan in the courtyard Info Need Info about EPA regulations Verbal Form What are EPA rules on noise pollution? Query EPA Sound Pollution To Google or to GoTo Business Week Online 9/28/2001

Web Users How do users evaluate search engines? Quality of pages Classic IR relevance Also important: Trust Duplicate elimination Readability Fast Access No pop-ups

Web Users How do users evaluate search engines? Precision is more important than recall Precision: How precise is a portal in locating relevant results? Recall How thorough is the coverage of available relevant results? Precision with 1 result, 10 results, 2-3 pages of results. When is recall important?

Web Users How do users evaluate search engines? Recall is sometimes important: Googling for a new doctor Googling a prospective employee Googling your date

Web Users How do users evaluate search engines? Good U/I Simple No Clutter Pre and post processing tools Spell check ( Did you mean...? ) Suggested alternative searches Links to resources (maps, images, stock quotes) Able to deal with typical behavior e.g., a URL typed into a search box

Web Users Loyalty to a given search engine iprospect Survey 4/2004

Overview Overview Introduction Classic Information Retrieval Web IR Sponsored Search Web Search Basics Size of the Web Web Users Helping the User Spam

Helping the user True Example * The User flickr:crankyt Task Stop the noisy fan in the courtyard Info Need Info about EPA regulations Verbal Form What are EPA rules on noise pollution? Query EPA Sound Pollution To Google or to GoTo Business Week Online 9/28/2001

Helping the user Answering the need behind the query The query is often an imprecise indicator of what the user really wants What can we do to get a better handle on the underlying information need? Query language Adjust rank of English results for a Japanese query Use user context In particular geographic context

Helping the user Answering the need behind the query Guess what type of information the user wants a web page? a map? a stock price? what else? Correct queries Suggest correct spellings Suggest related searches (google-fu)

Helping the user Examples - language

Helping the user Examples - query spelling

Helping the user Examples - query expansion

Helping the user Query shortcuts Map: irvine, ca 92614 Calculation: 5+4 Flight Info: american airlines 715 Stock price: msft Unit conversion: 1 dollar in euro Music: White Stripes

Helping the user Examples - query shortcuts

Helping the user Examples - query shortcuts

Helping the user Examples - query aggregations