Chapter 5: Searching for Truth: Locating Information on the WWW

Fluency with Information Technology, Third Edition, by Lawrence Snyder

Searching in All the Right Places

The Obvious and Familiar
- To find tax information, ask the tax office

Libraries Online
- Many college and public libraries let you access their online catalogs and other information resources
- Libraries provide online facilities that are well organized and trustworthy
- Remember that many pre-1985 documents are not yet available online
- Plus, librarians are real live experts

How Is Information Organized?

Hierarchical classification (like a family tree)
- Information is grouped into a small number of categories, each of which is easily described (top-level classification)
- Information in each category is divided into subcategories (second-level classifications), and so on
- Eventually the classifications become small enough for you to look through the whole category to find the information you need
- This is a process of elimination as much as choosing appropriate subcategories

Important Properties of Classifications
- Descriptive terms must cover all the information in the category and be easy for a searcher to apply
- Subcategories do not all have to use the same classifications
- Information in the category defines how best to classify it
- There is no single way to classify information
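To make the process of elimination concrete, here is a minimal Python sketch that models a classification hierarchy as nested dictionaries and narrows a search one level at a time. The category names, the `CATALOG` data, and the `drill_down` function are hypothetical illustrations, not part of the text.

```python
from typing import Callable, Union

# A node is either a dict of named subcategories or, at the bottom,
# a list of items small enough to look through completely.
Node = Union[dict, list]

# Hypothetical two-level hierarchy (category names invented for illustration).
CATALOG: Node = {
    "Government": {
        "Taxes": ["Income tax forms", "Property tax rates"],
        "Elections": ["Voter registration", "Polling places"],
    },
    "Science": {
        "Biology": ["Cell structure", "Genetics primer"],
        "Chemistry": ["Periodic table", "Properties of water"],
    },
}


def drill_down(node: Node, choose: Callable[[list], str]) -> tuple:
    """Descend one level at a time until a scannable list of items remains.

    At each level `choose` picks one subcategory name; every sibling it
    passes over is eliminated from the search.
    Returns (path of chosen category names, items in the final category).
    """
    path = []
    while isinstance(node, dict):
        name = choose(sorted(node))  # offer the subcategory names in order
        path.append(name)
        node = node[name]
    return path, list(node)


# Usage: a searcher after tax information picks the matching name at each level.
wanted = {"Government", "Taxes"}
path, items = drill_down(CATALOG, lambda names: next(n for n in names if n in wanted))
print(" > ".join(path), items)  # Government > Taxes ['Income tax forms', 'Property tax rates']
```

Each call to `choose` discards the sibling categories, so only the final small category is ever scanned item by item.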

Design of Hierarchies

General rules for design and terminology of hierarchies
- The root is usually at the top (branching metaphor)
- "Going up in the hierarchy" means the classifications become more inclusive or general
- "Going down in the hierarchy" means the classifications become more specific or detailed
- The greater-than (>) symbol is a common way to show going down through levels of classification

Levels in a Hierarchy
- A one-level hierarchy has only one level of "branching"; there are no subdirectories
- To count levels, remember:
  - There is always a root
  - There are always "leaves" (the categories themselves)
  - The root and leaves do not count as levels
- Groupings may overlap (one item can appear in more than one category) or be partitioned (every item appears in only one category)
- The number of levels may differ by category, even in the same hierarchical tree

How Is Web Site Information Organized?
- The homepage is the top-level classification for the whole Web site
- Classifications are the roots of hierarchies that organize large volumes of similar types of information
- Topic clusters are sets of related links, for example, sidebar and top-of-page navigation links
- Content information often fills the rest of a page
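Several of these rules can be checked mechanically. This minimal Python sketch uses a nested-dictionary representation (dictionaries are categories, lists hold the filed items), prints each path with the > notation, reports how many branching steps lie below the root on each path, and tests whether the groupings overlap or are partitioned. The `SITE` categories and the function names are hypothetical, and counting one level per branching step is an assumption chosen to match the one-level definition above.

```python
from collections import Counter
from typing import Iterator, Union

# Dicts are categories; lists hold the items filed in a bottom-level category.
Node = Union[dict, list]

# Hypothetical hierarchy: "Recipes" branches twice, "News" only once,
# so the number of levels differs by category within the same tree.
SITE: Node = {
    "Recipes": {
        "Thai": ["Pad thai", "Green curry"],
        "Italian": ["Risotto"],
    },
    "News": ["Restaurant openings", "Pad thai"],  # "Pad thai" is filed twice
}


def leaf_paths(node: Node, path: tuple = ()) -> Iterator[tuple]:
    """Yield (path, items) for every bottom-level category in the tree."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from leaf_paths(child, path + (name,))
    else:
        yield path, node


def is_partitioned(node: Node) -> bool:
    """True if every item is filed in exactly one category (no overlap)."""
    counts = Counter(item for _, items in leaf_paths(node) for item in items)
    return all(n == 1 for n in counts.values())


for path, items in leaf_paths(SITE):
    # len(path) = branching steps below the root on this path (assumed convention).
    print(f"{' > '.join(path)}  [depth {len(path)}]: {items}")

print("partitioned" if is_partitioned(SITE) else "overlapping")
```

Because "Pad thai" appears under two categories, this example hierarchy is overlapping rather than partitioned.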

Searching the Web for Information

How a Search Engine Works
Two basic parts:
1. Crawler: visits sites on the Internet, discovering Web pages and building an index to the Web's content
2. Query processor: looks up user-submitted keywords in the index and reports back which Web pages the crawler has found containing those words

Popular search engines: Google, Yahoo!, MSN, AOL, Ask

Crawlers
When a crawler visits a Web page, it:
- First identifies all the links to other Web pages on that page
- Checks its records to see if it has visited those pages recently
- If not, adds them to the list of pages to be crawled
- Records in an index the keywords used on the page (those appearing in the title, the body, or anchor text)

Crawlers can miss pages when:
- No page points to them
- The page is dynamically created on the fly
- The page has only images
- The page type is not recognized (not HTML, PDF, etc.)

Query Processors
- Get keywords from the user and look them up in the index
- Even if a page has not yet been crawled, it might be reported because it is linked from a page that has been crawled, and the keywords appear in the anchor text on the crawled page
- It is important to give the right terms to look up

Page Ranking
- Google's idea: PageRank
- Orders links by relevance to the user
- Relevance is computed by counting the links to a page (the more pages link to a page, the more relevant that page must be)
- Each page that links to another page is considered a "vote" for that page
- Google also considers whether the "voting page" is itself highly ranked
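The crawl-and-look-up cycle above can be sketched in a few lines of Python. This is a toy model, not how any real search engine is built: the `WEB` dictionary stands in for the Internet, and the page names, `crawl`, and `query` are hypothetical. It shows the crawler's record of visited pages, its list of pages still to crawl, keyword indexing from titles, bodies, and anchor text, why a page that no one links to is never found, and how anchor text lets an uncrawled page be reported.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (title, body text, [(link URL, anchor text), ...]).
WEB = {
    "a.example": ("Thai food guide", "where to eat thai food",
                  [("b.example", "thai restaurants"), ("c.example", "siam cooking")]),
    "b.example": ("Restaurant list", "menus and reviews", [("a.example", "food guide")]),
    "c.example": ("Cooking class", "learn to cook curry", []),
    "orphan.example": ("Hidden page", "thai recipes", []),  # no page links here
}


def words(text: str) -> set:
    """Lowercase keywords in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def crawl(seed: str, max_pages: int) -> dict:
    """Breadth-first crawl from `seed`, building an inverted index: word -> set of URLs."""
    index = defaultdict(set)
    visited = set()          # the crawler's record of pages already visited
    frontier = deque([seed])  # list of pages to be crawled
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited or url not in WEB:
            continue
        visited.add(url)
        title, body, links = WEB[url]
        for w in words(title) | words(body):
            index[w].add(url)
        for target, anchor in links:
            # Anchor text counts as a keyword on this page and also describes
            # the target, so the target can be reported before it is crawled.
            for w in words(anchor):
                index[w].update((url, target))
            if target not in visited:
                frontier.append(target)
    return index


def query(index: dict, keywords: str) -> set:
    """Return pages indexed under every keyword (implicit AND)."""
    results = [index.get(w, set()) for w in words(keywords)]
    return set.intersection(*results) if results else set()


partial = crawl("a.example", max_pages=1)  # only a.example has been crawled
print(sorted(query(partial, "thai restaurants")))  # b.example is found via anchor text
```

A full crawl from `a.example` still never reaches `orphan.example`, so a query for "recipes" returns nothing: the "no page points to it" case from the list above.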
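The "vote" idea behind PageRank can be made precise with an iterative calculation. In this minimal Python sketch, each page splits its current score evenly among the pages it links to, so a vote from a highly ranked page is worth more than a vote from an obscure one. The damping factor d = 0.85 (every page also receives a small uniform share, (1 - d)/N) and the rule that a page with no outgoing links spreads its score over all pages are standard textbook choices, not details given on the slide; Google's actual ranking uses many additional signals. The link graph `LINKS` is hypothetical.

```python
def pagerank(links: dict, damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict:
    """Iteratively compute PageRank scores that sum to 1.

    `links` maps each page to the list of pages it links to; each link is a vote.
    PR(p) = (1 - d)/N + d * sum over pages q linking to p of PR(q) / outlinks(q)
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(max_iter):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)  # vote split among outlinks
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical link graph: "b" collects votes from three pages.
LINKS = {"hub": ["a", "b"], "a": ["b"], "b": ["hub"], "c": ["b"]}
ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))  # order: b, hub, a, c
```

Note that "hub" ranks second although only one page links to it: that single vote comes from "b", the highest-ranked page.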

Asking the Right Question
Choosing the right terms and knowing how the search engine will use them
- Words or phrases? Search engines generally consider each word separately
- Ask for an exact phrase by placing quotation marks around it: "thai restaurants"

Logical Operators
AND, OR, NOT
- AND: tells the search engine to return only pages containing both terms (the default): Thai AND restaurants
- OR: tells the search engine to find pages containing either word, including pages where both appear: Thai OR Siam
- NOT/-: excludes pages with the given word: -review
- AND and OR are infix operators; they go between the terms
- NOT/- is a prefix operator; it precedes the term to be excluded

Google Help: Cheat Sheet http://www.google.com/help/cheatsheet.html

Five Tips for an Efficient Search
1. Be clear about what sort of page you seek (company or organization, reference page, etc.)
2. Think about what type of organization might publish the page you want; you might be able to guess the URL
3. List terms that are likely to appear on the pages you are looking for
4. Assess the results: before looking at each returned page, check the results to see how effective your search was
5. Consider a two-pass strategy (focused searches): do a broad topic search, and then search within your results
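A toy evaluator makes the operator rules concrete. This minimal Python sketch, assuming a small in-memory set of hypothetical pages, treats separate words as implicitly ANDed and supports infix OR, prefix NOT or -, and quoted exact phrases. Real search engines parse queries differently and rank their results; `DOCS`, `parse`, and `search` are illustrative names only.

```python
import re

# Hypothetical pages to search.
DOCS = {
    "p1": "Best Thai restaurants in town",
    "p2": "Thai restaurants review: the food was cold",
    "p3": "Siam Garden serves curry",
    "p4": "Restaurants near the Thai embassy",
}


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def contains(doc_words: list, term: str) -> bool:
    """True if the word or exact phrase `term` appears in order in the document."""
    target = tokenize(term)
    k = len(target)
    return k > 0 and any(doc_words[i:i + k] == target
                         for i in range(len(doc_words) - k + 1))


def parse(query: str) -> tuple:
    """Split a query into required OR-groups (ANDed together) and excluded terms."""
    groups, excluded = [], []
    negate_next = join_next = False
    for tok in re.findall(r'-?"[^"]*"|\S+', query):  # quoted phrases stay whole
        if tok == "AND":
            continue  # AND is the default, so it adds nothing
        if tok == "OR":
            join_next = True  # infix: attach the next term to the previous group
            continue
        if tok == "NOT":
            negate_next = True  # prefix: exclude the next term
            continue
        if tok.startswith("-") and len(tok) > 1:
            negate_next, tok = True, tok[1:]
        term = tok.strip('"')
        if negate_next:
            excluded.append(term)
        elif join_next and groups:
            groups[-1].append(term)
        else:
            groups.append([term])
        negate_next = join_next = False
    return groups, excluded


def search(docs: dict, query: str) -> set:
    groups, excluded = parse(query)
    hits = set()
    for name, text in docs.items():
        w = tokenize(text)
        if (all(any(contains(w, t) for t in g) for g in groups)
                and not any(contains(w, t) for t in excluded)):
            hits.add(name)
    return hits


print(sorted(search(DOCS, '"thai restaurants" -review')))  # ['p1']
```

The two-pass strategy from tip 5 maps onto the same function: run a broad query, then search again within only the pages it returned, for example `first = search(DOCS, "restaurants")` followed by `search({k: DOCS[k] for k in first}, "Thai -review")`.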

Web Information: Truth or Fiction?
- Anyone can publish anything on the Web
  - Note the prevalence of blogs and wikis
- Some of what gets published is false, misleading, deceptive, self-serving, slanderous, or disgusting
- "If it is on the Web, it must be true." NOT!
- How do we know if the pages we find in our search are reliable?

Do Not Assume Too Much
- Registered domain names may be misleading or deliberate hoaxes: www.whitehouse.gov vs. www.whitehouse.org vs. www.whitehouse.com
- Look for who or what organization publishes the Web page
- Respected organizations publish the best information available

A two-step check for the site's publisher
1. InterNIC (www.internic.net/whois.html) provides the name of the registrar (the company that registered the site's domain name) and a link to the WhoIs server maintained by that company
2. Go to that WhoIs server site and type the domain name again. The information returned includes the owner's name and physical address

Characteristics of Legitimate Sites
Web sites are most believable if they have these features:
- Physical existence: the site provides a street address, phone number, and e-mail address
- Expertise: the site includes references, citations or credentials, and related links
- Clarity: the site is well organized, easy to use, and has site-searching facilities
- Currency: the site was recently updated
- Professionalism: the site's grammar, spelling, and punctuation are correct; all links work

Remember that a site can have all these features and still not be legitimate. When in doubt, check it out (including cross-checking). Ask a librarian.
Example: http://www.dhmo.org/ (a hoax about the dangers of dihydrogen monoxide, H2O)
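The two-step publisher check above can also be done directly with the WHOIS protocol (RFC 3912), which is simply one line of text sent over TCP port 43, with the server replying and then closing the connection. This minimal Python sketch assumes `whois.iana.org` as the starting directory server and assumes that referrals appear on lines beginning `refer:` or `Registrar WHOIS Server:`; other servers may format replies differently. The function names are hypothetical, and modern WHOIS replies often redact the owner's personal details.

```python
import socket
from typing import Optional

# Assumed starting point: IANA's WHOIS server names the registry for each TLD.
ROOT_SERVER = "whois.iana.org"
# Assumed referral labels (matched case-insensitively).
REFERRAL_KEYS = ("refer:", "registrar whois server:")


def whois(query: str, server: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one WHOIS query (RFC 3912): a single line of text over TCP."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def referral(response: str) -> Optional[str]:
    """Return the next WHOIS server named in a response, or None."""
    for line in response.splitlines():
        stripped = line.strip()
        for key in REFERRAL_KEYS:
            if stripped.lower().startswith(key):
                value = stripped[len(key):].strip()
                if value:
                    return value
    return None


def lookup_publisher(domain: str, server: str = ROOT_SERVER, max_hops: int = 3) -> str:
    """Follow WHOIS referrals (directory first, then registrar); return the last reply."""
    seen = set()
    response = ""
    for _ in range(max_hops):
        seen.add(server.lower())
        response = whois(domain, server)
        next_server = referral(response)
        if not next_server or next_server.lower() in seen:
            break
        server = next_server
    return response
```

With network access, calling `lookup_publisher("whitehouse.org")` and `lookup_publisher("whitehouse.com")` lets you compare who stands behind the look-alike domains; as the slide warns, the answer is one input to cross-checking, not proof of legitimacy.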
