Finding Information on the Information Highway: How to Get Around on the Internet
Finding information on the information highway:
- The Internet vs. the World Wide Web
- Search engines
- Subject directories
- Online databases
- Boolean searches
Aren't the Internet and the Web the same thing?
The Internet vs. the World Wide Web
- Think of the Internet as the physical components needed to build a massive computer network (nodes, cables, servers, gateways, routers, firewalls, etc.).
- Think of the Web as one of the services available over the Internet: linked web pages delivered by HTTP. Other services, such as email and file transfers, also run over the Internet, and each service requires its own protocol (SMTP, FTP, etc.).
The primary Internet protocols:
- Network transport: Transmission Control Protocol / Internet Protocol (TCP/IP)
- File transfers: File Transfer Protocol (FTP)
- Web pages: HyperText Transfer Protocol (HTTP)
- Email: Simple Mail Transfer Protocol (SMTP) for sending mail; Post Office Protocol (POP) for retrieving it
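Every web address names its protocol in the part before `://`. The minimal Python sketch below uses only the standard `urllib.parse` module to read that protocol and the port it would use. The `protocol_for` helper is hypothetical and exists only for illustration. The port numbers are the standard well-known defaults, although a given server may be configured to use others.

```python
from urllib.parse import urlsplit

# Standard well-known default ports for the protocols on this slide.
# (A given server may be configured to use a different port.)
DEFAULT_PORTS = {
    "http": 80,   # HyperText Transfer Protocol: web pages
    "ftp": 21,    # File Transfer Protocol: file transfers
    "smtp": 25,   # Simple Mail Transfer Protocol: sending email
    "pop3": 110,  # Post Office Protocol v3: retrieving email
}

def protocol_for(url: str) -> tuple[str, int]:
    """Hypothetical helper: return (protocol, port) for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Use an explicit port if the URL gives one, otherwise the default.
    port = parts.port or DEFAULT_PORTS[scheme]
    return scheme, port

print(protocol_for("http://www.lib.berkeley.edu/teachinglib/guides/internet/subjdirectories.html"))
```

The same Internet hardware carries all of these services. The protocol named in the address determines which service you are talking to.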
Finding information on the Internet: Search Engines
Search engines consist of three basic components:
- A spider (also called a crawler or bot): a program that crawls across the Web collecting information
- A database, organized by an indexer program
- Search engine software, which pulls hits from the database based on your query
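The first two components can be sketched in a few lines of Python. The miniature "web" below is hypothetical. The spider visits pages breadth-first by following links, and the indexer builds an inverted index that maps each word to the pages containing it. Real crawlers also handle fetching, politeness rules, and far larger scale.

```python
from collections import deque

# Hypothetical miniature web: page -> (page text, links to other pages).
WEB = {
    "a.html": ("Ford builds cars and trucks", ["b.html", "c.html"]),
    "b.html": ("Used car buying guide", ["a.html"]),
    "c.html": ("History of the Ford family", ["d.html"]),
    "d.html": ("Gerald Ford was a US president", []),
}

def crawl(start: str) -> list[str]:
    """Spider: visit pages breadth-first by following links."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in WEB[url][1]:
            if link not in seen:  # never visit the same page twice
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls: list[str]) -> dict[str, set[str]]:
    """Indexer: map each word to the set of pages that contain it."""
    index: dict[str, set[str]] = {}
    for url in urls:
        for word in WEB[url][0].lower().split():
            index.setdefault(word, set()).add(url)
    return index

pages = crawl("a.html")
index = build_index(pages)
print(sorted(index["ford"]))
```

The search software then only needs to look words up in `index`. It never revisits the live pages at query time.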
The search process:
- The user enters keywords or phrases.
- The search engine software searches the database index to find matching items.
- The software returns hits (results), prioritized according to multiple factors.
A search engine is just a program: nothing validates or authenticates the results, and no human review takes place.
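Here is a minimal Python sketch of that process, using hypothetical pages that a spider has already collected. The query is split into keywords, each keyword is looked up in the index, and the hits are returned in priority order. The ranking rule is an assumption for illustration only: first by how many query words a page matches, then by total occurrences. Real engines weigh many more factors.

```python
from collections import Counter

# Hypothetical pages already collected by a spider.
PAGES = {
    "p1": "ford car ford truck",
    "p2": "car insurance quotes",
    "p3": "ford motor company history",
}

def build_index(pages):
    """Inverted index: word -> Counter of {page: occurrences}."""
    index = {}
    for page, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, Counter())[page] += 1
    return index

def search(index, query):
    """Return hits ranked by (query words matched, total occurrences)."""
    matched, occurrences = Counter(), Counter()
    for word in query.lower().split():
        for page, n in index.get(word, {}).items():
            matched[page] += 1
            occurrences[page] += n
    return sorted(matched, key=lambda p: (-matched[p], -occurrences[p], p))

index = build_index(PAGES)
print(search(index, "ford car"))
```

Note that nothing here checks whether a page is accurate. A page ranks highly simply because its words match.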
Why does the same search in different search engines get different results?
- Each search engine uses different algorithms and spiders.
- The hits depend on what is in each engine's database.
- Each engine ranks results for relevance differently, based on factors such as:
  - Frequency: How many times do the keywords occur on the website?
  - Location: Are the keywords contained in the URL or the site name?
- Each engine may search different sites. Is the search conducted across the entire Web, or is it a specialty search?
Bottom line: use more than one search engine when doing research!
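To see why two engines can disagree, the minimal Python sketch below ranks the same two pages in two ways: once with a frequency score and once with a location score. The pages, URLs, and scoring functions are hypothetical. Real engines blend many signals.

```python
# Hypothetical pages: URL -> page text.
PAGES = {
    "http://www.fordfans.example/": "welcome to our club",
    "http://www.autos.example/reviews": "ford ford ford review ford ford",
}

def frequency_score(url, text, keyword):
    """Frequency: how many times the keyword occurs in the page text."""
    return text.lower().split().count(keyword)

def location_score(url, text, keyword):
    """Location: 1 if the keyword appears in the URL or site name, else 0."""
    return 1 if keyword in url.lower() else 0

def rank(pages, keyword, score):
    """Order pages from highest to lowest score."""
    return sorted(pages, key=lambda u: score(u, pages[u], keyword), reverse=True)

print(rank(PAGES, "ford", frequency_score))  # the review page comes first
print(rank(PAGES, "ford", location_score))   # the fan-club site comes first
```

The query and the pages are identical in both runs. Only the ranking method changes, and so does the order of the hits.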
Some of the factors search engines may use to rank results:
- Factors based on the site itself: frequency, location, page count, website structure
- Factors based on external criteria: link popularity, click popularity, demographics, alliances, $$$ (who pays the most to have their sites shown)
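Link popularity is the easiest external factor to show in code. The minimal Python sketch below assumes the simplest version: count how many other pages link to each page. The link graph is hypothetical. Schemes used in practice go further. Google's PageRank, for instance, also weighs how popular the linking pages themselves are.

```python
# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["c"],
    "b": ["c", "a"],
    "c": ["a"],
    "d": ["c"],
}

def link_popularity(links):
    """Count inbound links from other pages (self-links are ignored)."""
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts

def rank_by_links(links):
    """Most-linked pages first; ties broken alphabetically."""
    counts = link_popularity(links)
    return sorted(counts, key=lambda p: (-counts[p], p))

print(rank_by_links(LINKS))
```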
What are metasearch engines?
Search engines that send your query to several other search engines, instead of searching individual websites themselves, and combine the results. Think how much wider the search area is!
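Metasearch engines merge results in different ways. The minimal Python sketch below uses one published merging method, reciprocal rank fusion. It is not necessarily the method any particular metasearch engine uses. Each page earns 1/(k + rank) points from every result list it appears in. The value k = 60 is a commonly used default. The engine result lists are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists: a page at position r in a list earns 1/(k + r)."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda u: (-scores[u], u))

engine_a = ["x", "y", "z"]  # hypothetical hits from one engine
engine_b = ["y", "w"]       # hypothetical hits from another engine
print(reciprocal_rank_fusion([engine_a, engine_b]))
```

The merged list includes hits that only one engine found (`w`, `z`), which illustrates the wider search area. Pages that several engines agree on (`y`) rise to the top.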
Finding information on the Internet: Subject Directories
How do subject directories differ from search engines?
- People, not programs, categorize the sites
- Typically more commercial/consumer oriented
- Drill-down search by subject, not by keywords
- Hierarchical organization: topics, then subtopics
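A subject directory is essentially a tree. The minimal Python sketch below models one as nested dictionaries, where topics contain subtopics and the leaves list sites a human editor has filed there. Drilling down means following a path of topic names. No keyword matching is involved. All topics and site names are hypothetical.

```python
# Hypothetical subject directory: topic -> subtopic -> sites filed by editors.
DIRECTORY = {
    "Arts": {
        "Literature": ["poetry.example", "novels.example"],
        "Music": ["jazz.example"],
    },
    "Science": {
        "Biology": ["cells.example"],
        "Physics": ["quarks.example"],
    },
}

def drill_down(directory, *path):
    """Follow a path of topic names from the top level down."""
    node = directory
    for topic in path:
        node = node[topic]  # raises KeyError if the topic does not exist
    return node

print(sorted(drill_down(DIRECTORY, "Science")))     # subtopics to choose from
print(drill_down(DIRECTORY, "Science", "Biology"))  # sites at the bottom level
```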
A great resource on subject directories: http://www.lib.berkeley.edu/teachinglib/guides/internet/subjdirectories.html
Finding information on the Internet: Online Databases
Online databases are referred to as the hidden Internet or the deep web.
Online databases provide access to resources outside the reach of web crawlers or search spiders, such as content that is available only through a search form, a login, or a subscription:
- Newspapers, journals, periodicals
- Academic papers, white papers
- Corporate data and specialty data
See also: About, Yahoo!, Library Index
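Why can't spiders reach these resources? A spider finds pages only by following links. The minimal Python sketch below models a hypothetical site whose database records are returned only when someone types into its search form. No page links to those records, so the crawler never discovers them. The site layout, URLs, and `search_form` helper are all illustrative assumptions.

```python
from collections import deque

# Hypothetical site: static pages and the links between them.
STATIC_LINKS = {
    "/": ["/about", "/search-form"],
    "/about": ["/"],
    "/search-form": [],
}
# Hypothetical database records, reachable only through the search form.
DATABASE = {
    "/articles?id=1": "1998 newspaper article on car recalls",
    "/articles?id=2": "white paper on assembly line production",
}

def crawl(start="/"):
    """A spider can only discover pages by following links."""
    seen, queue = {start}, deque([start])
    while queue:
        for link in STATIC_LINKS.get(queue.popleft(), []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def search_form(term):
    """What a person gets by typing a term into the site's own search box."""
    return [url for url, text in DATABASE.items() if term.lower() in text.lower()]

print(sorted(crawl()))     # no database records appear here
print(search_form("car"))  # but a direct database search finds one
```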
A great resource for databases: http://www.itc.nl/pub/home/library/library-generalinformation/more-info-databases/lii_info.html
Making the most of your search
Do your searches end up returning an overwhelming number of hits? Use Boolean operators to tweak them!
The basic Boolean operators: AND, OR, NOT
Examples of how Boolean operators affect your search:
- car AND ford: returns documents containing BOTH the words car and ford. (Many search engines assume AND when two words are entered.)
- car OR ford: returns documents containing either word or both words. OR results in the greatest number of hits.
- car NOT ford: returns documents containing car that do NOT contain ford. Like AND, NOT narrows the search, in this case by excluding every document that mentions ford.
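Boolean operators behave exactly like set operations on the documents that contain each word: AND is intersection, OR is union, and NOT is difference. The minimal Python sketch below shows this. The five-document collection is hypothetical.

```python
# Hypothetical collection: document id -> text.
DOCS = {
    1: "the car was built by ford",
    2: "a used car for sale",
    3: "ford motor company",
    4: "gerald ford biography",
    5: "boat repair tips",
}

def having(word):
    """Set of document ids whose text contains the word."""
    return {d for d, text in DOCS.items() if word in text.split()}

car, ford = having("car"), having("ford")
and_hits = car & ford  # car AND ford: both words
or_hits = car | ford   # car OR ford: either word, or both
not_hits = car - ford  # car NOT ford: car, but never ford

print(sorted(and_hits), sorted(or_hits), sorted(not_hits))
```

OR can only grow the result set, while AND and NOT can only shrink the set of documents that contain car.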
Ways to further refine your search:
- Combinations: (car AND ford) NOT Gerald returns documents containing BOTH the words car and ford, but nothing about President Gerald Ford.
- Quotes: "Men In Black" returns documents containing the exact string of words within the quotes, not just any occurrence of any of the words.
- Wildcard (*): Bio* returns documents containing any word that begins with the letters bio (biology, biography, biotech, etc.).
- Wildcard (%, ?): in Smithw%ck, the % stands for any single letter. This is great when a word may be spelled in a variety of ways (Smithwick, Smithwyck, Smithweck).
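Phrase searches and wildcards can both be expressed as regular-expression matching. The minimal Python sketch below uses the standard `re` module. It assumes `*` matches any run of letters and that `%` or `?` matches exactly one letter. Each search engine defines its own wildcard symbols, so check the engine's help pages. The function names are hypothetical.

```python
import re

def phrase_match(text, phrase):
    """Quotes: the words must appear together, in this exact order."""
    words = [re.escape(w) for w in phrase.lower().split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text.lower()) is not None

def wildcard_regex(term):
    """'*' matches any run of letters; '%' or '?' matches exactly one letter."""
    parts = []
    for ch in term.lower():
        if ch == "*":
            parts.append("[a-z]*")
        elif ch in "%?":
            parts.append("[a-z]")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b")

def wildcard_words(text, term):
    """All words in the text that match the wildcard term."""
    return wildcard_regex(term).findall(text.lower())

print(phrase_match("I watched Men In Black twice", "Men In Black"))
print(wildcard_words("Smithwick, Smithwyck and Smithweck", "Smithw%ck"))
```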
More considerations (these may vary depending on the search engine used):
- Stopwords: common words such as a, an, the, of, by, with, for, and to are ignored by the search engine.
- Keys (not concepts): break phrases down into keywords. For example, "TQM in manufacturing assembly lines" becomes total quality management, TQM, production, manufacturing, assembly line production, etc.
- Proximity operators: designate how close the keywords should be to each other.
- Variety: change spellings; try abbreviations, singular/plural forms, related terms, and synonyms.
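Two of these considerations are shown in the minimal Python sketch below: dropping stopwords from a query, and a proximity test that checks whether two keywords occur within n words of each other. The stopword list is the one from this slide. Real engines use their own, longer lists. The proximity syntax also differs between engines, so `within` is a hypothetical helper rather than any engine's operator.

```python
import re

# Stopwords named on this slide; real engines use their own lists.
STOPWORDS = {"a", "an", "the", "of", "by", "with", "for", "to"}

def tokens(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keywords(query):
    """Drop stopwords, keeping only the meaningful keywords."""
    return [w for w in tokens(query) if w not in STOPWORDS]

def within(text, word1, word2, n):
    """Proximity: True if word1 and word2 occur within n words of each other."""
    words = tokens(text)
    pos1 = [i for i, w in enumerate(words) if w == word1]
    pos2 = [i for i, w in enumerate(words) if w == word2]
    return any(abs(i - j) <= n for i in pos1 for j in pos2)

print(keywords("The history of the assembly line"))
print(within("total quality management in the assembly line", "quality", "line", 3))
```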