Searching on the WWW

Directory-Oriented Search Engines

- Users are often looking for some specific information.
- The WWW has a growing collection of search engines to aid in locating information.
- The search engines return the addresses of pages that correspond to the user-provided keyword(s), together with links to those addresses.
- Directory-oriented engines are similar to editorial services: they determine the best sites on the Web and include them in categorized listings.
- Some have both a directory of categorized sites and a search engine for searching both the listing and the Internet.
- The search engines fall into two categories:
  - Directory searches (e.g. Yahoo)
  - Network searches, not previously organized into categories (e.g. InfoSeek, Lycos, Google, AltaVista, HotBot, WebCrawler)

Directory Search Engines: Yahoo

- Yahoo is the grandfather of these search engines.
- Started in April 1994 by two EE graduate students at Stanford, to keep track of their personal interests on the Internet.
- Yahoo searches are indexed by category; a search moves deeper through its menu-style links until it reaches a site of interest to the user.
- Web site creators submit their site address, content description, and contact address to Yahoo's editors and request a listing at a specific location in Yahoo.

Searching the Yahoo Directory

- The user can either move through the directories manually OR use a basic search engine based on keywords (a word or simple phrases).
- Productive use of the search engine (as with all engines) takes some practice.
- A search results in a list of possible matches in the Yahoo database (and now also links to pages located by AltaVista).
- The list is in hotlink format: click on any entry in the list to go to that location.
- 14 top-level categories.
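The two ways of using a directory described above (moving through categories by hand, or running a keyword search over the listings) can be illustrated with a minimal sketch. The category names, URLs, and the functions `browse` and `keyword_search` are invented for illustration; this is not how Yahoo itself is implemented.

```python
# Hypothetical miniature directory: category -> subcategories or site listings.
# All category names and URLs here are made up for illustration.
DIRECTORY = {
    "Recreation": {
        "Sports": {
            "Baseball": ["http://example.com/baseball-stats"],
            "Billiards": ["http://example.com/billiards-rules"],
        },
    },
    "Computers": {
        "Internet": ["http://example.com/www-history"],
    },
}


def browse(directory, path):
    """Follow a list of category names, as a user clicking menu links would."""
    node = directory
    for category in path:
        node = node[category]  # raises KeyError if the category does not exist
    return node


def keyword_search(directory, keyword, trail=()):
    """Return (category path, site) pairs whose path or URL contains the keyword."""
    keyword = keyword.lower()
    hits = []
    for name, child in directory.items():
        here = trail + (name,)
        if isinstance(child, dict):
            # Subcategory: keep descending through the menu structure.
            hits.extend(keyword_search(child, keyword, here))
        else:
            # Leaf category: child is a list of listed sites.
            path_text = " > ".join(here)
            for site in child:
                if keyword in path_text.lower() or keyword in site.lower():
                    hits.append((path_text, site))
    return hits
```

For example, `browse(DIRECTORY, ["Recreation", "Sports", "Baseball"])` walks three menu levels to reach the listed site, while `keyword_search(DIRECTORY, "billiards")` finds the matching listing without knowing where it sits in the hierarchy.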
General Search Tips

1. Use more descriptive, specific words as opposed to general ones. For example, a search for "Lamborghini" will return much more specific results than a search for "sports cars."
2. Try an Advanced Search: use the "+" (plus) sign for words that your results MUST contain, or use the "-" (minus) sign to tell the search engine that your results should NOT contain a certain word. When using these options, do not leave any space between the sign and the word.

Net Search Engines

- This type of engine is more oriented toward searching and less oriented toward editorializing.
- Basically, they find key phrases in the titles of Web pages:
  - some search headers and document titles,
  - some search their own indexes of Internet documents and pages,
  - others just rummage through the Internet.

Net Search Engines

- HotBot: http://www.hotbot.com
- InfoSeek: provides a few lines from each site in the returned listing, clarifying how your keyword is used there and so whether or not the site is of interest to you.
- Lycos: very comprehensive search engine. Uses a database with millions of link descriptors and documents. The engine searches links, headings, and titles for the keywords you enter, and offers several search options.
- WebCrawler: searches by document title and content using keywords provided by the user. Not as comprehensive as Lycos, but quick and easy to use.
- Google: the current "in" search engine. Started in the late 1990s as a research project by two graduate students at Stanford; now an independent company.
http://www.uml.edu/libraries/directory/h-inet.html
Search Engines: AskJeeves

http://www.askjeeves.com
[Screenshots: AskJeeves, ~2006 and ~2007]
How Net Search Engines Work

- Each is designed to provide access to a database of information provided by Internet sites around the world.
- Some are Net-specific (Lycos, WebCrawler); others also search UseNet, on-line publications, and information archives (InfoSeek, Excite).
- The user provides the search key: the keyword(s).
  - The better the choice of keyword, the more effective the search.
  - Avoid commonly appearing words, e.g. www, computer, PC, Mac. Why not? "Mac" appears in Macintosh, Machine, Macaroni, ...
- Boolean operators can be used: AND, NOT, OR.

Lycos

- Lycos started at Carnegie Mellon University as a library research project. A merger with the owner of the Home Shopping Network was announced but later called off; Lycos was then purchased by a Spanish company for several billion dollars and later sold off at a large commercial loss.
- Very comprehensive and very accurate search engine.
- Includes approx. 90% of the Web in its searches, so it can be busy or slow during peak business hours.
- Includes more options and has a much larger database of indexed Web documents than InfoSeek and others.

Lycos consists of 3 parts (all interconnected):

1. Spiders: groups of programs that go out and search the Web, FTP, and Gopher sites every day.
2. Catalog database: contains the results returned by the spiders: the URLs, information about the documents found at each site, and the number of times a site is referenced by other Web addresses. The most popular sites are indexed first.
3. The search engine itself: sorts through the catalog and produces a list of hits, listed in descending order of relevance to your keyword criteria.
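The catalog and search-engine parts described above, together with the Boolean operators AND, OR, and NOT, can be sketched in a few lines. This is a toy illustration under stated assumptions, not Lycos's actual design: the `PAGES` dictionary stands in for whatever the spiders would fetch, the URLs are invented, and "relevance" is simply the number of query words a page contains.

```python
import re
from collections import defaultdict

# Stand-in for what the spiders would bring back; URLs and text are invented.
PAGES = {
    "http://example.com/a": "Macintosh computer reviews and Mac software",
    "http://example.com/b": "Macaroni and cheese recipes",
    "http://example.com/c": "Baseball home run records",
}


def tokenize(text):
    """Lowercase the text and split it into whole words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_catalog(pages):
    """Catalog: map each word to the set of URLs whose text contains it."""
    catalog = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            catalog[word].add(url)
    return catalog


def search(catalog, all_of=(), any_of=(), none_of=()):
    """Boolean search: AND over all_of, OR over any_of, NOT over none_of.

    Hits are returned in descending order of a naive relevance score
    (how many of the wanted words each page contains).
    """
    all_of = [w.lower() for w in all_of]
    any_of = [w.lower() for w in any_of]
    none_of = [w.lower() for w in none_of]
    if not all_of and not any_of:
        return []  # nothing positive to look for
    hits = set().union(*catalog.values())
    for word in all_of:  # AND: every word must be present
        hits &= catalog.get(word, set())
    if any_of:  # OR: at least one word must be present
        hits &= set().union(*(catalog.get(w, set()) for w in any_of))
    for word in none_of:  # NOT: drop pages containing the word
        hits -= catalog.get(word, set())
    wanted = all_of + any_of

    def score(url):
        return sum(url in catalog.get(w, set()) for w in wanted)

    return sorted(hits, key=lambda url: (-score(url), url))


def substring_matches(pages, fragment):
    """Naive substring matching, showing why a short key like 'mac' is a poor choice."""
    return sorted(url for url, text in pages.items() if fragment.lower() in text.lower())
```

With `catalog = build_catalog(PAGES)`, a call such as `search(catalog, any_of=["macintosh", "macaroni"], none_of=["cheese"])` combines OR and NOT. The `substring_matches` helper shows the problem noted above: matching the fragment "mac" also pulls in the macaroni page.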
Other Search Engines

- Deja News Research Services: searches only UseNet newsgroups. http://www.dejanews.com
- Excite: searches Web documents plus 2 weeks' worth of UseNet newsgroups and classified ads. http://www.excite.com
- AltaVista: built and owned by DEC*. Very fast and powerful Web indexer, probably the fastest of all the search engines. http://www.altavista.dec.com

* DEC = Digital Equipment Corp., once the 2nd largest computer manufacturer in the world; bought by Compaq, which was in turn bought by HP.

Improving Query Effectiveness

Refining a query: here are some effective techniques to try.

1. Identify a phrase
   Before: home run records
   After: "home run" records
   The "before" query is ambiguous. Is it looking for the home page of songs like "Run, Run, Run", or for baseball statistics? Identifying "home run" as a phrase eliminates the ambiguity. This is the most powerful query refinement technique. Better search engines now also look for associations between consecutive words without being told to.

2. Add a word or a phrase
   Before: home run records
   After: home run records baseball
   As before, the "before" query is ambiguous. Adding "baseball" makes the query less ambiguous. You'll get more total matches (because the query is broadened with an additional term), but the relevance ranking will be better.

3. Capitalize when appropriate
   Before: wired, digital, white house, baby bells, bill gates
   After: Wired, Digital, White House, Baby Bells, Bill Gates
   These examples, when all lower case, have a variety of possible interpretations. For example, without capitalization:
   - wired could refer to electrical cables and not Wired Magazine
   - baby bells could refer to the Bells' children on "The Young and the Restless"
4. Use a require or reject operator (+, -)
   Before: Barney
   After: Barney +Smith -dinosaur
   "Barney" alone is ambiguous. Is it looking for Smith Barney investment information, cartoon dinosaur pages, or the Flintstones character? You can use the reject operator (the minus sign) to eliminate the cartoon dinosaur interpretation. Or you can require that the word "Smith" be in the document. The "after" version above does both.

5. Use a field specifier
   Before: Pick of the Day
   After: title:"Pick of the Day"
   If you are looking for a particular page whose name you remember, use the title: field specifier to require that the word or phrase be in the title of the page.

Searching for phrases (words next to each other)

When searching for a phrase such as better business bureau or new england patriots, where you want the words in that order, just enclose the phrase in quotes. A search on new england patriots returns all pages with any or all of those words, in any order, somewhere on the page (with pages containing all the words ranked higher). But a search on "new england patriots" finds only pages with that exact phrase on the page.

Using plus (+) and minus (-) signs

These signs tell the search engine which terms must (+) and must not (-) be present in the returned documents. When using these options, do not leave any space between the sign and the word.

Plus (+)
If you put a plus sign directly in front of a word, all the documents the engine retrieves will contain that word. So if you search for +billiards +rules, you'll be sure to get the rules of the game. Remember, you must mark each word appropriately for these tools to work. For instance, if you type billiards +rules, all of the documents returned will have "rules" in the text, but not necessarily "billiards".
Minus (-)
If you put a minus sign directly in front of a word, the engine will NOT retrieve documents containing that word. So if you search for +billiards -equipment -supplies, you'll be spared the billiards-related documents that emphasize equipment and supplies.
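The phrase, plus, and minus rules above can be made concrete with a small sketch. It is only an illustration of the query syntax, not any real engine's implementation: the function names, the sample `PAGES`, and the scoring (one point per matching term) are all assumptions made for this example.

```python
import re

# Invented sample pages; the URLs and text are for illustration only.
PAGES = {
    "http://example.com/rules": "Official billiards rules for eight ball",
    "http://example.com/shop": "Billiards equipment and supplies, plus rules posters",
    "http://example.com/chess": "Chess rules and openings",
    "http://example.com/patriots": "New England Patriots schedule",
    "http://example.com/travel": "Patriots of old England visit new museums",
}


def parse_query(query):
    """Split a query into required (+), rejected (-), and optional terms.

    A term is a bare word or a "quoted phrase"; a sign only counts when it
    is attached to the term with no space in between.
    """
    required, rejected, optional = [], [], []
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]+)"|(\S+))', query):
        term = (phrase or word).strip().lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            rejected.append(term)
        else:
            optional.append(term)
    return required, rejected, optional


def contains(text, term):
    """True if the word or phrase occurs in text as whole words, in order."""
    pattern = r"\b" + r"\s+".join(map(re.escape, term.split())) + r"\b"
    return re.search(pattern, text.lower()) is not None


def search(pages, query):
    """Return matching URLs, pages matching more terms first."""
    required, rejected, optional = parse_query(query)
    results = []
    for url, text in pages.items():
        if not all(contains(text, t) for t in required):
            continue  # a + term is missing
        if any(contains(text, t) for t in rejected):
            continue  # a - term is present
        score = sum(contains(text, t) for t in required + optional)
        if score:
            results.append((score, url))
    results.sort(key=lambda r: (-r[0], r[1]))
    return [url for _, url in results]
```

Trying `search(PAGES, "billiards +rules")` against `search(PAGES, "+billiards +rules")` shows the point made under Plus: in the first query only "rules" is mandatory, so the chess page is also returned. Likewise, the quoted query `'"new england patriots"'` matches only the page where the three words appear together in that order.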