SE Workshop
Ellen Wilson, Olena Zubaryeva

PLAN
- Search Engines: How do they work?
- Search Engine Optimization (SEO): optimize your website
- How to search? Tricks
- Practice

What is a Search Engine?
- A page on the web connected to a backend program
- Allows a user to enter words which characterize a required page
- Returns links to pages which match the query

Components of a SE
- Robot (or Worm or Spider): collects pages, checks for page changes
- Indexer: constructs a sophisticated file structure to enable fast page retrieval
- Searcher: satisfies user queries

How Search Engines (SEs) Work?
- Crawler-Based SEs
- Human-Powered Directories
- "Hybrid Search Engines" or Mixed Results

Crawler-Based Search Engines
Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found. If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed.
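The indexer and searcher components listed above can be illustrated with a small sketch. It assumes the robot has already collected three pages; the page names, their text, and the function names are hypothetical, and a real engine's "sophisticated file structure" is far more elaborate than this in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical mini-corpus standing in for pages collected by the robot.
PAGES = {
    "a.html": "search engines crawl the web",
    "b.html": "people search the web with search engines",
    "c.html": "directories depend on human editors",
}


def build_index(pages):
    """Indexer: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Searcher: return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    # Start from the pages matching the first word, then narrow down.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


INDEX = build_index(PAGES)
print(search(INDEX, "search web"))
print(search(INDEX, "human editors"))
```

Because the index is built once, each query only looks up its words instead of rereading every page, which is what makes retrieval fast.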
Human-Powered Directories
A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted. Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.

"Hybrid Search Engines"
In the web's early days, a search engine presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other. For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it also presents crawler-based results (as provided by Inktomi), especially for more obscure queries.

Search Engine Optimization

How Search Engines Rank Web Pages
- Location, Location, Location... and Frequency
- Tags (<title>, <meta>, <b>, top of the page)
- How close words (from the query) are to each other on the website
- Quality of links going to and from a page
- Penalization for "spamming", when a word is repeated hundreds of times on a page to increase the frequency and propel the page higher in the listings
- Off-the-page ranking criteria: analyzing how pages link to each other

Why do results differ?
- Some search engines index more web pages than others.
- Some search engines also index web pages more often than others.
- The result is that no two search engines have exactly the same collection of web pages to search through. That naturally produces differences when comparing their results.
- Different algorithms to compute the relevance of a page to a particular query

Search Engine Placement Tips
Why is it important to be on the first page of the results?
- Most users do not go beyond the first page.

How to optimize your website?
Pick your target keywords:
- How do you think people will search for your web page? The words you imagine them typing into the search box are your target keywords.
- Pick different target keywords for each page on your website.
- Your target keywords should always be at least two words long.
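To make the "location and frequency" ranking idea concrete, here is a toy scoring sketch. It is not how any real search engine ranks pages: the function name, the weights, and the sample pages are all assumptions chosen for illustration. It only shows why the same target keywords count for more in the title and near the top of a page.

```python
def score(title, body, query, title_weight=3.0, early_words=5, early_bonus=1.0):
    """Toy relevance score based on location and frequency.

    Every occurrence of a query word in the body adds 1 (frequency).
    Occurrences in the title, or within the first `early_words` words
    of the body, earn extra credit (location). All weights are arbitrary.
    Words are split on whitespace only, so punctuation is not stripped.
    """
    title_words = title.lower().split()
    body_words = body.lower().split()
    total = 0.0
    for word in query.lower().split():
        total += title_weight * title_words.count(word)
        total += body_words.count(word)
        total += early_bonus * body_words[:early_words].count(word)
    return total


# Two hypothetical pages mentioning the same keywords in different places.
page_a = score("Laptop reviews",
               "independent laptop reviews and buying advice",
               "laptop reviews")
page_b = score("Home page",
               "welcome to our shop where you can also find laptop reviews",
               "laptop reviews")
print(page_a, page_b)
```

Both pages contain each keyword once in the body, yet the first page scores higher because the keywords also appear in its title and in its opening words.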
Search Engine Placement Tips: Position Your Keywords
- Make sure your target keywords appear in the crucial locations on your web pages.
- The page's HTML <title> tag is most important.
- The titles should be relatively short and attractive.
- Several phrases are enough for the description.
- Search engines also like pages where keywords appear "high" on the page: in the headline and the first paragraphs of your web page.
- Keep in mind that tables and large JavaScript sections can make your keywords less relevant because they push the keywords lower on the page.

Search Engine Placement Tips: Have Relevant Content
- Keywords need to be reflected in the page's content.
- Put more than graphics on a page
- Don't use frames
- Use the ALT attribute on images
- Make good use of <TITLE> and <H1>
- Consider using the <META> tag
- Get people to link to your page

Hiding Web Pages
- You may wish to have web pages that are not indexed (for example, test pages).
- It is possible to hide web content from robots using the robots.txt file and the robots meta tag.
- Not all crawlers obey these, so this is not foolproof.

How to search?

What do we search?
- Information
- Reviews, news
- Advice, methods, how to avoid errors
- Educational material
Examples:
- Laptop reviews
- How to write a cover letter?
- How to conduct seminars?

Main Steps
- Make a decision about the search: formulate a topic and define the type of resources that you are looking for
- Find relevant words for description
- Find websites with information
- Choose the best out of them

Feedback: How did you search?
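Returning to the robots.txt file mentioned under Hiding Web Pages, the sketch below shows how a well-behaved crawler might decide whether it may fetch a page. It uses Python's standard urllib.robotparser module; the robots.txt content, the example.com URLs, and the crawler name "ExampleBot" are hypothetical. As the slide notes, a crawler that ignores the file is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to stay out of /test/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
"""

parser = RobotFileParser()
# Parse the rules from text instead of downloading them from a site.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """Return True if the rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed("http://www.example.com/index.html"))
print(allowed("http://www.example.com/test/draft.html"))
```

The first URL is permitted and the second is not, because only paths under /test/ are disallowed.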
Main Problems
Why is it difficult to search?
- Know the problem, but don't know what to look for
- Lose focus (go to interesting but irrelevant sites)
- Perform a superficial (shallow) search
- Search spam

Typical Problems
- Links are often out of date
- Usually too many links are returned
- Returned links are not very relevant
- The engines don't know about enough pages
- Different engines return different results
- Political bias (http://web.simmons.edu/~wilson4/sandbox/google-image.html)

Typical Mistakes
- Unnecessary words in a query
- Unsuitable choice of keywords
- Not enough flexibility in changing keywords (SEs)
- Divide the time devoted to search and evaluation of search results

Your search did not match any documents. Bad Query!

Search Tricks
- Technical aspects
- Search of Resources
- Useful words
- Useful sites

Technical aspects
- New Window / New Tab
- Use Favorites
- Ctrl+N: Open New Window
- 10 Windows (Don't be afraid to work with a lot of windows)
- Use Bars (e.g. Google Bar, Yahoo Bar)

Search of Resources
What can we search for?
- Thematic resource (http://www.topicmaps.org)
- Community
- Collection of articles
- Forum
- Catalogue of resources, links
- File (file types)
- Encyclopedia article
- Digital library
- Contact information (e.g. email)
Improving Query Results
- To look for a particular page, use an unusual phrase you know is on that page
- Use phrase queries where possible
- Check your spelling!
- Progressively use more terms
- If you don't find what you want, use another Search Engine!

Useful words
- Download, free
- Pdf, ppt, doc, zip, mp3
- Forum, directory, links
- Faq, for newbies, for beginners, guide, rules, checklist
- Lecture notes, survey
- How, where, correct

Useful sites
- Wikipedia.org
- Open Directory Project (http://dmoz.org/)
- How Search Engines Work (http://searchenginewatch.com/showpage.html?page=2168031)
- Google Help Center (http://www.google.com/support/?ctx=web)

Questions?

From Theory to Practice
- Uno Google game: compose a two-word query for which Google returns only one link
- Find a Wikipedia article on system integration
- Find a list of bowling clubs in Boston
- Find the full text of Love Story
- Find 3 different links to the page web.simmons.edu/~asist/
- Find a big forum (more than 5000 messages) on Java
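The tips from Improving Query Results and Useful words can be combined mechanically, as in the sketch below. It builds a query from an exact phrase in quotation marks, extra words, and a file type, then encodes it into a search URL. The function names are hypothetical, and the use of Google's filetype: operator is an addition not taken from the slides; the slides only suggest adding words such as "pdf" to a query.

```python
from urllib.parse import urlencode


def build_query(phrase=None, words=(), filetype=None):
    """Combine an exact phrase, extra words and a file type into one query."""
    parts = []
    if phrase:
        # Quotation marks turn the words into a phrase query.
        parts.append(f'"{phrase}"')
    parts.extend(words)
    if filetype:
        # Assumption: the engine supports the filetype: operator.
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def google_url(query):
    """Encode the query as the q parameter of a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(phrase="cover letter", words=["guide"], filetype="pdf")
print(query)
print(google_url(query))
```

Starting with the phrase alone and then adding words one at a time follows the "progressively use more terms" advice.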