Web Search Algorithms
- Primrose Leonard
1 Web Search Algorithms
2 Why web search in this module? The WWW is the delivery platform and the interface. How do we find information and services on the web? We try to generate a URL that seems sensible (Dell Computers, Ford Ireland). But products? Guessing a URL for GPS Devices is not OK. Or, we use a Search Engine. So we rely on Search Engines; we even use them to look up spellings and as a calculator! Search Engines bring people to a website. For most, such as Google, the ranking algorithm is closely guarded, wholesome, true, uncorrupted, and not paid; advertisements are merely sold based on similarity to query keywords. This leads to the industry of Search Engine Optimisation (SEO)... the Google Dance.
3 Text IR - Google as example. Google has been operational since 1998. Two PhD students from Stanford. ?? billion documents. Early search engines competed on size of index, which was related to how powerful their infrastructure was. Not an issue now. Google stopped advertising its index size after 8,168,684,336 pages in Aug 2005. Size now effectively unknown. It also has ??? billion images, not all unique images. Flickr had about 2B (Nov 2007); Facebook had 4.1B at that time.
4 Searching or Marketing? However, Search Engines must make a profit! Advertisement sales, marketing, paid listings, and selling their indexes. A lot of Search Engines are also marketing companies. This is at odds with the idea that a search engine is a page you visit on the way elsewhere: the less time you spend there the better! But many people pass through the doors, so they sell query-focussed advertisements. You can estimate by looking at the main page of the search engine.
5 How do SEs help user searches? It is known that we search for: people / home pages; companies / company HPs (or guess from URLs); a particular product or service; a fact, buried in one or more documents, any one of which will do; a document, an entire document, with text/image, where nothing smaller will do; an overview on a broad or narrow topic; media (an MPEG-4 file, through image databases, through a (digital) video library, and/or through a video). If the SE knows the type of query, then ranking can be tailored to that query, because different search types can be satisfied by different search algorithms.
6 Search Engines. Originally SEs were web directories, manually generated (e.g. Yahoo!). Then automatic crawler-based Search Engines developed (e.g. Lycos), because the web got big and manual categorisation was becoming too difficult. Today the large SEs index over ?? billion web pages. The first crawler-based SE was the WWWW in
7 Architecture of a Search Engine
8 My Google!
9 Bing
10 Facebook? Is it a Search Engine?
11 Facebook Social Graph. [Figure: social graph with clusters labelled A Previous Class, College Friends, Friends, and IR Research Community.]
12 TWITTER
13 The Landscape is changing
14 Web 1.0 to Web 3.0. Web 1.0: static content... Companies created content; we were consumers. Web 2.0: user generated content; communities and creators... We create, filter, recommend the content. Web 3.0: UGC and... Semantic Web... Life streams? Social and location. What is the next big thing?
15 Web 1.0. Search engines over prepared and planned content, from organisations and some users. SEO was the way to optimise. Web 1.0: HTML and static content.
17 Web 2.0. User and organisation generated content. Social graphs. Social filtering and social ranking. Examples: social networks (Facebook, Twitter, LinkedIn); shared bookmarks (Digg, Delicious, Reddit, StumbleUpon); social media sharing (Flickr, YouTube); blogs (MSN Spaces, WordPress, Blogger). Even 3D social worlds... Social gaming?
19 Web 3.0. Semantic Web. Many media types, integrated for smarter uses. Rich media integration. Personalisation to the user context. Life streaming of content. We are integrated into our own entertainment.
20 What is Web3.0 about?
21 The Search Landscape Changing enormously
22 Continuous Partial Attention. Be aware of Continuous Partial Attention: a kind of multitasking, skimming the surface of the incoming data, picking out the relevant details, and moving on to the next stream. Continuous, not episodic. Cast a wider net, but never full attention. So, how does this impact on search?
23 And don't forget the Twitter curve
24 Google AdSense
25 Spamming. Spamming is a technique based on the manipulation of content in order to affect ranking from search engines: bogus meta tags, hidden text, plan text; also link spamming. Huge SE resources are used in defeating spamming - more than in search quality improvement! Getting in the top 10 is essential for businesses: 85% of users only look at the top 10. This led to the business of Search Engine Optimisation.
26 Search Engine Ranking. As we all know, simply examining web page content as text is not enough. We need to examine ranking factors, positive and negative.
27 Positive Ranking Factors: Term Location. In the TITLE of the page (most important). In the body of the text, but it must MAKE SENSE. In the heading text (H1, H2, ...). In the domain name. Also in the page URL. In the ALT tag and image title. In BOLD/STRONG tags. Terms near the top are likely ranked higher than other terms.
28 Positive Ranking Factors: Page Attributes. Importance of the page in the website: number of links to it from the same website. Quality of links to other pages. Age of a document: older may be more authoritative (we will see authorities later!); newer may be better for some queries (e.g. news). Amount of text on the page. Structure of the page. Frequency of updates. Spelling and correctness of HTML.
29 SE Ranking +: Website Issues. Linkage of the website. Global link popularity of the website, like a global PageRank (SiteRank). Relevance of the links into the website. Link popularity of the site in a topical community. Rate of new inbound links to a website. Age of a website (older is better). Freshness of a website (new pages are better). Relevancy of the website (as well as the page). Clickthrough rate for the website. Reputation of the top-level domain, e.g. .GOV and .EDU cannot easily be bought.
30 SE Ranking +: Linkage Issues. Anchor text of inbound links as a description of the WWW page; also text surrounding the link into the webpage. Topical relationship between source and target of link. Link popularity of the page in a topical community. Age of links: the older the better, i.e. long-lasting links. PageRank of the webpage (Google's PageRank algorithm). Number of links into a web page.
31 Positive Ranking Factors: Images. Images on a web page can provide a chance to express ideas in a visual way that can convey a considerable amount of information, and add to the attractiveness and perceived quality of a site. Recent Microsoft patent on Scoring Relevance of a Document Based on Image Text. Also, remember to name the image properly and give it an alt attribute.
32 Negative Ranking Factors. Link farm participation (trying to artificially increase PageRank). Proportion of links to or from known spamming sites. Content duplicating already indexed content. Server errors or server down-time. External links to low-quality content. Low level of visitors to the website. Trying to include hidden text on the page.
33 Using the Ranking Factors. [Diagram: a user query is combined with PageRank factors, linkage factors, negative factors, website factors, page factors and term location factors to produce a result.] The Search Engine ranking process is a closely guarded trade secret of the search engines.
34 So let's look in some detail at some of these ranking factors: Linkage-based Search
35 The Shape of the WWW This is based on a study of 200 million web pages. Scale up to WWW scale
36 Spidering: finding WWW content. A Search Engine needs to find WWW content for its index. This is done by the spidering software. Starting from some seed WWW pages, the spider software downloads these pages and extracts the links, thereby learning about new pages to crawl. WWW-scale crawling means crawling thousands of pages per second.
37 A Basic Crawling Algorithm
You need to be linked to from the main Remember the shape!
Given a set of seed URLs (WWW page addresses):
  Add them to a (priority) queue of URLs
  While the queue is not empty:
    Take the first URL (u) off the queue
    Download the WWW page for u
    Store the URL in a list of seen URLs
    Index it
    If u is an HTML page, extract the links (y)
    For each y, add it to the queue if it has not been visited before
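The crawling loop above can be sketched in Python. This is a minimal illustration, not a production spider: the `fetch` callback, the `TOY_WEB` dictionary and the `*.example` URLs are assumptions made here so the sketch runs without network access, and a plain FIFO queue stands in for the priority queue.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=1000):
    """Crawl from the seed URLs; fetch(url) returns HTML text or None."""
    queue = deque(seeds)      # FIFO queue (a real spider would prioritise)
    seen = set(seeds)         # URLs already queued or visited
    index = {}                # url -> page text, a stand-in for a real index
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:      # page unavailable: skip it
            continue
        index[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)   # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


# Hypothetical three-page web used instead of real HTTP requests.
TOY_WEB = {
    "http://a.example/": '<a href="/b">b</a> <a href="http://c.example/">c</a>',
    "http://a.example/b": '<a href="/">home</a>',
    "http://c.example/": "no links here",
}
pages = crawl(["http://a.example/"], TOY_WEB.get)
print(list(pages))
```

Marking a URL as seen when it is queued, rather than when it is downloaded, keeps the same page from being queued twice by different referrers.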
38 Spiders must behave! Most crawlers/spiders will follow some rules. A spider must never request large numbers of documents from the same host sequentially; change the target website as often as is feasible. A spider must never (for whatever reason) repeatedly request the same document. If a document is unavailable, its position in the queue must be penalised. Repeated failures must be taken into account and the document flagged as unavailable and taken off the queue. A spider must respect the author's wishes as expressed using the robots exclusion protocol.
39 Robots Exclusion allows web site administrators to indicate to visiting robots which parts of their site should not be visited by the robot. Most good robots will process it, BUT it makes a crawler less efficient: more explorative crawling is required.
To exclude all robots from the entire server:
  User-agent: *
  Disallow: /
To allow all robots complete access:
  User-agent: *
  Disallow:
To exclude all robots from part of the server:
  User-agent: *
  Disallow: /cgi-bin/
  Disallow: /private/
To exclude a single robot:
  User-agent: BadBot
  Disallow: /
To allow a single robot:
  User-agent: WebCrawler
  Disallow:

  User-agent: *
  Disallow: /
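A well-behaved spider can check these rules before each request. The sketch below uses Python's standard `urllib.robotparser` module on two of the rule sets from the slide; the robot name `MyBot` and the `www.example.com` URLs are hypothetical, and the rules are parsed from strings rather than fetched from a live server.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Exclude all robots from part of the server.
PARTIAL = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

# Allow a single robot (WebCrawler) and exclude everyone else.
SINGLE_ROBOT = """\
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /
"""

partial = load_rules(PARTIAL)
single = load_rules(SINGLE_ROBOT)

print(partial.can_fetch("MyBot", "http://www.example.com/index.html"))
print(partial.can_fetch("MyBot", "http://www.example.com/private/notes.html"))
print(single.can_fetch("WebCrawler", "http://www.example.com/index.html"))
print(single.can_fetch("MyBot", "http://www.example.com/index.html"))
```

In a live crawler, `set_url()` followed by `read()` would download the file from the host instead of parsing a string.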
40 Robots.txt example
41 Another Example
42 And one more
43 Simple Overview 1. Spidering WWW 2. Indexing View WWW page 3. Ranking
44 WWWW - the first SE. WWWW (94) did not use the content of a page for indexing; it used: the title of the document, text in the URL string, and any anchor text from links pointing to the page. It was based on using the UNIX egrep program to search through disk files. All SEs now use Linkage Analysis to exploit latent human judgement to improve retrieval performance. This is in addition to using the document content.
45 Some history: Citation Analysis. Most significant contribution to web search is the technique for how to rank journals based on quality (impact). Citation indexing: the impact factor measurement is based on two elements: the number of citations in the current year to any articles published in the journal over the previous two years, and the number of articles published by the journal during these two years. Letting j be a journal and $IF_j$ be the Impact Factor of journal j, we have: $IF_j = \frac{\#\,\text{Citations (last 2 years)}}{\#\,\text{Published Articles (last 2 years)}}$. This impact factor was originally applied to medical journals as a simple method of comparing journals to each other regardless of their size.
46 Hirsch Index (h-index). Citation Analysis is a balance between quality (number of citations) and quantity (number of papers). Among scientists, the h-index is becoming popular for measurement: it is the largest number h such that h of the author's published papers each have at least h citations. Alan Smeaton has 250+ papers, about 3,000 citations, and an h-index of 30; Desmond Higgins (UCD) has 29,000 citations (22,500 on one paper), and an h-index of 22. Linkage analysis in web topology does something like this, as we'll see.
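The h-index definition is easy to check with a few lines of Python. The citation counts below are made up for illustration; the second list shows why one very highly cited paper does not raise the index much on its own.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank          # the top `rank` papers all have >= rank citations
        else:
            break
    return h


# Hypothetical citation counts per paper.
print(h_index([10, 8, 5, 4, 3]))   # four papers with at least 4 citations each
print(h_index([22500, 3, 1]))      # one blockbuster paper barely moves the index
```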
47 Linkage Analysis. Linkage Analysis: a method of ranking web sites which is based on the exploitation of latent human judgments mined from the hyperlinks that exist between documents on the WWW. The first generation of web search engines were effectively TF-IDF or BM25, or equivalent, and they addressed the engineering problems of web spidering and efficient searching for large numbers of both users and documents. Linkage Analysis has been important since the late 90s. Anecdotally this appears to have improved the precision of retrieval, yet there was little scientific evidence in support of this until recently.
48 Origin: Citation Analysis. How to rank journals based on quality (impact). Citation indexing: the impact factor measurement is based on two elements: the number of citations in the current year to any articles published in the journal over the previous two years, and the number of articles published by the journal during these two years. Letting j be a journal and $IF_j$ be the Impact Factor of journal j, we have: $IF_j = \frac{\#\,\text{Citations (last 2 years)}}{\#\,\text{Published Articles (last 2 years)}}$. This impact factor was originally applied to medical journals as a simple method of comparing journals to each other regardless of their size.
49 Mining links can tell us that... Bibliographic Coupling: A and B are similar because they both cite C, D, E. Co-citation Analysis: A and B are similar because they are both cited by C, D, E.
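Both measures reduce to counting shared neighbours in the citation (or link) graph. A minimal sketch, assuming the graph is stored as a dictionary from each document to the set of documents it cites; the documents X, Y and Z are hypothetical additions to the A to E example.

```python
def bibliographic_coupling(cites, a, b):
    """Number of documents that both a and b cite."""
    return len(cites.get(a, set()) & cites.get(b, set()))


def co_citation(cites, a, b):
    """Number of documents that cite both a and b."""
    return sum(1 for targets in cites.values() if a in targets and b in targets)


# Hypothetical citation graph: document -> set of documents it cites.
cites = {
    "A": {"C", "D", "E"},
    "B": {"C", "D", "E"},
    "X": {"A", "B"},
    "Y": {"A", "B"},
    "Z": {"A"},
}
print(bibliographic_coupling(cites, "A", "B"))  # A and B share C, D, E
print(co_citation(cites, "A", "B"))             # X and Y cite both A and B
```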
50 What else can we do with links? Count them? Distinguish between good and bad ones? How we employ them is called Linkage Analysis. Linkage-based ranking schemes can be seen to belong to one of two distinct classes. Query-independent schemes: a score is assigned to a document once and used for all subsequent queries, independent of a given query. Fast processing at query time! Query-dependent schemes: a linkage score is assigned to a page in the context of a given query. Slower processing at query time!
51 Assumed Properties of Links. When extracting information for linkage analysis from hyperlinks on the Web, two core properties can be assumed. A link between two documents on the web carries the implication of related content. If different people authored the documents (different domains, therefore off-site links), then the first author found the second document valuable. An author cannot be allowed to influence the linkage score of documents within his/her domain. Off-site links (links between web sites) are more important than links within websites or within documents.
52 Link Types. In-links to doc F: 5, 8, 9. Out-links from doc F: 4, 6, 10. Self-links: 2, 11. On-site links: 6, 8, 12. Off-site links: 1, 3, 4, 5, 9, 10. On-site in-links to doc F: ? Off-site out-links of doc F: ?
53 Basic Linkage Analysis. Given a linkage graph (off-site links only), page A is a better page than B because it has more in-links. [Figure: linkage graph over pages A to J.]
54 Expanding on this. However, page B may actually be better. [Figure: the same linkage graph, with CNN and Yahoo among the linking pages.] So we use iterative processes like PageRank or Kleinberg's algorithm.
55 Generating a linkage score. Let n be some web page and $S_n$ be the set of web pages that link into n across off-site links: $P_n = |S_n|$. In this case, the $P_n$ score (Popularity score) is based purely on the in-degree of document n. It could be the sole source of document ranking given a set of relevant documents (boolean IR), OR it could work by integrating normal document retrieval (TF-IDF / BM25 scores) to generate an overall weight. Once again, we let n be some web page and $S_n$ be the set of pages that link into n: $Sc'_n = (\alpha \cdot Sim(q, n)) + (\beta \cdot |S_n|)$, where $\alpha$ and $\beta$ are parameters; this assumes normalisation.
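The popularity score and its combination with a content score can be sketched as follows. The weights `alpha` and `beta`, their default values, and the choice to normalise the in-degree by the largest in-degree in the result set are all assumptions made for this illustration; the slide only says that the two parts are weighted and normalised.

```python
def popularity(inlinks, n):
    """P_n: the number of pages linking into n (its in-degree)."""
    return len(inlinks.get(n, set()))


def combined_scores(sim, inlinks, alpha=0.7, beta=0.3):
    """Weighted sum of content similarity and normalised in-degree.

    sim maps each candidate page to its content score in [0, 1].
    """
    max_pop = max((popularity(inlinks, n) for n in sim), default=0) or 1
    return {
        n: alpha * s + beta * popularity(inlinks, n) / max_pop
        for n, s in sim.items()
    }


# Hypothetical content scores and off-site in-links for two pages.
sim = {"A": 0.5, "B": 0.6}
inlinks = {"A": {"F", "G", "H"}, "B": {"J"}}
scores = combined_scores(sim, inlinks)
print(sorted(scores, key=scores.get, reverse=True))
```

Here page A overtakes page B despite its lower content score, because its three in-links outweigh the difference.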
56 More simple linkage techniques. Weighted Citation Ranking. Spreading Activation (SA) and Co-citation Analysis (CA). SA: spreads a score across outlinks. CA: passes a score back to the hub document.
57 Hubs & Authorities. A Hub is a document that contains links to many other documents. An Authority is a document that many documents link to. A good Hub links to good Authorities. A good Authority is linked to by good Hubs. [Figure: example hub and authority pages with their linked documents.]
58 What makes a good Hub? What makes a good hub for the query "web browsers"? [Figure: a hub page linking to Internet Explorer, Amaya, Netscape, Opera, Mozilla, Firefox, NeoPlanet and MyBrowser.]
59 What makes a good Authority? What makes a good Authority for the query "web browsers"? [Figure: many hub pages linking to the browser pages Internet Explorer, Amaya, Netscape, Mozilla, Opera, Firefox, NeoPlanet and MyBrowser.]
60 And what makes these authorities good? Good hubs that themselves link into good authorities: a self-reinforcing relationship! [Figure: the same hub pages and browser pages.]
61 The Influence of Links. A document's content can be represented by the anchor text of (all) the in-links into that doc, not by the document itself. More in-links means more content and a better chance of getting returned for a query. Very simple, but effective! Improved by windowing.
62 The Importance of Windows
63 The Importance of Windows
64 Iterative Linkage Algorithms PageRank
65 PageRank. A query-INDEPENDENT score for every document. An important aspect of Google ranking? It allocates a PageRank (query-independent importance) score to every document in an index, and this score is used when ranking documents. A simple iterative algorithm, run until convergence. A simulation of a random user's behaviour when browsing the web: equivalent to a user randomly following links, or getting bored and jumping to a random page anywhere on the WWW. In effect it is based on the probability of a user landing on any given page. This can be applied to graphs other than the WWW graph: social networks, blog comments?
66 Key points. The PR of A is divided equally among its outlinks. The PR of B is equal to the sum of the transferable PR of all its in-links. [Figure: $PR_A = PR_W = PR_X = PR_Y = PR_Z = 1$; A passes 1/4 along each of its outlinks; $PR_B = \frac{1}{4} + \frac{1}{2} + \frac{1}{2} + 1 = 2\frac{1}{4}$.]
67 For example, the PageRank $PR_F$ of document F is equal to $PR_B$ divided by the out-degree of B summed with $PR_D$ divided by the out-degree of D: $PR_F = \frac{PR_B}{2} + \frac{PR_D}{3}$
68 The Simplified Technique
1. Calculate a pre-iteration PageRank score for each document: for all n in N, $PR_n = \frac{1}{N}$
2. Calculate the PageRank score for each document n: $PR'_n = c \cdot \sum_{m \in S_n} \frac{PR_m}{outdegree_m}$ (assume c = 1)
3. Store the new PageRank scores: for all n in N, $PR_n = PR'_n$
4. If not converged, then go to step 2
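The four steps translate almost line for line into Python. This is a sketch of the simplified technique only (c = 1, no E vector), so it assumes every page has at least one outlink and that the graph has no rank sinks; the three-page graph, the tolerance and the iteration cap are made up for illustration.

```python
def simple_pagerank(outlinks, max_iterations=200, tol=1e-12):
    """Simplified PageRank with c = 1; outlinks maps page -> list of targets."""
    pages = list(outlinks)
    n = len(pages)
    # Step 1: pre-iteration score of 1/N for every page.
    pr = {p: 1.0 / n for p in pages}
    # S_n: the pages that link into n.
    inlinks = {p: [] for p in pages}
    for src, targets in outlinks.items():
        for dst in targets:
            inlinks[dst].append(src)
    for _ in range(max_iterations):
        # Step 2: each in-linking page m passes on PR_m / outdegree_m.
        new = {p: sum(pr[m] / len(outlinks[m]) for m in inlinks[p]) for p in pages}
        # Step 4: stop once the scores no longer change.
        converged = max(abs(new[p] - pr[p]) for p in pages) < tol
        # Step 3: store the new scores.
        pr = new
        if converged:
            break
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = simple_pagerank(graph)
print({page: round(score, 3) for page, score in ranks.items()})
```

On this graph the scores settle at 0.4 for A and C and 0.2 for B: B receives only half of A's rank, while C receives the other half plus all of B's.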
69 A Simple Web Graph F G A B C E D
70 PageRank Sample Graph Total =
71 PageRank after Iteration Total =
72 PageRank after Iteration Total =
73 PageRank Problem 1 (Dangling Links)??
74 PageRank - Problem 2 (Rank-Sink)
75 PageRank Problem 1 (Dangling Links)??
76 PageRank Problem 1 (Dangling Links) removed
77 PageRank - Problem 2 (Rank-Sink)
78 PageRank - Problem 2 (Rank-Sink). [Figure: seven documents (Doc 1 to Doc 7), each passing 15% of its rank to a vector over all web pages.] Hence if all PageRanks sum to 1.0, then E =
79 The two problems. Dangling Links: these are links that point to a page which itself contains no outlinks: docs which the system knows about (and has anchor text descriptions for) but has not downloaded yet, or just docs with no links out. The PageRank of the web pages associated with the target of these links is not redistributed at each iteration and is lost from the system. SOLUTION: remove the page or use a Universal Document. Rank Sinks: these are two or more pages that have outlinks to each other, but to no other pages. Assuming we have at least one inlink into these pages from a page outside of them, then at each iteration rank enters these pages and never exits, so they accumulate rank. SOLUTION: use the E vector with E = 0.15, or include a virtual (Universal) Document.
80 How to use this Vector? This vector has an entry for each document and is used as an indicator of how to distribute any redundant rank back into the system. Each document's entry in the vector (E) represents the proportion of rank to be given to that document; it is believed to be uniform, with E = 0.15 if the sum of all PageRanks is 1. But we can do personalisation, e.g. to focus on Formula 1 pages, increase their weight in E. Letting $E_n$ be some vector over the web pages that corresponds to a source of rank, c be a constant which is maximised, and $\|PR\| = 1$ (sum of all PageRanks = 1), we have the following formula: $PR'_n = c \cdot \sum_{m \in S_n} \frac{PR_m}{outdegree_m} + (1 - c) \cdot E_n$
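A sketch of the iteration with the E vector, showing both the uniform case and a personalised one. It assumes that E is a distribution summing to 1 that is scaled by (1 - c), with c = 0.85 so that 15% of the rank flows through E; the three-page graph with a rank sink and the fixed iteration count are made up for illustration, and dangling pages are not handled.

```python
def pagerank(outlinks, e=None, c=0.85, iterations=100):
    """PageRank with an E vector; outlinks maps page -> list of targets."""
    pages = list(outlinks)
    n = len(pages)
    if e is None:
        e = {p: 1.0 / n for p in pages}     # uniform E (assumption: sums to 1)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives its share of the redistributed rank...
        new = {p: (1.0 - c) * e[p] for p in pages}
        # ...then each page m passes c * PR_m / outdegree_m along its outlinks.
        for m, targets in outlinks.items():
            for t in targets:
                new[t] += c * pr[m] / len(targets)
        pr = new
    return pr


# Hypothetical graph with a rank sink: B and C link only to each other.
graph = {"A": ["B"], "B": ["C"], "C": ["B"]}

uniform = pagerank(graph)
print({p: round(s, 3) for p, s in uniform.items()})

# Personalisation: put all of E's weight on page A.
personal = pagerank(graph, e={"A": 1.0, "B": 0.0, "C": 0.0})
print({p: round(s, 3) for p, s in personal.items()})
```

With the uniform vector, A keeps a small score even though nothing links to it, and the sink no longer absorbs all of the rank; with the personalised vector, A's share rises because all of the redistributed rank is directed to it.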
81 Alternate Solution! [Figure: Doc 1 to Doc 7, the vector, and a Universal Document (UD).] Probability of a user being bored is now 1/(n+1), where n = number of outlinks, not
82 Personalised PageRank. [Figure: Doc 1 to Doc 7 with a Universal Document (UD) and a personalised vector.]
83 Using PageRank. [Diagram: a query produces a content score (n); the PageRank array supplies a PageRank score (n); some formula (???) combines them into the final document score.]
84 Kleinberg's Algorithm. Kleinberg's algorithm is similar to PageRank, in that it is an iterative algorithm based purely on the linkage of the documents on the web. However, it does have some major differences. It is executed at query time, and not at indexing time, with the associated hit on performance that accompanies query-time processing. Is it used in SEs? Not common! It computes two scores per document (hub and authority) as opposed to a single score. It is processed on a small subset of relevant documents, not all documents as was the case with PageRank.
85 Recall Hubs and Authorities. HUB page: a hub page is a page that contains a number of links to pages containing information about some topic, e.g. a resource page containing links to documents on a topic such as Formula 1 motor racing. Required pages have a hub score representing its quality as a source of links. AUTHORITY page: an authority page is one that contains a lot of information about some topic, an authoritative page. Consequently, many pages will link to this page, thus giving us a means of identifying it. Required pages also have an authority score representing its perceived quality by other people. Documents with high authority scores are expected to contain relevant content, whereas documents with high hub scores are expected to contain links to relevant documents.
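86 HITS Process. [Figure: a root set and an expanded set forming a focused subgraph of the WWW, labelled as steps 1 to 3.] $Hub_p = \sum_{q:\, p \to q} Auth_q$ and $Auth_p = \sum_{q:\, q \to p} Hub_q$
87 Hub Scores. $Hub_p = \sum_{q \in Q} Auth_q$ for all q such that $p \to q$. [Figure: P links to X, Y and Z; Q contains X, Y and Z.]
88 Authority Scores. $Auth_p = \sum_{q \in Q} Hub_q$ for all q such that $q \to p$. [Figure: X, Y and Z link to P; Q contains X, Y and Z.]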
89 Kleinberg's HITS Technique. Iteratively calculates Hub & Authority scores. Begin with all Hub & Authority scores = 1; iterations are needed until convergence. Hub scores are based on the Authority scores of off-site outlink docs. Auth scores are based on the Hub scores of off-site inlink docs. Return the top X Hubs and/or Authorities. Once the expanded set is generated, there is no further content analysis (topic independent). A narrow topic will diffuse to a broader topic. A broad topic may produce inaccurate results.
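90 Kleinberg's Algorithm
$Hub_i \leftarrow 1$, $Auth_i \leftarrow 1$
loop:
  for n = 1, 2, ..., N:
    $Auth'_n = \sum_{m \in S_n} Hub_m$
    $Hub'_n = \sum_{o \in T_n} Auth_o$
  Normalise $Auth'_n$, obtaining $Auth_n$
  Normalise $Hub'_n$, obtaining $Hub_n$
end while (not converged)
Here $S_n$ is the set of pages that link into n and $T_n$ is the set of pages that n links to.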
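A minimal Python sketch of the hub and authority iteration. It runs on a small hand-made graph rather than on a real root and expanded set, uses a fixed number of iterations in place of a convergence test, and normalises by Euclidean length; the graph, the page names and those two choices are assumptions made for illustration.

```python
import math


def normalise(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(outlinks, iterations=50):
    """Return (hub, auth) score dictionaries for a small link graph."""
    pages = sorted(set(outlinks) | {t for ts in outlinks.values() for t in ts})
    inlinks = {p: [] for p in pages}
    for src, targets in outlinks.items():
        for dst in targets:
            inlinks[dst].append(src)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        new_auth = {p: sum(hub[m] for m in inlinks[p]) for p in pages}
        # Hub score: sum of the authority scores of the pages linked to.
        new_hub = {p: sum(auth[o] for o in outlinks.get(p, ())) for p in pages}
        auth = normalise(new_auth)
        hub = normalise(new_hub)
    return hub, auth


# Hypothetical focused subgraph: three hub pages pointing at X, Y and Z.
graph = {"H1": ["X", "Y", "Z"], "H2": ["X", "Y"], "H3": ["X"]}
hub, auth = hits(graph)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

H1 ends up as the best hub because it links to every authority, and X as the best authority because every hub links to it.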
91 Wrapping up SEs. SEs now provide more than just searching and are portals: consumer-oriented gateways to web resources which are editorially controlled, with links to what search engines, or their paying clients, believe you may be interested in. Search engines are for-profit ventures, not charities: some sell their indexes; mostly advertising. 10% to 15% of queries to the major search engines are on adult themes. They offer lots of extras including: media search, identification of names, Amazon links, related searches listing, page translation, language-specific search; then there is photo management, music.
92 Final thoughts. Sub-1-second querying is essential. No time for interesting algorithms, Q&A, manual query expansion. Belief is that searchers are happy with sub-optimal results as long as there is no delay in getting them. No industry-standard benchmark for evaluation.
More informationWhat the is SEO? And how you can kick booty in the interwebs game
What the F^@& is SEO? And how you can kick booty in the interwebs game 1 WHAT THE F^$& is SEO?? SEO (SEARCH ENGINE OPTIMIZATION) is the process of improving your website so that it attracts more visitors
More information10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues
COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization
More informationSearching the Web [Arasu 01]
Searching the Web [Arasu 01] Most user simply browse the web Google, Yahoo, Lycos, Ask Others do more specialized searches web search engines submit queries by specifying lists of keywords receive web
More informationWebsite Name. Project Code: # SEO Recommendations Report. Version: 1.0
Website Name Project Code: #10001 Version: 1.0 DocID: SEO/site/rec Issue Date: DD-MM-YYYY Prepared By: - Owned By: Rave Infosys Reviewed By: - Approved By: - 3111 N University Dr. #604 Coral Springs FL
More informationTHE HISTORY & EVOLUTION OF SEARCH
THE HISTORY & EVOLUTION OF SEARCH Duration : 1 Hour 30 Minutes Let s talk about The History Of Search Crawling & Indexing Crawlers / Spiders Datacenters Answer Machine Relevancy (200+ Factors)
More informationHow to Get Your Web Maps to the Top of Google Search
How to Get Your Web Maps to the Top of Google Search HOW TO GET YOUR WEB MAPS TO THE TOP OF GOOGLE SEARCH Chris Brown CEO & Co-founder of Mango SEO for web maps is particularly challenging because search
More informationLink Analysis and Web Search
Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html
More informationWhy it Really Matters to RESNET Members
Welcome to SEO 101 Why it Really Matters to RESNET Members Presented by Fourth Dimension at the 2013 RESNET Conference 1. 2. 3. Why you need SEO How search engines work How people use search engines
More informationBusiness Forum Mid Devon. Optimising your place on search engines
Optimising your place on search engines What do I know? Professional copywriter since 1996 Words inform Google and Bing Content is now king on Google Work on SEO campaigns for clients Who are Oxygen? Who
More informationA STUDY OF RANKING ALGORITHM USED BY VARIOUS SEARCH ENGINE
A STUDY OF RANKING ALGORITHM USED BY VARIOUS SEARCH ENGINE Bohar Singh 1, Gursewak Singh 2 1, 2 Computer Science and Application, Govt College Sri Muktsar sahib Abstract The World Wide Web is a popular
More informationSEO ISSUES FOUND ON YOUR SITE (MARCH 29, 2016)
www.advantageserviceco.com SEO ISSUES FOUND ON YOUR SITE (MARCH 29, 2016) This report shows the SEO issues that, when solved, will improve your site rankings and increase traffic to your website. 16 errors
More informationActivity: Google. Activity #1: Playground. Search Engine Optimization Google Results Organic vs. Paid. SEO = Search Engine Optimization
E-Marketing ----- SEO Topics Exploring search engine optimization tactics and techniques to achieve high rankings On-Page optimization Off-Page optimization Understand how web search engines handle your
More informationDigital Marketing Proposal
Digital Marketing Proposal ---------------------------------------------------------------------------------------------------------------------------------------------- 1 P a g e We at Tronic Solutions
More informationWeb Structure Mining using Link Analysis Algorithms
Web Structure Mining using Link Analysis Algorithms Ronak Jain Aditya Chavan Sindhu Nair Assistant Professor Abstract- The World Wide Web is a huge repository of data which includes audio, text and video.
More informationFor Starters Web 4.0. Entrée Thrive Online. Dessert Listen and Evolve. Search Marketing for Today s Lunch Menu
Search Marketing for 2010 Today s Lunch Menu For Starters Web 4.0 Entrée Thrive Online Dessert Listen and Evolve HZDG SEO Monday, December 7, 2009 2 For Starters Web 4.0 What is the current status of the
More informationDigital Marketing for Small Businesses. Amandine - The Marketing Cookie
Digital Marketing for Small Businesses Amandine - The Marketing Cookie Search Engine Optimisation What is SEO? SEO stands for Search Engine Optimisation. Definition: SEO is a methodology of strategies,
More informationWeighted Page Rank Algorithm Based on Number of Visits of Links of Web Page
International Journal of Soft Computing and Engineering (IJSCE) ISSN: 31-307, Volume-, Issue-3, July 01 Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page Neelam Tyagi, Simple
More informationReading Time: A Method for Improving the Ranking Scores of Web Pages
Reading Time: A Method for Improving the Ranking Scores of Web Pages Shweta Agarwal Asst. Prof., CS&IT Deptt. MIT, Moradabad, U.P. India Bharat Bhushan Agarwal Asst. Prof., CS&IT Deptt. IFTM, Moradabad,
More informationInformation Retrieval
Introduction to Information Retrieval CS3245 12 Lecture 12: Crawling and Link Analysis Information Retrieval Last Time Chapter 11 1. Probabilistic Approach to Retrieval / Basic Probability Theory 2. Probability
More informationGary Viray Founder, Search Opt Media Inc. Search.Rank.Convert.
SEARCH + SOCIAL Gary Viray Founder, Search Opt Media Inc. Goo gol Google Algorithm Change Google Toolbar December 2000 Birth of Toolbar Pagerank They move the toilet mid stream. 404P Pages are ranking
More informationAlmost 80 percent of new site visits begin at search engines. A couple of years back Nielsen published a list of popular search engines.
SEO OverView We have a problem, we want people to visit our Web site, that's the purpose after all to bring people to our website and increase traffic inorder to buy soundspirit products and learn more
More informationSEO: SEARCH ENGINE OPTIMISATION
SEO: SEARCH ENGINE OPTIMISATION SEO IN 11 BASIC STEPS EXPLAINED What is all the commotion about this SEO, why is it important? I have had a professional content writer produce my content to make sure that
More informationCS47300 Web Information Search and Management
CS47300 Web Information Search and Management Search Engine Optimization Prof. Chris Clifton 31 October 2018 What is Search Engine Optimization? 90% of search engine clickthroughs are on the first page
More informationBrief (non-technical) history
Web Data Management Part 2 Advanced Topics in Database Management (INFSCI 2711) Textbooks: Database System Concepts - 2010 Introduction to Information Retrieval - 2008 Vladimir Zadorozhny, DINS, SCI, University
More informationSearching the Web for Information
Search Xin Liu Searching the Web for Information How a Search Engine Works Basic parts: 1. Crawler: Visits sites on the Internet, discovering Web pages 2. Indexer: building an index to the Web's content
More informationUNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.
UNIT-V WEB MINING 1 Mining the World-Wide Web 2 What is Web Mining? Discovering useful information from the World-Wide Web and its usage patterns. 3 Web search engines Index-based: search the Web, index
More information3 Media Web. Understanding SEO WHITEPAPER
3 Media Web WHITEPAPER WHITEPAPER In business, it s important to be in the right place at the right time. Online business is no different, but with Google searching more than 30 trillion web pages, 100
More informationLesson 2 Analysing Your Online Presence
Lesson 2 Analysing Your Online Presence On completion of this lesson you should be able to: Be aware of some website diagnostic tools available on the internet Understand how to perform various diagnostic
More informationSearch Engines. Information Retrieval in Practice
Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Web Crawler Finds and downloads web pages automatically provides the collection for searching Web is huge and constantly
More informationDigital Marketing Glossary of Basic Terms & Concepts
Digital Marketing Glossary of Basic Terms & Concepts A/B Testing Testing done to compare two variations of something against a variable. Often done to test the effectiveness of marketing tactics such as
More informationHome Page. Title Page. Page 1 of 14. Go Back. Full Screen. Close. Quit
Page 1 of 14 Retrieving Information from the Web Database and Information Retrieval (IR) Systems both manage data! The data of an IR system is a collection of documents (or pages) User tasks: Browsing
More informationThursday, 26 January, 12. Web Site Design
Web Site Design Not Just a Pretty Face Easy to update Responsive (mobile, tablet and web-friendly) Fast loading RSS enabled Connect to social channels Easy to update To me, that means one platform, WordPress.
More informationDigital News and Social Content. How to revitalize your news content and make it relevant in the digital age
Digital News and Social Content How to revitalize your news content and make it relevant in the digital age Adapting to the Digital World A new format: Provide the facts Package it in sections Add visual
More informationAdministrative. Web crawlers. Web Crawlers and Link Analysis!
Web Crawlers and Link Analysis! David Kauchak cs458 Fall 2011 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture15-linkanalysis.ppt http://webcourse.cs.technion.ac.il/236522/spring2007/ho/wcfiles/tutorial05.ppt
More informationGoogle Analytics. Gain insight into your users. How To Digital Guide 1
Google Analytics Gain insight into your users How To Digital Guide 1 Table of Content What is Google Analytics... 3 Before you get started.. 4 The ABC of Analytics... 5 Audience... 6 Behaviour... 7 Acquisition...
More informationF. Aiolli - Sistemi Informativi 2007/2008. Web Search before Google
Web Search Engines 1 Web Search before Google Web Search Engines (WSEs) of the first generation (up to 1998) Identified relevance with topic-relateness Based on keywords inserted by web page creators (META
More informationLecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science
Lecture 9: I: Web Retrieval II: Webology Johan Bollen Old Dominion University Department of Computer Science jbollen@cs.odu.edu http://www.cs.odu.edu/ jbollen April 10, 2003 Page 1 WWW retrieval Two approaches
More informationSEO and Monetizing The Content. Digital 2011 March 30 th Thinking on a different level
SEO and Monetizing The Content Digital 2011 March 30 th 2011 Getting Found and Making the Most of It 1. Researching target Audience (Keywords) 2. On-Page Optimisation (Content) 3. Titles and Meta Tags
More informationSearch Engine Technology. Mansooreh Jalalyazdi
Search Engine Technology Mansooreh Jalalyazdi 1 2 Search Engines. Search engines are programs viewers use to find information they seek by typing in keywords. A list is provided by the Search engine or
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationCOMP 4601 Hubs and Authorities
COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one
More informationWebSite Grade For : 97/100 (December 06, 2007)
1 of 5 12/6/2007 1:41 PM WebSite Grade For www.hubspot.com : 97/100 (December 06, 2007) A website grade of 97 for www.hubspot.com means that of the thousands of websites that have previously been submitted
More informationKnowing something about how to create this optimization to harness the best benefits will definitely be advantageous.
Blog Post Optimizer Contents Intro... 3 Page Rank Basics... 3 Using Articles And Blog Posts... 4 Using Backlinks... 4 Using Directories... 5 Using Social Media And Site Maps... 6 The Downfall Of Not Using
More informationSEO According to Google
SEO According to Google An On-Page Optimization Presentation By Rachel Halfhill Lead Copywriter at CDI Agenda Overview Keywords Page Titles URLs Descriptions Heading Tags Anchor Text Alt Text Resources
More informationA Survey on Web Information Retrieval Technologies
A Survey on Web Information Retrieval Technologies Lan Huang Computer Science Department State University of New York, Stony Brook Presented by Kajal Miyan Michigan State University Overview Web Information
More informationPart 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,
More informationThe influence of caching on web usage mining
The influence of caching on web usage mining J. Huysmans 1, B. Baesens 1,2 & J. Vanthienen 1 1 Department of Applied Economic Sciences, K.U.Leuven, Belgium 2 School of Management, University of Southampton,
More informationInternet Basics. Basic Terms and Concepts. Connecting to the Internet
Internet Basics In this Learning Unit, we are going to explore the fascinating and ever-changing world of the Internet. The Internet is the largest computer network in the world, connecting more than a
More informationSEO ISSUES FOUND ON YOUR SITE (DECEMBER 30, 2016)
www.alpinebillings.com SEO ISSUES FOUND ON YOUR SITE (DECEMBER 30, ) This report shows the SEO issues that, when solved, will improve your site rankings and increase traffic to your website. 13 errors
More informationReview of Wordpresskingdom.com
Review of Wordpresskingdom.com Generated on 208-2-6 Introduction This report provides a review of the key factors that influence the SEO and usability of your website. The homepage rank is a grade on a
More informationDigital Communication. Daniela Andreini
Digital Communication Daniela Andreini Using Digital Media Channels to support Business Objectives ENGAGE Build customer and fan relationships through time to achieve retention goals KPIs -% active hurdle
More informationResearch. Niche research. Market research
Research LeapFroggr.com Market research Niche research Data gathering Current state of the website vs competitors Your business info (business name, address, contact #s, etc.) Activate Analytics on site
More informationPLUS. Checklist. 5 top tips. on content marketing. Marketing WHS HR Business Growth International Trade Legal
PLUS 5 top tips on content marketing Checklist Marketing WHS HR Business Growth International Trade Legal The ABCS SEO Checklist How are you performing with your SEO? Take our checklist and find out! Check
More informationAdministrivia. Crawlers: Nutch. Course Overview. Issues. Crawling Issues. Groups Formed Architecture Documents under Review Group Meetings CSE 454
Administrivia Crawlers: Nutch Groups Formed Architecture Documents under Review Group Meetings CSE 454 4/14/2005 12:54 PM 1 4/14/2005 12:54 PM 2 Info Extraction Course Overview Ecommerce Standard Web Search
More informationSEO News. 15 SEO Fixes for Better Rankings. For SEO, marketing books and guides, visit
SEO News SEO News and Updates as We Published So Far! For latest news, visit http://www.nigcworld.com/wp/seonews this week latest seo updates google bing yahooothers/ 15 SEO Fixes for Better Rankings 1.
More informationSocial Networks 2015 Lecture 10: The structure of the web and link analysis
04198250 Social Networks 2015 Lecture 10: The structure of the web and link analysis The structure of the web Information networks Nodes: pieces of information Links: different relations between information
More informationLecture 8: Linkage algorithms and web search
Lecture 8: Linkage algorithms and web search Information Retrieval Computer Science Tripos Part II Ronan Cummins 1 Natural Language and Information Processing (NLIP) Group ronan.cummins@cl.cam.ac.uk 2017
More informationDigital Marketing. Introduction of Marketing. Introductions
Digital Marketing Introduction of Marketing Origin of Marketing Why Marketing is important? What is Marketing? Understanding Marketing Processes Pillars of marketing Marketing is Communication Mass Communication
More informationFOUNDATIONAL SEO Search Engine Optimization. Adam Napolitan
FOUNDATIONAL SEO Search Engine Optimization Adam Napolitan anapolitan@ucdavis.edu Technical SEO Content SEO Amplification SEO Don t forget UGC SEM Search Engine Marketing How does? it work What is the
More informationBibliometrics: Citation Analysis
Bibliometrics: Citation Analysis Many standard documents include bibliographies (or references), explicit citations to other previously published documents. Now, if you consider citations as links, academic
More informationHow Does a Search Engine Work? Part 1
How Does a Search Engine Work? Part 1 Dr. Frank McCown Intro to Web Science Harding University This work is licensed under Creative Commons Attribution-NonCommercial 3.0 What we ll examine Web crawling
More information