Internet Search (COSC 488)
Nazli Goharian, 2005
Outline

- Web: Indexing & Efficiency
  - Partitioned Indexing
  - Index Tiering & other early termination techniques
  - Index in Dynamic Environment
- Improving effectiveness of Web search engines
  - Web page ranking: query log, anchor text, authority/hub, page rank, sponsored search, localized search, social search
  - Result snippets
- Social Search: tagging, collaborative search/filtering, recommender system
- Real-time search
- Peer-to-Peer Search

The Web

Document collections are scattered across many geographical areas. Constraints prohibiting the centralization of data include:
- Data security
- Volume
- Rate of change
- Political and legal constraints
- Other proprietary motivations

Web Search

- Parallel and distributed processing
- Web search tools access data distributed on servers worldwide but index it centrally.
- Most of these systems have a partitioned index on large clusters of servers with centralized control.
- They store pointers, in the form of hypertext links, to various Web servers.

Partitioned Indexing

The index is partitioned across multiple machines, based on either:
- Terms (global index organization)
  - Each node holds the posting lists for some terms
  - Using the content index, query terms are sent to the nodes holding those terms
  - Higher concurrency level, but larger posting lists
- Documents (local index organization), more common
  - Each node holds a complete index for its subset of documents (shorter posting lists)
  - Query terms are sent to all nodes
  - Top k results from each node are merged
  - Global statistics (e.g., idf) must be calculated
- A hybrid approach may be used in tiered indexing

Index Tiering

- A popular early termination technique to improve the efficiency of query processing
- Nodes are divided into two tiers: the index of the most popular documents is allocated to tier 1 and the rest to tier 2.
- Search tier 1 first; if it does not yield enough results, search tier 2.
- Note: other popular early termination techniques (top-doc and query pruning) were discussed earlier in the semester.

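The two-tier lookup described above can be sketched in a few lines. This is a minimal illustration, not how any particular engine does it: the in-memory `Index` layout, the additive scoring, and the names `search_index` and `tiered_search` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical index layout: term -> list of (doc_id, score) postings.
Index = Dict[str, List[Tuple[str, float]]]


def search_index(index: Index, terms: List[str]) -> Dict[str, float]:
    """Sum the per-term scores of every document matching any query term."""
    scores: Dict[str, float] = {}
    for term in terms:
        for doc_id, score in index.get(term, []):
            scores[doc_id] = scores.get(doc_id, 0.0) + score
    return scores


def tiered_search(tier1: Index, tier2: Index, terms: List[str], k: int) -> List[Tuple[str, float]]:
    """Search tier 1 first; fall back to tier 2 only if fewer than k results were found."""
    scores = search_index(tier1, terms)
    if len(scores) < k:  # early termination failed: not enough results in tier 1
        for doc_id, score in search_index(tier2, terms).items():
            scores[doc_id] = scores.get(doc_id, 0.0) + score
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Toy data (made up): tier 1 holds the popular documents, tier 2 the rest.
tier1 = {"web": [("d1", 2.0)], "search": [("d1", 1.0), ("d2", 0.5)]}
tier2 = {"web": [("d3", 0.4)], "search": [("d4", 0.3)]}

top2 = tiered_search(tier1, tier2, ["web", "search"], k=2)  # answered from tier 1 alone
top3 = tiered_search(tier1, tier2, ["web", "search"], k=3)  # needs tier 2 as well
```

With k=2 the query never touches tier 2, which is where the efficiency gain comes from.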
Distributed Index Construction

- Index construction for a Web-scale collection is not possible on a single machine
- Various architectures exist for distributed indexing
- MapReduce architecture (a term-partitioned index)
  - Master node assigns tasks to worker nodes (map workers & reduce workers) to split up the computing jobs:
    - Map phase: parsing & building localized <term, doc> pairs
    - Reduce phase: combining/merging posting pairs for each term

MapReduce (Cont'd)

- Map & reduce phases can each be done in parallel on many machines
- A map machine can also serve as a reducer machine in the process
- Data is broken into pieces (shards), generally 16 MB to 64 MB [128 MB], which are sent to map workers as they finish their jobs
- Map workers work on one shard at a time (generally, unless they have more than one CPU), parsing it and generating <term, doc> pairs (which can be combined into <term, doc, tf>)
- Pairs are sorted by term, and then by a secondary key (doc_id)
- The same keys (terms) are assigned to the same reduce worker
- Load should be balanced across the reducers

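The map and reduce phases above can be simulated in a single process to see how the pairs flow. This is only a sketch: real systems run the phases on separate machines, and the whitespace tokenizer, the CRC-based term partitioning, and the function names are assumptions made for the example.

```python
import zlib
from collections import Counter
from itertools import groupby
from typing import Dict, List, Tuple

Pair = Tuple[str, int]  # <term, doc_id>


def map_phase(shard: Dict[int, str]) -> List[Pair]:
    """Parse one shard (doc_id -> text) into <term, doc_id> pairs."""
    return [(term, doc_id) for doc_id, text in shard.items() for term in text.lower().split()]


def partition(pairs: List[Pair], num_reducers: int) -> List[List[Pair]]:
    """Route every pair with the same term (key) to the same reduce worker."""
    buckets: List[List[Pair]] = [[] for _ in range(num_reducers)]
    for term, doc_id in pairs:
        buckets[zlib.crc32(term.encode("utf-8")) % num_reducers].append((term, doc_id))
    return buckets


def reduce_phase(pairs: List[Pair]) -> Dict[str, List[Tuple[int, int]]]:
    """Sort by term, then doc_id, and merge the pairs into postings of (doc_id, tf)."""
    postings = {}
    for term, group in groupby(sorted(pairs), key=lambda pair: pair[0]):
        tf = Counter(doc_id for _, doc_id in group)
        postings[term] = sorted(tf.items())
    return postings


def build_index(shards: List[Dict[int, str]], num_reducers: int = 2) -> Dict[str, List[Tuple[int, int]]]:
    """Run all map tasks, partition by term, then run one reduce task per bucket."""
    mapped = [pair for shard in shards for pair in map_phase(shard)]
    index: Dict[str, List[Tuple[int, int]]] = {}
    for bucket in partition(mapped, num_reducers):
        index.update(reduce_phase(bucket))
    return index


# Two tiny made-up shards, one document each.
shards = [{1: "web search engine"}, {2: "web index web"}]
index = build_index(shards)
```

Because each term lands on exactly one reducer, the per-reducer outputs can be combined without any further merging, which is what makes the result a term-partitioned index.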
MapReduce (Cont'd)

[Figure omitted] Taken from: C. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press

Query Servers

- Each server has its own disk holding a portion of the index
- Queries are distributed, via a centralized control, to the servers that contain the related posting lists
- Common terms may map to many servers
- No single point of resource contention (efficient)
- If a server crashes, that portion of the index is not available

Index in Dynamic Environment

The data collection is not static. Options for keeping the index current:
- Reconstruct the index periodically from scratch (many search engines use this)
- Maintain an auxiliary index to store new documents & remerge it with the existing index
- Maintain multiple indexes (this complicates maintaining collection statistics)

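The auxiliary-index option from the Index in Dynamic Environment slide can be sketched as follows. The class name `DynamicIndex`, the document-count merge threshold, and the whitespace tokenizer are assumptions for illustration; a real system would merge on-disk posting lists rather than Python sets.

```python
from typing import Dict, List


class DynamicIndex:
    """Main index plus a small auxiliary index that absorbs new documents."""

    def __init__(self, merge_threshold: int = 2) -> None:
        self.main: Dict[str, List[int]] = {}
        self.aux: Dict[str, List[int]] = {}
        self.aux_docs = 0
        self.merge_threshold = merge_threshold  # hypothetical trigger: docs held in aux

    def add_document(self, doc_id: int, text: str) -> None:
        """Index a new document in the auxiliary index only."""
        for term in set(text.lower().split()):
            self.aux.setdefault(term, []).append(doc_id)
        self.aux_docs += 1
        if self.aux_docs >= self.merge_threshold:
            self.merge()

    def merge(self) -> None:
        """Remerge the auxiliary postings into the main index, then reset it."""
        for term, doc_ids in self.aux.items():
            self.main[term] = sorted(set(self.main.get(term, [])) | set(doc_ids))
        self.aux = {}
        self.aux_docs = 0

    def search(self, term: str) -> List[int]:
        """A query must consult both indexes and combine the postings."""
        return sorted(set(self.main.get(term, [])) | set(self.aux.get(term, [])))


idx = DynamicIndex(merge_threshold=2)
idx.add_document(1, "web search")
idx.add_document(2, "web index")      # second document triggers a merge
idx.add_document(3, "index tiering")  # stays in the auxiliary index for now
```

Note that every search pays the cost of looking in two places, which is the price of avoiding a full rebuild.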
Definitions

- Web graph: each page is a node and links are directed edges from one node to another
- Out-links (out-degree) of A: links from page A to other pages
- In-links (in-degree) of A: links from other pages to A
- Sink: a page with out-links = 0
- Source: a page with in-links = 0
- Static page: a page generated prior to any request
- Dynamic page: a page generated as the result of a request
- Hidden/deep web: pages with no links, password-protected pages, or pages reachable only via a form, ...
- Indexable Web: union of pages indexed by major search engines

Evaluation of Web Search Engines: High Precision Search

- Traditional IR systems are evaluated based on precision and recall.
- Web search engines are evaluated based on the top N documents.
  - Recall estimation is very difficult
  - Precision over the full result list is of limited concern, as many users do not look beyond the 1st screen.
  - => How fast and how accurately is the first results screen generated?

Web Page Ranking

Considering both query-dependent and query-independent scores (captured during indexing), a global score is generated for each page:
- Query-dependent score
  - Similarity measures such as cosine, BM25, proximity, ...
- Query-independent score
  - Link analysis (anchor text, popularity metrics such as authorities and hubs, page rank, ...)
- Sponsored search
- Localized search
- Query log analysis
- etc.

Query Log Analysis

Using user query patterns on certain days and by time of day, week, month, and year, many optimizations are possible:
- Pre-cache likely Web pages in anticipation of user queries to reduce page access delays, increasing system throughput (efficiency optimization)
- Adjust relevance ranking to tune for certain user queries (accuracy optimization)

Anchor Text

- Short (2-3 terms); describes the linked/destination page.
- May or may not express a different point of view than the page author's.
- Anchor text of links to a document d_i is included in the index for d_i
- Extended anchor text (text surrounding the anchor text) may also be used
- Generally weighted based on frequency (notion of idf)
- Spamming problem

Page Rank

- A scoring mechanism in Web search (trademarked by Google and patented by Stanford)
- Generally calculated at the time of crawling
- Uses incoming and outgoing links as an indicator of popularity to adjust a Web page's score
- A popular page is defined as a page that
  - many Web pages link to (in-links)
  - important (popular) pages link to
- May be affected by link spam

Page Rank

\[ \mathrm{PageRank}(A) = \frac{1-d}{N} + d \sum_{D_i \in \{D_1, \dots, D_n\}} \frac{\mathrm{PageRank}(D_i)}{C(D_i)} \]

- D_1 ... D_n: the pages that link to page A
- C(D_i): number of links out from page D_i
- d: damping factor (between 0 and 1; commonly 0.85)
- N: total number of pages

An iterative algorithm:
- Initially all pages are assigned an arbitrary page rank (e.g., 1/N), summing to 1
- Iteratively recalculate the scores until the new scores do not change significantly
- To converge faster, page ranks may be initialized based on the number of in-links, log info, ...

Authorities and Hubs

Various algorithms are based on assigning each retrieved web page two scores: an authority score and a hub score (HITS: Hyperlink-Induced Topic Search, 1999).
- Authority page: an authoritative source on a given topic
- Hub page: a page listing pointers to authority pages on a topic
- Authority score: summation of the scores of all the hubs pointing to that authority page
- Hub score: summation of the scores of all authority pages the hub is pointing to

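The iterative Page Rank computation described above can be sketched as below. The sketch reads the formula as (1 - d)/N plus d times the sum of PageRank(D_i)/C(D_i) over in-linking pages, and it assumes every page has at least one out-link (no sinks) and that every link target is itself a key of `out_links`; the function name, the tolerance, and the toy graph are illustrative choices, not part of the slides.

```python
from typing import Dict, List


def pagerank(out_links: Dict[str, List[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Iterate the Page Rank update until the scores stop changing significantly."""
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # initial ranks sum to 1

    # Invert the graph once: for each page, which pages link to it?
    in_links: Dict[str, List[str]] = {p: [] for p in pages}
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].append(src)

    for _ in range(max_iter):
        new_rank = {
            p: (1 - d) / n + d * sum(rank[q] / len(out_links[q]) for q in in_links[p])
            for p in pages
        }
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Toy graph (made up): A links to B and C, B links to C, C links back to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

In this graph C collects rank from both A and B, so it ends up with the highest score, followed by A (which receives all of C's rank) and then B.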
Computing Authority and Hub Scores

- Retrieve all pages containing the query term t. This is called the root set (~200 pages).
- Create a set that is the union of the root set pages, pages that point to root set pages, and pages that root set pages point to. This is called the base set.
- Use the base set to compute the hub and authority scores.

An iterative algorithm:
- Initialize every hub and authority score to 1
- Update s(h) and s(a) in each iteration

Sponsored Search

- Search system vendors sell advertisers keywords so that whenever such words are issued in a query, the advertiser's desired homepage link is returned.
- Sponsored search results are biased towards advertisers with higher bids, click frequency of ads, ...
- Significant revenue is generated for search engine vendors via this search approach (e.g., per click: 50 cents to 15 dollars)

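Returning to the hub and authority computation: the update of s(h) and s(a) over a base set can be sketched as below. The slides do not say how scores are kept bounded, so the length normalization after each round is an assumption (a common choice), as are the function name, the fixed iteration count, and the toy base set.

```python
import math
from typing import Dict, List, Tuple


def hits(links: Dict[str, List[str]], iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) scores for the pages of a base set."""
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    hub = {p: 1.0 for p in pages}   # initialize all hub scores to 1
    auth = {p: 1.0 for p in pages}  # initialize all authority scores to 1

    for _ in range(iterations):
        # Authority score: sum of the hub scores of all pages pointing to the page.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of all pages the hub points to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}

        # Assumed step: normalize so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Toy base set (made up): h1 points to both authorities, h2 only to a1.
hub_scores, auth_scores = hits({"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []})
```

Here a1 is pointed to by both hubs and becomes the top authority, while h1 points to both authorities and becomes the top hub.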
Sponsored Search

- Search engines maintain an advertisement database (description of the advertisement, link to that page, bids, popularity, ...)
- The advertisement database is searched for a match to:
  - query terms
  - keywords extracted from the retrieved result pages (pseudo-relevance feedback, page features, ...)
- Advertisements are ranked based on bids (on keywords) and advertisement popularity (using clickthrough data logs)

Localized Search

Using geographic information to modify the ranking of results (in addition to SC scores, link-based scores, ...). Geographic information may be derived from:
- Location of the device sending the query
- Context of the query
  - "restaurant near Al Capone's home town"
  - "restaurant near White Sox stadium"
- Geographic location in the query
  - "Chicago restaurants"
- Geographic location in a document's metadata

Result Snippets

Providing users with a short summary of each page (title, URL, link to cached page, snippet).
- Static snippets
  - Query independent
  - Created at indexing time and cached
  - Contain the title and n sentences/words, ... (NLP can be used)
- Dynamic snippets
  - Query dependent
  - Created at the time of results scoring
  - Windows of the document, also called KWIC (keyword in context)

Result Snippets

- The index maintains sentence-level information
- Snippet sentences can be picked:
  - Based on query term(s):
    - Heading
    - Location in document (nth sentence)
    - Closeness of query terms in sentence
    - Ratio of query terms in sentence
    - Unique query terms in sentence
  - From page metadata

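A dynamic snippet picker using two of the sentence features listed above (unique query terms in the sentence, then ratio of query terms) might look like the sketch below. The regular-expression sentence splitter, the tie-breaking by position, and the name `pick_snippet` are assumptions for the example; the slides do not prescribe how the features are combined.

```python
import re


def pick_snippet(document: str, query: str, max_sentences: int = 1) -> str:
    """Pick the sentence(s) that best cover the query terms, in document order."""
    query_terms = set(query.lower().split())
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

    scored = []
    for position, sentence in enumerate(sentences):
        words = re.findall(r"\w+", sentence.lower())
        if not words:
            continue
        unique_hits = len(query_terms & set(words))                # unique query terms
        ratio = sum(w in query_terms for w in words) / len(words)  # ratio of query terms
        scored.append((-unique_hits, -ratio, position, sentence))

    scored.sort()  # best coverage first; earlier sentences win ties
    best = sorted(scored[:max_sentences], key=lambda item: item[2])
    return " ... ".join(item[3] for item in best)


doc = ("Search engines crawl the Web. "
       "A partitioned index spreads posting lists across servers. "
       "The index is rebuilt periodically.")
snippet = pick_snippet(doc, "partitioned index")
```

The second sentence contains both query terms, so it is chosen over the third, which contains only one.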
Result Snippets

An effective snippet should (Clarke et al. 2007's clickthrough analysis):
- Have all the query terms (unless already included in the title)
- Use the page metadata, if needed
- Display the URL and mark the query terms
- Provide meaningful snippets vs. only some keywords

Social Search

- Social search introduces new aspects to search engines
- Village paradigm (collaborative) [Horowitz & Kamvar, WWW'10]
  - Crowd / social network / friends vs. corpus-based
  - Routing questions to potential answerers
- A community of users, sharing a goal or interest, participate in search and interact with each other online
  - YouTube, Twitter, Flickr, Facebook, Myspace, LinkedIn, forums, blogs, online games, ...
- From Wikipedia: "Social search or a social search engine is a type of web search that takes into account the Social Graph of the person initiating the search query. When applied to web search this Social-Graph approach to relevance is in contrast to established algorithmic or machine-based approaches where relevance is determined by analyzing the text of each document or the link structure of the documents."

Real-Time Search

- Traditional search indexes the crawled pages
- Real-time search results of search engines such as Google, Bing, and Yahoo come from a variety of real-time services such as Twitter, Flickr, YouTube, etc.
- They receive data directly from various social media and blogs (subscribed to social networking sites)
- A filtering engine identifies spam
- Measuring relevance: the ranking is based on time, relevance to the query, number of followers of the author, reputation of a link defined by the frequency of forwarding (re-tweets), ...
- First real-time search: Summize, in 2007, with real-time trend analysis; it later merged with Twitter (2008)

Social Search

- Documents or websites are deemed relevant if the searcher's social network was also interested in them.
- Nature of queries
  - In many cases opinionated, subjective
  - Query length (in many cases longer than Web queries)
- Index
  - Storing users' behavior (responsiveness, answer quality, expertise)
  - Mapping users to topics

Social Search

Ranking is based on a combination of:
- Query-dependent (probability of a good answer to query q by user u)
  - Similarity of results to query (various rankings: cosine, BM25, proximity, ...)
  - Relatedness of query/results to user
- Query-independent
  - How many users bookmarked x
  - Social trust
  - Similarity of asker to answerer: user profile similarity, users' connectedness

Social Search

Mapping users to topics. An example [Horowitz & Kamvar, WWW'10]:
- User specifies interest/expertise in topics
- Friends of a user indicate the expertise of user u in topics
- Automatically identified topics from
  - User's existing online profiles
  - User's homepages, blogs
  - User's status messages (Twitter, Facebook, IM, ...)

Social Search

Measuring connectedness using cosine similarity over various features, such as [Horowitz & Kamvar, WWW'10]:
- Social connection (common friends and affiliations)
- Demographic similarity
- Profile similarity (e.g., common favorite movies)
- Vocabulary match (e.g., IM shortcuts)
- Chattiness match (frequency of follow-up messages)
- Verbosity match (the average length of messages)
- Politeness match (e.g., use of "Thanks!")
- Speed match (responsiveness to other users)

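Cosine similarity over per-user feature vectors, as used above to measure connectedness, can be sketched as follows. The feature names and values are invented for illustration; how each feature in the list is actually quantified is not specified in the slides.

```python
import math
from typing import Dict


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse feature vectors (0.0 if either is empty)."""
    dot = sum(value * v.get(feature, 0.0) for feature, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors for an asker and two candidate answerers.
asker = {"common_friends": 3.0, "politeness": 1.0}
close_friend = {"common_friends": 2.0, "politeness": 1.0}
stranger = {"chattiness": 4.0, "politeness": 0.5}

sim_close = cosine_similarity(asker, close_friend)
sim_stranger = cosine_similarity(asker, stranger)
```

The candidate sharing more features with the asker scores closer to 1, so a question would be routed to that user first.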
Social Search

Sample approach [Karweg et al., CIKM'11]: the Social Relevance Score (SRS) ranks the result elements of a query according to their social relevance for the user. It is calculated based on 2 factors:
- Engagement intensity: how intensely the users interacted with the result
  - Engagement: interaction in terms of recommendation, rating, status messages
  - Intensity: effort of textual feedback vs. rating score / thumbs up
- Trust score: level of trust in those who recommend a link
  - Assigned by users & refined by social network analysis using page rank on the social graph

\[ \mathrm{SRS}(i) = \sum_{x \in E_i} t_s(x) \cdot e_x(i) \]

- SRS(i): social rank score of document/page i
- x: a user in the social network who interacted with/recommended page i
- E_i: the set of such users for page i
- t_s(x): trust score of user x (for the searching user s)
- e_x(i): engagement intensity of user x with page i

Social Search: Trust

- Trust has been discussed for years in sociology and social psychology
- [Marsh, Ph.D. dissertation, 1994] formalized trust as a computational concept (agents that keep a history of behaviors)
- Trust in peer-to-peer: EigenTrust [Kamvar et al., 2004] (corrupt vs. valid files)
- Various efforts at formalizing trust in recommender systems and social networks [Swearingen and Sinha, 2001], [Ziegler and Golbeck, 2006].
- The more similar two people are, the greater the trust between them [Ziegler and Golbeck, 2006].
- Trust in a person is a commitment to an action based on a belief that the future actions of that person will lead to a good outcome.
  - Example: Alice trusts Bob if she chooses to read a message (commits to an action) that Bob sends her (based on her belief that Bob will not waste her time).

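The Social Relevance Score can be illustrated with a few lines of code. This sketch reads the SRS formula above as a sum, over the users who engaged with page i, of that user's trust score times their engagement intensity; that reading is an interpretation of the slide rather than a statement of the paper's exact definition, and the user names, trust values, and intensities are made up.

```python
from typing import Dict


def social_relevance_score(engagement: Dict[str, float], trust: Dict[str, float]) -> float:
    """Sum trust(x) * engagement intensity of x over the users who engaged with a page."""
    return sum(trust.get(user, 0.0) * intensity for user, intensity in engagement.items())


# Hypothetical data: page -> {user: engagement intensity}, and trust in each user.
engagement_by_page = {
    "page1": {"ann": 1.0, "bob": 0.5},  # e.g. a written comment and a thumbs up
    "page2": {"cat": 1.0},
}
trust = {"ann": 0.9, "bob": 0.4, "cat": 0.2}

scores = {page: social_relevance_score(users, trust)
          for page, users in engagement_by_page.items()}
ranked_pages = sorted(scores, key=lambda page: -scores[page])
```

A page recommended by highly trusted users outranks one with equally intense engagement from a less trusted user.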
Tagging

- Social media sites allow users to tag the data
- User tags act as manual indexing of data, in addition to automatic indexing
- User tags serve as a folksonomy
- Tags are used to organize and search data
- Challenges with tagged data:
  - Vocabulary mismatch
  - Noisy or spam tags
  - Missing tags

Searching Tagged Data: Vocabulary Mismatch Problem

- Tag keywords describe textual or non-textual data and are used to search for items
- Tags are very sparse (only a few keywords)
- Boolean search can lead to high precision/low recall (conjunctive) or high recall/low precision (disjunctive)
- To reduce the vocabulary mismatch, perform stemming or pseudo-relevance feedback

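The precision/recall trade-off of Boolean search over sparse tags can be seen in a small sketch. The tag index, item names, and the `tag_search` function are invented for illustration.

```python
from typing import Dict, List, Set


def tag_search(tag_index: Dict[str, Set[str]], query_tags: List[str], mode: str = "and") -> Set[str]:
    """Conjunctive ('and') or disjunctive ('or') Boolean search over a tag index."""
    postings = [tag_index.get(tag, set()) for tag in query_tags]
    if not postings:
        return set()
    if mode == "and":
        return set.intersection(*postings)  # items carrying every query tag
    return set.union(*postings)             # items carrying any query tag


# Hypothetical tag index: tag -> items carrying that tag.
tag_index = {
    "cat": {"img1", "img2"},
    "kitten": {"img3"},           # vocabulary mismatch: tagged 'kitten', not 'cat'
    "cute": {"img2", "img3", "img4"},
}

strict = tag_search(tag_index, ["cat", "cute"], mode="and")
loose = tag_search(tag_index, ["cat", "cute"], mode="or")
```

The conjunctive query returns only img2 and misses img3, which was tagged "kitten" rather than "cat"; the disjunctive query finds img3 but also returns every item tagged "cute".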
Searching Tagged Data: Noisy and Spam Tags

- Spam, misspelled, or non-relevant tags mislead search
- Some incentive must be provided to users to report spam tags and to enter good-quality tags.
- Log and statistical information may help to identify spam tags

Searching Tagged Data: Missing Tags

Automatically generate tags for items with missing tags, using:
- Term weights of the textual representation of the item
- Classification of the item to a label (i.e., a tag)

Tag Clouds

- The most popular tags are presented to users to provide a wider view of the collection
- A tag cloud displays the tags as a weighted list
- The font size is proportional to the weight

Thanks to: tagcloud generator & F. Silvestri, CNR, Italy, S. Orlando, U. of Venice, Italy

Recommender Systems
More informationSEO and Monetizing The Content. Digital 2011 March 30 th Thinking on a different level
SEO and Monetizing The Content Digital 2011 March 30 th 2011 Getting Found and Making the Most of It 1. Researching target Audience (Keywords) 2. On-Page Optimisation (Content) 3. Titles and Meta Tags
More informationSearching the Web for Information
Search Xin Liu Searching the Web for Information How a Search Engine Works Basic parts: 1. Crawler: Visits sites on the Internet, discovering Web pages 2. Indexer: building an index to the Web's content
More informationBrief (non-technical) history
Web Data Management Part 2 Advanced Topics in Database Management (INFSCI 2711) Textbooks: Database System Concepts - 2010 Introduction to Information Retrieval - 2008 Vladimir Zadorozhny, DINS, SCI, University