Module Contact: Dr Dan Smith, CMP Copyright of the University of East Anglia Version 1
UNIVERSITY OF EAST ANGLIA
School of Computing Sciences
Main Series UG Examination 2015/16

INFORMATION RETRIEVAL
CMP-5036A/CMP-6008A

Time allowed: 2 hours

Answer any TWO questions.
Notes are not permitted in this examination.
Do not turn over until you are told to do so by the Invigilator.
1. (a) Describe the Rocchio technique for relevance feedback in information retrieval. Use appropriate mathematical expressions, and make sure that every term you use is clearly defined. What is the main problem with this technique and what practical measures can be taken to alleviate this problem? [15 marks]

(b) What is query expansion in information retrieval? Describe two techniques for query expansion and comment on any problems associated with them. [10 marks]

(c) Explain why n-grams of words are useful in information retrieval, using examples that demonstrate this. Despite their usefulness, search engines often do not use them extensively. Why? [5 marks]

(d) Explain why direct estimation of the probability of observing a phrase or sentence drawn from a particular natural language is unfeasible. Explain how the n-gram assumption circumvents this problem. Use the mathematical expressions appropriate for bigrams in your explanation. [10 marks]

(e) A hypothetical language has only three words, A, B, and C. An example of a sentence from the language is given below:

C C A B B A B B C C A B B A A

Using this sentence, obtain unsmoothed estimates of
(i) the unigram probabilities of the words in the language; [3 marks]
(ii) the observed bigram probabilities. [7 marks]

(f) Find the probability of observing the sentence B C A B using your estimated unigram and bigram probabilities. [5 marks]

(g) Explain what problem occurs when estimating the probability of the sentence C B C using bigrams. How can this problem be addressed? [5 marks]
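For readers checking their working on parts (e) to (g), the counting involved can be sketched in Python. This is a minimal sketch rather than a model answer, and it rests on two assumptions that the paper does not fix: the first word of a sentence is scored with its unigram probability, and each bigram probability P(w | prev) is normalised by the number of bigrams that start with prev (so the final token of the example sentence is not counted as a history). The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Example sentence from part (e).
tokens = "C C A B B A B B C C A B B A A".split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
# Assumption: a word counts as a history only when another word follows it.
history_counts = Counter(tokens[:-1])

def p_unigram(word):
    """Unsmoothed unigram estimate: count(word) / total tokens."""
    return Fraction(unigram_counts[word], len(tokens))

def p_bigram(word, prev):
    """Unsmoothed estimate of P(word | prev) from bigram counts."""
    return Fraction(bigram_counts[(prev, word)], history_counts[prev])

def p_sentence(words):
    """P(w1) * product of P(w_i | w_{i-1}) under the bigram assumption."""
    prob = p_unigram(words[0])
    for prev, word in zip(words, words[1:]):
        prob *= p_bigram(word, prev)
    return prob
```

Calling `p_sentence("C B C".split())` returns zero because the bigram "C B" never occurs in the example sentence, which is the situation part (g) asks about. Under a different normalisation convention (for example dividing by the unigram count of the history word) some of the bigram values would differ.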
2. (a) Describe the processes by which a user's information need is communicated to a typical search engine, a response generated, and the user's possible reactions to the system's response.

(b) Describe the main components of a simple inverted index suitable for bag of words information retrieval methods. You should illustrate your answer with diagrams, where appropriate.

(c) Describe how this structure can be extended to accommodate proximity search and the effects these changes are likely to have on index size and search time. [16 marks]

(d) Describe the basic structure and main operations of a simple web crawler, how these are enhanced in current commercial crawlers, and discuss the issues faced in ensuring that crawlers are directed towards good quality material. You should illustrate your answer with diagrams as necessary. [20 marks]
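As an illustration of the structures referred to in parts (b) and (c), the sketch below builds a small inverted index whose postings also record word positions, then uses those positions for a proximity test. It is a minimal sketch under stated assumptions: tokenisation is a plain lowercase whitespace split, the three sample documents are invented for the example, and the names `build_positional_index` and `within_k` are hypothetical. The nested position comparison is deliberately simple and is not the merge-based algorithm a production engine would use.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions at which the term occurs]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def within_k(index, term_a, term_b, k):
    """Return sorted ids of documents where the terms occur within k positions."""
    postings_a = index.get(term_a, {})
    postings_b = index.get(term_b, {})
    hits = []
    # Only documents containing both terms can match.
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(abs(pa - pb) <= k
               for pa in postings_a[doc_id]
               for pb in postings_b[doc_id]):
            hits.append(doc_id)
    return sorted(hits)

# Invented sample collection, for illustration only.
docs = {
    1: "information retrieval systems rank documents",
    2: "retrieval of stored information is hard",
    3: "web crawlers fetch documents",
}
index = build_positional_index(docs)
```

Dropping the position lists and keeping only the document ids gives the plain bag of words index of part (b); the extra lists are what make the index larger and proximity queries slower to evaluate.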
3. (a) State, in your own words, the Probability Ranking Principle for Information Retrieval.

(b) Suppose a document is represented as a vector of terms $\mathbf{x}$, and it can be either relevant (R) or non-relevant (NR) to a certain query, represented as a vector $\mathbf{q}$. It can be shown that the likelihood ratio

$$L = \frac{\Pr(R \mid \mathbf{q}, \mathbf{x})}{\Pr(NR \mid \mathbf{q}, \mathbf{x})} \qquad (1)$$

can be written as

$$L = \frac{\Pr(\mathbf{x} \mid R, \mathbf{q})}{\Pr(\mathbf{x} \mid NR, \mathbf{q})} \cdot \frac{\Pr(R \mid \mathbf{q})}{\Pr(NR \mid \mathbf{q})}. \qquad (2)$$

(i) Explain the meaning of both the numerator and denominator of the first term in equation 2. How can this term be interpreted in terms of odds?
(ii) Why can the second term of equation 2 be ignored when ranking a set of documents with respect to their relevance to the query? [4 marks]

(c) Suppose that we have a set of 10 documents, some of which we know are relevant (R) to a certain query and some of which are non-relevant (NR). We also know whether the first term of the query, $x_1$, appears in each of the documents. This information is shown in Table 1.

Table 1: Relevance/non-relevance of 10 documents to a query, and presence (1) or absence (0) of term $x_1$ in these documents.

Document No:
Presence:
Relevance: R R R R R R NR NR NR NR

Let $p_1$ be the probability that term $x_1$ is present in a relevant document and $u_1$ the probability that term $x_1$ is present in a non-relevant document.

(i) Make estimates of $p_1$ and $u_1$.
(ii) The quality of a term $x_i$ for retrieval can be estimated by computing $c_i$, where

$$c_i = \log \frac{p_i}{1 - p_i} + \log \frac{1 - u_i}{u_i}. \qquad (3)$$

(A) How can this term be interpreted in terms of odds?
(B) Explain why a problem is caused if the term $x_i$ does not appear in any of the documents known to be relevant or non-relevant. How can this problem be alleviated?
(C) Show that, by making reasonable assumptions about the proportion of relevant documents in a large collection, the second term in equation 3 can be estimated as $\log(N/df_i)$, where $N$ is the number of documents in the collection and $df_i$ is the document frequency, i.e. the number of documents that the term $x_i$ appears in. State clearly the assumptions you make, and comment on the significance of this result.
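The quantities in part (c) can be explored numerically with a short Python sketch. It is a sketch under assumptions, not the expected answer: the counts passed in are hypothetical (they are not the values of Table 1), the parameter names r, R, n and N are introduced here for illustration, and adding 0.5 to each count is one common smoothing choice among several.

```python
import math

def term_weight(r, R, n, N, smooth=0.5):
    """Equation 3 computed from counts.

    r: relevant documents containing the term
    R: relevant documents in total
    n: documents containing the term
    N: documents in total
    smooth: pseudo-count added to each cell (assumed 0.5; 0 means unsmoothed)
    """
    p = (r + smooth) / (R + 2 * smooth)          # estimate of p_i
    u = (n - r + smooth) / (N - R + 2 * smooth)  # estimate of u_i
    return math.log(p / (1 - p)) + math.log((1 - u) / u)

def second_term(df, N):
    """Second term of equation 3 with u_i approximated by df / N."""
    u = df / N
    return math.log((1 - u) / u)

# Hypothetical counts: 4 of 6 relevant and 1 of 4 non-relevant documents
# contain the term.
weight = term_weight(r=4, R=6, n=5, N=10)
```

With `smooth=0` and a term that occurs in no relevant document, the first logarithm receives zero and `math.log` raises `ValueError`, which mirrors the problem raised in (B). For a rare term in a large collection, `second_term(df, N)` is numerically very close to `math.log(N / df)`, in line with the approximation in (C).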
4. (a) Explain (i) the development of link analysis techniques in bibliometrics, (ii) the motivation for the use of link analysis algorithms in web search, and (iii) the terms authority, hub, and in-degree for link analysis algorithms.

(b) Describe the main recent developments in search engine technology designed to improve the quality of search results.

(c) Describe the simple version of the PageRank algorithm, using relevant diagrams and formulae in your answer, and briefly explain its importance in the success of Google's search engine. [24 marks]

(d) Discuss the most important aspects of web documents that are used as signals of good (or bad) quality by search engines. [16 marks]

END OF PAPER
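For reference alongside question 4(c), an iterative PageRank computation can be sketched as follows. This is a minimal sketch under assumptions: it uses the commonly quoted damped form PR(p) = (1 - d)/N + d * sum of PR(q)/outdegree(q) over pages q linking to p, with d = 0.85 as an assumed default; it spreads the rank of a page with no outlinks evenly over all pages; it requires every link target to appear as a key of the input; and the three-page graph is invented for the example. Setting `damping=1.0` gives the undamped form.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively compute PageRank for a dict {page: [pages it links to]}."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # A page without outlinks is treated as linking to every page.
            targets = links[p] or pages
            share = damping * rank[p] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Invented example graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
```

In this example C ends with the highest rank because it receives all of B's rank and half of A's, while the ranks continue to sum to one after every iteration.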