Module Contact: Dr Dan Smith, CMP Copyright of the University of East Anglia Version 1


UNIVERSITY OF EAST ANGLIA
School of Computing Sciences
Main Series UG Examination 2015/16

INFORMATION RETRIEVAL
CMP-5036A/CMP-6008A

Time allowed: 2 hours

Answer any TWO questions.
Notes are not permitted in this examination.
Do not turn over until you are told to do so by the Invigilator.

1. (a) Describe the Rocchio technique for relevance feedback in information retrieval. Use appropriate mathematical expressions, and make sure that every term you use is clearly defined. What is the main problem with this technique and what practical measures can be taken to alleviate this problem? [15 marks]

(b) What is query expansion in information retrieval? Describe two techniques for query expansion and comment on any problems associated with them. [10 marks]

(c) Explain why n-grams of words are useful in information retrieval, using examples that demonstrate this. Despite their usefulness, search engines often do not use them extensively. Why? [5 marks]

(d) Explain why direct estimation of the probability of observing a phrase or sentence drawn from a particular natural language is unfeasible. Explain how the n-gram assumption circumvents this problem. Use the mathematical expressions appropriate for bigrams in your explanation. [10 marks]

(e) A hypothetical language has only three words, A, B, and C. An example of a sentence from the language is given below:

C C A B B A B B C C A B B A A.

Using this sentence, obtain unsmoothed estimates of
(i) the unigram probabilities of the words in the language; [3 marks]
(ii) the observed bigram probabilities. [7 marks]

(f) Find the probability of observing the sentence B C A B using your estimated unigram and bigram probabilities. [5 marks]

(g) Explain what problem occurs when estimating the probability of the sentence C B C using bigrams. How can this problem be addressed? [5 marks]
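The counting behind parts (e) to (g) can be sketched in Python as follows. This is a minimal illustration, not a model answer. It assumes that the bigram estimate is count(w1 w2) divided by the number of times w1 occurs as the first word of a bigram, and that a sentence probability starts from the unigram probability of its first word. Other conventions, such as dividing by the total count of w1 or adding sentence-boundary markers, give slightly different numbers. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# The example sentence from part (e), split into its 15 word tokens.
tokens = "C C A B B A B B C C A B B A A".split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
# Assumption: normalise each bigram by how often its first word
# appears as a history, i.e. in any position except the last.
history_counts = Counter(tokens[:-1])


def unigram_prob(word):
    """Unsmoothed (maximum-likelihood) unigram estimate."""
    return Fraction(unigram_counts[word], len(tokens))


def bigram_prob(prev, word):
    """Unsmoothed estimate of P(word | prev); zero for unseen histories."""
    if history_counts[prev] == 0:
        return Fraction(0)
    return Fraction(bigram_counts[(prev, word)], history_counts[prev])


def sentence_prob_bigram(words):
    """P(w1) * product of P(w_k | w_{k-1}) under the bigram assumption."""
    prob = unigram_prob(words[0])
    for prev, word in zip(words, words[1:]):
        prob *= bigram_prob(prev, word)
    return prob


def sentence_prob_unigram(words):
    """Product of independent unigram probabilities."""
    prob = Fraction(1)
    for word in words:
        prob *= unigram_prob(word)
    return prob


if __name__ == "__main__":
    print(sentence_prob_unigram("B C A B".split()))
    print(sentence_prob_bigram("B C A B".split()))
    # The bigram (C, B) never occurs in the sample, so this product is zero.
    print(sentence_prob_bigram("C B C".split()))
```

Exact fractions are used so that the estimates can be checked by hand. The zero for C B C illustrates the sparse-data problem raised in part (g), which smoothing methods are designed to address.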

2. (a) Describe the processes by which a user's information need is communicated to a typical search engine, a response generated, and the user's possible reactions to the system's response.

(b) Describe the main components of a simple inverted index suitable for bag of words information retrieval methods. You should illustrate your answer with diagrams, where appropriate.

(c) Describe how this structure can be extended to accommodate proximity search and the effects these changes are likely to have on index size and search time. [16 marks]

(d) Describe the basic structure and main operations of a simple web crawler, how these are enhanced in current commercial crawlers, and discuss the issues faced in ensuring that crawlers are directed towards good quality material. You should illustrate your answer with diagrams as necessary. [20 marks]
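As a rough illustration of the structures named in parts (b) and (c), the sketch below builds a small positional inverted index in Python and runs a proximity query against it. The sample documents, the whitespace tokenizer, and the function names are all assumptions made for this example. A production index would also compress postings and merge sorted position lists rather than compare every pair of positions.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}.

    Dropping the position lists and keeping only the doc ids gives
    the plain bag-of-words inverted index.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Assumption: lower-cased whitespace tokenization is sufficient.
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index


def within_k(index, term1, term2, k):
    """Return sorted doc ids where term1 and term2 occur within k positions."""
    postings1 = index.get(term1, {})
    postings2 = index.get(term2, {})
    hits = []
    # Only documents containing both terms can match.
    for doc_id in postings1.keys() & postings2.keys():
        if any(abs(a - b) <= k
               for a in postings1[doc_id]
               for b in postings2[doc_id]):
            hits.append(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical three-document collection.
    docs = {
        1: "web crawler visits pages",
        2: "crawler downloads the web",
        3: "inverted index",
    }
    index = build_positional_index(docs)
    print(within_k(index, "web", "crawler", 1))
    print(within_k(index, "web", "crawler", 3))
```

Storing one position per term occurrence, rather than one entry per document, is what makes a positional index larger than a plain one, and the extra position comparisons are what add to query time.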

3. (a) State, in your own words, the Probability Ranking Principle for Information Retrieval.

(b) Suppose a document is represented as a vector of terms $\vec{x}$, and it can be either relevant (R) or non-relevant (NR) to a certain query, represented as a vector $\vec{q}$. It can be shown that the likelihood ratio

$$L = \frac{\Pr(R \mid \vec{q}, \vec{x})}{\Pr(NR \mid \vec{q}, \vec{x})} \qquad (1)$$

can be written as

$$L = \frac{\Pr(\vec{x} \mid R, \vec{q})}{\Pr(\vec{x} \mid NR, \vec{q})} \cdot \frac{\Pr(R \mid \vec{q})}{\Pr(NR \mid \vec{q})}. \qquad (2)$$

(i) Explain the meaning of both the numerator and denominator of the first term in equation 2. How can this term be interpreted in terms of odds?
(ii) Why can the second term of equation 2 be ignored when ranking a set of documents with respect to their relevance to the query? [4 marks]

(c) Suppose that we have a set of 10 documents, some of which we know are relevant (R) to a certain query and some of which are non-relevant (NR). We also know whether the first term of the query, $x_1$, appears in each of the documents. This information is shown in Table 1.

Table 1: Relevance/non-relevance of 10 documents to a query, and presence (1) or absence (0) of term $x_1$ in these documents.

Document No:
Presence:
Relevance: R R R R R R NR NR NR NR

Let $p_1$ be the probability that term $x_1$ is present in a relevant document and $u_1$ the probability that term $x_1$ is present in a non-relevant document.

(i) Make estimates of $p_1$ and $u_1$.
(ii) The quality of a term $x_i$ for retrieval can be estimated by computing $c_i$, where

$$c_i = \log \frac{p_i}{1 - p_i} + \log \frac{1 - u_i}{u_i}. \qquad (3)$$

(A) How can this term be interpreted in terms of odds?
(B) Explain why a problem is caused if the term $x_i$ does not appear in any of the documents known to be relevant or non-relevant. How can this problem be alleviated?
(C) Show that, by making reasonable assumptions about the proportion of relevant documents in a large collection, the second term in equation 3 can be estimated as $\log(N / df_i)$, where $N$ is the number of documents in the collection and $df_i$ is the document frequency, i.e. the number of documents that the term $x_i$ appears in. State clearly the assumptions you make, and comment on the significance of this result.
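The term weight in equation 3 can be sketched in Python as below. The counts used are hypothetical and are not taken from Table 1. Adding 0.5 to each count is one common smoothing choice and is an assumption here, as are the function and parameter names. The second function illustrates the approximation in part (C) under the assumption that relevant documents are a negligible fraction of the collection, so that $u_i \approx df_i / N$.

```python
import math


def term_weight(r, R, n, N, smooth=0.5):
    """Estimate c_i = log(p/(1-p)) + log((1-u)/u) from judged documents.

    r: relevant documents containing the term
    R: relevant documents in total
    n: documents containing the term (relevant or not)
    N: documents in total
    smooth: pseudo-count that keeps p and u away from 0 and 1
    """
    p = (r + smooth) / (R + 2 * smooth)
    u = (n - r + smooth) / (N - R + 2 * smooth)
    return math.log(p / (1 - p)) + math.log((1 - u) / u)


def second_term_vs_idf(N, df):
    """Compare log((1-u)/u) with u = df/N against log(N/df)."""
    u = df / N
    return math.log((1 - u) / u), math.log(N / df)


if __name__ == "__main__":
    # Hypothetical counts: 10 judged documents, 6 relevant,
    # term present in 3 relevant and 2 non-relevant documents.
    print(term_weight(r=3, R=6, n=5, N=10))
    # A term absent from every judged document still gets a finite weight.
    print(term_weight(r=0, R=6, n=0, N=10))
    # For a rare term in a large collection the two quantities nearly coincide.
    print(second_term_vs_idf(N=1_000_000, df=100))
```

Without the pseudo-count, a term that appears in none (or all) of the judged documents would force a division by zero or the logarithm of zero, which is the difficulty part (B) refers to.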

4. (a) Explain
(i) the development of link analysis techniques in bibliometrics,
(ii) the motivation for the use of link analysis algorithms in web search, and
(iii) the terms authority, hub, and in-degree for link analysis algorithms.

(b) Describe the main recent developments in search engine technology designed to improve the quality of search results.

(c) Describe the simple version of the PageRank algorithm, using relevant diagrams and formulae in your answer, and briefly explain its importance in the success of Google's search engine. [24 marks]

(d) Discuss the most important aspects of web documents that are used as signals of good (or bad) quality by search engines. [16 marks]

END OF PAPER
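For orientation on part (c), a simple PageRank iteration can be sketched in Python as follows. The three-page link graph is hypothetical. The damping factor of 0.85, the fixed iteration count, and the uniform redistribution of rank from pages without out-links are assumptions of this sketch rather than details taken from the paper.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank equally among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: A links to B and C, B links to C, C links to A.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

In this graph C is linked from both A and B, so it ends up with a higher score than B, which is linked only from A.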


More information

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity Outline Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity Lecture 10 CS 410/510 Information Retrieval on the Internet Query reformulation Sources of relevance for feedback Using

More information

Morgan County School District Re-3. Pre-Algebra 9 Skills Assessment Resources. Content and Essential Questions

Morgan County School District Re-3. Pre-Algebra 9 Skills Assessment Resources. Content and Essential Questions Morgan County School District Re-3 August The tools of Algebra. Use the four-step plan to solve problems. Choose an appropriate method of computation. Write numerical expressions for word phrases. Write

More information

Do not turn over until you are told to do so by the Invigilator.

Do not turn over until you are told to do so by the Invigilator. UNIVERSITY OF EAST ANGLIA School of Mathematics UG End of Year Examination 2002-2003 PROGRAMMING FOR MATHEMATICIANS Time allowed: TWO hours Answer ALL FOUR questions in Section A Answer ONE Question from

More information

ΕΠΛ660. Ανάκτηση µε το µοντέλο διανυσµατικού χώρου

ΕΠΛ660. Ανάκτηση µε το µοντέλο διανυσµατικού χώρου Ανάκτηση µε το µοντέλο διανυσµατικού χώρου Σηµερινό ερώτηµα Typically we want to retrieve the top K docs (in the cosine ranking for the query) not totally order all docs in the corpus can we pick off docs

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

Final Exam Search Engines ( / ) December 8, 2014

Final Exam Search Engines ( / ) December 8, 2014 Student Name: Andrew ID: Seat Number: Final Exam Search Engines (11-442 / 11-642) December 8, 2014 Answer all of the following questions. Each answer should be thorough, complete, and relevant. Points

More information

Mining The Web. Anwar Alhenshiri (PhD)

Mining The Web. Anwar Alhenshiri (PhD) Mining The Web Anwar Alhenshiri (PhD) Mining Data Streams In many data mining situations, we know the entire data set in advance Sometimes the input rate is controlled externally Google queries Twitter

More information

The University of Oregon May 16, 2015 Oregon Invitational Mathematics Tournament:

The University of Oregon May 16, 2015 Oregon Invitational Mathematics Tournament: The University of Oregon May 16, 2015 Oregon Invitational Mathematics Tournament: losed book examination Geometry Exam Time: 90 minutes Last Name First Name School Grade (please circle one): 7 8 9 10 11

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

Automatically improving floating point code

Automatically improving floating point code Automatically improving floating point code Scientists Write Code Every scientist needs to write code Analyze data Simulate models Control experiments Scientists Write Code Every scientist needs to write

More information

Natural Language Processing. SoSe Question Answering

Natural Language Processing. SoSe Question Answering Natural Language Processing SoSe 2017 Question Answering Dr. Mariana Neves July 5th, 2017 Motivation Find small segments of text which answer users questions (http://start.csail.mit.edu/) 2 3 Motivation

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Building Search Applications

Building Search Applications Building Search Applications Lucene, LingPipe, and Gate Manu Konchady Mustru Publishing, Oakton, Virginia. Contents Preface ix 1 Information Overload 1 1.1 Information Sources 3 1.2 Information Management

More information

The Anatomy of a Large-Scale Hypertextual Web Search Engine

The Anatomy of a Large-Scale Hypertextual Web Search Engine The Anatomy of a Large-Scale Hypertextual Web Search Engine Article by: Larry Page and Sergey Brin Computer Networks 30(1-7):107-117, 1998 1 1. Introduction The authors: Lawrence Page, Sergey Brin started

More information

CMPSCI 646, Information Retrieval (Fall 2003)

CMPSCI 646, Information Retrieval (Fall 2003) CMPSCI 646, Information Retrieval (Fall 2003) Midterm exam solutions Problem CO (compression) 1. The problem of text classification can be described as follows. Given a set of classes, C = {C i }, where

More information

SEARCHMETRICS WHITEPAPER RANKING FACTORS Targeted Analysis for more Success on Google and in your Online Market

SEARCHMETRICS WHITEPAPER RANKING FACTORS Targeted Analysis for more Success on Google and in your Online Market 2018 SEARCHMETRICS WHITEPAPER RANKING FACTORS 2018 Targeted for more Success on Google and in your Online Market Table of Contents Introduction: Why ranking factors for niches?... 3 Methodology: Which

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Graph Data & Introduction to Information Retrieval Huan Sun, CSE@The Ohio State University 11/21/2017 Slides adapted from Prof. Srinivasan Parthasarathy @OSU 2 Chapter 4

More information

This is a function because no vertical line can be drawn so that it intersects the graph more than once.

This is a function because no vertical line can be drawn so that it intersects the graph more than once. Determine whether each relation is a function. Explain. 1. A function is a relation in which each element of the domain is paired with exactly one element of the range. So, this relation is a function.

More information

Term-Specific Smoothing for the Language Modeling Approach to Information Retrieval: The Importance of a Query Term

Term-Specific Smoothing for the Language Modeling Approach to Information Retrieval: The Importance of a Query Term Term-Specific Smoothing for the Language Modeling Approach to Information Retrieval: The Importance of a Query Term Djoerd Hiemstra University of Twente, Centre for Telematics and Information Technology

More information

6.8 Sine ing and Cosine ing It

6.8 Sine ing and Cosine ing It SECONDARY MATH III // MODULE 6 In the previous tasks of this module you have used the similarity of circles, the symmetry of circles, right triangle trigonometry and proportional reasoning to locate stakes

More information

CS6200 Information Retrieval. David Smith College of Computer and Information Science Northeastern University

CS6200 Information Retrieval. David Smith College of Computer and Information Science Northeastern University CS6200 Information Retrieval David Smith College of Computer and Information Science Northeastern University Indexing Process Processing Text Converting documents to index terms Why? Matching the exact

More information

UNIVERSITY OF MORATUWA

UNIVERSITY OF MORATUWA UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Engineering 2011 Intake Semester 7 Examination CS4422 WIRELESS AND BROADBAND NETWORKING Time allowed: 2

More information

Scalable Trigram Backoff Language Models

Scalable Trigram Backoff Language Models Scalable Trigram Backoff Language Models Kristie Seymore Ronald Rosenfeld May 1996 CMU-CS-96-139 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 This material is based upon work

More information