Informa(on Retrieval. Informa(on Retrieval
|
|
- Phebe Fowler
- 5 years ago
- Views:
Transcription
1 Informa(on Retrieval CISC489/ , Lecture #1 Monday, Feb. 9 Ben CartereFe Informa(on Retrieval 1
2 Informa(on Retrieval Domains, Applica(ons, and Tasks Web search Ver(cal search Enterprise search Media search Ques(on answering Recommender systems Adver(sing Personal item search Passage retrieval Filtering Summariza(on Clustering Topic detec(on Cross language Federated search Metasearch Social search Novel item retrieval 2
3 What is IR? Gerard Salton, 1968: Informa(on retrieval is a field concerned with the structure, analysis, organiza(on, storage, searching, and retrieval of informa(on. This class is about computa(onal methods for the structure, analysis, organiza(on, storage, searching, and retrieval of informa(on. And primarily about text documents. What is a Document? Examples: web pages, , books, news stories, scholarly papers, text messages, Word, Powerpoint, PDF, forum pos(ngs, patents, IM sessions, etc. Common proper(es: Significant text content. Some structure (e.g., (tle, author, date for papers; subject, sender, des(na(on for ). 3
4 Examples of Documents <DOC> <DOC> <DOC> <DOCNO>WSJ </DOCNO> <DOCNO>FR </DOCNO> <DOCNO>ZF </DOCNO> <DD> = </DD> <DOC> <DOCID>fr f2.A1067</DOCID> <DOCID> OV: &M; </DOCID> <AN> <DOCNO>DOE </DOCNO> </AN> <TEXT> <HL> Politics & Policy: <TEXT> <ITAG tagnum=69> <JOURNAL>PC Magazine Dec v9 n21 FDA Focuses Interpretation <ITAG tagnum=41>[docket * Full Text of the relative GI No. COPYRIGHT toxicities 89N-0432]</ITAG> Ziff-Davis Publishing Co. of On Eli Lilly cytotoxic drugs depends on the endpoint In Drug <DOC> Inquiry Histological <ITAG assays tagnum=56>par </JOURNAL> of the dynamics Pharmaceutical, of mitotic Inc.; and --- <DOCNO>AP </DOCNO> necrotic to cells Withdraw in murine Approval <TITLE>Generic crypts of revealed Three 3D Abbreviated Drafting. few New (Software Drug Review) (one Probe of <FILEID>AP-NR the Industry apparently Shifts Applications; radical 1612EST</FILEID> differences Opportunity of three evaluations between for a individual Hearing</ITAG> of low-cost 3D CAD programs in Generic Drugs Illegal Low-cost Activities CAD: modeling by Manufacturers for the masses. To Possible <FIRST>u Brand-Name w drugs AM-GenericDrugs and between drugs and 0740</FIRST> radiation. Manufacturing <SECOND>AM-Generic Problems microcolony <T2>SUMMARY: Drugs,740</SECOND> assay of clonogenic </T2>The (evaluation) Food cells and reveals Drug Administration --- <HEAD>FDA differences Chief Says (FDA) Agency between proposes </TITLE> Descrip-on: Needs drugs to More in withdraw the Power ability approval to of cells of abbreviated By Bill Punish Richards Cheating maintain and Bruce on new crypt Drug Ingersoll drug Tests</HEAD> integrity applications <AUTHOR>Haase, or to (ANDA's) Bruce.&M; regenerate , crypt-like , Staff Reporters <BYLINE>By To be relevant a document must iden(fy a specific generic drug company structures of DEBORAH The Wall MESCE</BYLINE> Street held Journal by </AUTHOR> Par Pharmaceutical, </HL> Inc., One Ram <DD> 08/24/89 <BYLINE>Associated </DD> </TEXT> Ridge Press Rd., Writer</BYLINE> Spring <SUMMARY>Generic Valley, NY Software (Par). Inc s The $349 Generic 3D being inves(gated by the FDA or Congress. It also must iden(fy the drug, i.e., <SO> WALL <DATELINE>WASHINGTON STREET JOURNAL </DOC> (J) grounds </SO> (AP) </DATELINE> the Drafting proposed is withdrawal a low-cost the generic drug for Zantac. are (1) that the <CO> LLY <TEXT> PRX </CO> applications contain </SUMMARY> untrue statements of material <DESCRIPT> <IN> DRUG The MANUFACTURERS Food and Drug (DRG) fact, Administration </IN> and (2) that, chief based told Congress new information <GV> FOOD on AND Friday DRUG the ADMINISTRATION agency evaluated needs (FDA) more together Company: </GV> authority with the Generic to punish evidence Software available Inc. (Products).&O; when <TEXT> generic drug companies the applications Product: that cheat on were safety approved, Generic tests there 3-D Drafting is a lack 1.1 of (CAD Food and and Drug misrepresent Administration data substantial investigators to win product evidence Software).&O; are approvals. that the drugs will have the looking into possible brand-name effects drug they manufacturing purport Topic: or are Computer-Aided represented Design to have problems </TEXT> at an Indianapolis under plant the owned conditions by Eli of Lilly use prescribed, recommended, & Co. </DOC> or suggested in </DESCRIPT> their labeling. </ITAG> <TEXT> </TEXT> </TEXT> </TEXT> </DOC> </DOC> </DOC> Documents vs. Database Records Database records are typically made up of well defined fields (or a<ributes). e.g. company names, addresses, account numbers, drug names, patent numbers, inves(ga(on file numbers. Easy to compare fields with well defined seman(cs to queries in order to find matches. Our query has no fields and our documents have lifle structure. 4
5 IR vs. Databases Informa-on Retrieval Data: Semi structured. Heterogeneous. Noisy. Unstructured or semistructured queries. Natural language seman(cs. Infrequent off line index changes. Databases Data: Structured. Homogeneous. Clean. Structured queries. Well defined field seman(cs. Frequent on line index changes. Generic Drugs Illegal Ac(vi(es by Manufacturers <DOC> <DOCNO>FR </DOCNO> <DOC> <DOC> <DOCID>fr f2.A1067</DOCID> <DOCNO>WSJ </DOCNO> <DOCNO>ZF </DOCNO> <DD> = </DD> <DOC> <TEXT> <DOCID> OV: &M; </DOCID> <AN> <DOCNO>DOE </DOCNO> <ITAG tagnum=69> </AN> <HL> Politics & Policy: <TEXT> <ITAG tagnum=41>[docket <JOURNAL>PC No. Magazine 89N-0432]</ITAG> Dec v9 n21 FDA Focuses Interpretation of the relative * Full Text GI toxicities COPYRIGHT Ziff-Davis of Publishing On Eli Lilly cytotoxic <ITAG drugs tagnum=56>par depends 1990.&M; on the Pharmaceutical, endpoint chosen. Inc.; In Drug <DOC> Inquiry Histological Withdraw assays of Approval the </JOURNAL> dynamics of Three of Abbreviated mitotic and New --- <DOCNO>AP </DOCNO> necrotic Applications; cells in murine Opportunity <TITLE>Generic crypts revealed for 3D a Hearing</ITAG> few Drafting. (Software Review) Probe of <FILEID>AP-NR the Industry apparently Shifts radical 1612EST</FILEID> differences of three between evaluations individual of low-cost 3D CAD programs To Possible <FIRST>uw Brand-Name drugs AM-GenericDrugs and <T2>SUMMARY: between drugs </T2>The Low-cost and 0740</FIRST> radiation. Food CAD: and modeling The Drug Administration for the masses. Manufacturing <SECOND>AM-Generic Problems microcolony (FDA) Drugs,740</SECOND> assay proposes of clonogenic (evaluation) to withdraw cells approval reveals of major --- <HEAD>FDA differences Chief Says new Agency between drug applications Needs drugs </TITLE> More in the Power (ANDA's) ability to , of cells , to By Bill Punish Richards Cheating maintain and Bruce on crypt Drug Ingersoll Tests</HEAD> integrity held by <AUTHOR>Haase, Par or Pharmaceutical, to regenerate Bruce.&M; crypt-like Inc., One Staff Reporters <BYLINE>By structures of DEBORAH Ridge The Wall MESCE</BYLINE> Rd., Spring Street Journal </AUTHOR> Valley, NY (Par). The </HL> <DD> 08/24/89 <BYLINE>Associated </DD> </TEXT> grounds Press Writer</BYLINE> for the <SUMMARY>Generic proposed withdrawal Software (1) Inc s that $349 the Generic 3D <SO> WALL <DATELINE>WASHINGTON STREET JOURNAL </DOC> applications (J) </SO> (AP) </DATELINE> contain Drafting untrue is a statements low-cost of material <CO> LLY <TEXT> fact, and (2) that, PRX </CO> </SUMMARY> based on new information <IN> DRUG The MANUFACTURERS Food and Drug evaluated (DRG) Administration together </IN> <DESCRIPT> chief with told the Congress evidence available when <GV> FOOD on AND Friday DRUG the ADMINISTRATION agency the needs applications (FDA) more </GV> authority Company: were approved, to Generic punish there Software is a Inc. lack (Products).&O; of <TEXT> generic drug companies substantial that cheat evidence Product: on safety that Generic the tests drugs 3-D will Drafting have 1.1 the (CAD Food and and Drug misrepresent Administration data effects investigators to win they product purport Software).&O; are approvals. or represented to have looking into under the conditions of use prescribed, recommended, possible brand-name drug manufacturing Topic: Computer-Aided Design problems </TEXT> or suggested in their labeling. at an Indianapolis plant owned by Eli Lilly & Co. </DOC> </ITAG> </DESCRIPT> <TEXT> </TEXT> </DOC> </TEXT> </TEXT> </DOC> </DOC> 5
6 Comparing Text Determining whether a document matches a query is a fundamental problem of IR. Exact match is not enough: Many different ways to state the same informa(on Documents may be relevant even when lacking some of the query terms. Documents may be nonrelevant even if they contain all the query terms. Relevance What does it mean for a document to be relevant? Simple defini(on: A relevant document contains informa(on that a person was looking for when they submifed a query to the search engine. Many factors influence a person s decision about what is relevant: e.g., task, context, novelty, style. Topical relevance (same topic) vs. user relevance (everything else). How can we build an engine that retrieves relevant documents? 6
7 Retrieval Retrieval models define a view of relevance. Ranking algorithms used in search engines are based on retrieval models. Most models describe sta(s(cal proper(es of text rather than linguis(c proper(es. i.e. coun(ng simple text features such as words. Sta(s(cal approach started with Luhn in the 50s. Linguis(c features can be part of a sta(s(cal model. Evalua(on How do we know whether the engine is doing a good job of finding relevant documents? Evalua(on is experimental procedures and measures for comparing system output with user expecta(ons. IR evalua(on methods now used in many fields. Recall and precision are examples of effec(veness measures. 7
8 Not Just Documents New applica(ons increasingly involve new media. e.g. video, photos, music, speech Like text, content is difficult to describe and compare. text may be used to represent them (e.g. tags). IR approaches to search and evalua(on are appropriate. Dimensions of IR Content Applica-ons Tasks Text Web search Ad hoc search Images Ver(cal search Filtering Video Enterprise search Classifica(on Scanned docs Desktop search Ques(on answering Audio Forum search Music P2P search Literature search 8
9 Ad hoc search: IR Tasks Find relevant documents for an arbitrary text query. Filtering: Iden(fy relevant user profiles for a new document. Classifica(on: Iden(fy relevant labels for documents. Ques(on answering: Give a specific answer to a ques(on. IR and Search Engines A search engine is the prac(cal applica(on of informa(on retrieval techniques to large scale text collec(ons. Relevance, retrieval, evalua(on are issues. So are users and informa(on needs, performance, coverage, upda(ng, scalability, adaptability, and ability to handle specific problems (like spam). 9
10 Components of a Search Engine Universe of things to organize and search Filter/crawler/domain/ Corpus query Query parser Parser/tokenizer User Interface f(q,d) Indexer Retrieval func(on Server(s) Displayed results Retrieved results Building a Search Engine Text processing and indexing. Parsing; tokenizing; stopping and stemming; inverted indexes; scalability; index updates. Query processing and ranking. Query languages; index look up; retrieval models; features; relevance feedback; user interac(on. Evalua(on. Effec(veness at performing task; querying speed; user sa(sfac(on. 10
11 Course Overview This course is about informa(on retrieval in prac(ce: the applica(on of IR to search engine design and implementa(on. Course project: Design and implement a small search engine capable of indexing and searching Wikipedia pages. Evaluate its performance over provided queries. Add something interes(ng to it. Course Structure First half: Fundamentals of indexing, retrieval, and evalua(on. By the midterm we will have covered all aspects of designing a basic search engine. Second half: Addi(onal topics in search engine func(onality. Fielded search, user interac(on, clustering, linkgraph features, crawling, etc. 11
12 Textbook Search Engines: Informa(on Retrieval in Prac(ce by W. Bruce Crop, Donald Metzler, and Trevor Strohman. Unfortunately not yet published. I have PDFs of chapters. Also check supplemental texts on the course web page. Course Project Design and implement a small search engine to index and search Wikipedia pages. Semester long project in three phases: I. Indexing. II. Searching and evalua(ng. III. Addi(onal features. By midterm we will have covered everything needed to complete the first two phases. 12
13 Course Project: Phases For phases I and II, you will produce: A wrifen report of your design decisions and implementa(on details, including problems you encountered and how you resolved them. Code. Milestone worksheet responses. Timeline: Phase I: about 1.5 months. Phase II: about 1 month. Phase III: about 1 month. Course Project: Milestones Each phase has milestones to make sure you are not running into trouble. Worksheets with ques(ons you can answer using your code. Milestones and worksheets will be available in advance so you may work ahead if desired (we recommend it). If you are having trouble, we will be able to help you early. 13
14 Course Project: Phase III Phase III involves adding extra features to your engine. Anything we cover in the second half of the course, or anything in the book but not covered, or something else. You will write a 2 4 page proposal explaining how you would add the feature to your current code base. At the end of the semester you will give a short presenta(on on your engine. Course Project: Implementa(on This is a programming project! You may use any programming language the professor and/or TA understand. We highly recommend C, C++, or Java. You will have accounts on my lab cluster. No disk quotas; 16Gb RAM per node; 8 cores per node. Do not use for file sharing or other illicit ac(vity! 14
15 Course Project: Data I have obtained all English language Wikipedia pages. The top 10% with highest PageRank are provided for the project. 489 students must index and search 20% of those (2% of English Wikipedia). 689 students must index and search 100% of those (10% of English Wikipedia). Extra credit: index and search even more. Project Grading Project is 60% of total grade. Each phase is 20%. Phases I and II break down as follows: WriFen report (2 4 pages): 5% Code: 10% Turning in worksheets: 5% Phase III: Proposal (2 4 pages): 10% for 689, 15% for 489. Code: 5% for 689, 0% for 489. Final presenta(on: 5%. 15
16 Homeworks and Exams In addi(on to the project, there will be 5 homeworks and 2 exams (midterm and final). Each homework is 4% of total grade. Each exam is 10% of total grade. Exams will cover implementa(on details of project. Books and Resources Informa(on Retrieval, Keith van Rijsbergen. hfp:// Introduc(on to Informa(on Retrieval, Manning et al. hfp://www csli.stanford.edu/~hinrich/ informa(on retrieval book.html Check the course web page open! hfp:// 16
CS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University
CS6200 Informa.on Retrieval David Smith College of Computer and Informa.on Science Northeastern University Course Goals To help you to understand search engines, evaluate and compare them, and
More informationCS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University
CS6200 Informa.on Retrieval David Smith College of Computer and Informa.on Science Northeastern University Course Goals To help you to understand search engines, evaluate and compare them, and
More informationInformation Retrieval
Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,
More information60-538: Information Retrieval
60-538: Information Retrieval September 7, 2017 1 / 48 Outline 1 what is IR 2 3 2 / 48 Outline 1 what is IR 2 3 3 / 48 IR not long time ago 4 / 48 5 / 48 now IR is mostly about search engines there are
More informationInforma/on Retrieval. Text Search. CISC437/637, Lecture #23 Ben CartereAe. Consider a database consis/ng of long textual informa/on fields
Informa/on Retrieval CISC437/637, Lecture #23 Ben CartereAe Copyright Ben CartereAe 1 Text Search Consider a database consis/ng of long textual informa/on fields News ar/cles, patents, web pages, books,
More informationInformation Retrieval
Introduction Information Retrieval Information retrieval is a field concerned with the structure, analysis, organization, storage, searching and retrieval of information Gerard Salton, 1968 J. Pei: Information
More informationInformation Retrieval CS6200. Jesse Anderton College of Computer and Information Science Northeastern University
Information Retrieval CS6200 Jesse Anderton College of Computer and Information Science Northeastern University What is Information Retrieval? You have a collection of documents Books, web pages, journal
More informationExam IST 441 Spring 2011
Exam IST 441 Spring 2011 Last name: Student ID: First name: I acknowledge and accept the University Policies and the Course Policies on Academic Integrity This 100 point exam determines 30% of your grade.
More informationCSCI 5417 Information Retrieval Systems! What is Information Retrieval?
CSCI 5417 Information Retrieval Systems! Lecture 1 8/23/2011 Introduction 1 What is Information Retrieval? Information retrieval is the science of searching for information in documents, searching for
More informationIntroduction & Administrivia
Introduction & Administrivia Information Retrieval Evangelos Kanoulas ekanoulas@uva.nl Section 1: Unstructured data Sec. 8.1 2 Big Data Growth of global data volume data everywhere! Web data: observation,
More informationSearch Engines. Informa1on Retrieval in Prac1ce. Annota1ons by Michael L. Nelson
Search Engines Informa1on Retrieval in Prac1ce Annota1ons by Michael L. Nelson All slides Addison Wesley, 2008 Evalua1on Evalua1on is key to building effec$ve and efficient search engines measurement usually
More informationIntroduc3on to Data Management
ICS 101 Fall 2014 Introduc3on to Data Management Assoc. Prof. Lipyeow Lim Informa3on & Computer Science Department University of Hawaii at Manoa Lipyeow Lim - - University of Hawaii at Manoa 1 The Data
More informationCS506/606 - Topics in Information Retrieval
CS506/606 - Topics in Information Retrieval Instructors: Class time: Steven Bedrick, Brian Roark, Emily Prud hommeaux Tu/Th 11:00 a.m. - 12:30 p.m. September 25 - December 6, 2012 Class location: WCC 403
More informationCS 572: Information Retrieval. Lecture 1: Course Overview and Introduction 11 January 2016
CS 572: Information Retrieval Lecture 1: Course Overview and Introduction 11 January 2016 1/11/2016 CS 572: Information Retrieval. Spring 2016 1 Lecture Plan What is IR? (the big questions) Course overview
More informationBIO 139 HUMAN ANATOMY AND PHYSIOLOGY II LABORATORY SYLLABUS
BLUEGRASS COMMUNITY AND TECHNICAL COLLEGE NATURAL SCIENCES Spring 2013 BIO 139 HUMAN ANATOMY AND PHYSIOLOGY II LABORATORY SYLLABUS 0 CREDIT HOURS 2 CONTACT HOURS PREREQUISITE: BIO 137 COURSE DESCRIPTION:
More informationECE573 Introduction to Compilers & Translators
ECE573 Introduction to Compilers & Translators Tentative Syllabus Fall 2005 Tu/Th 9:00-10:15 AM, EE 115 Instructor Prof. R. Eigenmann Tel 49-41741 Email eigenman@ecn Office EE334C Office Hours Tu 10:15-11:30
More informationExam IST 441 Spring 2014
Exam IST 441 Spring 2014 Last name: Student ID: First name: I acknowledge and accept the University Policies and the Course Policies on Academic Integrity This 100 point exam determines 30% of your grade.
More informationBuilding Test Collections. Donna Harman National Institute of Standards and Technology
Building Test Collections Donna Harman National Institute of Standards and Technology Cranfield 2 (1962-1966) Goal: learn what makes a good indexing descriptor (4 different types tested at 3 levels of
More informationInformation Retrieval CSCI
Information Retrieval CSCI 4141-6403 My name is Anwar Alhenshiri My email is: anwar@cs.dal.ca I prefer: aalhenshiri@gmail.com The course website is: http://web.cs.dal.ca/~anwar/ir/main.html 5/6/2012 1
More informationCloudSearch and the Democra1za1on of Informa1on Retrieval
SIGIR 2012 Portland CloudSearch and the Democra1za1on of Informa1on Retrieval Daniel E. Rose A9.com danrose@a9.com What Does A9 Do? Product Search Adver1sing Technology 15 August 2012 Visual Search Community
More informationIn this course, you need to use Pearson etext. Go to "Pearson etext and Video Notes".
**Disclaimer** This syllabus is to be used as a guideline only. The information provided is a summary of topics to be covered in the class. Information contained in this document such as assignments, grading
More informationCOSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era. Spring Instructor: Grace Hui Yang
COSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era Spring 2016 Instructor: Grace Hui Yang The Web provides abundant information which allows us to live more conveniently and
More informationEvaluating and Improving Software Usability
Evaluating and Improving Software Usability 902 : Thursday, 9:30am - 10:45am Philip Lew www.xbosoft.com Understand, Evaluate and Improve 2 Agenda Introduc7on Importance of usability What is usability?
More informationIntroduction to Information Technology ITP 101x (4 Units)
Objective Concepts Introduction to Information Technology ITP 101x (4 Units) Upon completing this course, students will: - Understand the fundamentals of information technology - Learn core concepts of
More informationCS 4230 Java Application Development Syllabus
General Information CS 4230 Java Application Development Semester: Fall 2016 Textbook: Core Java Volume II, 9th Edition, by Horstmann & Cornell, 2013, Prentice Hall, ISBN 978 0 1370 8160 8 Location: SLCC
More informationAbout the Course. Reading List. Assignments and Examina5on
Uppsala University Department of Linguis5cs and Philology About the Course Introduc5on to machine learning Focus on methods used in NLP Decision trees and nearest neighbor methods Linear models for classifica5on
More informationCS 4230 Java Application Development Syllabus
General Information Semester: Fall 2018 Textbook: Core Java Volume II, 9th Edition, by Horstmann & Cornell, 2013, Prentice Hall, ISBN 978-0-1370-8160-8 Location: SLCC BB 330 Instructor Info: Website: Trevor
More informationCS 3030 Scripting Languages Syllabus
General Information CS 3030 Scripting Languages Semester: Fall 2017 Textbook: Location: Instructor Info: None. We will use freely available resources from the Internet. Online Ted Cowan tedcowan@weber.edu
More informationInternet Web Technologies ITP 104 (2 Units)
Internet Web Technologies ITP 104 (2 Units) Spring 2011 Objective This course is intended to teach the basics involved in publishing content on the World Wide Web. This includes the language of the Web
More informationCS 241 Data Organization using C
CS 241 Data Organization using C Fall 2018 Instructor Name: Dr. Marie Vasek Contact: Private message me on the course Piazza page. Office: Farris 2120 Office Hours: Tuesday 2-4pm and Thursday 9:30-11am
More informationIn this course, you need to use Pearson etext. Go to "Pearson etext and Video Notes".
**Disclaimer** This syllabus is to be used as a guideline only. The information provided is a summary of topics to be covered in the class. Information contained in this document such as assignments, grading
More informationCSC 443: Web Programming
1 CSC 443: Web Programming Haidar Harmanani Department of Computer Science and Mathematics Lebanese American University Byblos, 1401 2010 Lebanon Today 2 Course information Course Objectives A Tiny assignment
More informationExam IST 441 Spring 2013
Exam IST 441 Spring 2013 Last name: Student ID: First name: I acknowledge and accept the University Policies and the Course Policies on Academic Integrity This 100 point exam determines 30% of your grade.
More informationOverview of the Class
Overview of the Class Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California (USC) have explicit permission to make copies
More informationSearch Engines Information Retrieval in Practice
Search Engines Information Retrieval in Practice W. BRUCE CROFT University of Massachusetts, Amherst DONALD METZLER Yahoo! Research TREVOR STROHMAN Google Inc. ----- PEARSON Boston Columbus Indianapolis
More informationESET 369 Embedded Systems Software, Spring 2018
ESET 369 Embedded Systems Software, Spring 2018 Syllabus Contact Information: Professor: Dr. Byul Hur Office: Fermier 008A Telephone: (979) 845-5195 FAX: E-mail: byulmail@tamu.edu Web: rftestgroup.tamu.edu
More informationChapter 2. Architecture of a Search Engine
Chapter 2 Architecture of a Search Engine Search Engine Architecture A software architecture consists of software components, the interfaces provided by those components and the relationships between them
More informationTextbook(s) and other required material: Raghu Ramakrishnan & Johannes Gehrke, Database Management Systems, Third edition, McGraw Hill, 2003.
Elective course in Computer Science University of Macau Faculty of Science and Technology Department of Computer and Information Science SFTW371 Database Systems II Syllabus 1 st Semester 2013/2014 Part
More informationBehrang Mohit : txt proc! Review. Bag of word view. Document Named
Intro to Text Processing Lecture 9 Behrang Mohit Some ideas and slides in this presenta@on are borrowed from Chris Manning and Dan Jurafsky. Review Bag of word view Document classifica@on Informa@on Extrac@on
More informationrepresen/ng the world in 1s and 0s CS 4100/5100 Founda/ons of AI
represen/ng the world in 1s and 0s CS 4100/5100 Founda/ons of AI Announcements Assignment 2 clarifica/ons Final projects: what s next? Feedback Project Proposal Midterm Exam: October 18th ASP CLARIFICATIONS
More informationDatabase Systems: Concepts, design, and implementation ISE 382 (3 Units)
Database Systems: Concepts, design, and implementation ISE 382 (3 Units) Spring 2013 Description Obectives Instructor Contact Information Office Hours Concepts in modeling data for industry applications.
More informationFall Principles of Knowledge Discovery in Databases. University of Alberta
Principles of Knowledge Discovery in Databases Fall 1999 Dr. Osmar R. Zaïane 2 1 Class and Office Hours Class: Mondays, Wednesdays and Fridays from 10:00 to 10:50 Office Hours: Tuesdays from 11:00 to 11:55
More informationSearch Engines. Informa1on Retrieval in Prac1ce. Annotations by Michael L. Nelson
Search Engines Informa1on Retrieval in Prac1ce Annotations by Michael L. Nelson All slides Addison Wesley, 2008 Indexes Indexes are data structures designed to make search faster Text search has unique
More informationISM 324: Information Systems Security Spring 2014
ISM 324: Information Systems Security Spring 2014 Instructor: Co-Instructor: Office: E-Mail: Phone: Office Hours: Jeffrey Wall Hamid Nemati 392 Bryan Building jdwall2@uncg.edu (email is the preferred method
More information1. Textbook #1: Our Digital World (ODW). 2. Textbook #2: Guidelines for Office 2013 (GFO). 3. SNAP: Assessment Software
CIS - Survey of Computer Information Systems SPRING 014-16-Week Course Professor: JON P. RAGER Weekly Schedule Note: This schedule is subjected to BE CHANGED at your instructor's discretion. Please check
More informationCS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University
CS6200 Informa.on Retrieval David Smith College of Computer and Informa.on Science Northeastern University Indexing Process Indexes Indexes are data structures designed to make search faster Text search
More information61A Lecture 7. Monday, September 15
61A Lecture 7 Monday, September 15 Announcements Homework 2 due Monday 9/15 at 11:59pm Project 1 deadline extended, due Thursday 9/18 at 11:59pm! Extra credit point if you submit by Wednesday 9/17 at 11:59pm
More informationCompilers for Modern Architectures Course Syllabus, Spring 2015
Compilers for Modern Architectures Course Syllabus, Spring 2015 Instructor: Dr. Rafael Ubal Email: ubal@ece.neu.edu Office: 140 The Fenway, 3rd floor (see detailed directions below) Phone: 617-373-3895
More informationOperating Systems, Spring 2015 Course Syllabus
Operating Systems, Spring 2015 Course Syllabus Instructor: Dr. Rafael Ubal Email: ubal@ece.neu.edu Office: 140 The Fenway, 3rd floor (see detailed directions below) Phone: 617-373-3895 Office hours: Wednesday
More informationAdd/Drop Deadlines and Procedures. Fall 2017
Add/Drop Deadlines and Procedures Fall 2017 1 TABLE OF CONTENTS When Can Students Amend Their Fall Schedule? Fall Add/Drop Timeline Page 3 Frequently Asked Questions Page 4 How Do Students Sign In to the
More informationCS-490WIR Web Information Retrieval and Management. Luo Si
CS490W: Web Information Retrieval & Management CS-490WIR Web Information Retrieval and Management Luo Si Department of Computer Science Purdue University Overview Web: Growth of the Web The world produces
More informationRochester Institute of Technology Golisano College of Computing and Information Sciences Department of Information Sciences and Technologies
Rochester Institute of Technology Golisano College of Computing and Information Sciences Department of Information Sciences and Technologies 4002-360.01 ~ Introduction to Database & Data Modeling ~ Spring
More informationBIO 139 HUMAN ANATOMY AND PHYSIOLOGY II LABORATORY SYLLABUS
BLUEGRASS COMMUNITY AND TECHNICAL COLLEGE NATURAL SCIENCES Spring 2011 BIO 139 HUMAN ANATOMY AND PHYSIOLOGY II LABORATORY SYLLABUS 0 CREDIT HOURS 2 CONTACT HOURS PREREQUISITE: BIO 137 COURSE DESCRIPTION:
More informationXML RETRIEVAL. Introduction to Information Retrieval CS 150 Donald J. Patterson
Introduction to Information Retrieval CS 150 Donald J. Patterson Content adapted from Manning, Raghavan, and Schütze http://www.informationretrieval.org OVERVIEW Introduction Basic XML Concepts Challenges
More informationRed Hat Certified Engineer (RH300) 50 Cragwood Rd, Suite 350 South Plainfield, NJ 07080
COURSE SYLLABUS Red Hat Certified Engineer (RH300) 50 Cragwood Rd, Suite 350 South Plainfield, NJ 07080 Victoria Commons, 613 Hope Rd Building #5, Eatontown, NJ 07724 130 Clinton Rd, Fairfield, NJ 07004
More informationCompiling Techniques
Lecture 1: Introduction 20 September 2016 Table of contents 1 2 3 Essential Facts Lecturer: (christophe.dubach@ed.ac.uk) Office hours: Thursdays 11am-12pm Textbook (not strictly required): Keith Cooper
More informationOverview of the Class
Overview of the Class Copyright 2014, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California (USC) have explicit permission to make copies
More informationCompulsory course in Computer Science
Compulsory course in Computer Science University of Macau Faculty of Science and Technology Department of Computer and Information Science SFTW241 Programming Languages Architecture I Syllabus 2 nd Semester
More informationInstructor: Anna Miller
Media Graphics ADV 3203 Fall 2016 Advertising Media Graphics - 81584 - ADV 3203 Mondays and Wednesdays 12:15 PM - 1:30 PM room 1011 And Advertising Media Graphics - 82354 - ADV 3203 Mondays and Wednesdays
More informationProcessing Structural Constraints
SYNONYMS None Processing Structural Constraints Andrew Trotman Department of Computer Science University of Otago Dunedin New Zealand DEFINITION When searching unstructured plain-text the user is limited
More informationAssignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis
Assignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis Due by 11:59:59pm on Tuesday, March 16, 2010 This assignment is based on a similar assignment developed at the University of Washington. Running
More informationESET 369 Embedded Systems Software, Fall 2017
ESET 369 Embedded Systems Software, Fall 2017 Syllabus Contact Information: Professor: Dr. Byul Hur Office: Fermier 008A Telephone: (979) 845-5195 FAX: E-mail: byulmail@tamu.edu Web: rftestgroup.tamu.edu
More informationIntroduc9on to the Course
Mestrado em Engenharia Informá9ca e de Computadores Master Degree (MSc) in Informa9on Systems and Computer Engineering Administração e Gestão de Infra-estruturas de IT IT Infrastructure Management and
More informationUniversity of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015
University of Virginia Department of Computer Science CS 4501: Information Retrieval Fall 2015 2:00pm-3:30pm, Tuesday, December 15th Name: ComputingID: This is a closed book and closed notes exam. No electronic
More informationINFO/CS 4302 Web Informa6on Systems
INFO/CS 4302 Web Informa6on Systems FT 2012 Week 7: RESTful Webservice APIs - Bernhard Haslhofer - 2 3 4 Source: hmp://www.blogperfume.com/new- 27- circular- social- media- icons- in- 3- sizes/ 5 Plan
More informationIntroduction to Web Design & Computer Principles
Introduction to Web Design & Computer Principles CSCI-UA.0004-007 Instructor: Adam Scher Tuesday/Thursday 8:00am - 9:15am Warren Weaver Hall Room 101 What s in store today... Who Am I? Course Overview
More informationBIO 139 HUMAN ANATOMY AND PHYSIOLOGY II LABORATORY SYLLABUS
BLUEGRASS COMMUNITY AND TECHNICAL COLLEGE NATURAL SCIENCES Fall 2015 BIO 139 HUMAN ANATOMY AND PHYSIOLOGY II LABORATORY SYLLABUS 0 CREDIT HOURS 2 CONTACT HOURS PREREQUISITE: BIO 137 COURSE DESCRIPTION:
More informationWhat makes an applica/on a good applica/on? How is so'ware experienced by end- users? Chris7an Campo EclipseCon 2012
What makes an applica/on a good applica/on? How is so'ware experienced by end- users? Chris7an Campo EclipseCon 2012 Who are we? Chris/an Campo How is so:ware experienced by end- users? What is Usability?
More informationComputer Science Technology Department
Computer Science Technology Department Houston Community College Department Phone Number: ab Houston Community College ITSC 1309 Integrated Software Applications I Course Syllabus Instructor Course Reference
More informationNatural Language Processing
Natural Language Processing Information Retrieval Potsdam, 14 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the course book Outline 2 1 Introduction 2 Indexing Block Document
More informationCENG505 Advanced Computer Graphics Lecture 1 - Introduction. Instructor: M. Abdullah Bülbül
CENG505 Advanced Computer Graphics Lecture 1 - Introduction Instructor: M. Abdullah Bülbül 1 What is Computer Graphics? Using computers to generate and display images. 2 Computer Graphics Applica>ons (Where
More informationBIO 139 HUMAN ANATOMY AND PHYSIOLOGY II LABORATORY SYLLABUS
BLUEGRASS COMMUNITY AND TECHNICAL COLLEGE NATURAL SCIENCES Fall 2011 BIO 139 HUMAN ANATOMY AND PHYSIOLOGY II LABORATORY SYLLABUS 0 CREDIT HOURS 2 CONTACT HOURS PREREQUISITE: BIO 137 COURSE DESCRIPTION:
More informationCourse Title: Network+/Networking Fundamentals. Course Section: CNS-101-I1. FORMAT: Online
Course Title: Network+/Networking Fundamentals Course Section: CNS-101-I1 FORMAT: Online TIME FRAME: Start Date: 15 January 2018 End Date: 06 May 2018 CREDITS: 4 INSTRUCTOR: Carlos J. Garcia Office Hours:
More informationCourse Title: Computer Networking 2. Course Section: CNS (Winter 2018) FORMAT: Face to Face
Course Title: Computer Networking 2 Course Section: CNS-106-50 (Winter 2018) FORMAT: Face to Face TIME FRAME: Start Date: 15 January 2018 End Date: 28 February 2018 Monday & Wednesday 1:00pm 5:00pm CREDITS:
More informationComputer Technology Division. Course Syllabus for: COMT Spring Instructor: Joe Bolen
Computer Technology Division Course Syllabus for: COMT 11009 Spring 2013 Instructor: Joe Bolen Course: Computer Assembly & Configuration COMT 11009 Spring 2013 / Tuscarawas / Call # 12133 / Section 800
More informationInforma)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies
Informa)on Retrieval and Map- Reduce Implementa)ons Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies mas4108@louisiana.edu Map-Reduce: Why? Need to process 100TB datasets On 1 node:
More informationIntroduction to Information Retrieval. Hongning Wang
Introduction to Information Retrieval Hongning Wang CS@UVa What is information retrieval? 2 Why information retrieval Information overload It refers to the difficulty a person can have understanding an
More informationSpring CISM 3330 Section 01D (crn: # 10300) Monday & Wednesday Classroom Miller 2329 Syllabus revision: #
Spring 2018 - CISM 3330 Section 01D (crn: # 10300) Monday & Wednesday 0800 0915 Classroom Miller 2329 Syllabus revision: # 171124 FACULTY DATA: Dr. Douglas Turner Phone: 678.839.5252 Miller 2223 OFFICE
More informationIncrease Engagement in Educa0on with Video Streaming. How The University of Maine Changed Their Learning Experience with Wowza
Increase Engagement in Educa0on with Video How The University of Maine Changed Their Learning Experience with Wowza Agenda Introduc0ons The University of Maine BioMedia Lab Synapse LCMS (Learning Content
More informationModern Information Retrieval
Modern Information Retrieval Chapter 3 Retrieval Evaluation Retrieval Performance Evaluation Reference Collections CFC: The Cystic Fibrosis Collection Retrieval Evaluation, Modern Information Retrieval,
More informationECE 467 Section 201 Network Implementation Laboratory
ECE 467 Section 201 Network Implementation Laboratory Spring 2015 Class Meets: Day: Wednesday Time: 4:30 PM to 7:10 PM Where: Johnson Center, Room G10C Instructor: Ben Allen My Contact Information: E-mail
More informationCS54701: Information Retrieval
CS54701: Information Retrieval Basic Concepts 19 January 2016 Prof. Chris Clifton 1 Text Representation: Process of Indexing Remove Stopword, Stemming, Phrase Extraction etc Document Parser Extract useful
More informationGes$one Avanzata dell Informazione Part A Full- Text Informa$on Management. Full- Text Indexing
Ges$one Avanzata dell Informazione Part A Full- Text Informa$on Management Full- Text Indexing Contents } Introduction } Inverted Indices } Construction } Searching 2 GAvI - Full- Text Informa$on Management:
More informationCourse Syllabus. Course Information
Course Syllabus Course Information Course: MIS 6V99 Special Topics Programming for Data Science Section: 5U1 Term: Summer 2017 Meets: Friday, 6:00 pm to 10:00 pm, JSOM 2.106 Note: Beginning Fall 2017,
More informationCISN 340 Data Communication and Networking Fundamentals Fall 2012 (Hybrid)
CISN 340 Data Communication and Networking Fundamentals Fall 2012 (Hybrid) Instructor: Kevin M. Anderson, MBA, CCAI, MCSE, MCDBA, Office Phone: (916) 650-2926 CNE, LCP, CIW Associate, Security+, N +, A
More informationCENTRAL TEXAS COLLEGE INDUSTRIAL TECHNOLOGY DEPARTMENT SYLLABUS FOR GRPH 1459 VECTOR GRAPHICS FOR PRODUCTION SEMESTER HOURS CREDIT: 4
CENTRAL TEXAS COLLEGE INDUSTRIAL TECHNOLOGY DEPARTMENT SYLLABUS FOR GRPH 1459 VECTOR GRAPHICS FOR PRODUCTION SEMESTER HOURS CREDIT: 4 INTRODUCTION A. A study and use of vector graphics for production.
More informationCMSC433 - Programming Language Technologies and Paradigms. Introduction
CMSC433 - Programming Language Technologies and Paradigms Introduction Course Goal To help you become a better programmer Introduce advanced programming technologies Deconstruct relevant programming problems
More informationTowards Process Understanding:
Towards Process Understanding: sta2s2cal analysis applied to the manufacturing process of tablets Drug Product Development: A QbD Approach Nadia Bou-Chacra Faculty of Pharmaceutical Sciences University
More informationVK Multimedia Information Systems
VK Multimedia Information Systems Mathias Lux, mlux@itec.uni-klu.ac.at This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Results Exercise 01 Exercise 02 Retrieval
More informationCSE373: Data Structures and Algorithms Lecture 1: Introduc<on; ADTs; Stacks/Queues
CSE373: Data Structures and Algorithms Lecture 1: Introduc
More informationClinical Research Professionals Educa3on Session Barb Greguson Alliance Sta3s3cal and Data Center
Clinical Research Professionals Educa3on Session Barb Greguson Alliance Sta3s3cal and Data Center greguson.barbara@mayo.edu Alliance for Clinical Trials in Oncology Fall 2015 Group Mee3ng Presenta3on Objec3ves
More informationWeb Programming Spring 2010
Web Programming Spring 2010 Course number: M&IS 24065 Section: 001/ 002 CRN: 11441/13343 Location: BSA 205/BSA 324 Meeting day: TR Meeting time: 2:15-3:30 PM/5:30-6:45 PM Instructor Name: Professor Janet
More informationECSE 425 Lecture 1: Course Introduc5on Bre9 H. Meyer
ECSE 425 Lecture 1: Course Introduc5on 2011 Bre9 H. Meyer Staff Instructor: Bre9 H. Meyer, Professor of ECE Email: bre9 dot meyer at mcgill.ca Phone: 514-398- 4210 Office: McConnell 525 OHs: M 14h00-15h00;
More informationDatabase Management System Implementation. Who am I? Who is the teaching assistant? TR, 10:00am-11:20am NTRP B 140 Instructor: Dr.
Database Management System Implementation TR, 10:00am-11:20am NTRP B 140 Instructor: Dr. Yan Huang TA: TBD Who am I? Dr. Yan Huang, graduated 2003 from University of Minnesota Research interests: database,
More informationElementary IR: Scalable Boolean Text Search. (Compare with R & G )
Elementary IR: Scalable Boolean Text Search (Compare with R & G 27.1-3) Information Retrieval: History A research field traditionally separate from Databases Hans P. Luhn, IBM, 1959: Keyword in Context
More informationWindows Server 2008 Applications Infrastructure Configuration (ITMT 2322)
Windows Server 2008 Applications Infrastructure Configuration (ITMT 2322) Credit: 3 semester credit hours (2 hours lecture, 4 hours lab) Prerequisite/Co-requisite: None Course Description A course in the
More informationSYLLABUS BIOLOGY 1107: Principles of Biology I Fall Semester Section 001: (9:05-9:55AM, Mon, Wed and Fri) Bldg.: Laurel Hall, Room 102
SYLLABUS BIOLOGY 1107: Principles of Biology I Fall Semester 2016 Section 001: (9:05-9:55AM, Mon, Wed and Fri) Bldg.: Laurel Hall, Room 102 Section 020: (10:10-11:00AM, Mon, Wed and Fri) Bldg.: Laurel
More informationCS490W: Web Information Search & Management. CS-490W Web Information Search and Management. Luo Si. Department of Computer Science Purdue University
CS490W: Web Information Search & Management CS-490W Web Information Search and Management Luo Si Department of Computer Science Purdue University Overview Web: Growth of the Web The world produces between
More informationCISC327 - So*ware Quality Assurance
CISC327 - So*ware Quality Assurance Lecture 8 Introduc
More informationPrinciples of Programming Languages
Principles of Programming Languages h"p://www.di.unipi.it/~andrea/dida2ca/plp- 14/ Prof. Andrea Corradini Department of Computer Science, Pisa Lesson 10! Con:nua:on of the course Syntax- Directed Transla:on
More information