COSC 431 Information Retrieval. Phrase Search & Structured Search

1 COSC 431 Information Retrieval: Phrase Search & Structured Search

2 Outline
Phrase searching.
What are: structured documents, meta-data, semi-structured documents.
Searching: structured documents, meta-data.
Semi-structured documents searching: progressive filters, embedded paths, Dewey decimal codes.

3 Phrase Searching Using AND
Phrase search: "Information Retrieval". A = postings for "Information"; B = postings for "Retrieval"; P = A & B.
Advantages: no false negatives; fast; no changes to the search engine (just to the query parser); high recall.
Disadvantages: many false positives; low precision (apparently); can't do proximity ("within n words of") searching.
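The AND approach can be sketched as a merge of two sorted lists of document IDs. This is only an illustration: the three documents and the helper names are made up, not taken from the course.

```python
def intersect(a, b):
    """Intersect two sorted lists of document IDs (Boolean AND)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical toy collection.
docs = {
    1: "information retrieval systems",
    2: "retrieval of information",
    3: "database systems",
}

def postings(term):
    """Sorted IDs of the documents that contain the term."""
    return sorted(d for d, text in docs.items() if term in text.split())

A = postings("information")
B = postings("retrieval")
P = intersect(A, B)

# Documents in P that really contain the phrase.
true_matches = [d for d in P if "information retrieval" in docs[d]]
```

Document 2 ends up in P although it does not contain the phrase, which is the false-positive problem listed under the disadvantages.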

4 Phrase Searching Using Bi-grams
Index adjacent words as a single term.
Example documents: Doc1: University of Otago; Doc2: Otago University.
Bi-gram postings:
Of Otago <1,1>
Otago University <2,1>
University Of <1,1>
Searching for the bi-gram "Otago University" is now the same as searching for any single term.
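A minimal sketch of bi-gram indexing over the two example documents. The function name and the lower-casing of terms are assumptions made for the example.

```python
from collections import defaultdict

docs = {1: "University of Otago", 2: "Otago University"}

def bigram_index(docs):
    """Map each pair of adjacent words to a postings list of (doc, tf)."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        words = docs[doc_id].lower().split()
        counts = defaultdict(int)
        for first, second in zip(words, words[1:]):
            counts[first + " " + second] += 1
        for bigram, tf in counts.items():
            index[bigram].append((doc_id, tf))
    return dict(index)

index = bigram_index(docs)
```

Looking up `index["otago university"]` is then an ordinary single-term lookup.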

5 Phrase Searching Using N-grams
How do we search for "University of Otago"? Search for "University of" and "of Otago" and assume any document containing both is correct? Search for "University of" and "of Otago" adjacent to each other? We could use tri-grams, but where should n in n-gram stop?

6 Phrase Searching
For arbitrary proximity / adjacency search it is necessary to store word positions. Where <d_n, f_n> was used for postings before, <d_n, w_n> is now used, where w_n is the word number within the given document. <d_n, f_n> can be calculated from <d_n, w_n> by counting the number of w_n for a given d_n, so relevance ranking is not affected. The idf of a phrase can also be computed from these postings.
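Recovering frequency postings from positional postings is just a count per document. A small sketch, with a made-up postings list:

```python
from collections import Counter

def to_frequency_postings(positional):
    """Turn [(d, w), ...] into [(d, f), ...] by counting positions per document."""
    counts = Counter(d for d, _ in positional)
    return sorted(counts.items())

# Hypothetical term occurring twice in document 1 and once in document 2.
example = [(1, 3), (1, 7), (2, 1)]
frequencies = to_frequency_postings(example)
```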

7 Short-Phrase Searching
Example documents: Doc1: University of Otago; Doc2: Otago University.
Postings become:
Of <1,2>
Otago <1,3><2,1>
University <1,1><2,2>
To find "Otago University": load the postings for Otago (L1); load the postings for University (L2); merge them together looking for L1.d = L2.d and L1.w = L2.w - 1. That is, the document is the same and the position in L2 is one larger than the position in L1.
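The merge can be sketched as a two-pointer walk over the two sorted positional lists, using the example postings. The function name `adjacent` and the choice to return the position of the second word are assumptions.

```python
def adjacent(l1, l2):
    """Return the postings (d, w) of l2 that directly follow a posting of l1."""
    out = []
    i = j = 0
    while i < len(l1) and j < len(l2):
        d1, w1 = l1[i]
        target = (d1, w1 + 1)          # where the second word must appear
        if target == l2[j]:
            out.append(l2[j])
            i += 1
            j += 1
        elif target < l2[j]:
            i += 1
        else:
            j += 1
    return out

otago = [(1, 3), (2, 1)]
university = [(1, 1), (2, 2)]

otago_university = adjacent(otago, university)
university_otago = adjacent(university, otago)
```

Only document 2 contains "Otago University"; neither document contains "University Otago" as adjacent words.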

8 Proximity Searching
To find Otago within n words of University: load the postings for Otago (L1); load the postings for University (L2); merge them together looking for L1.d = L2.d and |L2.w - L1.w| <= n. That is, the document is the same and the distance between the two words is less than or equal to n.
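A simple sketch of the proximity test on the example postings. For brevity it groups positions by document and compares every pair rather than performing a true merge, and it assumes the distance is symmetric (either word may come first).

```python
from collections import defaultdict

def within(l1, l2, n):
    """IDs of documents where some pair of positions is at most n words apart."""
    by_doc = defaultdict(list)
    for d, w in l2:
        by_doc[d].append(w)
    hits = []
    for d, w1 in l1:
        if any(abs(w2 - w1) <= n for w2 in by_doc.get(d, [])):
            if not hits or hits[-1] != d:   # avoid reporting a document twice
                hits.append(d)
    return hits

otago = [(1, 3), (2, 1)]
university = [(1, 1), (2, 2)]

near_1 = within(otago, university, 1)
near_2 = within(otago, university, 2)
```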

9 Long-Phrase Searching
E.g. "University of Otago": T1 = university, T2 = of, T3 = otago.
L1 = postings(T1)
For n = 2 to |T|
    L2 = postings(Tn)
    L1 = adjacent(L1, L2)
In other words, the result of adjacency is itself a postings list, so phrases of varying length can be found using the same algorithm. More commonly, however, a multi-way merge is performed in which the shortest list drives the merge (with skipping).
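The loop above can be sketched as follows. For compactness `adjacent` here uses a set lookup instead of a merge, and it returns the position of the last word matched so far; both are choices made for this example, as is the name `phrase`.

```python
def adjacent(l1, l2):
    """Postings of l2 that sit one word after some posting of l1."""
    wanted = {(d, w + 1) for d, w in l1}
    return [p for p in l2 if p in wanted]

def phrase(terms, index):
    """Fold adjacency over the terms of the phrase, left to right."""
    result = index.get(terms[0], [])
    for term in terms[1:]:
        result = adjacent(result, index.get(term, []))
    return result

index = {
    "of": [(1, 2)],
    "otago": [(1, 3), (2, 1)],
    "university": [(1, 1), (2, 2)],
}

long_phrase = phrase(["university", "of", "otago"], index)
short_phrase = phrase(["otago", "university"], index)
```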

10 Efficient Phrase Searching
How do we reduce the computation? Count w_n from the start of the collection, and insert a gap between documents (e.g. 100 words) to prevent finding phrases across document boundaries.
Example documents: Doc1: University of Otago; Doc2: Otago University.
Postings become:
Of <1,2>
Otago <1,3><2,104>
University <1,1><2,105>

11 Efficient Phrase Searching
There is no longer any need to compare d_n, as word numbers are no longer re-used: if L2.w = L1.w + 1 then the terms must be adjacent.
Postings become <w_1><w_2>...<w_n>:
Of <2>
Otago <3><104>
University <1><105>
Then convert into document IDs by merging with a boundary list: <3><105>
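A sketch of collection-wide word numbering with a gap of 100 between documents, reproducing the example postings. It assumes the boundary list holds the last word position of each document, and it uses a binary search in place of the merge to map a position back to a document ID.

```python
from bisect import bisect_left

GAP = 100
docs = ["University of Otago", "Otago University"]

def build(docs, gap=GAP):
    """Number words from the start of the collection, leaving a gap between documents."""
    index = {}
    boundaries = []                  # last word position of each document
    pos = 0
    for text in docs:
        for word in text.lower().split():
            pos += 1
            index.setdefault(word, []).append(pos)
        boundaries.append(pos)
        pos += gap
    return index, boundaries

def adjacent(l1, l2):
    """Positions in l2 that are exactly one past a position in l1."""
    wanted = {w + 1 for w in l1}
    return [w for w in l2 if w in wanted]

def doc_of(position, boundaries):
    """Map a collection-wide position to a 1-based document ID."""
    return bisect_left(boundaries, position) + 1

index, boundaries = build(docs)
hits = adjacent(index["otago"], index["university"])
hit_docs = [doc_of(w, boundaries) for w in hits]
```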

12 Region Algebra
Storing term positions rather than document numbers is the approach taken by the Wumpus search engine from the University of Waterloo. We'll see a little later on that it can be adapted to semi-structured documents with little extra effort. This branch of research is known as region algebra.

13 Structured Documents
Not all documents are flowing text (most contain some form of structure). Examples: card catalogues; EndNote, Reference Manager, ProCite; the MEDLARS two-letter format (example from PubMed). Today, XML or JSON are the preferred formats.

14 Structured Documents
ID
AU - Gritzalis D
AU - Kokolakis S
DP
TI - Security policy development for Healthcare Information Systems.
TA - Stud Health Technol Inform
VI - 96
PG
AB - In this paper the issue of security policy development for health information systems is addressed. Security policy development involves the definition of the policy content, the analysis of the social, organisational, and technical contexts, as well as the organisation of the policy development process. We present the structure of security policies, analyse the characteristics of the HIS context, and analyse the different categories of methodologies, which can be used towards this end.

15 Structured Documents
<PubmedArticle>
  <MedlineCitation Owner="NLM" Status="Completed">
    <PMID> </PMID>
    <MedlineJournalInfo>
      <MedlineTA>Stud Health Technol Inform</MedlineTA>
    </MedlineJournalInfo>
    <Article>
      <Journal>
        <JournalIssue PrintYN="Y">
          <Volume>96</Volume>
          <PubDate><Year>2003</Year></PubDate>
        </JournalIssue>
      </Journal>
      <ArticleTitle>Security policy development for Healthcare Information Systems.</ArticleTitle>
      <Pagination><MedlinePgn>105-10</MedlinePgn></Pagination>
      <Abstract>
        <AbstractText>In this paper the issue of security policy development for health information systems is addressed. Security policy development involves the definition of the policy content, the analysis of the social, organisational, and technical contexts, as well as the organisation of the policy development process. We present the structure of security policies, analyse the characteristics of the HIS context, and analyse the different categories of methodologies, which can be used towards this end.</AbstractText>
      </Abstract>
      <AuthorList CompleteYN="Y">
        <Author>
          <LastName>Gritzalis</LastName><Initials>D</Initials>
        </Author>
        <Author>
          <LastName>Kokolakis</LastName><Initials>S</Initials>
        </Author>
      </AuthorList>
    </Article>
  </MedlineCitation>
</PubmedArticle>

16 Structured Documents
The document could be stored in a relational database (e.g. Oracle).

ARTID | JID | ABID | Date | Vol | Page | TITLE
      |     |      |      |     |      | Security policy development for Healthcare Information Systems.

ABID | Abstract
1 | In this paper the issue of security policy development for health information systems is addressed. Security policy development involves the definition of the policy content, the analysis of the social

JID | Journal
1 | Stud Health Technol Inform

ARTID | AID

AID | Surname | Initial
1 | Gritzalis | d
2 | Kokolakis | s

17 Properties of Structured Documents
Information is in structures (fields). Not all structures need to appear in every document. Structures are flat (even if they appear to be hierarchical). They can easily be kept in a database, and many structured documents originate in databases: when converting to XML, the relational references get dropped and the entities get strung together to form documents. New questions can be asked of this structured information.

18 Structured Documents
What has Dr Brown written? Compare the precision of "Brown as author" to that of "document contains Brown". Compare the recall of "Brown as author" to that of "document contains Brown". If documents are marked up correctly, structured information retrieval should increase precision while not forfeiting recall.

19 Structured Searching
We can't ask relational questions, such as: who publishes with Dr Brown? We can only ask IR questions, such as: what documents contain author Brown? Examples: What has been published in SIGIR? What was published in 2009? What cited Dr Brown in the 2009 SIGIR?

20 Metadata
Metadata is information known about the document that is not part of the document. In HTML, this includes the web-page URL and, often, the text of anchors pointing to the page.

21 Metadata
How does Google know which pages come from New Zealand when this (often) isn't in the page itself? Google allows a structured search on the metadata in combination with the document contents, e.g. otago site:nz

22 Metadata
Metadata is often structured. It can be thought of as part of the document, only you can't see it. In HTML, <meta> tags are used; the contents of the <meta> tags are not displayed. If the metadata can be constructed and linked to the document at index time, it can be indexed along with the document (converted into structured data). Either add the metadata to the document, or index it as terms in the document without adding it to the document (i.e. don't make the document longer). eBay (and others) do a substantial amount of this.

23 Metadata
If it's possible to index virtual metadata, it's possible to index virtual documents, e.g. select * from table where author=brown. Index each row the database returns as a document, then discard the row. The document's id should somehow be linked back into the database (the rowid?).
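One way to picture virtual documents is with an in-memory SQLite table: each returned row is indexed as a document and only its rowid is kept as the link back to the database. The table name, columns and rows below are hypothetical, chosen only for the illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?)",
    [
        ("brown", "Real-time Process Control"),
        ("smith", "Green Eggs"),
        ("brown", "Computer Controlled Processing"),
    ],
)

# Index each returned row as a virtual document keyed by its rowid,
# then let the row itself go.
index = defaultdict(set)
query = "SELECT rowid, title FROM papers WHERE author = ?"
for rowid, title in conn.execute(query, ("brown",)):
    for word in title.lower().split():
        index[word].add(rowid)
```

A hit in `index` can then be resolved by fetching the row with that rowid from the database.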

24 Searching Structured Documents
In a structured document, all the data is in non-overlapping structures. Earlier example:
AU - Gritzalis D
AU - Kokolakis S
DP
TI - Security policy development for Healthcare Information Systems.
TA - Stud Health Technol Inform
VI - 96
PG
Unique words by structure (stopping numbers):
AU: d gritzalis kokolakis s
TA: health inform stud technol
TI: development for healthcare information policy security systems
Build an inverted file index for each unique structure.

25 Searching Structured Documents
AU dictionary: d gritzalis kokolakis s. AU postings: <1,1>
TA dictionary: health inform stud technol. TA postings: <1,1>
TI dictionary: development for healthcare information policy security systems. TI postings: <1,1>

26 Searching Structured Documents
Each structure is in its own index. Given the query "Gritzalis as author" (Gritzalis:AU): determine which inverted index to use and load the postings from there. From the AU dictionary, binary search for the word Gritzalis; if found, load the postings; process the postings as usual. Given the query "security in title or abstract" (security:ti or security:ab): first search the TI index, then the AB index.
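A rough sketch of one inverted index per structure, built from the AU, TA and TI fields of the example record. The tokeniser, the in-memory dictionaries (in place of on-disk dictionaries with binary search) and the `term:FIELD` query parser are simplifications assumed for this example.

```python
from collections import Counter, defaultdict

RECORDS = {
    1: {
        "AU": "Gritzalis D Kokolakis S",
        "TA": "Stud Health Technol Inform",
        "TI": "Security policy development for Healthcare Information Systems.",
    }
}

def tokens(text):
    """Lower-case words with trailing punctuation stripped and numbers stopped."""
    words = (w.strip(".,").lower() for w in text.split())
    return [w for w in words if w and not w.isdigit()]

def build(records):
    """One inverted index (term -> [(doc, tf)]) per field."""
    indexes = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(records):
        for field, text in records[doc_id].items():
            for term, tf in Counter(tokens(text)).items():
                indexes[field][term].append((doc_id, tf))
    return indexes

INDEXES = build(RECORDS)

def search(query):
    """Answer a 'term:FIELD' query from the index of that field only."""
    term, _, field = query.partition(":")
    field = field.upper()
    if field not in INDEXES:
        return []
    return INDEXES[field].get(term.lower(), [])
```

A query such as "security in title or abstract" would call `search` once per field and combine the two postings lists.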

27 Searching Structured
The cost of searching one inverted index is two disk seeks and two disk reads. The cost of searching every structure of a document with 9 structures is therefore 18 seeks and 18 reads. This is too slow!

28 Solution: Searching Structured
Users usually perform simple searches without structural constraints, so we build a global index too:
TI dictionary (with TI postings): development for healthcare information policy security systems
TA dictionary (with TA postings): health inform stud technol
AU dictionary (with AU postings): d gritzalis kokolakis s
ALL dictionary (with ALL postings): d development for gritzalis health healthcare inform information kokolakis policy s security stud systems technol

29 eBay
The eBay search engine sometimes does this, but merges the vocabularies by prefixing each term with its zone. For efficiency it includes a default zone.
Vocabulary (each term with its postings):
:d :development :for :gritzalis :health :healthcare :inform :information :kokolakis :policy :s :security :stud :systems :technol
AU:d AU:gritzalis AU:kokolakis AU:s
TA:health TA:inform TA:stud TA:technol
TI:development TI:for TI:healthcare TI:information TI:policy TI:security TI:systems
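The merged, zone-prefixed vocabulary can be sketched with a single dictionary in which every word is stored twice: once under the empty default zone and once under its own zone. This illustrates the idea only and is not a description of eBay's implementation; the function names and the `term:ZONE` query syntax are assumptions.

```python
ZONES = {
    "AU": ["d", "gritzalis", "kokolakis", "s"],
    "TA": ["health", "inform", "stud", "technol"],
    "TI": ["development", "for", "healthcare", "information",
           "policy", "security", "systems"],
}

def build_vocabulary(zones, doc_id=1):
    """Single vocabulary: ':word' for the default zone, 'ZONE:word' otherwise."""
    postings = {}
    for zone, words in zones.items():
        for word in words:
            for key in (":" + word, zone + ":" + word):
                entry = postings.setdefault(key, [])
                if doc_id not in entry:
                    entry.append(doc_id)
    return postings

def lookup(postings, query):
    """'security' searches the default zone; 'security:TI' searches zone TI."""
    term, _, zone = query.partition(":")
    return postings.get(zone.upper() + ":" + term.lower(), [])

VOCAB = build_vocabulary(ZONES)
```

With this layout an unconstrained query and a zone-constrained query both cost a single dictionary lookup.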

30 Semi-Structured
Semi-structured data is not flat-structured. Compare
<title>Evolution of the second beta-galactosidase of Escherichia coli</title>
to
<title>Evolution of the second beta-galactosidase of <species>Escherichia coli</species></title>
Does "Escherichia coli" lie in the <title> structure or the <species> structure? In a semi-structured document it is in <title>, it is in <species>, and it is in <species> in <title>. Semi-structured formats include SGML, XML, HTML, and so on.

31 Semi-Structured
There are many different kinds of queries. Find documents containing: the given term; the given path; the given term in a given tag; the given term in a partially specified path; the given term in a fully specified path. There are also new complications in phrase search:
<title><species>E. coli</species> inquiry calls for stricter laws</title>
Find "coli inquiry" crossing tag boundaries.

32 Semi-Structured Documents
What has Dr Brown authored? What cites Dr Brown? In what papers does Dr Brown self cite?
<article>
  <tig>
    <au> <fnm>J.M.</fnm> <snm>Brown</snm> </au>
    <atl>Real-time Process Control</atl>
    <ti>Annal Hist Comput</ti>
    <obi> <volno>117</volno> <issno>1</issno> </obi>
    <pp>3-3</pp>
  </tig>
  <bb>
    <au> <fnm>J.M.</fnm> <snm>Brown</snm> </au>
    <atl>Computer Controlled Processing</atl>
    <ti>Chem Eng Prog</ti>
    <obi> <volno>156</volno> <issno>5</issno> </obi>
    <pp>63-67</pp>
  </bb>
</article>

33 Region Algebra: Series of Filters
The document collection is divided into a series of contiguous extents, one per tag (the extents are chosen by the indexer). Each extent is represented as (start, end). Each series of extents is a list: [(start, end), ...]
[Figure: extent lists for <collection>, <document>, <section> and <title> over the collection below]
<collection>
  <document>
    <section> <title> </title> </section>
    <section> </section>
    <section> </section>
  </document>
  <document>
    <title> </title>
    <section> </section>
    <section> </section>
  </document>
</collection>

34 Series of Filters
Structures are represented as contiguous extents: [(start, end), (start, end), ...], e.g. <title>. Terms are represented as contiguous extents: [(start, start + 1), ...], e.g. coli. Phrases are represented as contiguous extents: [(start, start + len), ...], e.g. local enquiry. All operations on the collection can now be expressed as a series of filters.

35 Series of Filters
Find coli in <title>: load the extent list for coli; load the extent list for <title>; filter coli by <title>.
[Figure: the extent lists for coli and <title>, and the resulting ANSWER list]

36 Series of Filters
Find coli in <title> in <section>: as above, filter coli by <title> to get coli in <title>; now filter the result by <section>.
[Figure: the extent lists for coli in <title> and <section>, and the resulting ANSWER list]

37 Series of Filters
Problem: <a> in <b> or <b> in <a>? "coli in <title> in <section>" is not the same as "coli in <section> in <title>", but the series of filters gives the same result for both!
[Figure: coli filtered by <title> then <section>, and coli filtered by <section> then <title>, both producing the same ANSWER list]
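The filtering step, and the order problem, can be sketched with plain lists of (start, end) extents. All positions below are invented for illustration, and the quadratic containment test stands in for a proper merge.

```python
def contained_in(inner, outer):
    """Keep the inner extents that lie wholly inside some outer extent."""
    return [
        (s, e)
        for s, e in inner
        if any(o_start <= s and e <= o_end for o_start, o_end in outer)
    ]

# Hypothetical extent lists.
coli = [(5, 6), (20, 21), (40, 41)]      # term extents: (start, start + 1)
title = [(3, 8), (38, 45)]
section = [(2, 15), (18, 30)]

coli_in_title = contained_in(coli, title)

# "coli in <title> in <section>" as a series of filters ...
title_then_section = contained_in(coli_in_title, section)
# ... and "coli in <section> in <title>" with the filters swapped.
section_then_title = contained_in(contained_in(coli, section), title)
```

Both orders yield the same extents because each filter only checks that the term lies inside the tag, never how the two tags nest.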

38 Series of Filters
Problem: self-containing structures, e.g. <b>the <b>brown</b> cows</b>. Where are the beginning and end of the <b> extent? <b> <b> = (start, start, end, end). How is the query "brown in <b> in <b>" resolved?
Problem: frequent structures. Some structures are very common (e.g. <p> tags). These tags can be more common than the most common words, and the index can become clogged with their extents.

39 Series of Filters
These problems have been addressed by keeping not only the extent for a tag but also its depth. Unfortunately, this increases the size of the postings lists even further and adds to the computational cost of calculating the results.

40 eBay
The eBay search engine sometimes does this. Terms are stored with term frequencies and word positions:
Lectures <d,tf,w,w,w>,<d,tf,w,w>, ...
Zone markers are stored as (start, length) rather than (start, end):
zone_markers:title <d,tf,s,l,s,l>,<d,tf,s,l>, ...
If the d's match and the w's are between s and s+l then d is a matching document.
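A sketch of matching term positions against (start, length) zone markers. The postings values are made up, and treating the zone as the half-open range from s up to but not including s+l is an assumption; the slide does not say whether the ends are inclusive.

```python
def matching_documents(term_postings, zone_postings):
    """Documents where some term position falls inside some zone extent."""
    zones = {}
    for d, _tf, *start_lengths in zone_postings:
        # Pair up the flattened s, l, s, l, ... values.
        zones[d] = list(zip(start_lengths[0::2], start_lengths[1::2]))
    hits = []
    for d, _tf, *positions in term_postings:
        if any(s <= w < s + l for w in positions for s, l in zones.get(d, [])):
            hits.append(d)
    return hits

# Hypothetical postings: <d, tf, w, w, ...> and <d, tf, s, l, ...>.
lectures = [(1, 3, 2, 9, 15), (2, 2, 4, 30)]
title_zone = [(1, 1, 1, 5), (2, 1, 10, 3)]

docs_with_lectures_in_title = matching_documents(lectures, title_zone)
```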

41 Embedded Paths
In place of each <d_n, f_n> in the postings, additionally embed the location in the document: <d_n, p_n, f_n>. Many variants exist.
Example documents:
<doc> <sec> <p>Fox in Sox</p> </sec> </doc>
<doc> <sec> <p>The Cat in the Hat</p> </sec> <sec> <p>Comes Back</p> </sec> </doc>
<doc> <sec> <p>Green Eggs</p> <p>and Ham</p> </sec> </doc>
[Figure: the collection tree, with nodes doc:1, sec:2, sec:4, p:3, p:6, p:5]
Example postings:
green: <3,3,1>
ham: <3,6,1>
hat: <2,3,1>
in: <1,3,1><2,3,1>

42 Searching Embedded Paths
Determine which nodes of the tree are relevant and select postings from only those positions. Example: fox in <p>. <p> is structures (3, 5, 6), so load the postings for fox and use only those where p_n = 3, 5 or 6.
[Figure: the collection tree, with nodes doc:1, sec:2, sec:4, p:3, p:6, p:5]
Self-containing structures: as the entire structure is represented, self-containing structures are supported.
Frequent structures: each structure is represented once and each posting is a fixed size.
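The example can be reproduced with a short sketch that numbers the nodes of a collection-wide tree in order of first appearance and stores that number in each posting. Keying nodes by their positional path (for instance the second <p> of the first <sec>), reading only the text directly inside an element, and the helper names are all assumptions of this sketch.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

DOCS = [
    "<doc><sec><p>Fox in Sox</p></sec></doc>",
    "<doc><sec><p>The Cat in the Hat</p></sec><sec><p>Comes Back</p></sec></doc>",
    "<doc><sec><p>Green Eggs</p><p>and Ham</p></sec></doc>",
]

node_ids = {}                  # positional path -> node number (shared tree)
node_tags = {}                 # node number -> tag name
postings = defaultdict(list)   # term -> [(d, p, f), ...]

def node_id(path, tag):
    """Number a node the first time its path is seen anywhere in the collection."""
    if path not in node_ids:
        node_ids[path] = len(node_ids) + 1
        node_tags[node_ids[path]] = tag
    return node_ids[path]

def walk(elem, path, doc_id):
    p = node_id(path, elem.tag)
    for term, f in Counter((elem.text or "").lower().split()).items():
        postings[term].append((doc_id, p, f))
    seen = Counter()
    for child in elem:
        seen[child.tag] += 1
        walk(child, path + ((child.tag, seen[child.tag]),), doc_id)

for doc_id, source in enumerate(DOCS, start=1):
    root = ET.fromstring(source)
    walk(root, ((root.tag, 1),), doc_id)

def search(term, tag):
    """Postings for the term restricted to nodes with the given tag."""
    wanted = {n for n, t in node_tags.items() if t == tag}
    return [(d, p, f) for d, p, f in postings.get(term, []) if p in wanted]

p_nodes = {n for n, t in node_tags.items() if t == "p"}
fox_in_p = search("fox", "p")
```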

43 Embedded Paths
Additionally, <d_n, p_n, f_n> can be converted into <d_n, f_n> by collecting all occurrences of the term where d_n is the same.
Ranking: because postings can be converted into <d_n, f_n>, we can rank whole documents using any of the standard ranking functions.
Various optimizations exist to store the embedded paths for fast processing:
term <p_n> <d_pn, f_pn><d_pn, f_pn><d_pn, f_pn><d_pn, f_pn> <p_n> <d_pn, f_pn>
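Collapsing path postings into document-level postings is a sum per document. A minimal sketch with a made-up postings list:

```python
from collections import defaultdict

def collapse(path_postings):
    """Sum the frequencies of [(d, p, f), ...] per document, giving [(d, f), ...]."""
    totals = defaultdict(int)
    for d, _p, f in path_postings:
        totals[d] += f
    return sorted(totals.items())

# Hypothetical term found in one node of document 1 and two nodes of document 2.
document_postings = collapse([(1, 3, 1), (2, 3, 1), (2, 5, 2)])
```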

44 Embedded Paths
The problem with the embedded path approach is that the whole tree for the entire collection must be kept in memory. This does not scale for semi-structured free-text data such as HTML. One solution is to trim the tree by ignoring tags that are: smaller than some length (say, 50 words); pre-determined to be unlikely to be useful (<i>, <b>, <firstname>, etc.).

45 Dewey Decimal Codes
Originally from libraries, Dewey is a hierarchical classification scheme. The tree from a single document can be labelled using a similar scheme: the root is 1, its children are 1:1, 1:2, etc., and their children are 1:1:1, 1:1:2, etc. These codes are stored directly in the postings, <d_n, p_n, f_n>, where p_n is the Dewey code. From p_n it is possible to know: which postings are for the same node (p_n1 = p_n2); the child / parent relationship between postings.
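The relationships that can be read off a Dewey code can be sketched with colon-separated strings. The helper names are hypothetical; the comparison is done component by component so that 1:1 is not mistaken for an ancestor of 1:10.

```python
def parent(code):
    """Dewey code of the parent node, or None for the root."""
    head, sep, _last = code.rpartition(":")
    return head if sep else None

def is_ancestor(a, b):
    """True if node a is a proper ancestor of node b."""
    pa, pb = a.split(":"), b.split(":")
    return len(pa) < len(pb) and pb[:len(pa)] == pa

def same_node(a, b):
    """Two postings refer to the same node when their codes are equal."""
    return a == b
```

For example, `parent("1:2:1")` is `"1:2"` and `is_ancestor("1:1", "1:1:2")` holds, while `is_ancestor("1:1", "1:10:2")` does not.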

46 Dewey Decimal Codes
The problem with the Dewey encoding of postings is that the Dewey codes can get very long. They can get so long that more space is spent storing the codes than storing the postings.

47 Structured IR
It is an unfortunate state of affairs that the three best systems for semi-structured IR all have scalability problems. The best solution to the problem is not known. One possible hack is to search for documents first and then apply a post-filter to each document. This post-filter can result in a linear search over the entire collection (remember signature files?).

48 Summary
Structured information retrieval: structured documents; meta-data; semi-structured documents. Methods of searching: the best method is not currently known.


Text Technologies for Data Science INFR Indexing (2) Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Indexing (2) Instructor: Walid Magdy 03-Oct-2018 Lecture Objectives Learn more about indexing: Structured documents Extent index Index compression Data structure

More information

Text Technologies for Data Science INFR Indexing (2) Instructor: Walid Magdy

Text Technologies for Data Science INFR Indexing (2) Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Indexing (2) Instructor: Walid Magdy 10-Oct-2017 Lecture Objectives Learn more about indexing: Structured documents Extent index Index compression Data structure

More information

60-538: Information Retrieval

60-538: Information Retrieval 60-538: Information Retrieval September 7, 2017 1 / 48 Outline 1 what is IR 2 3 2 / 48 Outline 1 what is IR 2 3 3 / 48 IR not long time ago 4 / 48 5 / 48 now IR is mostly about search engines there are

More information

CIS192 Python Programming

CIS192 Python Programming CIS192 Python Programming HTTP & HTML & JSON Harry Smith University of Pennsylvania November 1, 2017 Harry Smith (University of Pennsylvania) CIS 192 Lecture 10 November 1, 2017 1 / 22 Outline 1 HTTP Requests

More information

Chapter 4. Processing Text

Chapter 4. Processing Text Chapter 4 Processing Text Processing Text Modifying/Converting documents to index terms Convert the many forms of words into more consistent index terms that represent the content of a document What are

More information

Overview. Lecture 3: Index Representation and Tolerant Retrieval. Type/token distinction. IR System components

Overview. Lecture 3: Index Representation and Tolerant Retrieval. Type/token distinction. IR System components Overview Lecture 3: Index Representation and Tolerant Retrieval Information Retrieval Computer Science Tripos Part II Ronan Cummins 1 Natural Language and Information Processing (NLIP) Group 1 Recap 2

More information

Index Compression. David Kauchak cs160 Fall 2009 adapted from:

Index Compression. David Kauchak cs160 Fall 2009 adapted from: Index Compression David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture5-indexcompression.ppt Administrative Homework 2 Assignment 1 Assignment 2 Pair programming?

More information

Relational Approach. Problem Definition

Relational Approach. Problem Definition Relational Approach (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Slides are mostly based on Information Retrieval Algorithms and Heuristics, Grossman & Frieder 1 Problem Definition Three conceptual

More information

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 7: Scores in a Complete Search System Paul Ginsparg Cornell University, Ithaca,

More information

Trees. Courtesy to Goodrich, Tamassia and Olga Veksler

Trees. Courtesy to Goodrich, Tamassia and Olga Veksler Lecture 12: BT Trees Courtesy to Goodrich, Tamassia and Olga Veksler Instructor: Yuzhen Xie Outline B-tree Special case of multiway search trees used when data must be stored on the disk, i.e. too large

More information

Identifying and Ranking Relevant Document Elements

Identifying and Ranking Relevant Document Elements Identifying and Ranking Relevant Document Elements Andrew Trotman and Richard A. O Keefe Department of Computer Science University of Otago Dunedin, New Zealand andrew@cs.otago.ac.nz, ok@otago.ac.nz ABSTRACT

More information

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system. Introduction to Information Retrieval Ethan Phelps-Goodman Some slides taken from http://www.cs.utexas.edu/users/mooney/ir-course/ Information Retrieval (IR) The indexing and retrieval of textual documents.

More information

Horn Formulae. CS124 Course Notes 8 Spring 2018

Horn Formulae. CS124 Course Notes 8 Spring 2018 CS124 Course Notes 8 Spring 2018 In today s lecture we will be looking a bit more closely at the Greedy approach to designing algorithms. As we will see, sometimes it works, and sometimes even when it

More information

B-Trees. Version of October 2, B-Trees Version of October 2, / 22

B-Trees. Version of October 2, B-Trees Version of October 2, / 22 B-Trees Version of October 2, 2014 B-Trees Version of October 2, 2014 1 / 22 Motivation An AVL tree can be an excellent data structure for implementing dictionary search, insertion and deletion Each operation

More information

Information Retrieval

Information Retrieval Information Retrieval Suan Lee - Information Retrieval - 04 Index Construction 1 04 Index Construction - Information Retrieval - 04 Index Construction 2 Plan Last lecture: Dictionary data structures Tolerant

More information

The Anatomy of a Large-Scale Hypertextual Web Search Engine

The Anatomy of a Large-Scale Hypertextual Web Search Engine The Anatomy of a Large-Scale Hypertextual Web Search Engine Article by: Larry Page and Sergey Brin Computer Networks 30(1-7):107-117, 1998 1 1. Introduction The authors: Lawrence Page, Sergey Brin started

More information

EECS 395/495 Lecture 3 Scalable Indexing, Searching, and Crawling

EECS 395/495 Lecture 3 Scalable Indexing, Searching, and Crawling EECS 395/495 Lecture 3 Scalable Indexing, Searching, and Crawling Doug Downey Based partially on slides by Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze Announcements Project progress report

More information

Administrative. Distributed indexing. Index Compression! What I did last summer lunch talks today. Master. Tasks

Administrative. Distributed indexing. Index Compression! What I did last summer lunch talks today. Master. Tasks Administrative Index Compression! n Assignment 1? n Homework 2 out n What I did last summer lunch talks today David Kauchak cs458 Fall 2012 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture5-indexcompression.ppt

More information

Recap of the previous lecture. This lecture. A naïve dictionary. Introduction to Information Retrieval. Dictionary data structures Tolerant retrieval

Recap of the previous lecture. This lecture. A naïve dictionary. Introduction to Information Retrieval. Dictionary data structures Tolerant retrieval Ch. 2 Recap of the previous lecture Introduction to Information Retrieval Lecture 3: Dictionaries and tolerant retrieval The type/token distinction Terms are normalized types put in the dictionary Tokenization

More information

Creating SQL Tables and using Data Types

Creating SQL Tables and using Data Types Creating SQL Tables and using Data Types Aims: To learn how to create tables in Oracle SQL, and how to use Oracle SQL data types in the creation of these tables. Outline of Session: Given a simple database

More information

Searching with Tags: Do Tags Help Users Find Things?

Searching with Tags: Do Tags Help Users Find Things? Margaret Kipp College of Information and Computer Science Long Island University Searching with Tags: Do Tags Help Users Find Things? margaret.kipp@gmail.com - http://myweb.liu.edu/~mkipp/

More information

Web Search Ranking. (COSC 488) Nazli Goharian Evaluation of Web Search Engines: High Precision Search

Web Search Ranking. (COSC 488) Nazli Goharian Evaluation of Web Search Engines: High Precision Search Web Search Ranking (COSC 488) Nazli Goharian nazli@cs.georgetown.edu 1 Evaluation of Web Search Engines: High Precision Search Traditional IR systems are evaluated based on precision and recall. Web search

More information

CSCI S-Q Lecture #12 7/29/98 Data Structures and I/O

CSCI S-Q Lecture #12 7/29/98 Data Structures and I/O CSCI S-Q Lecture #12 7/29/98 Data Structures and I/O Introduction The WRITE and READ ADT Operations Case Studies: Arrays Strings Binary Trees Binary Search Trees Unordered Search Trees Page 1 Introduction

More information

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11 DATABASE PERFORMANCE AND INDEXES CS121: Relational Databases Fall 2017 Lecture 11 Database Performance 2 Many situations where query performance needs to be improved e.g. as data size grows, query performance

More information

Financial Tracking Service Tutorial: Using Data Search

Financial Tracking Service Tutorial: Using Data Search Tutorial: 2 of 27 TABLE OF CONTENTS What is data Search? 3 1.Search Page 4 2. Search Options 5 3. Year(s) 6 4. Search Terms 7 5. Get the Data 10 6. Viewing the Data Results 11 7. Flows (the data) 12 8.

More information

June 15, Abstract. 2. Methodology and Considerations. 1. Introduction

June 15, Abstract. 2. Methodology and Considerations. 1. Introduction Organizing Internet Bookmarks using Latent Semantic Analysis and Intelligent Icons Note: This file is a homework produced by two students for UCR CS235, Spring 06. In order to fully appreacate it, it may

More information

CHAPTER 3 LITERATURE REVIEW

CHAPTER 3 LITERATURE REVIEW 20 CHAPTER 3 LITERATURE REVIEW This chapter presents query processing with XML documents, indexing techniques and current algorithms for generating labels. Here, each labeling algorithm and its limitations

More information

Tree Parsing. $Revision: 1.4 $

Tree Parsing. $Revision: 1.4 $ Tree Parsing $Revision: 1.4 $ Compiler Tools Group Department of Electrical and Computer Engineering University of Colorado Boulder, CO, USA 80309-0425 i Table of Contents 1 The Tree To Be Parsed.........................

More information

CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL

CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LVII, Number 4, 2012 CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL IOAN BADARINZA AND ADRIAN STERCA Abstract. In this paper

More information

Text Properties and Languages

Text Properties and Languages Text Properties and Languages 1 Statistical Properties of Text How is the frequency of different words distributed? How fast does vocabulary size grow with the size of a corpus? Such factors affect the

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Information Retrieval and Web Search Lecture 1: Introduction and Boolean retrieval Outline ❶ Course details ❷ Information retrieval ❸ Boolean retrieval 2 Course details

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Murhaf Fares & Stephan Oepen Language Technology Group (LTG) September 27, 2017 Today 2 Recap Evaluation of classifiers Unsupervised

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Information Retrieval. Chap 7. Text Operations

Information Retrieval. Chap 7. Text Operations Information Retrieval Chap 7. Text Operations The Retrieval Process user need User Interface 4, 10 Text Text logical view Text Operations logical view 6, 7 user feedback Query Operations query Indexing

More information

An Effective and Efficient Approach for Keyword-Based XML Retrieval. Xiaoguang Li, Jian Gong, Daling Wang, and Ge Yu retold by Daryna Bronnykova

An Effective and Efficient Approach for Keyword-Based XML Retrieval. Xiaoguang Li, Jian Gong, Daling Wang, and Ge Yu retold by Daryna Bronnykova An Effective and Efficient Approach for Keyword-Based XML Retrieval Xiaoguang Li, Jian Gong, Daling Wang, and Ge Yu retold by Daryna Bronnykova Search on XML documents 2 Why not use google? Why are traditional

More information

Data Structures and Methods. Johan Bollen Old Dominion University Department of Computer Science

Data Structures and Methods. Johan Bollen Old Dominion University Department of Computer Science Data Structures and Methods Johan Bollen Old Dominion University Department of Computer Science jbollen@cs.odu.edu http://www.cs.odu.edu/ jbollen January 20, 2004 Page 1 Lecture Objectives 1. To this point:

More information

3-2. Index construction. Most slides were adapted from Stanford CS 276 course and University of Munich IR course.

3-2. Index construction. Most slides were adapted from Stanford CS 276 course and University of Munich IR course. 3-2. Index construction Most slides were adapted from Stanford CS 276 course and University of Munich IR course. 1 Ch. 4 Index construction How do we construct an index? What strategies can we use with

More information

Ges$one Avanzata dell Informazione Part A Full- Text Informa$on Management. Full- Text Indexing

Ges$one Avanzata dell Informazione Part A Full- Text Informa$on Management. Full- Text Indexing Ges$one Avanzata dell Informazione Part A Full- Text Informa$on Management Full- Text Indexing Contents } Introduction } Inverted Indices } Construction } Searching 2 GAvI - Full- Text Informa$on Management:

More information

CIS 45, The Introduction. What is a database? What is data? What is information?

CIS 45, The Introduction. What is a database? What is data? What is information? CIS 45, The Introduction I have traveled the length and breadth of this country and talked with the best people, and I can assure you that data processing is a fad that won t last out the year. The editor

More information

Full-Text Indexing For Heritrix

Full-Text Indexing For Heritrix Full-Text Indexing For Heritrix Project Advisor: Dr. Chris Pollett Committee Members: Dr. Mark Stamp Dr. Jeffrey Smith Darshan Karia CS298 Master s Project Writing 1 2 Agenda Introduction Heritrix Design

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Information Retrieval

Information Retrieval Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have

More information

Data warehouse architecture consists of the following interconnected layers:

Data warehouse architecture consists of the following interconnected layers: Architecture, in the Data warehousing world, is the concept and design of the data base and technologies that are used to load the data. A good architecture will enable scalability, high performance and

More information

CPS352 Lecture - Indexing

CPS352 Lecture - Indexing Objectives: CPS352 Lecture - Indexing Last revised 2/25/2019 1. To explain motivations and conflicting goals for indexing 2. To explain different types of indexes (ordered versus hashed; clustering versus

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation

More information

ISSUES IN INFORMATION RETRIEVAL Brian Vickery. Presentation at ISKO meeting on June 26, 2008 At University College, London

ISSUES IN INFORMATION RETRIEVAL Brian Vickery. Presentation at ISKO meeting on June 26, 2008 At University College, London ISSUES IN INFORMATION RETRIEVAL Brian Vickery Presentation at ISKO meeting on June 26, 2008 At University College, London NEEDLE IN HAYSTACK MY BACKGROUND Plant chemist, then reports librarian Librarian,

More information

Relevance of a Document to a Query

Relevance of a Document to a Query Relevance of a Document to a Query Computing the relevance of a document to a query has four parts: 1. Computing the significance of a word within document D. 2. Computing the significance of word to document

More information

Digital Arithmetic. Digital Arithmetic: Operations and Circuits Dr. Farahmand

Digital Arithmetic. Digital Arithmetic: Operations and Circuits Dr. Farahmand Digital Arithmetic Digital Arithmetic: Operations and Circuits Dr. Farahmand Binary Arithmetic Digital circuits are frequently used for arithmetic operations Fundamental arithmetic operations on binary

More information

LOMGen: A Learning Object Metadata Generator Applied to Computer Science Terminology

LOMGen: A Learning Object Metadata Generator Applied to Computer Science Terminology LOMGen: A Learning Object Metadata Generator Applied to Computer Science Terminology A. Singh, H. Boley, V.C. Bhavsar National Research Council and University of New Brunswick Learning Objects Summit Fredericton,

More information

Representing Data Elements

Representing Data Elements Representing Data Elements Week 10 and 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 18.3.2002 by Hector Garcia-Molina, Vera Goebel INF3100/INF4100 Database Systems Page

More information

Oracle BI 11g R1: Build Repositories Course OR102; 5 Days, Instructor-led

Oracle BI 11g R1: Build Repositories Course OR102; 5 Days, Instructor-led Oracle BI 11g R1: Build Repositories Course OR102; 5 Days, Instructor-led Course Description This Oracle BI 11g R1: Build Repositories training is based on OBI EE release 11.1.1.7. Expert Oracle Instructors

More information

Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Wednesday June 4, 2008 Time:

Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Wednesday June 4, 2008 Time: English Student no:... Page 1 of 14 Contact during the exam: Geir Solskinnsbakk Phone: 735 94218/ 93607988 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Wednesday June 4, 2008 Time:

More information

Part 11: Collaborative Filtering. Francesco Ricci

Part 11: Collaborative Filtering. Francesco Ricci Part : Collaborative Filtering Francesco Ricci Content An example of a Collaborative Filtering system: MovieLens The collaborative filtering method n Similarity of users n Methods for building the rating

More information

Overview. What is system analysis and design? Tools and models Methodologies

Overview. What is system analysis and design? Tools and models Methodologies Overview What is system analysis and design? Tools and models Methodologies Information Systems What is a system? Why do systems fail? What is systems analysis and design? How do we do systems analysis?

More information

Chapter 13 XML: Extensible Markup Language

Chapter 13 XML: Extensible Markup Language Chapter 13 XML: Extensible Markup Language - Internet applications provide Web interfaces to databases (data sources) - Three-tier architecture Client V Application Programs Webserver V Database Server

More information