Advance Indexing. Limock July 3, 2014
- Howard Gordon
Transcription
2 Papers
1) Gurajada, Sairam: "On-line index maintenance using horizontal partitioning." Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM.
2) Fontoura, Marcus, et al.: "Efficiently encoding term co-occurrences in inverted indexes." Proceedings of the 20th ACM International Conference on Information and Knowledge Management. ACM.
3 Why Use On-line Indexing
- Document collections are dynamic.
- Documents must be indexed while query requests are being served.
4 Off-line Indexing
- Documents are read into main memory.
- Split: the documents are split into a list of <term, doc> pairs.
- Merge: pairs with the same term are merged.
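The split and merge steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function name `build_offline_index`, whitespace tokenization, and the sort between the two steps are assumptions made for the example.

```python
from collections import defaultdict

def build_offline_index(docs):
    """Build an inverted index from a {doc_id: text} mapping held in memory."""
    # Split: emit one <term, doc> pair per token occurrence.
    pairs = [(term, doc_id)
             for doc_id, text in docs.items()
             for term in text.lower().split()]
    # Sort so that pairs sharing a term become adjacent (assumed step).
    pairs.sort()
    # Merge: collapse pairs with the same term into one posting list.
    index = defaultdict(list)
    for term, doc_id in pairs:
        postings = index[term]
        if not postings or postings[-1] != doc_id:  # skip repeated docs
            postings.append(doc_id)
    return dict(index)

docs = {1: "new index built", 2: "index merge", 3: "new merge"}
index = build_offline_index(docs)
```

Every document has to be available before the index is built, which is why this approach does not fit a collection that changes while queries are being served.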
5 On-line Indexing Techniques
- Basic approaches
  - Re-build
  - In-place
  - Re-merge
    - Single partition
      - Immediate Merge
    - Multi-partition
      - Vertical partitioning
        - Geometric Partitioning
        - Logarithmic Merge
      - Horizontal partitioning
        - Single-split indexing
          - Single-split Immediate Merge
          - Single-split Multi-partition
        - Multi-split indexing
          - Index Tree
6 Re-build Update
- A new index is built from scratch.
- The existing index is kept for querying.
- The old index is discarded once the new index is available.
- Simple, with high query performance.
- Inefficient as an index maintenance approach.
7 In-place Update
- Postings in the in-memory index are appended to the posting lists of the same terms in the on-disk index.
- Each term's posting list has some overallocated space.
- A posting list is relocated if its overallocated space is exhausted.
- Requires an effective overallocation strategy.
8 Merge-based: Immediate Merge
- Only one on-disk inverted index exists.
- The existing index and the in-memory index are merged, and the result replaces the existing index.
- High query performance, because posting lists are contiguous.
- High index maintenance cost.
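A minimal sketch of the immediate-merge step, using plain Python dictionaries to stand in for the on-disk and in-memory indexes. The function name `immediate_merge` is hypothetical, and the sketch assumes that newly indexed documents always receive larger docids than those already on disk, so appending keeps each posting list sorted.

```python
def immediate_merge(on_disk, in_memory):
    """Return a new index that replaces on_disk, with in_memory merged in.

    Both arguments map term -> sorted list of docids. Docids in in_memory
    are assumed to be larger than any docid already in on_disk.
    """
    merged = {}
    for term in sorted(set(on_disk) | set(in_memory)):
        # One contiguous posting list per term in the new index.
        merged[term] = on_disk.get(term, []) + in_memory.get(term, [])
    return merged

on_disk = {"index": [1, 2], "merge": [2]}
in_memory = {"index": [5], "query": [4]}
new_index = immediate_merge(on_disk, in_memory)
```

The whole on-disk index is rewritten on every merge, even for terms that received no new postings, which is where the high maintenance cost comes from.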
9 Vertical Multi-partition: Geometric Partitioning
- Each partition contains a sub-index.
- Partition 0 is reserved for the in-memory index block.
- Partition k holds an index of size 0, or of a size in the range $[r^{k-1} S,\ (r-1)\, r^{k-1} S]$ postings.
- r is the geometric ratio.
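As a worked example of the size bound on this slide, the helper below computes the allowed range for a non-empty partition k. The name `partition_bounds` is hypothetical, and S is taken here to be the size of one in-memory block in postings, which the slide does not state explicitly.

```python
def partition_bounds(k, r, s):
    """Allowed size range, in postings, of a non-empty partition k >= 1.

    r is the geometric ratio; s is assumed to be the size of one
    in-memory block in postings.
    """
    low = r ** (k - 1) * s
    high = (r - 1) * r ** (k - 1) * s
    return low, high

# With r = 3 and s = 1000, each partition's range is 3 times the previous one.
bounds = [partition_bounds(k, r=3, s=1000) for k in (1, 2, 3)]
```

For r = 3 and S = 1000 the ranges are 1000 to 2000, 3000 to 6000, and 9000 to 18000 postings, so partition sizes grow geometrically with k.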
10 Vertical Multi-partition: Logarithmic Merge
- Every sub-index is given a unique generation number g.
- Sub-indexes of generation g are merged to create a sub-index of generation g+1.
- $\log_k N$ partitions, where N is the number of in-memory blocks created.
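The generation rule can be illustrated with a small simulation. This is a sketch under stated assumptions: it uses pairwise (base-2) merging, so a collision between two sub-indexes of generation g produces one of generation g+1, and the function name `add_block` and the flat posting lists are hypothetical simplifications.

```python
def add_block(partitions, block):
    """Flush one in-memory block into partitions (generation -> postings).

    While a sub-index of the current generation already exists, merge
    with it and promote the result to the next generation.
    """
    generation = 0
    current = list(block)
    while generation in partitions:
        current = sorted(partitions.pop(generation) + current)
        generation += 1
    partitions[generation] = current
    return partitions

partitions = {}
for docid in range(1, 6):       # five in-memory blocks, one posting each
    add_block(partitions, [docid])
```

After five blocks the structure holds two sub-indexes, generation 2 with four postings and generation 0 with one, mirroring the binary representation of 5. Under this base-2 assumption the number of partitions never exceeds floor(log2 N) + 1.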
11 Proposed Work: Horizontal Partitioning
- Split terms into two groups: frequent and infrequent.
- Maintain the frequent-term sub-index with an approach that favors query performance (i.e., active merge).
- Maintain the infrequent-term sub-index with an approach that favors index maintenance performance (i.e., lazy merge).
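The split into frequent and infrequent terms might look like the sketch below. It assumes that term frequency is measured in a query log and that a fixed cut-off separates the two groups; `split_terms`, `query_freq` and `threshold` are hypothetical names, not taken from the paper.

```python
def split_terms(index, query_freq, threshold):
    """Split an index (term -> postings) into two sub-indexes.

    query_freq maps term -> number of occurrences in a query log;
    terms seen at least `threshold` times count as frequent.
    """
    frequent, infrequent = {}, {}
    for term, postings in index.items():
        target = frequent if query_freq.get(term, 0) >= threshold else infrequent
        target[term] = postings
    return frequent, infrequent

index = {"news": [1, 2, 3], "zebra": [2], "map": [1, 3]}
query_freq = {"news": 900, "map": 450, "zebra": 2}
frequent, infrequent = split_terms(index, query_freq, threshold=100)
```

The two sub-indexes can then be maintained with different strategies: the frequent one merged eagerly so its posting lists stay contiguous, the infrequent one merged lazily to keep update cost low.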
12 Single-split Immediate Merge
- Lazy merge for infrequent terms.
- Immediate merge for frequent terms.
- High query performance.
- Poor index maintenance performance.
13 Single-split Multi-partition
- Lazy merge for infrequent terms.
- Active merge for frequent terms.
- Slightly lower query performance.
- Better index maintenance performance.
14 Index Tree
- A binary tree in which each node contains a set of partitions and each partition contains a sub-index.
- Each node has three parameters: P, R and T.
- A sub-index is split into frequent terms (right node) and infrequent terms (left node).
- For the left node: decrement R and increment P.
- For the right node: increment R and decrement P.
15 Index Tree
16 Index Tree: Step-by-Step Construction
17 Index Tree: Algorithm
18 Experiment
- Data source: Wikipedia, 7 million HTML documents.
- Average size per document: 12 KB.
- Term frequencies are computed from a static AOL query log.
- The query log contains over 1.2 million unique query terms.
- The maximum in-memory index size is set to 60 MB.
- 1 GB RAM and a 240 GB hard drive.
- Document deletions are not handled.
- Parameters: P = 3, R = 1, T = 80%-20%.
19 Result: Index and Query Performance
- Index performance comparison
- Query performance comparison
20 Result: Query vs Index Comparison
[Plot: index maintenance performance (0 to 1) against query performance (0 to 1) for Multi-split (Index Tree), Single-split Multi-partition, Geometric Partition and Single-split Immediate.]
21 Papers
1) Gurajada, Sairam: "On-line index maintenance using horizontal partitioning." Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM.
2) Fontoura, Marcus, et al.: "Efficiently encoding term co-occurrences in inverted indexes." Proceedings of the 20th ACM International Conference on Information and Knowledge Management. ACM.
22 General Information
- The cost of query evaluation depends on:
  - the number of query terms
  - the length of their posting lists
- Precomputing common sub-queries helps reduce this cost:
  - multiple query terms are replaced with one composite term
  - the results are stored in the index as regular posting lists
- Constraint: memory.
23 Proposed Work
- Use bitmaps to encode term co-occurrence.
- Encoding k different terms requires k bits per posting in a list with bitmaps.
- Such a list can resolve queries involving any of the $2^k$ combinations of the chosen terms.
- Constraint: posting lists are longer than precomputed lists.
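To make the bitmap idea concrete, the sketch below attaches a k-bit bitmap to every posting of one list and then answers conjunctive queries from that list alone. It is an illustration only: the function names, the in-memory tuples, and the bit order (bit j for the j-th chosen term) are assumptions, not the paper's on-disk layout.

```python
def build_bitmap_list(index, main_term, chosen_terms):
    """Attach a bitmap to each posting of main_term.

    Bit j of a posting's bitmap is set when the document also
    contains chosen_terms[j].
    """
    doc_sets = [set(index.get(term, [])) for term in chosen_terms]
    result = []
    for doc in index[main_term]:
        bits = 0
        for j, docs in enumerate(doc_sets):
            if doc in docs:
                bits |= 1 << j
        result.append((doc, bits))
    return result

def query_with_bitmaps(bitmap_list, chosen_terms, required):
    """Docs of the main list that also contain every term in `required`."""
    mask = 0
    for term in required:
        mask |= 1 << chosen_terms.index(term)
    return [doc for doc, bits in bitmap_list if (bits & mask) == mask]

index = {"new": [1, 2, 3, 4], "york": [2, 4, 5], "city": [3, 4]}
chosen = ["york", "city"]
bitmap_list = build_bitmap_list(index, "new", chosen)
both = query_with_bitmaps(bitmap_list, chosen, ["york", "city"])
```

With k = 2 chosen terms, the single list for "new" answers "new", "new york", "new city" and "new york city" without touching the other posting lists. The price is that the full list for "new" must be scanned, whereas a precomputed list for "new york" would contain only the matching documents.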
24 Evaluation Time Cost Function
- Cost function: $F(q) = |L_1| \sum_{i=2}^{n} G(|L_i|)$
- G is a sub-linear function quantifying the number of accesses to the secondary lists: $G(x) = C_1 + C_2 \log(x)$
- Cost function recalculated: $F(q) = |L_1| \sum_{i=2}^{n} (12 + \log |L_i|)$
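The cost model can be evaluated directly. The sketch below assumes that $L_1$ is the shortest posting list of the query and that the logarithm is natural; the slide states neither, and `query_cost` is a hypothetical name. The defaults correspond to the recalculated form with $C_1 = 12$ and $C_2 = 1$.

```python
import math

def query_cost(list_lengths, c1=12.0, c2=1.0):
    """Estimated cost F(q) for a query, given its posting-list lengths.

    The shortest list plays the role of L_1 (assumption); every other
    list contributes G(x) = c1 + c2 * log(x), using the natural log.
    """
    lengths = sorted(list_lengths)
    shortest, secondary = lengths[0], lengths[1:]
    return shortest * sum(c1 + c2 * math.log(x) for x in secondary)

# A two-term query: a 1,000-entry list driving lookups in a 1,000,000-entry list.
cost = query_cost([1_000_000, 1_000])
```

Because the length of $L_1$ multiplies the whole sum while the other lists contribute only logarithmically, shrinking the shortest list reduces the estimate far more than shrinking any secondary list.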
25 Index Construction - Bitmaps
- Let B be the association matrix, where $b_{ij} = 1$ if there is a bit for term $t_j$ in list $L_i$'s bitmaps.
- Minimize: $\sum_{q \in Q} F(B, q)$
- Subject to: $\sum_{i,j} b_{ij} |L_i| \le S$, with $b_{ij} \in \{0, 1\}$
- Compute the benefit ratio: $\lambda_{ij} = \sum_{q \in Q} \frac{F(B \cup \{b_{ij}\}, q) - F(B, q)}{|L_i|}$
26 Index Construction - Precomputed Lists
- Let P be a set of precomputed lists, where $p_{ij}$ is the indicator variable representing the results of query $t_i\, t_j$.
- Minimize: $\sum_{q \in Q} F(P, q)$
- Subject to: $\sum_{i,j} p_{ij} |L_i \cap L_j| \le S$, with $p_{ij} \in \{0, 1\}$
- Compute the benefit ratio: $\lambda'_{ij} = \sum_{q \in Q} \frac{F(P \cup \{p_{ij}\}, q) - F(P, q)}{|L_i \cap L_j|}$
27 Index Construction - Hybrid
- A hybrid algorithm that at each step selects the precomputed list $p_{ij}$ or the bitmap $b_{ij}$ that maximizes the marginal benefit given by Equation 1 or 2.
28 Query Evaluation - Bitmaps
29 Query Evaluation - Precomputed Lists
30 Query Evaluation - Hybrid
- First invoke Algorithm 3 to identify precomputed lists.
- Then invoke Algorithm 2 to remove some of the lists that are covered by bitmaps in shorter lists. This is to minimize $|L_1|$.
31 Experiment
- Data: TREC WT10g corpus, 1.68 million web pages.
- Textual content is extracted and HTML tags are discarded.
- Each posting in the index contains a 4-byte docid and a variable-size payload (0-32 bits) containing the bitmaps.
- Index size: 1.5 GB, max 6 GB.
- The AOL query log provides the query workload.
- 21M queries form the training set.
- 50K queries from the remaining 2.6M queries form the test set.
32 Result: Query Latency Improvement
33 Result: Query Latency Improvement
34 Thank you