L02 : 08/21/2015 L03 : 08/24/2015.
- Constance Letitia Simon
- 6 years ago
L02 : 08/21/2015
Multimedia used to be the idea of Big Data.
The definition of Big Data is a moving target (it will be different in a few years).
Big Data is highly complex.
One way to look at Big Data is by its drivers: What makes Big Data possible? What is the hype?
> 1 Terabyte (TB) = 1,000 GB
> 1 Petabyte (PB) = 1,000 TB
> 1 Exabyte (EB) = 1,000 PB
L03 : 08/24/2015
Nature of Big Data; Challenges; Data structures for Big Data
The 5 V's:
1) Velocity
   a) Data in motion
   b) Streaming data
2) Volume
3) Variety
4) Value
5) Veracity
L04 : 08/26/2015
OVERVIEW
> I/O Problems (continued)
> Searching on Big Data
> Suffix trees: Intro, Properties, Applications
In the RAM model we have a CPU, and the data is stored in RAM; we can go back and forth between the CPU and RAM to do calculations. The model assumes infinite memory and no difference in time between a RAM access and a CPU operation. In practice, getting data from the bottom level of the memory hierarchy (disk) up to RAM can take a lot of time.
Basic RAM model of computation:
Capability problem in the given situation:
Problem: Disk input/output is really slow. A disk access takes roughly 10^6 times as long as the CPU takes to process the data.
Scalability problem: Processing time keeps growing.
Ultimately we want a single I/O, but we have block I/O.
How to solve the problem of reading and writing data in computing systems:
> Technology can reduce the time between disk and CPU.
> Reduce the number of disk I/O operations (Adjeroh's solution, below).
We need to introduce some notation:
> B = number of items transferred in one block I/O (block size)
> N = total number of items (the amount of data we need to read)
> M = number of items that fit in main memory (main memory size)
We assume that memory holds at least B^2 items (M >= B^2).
If you want to read every item, you need N/B I/Os: simple scanning takes N/B I/Os rather than N I/O operations.
Locality is key!!!
Simple example: traversing a linked list with N = 10, B = 2, M = 4.
Basic algorithm:
> Reading data 2 items at a time, the basic algorithm still needs approximately N = 10 I/Os, because consecutive list nodes sit in different blocks.
Improved placement:
> The number of I/Os is approximately N/B = 5.
Consider N = 256 x 10^6, B = 8000, disk access time = 1 ms:
> Basic algorithm: 256 x 10^6 I/Os x 1 ms = 256,000 s, about 71 hours.
> Improved placement: N/B = 32,000 I/Os x 1 ms, about 32 sec.
*Block I/O is a hardware issue, but we must understand the software side of the issue.*
Standard results on block I/O (basic algorithm vs. improved algorithm):
> Scanning: $N$ vs. $\frac{N}{B}$
> Sorting: $N \log N$ vs. $\frac{N}{B} \log_{M/B} \frac{N}{B}$
> Permuting: $N$ vs. $\min\left\{N, \frac{N}{B} \log_{M/B} \frac{N}{B}\right\}$
> Searching: $\log_2 N$ vs. $\log_B N$
We want to sort data to make reading data more efficient.
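The two I/O counts in the linked-list example, and the 71-hour vs. 32-second estimate, can be reproduced with a small simulation. This is a minimal Python sketch, not the lecture's code, and it rests on assumptions the notes do not state: the item at position p lives in block p // B, memory holds M // B blocks with least-recently-used replacement, and the "basic" layout is a hypothetical one in which consecutive list nodes never share a block. `count_block_ios` is an illustrative name.

```python
from collections import OrderedDict


def count_block_ios(access_order, B, M):
    """Count block reads when items are visited in `access_order`.

    Assumed model: the item stored at position p lives in block p // B, and
    main memory holds M // B blocks, replaced least-recently-used first.
    """
    capacity = M // B
    cache = OrderedDict()  # block id -> None, least recently used first
    ios = 0
    for pos in access_order:
        block = pos // B
        if block in cache:
            cache.move_to_end(block)  # hit: no I/O, mark as most recently used
        else:
            ios += 1  # miss: one block I/O
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return ios


N, B, M = 10, 2, 4
# Basic placement (hypothetical layout): consecutive list nodes fall in different blocks.
scattered = list(range(0, N, 2)) + list(range(1, N, 2))  # 0, 2, 4, 6, 8, 1, 3, 5, 7, 9
# Improved placement: nodes stored contiguously in traversal order.
contiguous = list(range(N))
print(count_block_ios(scattered, B, M))   # 10
print(count_block_ios(contiguous, B, M))  # 5

# The larger example: one I/O per item vs. one I/O per block, 1 ms per access.
N_big, B_big, access_s = 256 * 10**6, 8000, 1e-3
basic_hours = N_big * access_s / 3600
improved_s = (N_big // B_big) * access_s
print(f"{basic_hours:.1f} hours vs. {improved_s:.0f} s")  # 71.1 hours vs. 32 s
```

With B = 2, the contiguous layout pays one I/O per block, which is exactly the N/B scanning bound.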
Search Data Structures
> Finding the item
> Ranking the web pages
Simple Naive Search
Given the database T and the pattern P, find all positions in T where P occurs.
Three types of search questions:
> Decision query (does P occur in T?)
> Counting query (how many times does P occur?)
> Enumerate/location query (where does P occur?)
T: 1 ... N, P: 1 ... m
The naive approach takes a long time to find the answer: O(Nm). We need to focus on decreasing this time to seconds.
L03 : 08/24/2015
Suffix Trees: Intro; Searching with; Construction; Problems; Suffix Arrays
Naive Search Algorithm
Inputs: T = t_1 t_2 ... t_n, P = p_1 p_2 ... p_m
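A brute-force version of the naive search, answering all three query types from a single scan. This is a sketch assuming Python strings and 0-based positions (the notes number from 1); `naive_occurrences` is a hypothetical name.

```python
def naive_occurrences(T, P):
    """Return every 0-based position where P occurs in T.

    Tries all n - m + 1 alignments and compares up to m characters at each,
    so the worst case is O((n - m + 1) m) = O(nm).
    """
    n, m = len(T), len(P)
    hits = []
    for i in range(n - m + 1):  # every possible alignment of P in T
        j = 0
        while j < m and T[i + j] == P[j]:
            j += 1  # extend the match one character at a time
        if j == m:
            hits.append(i)
    return hits


occ = naive_occurrences("acraca$", "ac")
print(bool(occ))  # decision query: True
print(len(occ))   # counting query: 2
print(occ)        # enumerate/location query: [0, 3]
```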
> Best case is O(n).
> If you have a big data set, n can be quite big.
> Overall (worst-case) time = O((n - m + 1)m) = O(nm).
> On average: O(n).
EXAMPLE: searching on Google
Suffix Trees
T = acraca$
Prefixes: a, ac, acr, ..., acraca$
Suffix Tree (ST):
> A tree that represents all of the suffixes of a given string, e.g. T = acraca$.
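To make the definition concrete, here is an uncompacted suffix trie for T = acraca$. Every suffix is inserted character by character, so construction is O(n^2); this is not the linear-time suffix tree construction mentioned in the notes. A suffix tree is this trie with unary paths merged into single edges. The sketch also shows the O(m) pattern walk and, assuming '$' sorts before the letters (true in ASCII), that a left-to-right depth-first traversal of the leaves yields the suffix array. `Node`, `build_suffix_trie`, `contains` and `suffix_array_by_dfs` are illustrative names.

```python
class Node:
    def __init__(self):
        self.children = {}  # edge symbol -> child Node
        self.suffix = None  # 1-based suffix number, set only at leaves


def build_suffix_trie(T):
    """Insert every suffix of T (T must end with a unique terminator such as '$')."""
    root = Node()
    for i in range(len(T)):
        node = root
        for ch in T[i:]:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = node.children[ch] = Node()
            node = nxt
        node.suffix = i + 1  # the terminator guarantees this node is a leaf
    return root


def contains(root, P):
    """O(m) walk: P occurs in T iff P spells a path starting at the root."""
    node = root
    for ch in P:
        node = node.children.get(ch)
        if node is None:
            return False
    return True


def suffix_array_by_dfs(root):
    """Visit children in sorted symbol order; leaves come out in suffix-array order."""
    out = []

    def dfs(node):
        if node.suffix is not None:
            out.append(node.suffix)
        for ch in sorted(node.children):
            dfs(node.children[ch])

    dfs(root)
    return out


trie = build_suffix_trie("acraca$")
print(contains(trie, "cra"), contains(trie, "cry"))  # True False
print(suffix_array_by_dfs(trie))                      # [7, 6, 4, 1, 5, 2, 3]
```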
> If we take a given node, the branches out of that node start with different symbols.
> There are algorithms to construct these trees (in the slide handout).
> Look at SUFFIX TREE FROM LCA LIST to construct a tree in linear time (Pg. 70).
Storing the tree in O(n) space is still a problem, even though searching for a pattern takes only O(m) time (as in a search trie).
A suffix tree requires 33n bytes to store (each integer is 4 bytes).
L06 : 08/31/2015
Problems with STs; Suffix Arrays (Intro, Searching with, Construction)
Generalized Suffix Tree
If we have multiple sequences T_1, T_2, ..., T_k and want to search all of them at the same time, we build one tree over T = T_1 $_1 T_2 $_2 ... T_k $_k, where each $_i is a distinct terminator.
Representing a node as an array
> Consider the two types of nodes:
> Internal node
> Leaf node
Ways to represent the children of a node (search time for a pattern of length m):
> O(m): using an array at each node (fastest search)
> O(m|Σ|): using a linked list at each node
> O(m log|Σ|): using a binary tree at each node
(|Σ|, the alphabet size, is very small compared to the total length of the sequence.)
Size of the ST:
> Original text = 1n bytes (assuming |Σ| = 256, we can represent 1 symbol using 1 byte)
> At each node we have an integer ID.
> Internal nodes: node ID, 1 int = 4n bytes; parent ID, 1 int = 4n bytes; edge labels, 2 ints = 8n bytes
> Leaf nodes: ID, 4n bytes; parent, 4n bytes
> Suffix links: 8n bytes
> Total: 33n bytes
The issue is that 33n bytes, compared with n bytes for the text itself, can be huge.
T = a c r a c a $
Suffixes of T = acraca$:
1 acraca$
2 craca$
3 raca$
4 aca$
5 ca$
6 a$
7 $
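Sorting the suffixes listed above directly gives the suffix array. This sketch is the simple "list the suffixes, then sort them" method: O(n log n) comparisons of up to O(n) characters each. It assumes 1-based positions as in the notes; note that Python materializes every suffix as a sort key, so it also uses O(n^2) memory and is only meant for small examples. `suffix_array_naive` is a hypothetical name.

```python
def suffix_array_naive(T):
    """Return the 1-based start positions of T's suffixes in sorted order."""
    # Each key is the suffix starting at 1-based position i.
    return sorted(range(1, len(T) + 1), key=lambda i: T[i - 1:])


print(suffix_array_naive("acraca$"))  # [7, 6, 4, 1, 5, 2, 3]
```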
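Searching with the SA:
> Binary search using the SA of the example above, with P = c r y (in general P = p_1 p_2 ... p_m) and SA = [7, 6, 4, 1, 5, 2, 3].
STEP 1: Is c equal to the first character of T_{SA[4]} = acraca$, which is a? NO, c > a, so search the right half.
STEP 2: Is c equal to the first character of T_{SA[6]} = craca$, which is c? YES.
We make one binary search per pattern character, i.e. m binary searches, so the search takes O(m log n).
> The size will be 1n + 4n bytes = 5n bytes (the text plus one 4-byte integer per suffix).
> We want to avoid suffix trees and get into suffix arrays.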
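The 33n vs. 5n comparison is simple arithmetic. The sketch below follows the notes' accounting (1-byte symbols, 4-byte integers, and n internal-node records plus n leaf records as an upper bound); the 1 GB text size is a hypothetical example, and the function names are illustrative.

```python
def st_bytes(n, int_bytes=4):
    """Suffix tree footprint using the per-component accounting from the notes."""
    text = 1 * n                              # 1 byte per symbol (|Sigma| = 256)
    internal = (1 + 1 + 2) * int_bytes * n    # node ID, parent ID, 2 edge-label ints
    leaves = (1 + 1) * int_bytes * n          # leaf ID, parent ID
    suffix_links = 2 * int_bytes * n          # 8n bytes, as in the notes
    return text + internal + leaves + suffix_links


def sa_bytes(n, int_bytes=4):
    """Suffix array footprint: the text plus one integer per suffix."""
    return 1 * n + int_bytes * n


n = 10**9  # hypothetical 1 GB text
print(st_bytes(n) // n, sa_bytes(n) // n)  # 33 5
```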
L07 : 09/02/2015
Searching on SA; SA Construction; LCP (Longest Common Prefix); From SA to ST
Recall: T = a c r a c a $, P = c r y, n = |T|, m = |P|
i  SA[i]  Sorted suffix
1  7      $
2  6      a$
3  4      aca$
4  1      acraca$
5  5      ca$
6  2      craca$
7  3      raca$
*Trace through this example with the binary search below to find out if the pattern matches.*
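A sketch of the search on this suffix array, assuming 1-based ranks and positions as in the table. Following the "one binary search per pattern character" idea, step k narrows the interval of ranks whose suffixes start with p_1 ... p_k, for O(m log n) in total. Splitting each step into a lower-bound and an upper-bound binary search is this sketch's formulation, not necessarily the lecture's; for P = cry its first two probes (ranks 4 and 6) happen to match the trace in these notes. `sa_search` is a hypothetical name.

```python
def sa_search(T, SA, P):
    """Return the 1-based rank interval (lo, hi) of suffixes starting with P, or None."""

    def char_at(rank, k):
        # k-th character (1-based) of the suffix with this rank; "" if the suffix is too short
        start = SA[rank - 1] - 1 + (k - 1)
        return T[start] if start < len(T) else ""

    lo, hi = 1, len(SA)
    for k, ch in enumerate(P, start=1):
        # Inside [lo, hi] all suffixes share P[:k-1], so they are sorted by their k-th character.
        a, b = lo, hi + 1
        while a < b:  # leftmost rank whose k-th character is >= ch
            mid = (a + b) // 2
            if char_at(mid, k) < ch:
                a = mid + 1
            else:
                b = mid
        new_lo = a
        b = hi + 1
        while a < b:  # leftmost rank whose k-th character is > ch
            mid = (a + b) // 2
            if char_at(mid, k) <= ch:
                a = mid + 1
            else:
                b = mid
        new_hi = a - 1
        if new_lo > new_hi:
            return None  # no suffix starts with P[:k]
        lo, hi = new_lo, new_hi
    return lo, hi


T, SA = "acraca$", [7, 6, 4, 1, 5, 2, 3]
print(sa_search(T, SA, "cry"))  # None: the pattern does not occur
r = sa_search(T, SA, "ac")
print(r, sorted(SA[i - 1] for i in range(r[0], r[1] + 1)))  # (3, 4) [1, 4]
```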
> We can traverse the suffix tree's leaves from left to right to obtain the suffix array.
Searching with SA (via Binary Search)
Example: T = a c r a c a $, P = c r y
When k = 1: mid = (1 + 7)/2 = 4
Is T_{SA[mid]}[1] == P[1]? T_1[1] = a, P[1] = c: NO, c > a
low = mid + 1 = 5; mid = (low + high)/2 = (5 + 7)/2 = 6
ST vs. SA size:
> size(ST_T) >= 33n bytes
> size(SA_T) >= 5n bytes
> A suffix array is lightweight.
Construction of the suffix array:
1) Simply list the suffixes, then sort them. Each suffix has length up to n, so each comparison costs O(n).
   > Need O(n log n) comparisons x O(n) => O(n^2 log n)
2) Traverse the ST depth first from left to right.
   > O(n) time, O(n) space
*Look at the Manber-Myers suffix sorting algorithm in the text.*
L08 : 09/04/2015
Suffix Arrays (continued): Construction, LCP
PageRank: Intro, Algorithm, Problems
Trust Rank
> O(n^2): direct sorting of suffixes
> O(n): via ST
> Today we will go through O(n log n) successive doubling (without ST).
> And we will talk about O(n) without ST.
History of ST and SA
> The whole idea of the suffix tree was introduced in
> It was not until around 1991 that we had what is now called Ukkonen's algorithm (33n).
> Farach in 1996 introduced dividing the suffixes into two groups (76n).
> In 1993 the suffix array was introduced by Manber & Myers: T + SA = (1 + 4)n bytes = 5n bytes. It required O(n log n) to construct.
Use the first column to induce the other columns; we can exploit the letters already found in previous columns.
Successive doubling: Constructing Suffix Array
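A compact prefix-doubling sketch in the spirit of successive doubling: after a round with step h, each suffix's rank encodes its first 2h characters, because the pair (rank of suffix i, rank of suffix i + h) extends an h-character key to 2h characters. One assumption to note: this uses Python's comparison sort in every round, so it runs in O(n log^2 n) rather than the O(n log n) that Manber & Myers obtain with bucket-based sorting. `suffix_array_doubling` is an illustrative name; positions are 1-based as in the notes.

```python
def suffix_array_doubling(T):
    """Suffix array by prefix doubling; returns 1-based start positions."""
    n = len(T)
    if n == 0:
        return []
    rank = [ord(c) for c in T]  # initial ranks: the first character of each suffix
    sa = list(range(n))
    h = 1
    while True:
        # Key for the first 2h characters; -1 means "past the end" and sorts first.
        keys = [(rank[i], rank[i + h] if i + h < n else -1) for i in range(n)]
        sa.sort(key=lambda i: keys[i])
        new_rank = [0] * n
        for j in range(1, n):
            # Equal keys share a rank; a new key gets the next rank.
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (keys[sa[j]] != keys[sa[j - 1]])
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: suffixes fully sorted
            break
        h *= 2
    return [i + 1 for i in sa]


print(suffix_array_doubling("acraca$"))  # [7, 6, 4, 1, 5, 2, 3]
```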
LCP (Longest Common Prefix)
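The notes do not say how the LCP array itself is computed; one standard linear-time method is Kasai et al.'s algorithm, swapped in here as an illustration. It visits suffixes in text order and relies on the fact that the match length drops by at most one from suffix i to suffix i + 1. Here lcp[r] is the LCP of the suffixes at 0-based ranks r - 1 and r, with lcp[0] = 0; for the acraca$ example it gives [0, 0, 1, 2, 0, 1, 0] (e.g. aca$ and acraca$ share "ac"). `lcp_kasai` is a hypothetical name.

```python
def lcp_kasai(T, SA):
    """LCP array in O(n) time. SA holds 1-based start positions, as in the notes."""
    n = len(T)
    sa = [p - 1 for p in SA]  # convert to 0-based positions
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r
    lcp = [0] * n
    h = 0
    for p in range(n):  # visit suffixes in text order
        if rank[p] > 0:
            q = sa[rank[p] - 1]  # the suffix just before p in sorted order
            while p + h < n and q + h < n and T[p + h] == T[q + h]:
                h += 1
            lcp[rank[p]] = h
            if h > 0:
                h -= 1  # suffix p + 1 keeps at least h - 1 matching characters
        else:
            h = 0
    return lcp


print(lcp_kasai("acraca$", [7, 6, 4, 1, 5, 2, 3]))  # [0, 0, 1, 2, 0, 1, 0]
```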
*If you have your suffix array, you can construct a suffix tree and find the LCP.*
LCA = lowest common ancestor
LCP of two suffixes = (string) depth of their LCA in the suffix tree
Page Rank
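The damping factor d is the probability that a random surfer follows one of the current page's out-links; with probability 1 - d the surfer jumps to a page chosen uniformly at random, so every node effectively points to every other node.
$PR_i(k+1) = \frac{1-d}{n} + d \sum_{j \to i} \frac{PR_j(k)}{\text{outdeg}(j)}$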
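A power-iteration sketch of the PageRank update PR_i(k+1) = (1 - d)/n + d * (sum over pages j that link to i of PR_j(k) / outdeg(j)). Assumptions not in the notes: pages are numbered 0..n-1, every page has at least one out-link (dangling pages are not handled), d = 0.85 is used only as a conventional default, and the three-page graph is made up for illustration. `pagerank` is a hypothetical name.

```python
def pagerank(out_links, d=0.85, iters=100):
    """Iterate the damped PageRank update a fixed number of times."""
    n = len(out_links)
    pr = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        new = [(1.0 - d) / n] * n  # random-jump term
        for j, targets in enumerate(out_links):
            share = d * pr[j] / len(targets)  # page j splits its rank among its out-links
            for i in targets:
                new[i] += share
        pr = new
    return pr


# Hypothetical 3-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
graph = [[1, 2], [2], [0]]
ranks = pagerank(graph)
print([round(r, 3) for r in ranks])
```

Because every page in this graph has out-links, each iteration preserves a total rank of 1; pages without out-links would need an extra rule that the notes do not cover.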