The History of Histograms
Yannis Ioannidis, University of Athens, Hellas
Outline
  Prehistory
  Definitions and Framework
  The Early Past
  10 Years Ago
  The Recent Past
  Industry
  Competitors
  The Future
Prehistory
  The word "histogram" is of Greek origin
    "histo-s" = "mast"
    "gram-ma" = "something written"
  Not used originally in the Greek language!
  Introduced by Karl Pearson in 1892 for a "common form of graphical representation"

Prehistory
  1662: Concept exists at least since then, in the mortality tables of J. Graunt
  1786: Bar charts introduced by W. Playfair to capture Scottish imports/exports
  1833: Histograms introduced by A. M. Guerry as discrete approximations to distribution functions
  1859: Florence Nightingale used them to compare mortality of soldiers and civilians
Prehistory
  [Figure: Playfair's bar chart]

Definitions: Data Distributions
  One-dimensional data distribution = set of (attribute value, frequency) pairs
  Large and non-uniform: need compression and approximation
  Concentrate on numeric attributes
  [Figure: data distribution annotated with frequency, spread, and area]

Definitions: Data Distributions
  Combinations of multiple attribute values
  Joint frequency
  Multidimensional data distributions = set of (value combination, joint frequency) pairs
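
As a rough illustration of the one-dimensional definitions above, the sketch below derives a data distribution from a list of attribute values and attaches a spread and an area to each element. The function name `data_distribution` is hypothetical. Taking the spread as the gap to the next distinct value, with a spread of 1 for the last value, is an assumed convention that the slides do not spell out.

```python
from collections import Counter


def data_distribution(values):
    """Return sorted (value, frequency, spread, area) tuples for a numeric column."""
    counts = Counter(values)
    distinct = sorted(counts)
    rows = []
    for i, v in enumerate(distinct):
        # Assumed convention: spread = gap to the next distinct value,
        # and the last value gets a spread of 1.
        spread = distinct[i + 1] - v if i + 1 < len(distinct) else 1
        freq = counts[v]
        # Area = spread x frequency, as in the sort parameter definitions.
        rows.append((v, freq, spread, spread * freq))
    return rows


dist = data_distribution([1, 1, 1, 4, 4, 9])
```

Any of the four columns can then serve as a sort or source parameter in the framework that follows.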
Definitions: Multidimensional Data Distributions
  [Figure: multidimensional data distribution]

Motivation
  Selectivity estimation
  Approximate query answering
  within
    Query optimization
    Query profiling for user feedback
    Load balancing for parallel join execution
    Partition-based temporal join execution

Definitions: Histograms
  Partition data distribution into β disjoint buckets
  Approximate values (value combinations) and frequencies within each bucket
  [Figure: frequency plot]
Definitions: Histograms
  [Figure: frequency plot split into bucket 1 and bucket 2]

Framework: Histogram Parameters
  Partition rule: 4 orthogonal parameters
    Partition class
    Sort parameter
    Partition constraint
    Source parameter
  Construction algorithm

Framework: Histogram Parameters
  Value approximation within bucket
  Frequency approximation within bucket
  Error guarantees

Framework: Partition Class
  Indicates restrictions on partitioning
  Serial: non-overlapping ranges of sort parameter values
  End-biased: at most one non-singleton bucket

Framework: Sort Parameter
  Derivative of data distribution element (its value and/or frequency)
    Attribute values (V)
    Frequencies (F)
    Areas (A) = spread x frequency
  Serial: buckets must contain contiguous sort parameter values

Framework: Partition Class and Sort Parameter
  [Figure: table with columns VALUE and FREQUENCY]
Framework: Partition Class and Sort Parameter
  [Figures: tables with columns VALUE, FREQUENCY, and SORT PAR, grouped into buckets B1 to B4]

Framework: Source Parameter
  Derivative of data distribution element (its value and/or frequency)
    Spreads (S)
    Frequencies (F)
    Cumulative frequencies (C)
    Areas (A)
  Partition constraint applied on source parameter
Framework: Partition Constraint
  Mathematical constraint on the source parameter that partitioning must satisfy
  General direction: avoid grouping vastly different source parameter values

Framework: Partition Constraint
  Equi-sum: equalize sums
  V-optimal: minimize variance
  Maxdiff: minimize maximum difference of adjacent source values
  Compressed: preserve high source values and equalize sums of the rest
  Spline-based: minimize square root of error

Framework: Partition Constraint and Source Parameter
  [Figure: table with columns VALUE, FREQ, SORT PAR, and SOURCE PAR, grouped into buckets B1 to B4]

Framework: Histogram Parameters
  Notation: class : constraint (sort, source)
  Special notation for serial partition class: constraint (sort, source)

Framework: Histogram Parameters
  Same parameters for multidimensional histograms
  Partition rule more intricate: not always analyzable into 4 orthogonal parameters
  Often no sort parameter

The Early Past: Dark Ages
  Essentially, use of 1-bucket histograms
  Large errors
The Early Past: First Appearance
  Kooi's PhD thesis
  Equi-width histograms
  equi-width = equi-sum (V, S)
  Adopted by INGRES
  [Figures: frequency plots]

The Early Past: First Alternative
  Don't equalize ranges of values but number of tuples in bucket
  Equi-depth histograms
  equi-depth = equi-sum (V, F)
  Source is the only difference
  Adopted by several commercial systems
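
The difference between the two early histogram types can be sketched in a few lines. The code below is a minimal illustration, not the construction used by INGRES or any commercial system. The function names and the `(low, high, count)` bucket layout are assumptions made for the example.

```python
def equi_width(values, num_buckets):
    """Equi-sum (V, S): split [min, max] into ranges of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # Clamp so that the maximum value falls into the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1) if width else 0
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def equi_depth(values, num_buckets):
    """Equi-sum (V, F): split the sorted tuples into runs of (nearly) equal size."""
    data = sorted(values)
    n = len(data)
    buckets = []
    for i in range(num_buckets):
        start, end = i * n // num_buckets, (i + 1) * n // num_buckets
        if start < end:
            chunk = data[start:end]
            buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets


column = [1, 2, 2, 3, 3, 3, 10, 20]
by_width = equi_width(column, 2)
by_depth = equi_depth(column, 2)
```

On this skewed column the equi-width version puts seven of the eight tuples into its first bucket, while the equi-depth version holds four tuples in each. Note that this simple equi-depth split may place copies of the same value in two adjacent buckets.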
The Early Past: First Alternative
  [Figure: frequency plot]

The Early Past: Optimal Sort Parameter
  Theorem: For single join queries and accurate knowledge of values, serial histograms with frequency as sort parameter are optimal.
  Generalization of the practice of keeping high-frequency values accurately.
  [Figure: frequency plot]

10 Years Ago
  Theorem: For single join queries and accurate knowledge of values, serial histograms with frequency as sort parameter are optimal.
The Recent Past
  Optimal partition constraints and source parameters?
  Optimality when values are not known accurately?
  Optimal values of other histogram characteristics?

The Recent Past: Optimal Constraint and Source
  Theorem: For the average join query and accurate knowledge of values, v-optimal histograms with frequency as source parameter are optimal.
  v-optimal (F, F)
  v-optimal: minimize variance of source values
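
One way to make the v-optimal constraint concrete is a small dynamic program over the source values, taken in sort-parameter order. The sketch below is an assumption-laden illustration: the slides do not say which construction algorithm is meant, the function name `v_optimal` is hypothetical, and "variance" is read here as the total within-bucket sum of squared errors. For v-optimal (F, F) the caller passes the frequencies sorted by frequency.

```python
def v_optimal(source, num_buckets):
    """Split `source` (already ordered by the sort parameter) into contiguous
    buckets that minimise the total within-bucket sum of squared errors."""
    n = len(source)
    k = min(num_buckets, n)
    # Prefix sums give the squared error of any run source[i:j] in O(1).
    ps = [0.0] * (n + 1)
    pss = [0.0] * (n + 1)
    for i, x in enumerate(source):
        ps[i + 1] = ps[i] + x
        pss[i + 1] = pss[i] + x * x

    def sse(i, j):
        s = ps[j] - ps[i]
        return (pss[j] - pss[i]) - s * s / (j - i)

    inf = float("inf")
    # best[b][j]: minimal error when source[:j] is covered by exactly b buckets.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for b in range(1, k + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                cost = best[b - 1][i] + sse(i, j)
                if cost < best[b][j]:
                    best[b][j], cut[b][j] = cost, i
    # Walk the cut table backwards to recover the buckets.
    buckets, j = [], n
    for b in range(k, 0, -1):
        i = cut[b][j]
        buckets.append(list(source[i:j]))
        j = i
    return buckets[::-1], best[k][n]


frequencies = [10, 1, 9, 2, 1, 10]
buckets, error = v_optimal(sorted(frequencies), 2)
```

The triple loop costs O(n²β) time for n source values and β buckets, which is acceptable for a sketch but not tuned for large distributions.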
The Recent Past: Optimal Constraint and Source
  [Figure: frequency plot]

The Recent Past
  If values are not known accurately, no optimality result on any histogram characteristic
  Several experimental results identify key choices

The Recent Past: New Partition Constraints
  All try to group similar source values
  max-diff: bucket borders at highest differences of adjacent source values
  compressed: preserve high values of source and equalize sums of the rest

The Recent Past: maxdiff
  [Figure: frequency plot]
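
The max-diff rule stated above translates almost directly into code. The following is a minimal sketch under stated assumptions: the function name `max_diff` is hypothetical, the input is the list of source values already arranged in sort-parameter order, and ties between equal gaps are broken in favour of the earlier position, which is an arbitrary choice.

```python
def max_diff(source, num_buckets):
    """Place bucket borders at the (num_buckets - 1) largest differences
    between adjacent source values."""
    if num_buckets <= 1 or len(source) <= 1:
        return [list(source)]
    gaps = [(abs(source[i + 1] - source[i]), i) for i in range(len(source) - 1)]
    # Largest gaps first; ties go to the earlier position (arbitrary choice).
    gaps.sort(key=lambda g: (-g[0], g[1]))
    borders = sorted(i + 1 for _, i in gaps[:num_buckets - 1])
    buckets, start = [], 0
    for b in borders + [len(source)]:
        buckets.append(list(source[start:b]))
        start = b
    return buckets


# Frequencies listed in attribute-value order, i.e. max-diff (V, F).
groups = max_diff([5, 6, 5, 20, 21, 3], 3)
```

Unlike the dynamic program needed for v-optimal, this construction only sorts the adjacent differences, which is part of the practical appeal of max-diff.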
The Recent Past: compressed
  [Figure: frequency plot]

The Recent Past: Alternative Partition Constraints
  Variations on the optimal knot placement problem
  Linear splines only
  Discontinuous across bucket boundaries

The Recent Past: New Sort and Source Parameters
  Choices
    Attribute values (V)
    Spreads (S)
    Frequencies (F)
    Areas (A)
    Cumulative frequencies (C)
  Value is the best sort parameter overall
  Area and frequency are the best source parameters overall

The Recent Past: Multidimensional Partition Rules
  Multidimensional value domain cannot be sorted to serve as sort parameter
  Many alternatives to partition the space of values into buckets
  Although possible, frequency has not been used as sort parameter
The Recent Past: Multidimensional Partition Class
  A la Grid File
  A la K-D-B-Tree (MHIST)
  GENHIST
  STHoles

The Recent Past: Multidimensional Data Distributions
  [Figures: M-D Partition Class for Grid File, MHIST, GENHIST (three slides), and STHoles, each over dimensions 1 and 2]
The Recent Past: Histogram Framework
  Partition rule
    Partition class
    Sort parameter
    Partition constraint
    Source parameter
  Construction algorithm
  Value and frequency approximation
  Error guarantees

The Recent Past: Value Approximation
  Continuous value assumption: (min and) max value
  Uniform spread assumption: above + number of unique values
  Popularity-based spread: above with fake number of unique values
  Kernel estimation
  [Figures: frequency plots of a bucket between its min and max values]

The Recent Past: Value Approximation
  All generalized to multidimensional case
  Tradeoff between number of buckets and information kept within each bucket

The Recent Past: Frequency Approximation
  Uniform distribution assumption: average frequency
  Linear spline approximation: above + spline's angle
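
To show how the within-bucket assumptions turn a histogram into a selectivity estimate, the sketch below combines the continuous value assumption with the uniform distribution assumption for a range predicate. The function name `estimate_range` and the `(min_value, max_value, tuple_count)` bucket layout are assumptions made for the example, not a format taken from the slides or from any system.

```python
def estimate_range(buckets, lo, hi):
    """Estimate how many tuples satisfy lo <= value <= hi.

    Each bucket is (min_value, max_value, tuple_count). Values are assumed to
    be spread continuously over [min_value, max_value] and tuples uniformly
    over that range.
    """
    total = 0.0
    for b_min, b_max, count in buckets:
        if b_max == b_min:
            # Singleton bucket: it is either fully inside the range or not.
            total += count if lo <= b_min <= hi else 0
            continue
        overlap = min(hi, b_max) - max(lo, b_min)
        if overlap > 0:
            # Scale the bucket's count by the covered fraction of its range.
            total += count * overlap / (b_max - b_min)
    return total


histogram = [(0, 10, 100), (10, 20, 50), (25, 25, 7)]
estimate = estimate_range(histogram, 5, 15)
```

Here half of each of the first two buckets is covered, so the estimate is 75 tuples. Storing the number of unique values per bucket, as the uniform spread assumption requires, would refine the estimate for equality predicates at the cost of a larger bucket.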
The Recent Past: Frequency Approximation
  [Figure: frequency plot]

Industrial Presence
  Only 1-dimensional histograms
  1970s: trivial histograms (1 bucket)
  1980s: equi-width histograms
  1990s: equi-depth histograms
  2000s:

Industrial Presence: DB2
  compressed (V, F)
  Default of 10 singleton and 20 non-singleton buckets
  Store cumulative frequencies
  Construction based on reservoir sample
  Indices used to quantify dependencies
  LEO learning is key

Industrial Presence: ORACLE
  equi-depth = equi-sum (V, F)
  Indices used to quantify dependencies
  On-the-fly dependence estimation
  Past selectivities stored for future use

Industrial Presence: SQL Server
  max-diff (V, F)
  Up to 199 buckets
  Store cumulative frequencies
  Store frequency of max accurately
  Construction based on sample
  Indices used to quantify dependencies

Histogram Competitors
  Wavelets
  Sampling (usually complementary)
  Specialized techniques
The Future
  Histograms and clustering
  Bucket recognition and representation
  Histograms and tree indices
  approximation
  Comprehensive technique comparison
  Other data types

The Future: Histograms and Clustering
  Clustering is identical problem!
  Grouping of similar elements into buckets (bucket = cluster = pattern)
  Small approximation within bucket
  Multidimensional elements are attribute value combinations above + frequency
  [Figures: frequency plots (three slides)]

The Future: Histograms and Clustering
  Very different techniques
  Apply on one problem techniques developed for the other
    Partition rules
    Construction algorithms
    Approximate representations within bucket
The Future: Bucket Recognition and Representation
  Essence of histograms or clustering
    Identify groups of similar elements
    Similarity on few characteristics (source)
    Store approximation of these characteristics
  Which are the similar characteristics? [Pattern Recognition]

The Future: Bucket Recognition and Representation
  Maybe not original element dimensions
  Maybe not the same for all groups
  [Figures: frequency plots (four slides)]

The Future: Bucket Recognition and Representation
  Not clustering in the value-frequency space, but the spread-frequency space
  Why the difference in treatment?
  Is this always better?
  How can we recognize winner?
  [Figures: frequency plots (three slides)]
The Future: Bucket Recognition and Representation
  [Figure: frequency plot]

The Future: Histograms and Tree Indices
  Root of the B+ tree partitions space of values into non-overlapping buckets
  Each bucket further subdivided into smaller buckets
  Appropriate info next to each bucket turns each node into a histogram
  Entire B+ tree becomes a hierarchical histogram

The Future: Histograms and Tree Indices
  - Index fanout decreases
  + Indexing and estimation in one
  + Incremental estimation with increasing estimate accuracy

The Future: Histograms and Tree Indices
  B+ tree node is equi-depth histogram
  What kind of trees with other constraints?
    V-optimal
    Max-diff
    Compressed
  Unbalanced trees: exact search slower
  Unbalanced trees: approximate answers more accurate

The Future: Histograms and Tree Indices
  Take into account query frequency
  Represent popular values more accurately higher in the tree
  New hierarchical histograms/indices may be faster than traditional ones
Conclusions
  Histograms very successful in databases
  Possibly best tradeoff between
    Simplicity
    Efficiency
    Effectiveness
    Applicability

The Future
  New approaches to some characteristics
  Untouched foundational problems
  The next 10 years even more exciting!
More informationATYPICAL RELATIONAL QUERY OPTIMIZER
14 ATYPICAL RELATIONAL QUERY OPTIMIZER Life is what happens while you re busy making other plans. John Lennon In this chapter, we present a typical relational query optimizer in detail. We begin by discussing
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 17, March 24, 2015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part V External Sorting How to Start a Company in Five (maybe
More informationHierarchical Minimum Spanning Trees for Lossy Image Set Compression
Hierarchical Minimum Spanning Trees for Lossy Image Set Compression Anthony Schmieder, Barry Gergel, and Howard Cheng Department of Mathematics and Computer Science University of Lethbridge, Alberta, Canada
More information(Lec 14) Placement & Partitioning: Part III
Page (Lec ) Placement & Partitioning: Part III What you know That there are big placement styles: iterative, recursive, direct Placement via iterative improvement using simulated annealing Recursive-style
More informationHOW TO USE THIS BOOK... V 1 GETTING STARTED... 2
TABLE OF CONTENTS HOW TO USE THIS BOOK...................... V 1 GETTING STARTED.......................... 2 Introducing Data Analysis with Excel...2 Tour the Excel Window...3 Explore the Ribbon...4 Using
More informationEfficient Range Query Processing on Uncertain Data
Efficient Range Query Processing on Uncertain Data Andrew Knight Rochester Institute of Technology Department of Computer Science Rochester, New York, USA andyknig@gmail.com Manjeet Rege Rochester Institute
More informationTemporal Aggregation and Join
TSDB15, SL05 1/49 A. Dignös Temporal and Spatial Database 2014/2015 2nd semester Temporal Aggregation and Join SL05 Temporal aggregation Span, instant, moving window aggregation Aggregation tree, balanced
More informationDifferentially Private H-Tree
GeoPrivacy: 2 nd Workshop on Privacy in Geographic Information Collection and Analysis Differentially Private H-Tree Hien To, Liyue Fan, Cyrus Shahabi Integrated Media System Center University of Southern
More informationAn Efficient Transformation for Klee s Measure Problem in the Streaming Model Abstract Given a stream of rectangles over a discrete space, we consider the problem of computing the total number of distinct
More informationCluster Analysis and Visualization. Workshop on Statistics and Machine Learning 2004/2/6
Cluster Analysis and Visualization Workshop on Statistics and Machine Learning 2004/2/6 Outlines Introduction Stages in Clustering Clustering Analysis and Visualization One/two-dimensional Data Histogram,
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More informationCS570: Introduction to Data Mining
CS570: Introduction to Data Mining Fall 2013 Reading: Chapter 3 Han, Chapter 2 Tan Anca Doloc-Mihu, Ph.D. Some slides courtesy of Li Xiong, Ph.D. and 2011 Han, Kamber & Pei. Data Mining. Morgan Kaufmann.
More informationQuery Optimization. Vishy Poosala Bell Labs
Query Optimization Vishy Poosala Bell Labs 1 Outline Introduction Necessary Details Cost Estimation Result Size Estimation Standard approach for query optimization Other ways Related Concepts 2 Given a
More informationAn Adaptive and Deterministic Method for Initializing the Lloyd-Max Algorithm
An Adaptive and Deterministic Method for Initializing the Lloyd-Max Algorithm Jared Vicory and M. Emre Celebi Department of Computer Science Louisiana State University, Shreveport, LA, USA ABSTRACT Gray-level
More informationWavelet-Based Histograms for Selectivity Estimation
Wavelet-Based Histograms for Selectivity Estimation Yossi Matias Department of Computer Science Tel Aviv University Tel Aviv 69978, Israel matias+www@math.tau.ac.il Jeffrey Scott Vitter Purdue University
More informationData Analytics for. Transmission Expansion Planning. Andrés Ramos. January Estadística II. Transmission Expansion Planning GITI/GITT
Data Analytics for Andrés Ramos January 2018 1 1 Introduction 2 Definition Determine which lines and transformers and when to build optimizing total investment and operation costs 3 Challenges for TEP
More informationImage Segmentation. Shengnan Wang
Image Segmentation Shengnan Wang shengnan@cs.wisc.edu Contents I. Introduction to Segmentation II. Mean Shift Theory 1. What is Mean Shift? 2. Density Estimation Methods 3. Deriving the Mean Shift 4. Mean
More informationClustering, Histograms, Sampling, MDS, and PCA
Clustering, Histograms, Sampling, MDS, and PCA Class 11 1 Recall: The MRV Model 2 1 Recall: Simplification Simplification operators - today! Simplification operands Data space (structure level) Data item
More informationFractal Compression. Related Topic Report. Henry Xiao. Queen s University. Kingston, Ontario, Canada. April 2004
Fractal Compression Related Topic Report By Henry Xiao Queen s University Kingston, Ontario, Canada April 2004 Fractal Introduction Fractal is first introduced in geometry field. The birth of fractal geometry
More information! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationChapter 13: Query Processing Basic Steps in Query Processing
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs
More information