CS 347 Parallel and Distributed Data Processing

Spring 2016
Notes 4: Query Optimization

Query Optimization
  - Cost estimation
  - Strategies for exploring plans
  [Figure: query Q and its minimum-cost plan]

Cost estimation
  - Based on estimating result sizes
  - Like in centralized databases
  - But # of IOs may not be the best metric
  - E.g., transmission time may dominate
  [Figure: work at site 1, work at site 2, and the answer, laid out along a time/$ axis]

Another reason why # of IOs is not enough: parallelism
  [Figure: Plan A vs. Plan B; tasks at site 1, site 2, and site 3 costing 50 IOs, 70 IOs, and 50 IOs]

Cost metrics
  - E.g., IOs, bytes transmitted, $
      Additive
  - Response time metric
      Not additive
      Need scheduling and dependency info
      Skew is important

Also take into account
  - Start up cost
  - Data distribution cost/time
  - Resource contention (for memory, disk, network)
  - Cost of assembling results

Response time example
  [Figure: timeline across four sites showing start up, distribution, searching + sending results, and final processing]
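The difference between an additive metric and response time can be sketched in a few lines. This is only an illustration: the 50/70/50 IO figures come from the slide, while the function names and the assumption that the three tasks are independent and run fully in parallel are ours.

```python
# Per-site work in IOs, taken from the slide's figure.
site_ios = {"site 1": 50, "site 2": 70, "site 3": 50}

def total_cost(costs):
    """Additive metric: all work counts, no matter where it runs."""
    return sum(costs.values())

def response_time(costs):
    """Response time under the ASSUMPTION that the tasks are independent
    and run fully in parallel: the slowest site determines the finish time."""
    return max(costs.values())

# A hypothetical, more skewed split of the same total work.
skewed_ios = {"site 1": 10, "site 2": 150, "site 3": 10}

print(total_cost(site_ios), response_time(site_ios))      # 170 70
print(total_cost(skewed_ios), response_time(skewed_ios))  # 170 150
```

Both splits have the same additive cost, but the skewed one finishes much later, which is why skew matters for the response time metric. With real dependencies between tasks, the maximum would have to be replaced by a critical-path computation.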

Search Strategies
  - Exhaustive (with pruning)
  - Hill climbing (greedy)
  - Query separation

Exhaustive Search
  - Consider all query plans (given a set of techniques for operators)
  - Prune some plans
      Heuristics

Exhaustive Search: Example
  [Figure: plan tree for an example join on attributes A and B; branches are labeled "ship" and "semi join"; one branch is pruned because a cross-product is not necessary, another because it puts the larger relation first]

Search Strategies
  In generating plans, keep goal in mind
  - E.g., if goal is parallelism (in system with fast network)
      Consider partitioning relations first
  - E.g., if goal is reduction of network traffic
      Consider semi-joins
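Exhaustive enumeration with one pruning heuristic can be sketched as follows. Everything concrete here is an assumption made for illustration: the three-relation chain schema, the relation names, and the restriction to left-deep join orders. Only the heuristic itself (drop plans that need a cross-product) comes from the slide.

```python
from itertools import permutations

# Hypothetical schema: a chain join R(A) - S(A, B) - T(B).
attrs = {"R": {"A"}, "S": {"A", "B"}, "T": {"B"}}

def needs_cross_product(order):
    """True if some relation in a left-deep join order shares no attribute
    with the relations joined before it."""
    seen = set(attrs[order[0]])
    for rel in order[1:]:
        if not (seen & attrs[rel]):
            return True
        seen |= attrs[rel]
    return False

# Consider all join orders, then prune the ones that force a cross-product.
all_orders = list(permutations(attrs))
kept = [order for order in all_orders if not needs_cross_product(order)]

for order in kept:
    print(" -> ".join(order))
```

Of the six possible orders, the two that start by pairing R with T are pruned. A fuller optimizer would attach a cost estimate to each surviving order and apply further heuristics, such as the slide's rule about the larger relation.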

[Figure: the initial plan (x) placed between better plans and worse plans; steps 1 and 2 move toward the better plans]

Example
  R ⋈_A S ⋈_B T ⋈_C V

  relation  site  size
  R         1     10
  S         2     20
  T         3     30
  V         4     40

  tuple size = 1

Goal: minimize data transmission

Initial plan
  - Send relations to one site
  - What site do we send all relations to?
      To site 1: cost = 20 + 30 + 40 = 90
      To site 2: cost = 10 + 30 + 40 = 80
      To site 3: cost = 10 + 20 + 40 = 70
      To site 4: cost = 10 + 20 + 30 = 60
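The choice of the initial site can be checked with a short calculation. The per-site sizes used below (10, 20, 30, 40 at sites 1 to 4) are an assumption: they are the only values consistent with the costs 90, 80, 70, and 60 stated on the slide when each site holds exactly one relation.

```python
# ASSUMED sizes of the relation stored at each site (site -> size in tuples).
sizes = {1: 10, 2: 20, 3: 30, 4: 40}

def ship_all_cost(target):
    """Transmission cost of sending every other relation to `target`."""
    return sum(size for site, size in sizes.items() if site != target)

costs = {site: ship_all_cost(site) for site in sizes}
best_site = min(costs, key=costs.get)

print(costs)      # {1: 90, 2: 80, 3: 70, 4: 60}
print(best_site)  # 4
```

The cheapest target is the site that already holds the largest relation, because that relation never has to move.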

P0: R (1 → 4)
    S (2 → 4)
    T (3 → 4)
    Compute R ⋈ S ⋈ T ⋈ V at site 4

Local search
  - Consider sending each relation to neighbor

Assume
  size(R ⋈ S) = 20
  size(S ⋈ T) = 5
  size(T ⋈ V) = 1

Option A, Option B
  cost = 30 vs. cost = 40: worse off
  cost = 30 vs. cost = 30: no savings

Option C, Option D
  cost = 50 vs. cost = 35: win
  cost = 50 vs. cost = 25: bigger win

P1: S (2 → 3)
    α = S ⋈ T
    R (1 → 4)
    α (3 → 4)
    Compute answer at site 4

Repeat local search
  - Treat α = S ⋈ T as relation
  [Figure: alternative moves for α between sites 1 and 3]
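One step of the local search can be sketched as follows. This is an illustration under stated assumptions, not the lecture's own code: the relation names R, S, T, their sizes (10, 20, 30), and their sites (1, 2, 3) are assumed, and the join sizes 20 and 5 are the ones given above. Each move ships one relation to its neighbor, joins there, and ships the join result to site 4.

```python
# ASSUMED relation sizes and sites; join sizes as stated in the example.
size = {"R": 10, "S": 20, "T": 30}
site = {"R": 1, "S": 2, "T": 3}
join_size = {("R", "S"): 20, ("S", "T"): 5}

def direct_cost(a, b):
    """Cost of the current plan for this pair: ship both straight to site 4."""
    return size[a] + size[b]

def move_cost(mover, stayer):
    """Ship `mover` to `stayer`'s site, join there, ship the result to site 4."""
    key = tuple(sorted((mover, stayer)))
    return size[mover] + join_size[key]

# Evaluate every neighbor move as (saving, mover, stayer) and take the best one.
candidates = []
for a, b in join_size:
    base = direct_cost(a, b)
    for mover, stayer in ((a, b), (b, a)):
        candidates.append((base - move_cost(mover, stayer), mover, stayer))

best_saving, mover, stayer = max(candidates)
print(f"send {mover} from site {site[mover]} to site {site[stayer]}, "
      f"saving {best_saving}")
```

Under these assumptions the four moves cost 40, 30, 35, and 25, matching the "worse off", "no savings", "win", and "bigger win" outcomes, and the greedy step picks the move with the largest saving.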

Hill climbing may miss best plan
  - E.g., best plan could be
      P_best: T (3 → 4)                   cost = 30
              β = T ⋈ V
              β (4 → 2)                   cost = 1
              β' = β ⋈ S
              β' (2 → 1)                  cost = 1
              β'' = β' ⋈ R
              β'' (1 → 4) (optional)      cost = 1
              Compute answer              33 total
  - Costs could be low because β is very selective
  [Figure: the sites holding β'', β', V, and β]

Search Strategies
  - Exhaustive (with pruning)
  - Hill climbing (greedy)
  - Query separation

Query Separation
  - Separate query into 2 or more steps
  - Optimize each step independently
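The gap between the greedy result and the best plan can be made concrete by adding up shipments. The shipment lists below are a sketch: the step costs of the best plan (30, 1, 1, 1) are from the slide, while the sizes of R, S, T (10, 20, 30) and of α (5) are assumptions consistent with the earlier example.

```python
# Each plan is a list of (shipment, tuples shipped). Sizes of R, S, T and
# alpha are ASSUMED; the best plan's step costs are taken from the slide.
p0 = [("R 1->4", 10), ("S 2->4", 20), ("T 3->4", 30)]
p1 = [("S 2->3", 20), ("R 1->4", 10), ("alpha 3->4", 5)]
p_best = [("T 3->4", 30), ("beta 4->2", 1), ("beta' 2->1", 1), ("beta'' 1->4", 1)]

def plan_cost(plan):
    """Total data transmitted by a plan."""
    return sum(shipped for _, shipped in plan)

for name, plan in (("P0", p0), ("P1", p1), ("P_best", p_best)):
    print(name, plan_cost(plan))
```

The best plan begins by shipping T, the largest of the three relations that move. A greedy search that only accepts immediately improving moves from P0 has no reason to start there, so it does not reach this plan.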

Query Separation: Example
  [Figure: query tree with σ_c1 at the root, a join on A below it, and σ_c2 and σ_c3 applied to the two input relations]

Query Separation: Simple queries technique
  1. Compute R' = Π_A [σ_c2 R]
             S' = Π_A [σ_c3 S]
  2. Compute J = R' ⋈ S'
       (steps 1 and 2: compute the A values in the answer first)
  3. Compute answer
       σ_c1 { [J ⋈ σ_c2 R] ⋈ [J ⋈ σ_c3 S] }
       (step 3: get tuples from sites matching A and compute answer next)

Query Separation: Simple query
  - Relations have a single attribute
  - Output has a single attribute
  - E.g., J = R' ⋈ S'
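The three steps of the simple queries technique can be sketched on toy data. Everything concrete here is hypothetical: the relation contents, the conditions c1, c2, c3, and the representation of tuples as (A, payload) pairs. The point is only that the distributed join in step 2 runs on key values alone.

```python
# Hypothetical relations stored at two sites; tuples are (A, payload).
R = [(1, "r1"), (2, "r2"), (3, "r3"), (4, "r4")]
S = [(2, "s2"), (3, "s3"), (4, "s4"), (5, "s5")]

# Hypothetical conditions: c2 and c3 are local selections, c1 is applied last.
def c2(t): return t[0] >= 2
def c3(t): return t[0] != 4
def c1(joined): return joined[0] != 3

# Step 1 (local processing): select locally and project onto A.
r_keys = {a for a, _ in filter(c2, R)}
s_keys = {a for a, _ in filter(c3, S)}

# Step 2 (simple query): with single-attribute inputs the join on A
# reduces to a set intersection of key values.
J = r_keys & s_keys

# Step 3 (final processing): fetch only the tuples whose A value is in J,
# join them, and apply c1.
r_match = [t for t in R if c2(t) and t[0] in J]
s_match = [t for t in S if c3(t) and t[0] in J]
answer = [(a, rp, sp) for a, rp in r_match for b, sp in s_match
          if a == b and c1((a, rp, sp))]

print(J)       # {2, 3}
print(answer)  # [(2, 'r2', 's2')]
```

Only the key sets travel between sites in step 2. The full tuples are moved in step 3, and only for keys already known to be in the join result.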

Query Separation: Idea
  1. Decompose query
       Local processing
       Simple query (or queries)
       Final processing
  2. Optimize simple query

Query Separation: Philosophy
  - Hard part is distributed join
  - Do this part with only keys; get the rest of the data later
  - Simpler to optimize simple queries

Summary
  - Cost estimation
  - Optimization strategies
      Exhaustive (with pruning)
      Hill climbing (greedy)
      Query separation

Words of Wisdom
  - Optimization is like chess playing
  - May have to make sacrifices for later gains
      Move data, partition relations, build indexes


Artificial Intelligence. Chapters Reviews. Readings: Chapters 3-8 of Russell & Norvig. Artificial Intelligence Chapters Reviews Readings: Chapters 3-8 of Russell & Norvig. Topics covered in the midterm Solving problems by searching (Chap. 3) How to formulate a search problem? How to measure

More information

GRASP. Greedy Randomized Adaptive. Search Procedure

GRASP. Greedy Randomized Adaptive. Search Procedure GRASP Greedy Randomized Adaptive Search Procedure Type of problems Combinatorial optimization problem: Finite ensemble E = {1,2,... n } Subset of feasible solutions F 2 Objective function f : 2 Minimisation

More information

CS154, Lecture 18: PCPs, Hardness of Approximation, Approximation-Preserving Reductions, Interactive Proofs, Zero-Knowledge, Cold Fusion, Peace in

CS154, Lecture 18: PCPs, Hardness of Approximation, Approximation-Preserving Reductions, Interactive Proofs, Zero-Knowledge, Cold Fusion, Peace in CS154, Lecture 18: PCPs, Hardness of Approximation, Approximation-Preserving Reductions, Interactive Proofs, Zero-Knowledge, Cold Fusion, Peace in the Middle East There are thousands of NP-complete problems

More information

Traveling Salesman Problem (TSP) Input: undirected graph G=(V,E), c: E R + Goal: find a tour (Hamiltonian cycle) of minimum cost

Traveling Salesman Problem (TSP) Input: undirected graph G=(V,E), c: E R + Goal: find a tour (Hamiltonian cycle) of minimum cost Traveling Salesman Problem (TSP) Input: undirected graph G=(V,E), c: E R + Goal: find a tour (Hamiltonian cycle) of minimum cost Traveling Salesman Problem (TSP) Input: undirected graph G=(V,E), c: E R

More information

Routing Outline. EECS 122, Lecture 15

Routing Outline. EECS 122, Lecture 15 Fall & Walrand Lecture 5 Outline EECS, Lecture 5 Kevin Fall kfall@cs.berkeley.edu Jean Walrand wlr@eecs.berkeley.edu Definition/Key Questions Distance Vector Link State Comparison Variations EECS - Fall

More information

CMPUT 391 Database Management Systems. Query Processing: The Basics. Textbook: Chapter 10. (first edition: Chapter 13) University of Alberta 1

CMPUT 391 Database Management Systems. Query Processing: The Basics. Textbook: Chapter 10. (first edition: Chapter 13) University of Alberta 1 CMPUT 391 Database Management Systems Query Processing: The Basics Textbook: Chapter 10 (first edition: Chapter 13) Based on slides by Lewis, Bernstein and Kifer University of Alberta 1 External Sorting

More information

Search and Optimization

Search and Optimization Search and Optimization Search, Optimization and Game-Playing The goal is to find one or more optimal or sub-optimal solutions in a given search space. We can either be interested in finding any one solution

More information

An Evolutionary Algorithm for the Multi-objective Shortest Path Problem

An Evolutionary Algorithm for the Multi-objective Shortest Path Problem An Evolutionary Algorithm for the Multi-objective Shortest Path Problem Fangguo He Huan Qi Qiong Fan Institute of Systems Engineering, Huazhong University of Science & Technology, Wuhan 430074, P. R. China

More information

Mobile and Heterogeneous databases Distributed Database System Query Processing. A.R. Hurson Computer Science Missouri Science & Technology

Mobile and Heterogeneous databases Distributed Database System Query Processing. A.R. Hurson Computer Science Missouri Science & Technology Mobile and Heterogeneous databases Distributed Database System Query Processing A.R. Hurson Computer Science Missouri Science & Technology 1 Note, this unit will be covered in four lectures. In case you

More information

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1 CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query Sub-System Queries Select * From Blah B Where B.blah = blah Query Parser Query Optimizer Plan Generator Plan Cost

More information

Advanced algorithms. topological ordering, minimum spanning tree, Union-Find problem. Jiří Vyskočil, Radek Mařík 2012

Advanced algorithms. topological ordering, minimum spanning tree, Union-Find problem. Jiří Vyskočil, Radek Mařík 2012 topological ordering, minimum spanning tree, Union-Find problem Jiří Vyskočil, Radek Mařík 2012 Subgraph subgraph A graph H is a subgraph of a graph G, if the following two inclusions are satisfied: 2

More information

Local Search and Optimization Chapter 4. Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld )

Local Search and Optimization Chapter 4. Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld ) Local Search and Optimization Chapter 4 Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld ) 1 Outline Local search techniques and optimization Hill-climbing

More information

Selectivity Estimation for Extraction Operators over Text Data

Selectivity Estimation for Extraction Operators over Text Data Selectivity Estimation for Extraction Operators over Text Data Daisy Zhe Wang, Long Wei, Yunyao Li, Frederick Reiss, and Shivakumar Vaithyanathan University of California, Berkeley and IBM Almaden Research

More information

Semi-Independent Partitioning: A Method for Bounding the Solution to COP s

Semi-Independent Partitioning: A Method for Bounding the Solution to COP s Semi-Independent Partitioning: A Method for Bounding the Solution to COP s David Larkin University of California, Irvine Abstract. In this paper we introduce a new method for bounding the solution to constraint

More information

CS54100: Database Systems

CS54100: Database Systems CS54100: Database Systems Query Optimization 26 March 2012 Prof. Chris Clifton Query Optimization --> Generating and comparing plans Query Generate Plans Pruning x x Estimate Cost Cost Select Pick Min

More information

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE COMP 62421 Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE Querying Data on the Web Date: Wednesday 24th January 2018 Time: 14:00-16:00 Please answer all FIVE Questions provided. They amount

More information

A Fast Randomized Algorithm for Multi-Objective Query Optimization

A Fast Randomized Algorithm for Multi-Objective Query Optimization A Fast Randomized Algorithm for Multi-Objective Query Optimization Immanuel Trummer and Christoph Koch {firstname}.{lastname}@epfl.ch École Polytechnique Fédérale de Lausanne ABSTRACT Query plans are compared

More information

LR(0) Parsing Summary. LR(0) Parsing Table. LR(0) Limitations. A Non-LR(0) Grammar. LR(0) Parsing Table CS412/CS413

LR(0) Parsing Summary. LR(0) Parsing Table. LR(0) Limitations. A Non-LR(0) Grammar. LR(0) Parsing Table CS412/CS413 LR(0) Parsing ummary C412/C41 Introduction to Compilers Tim Teitelbaum Lecture 10: LR Parsing February 12, 2007 LR(0) item = a production with a dot in RH LR(0) state = set of LR(0) items valid for viable

More information

Parallel Query Optimisation

Parallel Query Optimisation Parallel Query Optimisation Contents Objectives of parallel query optimisation Parallel query optimisation Two-Phase optimisation One-Phase optimisation Inter-operator parallelism oriented optimisation

More information

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska Preprocessing Short Lecture Notes cse352 Professor Anita Wasilewska Data Preprocessing Why preprocess the data? Data cleaning Data integration and transformation Data reduction Discretization and concept

More information

Graph and Heuristic Search. Lecture 3. Ning Xiong. Mälardalen University. Agenda

Graph and Heuristic Search. Lecture 3. Ning Xiong. Mälardalen University. Agenda Graph and Heuristic earch Lecture 3 Ning iong Mälardalen University Agenda Uninformed graph search - breadth-first search on graphs - depth-first search on graphs - uniform-cost search on graphs General

More information

Constraint Satisfaction Problems (CSPs) Lecture 4 - Features and Constraints. CSPs as Graph searching problems. Example Domains. Dual Representations

Constraint Satisfaction Problems (CSPs) Lecture 4 - Features and Constraints. CSPs as Graph searching problems. Example Domains. Dual Representations Constraint Satisfaction Problems (CSPs) Lecture 4 - Features and Constraints Jesse Hoey School of Computer Science University of Waterloo January 22, 2018 Readings: Poole & Mackworth (2nd d.) Chapt. 4.1-4.8

More information