Query Processing Strategies in Distributed Database
Kunal Jamsutkar, M.Tech, Department of Computer Engineering and Information Technology, V.J.T.I., Mumbai
Viki Patil, M.Tech, Department of Computer Engineering and Information Technology, V.J.T.I., Mumbai
Dr. B. B. Meshram, Professor, Department of Computer Engineering and Information Technology, V.J.T.I., Mumbai
Journal of Engineering, Computers & Applied Sciences (JEC&AS), Blue Ocean Research Journals

ABSTRACT
Query optimization is an important part of a database management system. In this paper, we study query optimization technology and a number of optimization algorithms commonly used for distributed queries. The aim is to arrive at an optimal query processing plan for a given distributed query. In this approach, query plans in which the required data reside close to each other are considered more efficient, and such plans therefore result in efficient query processing.

Keywords: Database, query processing, distributed query strategy, system model, query processing cost, cost measures.

Introduction
In recent years, the development of computer networks and database technology has made distributed databases more and more widely used. As applications expand, data queries become increasingly complex and efficiency requirements become increasingly high, so query processing is a key issue in a distributed database system. In a distributed database environment, data are stored at different sites connected through a network. A distributed database management system (DDBMS) supports the creation and maintenance of a distributed database. The research literature proposes a wide variety of query optimization algorithms. Yu and Chang give a comprehensive overview of query optimization techniques for distributed database management systems [20]. However, these overviews do not attempt to develop a model of query optimization that explains and presents the algorithms in a uniform way.
Such an understanding is needed if we want to change or extend existing algorithms to adapt them to new requirements. In this research we consider query processing algorithms for a distributed database system, a topic on which much research has been done (see [2], [3]). A distributed database can also provide increased reliability and performance. All database systems must be able to respond to requests for information from the user, i.e., to process queries. This paper covers how a DBMS processes queries and the methods it uses to optimize their performance, and in certain sections various concepts are illustrated with an example. Many optimization algorithms differ in their computational behavior while also reflecting aspects of their implementation environment. The purpose of this paper is therefore to explain all of them through a few simple concepts. Finally, we summarize our findings and discuss future work.

General Aspects of Optimization
To provide a better understanding, we first explain what we mean by the terms query, query processing, and query optimization. We then discuss the elements of query optimization that are common to all the optimization algorithms described in the cited papers.

Definitions and Examples

A. What Is a Query?
A database query is an instruction to a DBMS to update or retrieve specific data in the physical storage medium. The actual updating and retrieval of data is performed through various low-level operations. For a relational DBMS, examples of such operations are the relational algebra operations project, join, select, and Cartesian product.

B. The Query Processor
A query passes through three phases while the DBMS processes it:
1. Parsing and translation
2. Optimization
3. Evaluation
Most queries submitted to a DBMS are written in a high-level language such as SQL.
During the parsing and translation stage, the human-readable form of the query is translated into forms usable by the DBMS. These can take the form of a relational algebra expression, a query tree, or a query graph. Consider the following SQL query:

SELECT make FROM vehicles WHERE make = 'Toyota'

This query can be translated into either of the following equivalent relational algebra expressions:

$\sigma_{make='Toyota'}(\pi_{make}(vehicles))$
$\pi_{make}(\sigma_{make='Toyota'}(vehicles))$

It can also be represented as a query graph (Fig. 2).

Fig. 2. Query graph for the vehicles query

After parsing and translation into a relational algebra expression, the query is transformed into a form that the optimization engine can handle, usually a query tree or graph. The optimization engine then performs various analyses on the query and generates a number of valid evaluation plans. From these, it determines the most appropriate plan to execute. The selected evaluation plan is passed to the DBMS query-execution engine (also referred to as the runtime database processor), where the plan is executed and the results are returned.

B.1- Parsing and Translating the Query
The first step in processing a query submitted to a DBMS is to convert the query into a form usable by the query processing engine. High-level query languages such as SQL represent a query as a string, or sequence, of characters. Certain sequences of characters represent various types of tokens, such as keywords, operators, operands, and literal strings. As in all languages, rules (syntax and grammar) govern how the tokens can be combined into understandable (i.e., valid) statements. The primary job of the parser is to extract the tokens from the raw string of characters. It then translates them into the corresponding internal data elements (i.e., relational algebra operations and operands) and structures (i.e., query tree, query graph). The last job of the parser is to verify the validity and syntax of the original query string.

B.2- Optimizing the Query
In this stage, the query processor applies rules to the internal data structures of the query to transform these structures into equivalent, but more efficient, representations.
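The two equivalent relational algebra expressions given earlier for the vehicles query illustrate what "equivalent, but more efficient" means. The sketch below evaluates both on a small in-memory relation and makes several assumptions. Relations are Python lists of dicts. The operator names select, project, and run_plans are hypothetical. The size of the intermediate result stands in for cost. Both plans return the same answer, which is what lets the optimizer substitute one for the other. They differ only in how much data flows between operators.

```python
from typing import Callable

# A relation is modeled as a list of rows (an assumption for illustration only;
# a real DBMS operates on pages and files, not Python lists).
Row = dict[str, str]


def select(rows: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """sigma: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def project(rows: list[Row], columns: list[str]) -> list[Row]:
    """pi: keep only the named columns, eliminating duplicates (set semantics)."""
    seen = set()
    out: list[Row] = []
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out


def run_plans(vehicles: list[Row]) -> dict[str, tuple[list[Row], int]]:
    """Evaluate both plans for SELECT make FROM vehicles WHERE make = 'Toyota'.
    Returns, per plan, the final result and the size of its intermediate result."""

    def is_toyota(r: Row) -> bool:
        return r["make"] == "Toyota"

    # Plan A: sigma_{make='Toyota'}(pi_{make}(vehicles))
    a_mid = project(vehicles, ["make"])
    plan_a = select(a_mid, is_toyota)
    # Plan B: pi_{make}(sigma_{make='Toyota'}(vehicles))
    b_mid = select(vehicles, is_toyota)
    plan_b = project(b_mid, ["make"])
    return {"A": (plan_a, len(a_mid)), "B": (plan_b, len(b_mid))}
```

Consider a hypothetical six-row relation containing two Toyota, three Honda, and one Ford vehicle. Both plans return [{'make': 'Toyota'}]. Plan A passes 3 intermediate rows (the distinct makes) to the selection, while plan B passes 2 (the Toyota rows) to the projection. Which plan is cheaper in general depends on the data, which is why the optimizer relies on heuristics or cost estimates.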
The rules can be based on mathematical models of the relational algebra expression and tree (heuristics), on cost estimates of different algorithms applied to operations, or on the semantics within the query and the relations it involves. The query optimization engine selects the proper rules to apply and decides when and how to apply them.

B.3- Evaluating the Query
The final step in processing a query is the evaluation phase. The best evaluation plan generated by the optimization engine is selected and then executed. (Note that a query can be executed in several ways. Besides simple sequential processing, some of a query's individual operations can be processed in parallel, either as independent processes or as interdependent pipelines of processes or threads. Regardless of the method chosen, the results should be the same.)

C. Query Processing
Query processing is defined as the activities involved in parsing, validating, optimizing, and executing a query. The first aim of query processing is to transform a query written in a high-level language (e.g., SQL) into a correct and efficient execution strategy expressed in a low-level language that implements relational algebra. The second aim is to find information in one or more databases and deliver it to the user quickly and efficiently.

Fig. 3. Flow of query processing: high-level user query → query processor → low-level data manipulation commands

D. Query Optimization
Query optimization is defined as the activity of choosing an efficient execution strategy for processing a query, and it is a part of query processing. The main aims of query optimization are to choose a transformation that minimizes resource usage and to reduce both the total execution time and the response time of the query.

Distributed Query Processing Methodology
Distributed query processing consists of four stages:
1. Query decomposition
2. Data localization
3. Global optimization
4. Local optimization

D.1- Query decomposition
This stage takes a calculus query on global relations as input and produces an algebraic query as output. It is divided into four steps:
1. Normalization: manipulate the query quantifiers and qualification.
2. Analysis: detect and reject incorrect queries (possible for only a subset of relational calculus).
3. Simplification: eliminate redundant predicates.
4. Restructuring: transform the calculus query into an algebraic query. More than one translation is possible, so transformation rules are used.

D.2- Data localization
This stage takes an algebraic query on distributed relations as input and produces a fragment query as output. The fragments involved in the query are determined here.

D.3- Global optimization
This stage takes the fragment query as input and produces an optimized fragment query as output. The best global schedule is found here.

D.4- Local optimization
This stage takes the best global execution schedule as input and produces locally optimized queries as output. It contains two sub-steps: selecting the best access path, and applying centralized optimization techniques.

E. Distributed Query Optimization
Distributed query optimization is defined as finding an efficient execution strategy in a distributed network. Query optimization is more difficult in a distributed environment. Distributed query optimization has three components: access method, join criteria, and transmission costs.
Access Method: the methods used to access data in a distributed environment, such as hashing and indexing.
Join Criteria: in a distributed database, data reside at different sites. Join criteria determine how the data from the different sites are joined to obtain the optimized result.
Transmission Costs: if data from multiple sites must be joined to satisfy a single query, then the cost of transmitting the results of intermediate steps must be factored into the equation. At times, it may be more cost-effective simply to ship entire tables across the network so that processing can occur at a single site, thereby reducing overall transmission costs. This component of query optimization is an issue only in a distributed environment.
Other distributed query optimization issues include the types of optimizers, optimization granularity, network topologies, and optimization timing.

Fig. 4. Query processing methodology

3. Optimal Distribution Strategies for Simple Queries
This section describes query optimization algorithms that derive optimal distribution strategies for a class of distributed queries called simple queries.
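Both the transmission-cost component described above and the algorithms of this section need a model of communication cost. The sketch below assumes a linear model in which sending one message of X bytes costs C(X) = c0 + c1·X. This model and its constants are illustrative assumptions, not values from this paper. The sketch compares three hypothetical ways of moving data to the join site: one message per qualifying row, the whole qualifying result set in one message, or the entire base table. The function names strategy_costs and cheapest are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCostModel:
    """Assumed cost of sending one message of X bytes: C(X) = c0 + c1 * X."""
    c0: float  # fixed start-up cost per message (hypothetical units)
    c1: float  # cost per byte transmitted

    def message(self, nbytes: int) -> float:
        return self.c0 + self.c1 * nbytes


def strategy_costs(model: LinearCostModel, table_bytes: int,
                   qualifying_rows: int, row_bytes: int) -> dict[str, float]:
    """Transmission cost of three illustrative ways to move data to the join site."""
    return {
        # one message per qualifying row: pays the start-up cost c0 every time
        "row_at_a_time": qualifying_rows * model.message(row_bytes),
        # all qualifying rows batched into a single message
        "result_set": model.message(qualifying_rows * row_bytes),
        # the entire base table, so all processing can happen at one site
        "whole_table": model.message(table_bytes),
    }


def cheapest(costs: dict[str, float]) -> str:
    """Name of the strategy with the lowest estimated transmission cost."""
    return min(costs, key=costs.get)
```

For example, take c0 = 100 and c1 = 1 (arbitrary units), a 100,000-byte table, and 1,000 qualifying rows of 50 bytes each. The costs are then 150,000 (row at a time), 50,100 (result set), and 100,100 (whole table), so batching the result set is cheapest. If the qualifying result grows to 3,000 rows, the result set costs 150,100 and shipping the whole table (100,100) becomes cheapest. This is the situation described under Transmission Costs.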
Various algorithms are used for query optimization. Algorithm PARALLEL [3] derives a minimal response time distribution strategy for any given simple query. Algorithm SERIAL [3] derives a strategy that transmits each relation in a serial order. Algorithm GENERAL minimizes response time or total time and has three versions:
A. Response time version
B. Total time version
C. Handling redundant data transmission
Algorithm-S is a static algorithm, as are PARALLEL, SERIAL, GENERAL, and D. In a static algorithm, the strategy is generated before any transmission or intersite joining takes place. The algorithm must therefore include some method for estimating the effect of a semijoin on the parameters.

Related Work
In GENERAL, the query is decomposed into single-joining-attribute subqueries, and candidate schedules are generated for each subquery separately. There is an integration step but no synchronization step. By contrast, Algorithm-S uses a more precise interpretation of attribute independence. This interpretation takes into account forced reductions in the projected size of nonjoining attributes with low value multiplicity (keys, for instance). Since reductions are not restricted to single attributes, decomposition into subqueries is no longer desirable and is not done. The integration step that follows is very similar to that of GENERAL. A final SYNCHRONIZE step detects beneficial semijoin delays that might otherwise be missed because integrated schedules are generated for each relation separately. Modifying and extending GENERAL in this way yields different strategies with reduced costs. These substantial cost savings appear under both the response time and the total time minimization objectives. For most complex queries, Algorithm-S produces the same strategy whether the response-time or the total-time version is used.
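The semijoin referred to above can be sketched as follows, together with one simple way a static algorithm might estimate its effect before any data are moved. The estimate assumes that attribute values are uniformly distributed and independent. Under that assumption, the selectivity of shipping the projection of S on the joining attribute is the number of distinct values in that projection divided by the size of the attribute's domain. The function names and the simple benefit test are illustrative assumptions, not the exact procedures of GENERAL or Algorithm-S. In particular, the benefit test ignores per-message start-up cost.

```python
from collections.abc import Iterable


def semijoin(r_rows: list[dict], s_rows: Iterable[dict], attr: str) -> list[dict]:
    """R semijoin S on attr: keep the tuples of R whose attr value occurs in S.
    Only the projection of S on attr has to be shipped to R's site."""
    shipped = {row[attr] for row in s_rows}  # projection of S on attr
    return [row for row in r_rows if row[attr] in shipped]


def estimated_size_after_semijoin(r_cardinality: int, s_distinct: int,
                                  domain_size: int) -> float:
    """Estimate |R semijoin S| before any transmission, assuming attr values are
    uniformly distributed and independent: selectivity = s_distinct / domain_size."""
    if domain_size <= 0:
        raise ValueError("domain_size must be positive")
    selectivity = s_distinct / domain_size
    return r_cardinality * selectivity


def semijoin_is_beneficial(r_bytes: float, selectivity: float,
                           projection_bytes: float) -> bool:
    """Simplified test (ignores per-message start-up cost): the semijoin pays off
    when the bytes it removes from R exceed the bytes of the shipped projection."""
    return r_bytes * (1 - selectivity) > projection_bytes
```

For example, suppose the relation R has 100 tuples and S contains 25 distinct values of an attribute whose domain has 100 values. The estimated size of R after the semijoin is then 100 × 25/100 = 25 tuples, which a static algorithm can compute without moving any data.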
A.1- AN OPTIMIZATION EXAMPLE
Assume that the COURSE and ENROLLMENT tables exist at Site 1 and the STUDENT table exists at Site 2. Consider a query that retrieves the senior students enrolled in physics courses. If all of the tables existed at a single site, or if the DBMS supported distributed multi-site requests, the optimization could be left to the DBMS. However, if the DBMS cannot perform (or optimize) distributed multi-site requests, programmatic optimization must be performed. There are at least six different ways to optimize this three-table join:
Option 1: Start at Site 1 and join COURSE and ENROLLMENT, selecting only physics courses. Move each qualifying row to Site 2 to be joined with STUDENT to see whether any are seniors.
Option 2: Start at Site 1 and join COURSE and ENROLLMENT, selecting only physics courses. Move the entire result set to Site 2 to be joined with STUDENT, checking for senior students only.
Option 3: Start at Site 2 and select only seniors from STUDENT. For each of these, examine the join of COURSE and ENROLLMENT at Site 1 for physics classes.
Option 4: Start at Site 2 and select only seniors from STUDENT. Move the entire result set to Site 1 to be joined with COURSE and ENROLLMENT, checking for physics classes only.
Option 5: Move the COURSE and ENROLLMENT tables to Site 2 and perform a local three-table join.
Option 6: Move the STUDENT table to Site 1 and perform a local three-table join.
Which of these six options will perform best? Unfortunately, the only correct answer is "It depends." The optimal choice depends on:
1. the size of the tables;
2. the size of the result sets, that is, the number of qualifying rows and their length in bytes; and
3. the efficiency of the network.

B. THE ROLE OF INDEXES
The use of indexes can dramatically reduce the execution time of operations such as select and join. The following types of index file structures play different roles in reducing execution time and overhead:
Dense Index: the data file is ordered by the search key, and every search-key value has a separate index record. This structure requires only a single seek to find the first occurrence of a set of contiguous records with the desired search value.
Sparse Index: the data file is ordered by the index search key, and only some of the search-key values have corresponding index records. Each index record's data-file pointer points to the first data-file record with that search-key value. This structure can need more disk accesses than a dense index to find the desired records. In exchange, it requires less storage space and less overhead during insertion and deletion.
Primary Index: the data file is ordered by the attribute that is also the search key of the index file. Primary indices can be dense or sparse. This structure is also referred to as an index-sequential file [5]. It is one of the fastest and most efficient structures for scanning a relation's records in sequential order by key value. Locating a record costs one seek, and because the records are stored contiguously in sorted order, few blocks have to be read. However, performance can degrade quite quickly after large numbers of insertions and deletions, and the only way to restore it is to reorganize the file.
Secondary Index: the data file is ordered by an attribute different from the search key of the index file. Secondary indices must be dense.
Multi-Level Index: an index structure consisting of two or more tiers of records, where the records of an upper tier point to the associated index records of the tier below. The bottom tier's index records contain the pointers to the data-file records. Multi-level indices can be used, for instance, to reduce the number of disk block reads needed during a binary search.
Clustering Index: a two-level index structure. Each first-level record contains the clustering field value in one field and, in a second field, a pointer to a block of second-level records. Each second-level record has one field that points to an actual data-file record or to another second-level block.
B+-tree Index: a multi-level index with a balanced-tree structure. The cost of finding a search-key value in a B+-tree is proportional to the height of the tree. The maximum number of seeks required equals that height, which grows logarithmically with the number of search-key values. On average, this is more than the single seek required by a single-level dense index. However, the B+-tree has a distinct advantage: it is self-maintaining and never needs reorganization, because the tree is kept balanced during insertions and deletions. Many mission-critical applications require high performance with near-100% uptime, which cannot be achieved with structures that require reorganization. The leaves of the B+-tree can also be used to organize the data file itself.

C. New Query Optimization Techniques in Distributed Databases
C.1- Cost-based query optimization: the objective of cost-based query optimization is to estimate the cost of different equivalent query expressions and choose the execution plan with the lowest cost. Cost-based query optimization depends mainly on two factors, the solution space and the cost function.
Solution space: the set of equivalent algebraic expressions for the query.
Cost function: the sum of the I/O cost, CPU cost, and communication cost. The cost function also depends on the particular distributed environment.
Cost-based query optimization in a distributed environment is carried out by taking these factors into account.
C.2- Heuristic-based query optimization: heuristic-based query optimization involves the following steps:
1) Perform selection operations as early as possible.
2) Combine a Cartesian product with a subsequent selection whose predicate represents a join condition into a join operation.
3) Use the associativity of binary operations to rearrange the leaf nodes so that the leaf nodes with the most restrictive selection operations are executed first.
4) Perform projection operations as early as possible.
5) Eliminate duplicate computations.
Heuristic optimization is mainly used to minimize the cost of selecting sites for multi-join operations.

Advantages of Distributed Query Optimization
Distributed query optimization techniques provide exact results in a distributed environment and deliver efficient performance across different distributed networks. On the Internet, these techniques help in searching for exact information and extracting the required data.

D. Query Processing in Relational Database Systems
The conventional method of processing a query in a relational DBMS has three steps. First, the SQL statement is parsed and a relational-calculus-like logical representation of the query is produced. Next, the query optimizer is invoked and generates a query plan. Finally, the query plan is fed into an execution engine that executes it directly, typically with little or no runtime decision-making (Fig. 5). The query plan can be thought of as a tree of unary and binary relational algebra operators. Each operator is annotated with the algorithm to use (e.g., nested-loops join versus hash join) and with how to allocate resources (e.g., memory). In many cases the query plan also includes low-level physical operations, such as sorting and network shipping, that do not affect the logical representation of the data.
Certain query processors consider only restricted types of queries rather than full SQL. A common example is select-project-join (SPJ) queries: an SPJ query essentially represents a single SQL SELECT-FROM-WHERE block with no aggregation or subqueries. An example of an SPJ query:

SELECT * FROM R, S, T, U WHERE R.s = S.a AND S.b = T.b AND T.c = U.c

Fig. 5. Query plan: user query → query optimizer → query plan → query executor → query result

E. Results for Related Algorithms
Based on the above discussion, Table 1 summarizes the complexities of the algorithms.

Table 1. Complexity of the algorithms
Algorithm                      Complexity
1. PARALLEL                    $O(m^2)$
2. SERIAL                      $O(m \log_2 m)$
3. GENERAL
   3.1 Procedure Total         $O(\sigma m^2)$
   3.2 Procedure Response      (not given)
4. Algorithm S                 $O(m \log m)$
where m is the number of relations required in the query.

Conclusion
Algorithm-S is a straightforward modification and extension of Apers, Hevner, and Yao's algorithm GENERAL. In GENERAL, the attribute independence assumption is interpreted to mean that a semijoin has no effect on the projected size of nonjoining attributes. For most complex queries, Algorithm-S yields the same strategy under its response-time and total-time versions. This is significant, since low response time costs and low total time costs are both desirable objectives, even though one may predominate in a given situation. Most real-world data is not well structured: today's databases typically contain much unstructured data, such as text, images, video, and audio, often distributed across computer networks. Processing such data and optimizing queries on it require these distributed query optimization techniques.

References
[1] A. R. Hevner and S. B. Yao, "Query Processing in Distributed Database Systems," IEEE Trans. Software Eng., vol. SE-5, May.
[2] William Perrizo, "A Method for Processing Distributed Database Queries," IEEE Trans. Software Eng., vol. SE-10, no. 4, July 1984.
[3] Peter M. G. Apers, Alan R. Hevner, and S. Bing Yao, "Optimization Algorithms for Distributed Queries," IEEE Trans., 1983.
[4] M. Tamer Ozsu (GTE Laboratories) and Patrick Valduriez (INRIA), "Distributed Database Systems: Where Are We Now?," IEEE.
[5] Sakti Pramanik and David Vineyard, "Optimizing Join Queries in Distributed Databases," IEEE.
[6] Arbee L. P. Chen and Victor O. K. Li, "Improvement Algorithms for Semijoin Query Processing Programs in Distributed Database Systems," IEEE.
[7] Avi Silberschatz, Hank Korth, and S. Sudarshan, Database System Concepts, 4th Edition, McGraw-Hill.
[8] Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems, 2nd Edition, Addison-Wesley.
[9] Donald Kossmann and Konrad Stocker, "Iterative Dynamic Programming: A New Class of Query Optimization Algorithms," ACM Transactions on Database Systems, vol. 25, no. 1, March 2000.
[10] Hsiao-Fei Liu, Ya-Hui Chang, and Kun-Mao Chao, "An Optimal Algorithm for Querying Tree Structures and Its Applications in Bioinformatics," ACM SIGMOD Record, vol. 33, no. 2, June.
[11] Thomas Schwentick, "XPath Query Containment," ACM SIGMOD Record, vol. 33, no. 1, March.
[12] Wesley W. Chu and Paul Hurley, "Optimal Query Processing for Distributed Database Systems," IEEE Trans. Computers, vol. C-31, no. 9, September.
[13] W. Cellary, Z. Krolikowski, and T. Morzy, "Other Comments on Optimization Algorithms for Distributed Queries," IEEE Trans. Software Engineering, vol. 14, no. 4, April.
[14] Paura S. M. Tsai and Arbee L. P. Chen, "Optimizing Queries with Foreign Functions in a Distributed Environment," IEEE Trans. Knowledge and Data Engineering, vol. 14, no. 4, July/August.
[15] Dave D. Straube and M. Tamer Ozsu, "Query Optimization and Execution Plan Generation in Object-Oriented Data Management Systems," IEEE Trans. Knowledge and Data Engineering, vol. 7, no. 2, April.
[16] Stefano Ceri and Georg Gottlob, "Translating SQL into Relational Algebra: Optimization, Semantics, and Equivalence of SQL Queries," IEEE Trans. Software Engineering, vol. SE-11, no. 4, April.
[17] P. A. Bernstein, N. Goodman, E. Wong, G. L. Reeve, and J. Rothnie, "Query Processing in a System for Distributed Databases (SDD-1)," ACM Trans. Database Systems, vol. 6, Dec.
[18] S. Chaudhuri and K. Shim, "Query Optimization in the Presence of Foreign Functions," Proc. Int'l Conf. Very Large Data Bases.
[19] D. Chiu and Y. Ho, "A Methodology for Interpreting Tree Queries into Optimal Semi-Join Expressions," Proc. ACM SIGMOD, May.
[20] C. Yu and C. C. Chang, "Distributed Query Processing," ACM Computing Surveys, vol. 16, no. 4, Dec.
[21] Ming-Syan Chen and Philip S. Yu, "Using Combination of Join and Semijoin Operations for Distributed Query Processing," IEEE Trans. Knowledge and Data Engineering.
[22] Chihping Wang and Ming-Syan Chen, "On the Complexity of Distributed Query Optimization," IEEE Trans. Knowledge and Data Engineering, vol. 8, no. 4, Aug.
[23] Ming-Syan Chen and Philip S. Yu, "Using Join Operations as Reducers for Distributed Query Processing," IEEE Trans. Knowledge and Data Engineering.
[24] Konrad Stocker, Donald Kossmann, Reinhard Braumandl, and Alfons Kemper, "Integrating Semi-Join Reducers into State-of-the-Art Query Processors," IEEE.
More informationHorizontal Aggregations for Mining Relational Databases
Horizontal Aggregations for Mining Relational Databases Dontu.Jagannadh, T.Gayathri, M.V.S.S Nagendranadh. Department of CSE Sasi Institute of Technology And Engineering,Tadepalligudem, Andhrapradesh,
More informationQuery optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.
Database Management Systems DBMS Architecture SQL INSTRUCTION OPTIMIZER MANAGEMENT OF ACCESS METHODS CONCURRENCY CONTROL BUFFER MANAGER RELIABILITY MANAGEMENT Index Files Data Files System Catalog DATABASE
More informationReview. Support for data retrieval at the physical level:
Query Processing Review Support for data retrieval at the physical level: Indices: data structures to help with some query evaluation: SELECTION queries (ssn = 123) RANGE queries (100
More informationHorizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator
Horizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator R.Saravanan 1, J.Sivapriya 2, M.Shahidha 3 1 Assisstant Professor, Department of IT,SMVEC, Puducherry, India 2,3 UG student, Department
More informationQuery Processing SL03
Distributed Database Systems Fall 2016 Query Processing Overview Query Processing SL03 Distributed Query Processing Steps Query Decomposition Data Localization Query Processing Overview/1 Query processing:
More informationAdvanced Databases: Parallel Databases A.Poulovassilis
1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger
More informationFile Structures and Indexing
File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures
More informationCSE 544 Principles of Database Management Systems
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1 Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due
More informationData Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationHash-Based Indexing 165
Hash-Based Indexing 165 h 1 h 0 h 1 h 0 Next = 0 000 00 64 32 8 16 000 00 64 32 8 16 A 001 01 9 25 41 73 001 01 9 25 41 73 B 010 10 10 18 34 66 010 10 10 18 34 66 C Next = 3 011 11 11 19 D 011 11 11 19
More informationPublished by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1
Optimization of Join Queries on Distributed Relations Using Semi-Joins Suresh Sapa 1, K. P. Supreethi 2 1, 2 JNTUCEH, Hyderabad, India Abstract The processing and optimizing a join query in distributed
More informationChapter 12: Indexing and Hashing. Basic Concepts
Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition
More informationMobile and Heterogeneous databases Distributed Database System Query Processing. A.R. Hurson Computer Science Missouri Science & Technology
Mobile and Heterogeneous databases Distributed Database System Query Processing A.R. Hurson Computer Science Missouri Science & Technology 1 Note, this unit will be covered in four lectures. In case you
More informationAlgorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)
Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two
More informationChapter 12: Indexing and Hashing
Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL
More informationOptimization of Queries in Distributed Database Management System
Optimization of Queries in Distributed Database Management System Bhagvant Institute of Technology, Muzaffarnagar Abstract The query optimizer is widely considered to be the most important component of
More informationData Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationDatabase Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building
External Sorting and Query Optimization A.R. Hurson 323 CS Building External sorting When data to be sorted cannot fit into available main memory, external sorting algorithm must be applied. Naturally,
More informationChapter 17: Parallel Databases
Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems
More informationQuery processing and optimization
Query processing and optimization These slides are a modified version of the slides of the book Database System Concepts (Chapter 13 and 14), 5th Ed., McGraw-Hill, by Silberschatz, Korth and Sudarshan.
More informationThe Hibernate Framework Query Mechanisms Comparison
The Hibernate Framework Query Mechanisms Comparison Tisinee Surapunt and Chartchai Doungsa-Ard Abstract The Hibernate Framework is an Object/Relational Mapping technique which can handle the data for applications
More informationDatabase Technology. Topic 7: Data Structures for Databases. Olaf Hartig.
Topic 7: Data Structures for Databases Olaf Hartig olaf.hartig@liu.se Database System 2 Storage Hierarchy Traditional Storage Hierarchy CPU Cache memory Main memory Primary storage Disk Tape Secondary
More informationCSE 444: Database Internals. Lectures 5-6 Indexing
CSE 444: Database Internals Lectures 5-6 Indexing 1 Announcements HW1 due tonight by 11pm Turn in an electronic copy (word/pdf) by 11pm, or Turn in a hard copy in my office by 4pm Lab1 is due Friday, 11pm
More informationArchitecture of a Database Management System Ray Lockwood
Assorted Topics Architecture of a Database Management System Pg 1 Architecture of a Database Management System Ray Lockwood Points: A DBMS is divided into modules or layers that isolate functionality.
More informationSDD-1 Algorithm Implementation
National Institute of Technology Karnataka, Surathkal Project Report on SDD-1 Algorithm Implementation Under the Guidance of: Mr. Dr. Anantha Narayana (Professor) Submitted by: Mr. Vasanth Raja Chittampally
More informationModule 9: Selectivity Estimation
Module 9: Selectivity Estimation Module Outline 9.1 Query Cost and Selectivity Estimation 9.2 Database profiles 9.3 Sampling 9.4 Statistics maintained by commercial DBMS Web Forms Transaction Manager Lock
More informationQuery Execution [15]
CSC 661, Principles of Database Systems Query Execution [15] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Query processing involves Query processing compilation parsing to construct parse
More informationParser: SQL parse tree
Jinze Liu Parser: SQL parse tree Good old lex & yacc Detect and reject syntax errors Validator: parse tree logical plan Detect and reject semantic errors Nonexistent tables/views/columns? Insufficient
More informationQuery Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.
COS 597: Principles of Database and Information Systems Query Optimization Query Optimization Query as expression over relational algebraic operations Get evaluation (parse) tree Leaves: base relations
More informationUser Perspective. Module III: System Perspective. Module III: Topics Covered. Module III Overview of Storage Structures, QP, and TM
Module III Overview of Storage Structures, QP, and TM Sharma Chakravarthy UT Arlington sharma@cse.uta.edu http://www2.uta.edu/sharma base Management Systems: Sharma Chakravarthy Module I Requirements analysis
More informationQuery Optimization in Distributed Databases. Dilşat ABDULLAH
Query Optimization in Distributed Databases Dilşat ABDULLAH 1302108 Department of Computer Engineering Middle East Technical University December 2003 ABSTRACT Query optimization refers to the process of
More informationChapter 3. Algorithms for Query Processing and Optimization
Chapter 3 Algorithms for Query Processing and Optimization Chapter Outline 1. Introduction to Query Processing 2. Translating SQL Queries into Relational Algebra 3. Algorithms for External Sorting 4. Algorithms
More informationPrinciples of Data Management. Lecture #9 (Query Processing Overview)
Principles of Data Management Lecture #9 (Query Processing Overview) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Midterm
More informationCS 245 Midterm Exam Solution Winter 2015
CS 245 Midterm Exam Solution Winter 2015 This exam is open book and notes. You can use a calculator and your laptop to access course notes and videos (but not to communicate with other people). You have
More informationOverview of Query Processing and Optimization
Overview of Query Processing and Optimization Source: Database System Concepts Korth and Silberschatz Lisa Ball, 2010 (spelling error corrections Dec 07, 2011) Purpose of DBMS Optimization Each relational
More informationQuery Processing and Optimization *
OpenStax-CNX module: m28213 1 Query Processing and Optimization * Nguyen Kim Anh This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Query processing is
More informationIndexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel
Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes
More informationData about data is database Select correct option: True False Partially True None of the Above
Within a table, each primary key value. is a minimal super key is always the first field in each table must be numeric must be unique Foreign Key is A field in a table that matches a key field in another
More informationPhysical Level of Databases: B+-Trees
Physical Level of Databases: B+-Trees Adnan YAZICI Computer Engineering Department METU (Fall 2005) 1 B + -Tree Index Files l Disadvantage of indexed-sequential files: performance degrades as file grows,
More information! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large
Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!
More informationChapter 20: Parallel Databases
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationChapter 20: Parallel Databases. Introduction
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationWhat s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence
What s a database system? Review of Basic Database Concepts CPS 296.1 Topics in Database Systems According to Oxford Dictionary Database: an organized body of related information Database system, DataBase
More informationBasant Group of Institution
Basant Group of Institution Visual Basic 6.0 Objective Question Q.1 In the relational modes, cardinality is termed as: (A) Number of tuples. (B) Number of attributes. (C) Number of tables. (D) Number of
More informationNotes. Some of these slides are based on a slide set provided by Ulf Leser. CS 640 Query Processing Winter / 30. Notes
uery Processing Olaf Hartig David R. Cheriton School of Computer Science University of Waterloo CS 640 Principles of Database Management and Use Winter 2013 Some of these slides are based on a slide set
More informationDatabase Tuning and Physical Design: Basics of Query Execution
Database Tuning and Physical Design: Basics of Query Execution Spring 2018 School of Computer Science University of Waterloo Databases CS348 (University of Waterloo) Query Execution 1 / 43 The Client/Server
More informationInternational Journal of Modern Trends in Engineering and Research e-issn: p-issn:
International Journal of Modern Trends in Engineering and Research www.ijmter.com Fragmentation as a Part of Security in Distributed Database: A Survey Vaidik Ochurinda 1 1 External Student, MCA, IGNOU.
More informationMahathma Gandhi University
Mahathma Gandhi University BSc Computer science III Semester BCS 303 OBJECTIVE TYPE QUESTIONS Choose the correct or best alternative in the following: Q.1 In the relational modes, cardinality is termed
More informationIndexing and Hashing
C H A P T E R 1 Indexing and Hashing This chapter covers indexing techniques ranging from the most basic one to highly specialized ones. Due to the extensive use of indices in database systems, this chapter
More informationQuery Processing Strategies and Optimization
Query Processing Strategies and Optimization CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/25/12 Agenda Check-in Design Project Presentations Query Processing Programming Project
More informationCourse Outline and Objectives: Database Programming with SQL
Introduction to Computer Science and Business Course Outline and Objectives: Database Programming with SQL This is the second portion of the Database Design and Programming with SQL course. In this portion,
More informationCSE 544, Winter 2009, Final Examination 11 March 2009
CSE 544, Winter 2009, Final Examination 11 March 2009 Rules: Open books and open notes. No laptops or other mobile devices. Calculators allowed. Please write clearly. Relax! You are here to learn. Question
More informationCPSC 421 Database Management Systems. Lecture 11: Storage and File Organization
CPSC 421 Database Management Systems Lecture 11: Storage and File Organization * Some material adapted from R. Ramakrishnan, L. Delcambre, and B. Ludaescher Today s Agenda Start on Database Internals:
More informationQuery Decomposition and Data Localization
Query Decomposition and Data Localization Query Decomposition and Data Localization Query decomposition and data localization consists of two steps: Mapping of calculus query (SQL) to algebra operations
More informationIntroduction Alternative ways of evaluating a given query using
Query Optimization Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational Expressions Dynamic Programming for Choosing Evaluation Plans Introduction
More informationChapter 19 Query Optimization
Chapter 19 Query Optimization It is an activity conducted by the query optimizer to select the best available strategy for executing the query. 1. Query Trees and Heuristics for Query Optimization - Apply
More informationRelational Query Optimization
Relational Query Optimization Chapter 15 Ramakrishnan & Gehrke (Sections 15.1-15.6) CPSC404, Laks V.S. Lakshmanan 1 What you will learn from this lecture Cost-based query optimization (System R) Plan space
More informationCS 245 Midterm Exam Winter 2014
CS 245 Midterm Exam Winter 2014 This exam is open book and notes. You can use a calculator and your laptop to access course notes and videos (but not to communicate with other people). You have 70 minutes
More informationChapter 18: Parallel Databases
Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery
More informationChapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction
Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of
More informationThree Read Priority Locking for Concurrency Control in Distributed Databases
Three Read Priority Locking for Concurrency Control in Distributed Databases Christos Papanastasiou Technological Educational Institution Stereas Elladas, Department of Electrical Engineering 35100 Lamia,
More informationInformation Management (IM)
1 2 3 4 5 6 7 8 9 Information Management (IM) Information Management (IM) is primarily concerned with the capture, digitization, representation, organization, transformation, and presentation of information;
More informationIntro to DB CHAPTER 12 INDEXING & HASHING
Intro to DB CHAPTER 12 INDEXING & HASHING Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing
More informationPrinciples of Parallel Algorithm Design: Concurrency and Decomposition
Principles of Parallel Algorithm Design: Concurrency and Decomposition John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 2 12 January 2017 Parallel
More informationReview. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System
Review Relational Query Optimization R & G Chapter 12/15 Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory
More informationCSC 742 Database Management Systems
CSC 742 Database Management Systems Topic #16: Query Optimization Spring 2002 CSC 742: DBMS by Dr. Peng Ning 1 Agenda Typical steps of query processing Two main techniques for query optimization Heuristics
More informationQuery Optimization Overview. COSC 404 Database System Implementation. Query Optimization. Query Processor Components The Parser
COSC 404 Database System Implementation Query Optimization Query Optimization Overview The query processor performs four main tasks: 1) Verifies the correctness of an SQL statement 2) Converts the SQL
More informationQuery Processing. high level user query. low level data manipulation. query processor. commands
Query Processing high level user query query processor low level data manipulation commands 1 Selecting Alternatives SELECT ENAME FROM EMP,ASG WHERE EMP.ENO = ASG.ENO AND DUR > 37 Strategy A ΠENAME(σDUR>37
More informationD B M G Data Base and Data Mining Group of Politecnico di Torino
Database Management Data Base and Data Mining Group of tania.cerquitelli@polito.it A.A. 2014-2015 Optimizer operations Operation Evaluation of expressions and conditions Statement transformation Description
More information