Distributing the Derivation and Maintenance of Subset Descriptor Rules

Jerome Robinson, Barry G. T. Lowden, Mohammed Al Haddad
Department of Computer Science, University of Essex
Colchester, Essex, CO4 3SQ, U.K.

Abstract

This draft paper describes a solution to the rule maintenance problem for data descriptor rules derived from data that may subsequently change. The method uses any available computers in the local area network to derive and maintain rule sets.

1. Introduction

Database query processing involves the selection and manipulation of data subsets specified by the query or by the query processor. Descriptors for data subsets are useful in optimising the query processing task. For example, histograms are simple subset descriptors: each bar in the histogram describes a data subset by specifying the number of data items in that subset. This descriptor information is used in conventional query optimisation to schedule the order of operations on intermediate data sets in the query execution plan. Attribute pair rules [14] are subset descriptors which state dependencies between data values within subsets. Rules of this kind are the basis of semantic query optimisation [13, 17, 18] and can also be used to support data caching in remote clients to a database management system [20].

The problem for applications using subset descriptors is that any change to data may require a corresponding change to the description of one or more of the data subsets. This implies the need for fast derivation and maintenance of subset descriptors, in a way that does not add workload to the database server. We investigate the use of multiple workstations in the same local area network as the data server to handle the work of descriptor derivation and maintenance. Tasks are distributed to these computers by a particular workstation (the master in a master-slave configuration). The tasks run on each slave workstation as background programs.

2. Background Information

A subset descriptor is a (selector, descriptor) pair. The selector is a data-value constraint (a selection condition) which identifies a subset of data items in the database. The descriptor provides information about data items in that subset. The selector uses a collection of values and ranges for specified attributes as a selection constraint to include tuples in the subset. It may be the Boolean expression in the WHERE clause of an SQL query, for example.

Attribute Pair (AP) rules [14] have been identified as a particularly useful form of subset descriptor for semantic query optimisation (SQO) and remote cache management. An AP rule has the form A → B, where A and B both have the structure of query selection conditions, but consequent B also describes the tuples selected by condition A. For example, c(10..20) → d(27..36) means that, in the set of tuples which satisfy the condition (10 ≤ c < 20) for attribute c, all have attribute d values in the range 27 to 36.

This apparently simple rule structure hides detail, since an AP rule set refers to a particular database table, which may be a virtual relation, containing the pair of attributes. For example, the rule:

ship_class(class, type, draught, _, _) ∧ ship(_, class, type, status, _, _) ∧ (draught < 50) → (status = 'Active')

is the AP rule:

(draught < 50) → (status = 'Active')

on a virtual relation which is the natural join of base relations ship_class and ship. Since a whole set of such AP rules is associated with the table, it is inappropriate to repeat the table information in every rule.

Sets of AP rule subset descriptors are derived from the data in preparation for query processing. Each AP rule is an ordered pair of conditions, which allows rules to be used as directed edges in a graph [13]. Paths in this graph provide transitive inferences, which are further descriptions of the subset selected by the condition at the start of the sub-path. AP rules can be extended to multi-consequent rules [13] without losing this graph edge semantics. For example, the descriptor:

c(10..20) → d(27..36) ∧ (f ∈ {'OMG', 'TPC'}) ∧ h( )

represents three AP rules, all with the same antecedent condition, c(10..20). A single antecedent look-up in the rule set thus provides multiple assertions about the selected subset. Reducing look-up time for descriptors is important in query optimisation, since the goal is reduced query execution time.

The consequent is a vector of assertions, so that descriptors for different subsets are easily compared or combined by specific vector element. For example, a database query selects data items with a(75..90) AND c(10..15), and two relevant descriptors exist which describe subsets containing the required data items:

a(70..95) → d(18..43) ∧ (f ∈ {'ODMG', 'OMG'}) ∧ h(13..71)
c(10..20) → d(27..36) ∧ (f ∈ {'OMG', 'TPC'}) ∧ h( )

The query conditions are sub-ranges of the antecedent conditions. Each antecedent therefore selects a superset of the set selected by the corresponding query condition. Furthermore, the conjunction in the query specifies the intersection of the two sets described by the two rules. Pairwise comparison of vector elements in the two rule consequents shows incompatible values for attribute h. This indicates that the two query conditions select disjoint subsets of data items.
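The pairwise comparison of consequent vectors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Descriptor` class and the function name are hypothetical, and the h range given to the second rule is an assumed value chosen only so that it conflicts with h(13..71).

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """A multi-consequent rule: one antecedent range plus a consequent vector."""
    attr: str                                        # antecedent attribute name
    low: float                                       # antecedent range, low end
    high: float                                      # antecedent range, high end
    ranges: dict = field(default_factory=dict)       # attribute -> (low, high)
    value_sets: dict = field(default_factory=dict)   # attribute -> set of values


def consequents_conflict(r1, r2):
    """Return True if some shared consequent attribute has incompatible values."""
    for attr, (lo1, hi1) in r1.ranges.items():
        if attr in r2.ranges:
            lo2, hi2 = r2.ranges[attr]
            if hi1 < lo2 or hi2 < lo1:               # the two ranges do not overlap
                return True
    for attr, s1 in r1.value_sets.items():
        if attr in r2.value_sets and not (s1 & r2.value_sets[attr]):
            return True                              # the two value sets are disjoint
    return False


rule_a = Descriptor("a", 70, 95, {"d": (18, 43), "h": (13, 71)},
                    {"f": {"ODMG", "OMG"}})
# The h range below is an assumption made for this sketch only.
rule_c = Descriptor("c", 10, 20, {"d": (27, 36), "h": (80, 95)},
                    {"f": {"OMG", "TPC"}})

print(consequents_conflict(rule_a, rule_c))
```

The d ranges overlap and the f sets share 'OMG', so only attribute h triggers the conflict in this sketch. A query optimiser would run the check only after confirming that each query condition is a sub-range of the corresponding antecedent.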
No tuples can satisfy both query conditions, so the result set is empty and the empty answer can be returned immediately, without consulting the database.

In 'Associative Caching' [4, 5, 8] each client computer keeps a copy of each of its query result sets in its own local database. The purpose is to reduce the size and frequency of queries to the remote data server, which is accessed over a wide-area network. This can reduce query cost factors arising from access restrictions imposed by the server, such as authorisation delays, payment charges for data, and server breakdown or workload delay, as well as internet delays. For each new query the client tries to find some or all of the required data in its local collection of result sets. Usually this is done by syntactic comparison of the new query with each previous query [e.g. 2, 11] to detect overlapping data sets.

Attribute-Pair Range Rules, which the server derives from its data for its own use in query optimisation, can be further utilised to provide descriptors for each query result set. This new information adds to the limited description currently available to a client in the form of the previous query expression. It enables clients to recognise data overlap for new queries which refer to attributes not mentioned in the previous query, so that local data can be exploited for syntactically unrelated queries [15].

Subset descriptors are a form of knowledge about the data, which is derived directly from the data. But unlike many forms of knowledge discovery in databases (KDD), this knowledge must be exact [10] rather than probability-based. This means that derivation cannot use only samples of data sets. It must process all tuples in the subset it describes. Therefore the use of subset descriptors can introduce a significant processing workload. But the data server should not be required to do extra work of this kind, since it may delay current queries. Creating metadata to make future queries faster would make current queries slower.
Furthermore, data may change (in environments other than the static data warehouse or data archive) and this requires a corresponding change to descriptors. So it would be useful to find an existing hardware resource that can be used to do this work instead of the data server.

3. Creating a Parallel Virtual Machine from Networked Computers

PVM [6] is a well-established software system which enables a group of workstations linked by a local-area network to work together as a Virtual Machine. Modern workstations have more computing power than they use, and each successive generation of workstations offers more computing power and capacity than the last. Therefore the amount of spare computing capability in a network of desktop machines is steadily increasing. This is a resource that can be used to analyse and summarise data sets.

PVM allows networked workstations to be used by spawning new background programs on the machines. The programs can accept messages from a main process on a particular computer telling them what to do, and can return the specified results. Data sets can be transferred directly between machines, or via the Network File System. This allows the data server to be treated as one component in a multi-workstation machine. The processing and memory resources of the machine can expand and contract dynamically by varying the number of computers being used.

The task of deriving subset descriptors from data can be distributed to multiple workstations in the local area network, as follows.

Main Process:
- Identify table and relevant attributes
- Retrieve the database table data
- Establish the PVM machine
- Send the same amount of data to each slave
- Wait for the rules
- Receive and merge the rule subsets

Sub-processes on different computers:
- Receive the workload from the main process
- Sort the records according to a specific attribute
- Derive the rules
- Send the derived rules to the main process

Fig. 1 Using Multiple Workstations to Create a Set of Subset Descriptor Rules
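The main-process step "Send the same amount of data to each slave" amounts to splitting the retrieved table into near-equal parts. A minimal sketch, assuming the table is held as a Python list of rows (the function name is hypothetical and no PVM calls are shown):

```python
def partition_rows(rows, n_workers):
    """Split rows into n_workers contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(rows), n_workers)
    chunks = []
    start = 0
    for i in range(n_workers):
        # The first `extra` workers each take one additional row.
        size = base + (1 if i < extra else 0)
        chunks.append(rows[start:start + size])
        start += size
    return chunks


chunks = partition_rows(list(range(10)), 3)
print([len(chunk) for chunk in chunks])
```

Any partition works for rule derivation, because each worker sorts its own part and the rule sets are merged afterwards, so no particular row order is needed before the split.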

4. Rule Derivation Algorithm Used

1. The Main Process chooses an attribute to be the antecedent for the current set of rules, and identifies the MIN and MAX values if it is numeric. It broadcasts those values to all computers, and the message also specifies the number of rules required in the set.

2. Each computer then receives from the Main Process a subset of the database table to be described, and sorts it on the attribute specified as antecedent.

3. After sorting, each computer divides the MIN..MAX range into the specified number of sub-ranges. This is the number of rules required, since each sub-range produces a rule.

4. Each computer divides its sorted table (part of the original database table) into disjoint subsets, using the sub-ranges to select tuples. The ordered sequence of tuples is scanned, building each rule incrementally. For example, if the next sub-range is 10 ≤ a < 25 for the antecedent attribute named a, then all tuples in the relevant sub-sequence of tuples will contribute to the rule. If the first tuple in the subset has 26 as the value of attribute c, then the rule so far is (10 ≤ a < 25) → (c = 26). Descriptors for other consequent attributes are added in the same way. The next tuple in the ordered sequence has c = 31, so the rule describing all tuples encountered so far becomes (10 ≤ a < 25) → (26 ≤ c ≤ 31). If the next tuple has c = 29 then the rule remains unchanged, because it correctly describes the set of three tuples which includes this new tuple. Thus each new tuple encountered during the scan through the ordered table will either extend the consequent range or leave it unchanged, so that when no more tuples satisfy the selection condition (10 ≤ a < 25) the rule describes all tuples in that subset. The next tuple in the sorted data sequence starts a new descriptor for the next subset, with antecedent (25 ≤ a < 40), for example. When the end of the sorted table is reached, the computer has produced the specified number of sub-range descriptor rules.

5. Each computer returns its set of rules to the Main Process, which merges corresponding rules from all the separate computers to create a single rule set with the specified number of rules. This rule set describes the whole database table. Corresponding rules are rules with the same antecedent condition, produced in separate computers. Rule merging is just another stage of incremental rule generation. For example, rules (40 ≤ a < 55) → (61 ≤ c ≤ 83) and (40 ≤ a < 55) → (68 ≤ c ≤ 74) are provided by two computers. The combined rule is (40 ≤ a < 55) → (61 ≤ c ≤ 83), since this describes both sets. If a further computer provides the rule (40 ≤ a < 55) → (75 ≤ c ≤ 85), the descriptor for the union of the three tuple subsets is (40 ≤ a < 55) → (61 ≤ c ≤ 85). If another computer returns (40 ≤ a < 55) → no tuples, the rule remains (40 ≤ a < 55) → (61 ≤ c ≤ 85).

5. Performance of the Multi-computer Rule Derivation Algorithm

The elapsed time for multi-computer rule derivation has been measured in experiments. Fig. 2 shows a typical example of the experimental results obtained. It shows the measured times to derive rules from a table of 112-byte rows, distributed to varying numbers of networked workstations. The attribute used as antecedent for the derived rules was of Character String type, which is much slower to sort than numeric attributes. Although measured times for numeric antecedent attributes are much shorter, the shape of the graph is very similar, indicating a rapid reduction in time as the number of computers used increases.

This is the time needed to derive a set of rules from a database table. The rule set is like a histogram with an Attribute Pair rule or multi-consequent rule describing the subset represented by each bar of the histogram.
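The scan-and-merge procedure of Section 4, whose elapsed time is measured here, can be sketched in Python as follows. The sketch assumes a numeric antecedent, a single numeric consequent, equal-width sub-ranges, and rows held as dictionaries. All function names are hypothetical, and the distribution of work by PVM is not modelled.

```python
def sub_ranges(lo, hi, n_rules):
    """Divide MIN..MAX into n_rules equal-width sub-ranges (assumed equal width)."""
    width = (hi - lo) / n_rules
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_rules)]


def derive_rules(rows, antecedent, consequent, lo, hi, n_rules):
    """Return one consequent (min, max) pair, or None, per antecedent sub-range."""
    bounds = sub_ranges(lo, hi, n_rules)
    rules = [None] * n_rules
    idx = 0
    for row in sorted(rows, key=lambda r: r[antecedent]):
        # Move on to the sub-range containing this tuple; the last one includes MAX.
        while idx < n_rules - 1 and row[antecedent] >= bounds[idx][1]:
            idx += 1
        value = row[consequent]
        if rules[idx] is None:
            rules[idx] = (value, value)              # first tuple in the subset
        else:
            low, high = rules[idx]
            rules[idx] = (min(low, value), max(high, value))
    return rules


def merge_rules(rule_sets):
    """Merge corresponding rules from several computers into one rule set."""
    merged = list(rule_sets[0])
    for rules in rule_sets[1:]:
        for i, rule in enumerate(rules):
            if rule is None:                         # that computer had no tuples
                continue
            if merged[i] is None:
                merged[i] = rule
            else:
                merged[i] = (min(merged[i][0], rule[0]), max(merged[i][1], rule[1]))
    return merged


rows = [{"a": 12, "c": 26}, {"a": 15, "c": 31}, {"a": 20, "c": 29}, {"a": 30, "c": 50}]
print(derive_rules(rows, "a", "c", lo=10, hi=40, n_rules=2))
print(merge_rules([[(61, 83)], [(68, 74)], [(75, 85)], [None]]))
```

The first call reproduces the (10 ≤ a < 25) → (26 ≤ c ≤ 31) example from step 4, and the second reproduces the four-computer merge from step 5. Merging touches only one pair of bounds per rule, so it does not depend on the number of tuples each computer processed.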

[Figure: Measured time to derive rules from tuples whose antecedent attribute is of String type. Vertical axis: elapsed time (seconds). Horizontal axis: number of computers used in the local network. Series: measured time and expected time 625/H.]

Fig. 2 Observed Performance of Multi-Computer Rule Derivation

Total time is significantly reduced by working with multiple computers. But the time reduction is also remarkable in being better than one might predict. Dividing work between three workers can at best be expected to divide the total time by three, and additional work to distribute data and synchronise the workers may prevent even the theoretical elapsed time of T/H from being reached, where T is the time for a single worker and H is the number of workers. The graph plots values of T/H for comparison with the measured times. T was 625 seconds. For two or more computers the elapsed time was found to be shorter than T/H. For each number of hosts H, the graph plots the measured time and the expected time 625/H.

This 14 Mbyte example is typical of results from experiments on data sets of various sizes and data types. Better than T/H performance was observed for all. Several factors contribute to this speedup.

The N log N complexity of the Quicksort algorithm, which consumes most of the elapsed time in the rule derivation process, is one factor. If the elapsed time, T, to sort a set of data is proportional to N log N, then T = (1/k) N log N, where 1/k is the constant of proportionality. T = 625 seconds when N is the size of the whole table, which gives k = 1066. Values for T can then be predicted as (1/1066) N log N, where N is the size of the whole table divided by H, and H is the number of workstations. But observed times are still significantly faster than these predicted times. Fig. 3 indicates the connection between T/H and N log N as the size of the data subset in each machine decreases when the data set is partitioned between more computers.
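The N log N argument can be checked numerically. In the sketch below the row count is a hypothetical stand-in, k is calibrated from that assumed count and is therefore not the paper's value of 1066, and base-10 logarithms are an assumption. The base only rescales k and does not affect the comparison.

```python
import math

T_SINGLE = 625.0       # seconds for one computer, as reported in the text
N_TOTAL = 100_000      # hypothetical row count, used only for illustration


def calibrate_k(n_rows, t_single):
    """Solve T = (1/k) * N * log N for k, given one measured (N, T) pair."""
    return n_rows * math.log10(n_rows) / t_single


def predicted_time(n_rows, hosts, k):
    """Predicted sort time when each host sorts n_rows / hosts rows."""
    n = n_rows / hosts
    return n * math.log10(n) / k


k = calibrate_k(N_TOTAL, T_SINGLE)
for hosts in range(1, 9):
    print(hosts, round(predicted_time(N_TOTAL, hosts, k), 1), round(T_SINGLE / hosts, 1))
```

Under this model the ratio of the predicted time to T/H is log(N/H) / log N, which is below 1 for every H of 2 or more. This is why sorting cost alone already predicts better than T/H behaviour.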

[Figure: N*logN values and T/H plotted against the number of computers, H, where N is the number of data items per computer.]

Fig. 3 Comparison of NlogN values with T/H

A second factor, which contributes to the large speedup when distributing the sort algorithm, is the amount of paging required as the size of the data set to be sorted increases. The proportion of pages which cause page faults, requiring swapping from disk, increases with the amount by which the data set exceeds the available main memory space. Each disk access is a severe time penalty. So the smaller data sets provided by division between more machines reduce the number of these delays.

A third factor, related to available main memory space and paging, is the data transfer time when sending large data subsets to computers to sort. Message passing is used between computers. The receive buffer in PVM message passing is limited by the amount of main memory available for dynamic use as buffer space. Blocking send is used to transfer data reliably, so delays can occur when the amount of data exceeds the amount of physical memory space. Paging to virtual memory must occur before physical memory frames are available as buffer space to accept more data. This delay does not occur when the number of computers used is great enough (depending on the size of the whole data set).

6. Rule Maintenance

If the data changes, rules describing the data may need to change. Insert, Delete and Update are the database operations that can change the data.

Tuple INSERT has the same effect on descriptors as a new tuple encountered during the table scan described in Section 4. The numeric or string value of the antecedent attribute in the new tuple maps to the relevant rule. Assertions in that rule describing other attributes may need to be extended by values in this new tuple. If several sets of rules exist, each with a different antecedent attribute, then the new tuple maps to one rule in each set.
Deleting a tuple does not require any change to range assertion rules, since deletion does not falsify rules. Any remaining data values are still within the range limits specified by the consequent. However, deriving new descriptors for any rule whose antecedent includes attribute values in the deleted tuple may provide narrower ranges as consequent assertions. This is beneficial because narrow consequent ranges can match more query conditions.
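The effect of Insert and Delete on a single range assertion can be sketched as follows. The sketch assumes each worker keeps the consequent values for one antecedent sub-range in sorted order. The class and method names are hypothetical.

```python
import bisect


class WorkerSubset:
    """Sorted consequent values held by one worker for one antecedent sub-range."""

    def __init__(self, values=()):
        self.values = sorted(values)

    def rule(self):
        """Current (min, max) range assertion, or None if the subset is empty."""
        return (self.values[0], self.values[-1]) if self.values else None

    def insert(self, value):
        """Add a value; return True if the range assertion had to be extended."""
        old = self.rule()
        bisect.insort(self.values, value)
        return self.rule() != old

    def delete(self, value):
        """Remove a value; return True if a narrower assertion is now available."""
        old = self.rule()
        self.values.remove(value)    # raises ValueError if the value is absent
        return self.rule() != old


subset = WorkerSubset([29, 50, 71])
print(subset.insert(91), subset.rule())
print(subset.delete(91), subset.rule())
```

An insert inside the current range and a delete of an interior value both return False, so the worker has nothing to report. Only a change at either end of the sorted sequence alters the assertion.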

Updating a single tuple changes the value of one or more fields in an existing tuple and is equivalent to reading and Deleting the tuple before Inserting the new version. Rule maintenance actions are therefore the same as Delete followed by Insert. However, if an Update changes a field in all tuples in the table, the server will disable all assertions about that column of the table until they can all be revised. This makes one of the elements in some rule consequent vectors temporarily unavailable.

When a new tuple is Inserted into the database table it is also sent to one of the computers to add to its data subset. As a result of this new data the computer may notify the Main Process that one of its subset descriptor rules has changed. For example, a(15..20) → c(63..91) is a revised rule produced by the computer. To merge this with the existing rule set, the current rule a(15..20) → c(29..71), which was previously produced by merging results from all computers, becomes a(15..20) → c(29..91).

If a tuple is Updated in the central database table, the old version of the tuple is broadcast to all computers, so that the machine with a matching tuple can delete it before Inserting the new version. After a delete, n of the rules can be revised in the affected computer, where n attributes were updated. It then notifies the Main Process that an improved version of that particular subset descriptor is available, and the Main Process examines the corresponding rule from all other computers in order to create a new merged descriptor for that subset. The master computer retains all the rule subsets created in all the slaves, to use in this incremental rule maintenance process.

7. Conclusions

Converting a database table to a set of subset descriptor rules is a data reduction process, because the rule set is much smaller than its data set. (The descriptors provide a summary of the data.)
Partitioning a data set and then merging rule sets derived from the partitions is found to be an effective way to speed up the creation of rule sets. A sorting algorithm was used to get the data subsets into a structure (a set of sorted sequences) which can be used as a look-up table to rapidly derive rules and to update those rules when the data changes. Merging rule sets from a collection of workstations is very fast, much faster than merging sorted data subsets.

The configuration of a 'master' workstation with a set of 'slave' workstations in a local area network provides an effective way to solve the problem of maintaining derived descriptor rules as the data changes. The master workstation is also the user interface to the database, accepting queries and data updates from networked users. It sends all data changes to the slave workstations as well as to the data server, and the slaves respond with any changes caused to their rule subsets. The workload of rule derivation and maintenance does not affect the data server, because it is done on different computers.

Workstations in a local network are commonly underutilised. Modern desktop computers are powerful machines, but typical application programs have a use profile that leaves them virtually idle for most of the time, with occasional bursts of activity, so their computing capacity is rarely used to its full extent. We utilise such networked workstations as a distributed computing resource, to derive and maintain data descriptor rules by means of background programs on the workstations.

The master workstation uses derived rules for semantic query optimisation [13,16,18] and for remote client cache management [20], but it can also answer queries from the sorted data in slave workstations as well as from the data server. This method of query optimisation, which generates query execution plans that use workstation data sets as well as the database data server, is the subject of our current research.

References

1. S. Abiteboul, R. Hull and V. Vianu, Foundations of Databases, Addison-Wesley.
2. Adali, S., Candan, K. S., Papakonstantinou, Y., Subrahmanian, V. S.: Query Caching and Optimization in Distributed Mediator Systems. Proc. ACM SIGMOD Conf. (1996)
3. Julie Basu, Meikel Poess, and Arthur M. Keller, Performance Analysis of an Associative Caching Scheme for Client-Server Databases, Technical Note STAN-CS-TN-97-61, Stanford University, Computer Science Dept., September.
4. Julie Basu, Meikel Poess, and Arthur M. Keller, High Performance and Scalability Through Associative Client-Side Caching, Seventh International Workshop on High Performance Transaction Systems, Pacific Grove, CA, September.
5. Dar, S., Franklin, M. J., Jonsson, B. T., Srivastava, D., Tan, M.: Semantic Data Caching and Replacement, Proc. 22nd VLDB Conference (1996)
6. A. Geist, et al., "PVM: Parallel Virtual Machine. A Users' Guide and Tutorial for Networked Parallel Computing", MIT Press.
7. Godfrey, P., and Gryz, J., Semantic Query Caching for Heterogeneous Databases, KRDB'97, 4th International Workshop on Knowledge Representation meets Data Bases.
8. Keller, A. M., Basu, J.: A Predicate-based Caching Scheme for Client-Server Database Architectures. VLDB Journal 5(1) 1996.
10. G. Piatetsky-Shapiro, Discovery, Analysis and Presentation of Strong Rules, Knowledge Discovery in Databases, Eds. G. Piatetsky-Shapiro and W. J. Frawley, MIT Press (1991)
11. Qian, X.: Query Folding. 12th IEEE Intl. Conference on Data Engineering (1996)
12. Robinson, J., Lowden, B. G. T.: Data Analysis for Query Processing. 2nd Intl. Symposium on Intelligent Data Analysis (1997) (LNCS 1280)
13. Robinson, J., Lowden, B. G. T.: Semantic Query Optimisation and Rule Graphs. KRDB'98, 5th International Workshop on Knowledge Representation meets Data Bases.
14. J. Robinson and B. G. T. Lowden, Attribute-Pair Range Rules. Proc. DEXA'98, 9th Intl. Conference on Database and Expert Systems Applications (1998) (LNCS 1460)
16. S. Shekhar, B. Hamidzadeh, A. Kohli, and M. Coyle. Learning transformation rules for semantic query optimization: A data-driven approach, IEEE Transactions on Knowledge and Data Engineering, 5(6).
17. S. T. Shenoy, Z. M. Ozsoyoglu, A System for Semantic Query Optimization, Proc. ACM SIGMOD Conference, 1987.
18. M. Siegel, E. Sciore, S. Salveter, A Method for Automatic Rule Derivation to Support Semantic Query Optimization, ACM TODS 17(4).
19. Divesh Srivastava, Shaul Dar, H. V. Jagadish, Alon Y. Levy, Answering Queries with Aggregation Using Views, Proc. 22nd VLDB Conference (1996)
20. J. Robinson and B. G. T. Lowden, Extending the Re-use of Query Results at Remote Client Sites, Proc. DEXA'00, 11th Intl. Conf. on Database and Expert Systems Applications, 2000, Springer (LNCS 1873).

Utilizing Multiple Computers in Database Query Processing and Descriptor Rule Management

Utilizing Multiple Computers in Database Query Processing and Descriptor Rule Management Utilizing Multiple Computers in Database Query Processing and Descriptor Rule Management Jerome Robinson, Barry G. T. Lowden, Mohammed Al Haddad Department of Computer Science, University of Essex Colchester,

More information

Attribute-Pair Range Rules

Attribute-Pair Range Rules Lecture Notes in Computer Science 1 Attribute-Pair Range Rules Jerome Robinson Barry G. T. Lowden Department of Computer Science, University of Essex Colchester, Essex, CO4 3SQ, U.K. {robij, lowdb}@essex.ac.uk

More information

The Use of Statistics in Semantic Query Optimisation

The Use of Statistics in Semantic Query Optimisation The Use of Statistics in Semantic Query Optimisation Ayla Sayli ( saylia@essex.ac.uk ) and Barry Lowden ( lowdb@essex.ac.uk ) University of Essex, Dept. of Computer Science Wivenhoe Park, Colchester, CO4

More information

I. Khalil Ibrahim, V. Dignum, W. Winiwarter, E. Weippl, Logic Based Approach to Semantic Query Transformation for Knowledge Management Applications,

I. Khalil Ibrahim, V. Dignum, W. Winiwarter, E. Weippl, Logic Based Approach to Semantic Query Transformation for Knowledge Management Applications, I. Khalil Ibrahim, V. Dignum, W. Winiwarter, E. Weippl, Logic Based Approach to Semantic Query Transformation for Knowledge Management Applications, Proc. of the International Conference on Knowledge Management

More information

Using A Network of workstations to enhance Database Query Processing Performance

Using A Network of workstations to enhance Database Query Processing Performance Using A Network of workstations to enhance Database Query Processing Performance Mohammed Al Haddad, Jerome Robinson Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4

More information

THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER

THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER Akhil Kumar and Michael Stonebraker EECS Department University of California Berkeley, Ca., 94720 Abstract A heuristic query optimizer must choose

More information

A Statistical Approach to Rule Selection in Semantic Query Optimisation

A Statistical Approach to Rule Selection in Semantic Query Optimisation A Statistical Approach to Rule Selection in Semantic Query Optimisation Barry G. T. Lowden and Jerome Robinson Department of Computer Science, The University of ssex, Wivenhoe Park, Colchester, CO4 3SQ,

More information

Oracle Database 11g: SQL Tuning Workshop

Oracle Database 11g: SQL Tuning Workshop Oracle University Contact Us: Local: 0845 777 7 711 Intl: +44 845 777 7 711 Oracle Database 11g: SQL Tuning Workshop Duration: 3 Days What you will learn This Oracle Database 11g: SQL Tuning Workshop Release

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

Fast Discovery of Sequential Patterns Using Materialized Data Mining Views

Fast Discovery of Sequential Patterns Using Materialized Data Mining Views Fast Discovery of Sequential Patterns Using Materialized Data Mining Views Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo

More information

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture By Gaurav Sheoran 9-Dec-08 Abstract Most of the current enterprise data-warehouses

More information

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of

More information

A Fast Transformation Method to Semantic Query Optimisation

A Fast Transformation Method to Semantic Query Optimisation A Fast Transformation Method to Semantic Query Optimisation Ayla Sayli ( saylia@essex.ac.uk ) and Barry Lowden ( lowdb@essex.ac.uk ) University of Essex, Dept. of Computer Science, Wivenhoe Park, Colchester,

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

Migrating to Object Data Management

Migrating to Object Data Management Migrating to Object Data Management Arthur M. Keller * Stanford University and Persistence Software Paul Turner Persistence Software Abstract. We discuss issues of migrating to object data management.

More information

Knowledge Discovery from Client-Server Databases

Knowledge Discovery from Client-Server Databases Knowledge Discovery from Client-Server Databases Nell Dewhurst and Simon Lavington Department of Computer Science, University of Essex, Wivenhoe Park, Colchester CO4 4SQ, UK neilqessex, ac.uk, lavingt

More information
