Technical Report - Distributed Database Victor FERNANDES - Université de Strasbourg /2000 TECHNICAL REPORT

TECHNICAL REPORT
Distributed Databases And Implementation of the TPC-H Benchmark
Victor FERNANDES
DESS Informatique, Promotion: 1999 / 2000

TABLE OF CONTENTS
ABSTRACT
INTRODUCTION
HOW TO CREATE A DISTRIBUTED DATABASE
ORACLE SETUP
  Database Links
  Private, Public, and Global Database Links
  Transparency in a Distributed Database System
  Location Transparency
BENCHMARK
RESULTS
GRAPHICAL RESULTS
CONCLUSION
ANNEXES
NETWORK SETUP
  First stage Net8 Assistant for a client
  First stage Net8 Assistant for a server
  Second stage Net8 Easy Config
GLOBAL ADMINISTRATION
QUERIES
TIME REPORT

ABSTRACT

This technical report describes the basic concepts of a distributed database (DB) and presents several examples of how to query a remote Oracle DB from a local Oracle DB.

Introduction

A distributed database is a set of databases stored on multiple computers that typically appears to applications as a single database. Consequently, an application can simultaneously access and modify the data in several databases in a network. Each database in the system is controlled by its local server but cooperates to maintain the consistency of the global distributed database. The following figure illustrates a representative Oracle distributed database system.

[Figure: two database servers, Drau.itec.uni-klu.ac.at and Isel.itec.uni-klu.ac.at (Node 1 and Node 2), connected over the network by a database link. Database Db1 holds the DEPT table and database Db0 holds the EMP table. A transaction issued against Db1 mixes local and remote statements:

    INSERT INTO EMP@db0 ...;
    DELETE FROM DEPT ...;
    SELECT ... FROM EMP@db0 ...;
    COMMIT;
]

How to create a distributed database

As described in the introduction, creating a distributed database requires at least two servers, each running a database instance, and a network between them. For our study we have two Windows NT 4.0 servers running Oracle 8.0.5 and seven Linux PCs running Oracle 8i. The following figure illustrates our study cluster.

[Figure: the study cluster. Labels: a gigabit LAN switch (FDDI network, 1 Gb/s); LPC1, LPC2, LPC4, LPC5; a 100 Mb/s LAN switch; LPC22, LPC23, LPC24; a 100 Mb/s LAN hub; a firewall; ISEL and DRAU.]

Computer configuration

Computer name: LPC1, LPC2, LPC4, LPC5, LPC22, LPC23, LPC24
Configuration: Linux SuSE 6.3, PII 450 MHz, 128 MB RAM, 8 GB HDD

Computer name: ISEL, DRAU
Configuration: Windows NT 4.0 Server, PII 450, 128 MB RAM, 8 GB HDD

Oracle setup

Before starting the Oracle setup you must set up the network (see 'Network setup' in the annexes). Once the network works, we can create a database link.

Each database in a distributed database is distinguished from all other databases in the system by its own global database name. Oracle forms a database's global database name by prefixing the database's network domain with the individual database's name. For example, the following figure illustrates a representative hierarchical arrangement of databases throughout a network.

[Figure: a hierarchy of network domains. Under COM are ACME_TOOLS (Division 1, Division 2, Division 3) and ACME_AUTO (Asia, Americas, Europe); below the regions are Japan, US, Mexico, UK and Germany; the leaves are individual databases named HQ, Fin., Sales and Mfgt.]

While several databases can have the same individual name, each database must have a unique global database name. For example, the network domains US.AMERICAS.ACME_AUTO.COM and UK.EUROPE.ACME_AUTO.COM each contain a SALES database:

    SALES.US.AMERICAS.ACME_AUTO.COM
    SALES.UK.EUROPE.ACME_AUTO.COM

Database Links

To facilitate application requests in a distributed database system, Oracle uses database links. A database link defines a one-way communication path from an Oracle database to another database. Database links are essentially transparent to the users of an Oracle distributed database system, because the name of a database link is the same as the global name of the database to which the link points.
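To make the naming rule concrete, the sketch below shows how a database's global name can be inspected and, if needed, changed. GLOBAL_NAME is a standard Oracle data dictionary view and ALTER DATABASE ... RENAME GLOBAL_NAME is standard Oracle syntax; the name sales.us.americas.acme_auto.com is only the example from the hierarchy above and is not a database of the study cluster.

```sql
-- Show the global database name of the database you are connected to.
SELECT * FROM global_name;

-- Assumption: illustrative name taken from the example hierarchy,
-- not a database of the study cluster. Requires the ALTER DATABASE privilege.
ALTER DATABASE RENAME GLOBAL_NAME TO sales.us.americas.acme_auto.com;
```

Oracle only enforces that a link's name matches the remote global name when the initialization parameter GLOBAL_NAMES is TRUE. Freely chosen link names, such as the ones used in the rest of this report, presumably rely on it being FALSE.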

For example, the following SQL statement creates a database link in the local database that describes a path to the remote database Db0 on the DRAU server, reached through service_test.itec.uni.klu.ac.at. This name is the service name defined in Net8 Easy Config.

    CREATE DATABASE LINK my_link using 'service_test.itec.uni.klu.ac.at' ;

After creating a database link, applications connected to the local database can access data in the remote service_test.itec.uni.klu.ac.at database. You can now issue a query such as:

    SELECT * FROM dept@my_link ;

or

    INSERT INTO dept@my_link VALUES (...);

or

    DELETE FROM dept@my_link WHERE...;

Private, Public, and Global Database Links

Oracle allows you to create private, public, and global database links.

Private Database Link
You can create a private database link in a specific schema of a database. Only the owner of a private database link or PL/SQL subprograms in the schema can use a private database link to access data and database objects in the corresponding remote database.
E.g.: CREATE DATABASE LINK my_link using 'service_test.itec.uni.klu.ac.at' ;

Public Database Link
You can create a public database link for a database. All users and PL/SQL subprograms in the database can use a public database link to access data and database objects in the corresponding remote database.
E.g.: CREATE PUBLIC DATABASE LINK my_link using 'service_test.itec.uni.klu.ac.at' ;

Global Database Link
When an Oracle network uses Oracle Names, the names servers in the system automatically create and manage global database links for every Oracle database in the network. All users and PL/SQL subprograms in any database can use a global database link to access data and database objects in the corresponding remote database.

For more information, see the Oracle documentation.
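The links above do not name a remote account, so they connect with the credentials of the user who is currently logged in. As a hedged sketch, the statements below show a private link that connects as a fixed remote user, and how existing links can be listed and dropped. The link name my_fixed_link and the credentials remote_user / remote_password are placeholders and do not come from the study setup; USER_DB_LINKS and ALL_DB_LINKS are standard Oracle dictionary views.

```sql
-- Private link that always connects as one fixed remote user.
-- Assumptions: my_fixed_link, remote_user and remote_password are placeholders.
CREATE DATABASE LINK my_fixed_link
  CONNECT TO remote_user IDENTIFIED BY remote_password
  USING 'service_test.itec.uni.klu.ac.at';

-- Links owned by the current user (private links).
SELECT db_link, username, host FROM user_db_links;

-- All links the current user may use, including public ones.
SELECT owner, db_link, host FROM all_db_links;

-- Remove a link that is no longer needed.
DROP DATABASE LINK my_fixed_link;
```

A fixed-user link is convenient when local and remote accounts differ, at the cost of storing remote credentials in the local database.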

Transparency in a Distributed Database System

With minimal effort, you can make the functionality of an Oracle distributed database system transparent to users who work with the system. The goal of transparency is to make a distributed database system appear as though it is a single Oracle database. Consequently, the system does not burden developers and users with complexities that would otherwise make distributed database application development challenging and detract from user productivity. The following sections explain more about transparency in a distributed database system.

Location Transparency

An Oracle distributed database system has features that allow application developers and administrators to hide the physical location of database objects from applications and users. Location transparency exists when a user can universally refer to a database object such as a table, regardless of the node to which an application connects. Location transparency has several benefits, including:

- Access to remote data is simple, because database users do not need to know the physical location of database objects.
- Administrators can move database objects with no impact on end-users or existing database applications.

Most typically, administrators and developers use synonyms to establish location transparency for the tables and supporting objects in an application schema. For example, the following statements create synonyms in a database for tables in another, remote database.

    CREATE PUBLIC SYNONYM emp FOR emp@my_link;
    CREATE PUBLIC SYNONYM dept FOR dept@my_link;

Now, rather than access the remote tables with a query such as:

    SELECT e.ename, d.dname FROM emp@my_link e, dept@my_link d WHERE e.deptno = d.deptno;

an application can issue a much simpler query that does not have to account for the location of the remote tables:

    SELECT e.ename, d.dname FROM emp e, dept d WHERE e.deptno = d.deptno;

In addition to synonyms, developers can also use views and stored procedures to establish location transparency for applications that work in a distributed database system.
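As a minimal sketch of the last point, a view and a stored procedure can hide the link name in the same way a synonym does. The names emp_dept and raise_salary are hypothetical, and the columns sal and empno are assumed to exist as in Oracle's sample EMP table; only my_link, emp and dept come from the text above.

```sql
-- A view hides both the join and the remote location behind one local name.
-- Assumption: emp_dept is a hypothetical view name.
CREATE OR REPLACE VIEW emp_dept AS
  SELECT e.ename, d.dname
  FROM emp@my_link e, dept@my_link d
  WHERE e.deptno = d.deptno;

-- A stored procedure hides a remote update behind a local call.
-- Assumptions: raise_salary is hypothetical; sal and empno are the
-- columns of Oracle's sample EMP table.
CREATE OR REPLACE PROCEDURE raise_salary (p_empno IN NUMBER, p_amount IN NUMBER) AS
BEGIN
  UPDATE emp@my_link
  SET sal = sal + p_amount
  WHERE empno = p_empno;
  -- No COMMIT here: the caller decides when the transaction ends.
END;
/
```

Applications then run SELECT * FROM emp_dept or call raise_salary without mentioning my_link, so moving the tables to another node only requires recreating the view and the procedure.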

BENCHMARK

The TPC-H (ad-hoc, decision support) benchmark represents decision support environments where users don't know which queries will be executed against a database system; hence the "ad-hoc" label. Because the queries are ad hoc, no prior knowledge of them can be built into the DBMS, and query execution times can be very long.

The following figure shows the schema of the database used for our tests. The size of the database is about 100 MB, which corresponds to a scale factor of 0.1 when generating the flat files used to load the database.

[Figure: TPC-H schema. Tables, row counts where shown, and columns:

PART: P_PARTKEY, P_NAME, P_MFGR, P_BRAND, P_TYPE, P_SIZE, P_CONTAINER, P_RETAILPRICE, P_COMMENT
SUPPLIER (1000 rows): S_SUPPKEY, S_NAME, S_ADDRESS, S_NATIONKEY, S_PHONE, S_ACCTBAL, S_COMMENT
PARTSUPP: PS_PARTKEY, PS_SUPPKEY, PS_AVAILQTY, PS_SUPPLYCOST, PS_COMMENT
CUSTOMER: C_CUSTKEY, C_NAME, C_ADDRESS, C_NATIONKEY, C_PHONE, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT
NATION (25 rows): N_NATIONKEY, N_NAME, N_REGIONKEY, N_COMMENT
LINEITEM: L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT
REGION (5 rows): R_REGIONKEY, R_NAME, R_COMMENT
ORDERS: O_ORDERKEY, O_CUSTKEY, O_ORDERSTATUS, O_TOTALPRICE, O_ORDERDATE, O_ORDERPRIORITY, O_CLERK, O_SHIPPRIORITY, O_COMMENT
]

The benchmark was tried on different topologies (machines) and on each node of the cluster, but the results reported here use only two nodes, LPC1 and LPC2, because not all nodes have the same configuration and disk space. Going further was in any case beyond the scope of this training period.

The first test was run on a single node holding the entire database, in order to obtain a reference benchmark time.

The next test was run on two nodes connected by a 100 Mb/s network.

On the first node are the tables:
- REGION,
- PART,
- PARTSUPP,
- SUPPLIER,
- NATION,

and on the second node:
- CUSTOMER,
- LINEITEM,
- ORDERS.

The last test was carried out with the same distribution of tables across the nodes, but with a faster network (1 Gb/s FDDI).

Remark: The size of the database is about 100 MB (70 MB for the LINEITEM table). The database was loaded using the database generator (a TPC-H tool), which creates eight flat files; we then used the SQL*Loader tool to insert them into the database.

Results

We ran several tests with different database sizes. The first test used a 1 GB database, but we could not carry out the tests correctly (some queries ran for more than two nights), so we chose smaller sizes for our tests.

Important remark: For a cluster to perform well, the most important thing is for each node to have a fast hard disk and a lot of memory. A good configuration for each node of the cluster would be: a dual-processor PC, at least 256 MB of RAM, 2 SCSI controllers (UW or U2W), and 2-3 SCSI hard disks (UW or U2W), with one disk for the system, one for the indexes and swap, and one for the datafiles (tablespaces), plus a good network (1 Gb/s Ethernet).

Reference benchmark on a single node

Columns: computer name (LPC1); SQL request; time in seconds for the 100 MB database; time in seconds for the 250 MB database; time in seconds for the 500 MB database.
Rows, one per request: Q7.sql, Q8.sql, Q9.sql, Q10.sql, Q11.sql, Q12.sql, Q14.sql, Q15.sql, Q16.sql, Q17.sql, Q18.sql, Q19.sql, Q20.sql.
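The two-node distribution described above could be wired up on LPC1 roughly as follows. This is only a sketch under stated assumptions, not the setup actually used for the measurements: the link name lpc2_link and the Net8 service name lpc2_service are hypothetical. SET TIMING ON and @file are SQL*Plus commands rather than SQL statements.

```sql
-- Run on LPC1, which stores REGION, PART, PARTSUPP, SUPPLIER and NATION.
-- Assumptions: lpc2_link and 'lpc2_service' are hypothetical names for a
-- link and a Net8 service pointing at the database on LPC2.
CREATE DATABASE LINK lpc2_link USING 'lpc2_service';

-- Make the tables stored on LPC2 reachable under their usual names,
-- so the query scripts need no changes.
CREATE SYNONYM customer FOR customer@lpc2_link;
CREATE SYNONYM lineitem FOR lineitem@lpc2_link;
CREATE SYNONYM orders   FOR orders@lpc2_link;

-- SQL*Plus: report the elapsed time of each statement, then run a script.
SET TIMING ON
@Q7.sql
```

Running the queries from LPC2 instead would need the mirror image: a link from LPC2 to LPC1 and synonyms for the five tables stored there.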

Reference benchmark on two nodes (network 100 Mb/s, queries run on LPC1)
Reference benchmark on two nodes (network 100 Mb/s, queries run on LPC2)
Reference benchmark on two nodes (network 1 Gb/s, queries run on LPC1)
Reference benchmark on two nodes (network 1 Gb/s, queries run on LPC2)

Each of these four tables has the columns: computer name (LPC1, LPC2); SQL request; time in seconds for the 100 MB database; time in seconds for the 250 MB database; time in seconds for the 500 MB database.
Rows, one per request: Q7.sql, Q8.sql, Q9.sql, Q10.sql, Q11.sql, Q12.sql, Q14.sql, Q15.sql, Q16.sql, Q17.sql, Q18.sql, Q19.sql, Q20.sql.

Graphical results

[Figure: Benchmark, 100 MB of data. Time in seconds per query for five configurations: single node; two nodes at 100 Mb/s; two nodes at 100 Mb/s (test on LPC2); two nodes at 1 Gb/s; two nodes at 1 Gb/s (test on LPC2).]

Two queries are missing from this figure because their execution times were so high that the details of the other queries would not have been visible. The next figure therefore shows queries Q8 and Q9.

[Figure: Benchmark, 100 MB of data, queries Q8 and Q9. Time in seconds for the same five configurations.]

[Figure: Benchmark, 250 MB of data. Time in seconds for queries Q7, Q10, Q11, Q12, Q14, Q15, Q16, Q17, Q18, Q19 and Q20, for the same five configurations.]

As on the previous page, two queries are missing from this figure. The next figure shows queries Q8 and Q9.

[Figure: Benchmark, 250 MB of data, queries Q8 and Q9. Time in seconds for the same five configurations.]

[Figure: Benchmark, 500 MB of data. Time in seconds for queries Q7, Q10, Q11, Q12, Q14, Q15, Q16, Q17, Q18, Q19 and Q20, for the same five configurations.]

As on the previous page, two queries are missing from this figure. The next figure shows queries Q8 and Q9.

[Figure: Benchmark, 500 MB of data, queries Q8 and Q9. Time in seconds for the same five configurations.]

Conclusion

This technical report has presented the various mechanisms used to build a distributed database. The goal of the project was to show that a distributed database can bring a performance gain in the execution time of SQL requests. The tests we carried out show that the results differ from one request to another: for some distributed requests, the execution time decreases compared to the single-node case.

The questions raised by this project are not closed. Many points remain to be studied in order to have a global view of the subject. It would be necessary, for example, to carry out tests with a cluster of more powerful nodes, or to test various distributions of the tables over 2, 3 or even 4 databases.

Finally, the basic concept of distributed data is only an introduction to other topics, such as database replication.

To conclude, I would like to thank Mr Harald KOSCH.

ANNEXES

Network setup

How the network is set up depends on whether the machine acts as a server, a client, or both.

First stage Net8 Assistant for a client

Start the Net8 Assistant program:

    Prompt>netasst &

The only thing you have to set is a domain (WORLD is the default value, but you can enter whatever you want). I recommend using an Internet domain such as itec.uni.klu.ac.at.

First stage Net8 Assistant for a server

On a server you have to add a listener, and configure only the database name, the listening location, and the default domain name.

Tips and Tricks: avoid special characters such as "-" (as in uni-klu), "\" or "/", because creating a database link then does not work.

Second stage Net8 Easy Config

On each server and client, you have to run the Net8 Easy Config program to make sure that communication works between all the nodes of the cluster. Launch the program as follows, and its first screen appears:

    Prompt>netec &

First, you have to add a new service and give it a service name (this name matters for the steps that follow).

Remark: The service name is automatically completed with the default domain that you chose earlier. With the Net8 Easy Config program you can also test the communication with an Oracle database server.

After this, you have to select a network protocol (TCP/IP is selected by default).

Next, specify the host name of the remote database and the port number (port 1521 is the default value).

Then specify the System Identifier. The SID is only used for old versions of Oracle such as 8.0.5; for Oracle 8i you can specify either the Oracle database name or, for an old version of Oracle, a System Identifier.

Tips and Tricks: before testing, I recommend clicking the Next button and then the Finish button, because the program sometimes crashes ;-). After restarting the program you will be able to test whether your communication service works correctly.

Now you can test your communication. Normally, a confirmation screen appears. If it doesn't work, consult the Oracle documentation or look at the following tips and tricks.

Tips and Tricks: Under Windows NT, you can check whether the Oracle services are running correctly. The most important service is OracleTNSListener80. If this service is not started correctly, you will have problems communicating with a remote database and will get TNS error messages. In the worst case, you may have to reinstall Oracle ;-(.

Under Linux, you can test whether the listener is working:

    Prompt>lsnrctl
    LSNRCTL>status
    LSNRCTL>start

Use start to start the listener, stop to stop it, and help for help.
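Once the listener is running and the Net8 test succeeds, the same path can be checked from SQL. The sketch below assumes a database link named my_link has already been created as in the 'Database Links' section; DUAL and GLOBAL_NAME exist in every Oracle database.

```sql
-- Cheapest possible round trip to the remote database.
SELECT * FROM dual@my_link;

-- Ask the remote database for its own global name, to confirm that
-- the link points where you expect.
SELECT * FROM global_name@my_link;

-- End the transaction opened by the remote queries, then release
-- the connection held by the link.
COMMIT;
ALTER SESSION CLOSE DATABASE LINK my_link;
```

If the first query fails, the error number usually narrows the problem down: ORA-12154 typically means the service name could not be resolved, and ORA-12541 that no listener answered on the remote host.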

Global administration

To manage all of these databases, I recommend using the Oracle Enterprise Manager tool. This tool works only on Windows.

Oracle Enterprise Manager combines a graphical console, agents, common services, and tools to provide an integrated, comprehensive systems management platform for managing Oracle products. From Enterprise Manager's Console, you can:

- Administer, diagnose, and tune multiple databases.
- Distribute software to multiple servers and clients.
- Schedule jobs on multiple nodes at varying time intervals.
- Monitor objects and events throughout the network.
- Customise your display using multiple graphic maps and groups of network objects, such as nodes and databases.
- Administer Oracle Parallel Servers.
- Integrate participating Oracle and third-party tools (Fail Safe, ...).

Remark: On each Windows NT server, the OracleAgent80 service must be started. On Linux, you start the agent from the listener control program:

    Prompt>lsnrctl
    LSNRCTL>dbsnmp_start

Use dbsnmp_start to start the agent and dbsnmp_stop to stop it.

Queries

Query no. 1:

    -- TPC-H/TPC-R Volume Shipping Query (Q7)
    -- Functional Query Definition
    -- Approved February 1998
    select supp_nation, cust_nation, l_year, sum(volume) as revenue
    from (
        select n1.n_name as supp_nation,
               n2.n_name as cust_nation,
               to_char(l_shipdate, 'YYYY') as l_year,
               l_extendedprice * (1 - l_discount) as volume
        from supplier, lineitem, orders, customer, nation n1, nation n2
        where s_suppkey = l_suppkey
          and o_orderkey = l_orderkey
          and c_custkey = o_custkey
          and s_nationkey = n1.n_nationkey
          and c_nationkey = n2.n_nationkey
          and ( (n1.n_name = ':1' and n2.n_name = ':2')
             or (n1.n_name = ':2' and n2.n_name = ':1') )
          and l_shipdate between '01-jan-95' and '31-dec-96'
    ) shipping
    group by supp_nation, cust_nation, l_year
    order by supp_nation, cust_nation, l_year
    /

Query no. 2:

    -- TPC-H/TPC-R National Market Share Query (Q8)
    -- Functional Query Definition
    -- Approved February 1998
    select o_year, sum(volume) mkt_share
    from (
        select to_char(o_orderdate, 'YYYY') o_year,
               l_extendedprice * (1 - l_discount) volume,
               n2.n_name nation
        from part, supplier, lineitem, orders, customer, nation n1, nation n2, region
        where p_partkey = l_partkey
          and s_suppkey = l_suppkey
          and l_orderkey = o_orderkey
          and o_custkey = c_custkey
          and c_nationkey = n1.n_nationkey
          and n1.n_regionkey = r_regionkey
          and r_name = ':2'
          and s_nationkey = n2.n_nationkey
          and o_orderdate between date ' ' and date ' '
          and p_type = ':3'
    ) all_nations
    group by o_year
    order by o_year

Query no. 3:

    -- TPC-H/TPC-R Product Type Profit Measure Query (Q9)
    -- Functional Query Definition
    -- Approved February 1998
    select nation, o_year, sum(amount) sum_profit
    from (
        select n_name nation,
               to_char(o_orderdate, 'yyyy') o_year,
               l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity amount
        from part, supplier, lineitem, partsupp, orders, nation
        where s_suppkey = l_suppkey
          and ps_suppkey = l_suppkey
          and ps_partkey = l_partkey
          and p_partkey = l_partkey
          and o_orderkey = l_orderkey
          and s_nationkey = n_nationkey
          and p_name like '%:1%'
    ) profit
    group by nation, o_year
    order by nation, o_year desc

Query no. 4:

    -- TPC-H/TPC-R Returned Item Reporting Query (Q10)
    -- Functional Query Definition
    -- Approved February 1998
    select c_custkey, c_name, c_acctbal, n_name, c_address, c_phone,
           sum(l_extendedprice * (1 - l_discount)) revenue
    from customer, orders, lineitem, nation
    where c_custkey = o_custkey
      and l_orderkey = o_orderkey
      and o_orderdate >= '01-JAN-1999'
      and o_orderdate < '01-MAR-1999'
      and l_returnflag = 'R'
      and c_nationkey = n_nationkey
    group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address

Query no. 5:

    -- TPC-H/TPC-R Important Stock Identification Query (Q11)
    -- Functional Query Definition
    -- Approved February 1998
    select ps_partkey, sum(ps_supplycost * ps_availqty) as value
    from partsupp, supplier, nation
    where ps_suppkey = s_suppkey
      and s_nationkey = n_nationkey
      and n_name = ':1'
    group by ps_partkey
    having sum(ps_supplycost * ps_availqty) > (
        select sum(ps_supplycost * ps_availqty) * 150
        from partsupp, supplier, nation
        where ps_suppkey = s_suppkey
          and s_nationkey = n_nationkey
          and n_name = ':1'
    )
    order by value desc

Query no. 6:

    -- TPC-H/TPC-R Shipping Modes and Order Priority Query (Q12)
    -- Functional Query Definition
    -- Approved February 1998
    select l_shipmode, sum(l_quantity) high_line_count
    from orders, lineitem
    where o_orderkey = l_orderkey
      and l_shipmode in ('AIR', 'SHIP')
      and l_commitdate < l_receiptdate
      and l_shipdate < l_commitdate
      and l_receiptdate >= '01-JAN-1950'
      and l_receiptdate < '01-JAN-1998'
    group by l_shipmode
    order by l_shipmode

Query no. 7:

    -- TPC-H/TPC-R Promotion Effect Query (Q14)
    -- Functional Query Definition
    -- Approved February 1998
    -- The factor in front of "*" is missing from the source text.
    select * sum(l_extendedprice * (1 - l_discount)) as promo_revenue
    from lineitem, part
    where l_partkey = p_partkey
      and l_shipdate >= '01-JAN-1950'
      and l_shipdate < '01-JAN-1999'

Query no. 8:

    -- TPC-H/TPC-R Top Supplier Query (Q15)
    -- Functional Query Definition
    -- Approved February 1998
    create view revenue (supplier_no, total_revenue) as
        select l_suppkey, sum(l_extendedprice * (1 - l_discount))
        from lineitem
        where l_shipdate >= '01-JAN-1950'
          and l_shipdate < '01-JAN-1999'
        group by l_suppkey
    /
    select s_suppkey, s_name, s_address, s_phone, total_revenue
    from supplier, revenue
    where s_suppkey = supplier_no
      and total_revenue = ( select max(total_revenue) from revenue )
    order by s_suppkey
    /
    drop view revenue

Query no. 9:

    -- TPC-H/TPC-R Parts/Supplier Relationship Query (Q16)
    -- Functional Query Definition
    -- Approved February 1998
    select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt
    from partsupp, part
    where p_partkey = ps_partkey
      and p_brand <> ':1'
      and p_type not like ':2%'
      and p_size in (10, 100, 2000, 3000, 50, 69, 5000, 10000)
      and ps_suppkey not in (
          select s_suppkey
          from supplier
          where s_comment like '%Customer%Complaints%'
      )
    group by p_brand, p_type, p_size
    order by supplier_cnt desc, p_brand, p_type, p_size

Query no. 10:

    -- TPC-H/TPC-R Small-Quantity-Order Revenue Query (Q17)
    -- Functional Query Definition
    -- Approved February 1998
    select sum(l_extendedprice) / 7.0 as avg_yearly
    from lineitem, part
    where p_partkey = l_partkey
      and p_brand = ':1'
      and p_container = ':2'
      and l_quantity < (
          select 0.2 * avg(l_quantity)
          from lineitem
          where l_partkey = p_partkey
      );

Query no. 11:

    -- TPC-H/TPC-R Large Volume Customer Query (Q18)
    -- Function Query Definition
    -- Approved February 1998
    select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    from customer, orders, lineitem
    where o_orderkey in (
          select l_orderkey
          from lineitem
          group by l_orderkey
          having sum(l_quantity) >
      )
      and c_custkey = o_custkey
      and o_orderkey = l_orderkey
    group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    order by o_totalprice desc, o_orderdate;

Query no. 12:

    -- TPC-H/TPC-R Discounted Revenue Query (Q19)
    -- Functional Query Definition
    -- Approved February 1998
    select sum(l_extendedprice * (1 - l_discount)) as revenue
    from lineitem, part
    where (
          p_partkey = l_partkey
          and p_brand = ':1'
          and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
          and l_quantity >= 1 and l_quantity <=
          and p_size between 1 and 5
          and l_shipmode in ('AIR', 'AIR REG')
          and l_shipinstruct = 'DELIVER IN PERSON'
      ) or (
          p_partkey = l_partkey
          and p_brand = ':2'
          and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
          and l_quantity >= 1 and l_quantity <=
          and p_size between 1 and 10
          and l_shipmode in ('AIR', 'AIR REG')
          and l_shipinstruct = 'DELIVER IN PERSON'
      ) or (
          p_partkey = l_partkey
          and p_brand = ':3'
          and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
          and l_quantity >= 1 and l_quantity <=
          and p_size between 1 and 15
          and l_shipmode in ('AIR', 'AIR REG')
          and l_shipinstruct = 'DELIVER IN PERSON'
      );

Query no. 13:

    -- TPC-H/TPC-R Potential Part Promotion Query (Q20)
    -- Function Query Definition
    -- Approved February 1998
    select s_name, s_address
    from supplier, nation
    where s_suppkey in (
          select ps_suppkey
          from partsupp
          where ps_partkey in (
                select p_partkey
                from part
                where p_name like ':1%'
            )
            and ps_availqty > (
                select 0.5 * sum(l_quantity)
                from lineitem
                where l_partkey = ps_partkey
                  and l_suppkey = ps_suppkey
                  and l_shipdate >= '01-JAN-1950'
                  and l_shipdate < '01-JAN-1999'
            )
      )
      and s_nationkey = n_nationkey
      and n_name = ':3'
    order by s_name;

Time report
