An Oracle White Paper April 2010

In October 2009, NEC Corporation ("NEC") established development guidelines and a roadmap for IT platform products to realize next-generation IT infrastructure suited for cloud computing, announcing an IT platform vision called "REAL IT PLATFORM Generation 2". The goal of this vision is to provide flexible, dependable, and easy-to-use systems to customers, based on NEC's virtualization, high-reliability, integration, and usability technologies. Oracle Corporation Japan ("Oracle Japan") has been providing Oracle GRID technology for realizing grid computing since the announcement of Oracle Database 10g several years earlier. In November 2006, Oracle Japan established the Oracle GRID Center, a facility designed for technical verifications performed in partnership with affiliated vendors, to create next-generation grid-based business solutions. Building on a collaborative relationship extending over some 20 years, NEC and Oracle Japan have promoted a strategic technology alliance (STA) to develop and realize next-generation IT infrastructures. As part of this effort, NEC joined the Oracle GRID Center to give concrete shape to the REAL IT PLATFORM Generation 2 vision. The Oracle GRID Center conducted a pre-release validation program for Oracle Database 11g Release 2 together with its partners. The program had two purposes: to improve product quality through validation testing, including feedback to the development organization, and to enable the new features for effective user scenarios at the time of product release. This paper is one of the work products of that program, produced with NEC.

Overview

The environment that businesses must adapt to is changing more rapidly than ever before, and decision-makers are required to make business decisions and improvements quickly. To make these rapid business decisions, decision-makers use data warehouses (DWH) to view historical data and perform predictive analysis, and the DWH must therefore provide data to end users quickly. To deliver that performance, Oracle Database provides parallel execution for faster SQL processing, which utilizes storage and CPU resources efficiently. Recently, the need for storage I/O performance has grown because complex, global business environments generate ever more data. On the other hand, because a single disk can now hold far more data, it has become common to design storage systems based on capacity rather than throughput, which results in a relatively small number of physical disks. As a result, the limited performance of the storage system can become a bottleneck for the entire DWH system. In-Memory Parallel Query (PQ) was introduced in Oracle Database 11g Release 2 to remove that bottleneck by caching data in physical memory while executing parallel statements. This paper introduces a solution that improves the performance of an entire DWH system by effectively using In-Memory PQ on a NEC Express5800/Scalable HA server; this server has a large memory capacity, making it well suited to a system leveraging In-Memory PQ.

In-Memory Parallel Query

Conventional PQ execution uses Direct Path Read, which loads data directly from disk, bypassing the database buffer cache (buffer cache). Although reading data from memory has lower latency, it has been virtually impossible to cache large amounts of data in the buffer cache because of the cost of high-capacity memory modules and hardware limitations. Even if conventional parallel execution did load data into the buffer cache, the data would be aged out again and again due to the limited buffer cache size, and the overhead of managing the cached data would degrade SQL response time. Hence, conventional parallel execution in Oracle Database was designed to load data directly from disk. However, the latest servers can host sufficient amounts of physical memory. For example, the NEC Express5800/Scalable HA server supports up to 1,024GB of memory (256GB × 4 BOXes). Such a server can hold a large amount of data in the buffer cache and can therefore fully utilize the In-Memory PQ capabilities.
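For reference, parallel execution is typically requested with a statement-level hint or by declaring a parallel degree on the table. The following is a minimal sketch against a hypothetical SALES table (the table, columns, and degree of parallelism are illustrative and not taken from the original verification):

    -- Request a parallel scan of SALES with a degree of parallelism of 16
    SELECT /*+ PARALLEL(s, 16) */ s.prod_id, SUM(s.amount) AS total_amount
    FROM sales s
    GROUP BY s.prod_id;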

In-Memory PQ is a parallel query feature that uses database segments (such as tables and indexes) cached in the buffer cache to execute a query, thereby eliminating the performance limitation imposed by the storage. If the target data is not yet cached, response time is slightly slower than with Direct Path Read because the cache must first be warmed up (i.e., the data must be loaded into the cache). If a PQ requires more data than about 80% of the buffer cache size, Oracle Database detects this automatically when generating the SQL execution plan and uses Direct Path Read instead of the buffer cache (Figure 2).
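To relate this threshold to a running instance, the current buffer cache size can be read from V$SGAINFO. The query below is a sketch, not part of the original verification, and the 80% figure is the heuristic described above:

    -- Current buffer cache size and the approximate In-Memory PQ limit (80%)
    SELECT name,
           ROUND(bytes/1024/1024/1024, 1)     AS size_gb,
           ROUND(bytes*0.8/1024/1024/1024, 1) AS approx_in_memory_pq_limit_gb
    FROM v$sgainfo
    WHERE name = 'Buffer Cache Size';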

To use In-Memory PQ, the initialization parameter PARALLEL_DEGREE_POLICY must be set to AUTO (default: MANUAL). No change in the application is required. In addition, combining In-Memory PQ with Oracle Partitioning and/or Table Compression is very effective, as explained in more detail below.
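As a minimal sketch, the parameter can be set instance-wide or, for testing, per session (values as documented for Oracle Database 11g Release 2):

    -- Enable In-Memory PQ (together with Auto DOP) for the whole instance
    ALTER SYSTEM SET parallel_degree_policy = AUTO;

    -- Or enable it only for the current session, e.g. for testing
    ALTER SESSION SET parallel_degree_policy = AUTO;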

Oracle Partitioning

Oracle Partitioning is a feature that divides tables and indexes into multiple parts (partitions). Users can access partitioned tables as if they were normal tables, and no change in the application is necessary. Tables and indexes can be divided using different methods: range (divide by key range), hash (divide by hash key), list (divide by key value), and composite partitioning, which combines two of these methods. Table 1 shows the selectable composite partitioning methods and the Oracle Database release in which each became available.

Table 1: Selectable composite partitioning methods

    MAIN \ SUB    RANGE     LIST      HASH
    RANGE         11gR1     9iR2      8i
    LIST          11gR1     11gR1     11gR1

Partitioning provides three benefits: performance, manageability, and availability. This paper explains the performance benefit of partition pruning, which is especially effective for DWH queries. Partition pruning improves query performance by accessing only the partitions that contain the desired data. For example, consider a query that requests sales data for a specific month (2009/10) from the SALES table. If SALES is non-partitioned, the query scans the entire table. However, if SALES is partitioned by month, the query scans only one partition (2009/10), because all the requested data is located in that single partition. There is no need to scan the irrelevant partitions, resulting in increased performance (Figure 3).
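A minimal sketch of this SALES example follows; the names and partition bounds are hypothetical. The table is range-partitioned by month, and the query's predicate allows the optimizer to prune to the 2009/10 partition:

    CREATE TABLE sales (
      sale_date DATE,
      cust_id   NUMBER,
      amount    NUMBER
    )
    PARTITION BY RANGE (sale_date) (
      PARTITION p200909 VALUES LESS THAN (DATE '2009-10-01'),
      PARTITION p200910 VALUES LESS THAN (DATE '2009-11-01'),
      PARTITION p200911 VALUES LESS THAN (DATE '2009-12-01')
    );

    -- Only partition P200910 is scanned (partition pruning)
    SELECT SUM(amount)
    FROM sales
    WHERE sale_date >= DATE '2009-10-01'
      AND sale_date <  DATE '2009-11-01';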

The combined use of Oracle Partitioning and In-Memory PQ is very effective. With partitioned tables there is a greater tendency for the data to be cached in the buffer cache, because partition pruning reduces the amount of data actually accessed. Since the irrelevant partitions are never scanned, the chances increase that the data fits within 80% of the buffer cache (Figure 4).

Table Compression

Table Compression is a feature that reduces table size by eliminating duplicate data within an Oracle Database block, which allows more rows to be stored in a single block. This feature is especially effective at reducing response time in DWH systems because it reduces the disk I/O required to retrieve a given number of rows. Moreover, data in the buffer cache remains compressed, allowing more data to be held in the cache and thereby expanding the applicable query range of In-Memory PQ (Figure 5).
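A minimal sketch of enabling basic table compression on the hypothetical SALES table (in 11g Release 2, basic compression is applied on direct-path loads and segment rebuilds):

    -- Rebuild an existing table in compressed form
    ALTER TABLE sales MOVE COMPRESS;

    -- Or create a compressed copy with a direct-path operation
    CREATE TABLE sales_compressed COMPRESS
    AS SELECT * FROM sales;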

Platform Introduction

NEC's A1160 system architecture is an efficiently scalable, highly reliable, and easily serviceable solution, well suited for system consolidation, virtualization deployments, and enterprise database applications. Each A1160 system can scale from 1 node, 4 sockets, 24 cores, and 256GB of memory up to 4 nodes, 16 sockets, 96 cores, and 1TB of memory.

Verification Environment

Oracle Database 11g Release 2 was installed on an Express5800/A1160, and a database was created on an iStorage D3 storage array connected by four 4Gbps Fibre Channel (FC) cables. The database holds 1TB of data.

Verification Model

Generally, there are two types of DWH users: general users and executives. The verification was done under the hypothesis that general users, such as sales representatives, mainly access recent data to resolve the problems they face, while executives tend to access data spanning several years to make mid- and long-term business plans. The query patterns of the two user types are as follows: the queries of general users are issued by many users at high frequency, while the queries of executives are issued by a few users at low frequency (Figure 7, Table 2). These users execute queries concurrently in a DWH system, and on conventional servers with a small memory footprint each parallel query is executed with Direct Path Read, because the server cannot cache the large data set. In this case, PQ performance is often limited by storage I/O bandwidth, because the workload concentrates on the storage (Figure 8).

The verification described below was done to overcome this situation by using a large-memory server (the NEC Express5800/Scalable HA server) and In-Memory PQ (a new feature of Oracle Database 11g Release 2).

Verification 1: Comparison between conventional PQ and In-Memory PQ

Although in real DWH systems many users execute different queries, as described in the previous chapter (Verification Model), this chapter uses a simple case in which one user executes the same query, and compares conventional PQ with In-Memory PQ. Conventional PQ is obtained by disabling the In-Memory PQ setting (PARALLEL_DEGREE_POLICY=MANUAL). In addition, this chapter explains how Oracle Database decides whether to use Direct Path Read when executing a PQ that touches a large amount of data. The verification was performed as follows.

1. Prepare a range-partitioned table (each partition holds one month of data; the data size is 2.5GB per partition).
2. Compare the performance of the same query with In-Memory PQ enabled and disabled.
3. Expand the data size of the query range by adding more months to the query (changing the WHERE clause of the SQL).
4. Return to step 2 and run the test again.
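A minimal sketch of steps 2 and 3, assuming the hypothetical month-partitioned SALES table shown earlier (the date ranges and degree of parallelism are illustrative):

    -- Conventional PQ run: disable In-Memory PQ for this session
    ALTER SESSION SET parallel_degree_policy = MANUAL;

    -- One month of data (about 2.5GB)
    SELECT /*+ PARALLEL(s, 16) */ SUM(s.amount)
    FROM sales s
    WHERE s.sale_date >= DATE '2009-10-01'
      AND s.sale_date <  DATE '2009-11-01';

    -- Step 3: widen the range one month at a time (here, four months, about 10GB)
    SELECT /*+ PARALLEL(s, 16) */ SUM(s.amount)
    FROM sales s
    WHERE s.sale_date >= DATE '2009-07-01'
      AND s.sale_date <  DATE '2009-11-01';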

Figure 10 shows the result of this verification as relative response time. The response time of a query that uses Direct Path Read with a query range of 15GB is assigned the value 100 and serves as the baseline; this baseline is used in Verifications 1 and 2. If the data size of the query range is less than about 80% of the buffer cache size, In-Memory PQ is executed, and the tests show that execution is faster than without In-Memory PQ. If the data size is larger than about 80% of the buffer cache size, the query is executed using Direct Path Read, and we see the conventional PQ execution times.

Next, to confirm the difference in performance and hardware resource usage between In-Memory PQ and Direct Path Read, we focus on the baseline data size of 15GB, which is a little less than 80% of the buffer cache. Figure 11 compares the response times of Direct Path Read and In-Memory PQ. In this verification, In-Memory PQ is five times faster than Direct Path Read. For In-Memory PQ it is assumed that all target data is cached in the buffer cache in advance, and the result above was obtained after loading all data into the buffer cache; executing the query before the target data is loaded into the buffer cache results in a response time slower than Direct Path Read. Figure 12 is a time-series graph of disk I/O for In-Memory PQ and Direct Path Read. It shows the relative disk read speed, where the value 100 corresponds to the maximum disk read speed measured in advance. With Direct Path Read, storage usage nearly reaches its capacity ceiling. With In-Memory PQ, storage resource usage is very low, as expected.

Figure 13 is the CPU usage rate time-series graph of Direct Path Read and In-Memory PQ. When Direct Path Read is used, the CPU usage rate peaks at about 10%; the CPU resource is not used effectively. This underutilization is caused by the CPU waiting for I/O from the storage. When In-Memory PQ is executed, CPU usage is driven to almost 100%, showing effective use of CPU resources. These results show that In-Memory PQ is faster than Direct Path Read PQ because of the effective use of CPU resources and the low use of storage resources. The next verification covers the case where the data size of the query range is larger than about 80% of the buffer cache size. Figure 14 compares response times with In-Memory PQ disabled and enabled where the data size of the query range is about 20GB (80% of the buffer cache size is 19.6GB). In this case, although In-Memory PQ is enabled, Direct Path Read is used, resulting in nearly equal response times in both cases.

The following data show the top 5 wait events with In-Memory PQ disabled and enabled. In both cases, more than 90% of the time is spent in the direct path read event, confirming that Direct Path Read is used even though In-Memory PQ is enabled. As described above, when In-Memory PQ is enabled and the data size is smaller than about 80% of the buffer cache size, the PQ makes effective use of the buffer cache, while when the data size of the query range is larger than about 80%, Direct Path Read is used. From these results, in our DWH model the short-term queries executed by general users are sped up by In-Memory PQ, while the long-term queries executed by executives are executed using Direct Path Read. Enabling In-Memory PQ therefore speeds up the shorter-running, repetitive queries without interfering in any way with the longer-running, strategic queries.
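One way to confirm this at the instance level, as a sketch rather than the exact method used in the paper (which reports AWR top events), is to look at the cumulative wait statistics for the direct path read event:

    SELECT event, total_waits, time_waited
    FROM v$system_event
    WHERE event = 'direct path read';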

Verification 2: Expanding the applicable query range of In-Memory PQ

As might be expected from the results of Verification 1, even when In-Memory PQ is enabled, it is not applied if the data size exceeds 80% of the buffer cache size. There are two solutions to this situation: one is to expand the buffer cache by adding physical memory, and the other is to reduce the data size. In Verification 2-1, the Addition BOX feature of the NEC Express5800/A1160 was used to expand the buffer cache; in Verification 2-2, Table Compression in Oracle Database was used to reduce the data size.

This section verifies how to expand the applicable query range of In-Memory PQ by using the Addition BOX feature of NEC's Express5800/A1160, which allows users to add resources to the server. By adding an additional BOX, the physical memory size is increased, which enables a larger buffer cache setting. The same verification was conducted as in Verification 1, but using a 2-BOX configuration, and the results were compared to those obtained in Verification 1 (1 BOX).
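After adding a BOX, the larger physical memory must actually be assigned to the buffer cache. The following sketch uses purely illustrative values; the paper does not state the exact parameter settings used in the verification:

    -- Illustrative values only; an instance restart is required for SGA_MAX_SIZE
    ALTER SYSTEM SET sga_max_size = 56G SCOPE=SPFILE;
    ALTER SYSTEM SET db_cache_size = 50G SCOPE=SPFILE;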

Figure 15 shows the comparison of the applicable query range of In-Memory PQ between 1 BOX and 2 BOXes. With 1 BOX, the boundary between In-Memory PQ and Direct Path Read is at about 20GB, while with 2 BOXes the boundary moves to about 40GB. The results confirm that the applicable query range of In-Memory PQ can be expanded using the Addition BOX feature of the NEC Express5800/A1160, and that expanding the buffer cache increases query performance for a larger set of queries.

This section verifies how to expand the applicable query range of In-Memory PQ by using Oracle Database Table Compression. Table Compression allows more data to be cached, because the buffer cache holds the data in compressed form. Furthermore, query performance improves not only for In-Memory PQ but also for Direct Path Read, due to the data size reduction achieved by compression. The same verification was conducted as in Verification 1, but with a table compression ratio of 2.1, and the results were compared against those obtained with an uncompressed table.

Figure 16 shows the response times for a compressed table and an uncompressed table. It confirms that compressing the table expands the applicable query range from about 40GB to about 85GB (about 80% of the buffer cache size, 41.1GB, multiplied by the compression ratio of 2.1); more data can be cached due to compression. If the data size is more than about 85GB, the query is executed using Direct Path Read, but the compressed case is about two times faster than the uncompressed case. This is the effect of decreasing disk I/O by leveraging Table Compression. As described above, using Table Compression expands the applicable query range of In-Memory PQ for queries executed by general users, allowing the DWH data size to grow. Furthermore, the queries executed by executives also gain performance from compression. Using In-Memory PQ and compression together creates a win-win situation for all query users across a broad spectrum of data sizes.
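A compression ratio such as the 2.1 used here can be estimated by comparing segment sizes before and after compression. The following is a sketch against the hypothetical tables from the earlier examples:

    SELECT segment_name,
           ROUND(SUM(bytes)/1024/1024/1024, 1) AS size_gb
    FROM user_segments
    WHERE segment_name IN ('SALES', 'SALES_COMPRESSED')
    GROUP BY segment_name;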

Verification 3: Performance improvement across the system by using In-Memory PQ

By executing In-Memory PQ instead of conventional Direct Path Read, the short-term queries executed by the general user population become faster, because the data is small enough to fit in the buffer cache and is used repeatedly. This is also expected to reduce disk I/O. On the other hand, the queries executed by executives generally cover large, long-term data sets and tend to be executed with Direct Path Read. However, these executive queries can now make full use of the storage I/O, part of which was previously consumed by the general users, resulting in better performance across the board (Figure 17). In other words, using In-Memory PQ drives better performance for the entire query ecosystem, not just for the direct beneficiaries of In-Memory PQ.

This chapter introduces the performance improvement of the entire data warehouse system achieved by using In-Memory PQ. The verification model is as follows: general users use 9 sessions and executives use 1 session, for a total of 10 sessions on the system. All general users execute queries on the latest month of data, and the executive executes queries against the latest full year of data. Every session executes queries continuously, and the 10 sessions always run concurrently. The average response time for each type of user was measured after a given period of concurrent query execution.
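A sketch of the two query shapes in this model, again using the hypothetical SALES table; the actual verification queries are not given in the transcription:

    -- General user (9 sessions): latest month
    SELECT /*+ PARALLEL(s, 8) */ SUM(s.amount)
    FROM sales s
    WHERE s.sale_date >= DATE '2009-10-01'
      AND s.sale_date <  DATE '2009-11-01';

    -- Executive (1 session): latest full year
    SELECT /*+ PARALLEL(s, 8) */ SUM(s.amount)
    FROM sales s
    WHERE s.sale_date >= DATE '2008-11-01'
      AND s.sale_date <  DATE '2009-11-01';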

Figure 19 compares each user type's average response time with In-Memory PQ enabled and disabled. With In-Memory PQ enabled, the queries executed by the general users were some 5 times faster than with In-Memory PQ disabled, and the queries executed by the executive were some 9 times faster.

For reference, the CPU usage rate (usr+sys) is shown for both In-Memory PQ enabled and disabled (Figure 20). When In-Memory PQ is disabled, all queries are executed using Direct Path Read and the CPU usage rate is low (peaking at about 10%), which shows the disk I/O performance bottleneck. On the other hand, when In-Memory PQ is enabled, the data used by the general users is supplied from physical memory, while the data used by the executive is supplied via I/O from the storage subsystem. In this case, the CPU can continue to process data supplied from physical memory while waiting for data from storage, so the CPU usage rate stays fairly high. This means the DWH system uses hardware resources much more effectively with In-Memory PQ, and all users benefit from the feature.

Conclusion

This verification confirmed the improved performance of an entire DWH system using In-Memory PQ. In a typical DWH system, disk I/O tends to be the performance bottleneck, but with In-Memory PQ, queries executed from the buffer cache instead of via Direct Path Read run much faster than the storage I/O capacity would otherwise allow. Furthermore, even queries that cannot be executed using In-Memory PQ see improved performance, because they benefit from the extra I/O bandwidth released by the queries that use In-Memory PQ instead of Direct Path Read. In addition, this verification introduced two ways to expand the applicable query range of In-Memory PQ. One is the Addition BOX capability of the NEC Express5800/Scalable HA server, which expands physical memory by adding server resources. The other is Table Compression, which reduces the data size; in this case the advantage is not only an expanded applicable query range for In-Memory PQ but also improved performance for queries executed using Direct Path Read. To summarize, the combination of the NEC Express5800/Scalable HA server and In-Memory Parallel Query, a new feature of Oracle Database 11g Release 2, offers a more than suitable solution for any DWH requiring high performance under concurrent and mixed query loads.