Big Data? Faster Cube Builds? PROC OLAP Can Do It

Similar documents
Guide Users along Information Pathways and Surf through the Data

System Requirements. SAS Profitability Management 2.3. Deployment Options. Supported Operating Systems and Versions. Windows Server Operating Systems

SAS Activity-Based Management Software Release for Windows

SAS Scalable Performance Data Server 4.3

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

System Requirements. SAS Profitability Management 2.1. Server Requirements. Server Hardware Requirements

Scalable Access to SAS Data Billy Clifford, SAS Institute Inc., Austin, TX

Considering the 2.5-inch SSD-based RAID Solution:

When, Where & Why to Use NoSQL?

Abdel Etri, SAS Institute Inc., Cary, NC

System Requirements. SAS Activity-Based Management 7.2. Deployment

Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10. Onur Kahraman

The Server-Storage Performance Gap

Four Steps to Unleashing The Full Potential of Your Database

System Requirements. SAS Activity-Based Management Deployment

SoftNAS Cloud Performance Evaluation on Microsoft Azure

Journey to the center of the earth Deep understanding of SAS language processing mechanism Di Chen, SAS Beijing R&D, Beijing, China

Using SAS Enterprise Guide with the WIK

Summarizing Impossibly Large SAS Data Sets For the Data Warehouse Server Using Horizontal Summarization

Finding a needle in Haystack: Facebook's photo storage

An Oracle White Paper September Oracle Utilities Meter Data Management Demonstrates Extreme Performance on Oracle Exadata/Exalogic

Massive Scalability With InterSystems IRIS Data Platform

Best Practice for Creation and Maintenance of a SAS Infrastructure

Increasing Performance for PowerCenter Sessions that Use Partitions

QLogic 16Gb Gen 5 Fibre Channel for Database and Business Analytics

SAS ENTERPRISE GUIDE USER INTERFACE

Tips and Tricks for Organizing and Administering Metadata

Assignment 5. Georgia Koloniari

IF there is a Better Way than IF-THEN

QLogic/Lenovo 16Gb Gen 5 Fibre Channel for Database and Business Analytics

SoftNAS Cloud Performance Evaluation on AWS

OPTIMIZE. MONETIZE. SECURE. Agile, scalable network solutions for service providers.

Using SAS software to shrink the data in your applications

Segregating Data Within Databases for Performance Prepared by Bill Hulsizer

Decision Management with DS2

WHITE PAPER NGINX An Open Source Platform of Choice for Enterprise Website Architectures

How VERITAS Volume Manager Complements Hardware RAID in Microsoft Windows Server Environments

Rapid Provisioning. Cutting Deployment Times with INFINIDAT s Host PowerTools. Abstract STORING THE FUTURE

Optimizing Parallel Access to the BaBar Database System Using CORBA Servers

Optimizing Performance for Partitioned Mappings

PREREQUISITES FOR EXAMPLES

Traffic Triggers Domain Here.com

Comparing File (NAS) and Block (SAN) Storage

VEXATA FOR ORACLE. Digital Business Demands Performance and Scale. Solution Brief

After completing this course, participants will be able to:

Data Model Considerations for Radar Systems

An Annotated Guide: The New 9.1, Free & Fast SPDE Data Engine Russ Lavery, Ardmore PA, Independent Contractor Ian Whitlock, Kennett Square PA

ABSTRACT INTRODUCTION MACRO. Paper RF

Test Report: Digital Rapids Transcode Manager Application with NetApp Media Content Management Solution

SQL Server Analysis Services

EMC XTREMCACHE ACCELERATES VIRTUALIZED ORACLE

The Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput

Viságe.BIT. An OLAP/Data Warehouse solution for multi-valued databases

WIND RIVER NETWORKING SOLUTIONS

Sustainable Security Operations

VMware vsphere Beginner s Guide

How Real Time Are Your Analytics?

Seamless Dynamic Web (and Smart Device!) Reporting with SAS D.J. Penix, Pinnacle Solutions, Indianapolis, IN

Configuring SAS Web Report Studio Releases 4.2 and 4.3 and the SAS Scalable Performance Data Server

1Highwinds. Software. Highwinds Software LLC. Document Version 1.1 April 2003

VMware vsphere: Taking Virtualization to the Next Level

On-Premises v7.x Installation Guide

Professional Services Tools Library. Release 2011 FP1

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

Identify and Eliminate Oracle Database Bottlenecks

Red Hat Gluster Storage performance. Manoj Pillai and Ben England Performance Engineering June 25, 2015

Microsoft Office SharePoint Server 2007

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2

Enabling Efficient and Scalable Zero-Trust Security

Ingo Brenckmann Jochen Kirsten Storage Technology Strategists SAS EMEA Copyright 2003, SAS Institute Inc. All rights reserved.

Performance and User Experience...2. Exceptional Performance Best Practices...2. Testing the xamwebgrid...3. Performance Results...

High-Performance Procedures in SAS 9.4: Comparing Performance of HP and Legacy Procedures

An Inside Look at Version 9 and Release 9.1 Threaded Base SAS procedures Robert Ray, SAS Institute Inc. Cary NC

Greenspace: A Macro to Improve a SAS Data Set Footprint

OLAP Introduction and Overview

SAS Factory Miner 14.2: Administration and Configuration

Reproducibly Random Values William Garner, Gilead Sciences, Inc., Foster City, CA Ting Bai, Gilead Sciences, Inc., Foster City, CA

Xcelerated Business Insights (xbi): Going beyond business intelligence to drive information value

W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b l e d N e x t - G e n e r a t i o n V N X

Let SAS Write and Execute Your Data-Driven SAS Code

IBM WebSphere Message Broker for z/os V6.1 delivers the enterprise service bus built for connectivity and transformation

Intel Solid State Drive Data Center Family for PCIe* in Baidu s Data Center Environment

ELASTIC DATA PLATFORM

Best Practices for Setting BIOS Parameters for Performance

Four-Socket Server Consolidation Using SQL Server 2008

Real-time Protection for Microsoft Hyper-V

Nicholas Dritsas Principal Program Manager Microsoft Corporation Microsoft Corporation. All rights reserved

2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or

Hybrid WAN Operations: Extend Network Monitoring Across SD-WAN and Legacy WAN Infrastructure

SEVEN Networks Open Channel Traffic Optimization

Data Protection at Cloud Scale. A reference architecture for VMware Data Protection using CommVault Simpana IntelliSnap and Dell Compellent Storage

CAST(HASHBYTES('SHA2_256',(dbo.MULTI_HASH_FNC( tblname', schemaname'))) AS VARBINARY(32));

Performance of relational database management

The Data Explosion. A Guide to Oracle s Data-Management Cloud Services

INTELLIGENCE DRIVEN GRC FOR SECURITY

Networking Recap Storage Intro. CSE-291 (Cloud Computing), Fall 2016 Gregory Kesden

Sizing Guidelines and Performance Tuning for Intelligent Streaming

A SAS/AF Application for Parallel Extraction, Transformation, and Scoring of a Very Large Database

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other

Transcription:

ABSTRACT Paper 1794-2014 Big Data? Faster Cube Builds? Can Do It Yunbo (Jenny) Sun, Canada Post Michael Brule, SAS Canada In many organizations the amount of data we deal with increases far faster than the hardware and IT infrastructure to support them. As a result, we encounter significant bottlenecks and I/O bound processes. However, clever use of SAS software can help us find a way around. In this paper we will look at the clever use of to show you how to address I/O bound processing by spreading I/O traffic across different data mounts to increase cube building and querying efficiency. This paper assumes experience with SAS OLAP Cube Studio and/or. INTRODUCTION SAS code has been introduced throughout this paper to demonstrate how the functionality that procedure provides can optimize cube creation and query performance on SAS 9.2 environment. It mainly covers optimizing cube creation and cube query performance by splitting I/O traffic to different data mounts. The advantage of incremental update to a cube is not covered. Also, hardware setup will not be covered beyond the prerequisite of having different data mounts created. Readers need to contact their system administrator or service provider to evaluate their environment. In this paper, the terms data mounts, data partitions, multi thread write/read points or multi mount points are interchangeable. BIG DATA? FASTER CUBE BUILDS? CAN DO IT Data carries power. Organizations use information contained within data to generate value, seek opportunities and implement competitive strategies. In the past few years, data has been growing at exponential rate. As more data flows into businesses, not only does volume increase, but also different types of data increases. Hundreds of Gigabytes of data accumulated creates challenges in the cube building process as well as in end-user query performance. IT professionals usually face infrastructure limitations, such as lack of memory capability, shortage of storage, and overloaded I/O bandwidth. Business users often complaint about query performance, data latency and not capturing enough data to support their decision-making needs. IT professionals prefer to build cubes as quick and small as possible to fit IT infrastructure. On the other hand, business end-users may require daily refreshed cube with large amount of pre-calculated aggregations and indexes on big data in order to achieve timely market competitive advantage. In addition, cost control on IT budgets and the profit generating business goal may sometimes cause cube building process and end-user query performance to be at odds with each other. A balanced approach is required to overcome those challenges. Ultimately, achieving an organization s long term business strategies and short term business tactics should drive decisions and designs throughout the cube building process. In other words, cube building is a business driven process and not an IT driven decision. Fortunately, has options to optimize both. Searching on the web, you will find lots of papers that talk about cube building efficiency. Here, focus will be given to the following SAS 9.2 Procedure Options. 1. DATAPATH= First, DATAPATH = ( pathname... pathnamen ) option. This statement or AGGREGATION statement option specifies the location of one or more partitions in which to place aggregation table data. The first aggregation file location is randomly selected from path specified in the option during the build process. This means that the distribution of the files will be different for each build. When building aggregations or different cubes concurrently, the benefit will be two fold: 1. the files will be more evenly distributed across different mount points, 1

2. and the I/O load during the build and querying time will be more evenly distributed across available I/O channels. Instead of barraging one mount point, this load sharing approach will split I/O traffic across many mount points at the same time, thus increasing your overall I/O throughput rate. This leads to improving cube creation speed during the build process as well as improving read performance during the query process. In this paper, the following example uses three mount point locations showing on how to split aggregation table data. Step 1: Verify cube creation required partition location on different mount points is available. If not, create one. In this paper, &path1./&cubename., &path2./&cubename. and &path3./&cubename. are created before cube creation. %let path1 = /nfshome/sunyunb/sgf/data1 %let path2 = /nfshome/sunyunb/sgf/data2 %let path3 = /nfshome/sunyunb/sgf/data3 %let cubename = SGF14 %let fref = cube /* Code for Multi-Mount Point Location creation */ /* Add check/create Location */ /* Add Location for DATAPATH and PATH options */ %Macro DCreate (fref, dpath, dcube) %let rc=%sysfunc(filename(&fref., &dpath./&dcube.)) %let dir = %sysfunc(dopen(&fref.)) %if &dir. = 0 %then %let newdir = %sysfunc(dcreate(&dcube., &dpath.)) %else %let newdir = %sysfunc(dclose(&dir.)) %Mend Dcreate %DCreate(&fref., &path1., &cubename.) %DCreate(&fref., &path2., &cubename.) %DCreate(&fref., &path3., &cubename.) Step 2: Place 3 mount point locations in Statement DATAPATH option. For example, if &path1. is randomly selected to be the first aggregation file location, the first partition of first aggregation table will be placed in to directory &path1. the second partition of the table into directory &path2. the third partition of the table into &path3. the fourth partition of the table into &path1., and so on. By doing this, I/O traffic will be spit to different mount-points and cube building time will be decreased, as especially if parallel aggregation creation is enabled. automatically creates NWAY aggregation for the new cube or fully refreshed cube (unless specifically disabled). The NWAT is the aggregation crossing of all dimension levels and usually is the largest in the cube. 2

Because NWAY is spread across all the mount points, any additional aggregation builds in the AGGREGATION statement will also benefit from the multi thread reading of the NWAY (NWAY is typically used to build the higher level aggregations). 2. AND PATH= Second, and PATH=( pathname... pathnamen ) options. For fast cube build, NO option can be used. However, the lack of indexes will affect end-user query performance. PATH specifies the locations of the index component files that correspond to each aggregation table partition as specified by the DATAPATH= option. PATH locations could be different from DATAPATH locations. Similar to DATAPATH option, when building a cube, it takes advantage of multi thread write. 3. CONCURRENT = This option specifies the maximum number of aggregations to create in parallel. The default value is 2. Experiment should be based on individual IT resource. Here, the option is setting to 3. That means maximum 3 aggregations could be building at the same time. Basically, limiting aggregations that run concurrently gives each aggregation enough CPU resources to build and can be used to reduce I/O contention in I/O bound systems. 3

4. OTHER COMMON USED PERFORMANCE OPTIONS WORKPATH=, it gives one or more locations for temporary work files. It equivalents to DATAPATH= for aggregation. The WORKPATH is used when MEMORY is limited and it must write temporary data to disk. Note: this may be mitigated by increasing the MEMSIZE setting during the cube build process. ASYNCLIMIT=, it limits the number of indices that will be created in parallel during cube building. It is similar as CONCURRENT for aggregation. SORTSIZE=, it specifies the available memory when creating aggregation. PARTSIZE=, it specifies the size of aggregation table partitions and their index components. CONCLUSION In today s fast-paced marketplace, utilizing, leveraging and maximizing IT resources and capability to gain clear business insight and make quick decisions will enable organizations to outperform their competitors. Therefore, unlocking the power of data is the key to success. The above options discussed in the paper will help IT professionals to overcome their challenges and deliver business value. REFERENCES SAS 9.2 OLAP Server User Guide, SAS Document SAS KNOWLEDGE BASE: DATAPATH and PATH locations are randomly selected for aggregation files Available at http://support.sas.com/kb/38/755.html RECOMMENDED READING SAS 9.2 OLAP Server User Guide CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Yunbo (Jenny) Sun Organization: Canada Post Address: 2701 Riverside Drive, Suite N250 City, State ZIP: Ottawa, Ontario K1A 0B1 Work Phone: 613-734-8744 Fax: 613-734-8814 Email: yunbo.sun@canadapost.ca Name: Michael Brule Organization: SAS Canada Address: 360 Albert Street, Suite 1600 City, State ZIP: Ottawa, Ontario K1R 7X7 Work Phone: 613-755-2305 Fax: 613-231-8526 Email: Michael.Brule@sas.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. APPENDIX A: LIST OF OPTIONS USED FOR PERFORMANCE The following options could be used to optimize cube creation and query performance. ASYNCLIMIT= COMPACT_NWAY NO 4

CONCURRENT = DATAPATH= PATH= SORTSIZE= MAXTHREADS= NOINEX PARTSIZE= SEGSIZE= 5