Exploiting Cloud Storage With DB2 for LUW
Philip Nelson, ScotDB Limited
Session Code: C15
Thursday 17th November, 08:30 - 09:30
Platform: DB2 for LUW
Agenda
- Introduction to Cloud Storage Solutions
- Using Cloud Storage With DB2: Prior to V11.1
- The First Integration: db2remstgmgr (V10.5 Fixpack 5)
- More Complete Integration: V11.1
- More Work Ahead
Introduction to Cloud Storage Solutions
- What do we mean by "the cloud" anyway?
- Storage for all occasions
- Cloud providers supported by DB2
- Introduction to Amazon S3
- Introduction to SoftLayer Object Storage
What Do We Mean By "The Cloud"?
- "The practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer" (Google dictionary)
- "The delivery of on-demand computing resources (everything from applications to data centers) over the Internet on a pay-for-use basis" (IBM website)
- "A model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources which can be rapidly provisioned and released with minimal management effort" (Wikipedia)
Cloud Providers Supported Directly By DB2
- Amazon Web Services (AWS): http://aws.amazon.com
  - One of the original providers of public cloud services
  - 11 locations in 4 geographies
- SoftLayer: http://www.softlayer.com
  - An IBM company
  - 18 locations in 4 geographies
- General comparison
  - AWS: lower cost but still reliable (although some reports of contention)
  - SoftLayer operates more like a traditional hosting service
Storage For All Occasions
- Block storage
- Network file systems
- Object storage
- Archive storage
- Content Delivery Network (CDN) storage
- Hybrid storage solutions
Object Storage: General Uses
- Storage for objects
  - Files accessible via API
  - Internal content of files not directly accessible
  - Each object also has metadata associated with it
- Typical uses from DB2
  - Data exports
  - Files for load, import or ingest
  - Backups
  - Archive log storage
- Cheaper solutions are available for long-term archiving, e.g. Amazon Glacier
Introduction to Amazon S3 (Simple Storage Service)
- Arranged into buckets
  - Top level in hierarchy
- Buckets subdivided into containers
  - Equivalent to a subdirectory path
- Containers contain objects
  - An object is a piece of data and its associated metadata
- Security controls at all levels
  - Public access to a whole bucket possible
  - Access to an individual file in a single container can be granted
Introduction to SoftLayer Object Storage
- Built on OpenStack Object Storage (http://openstack.org)
- Data stored in a cluster (made resilient by replication)
  - Clusters currently located in Dallas, Amsterdam and Singapore
- Each data cluster contains containers
  - Equivalent to file system subdirectory paths
- Containers then contain objects
  - An object is the data and its associated metadata
- Account controls access to data
- API endpoints for public internet and private network (inside SoftLayer)
Using DB2 With Cloud Storage Prior To Direct Support
- DB2 produces / consumes files on the local file system
- Use the cloud provider's API to move files to/from cloud storage
- Use scripting to orchestrate DB2 and cloud API calls
  - Most languages have packages to simplify access
- Exploit the (log archiving) user exit to interface DB2 with the cloud API
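The orchestration pattern above can be sketched as a small shell wrapper. DB2CMD and UPLOADCMD are hypothetical stand-ins that default to echo stubs, so the flow can be traced without a DB2 instance or cloud CLI installed; in real use they would be the db2 command and the provider's upload tool.

```shell
# Sketch of the pre-V11.1 pattern: DB2 writes the backup image to
# the local file system, then a cloud CLI pushes it to object
# storage. DB2CMD / UPLOADCMD are illustrative stubs (echo) so the
# flow can be traced on any machine.
DB2CMD="${DB2CMD:-echo db2}"
UPLOADCMD="${UPLOADCMD:-echo aws s3 cp}"

backup_and_upload() {
  db="$1"; localdir="$2"; bucket="$3"
  # Step 1: DB2 produces the backup image on local disk
  $DB2CMD "BACKUP DATABASE $db TO $localdir" || return 1
  # Step 2: push the newest image in the directory to object storage
  img=$(ls -t "$localdir" | head -n 1)
  [ -n "$img" ] || return 1
  $UPLOADCMD "$localdir/$img" "s3://$bucket/DB2Backups/$img"
}
```

With the stubs in place the function just prints the two commands it would run, which is also a convenient dry-run mode.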
Example Script To Store Backup in Amazon S3 (1)
(Using the Perl Amazon::S3 Module)

#!/usr/bin/perl
use Amazon::S3;

my $aws_access_key_id     = '????';  # (20 chars)
my $aws_secret_access_key = '????';  # (40 chars)
my $bucket_name       = 'scotdb.private';
my $bucket_backup_dir = 'DB2Backups';
my $local_backup_dir  = '/db2back';

my $s3 = Amazon::S3->new({
    aws_access_key_id     => $aws_access_key_id,
    aws_secret_access_key => $aws_secret_access_key,
    retry                 => 1,
});
Example Script To Store Backup in Amazon S3 (2)
(Using the Perl Amazon::S3 Module)

# Get a list of the bucket's current files
# (in the backup directory in the bucket)
my $bucket   = $s3->bucket($bucket_name);
my $response = $bucket->list_all
    or die $s3->err . ": " . $s3->errstr;

my %all_remote_files;
my @remote_backup_files;
foreach my $key ( @{ $response->{keys} } ) {
    $all_remote_files{$key->{key}} = 1;
    if ($key->{key} =~ m/^$bucket_backup_dir\/([-\w\s\.\/]+)$/) {
        # Skip "directory" keys (those ending in a slash)
        push @remote_backup_files, $1 if !($1 =~ /.*\/$/);
    }
}
Example Script To Store Backup in Amazon S3 (3)
(Using the Perl Amazon::S3 Module)

# Gather the local backup files, then add any new ones to the bucket
opendir(my $dh, $local_backup_dir) or die "Cannot open $local_backup_dir: $!";
my @local_backup_files = grep { -f "$local_backup_dir/$_" } readdir($dh);
closedir($dh);

foreach my $local_file (@local_backup_files) {
    if (!grep(/^\Q$local_file\E$/, @remote_backup_files)) {
        my $response = $bucket->add_key_filename(
            "$bucket_backup_dir/$local_file",
            "$local_backup_dir/$local_file",
            { content_type => 'application/octet-stream' },
        );
        if ($response) { print "Added $local_file\n"; }
        else           { print "FAILED $local_file\n"; }
    }
}
Archive Logging Via User Exit
- Full details in http://www.idug.org/p/bl/et/blogaid=5
  - Example could be rewritten using API commands (via a wrapper module)
  - DB2 also ships (non cloud storage) sample exits (written in C)
- Script must
  - Handle each request type DB2 can send (ARCHIVE and RETRIEVE)
  - Produce error messages in a format DB2 will understand
  - Copy files to the archive (not move them)
- Configuration notes
  - Up to V9.7: DB CFG parameter USEREXIT (now discontinued)
  - From V9.5 onwards: DB CFG LOGARCHMETH1 value USEREXIT
  - Executable must be called db2uext2 and placed in sqllib/adm
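The requirements above (handle both ARCHIVE and RETRIEVE, copy rather than move) can be sketched as a shell function. The -RQ / -LP / -LN argument letters imitate the style of the shipped C samples but are illustrative only, as are the return codes; consult the samples under sqllib/samples for the real interface before building a production exit.

```shell
# Sketch of a db2uext2-style user exit body. Argument letters and
# return codes are illustrative, not the documented interface.
# ARCHDIR is a hypothetical staging area from which a cloud upload
# (via the provider's API or CLI) would then take place.
ARCHDIR="${ARCHDIR:-/db2arch}"

userexit() {
  request=""; logpath=""; logname=""
  for arg in "$@"; do
    case "$arg" in
      -RQ*) request="${arg#-RQ}" ;;   # ARCHIVE or RETRIEVE
      -LP*) logpath="${arg#-LP}" ;;   # active log directory
      -LN*) logname="${arg#-LN}" ;;   # e.g. S0000001.LOG
    esac
  done
  case "$request" in
    ARCHIVE)
      # Copy, never move: DB2 still owns the active log file
      cp "$logpath/$logname" "$ARCHDIR/$logname" || return 4
      ;;
    RETRIEVE)
      cp "$ARCHDIR/$logname" "$logpath/$logname" || return 4
      ;;
    *) return 8 ;;   # unknown request type: signal an error
  esac
  return 0
}
```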
First Integration: db2remstgmgr (V10.5 Fixpack 5)
- Command line interface for interacting with cloud storage
- Totally undocumented in the Knowledge Center
  - Run the command with no arguments to obtain syntax notes
- Not found on DB2 for Windows
- Originally worked only with SoftLayer Object Storage
  - AWS S3 only works with V11.1 Fixpack 1 (in ESP)
- Could be used to simplify db2uext2 (user exit) coding
- In many ways equivalent to db2adutl (for TSM)
db2remstgmanager Example

db2remstgmanager S3 list container=scotdb.private

- Lists the contents of an S3 bucket (container)
- Prompts for credentials if not supplied
  - auth1=<val>: S3 Access Key ID / SoftLayer API user name
  - auth2=<val>: S3 Secret Access Key / SoftLayer API user key
- Other actions (apart from list) are
  - put: uploads from local file system to cloud storage
  - get: downloads from cloud storage to local file system
  - delete: removes object from cloud storage
  - getobjectinfo: retrieves information about an object
More Complete Integration: V11.1 (Fixpack 1)
- Documented (partially) in the Knowledge Center from V11.1 GA
  - Actually only available from V11.1 Fixpack 1 (in ESP)
  - Performance test results still to follow
- DB2 directly accesses cloud storage for
  - Backup
  - Load
  - Ingest
- Obviously a work in progress
  - Most complete in the dashDB offerings
Register Cloud Storage Location With DB2
- New command CATALOG STORAGE ACCESS

>>-CATALOG STORAGE ACCESS--ALIAS--alias-name----------->

>--VENDOR--+-SOFTLAYER-+--SERVER--+-DEFAULT--+--------->
           '-S3--------'          '-endpoint-'

>--USER--storage-user-ID--PASSWORD--storage-password--->

>--+--------------------------------+--+----------------+-->
   '-CONTAINER--container-or-bucket-'  '-OBJECT--object-'

>--+----------------------+-------------------------------><
   '-+-DBGROUP--group-ID-+'
     '-DBUSER--user-ID---'
Example: Register Amazon S3 Bucket

CATALOG STORAGE ACCESS ALIAS s3bucket
  VENDOR S3
  SERVER DEFAULT
  USER s3-access-key-id
  PASSWORD s3-secret-access-key
  CONTAINER scotdb.private;

- Similar syntax for SoftLayer Object Storage
  - USER / PASSWORD is SoftLayer UserName / API Key
  - Able to specify SERVER endpoint (default is DALLAS)
Notes on Registration of Cloud Storage Location
- By default a registered alias is only accessible to SYSADM
  - DBGROUP / DBUSER keywords allow non-SYSADM access
- Access credentials stored in a keystore
  - Keystore shared with native encryption
  - DBM CFG parameters KEYSTORE_TYPE and KEYSTORE_LOCATION
- Registration available across the whole instance
- Use LIST STORAGE ACCESS to list all registered aliases
- Remove using UNCATALOG STORAGE ACCESS ALIAS <alias>
- Master key in the keystore can be rotated using ROTATE MASTER KEY FOR STORAGE ACCESS
Syntax To Refer To Cloud Storage Object

DB2REMOTE://<alias>//<storage-path>/<file-name>

- <alias> as defined by CATALOG STORAGE ACCESS
- <storage-path> defines the container
- <file-name> is the remote file to access / store
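A trivial helper makes the parts of the naming scheme explicit. The function name is ours, not a DB2 tool, and it follows the double-slash form shown on this slide.

```shell
# Assemble a DB2REMOTE reference from its parts, following the
# scheme above (illustrative helper, not part of DB2).
db2remote_uri() {
  alias_name="$1"; storage_path="$2"; file_name="$3"
  if [ -n "$file_name" ]; then
    printf 'DB2REMOTE://%s//%s/%s\n' "$alias_name" "$storage_path" "$file_name"
  else
    # No file name: e.g. a backup target, where DB2 names the image
    printf 'DB2REMOTE://%s//%s\n' "$alias_name" "$storage_path"
  fi
}
```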
Example: Backup to Amazon S3
- The following assumes alias s3bucket is already catalogued

BACKUP DATABASE MYDB
  TO DB2REMOTE://s3bucket/backups

- Documentation mentions a filename as well
  - As with all backups, DB2 determines the filename
Example: Load from Amazon S3 Bucket
- Again assumes alias s3bucket has been cataloged

LOAD FROM DB2REMOTE://s3bucket/loadfiles/loaddata.ixf OF IXF
  REPLACE INTO MYTABLE;

- This time the file name, as well as the container, needs to be specified
More Work Ahead
- A number of obvious missing pieces
  - Ability to EXPORT directly to cloud storage
  - Interface for archive logging (archive and retrieval)
- Instrumentation
  - LIST HISTORY and associated SQL functions
  - Status / progress of utilities
- Documentation
  - db2remstgmgr totally undocumented
  - LIST and UNCATALOG of aliases only mentioned in the CATALOG docs
  - Command line help (db2 ? <command>) gives errors
    - Outputs an SQL error where documentation for the new command should be
Performance and Reliability Considerations
- Need to compare direct-to-cloud with local backup then upload
- Local server to / from cloud storage
  - SoftLayer: choose nearest endpoint using SERVER <endpoint>
  - S3: location specified on bucket creation
- Cloud server to / from cloud storage
  - Use collocated servers and storage
  - Not just performance but cost benefits
- No user-manageable controls on retries or parallelism
Cost Implications
- Amazon and SoftLayer have similar charging models
  - Both favour collocated servers
- Factors in overall cost include
  - Cost per gigabyte
  - Movement of data into the object store
  - Retrieval of data from the object store
- Differences if the source server is in the cloud
  - Same geography
  - Different geography
Amazon S3 Basic Cost Factors
- Price per gigabyte stored
  - Price decreases (slightly) after 1 TB
- No charge for upload data transfers
- No charge for downloads to the same AWS region
- Charges for download to another AWS region or outside AWS
  - Outside AWS at least 4x more expensive than between AWS regions
- Charge for use of API calls (in blocks of 1000 calls)
- If servers hosted in AWS, basically only pay for storage ($30/TB)
- If servers not hosted in AWS
  - Backups cheap (if not restoring often)
  - Loads expensive (data transfer costs)
SoftLayer Basic Cost Factors
- Price per gigabyte stored (currently 33% > S3)
- No charge for upload data transfers
- No charge for downloads to SoftLayer servers
- Charges for downloads to external servers (currently = S3)
- No charge for API calls
Beware External Transfer (Download) Costs
- Solution is most cost effective when servers are also cloud based
- If servers are not cloud hosted
  - Backups are cheap (basically just storage costs)
  - Restores are costly (so beware frequent restore scenarios)
  - Don't use to store data for loading, except to cloud servers
- Typical cost example (current prices)
  - 1 TB of data stored: $30 - $40
  - 1 TB of data transferred externally: $90
- On AWS make sure servers and storage are in the same region
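The arithmetic behind the typical cost example above fits in a couple of lines; the $30/TB-month and $90/TB figures are the slide's illustrative rates, not current list prices.

```shell
# Back-of-envelope monthly cost using the slide's illustrative
# rates: ~$30 per TB stored, ~$90 per TB downloaded externally.
STORAGE_RATE=30   # USD per TB-month stored
EGRESS_RATE=90    # USD per TB transferred outside the provider

monthly_cost() {  # $1 = TB stored, $2 = TB downloaded externally
  echo $(( $1 * STORAGE_RATE + $2 * EGRESS_RATE ))
}
```

For example, keeping 1 TB of backups and never restoring costs around $30/month, while a single external restore of that 1 TB triples the bill for the month.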
Security and Data Location Considerations
- Data transfers are encrypted (HTTPS)
- Both AWS and SoftLayer offer encryption at rest
  - Not clear whether this is used by DB2
  - Best to use DB2's backup encryption (V10.5 FP5 and later)
- Be careful where you send your data
  - Legal implications of sending it out of region
  - If using AWS S3, ensure the bucket is created in the right region
  - If using SoftLayer, specify the appropriate endpoint when cataloging the alias
Conclusions
- Heading in the right direction
  - Seeing similar facilities to those offered by TSM for local systems
- Work in progress
  - Key piece needed is handling of archive logging
  - Still needs to be assessed for performance
- Best when servers are also cloud based
Philip Nelson
ScotDB Limited
teamdba@scotdb.com

Please fill out your session evaluation before leaving!
Session C15: Exploiting Cloud Storage With DB2 for LUW