IBM Big SQL Partner Application Verification Quick Guide


VERSION: 1.6
DATE: Sept 13, 2017
EDITORS: R. Wozniak, D. Rangarao

Table of Contents

1 Overview of the Application Verification Process
2 Platform Specific Considerations IBM Big SQL
  2.1 Introduction
  2.2 Architecture
3 Resources for IBM Big SQL Verification
  3.1 Assistance
  3.2 Platform Access
  3.3 Client Connection and Drivers
  3.4 Big SQL basics
    3.4.1 Supported file formats
    3.4.2 Creating a Big SQL table
    3.4.3 Loading data to Big SQL
    3.4.4 Registering serializer/deserializers (SerDes) to read custom data formats
    3.4.5 Hive compatibility
  3.5 Company Information
  3.6 Contact Information
  3.7 Partner Product Overview
  3.8 Operating System(s) supported
  3.9 Product Description
  3.10 Application Status and Deployment Topology
  3.11 Do you intend to distribute IBM client-side drivers with your application?
  3.12 Will IBM Big SQL be available as a standard supported platform on your application?
  3.13 Support Model
4 Joint Clients Using our Solution (optional)
  4.1 Reference Clients detail (optional)
5 Verification Test details
  5.1 Test Environment
  5.2 Big SQL Features exercised in testing
  5.3 Test results and interoperability issues or bugs discovered
  5.4 Performance testing
  5.5 Reliability, Backup and Failover testing
  5.6 Minimum version number of your application required
6 Completed Document

1 Overview of the Application Verification Process

To achieve the Ready for IBM Analytics mark for your application, IBM requires completion of the Application Verification Process, which is designed to provide clients with assurance that our products work together. This guide contains essential information about accessing our technology and contains the form needed to submit test results to the IBM Analytics Ecosystem Team for review.

We believe that you, the application providers, are the best judge of whether your application is functioning correctly. However, should you need technical assistance from our subject matter experts, we are here to work with you. We expect that in most cases you have done appropriate levels of testing, which are far more suitable for proving application readiness than those we might attempt to provide. We do look to verify and archive this document, when completed, as a record of the process used to achieve the verification that our products work together smoothly.

Once approved, you will be able to download and use the Ready for IBM Analytics mark on collateral and be included in the Business Partner Application Showcase.

2 Platform Specific Considerations IBM Big SQL

2.1 Introduction

Big SQL is a SQL engine for Hadoop that concurrently exploits Hive, HBase and Spark using a single database connection. Big SQL is also a useful platform for data warehouse offload and consolidation, a key use case for many Hadoop users. Big SQL also provides federated access to RDBMS sources outside of Hadoop with IBM Fluid Query technology.

Because Big SQL may be configured for a wide variety of use cases and environments, it is important both to consider what the common use cases are likely to be and to document exactly which configurations were used in your verification process.

Our hybrid data management offerings, including Db2, Db2 Warehouse, Db2 on z/OS and Big SQL, are built on a common SQL engine.
Depending on the use case, you may find varying degrees of commonality. For details, please refer to Common IBM SQL Features for Portable Data Application Development on IBM developerWorks.

2.2 Architecture

The Big SQL architecture includes two types of nodes:
- Management nodes
- Worker or compute nodes

All data is held in the Hadoop Distributed File System (HDFS), while the database infrastructure provides a logical view of the data and manages the underlying metadata. All SQL requests are routed to the management node, where they are compiled and optimized to generate a parallel execution plan that is then pushed to the compute nodes for execution.

When a compute node receives a query plan, it dispatches special processes that know how to read and write HDFS data natively. Big SQL uses native and Java open-source-based readers (and writers) that are able to ingest different file formats. The Big SQL engine pushes predicates down to these processes so they can, in turn, apply projection and selection closer to the data.

These processes also transform input data into an appropriate format for consumption with Big SQL.

3 Resources for IBM Big SQL Verification

3.1 Assistance

If you need assistance with the process, please submit an inquiry to the Analytics Ecosystem team and we will promptly contact you to answer any questions you may have.

IBM contacts:
- Analytics Ecosystem team
- Analytics Ecosystem Support, Partner Lab: John Skier, jskier@us.ibm.com

3.2 Platform Access

IBM Big SQL Free Trial Version
Download a free trial version.
Link: Big SQL Free Trial Version
URL:

IBM Big SQL Knowledge Center
Complete documentation of Big SQL features, functions, syntax and operational interfaces. Includes diagnostics and troubleshooting procedures.
Link: Big SQL Knowledge Center
URL: ibm.biz/bigsqlref

3.3 Client Connection and Drivers

Technical details on connecting clients to Big SQL can be found here: iginsights.analyze.doc/doc/bigsql_connecting.html

All IBM client drivers for all Db2 products may be found here:
Link:
URL:

3.4 Big SQL basics

3.4.1 Supported file formats

Big SQL generally supports anything that Hadoop handles, including compression types, file formats, and SerDes, among others.

The supported file formats include:
- Delimited text file
- Sequence file
  o Binary sequence file
  o Text sequence file
- Record Columnar (RC) file
- Parquet file
- Avro file
- Optimized Row Columnar (ORC) file

Recommended compression types:
- Snappy
- gzip
- deflate
- bzip2

3.4.2 Creating a Big SQL table

The following CREATE TABLE statement creates a HADOOP table in Big SQL. The HADOOP keyword indicates that the data is stored on the cluster in the distributed file system.

Example CREATE TABLE command

    create hadoop table users (
        id int not null primary key,
        office_id int null,
        fname varchar(30) not null,
        lname varchar(30) not null)
    row format delimited
        fields terminated by ' '
    stored as textfile;

By default, data for this table is stored in a subdirectory of the Hive warehouse, /hive/warehouse/myid.db/users. Optionally, the LOCATION clause can be used in CREATE TABLE to layer a Big SQL schema over existing DFS directory contents. Metadata is stored in the SYSCAT.* and SYSHADOOP.* tables and views.

3.4.3 Loading data to Big SQL

Data can be loaded into Big SQL from either a local or a remote file system using the LOAD HADOOP command. The first example below loads a delimited file over SFTP; the second loads selected rows from a table reached through a JDBC connection.

Example LOAD commands

    load hadoop using file url
        'sftp://myid:mypassword@myserver.ibm.com:22/installdir/bigsql/samples/data/gosalesdw.go_region_dim.txt'
        with SOURCE PROPERTIES ('field.delimiter'='\t')
        INTO TABLE gosalesdw.go_region_dim overwrite;

    load hadoop using jdbc connection url 'jdbc:db2://some.host.com:portnum/sampledb'
        with parameters (user='myid', password='mypassword')
        from table MEDIA columns (ID, NAME)
        where 'CONTACTDATE < '' '''
        into table media_db2table_jan overwrite
        with load properties ('num.map.tasks' = 10);
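The LOCATION clause mentioned for CREATE TABLE can be sketched as follows. This is a minimal illustration, not a statement taken from this guide: the table name users_ext, the directory /user/myid/users_data and the '|' delimiter are all hypothetical placeholders, and the catalog query assumes the table is visible through SYSCAT.TABLES in your environment.

```sql
-- Hypothetical example: layer a Big SQL schema over files that already
-- exist in a DFS directory. users_ext, the path and the delimiter are
-- placeholders chosen for illustration only.
CREATE EXTERNAL HADOOP TABLE users_ext (
    id        INT NOT NULL,
    office_id INT,
    fname     VARCHAR(30) NOT NULL,
    lname     VARCHAR(30) NOT NULL
)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/user/myid/users_data';

-- Check that the table is registered in the catalog views.
SELECT tabschema, tabname
FROM syscat.tables
WHERE tabname = 'USERS_EXT';
```

Declaring the table EXTERNAL is meant to leave the directory and its files in place if the table is later dropped, which is usually what you want when the data is shared with other Hadoop tools.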

3.4.4 Registering serializer/deserializers (SerDes) to read custom data formats

When using Big SQL to store and query unconventional data that may have complex or varying structures, such as JSON data, you can use custom SerDes.

Example CREATE EXTERNAL HADOOP TABLE command

    -- Create table for JSON data using open source hive-json-serde-0.2.jar SerDe
    -- Location clause points to DFS dir containing JSON data
    -- External clause means DFS dir & data won't be dropped after DROP TABLE command
    create external hadoop table socialmedia-json (
        Country varchar(20),
        FeedInfo varchar(300),
        ... )
    row format serde 'org.apache.hadoop.hive.contrib.serde2.jsonserde'
    location '</hdfs_path>/myjson';

    select * from socialmedia-json;

3.4.5 Hive compatibility

Big SQL can directly define and execute Hive user-defined functions (UDFs).

Example Hive UDF command

    CREATE FUNCTION MyHiveUDF (ARG1 VARCHAR(20), ARG2 INT)
        RETURNS INT
        SPECIFIC MYHIVEUDF
        PARAMETER STYLE HIVE
        EXTERNAL NAME com.myco.myhiveudf
        DETERMINISTIC
        LANGUAGE JAVA

Spark integration with Big SQL

Spark jobs can be invoked from Big SQL using a table UDF abstraction.

Example Spark invocation UDTF

    SELECT *
    FROM TABLE(SYSHADOOP.EXECSPARK(
        language => 'scala',
        class => 'com.ibm.biginsights.bigsql.examples.readjsonfile',
        uri => 'hdfs://host.port.com:8020/user/bigsql/demo.json',
        card => )) AS doc
    WHERE doc.country IS NOT NULL
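Returning to the Hive UDF example: once a function such as MyHiveUDF has been registered, it should be callable like any other scalar function in a query. The sketch below assumes the users table from the CREATE TABLE example exists and that the function was created with the (VARCHAR, INT) signature shown in this guide; the column alias udf_result is hypothetical.

```sql
-- Hypothetical usage of the registered Hive UDF against the users table.
-- Assumes MyHiveUDF(VARCHAR(20), INT) RETURNS INT has already been created.
SELECT lname,
       MyHiveUDF(lname, office_id) AS udf_result
FROM users
WHERE office_id IS NOT NULL;
```

A query of this kind is a reasonable smoke test to include in verification results, since it exercises both the Hive UDF registration and ordinary Big SQL query processing.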

Partner Information

3.5 Company Information:

Company Name
Company Address
Company Website

3.6 Contact Information

Please provide the name, title and contact information of everyone relevant to the completion of this verification:

Name    Role    Address    Phone

3.7 Partner Product Overview

Product Name:
Product Version:
Product GA Date:
Notes:

3.8 Operating System(s) supported:

OS    Distribution    Version

3.9 Product Description:

Please provide a brief description of your application:

3.10 Application Status and Deployment Topology

Please answer the following questions:

Broadly speaking, how long has your application been commercially available?

Do you intend your application to be: (check all that apply)
o available as a traditional on-premises offering
o available for private cloud deployments

3.11 Do you intend to distribute IBM client-side drivers with your application?

Note: IBM license terms permit partners to easily do this at no cost.

3.12 Will IBM Big SQL be available as a standard supported platform on your application?

Will Big SQL be available to clients as a standard option, such as via menu selection or a standard configuration parameter, in the current or near-future version?

3.13 Support Model

If an IBM customer opens a support issue that involves this product, what support channels exist that IBM can engage? Please provide all relevant channels of engagement, including a support portal, phone support, or email support.

Please include in this section any additional information that will help with client support, such as available online help systems for your application, relevant developer forums, etc.

Please indicate if you are a member of TSAnet (YES / NO)

4 Joint Clients Using our Solution (optional)

Do you have existing or pending clients using your solution with IBM Big SQL? If client anonymity is required, please use generic descriptions like "an insurance company" or "a large bank". If you do not have any existing or prospective joint clients, please skip this section.

4.1 Reference Clients detail (optional)

Do you have joint clients who are reference clients for you? Please provide details.

5 Verification Test details

Briefly describe the process used to verify interoperability of your product with the IBM Big SQL server. Attach any test results as needed. Please specify which configuration and version of IBM Big SQL and Hadoop you certified on:

IBM Big SQL version:
Hadoop version:

5.1 Test Environment

OS:
OS:

Please describe the general environment your solution was operating in, including other components that were used, such as BigInsights components, Apache components, etc.

5.2 Big SQL Features exercised in testing

Please list the major Big SQL features that were exercised in your testing. For instance, does your application make use of Spark analytic functions, etc.?

5.3 Test results and interoperability issues or bugs discovered

Please describe the results of your testing. In particular, did you find any problems with the interoperation of Big SQL and your application? Please describe them and include IBM PTRs (Problem Trouble Report numbers) if they have been created to address defects in our products. (Attach any printouts or logs of test results to this form as needed.)

5.4 Performance testing

Please characterize performance results or findings you would like to highlight, if any.

5.5 Reliability, Backup and Failover testing

Did you test the reliability of the application? Did you test backup and restore functions, or failover? Were you using any third-party software for this in your configurations?

5.6 Minimum version number of your application required

Is there a minimum version number of your product required to interoperate with IBM Big SQL?

6 Completed Document:

Please send this completed form to the Analytics Ecosystem team at:
Mail to: Submit Results
