Building a Big Data Solution Using IBM DB2 for z/OS


1 Building a Big Data Solution Using IBM DB2 for z/OS
z Analytics WW Sales and Technical Sales
Jane Man, Senior Software Engineer, janeman@us.ibm.com
2012 IBM Corporation

2 Please note: IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. 2

3 Agenda
Big Data overview: What is Hadoop? What is BigInsights?
DB2 11 for z/OS connectors to Big Data: What is it? How does it work? Use cases
Other: BigInsights Big SQL 3.0, Dovetail Co:Z, Sqoop, Veristorm Enterprise
Summary
Questions and answers
3 IBM Silicon Valley Laboratory WW Tech Sales Boot Camp

4 Big Data is All Data And All Paradigms
Transactional & application data, machine data, social data, enterprise content
[Slide graphic: the dimensions of big data: Volume, Velocity, Variety, Veracity; data ranges from structured through semi-structured to highly unstructured, stressing throughput and ingestion]

5 Demand for differently structured data to be seamlessly integrated, to augment analytics / decisions
Analytics and decision engines reside where the DWH / transaction data is
Noise (veracity) surrounds the core business data: social media, e-mails, docs, telemetry, voice, video, content
Expanding our insights: getting closer to the truth, lower risk and cost, increased profitability
Data warehouse integration, business analytics, DB2 for z/OS, IMS, information governance: the circle of trust widens
5

6 Big Data Use Cases
Big Data Exploration: find, visualize, and understand all big data to improve decision making
Enhanced 360° View of the Customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
Security/Intelligence Extension: lower risk, detect fraud, and monitor cyber security in real time
Operations Analysis: analyze a variety of machine data for improved business results
Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency
6

7 What is Hadoop?
An open source software framework that supports data-intensive distributed applications
High throughput, batch processing; runs on large clusters of commodity hardware (Yahoo ran a 4,000-node Hadoop cluster in 2008)
Two main components:
Hadoop Distributed File System: self-healing, high-bandwidth clustered storage
MapReduce engine
7

8 Hadoop: The underlying principle
Lots of redundant disks: really inexpensive disks
Lots of cores: inexpensive cores, working all the time
Disks crash? That's OK, just replace them
Processors fail? That's OK, just replace them
Network errors happen? That's OK, just retry
Disks and processors are networked
8

9 Hadoop Distributed File System (HDFS)
Files are broken into large blocks (default = 64 MB)
Blocks are replicated (default = 3 times) and distributed across the cluster, for durability, availability, and throughput
Optimized for streaming reads of large files: write-once-read-many access model, append-only
9
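A quick back-of-the-envelope sketch of what those defaults imply for one file (illustrative Python, not part of HDFS; the 64 MB block size and 3 replicas are the defaults quoted above):

```python
# HDFS defaults mentioned on the slide: 64 MB blocks, 3 replicas.
BLOCK_SIZE = 64 * 1024 * 1024
REPLICATION = 3

def hdfs_footprint(file_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (block count, total raw bytes stored across the cluster)."""
    blocks = -(-file_bytes // block_size)  # ceiling division
    return blocks, file_bytes * replication

# A 1 GB file splits into 16 blocks and occupies 3 GB of raw disk.
print(hdfs_footprint(1024 ** 3))  # (16, 3221225472)
```

Spreading those 16 blocks (48 replicas) across many nodes is what lets reads proceed in parallel and survive disk loss.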

10 MapReduce
A simple, yet powerful framework for parallel computation
Applicable to many problems, flexible data format
Basic steps:
Do parallel computation (Map) on each block (split) of data in an HDFS file and output a stream of (key, value) pairs to the local file system
Redistribute (shuffle) the map output by key
Do another parallel computation on the redistributed map output and write results into HDFS (Reduce)
[Slide diagram: mappers M1 to M4 feed a shuffle stage into reducers R1 and R2]
10
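The three steps above can be simulated in a few lines of single-process Python (a sketch for intuition only; a real job runs the same logic in parallel across HDFS blocks, with the word count as the classic example):

```python
from collections import defaultdict

def map_phase(block):
    # Map: emit (key, value) pairs for one input split; here (word, 1).
    return [(word, 1) for word in block.split()]

def shuffle(pairs):
    # Shuffle: redistribute the map output by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key.
    return {key: sum(values) for key, values in groups.items()}

blocks = ["big data big", "data big"]          # two input splits
pairs = [p for b in blocks for p in map_phase(b)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 2}
```

Each mapper sees only its own split; only the shuffle moves data between nodes, which is why the framework scales.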

11 Power of Parallelism
1 hard disk = 100 MB/sec (~1 Gbps)
Server = 12 hard disks = 1.2 GB/sec (~12 Gbps)
Rack = 20 servers = 24 GB/sec (~240 Gbps)
Avg. cluster = 6 racks = 144 GB/sec (~1.4 Tbps)
Large cluster = 200 racks = 4.8 TB/sec (~48 Tbps)
Scanning 4.8 TB at 100 MB/sec takes 13 hours.
11
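The figures above check out arithmetically (decimal units assumed, 1 MB = 10^6 bytes):

```python
# Sanity check of the slide's throughput claims.
MB, TB = 10**6, 10**12

single_disk_secs = 4.8 * TB / (100 * MB)        # one disk at 100 MB/sec
large_cluster_rate = 200 * 20 * 12 * 100 * MB   # 200 racks x 20 servers x 12 disks

print(single_disk_secs / 3600)   # ~13.3 hours for one disk
print(large_cluster_rate / TB)   # 4.8 TB/sec aggregate, so ~1 second
```

The same 4.8 TB scan that ties up one disk for half a day finishes in about a second on the large cluster.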

12 From Getting Started to Enterprise Deployment: Different BigInsights Editions For Varying Needs
Standard Edition (enterprise-class Apache Hadoop): spreadsheet-style tool, web console, dashboards, pre-built applications, Eclipse tooling, RDBMS connectivity, Big SQL, monitoring and alerts, platform enhancements
Enterprise Edition adds: accelerators, GPFS FPO, Adaptive MapReduce, text analytics, enterprise integration, Big R, InfoSphere Streams*, Watson Explorer*, Cognos BI*, Data Click* (* limited use license)
Quick Start: free, non-production; same features as Standard Edition plus text analytics and Big R
Breadth of capabilities grows across the editions

13 InfoSphere BigInsights for Hadoop includes the latest open source components, enhanced by enterprise components
[Slide diagram of the IBM InfoSphere BigInsights for Hadoop stack:]
Visualization & ad hoc analytics: BigSheets, dashboard, charting
Advanced analytics: Big R, Text Analytics, R
Data access / runtime: Jaql, Hive, MapReduce, Adaptive MapReduce, Big SQL, Pig, HCatalog, Flume, Sqoop, Streams (stream computing), ETL, Solr/Lucene (enterprise search)
Data store / file system: HBase, HDFS, GPFS FPO
Applications & development: Eclipse tooling for MapReduce, Hive, Jaql, Pig, Big SQL, AQL; BigSheets reader and macro; Text Analytics extractors
Resource management & administration: console, monitoring, audit & history, Oozie, ZooKeeper, flexible scheduler, YARN*
Security & governance: Kerberos, LDAP, Data Security for Hadoop, data masking, data matching, Data Privacy for Hadoop
13 (open source components plus IBM enhancements; * in beta)

14 IBM InfoSphere BigInsights on System z Deployment flexibility: Compatible clusters on or off the Mainframe Secure Perimeter 14

15 What makes sense when?
Case 1: Hadoop on the Mainframe (DB2, QSAM, VSAM, IMS, SMF, RMF, logs under z/OS; Hadoop in Linux guests under z/VM on IFLs)
Data originates mostly on the mainframe (log files, database extracts)
Data security a primary concern; clients will not send data across an external network
Relatively small data: 100 GB to 10s of TBs
Hadoop is valued mainly for richness of tools
Case 2: Hadoop off the Mainframe (z/OS data sources feed an external Linux cluster)
Most data originates off the mainframe
Security less of a concern since data is not trusted anyway
Very large data sets: 100s of TBs to PBs
Hadoop is valued for its ability to manage large datasets economically
Desire to leverage cheap processing and potentially cloud elasticity
15

16 BigInsights Connector for DB2 for z/OS 16

17 Enhancing DB2 Analytics on z with Big Data
DB2 11 is providing the connectors and the DB capability to allow DB2 applications to easily and efficiently access data in Hadoop
New user-defined functions
New generic table UDF capability
[Slide diagram: DB2 exchanging JAQL requests and JSON results with IBM BigInsights]
17

18 DB2 11 Support for Big Data
Goal: integrate DB2 for z/OS with IBM's Hadoop-based BigInsights big data platform, enabling traditional applications on DB2 for z/OS to access big data analytics.
1. Analytics jobs can be specified using JSON Query Language (Jaql), submitted to BigInsights, with results stored in the Hadoop Distributed File System (HDFS).
2. A table UDF (HDFS_READ) reads the big data analytic result from HDFS, for subsequent use in an SQL query.
HDFS_READ output tables can have variable shapes; DB2 11 supports generic table UDFs, which enable this function.
18

19 BigInsights Connector for DB2 for z/OS
Two DB2 sample functions:
JAQL_SUBMIT: submit a JAQL script for execution on BigInsights from DB2
HDFS_READ: read HDFS files into DB2 as a table for use in SQL
Notes:
Functions are developed by DB2 for z/OS and shipped with DB2 11 in prefix.SDSNLOD2
Functions are not installed by default
Functions and examples are documented by BigInsights
Intended to be DB2 samples
19

20 BigInsights Connector Usage: Key Roles
DB2 for z/OS DBA, Sysprog: issue CREATE FUNCTION DDL on DB2 (refer to the SDSNLOD2 dataset member); determine an appropriate WLM environment
BigInsights analytics team: develop Jaql (or other) analytics on BigInsights
DB2 for z/OS application or analytics team: use BigInsights Connector functions in DB2 to execute BigInsights analytics and retrieve results
20

21 JAQL_SUBMIT: submit a JAQL script for execution on BigInsights from DB2
SET RESULTFILE = JAQL_SUBMIT(
  'syslog = lines("hdfs:///idz1470/syslog3sec.txt");
   [localread(syslog)->filter(strpos($,"$HASP373")>=0)->count()]->
   write(del(location="hdfs:///idz1470/iod00s/lab3e2.csv"));',  -- JAQL script containing the analysis
  ' iod00s/lab3e2.csv?op=open',  -- intended HDFS file to hold the result
  ' ',                           -- URL of the BigInsights cluster
  '');                           -- options
21

22 JAQL JSON Query Language Java MapReduce is the assembly language of Hadoop. JAQL is a high-level query language included in BigInsights with three objectives: Semi-structured analytics: analyze and manipulate large-scale semi-structured data, like JSON data Parallelism: uses the Hadoop MapReduce framework to process JSON data in parallel Extensibility: JAQL UDFs, Java functions, JAQL modules JAQL provides a hook into any BigInsights analysis 22

23 HDFS_READ Example
HDFS_READ: read a file from HDFS and present it as a DB2 table for use in SQL
SET RESULT_FILE = JAQL_SUBMIT(..... );
SELECT BIRESULT.CNT
FROM TABLE(HDFS_READ(
  RESULT_FILE,  -- URL of the CSV file to be read
  ''            -- options
)) AS BIRESULT(CNT INTEGER);  -- definition of the table: how to present the results to SQL
23

24 Example of HDFS_Read
The following is an example of a CSV file stored in HDFS:
1997,Ford,E350,"ac, abs, moon",
,Chevy,"Venture ""Extended Edition""",,
,Jeep,Grand Cherokee,"MUST SELL! AC, moon roof, loaded",
Example SQL statement:
SELECT *
FROM TABLE (HDFS_Read(' ')) AS X
  (YEAR INTEGER, MAKE VARCHAR(10), MODEL VARCHAR(30),
   DESCRIPTION VARCHAR(40), PRICE DECIMAL(8,2));
Result:
YEAR MAKE MODEL DESCRIPTION PRICE
  Ford E350 ac, abs, moon
  Chevy Venture "Extended Edition" (null)
  Jeep Grand Cherokee MUST SELL! AC, moon roof, loaded
24
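The sample file illustrates the CSV quoting rules HDFS_READ has to honor: commas inside quoted fields, doubled quotes ("") for a literal quote, and empty fields mapping to NULL. A small Python sketch shows the same parsing; the year and price values here are invented for illustration, since the slide's originals were not preserved:

```python
import csv
import io

# Hypothetical data in the same shape as the slide's CSV file.
data = '''1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
'''

rows = list(csv.reader(io.StringIO(data)))
# Embedded commas survive inside quotes, "" becomes a literal quote,
# and the empty fourth field of row 2 would surface as NULL in DB2.
print(rows[0][3])  # ac, abs, moon
print(rows[1][2])  # Venture "Extended Edition"
```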

25 Integrated Query Example
INSERT INTO BI_TABLE (CNT)
  (SELECT CNT FROM TABLE (HDFS_READ (JAQL_SUBMIT (
     'syslog = lines("hdfs:///idz1470/syslog3sec.txt");
      [localread(syslog)->filter(strpos($,"$HASP373")>=0)->count()]->
      write(del(location="hdfs:///idz1470/iod00s/lab3e2.csv"));',
     ' iod00s/lab3e2.csv?op=open',
     ' ',
     ''),
   '')) AS BIGINSIGHTS(CNT INTEGER));
JAQL_SUBMIT can be embedded in HDFS_READ for a synchronous execute/read workflow
25

26 DB2 BigInsights Integration Use Case 26

27 DB2-BigInsights Integration Use Case 1
1. BigInsights ingests data that usually is not ingested by established structured data analysis systems like DB2, e.g. e-mails from all clients sent to an insurance company.
2. DB2 kicks off a Hadoop job on BigInsights that analyzes the e-mails and identifies customers who have expressed dissatisfaction with the service and the words "cancel", "terminate", "switch" or synonyms thereof, or names of the competitors.
3. The BigInsights job runs successfully, creates a file of results (names and addresses of customers at risk), and terminates.
4. DB2 reads the BigInsights result file using the user-defined table function (HDFS_READ).
5. DB2 joins the result with the Agent table and alerts the agents of the at-risk customers. The agents act upon the at-risk customers and offer a promotion to stave off defection.
27
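The filtering logic in step 2 can be sketched in a few lines; in the real scenario this runs as a Jaql/MapReduce job on BigInsights, not in Python, and the keyword list and sample e-mails below are invented for illustration:

```python
# Hypothetical at-risk keyword screen, mirroring step 2 of the use case.
AT_RISK_KEYWORDS = {"cancel", "terminate", "switch"}

def at_risk(email_body):
    """Flag an e-mail whose text contains any at-risk keyword."""
    words = email_body.lower().split()
    return any(keyword in words for keyword in AT_RISK_KEYWORDS)

emails = [("cust1", "I want to cancel my policy"),
          ("cust2", "Thanks for the quick claim payout")]
flagged = [cust for cust, body in emails if at_risk(body)]
print(flagged)  # ['cust1']
```

The job's real output would be the HDFS file of at-risk customers that HDFS_READ then surfaces to SQL in steps 4 and 5.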

28 Use Case 2: syslog analysis
1. The DB2 syslog is sent to BigInsights for analysis
2. Issue SQL statements to:
1. Count the number of lines in our syslog
2. Count the number of jobs started in our syslog
3. Show the lines containing $HASP373 messages
4. Show job names together with start time
5. Count the occurrences of each job
6. Show which jobs had more than one occurrence
7. Produce a report of jobs with occurrence, departments, and owner names
[Slide diagram: DB2 11 for z/OS connected over the internet to InfoSphere BigInsights]

29 Syslog sample 29

30 Count the number of lines in our syslog
SELECT BIGINSIGHTS.CNT
FROM TABLE (HDFS_READ
  (JAQL_SUBMIT
    ('syslog = lines("hdfs:///idz1470/syslog3sec.txt");
      [localread(syslog)->count()]->
      write(del(location="hdfs:///idz1470/iod00s/lab3e1.csv"));',
     ' iod00s/lab3e1.csv?op=open',
     ' ',
     'timeout=1000'),
   '')
  ) AS BIGINSIGHTS(CNT INTEGER);
Result:
CNT
record(s) selected
30

31 Show the lines containing $HASP373 messages
SELECT BIGINSIGHTS.LINE
FROM TABLE (HDFS_READ
  (JAQL_SUBMIT
    ('syslog = lines("hdfs:///idz1470/syslog3sec.txt");
      localread(syslog)->filter(strpos($,"$HASP373")>=0)->
      write(del(location="hdfs:///idz1470/iod00s/lab3e3.csv"));',
     ' iod00s/lab3e3.csv?op=open',
     ' ',
     'timeout=1000'),
   '')
  ) AS BIGINSIGHTS(LINE VARCHAR(116));
31

32 Result:
LINE
N STLAB :30:00.20 JOB $HASP373 PJ STARTED - INIT PJ - CLASS V - SYS 28
N STLAB :30:00.20 JOB $HASP373 C5LEG003 STARTED - INIT K1 - CLASS L - SYS 28
N STLAB :30:00.21 JOB $HASP373 T1XRP010 STARTED - INIT D6 - CLASS 8 - SYS 28
N STLAB :30:00.21 JOB $HASP373 T1XRP018 STARTED - INIT D9 - CLASS 8 - SYS 28
N STLAB :30:00.21 JOB $HASP373 T1XRP022 STARTED - INIT D7 - CLASS 8 - SYS 28
N STLAB :30:00.21 JOB $HASP373 T1XRP002 STARTED - INIT D5 - CLASS 8 - SYS 28
N STLAB :30:00.26 JOB $HASP373 PJ STARTED - INIT PJ - CLASS V - SYS 28
N STLAB :30:00.26 JOB $HASP373 PJ STARTED - INIT PJ - CLASS V - SYS 28
N STLAB :30:08.71 JOB $HASP373 T1XRP006 STARTED - INIT D0 - CLASS 8 - SYS 28
N STLAB :30:08.71 JOB $HASP373 T1XRP014 STARTED - INIT D8 - CLASS 8 - SYS 28
N STLAB :30:17.25 JOB $HASP373 T1XRP022 STARTED - INIT D7 - CLASS 8 - SYS 28
N STLAB :30:17.25 JOB $HASP373 T1XRP002 STARTED - INIT D5 - CLASS 8 - SYS 28
N STLAB :30:17.28 JOB $HASP373 PJ STARTED - INIT PJ - CLASS V - SYS 28
N STLAB :30:17.29 JOB $HASP373 PJ STARTED - INIT PJ - CLASS V - SYS 28
record(s) selected
32

33 Count the number of jobs started in our syslog
SELECT BIGINSIGHTS.CNT
FROM TABLE (HDFS_READ
  (JAQL_SUBMIT
    ('syslog = lines("hdfs:///idz1470/syslog3sec.txt");
      [localread(syslog)->filter(strpos($,"$HASP373")>=0)->count()]->
      write(del(location="hdfs:///idz1470/iod00s/lab3e2.csv"));',
     ' iod00s/lab3e2.csv?op=open',
     ' ',
     'timeout=1000'),
   '')
  ) AS BIGINSIGHTS(CNT INTEGER);
Result:
CNT
record(s) selected
33

34 Integrating HDFS_READ with SQL
-- read file from exercise lab3e4, group to occurrences in DB2
SELECT JOBNAME, COUNT(*) AS OCCURS
FROM TABLE (HDFS_READ
  (' idz1470/iod00s/lab3e4.csv?op=open',
   '')
  ) AS BIGINSIGHTS(JOBNAME CHAR(8), START TIME)
GROUP BY JOBNAME
HAVING COUNT(*) > 1;
Result:
JOBNAME OCCURS
C5LEG003 8
C5SQ
T1CMD28 2
T1LEG002 5
T1NES002 3
T1QDM28 3
T1XRP002 8
T1XRP006 7
T1XRP010 8
T1XRP014 7
T1XRP018 8
T1XRP
record(s) selected
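What the GROUP BY ... HAVING COUNT(*) > 1 computes can be replayed in plain Python (illustrative only; the job names below are made up, and in the real query HDFS_READ supplies the rows):

```python
from collections import Counter

# Hypothetical JOBNAME column as HDFS_READ would present it.
jobnames = ["C5LEG003", "T1CMD28", "C5LEG003", "T1CMD28", "C5NES001"]

# GROUP BY JOBNAME, then HAVING COUNT(*) > 1.
occurs = {job: n for job, n in Counter(jobnames).items() if n > 1}
print(occurs)  # {'C5LEG003': 2, 'T1CMD28': 2}
```

The point of the slide is that this aggregation happens in DB2, over rows that physically live in HDFS.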

35 Report of Jobs with Job Description and Occurrence
-- join to JOBCAT table
SELECT T1.JOBNAME, J.COMMENT, T1.OCCURS
FROM TABLE (HDFS_READ
  (' idz1470/iod00s/lab3e5.csv?op=open',
   '')
  ) AS T1(JOBNAME CHAR(8), OCCURS SMALLINT),
  IOD02S.JOBCAT J
WHERE J.JOBNAME = T1.JOBNAME;
IOD02S.JOBCAT table:
OWNERDEPT JOBNAME COMMENT
C5CPX002 RUN XMLTABLE JOB
C5EAX003 RUN XMLQUERY JOB
C5EAX007 RUN XMLTABLE JOB
C5LEG003 RUN XML PERFORMANCE JOB
C5NES001 RUN XML INDEX JOB
C5NES002 RUN XMLQUERY JOB
C5NES004 RUN XMLTABLE JOB
C5SQ2003 RUN XMLEXISTS JOB
.
Result:
JOBNAME COMMENT OCCURS
C5CPX002 RUN XMLTABLE JOB 1
C5EAX003 RUN XMLQUERY JOB 1
C5EAX007 RUN XMLTABLE JOB 1
C5LEG003 RUN XML PERFORMANCE JOB 8
C5NES001 RUN XML INDEX JOB 1
C5NES002 RUN XMLQUERY JOB 1
C5NES004 RUN XMLTABLE JOB 1
C5SQ2003 RUN XMLEXISTS JOB 5
PJ TEST UDFS JOB 100 1
35

36 More: Create a View on HDFS_READ for transparent access
CREATE VIEW IOD00S.VIEW1 AS
SELECT T1.JOBNAME, T1.OCCURS
FROM TABLE (HDFS_READ
  (' idz1470/iod00s/lab3e5.csv?op=open',
   '')
  ) AS T1(JOBNAME CHAR(8), OCCURS SMALLINT);
36

37 More: Write HDFS_READ result to another table
-- CREATE table to hold job names & start times
CREATE TABLE IOD00S.JOB_STATS (JOBNAME CHAR(8), STARTTIME TIME);
-- INSERT into the newly created table
INSERT INTO IOD00S.JOB_STATS
  SELECT JOBNAME, START
  FROM TABLE (HDFS_READ
    (' idz1470/iod00s/lab3e4.csv?op=open',
     '')
    ) AS T1(JOBNAME CHAR(8), START TIME);
SELECT * FROM IOD00S.JOB_STATS ORDER BY STARTTIME DESC;
JOBNAME STARTTIME
PJ :32:59
PJ :32:59
PJ :32:59
PJ :32:59
C5LEG003 20:32:58
T1QDM28 20:32:58
T1NES002 20:32:58
37

38 More: Using with Common Table Expressions
WITH JOBS(JOBNAME, START) AS
  (SELECT JOBNAME, START
   FROM TABLE (HDFS_READ
     (' iod02s/lab3e4.csv?op=open',
      '')
     ) AS T1(JOBNAME CHAR(8), START TIME)
  )
SELECT J1.JOBNAME, J1.START
FROM JOBS J1
WHERE J1.START >=
  (SELECT J2.START FROM JOBS J2 WHERE J2.JOBNAME='SMFDMP28')
Result:
JOBNAME START
C5LEG003 20:32:33
T1XRP002 20:32:33
T1XRP022 20:32:33
T1XRP018 20:32:33
PJ :32:33
PJ :32:33
SMFDMP28 20:32:33
PJ :32:33
PJ :32:33
T1XRP006 20:32:41
T1XRP014 20:32:41
38

39 Other 39

40 Components of an Integration Solution
Move data out of DB2 (out of Z)
Move data into DB2 (on to Z)
Execute analysis on the Hadoop cluster
40

41 Big SQL and BigInsights 3.0
Big SQL is a standard component in all IBM BigInsights versions
Seamlessly integrated with other BigInsights components
Expands IBM's continued focus on large scale, high performance analytics
[Slide graphic: the BigInsights edition comparison from slide 12; Big SQL appears in the Standard, Enterprise, and Quick Start editions]
41

42 Loading Big SQL from a JDBC data source
A JDBC URL may be used to load directly from an external data source
It supports many options to partition the extraction of data:
Providing a table and partitioning column
Providing a query and a WHERE clause to use for partitioning
Example usage:
LOAD USING JDBC CONNECTION URL 'jdbc:db2://myhost:50000/sample'
  WITH PARAMETERS ( user = 'myuser', password = 'mypassword' )
  FROM TABLE STAFF WHERE "dept=66 and job='sales'"
  INTO TABLE staff_sales PARTITION ( dept=66, job='sales')
  APPEND
  WITH LOAD PROPERTIES (bigsql.load.num.map.tasks = 1);
42

43 Co:Z Launcher Example
//COZJOB JOB ...,
//*=====================================
//*
//* run analysisscript on linux
//*
//*=====================================
//STEP1 EXEC PROC=COZPROC,
// ARGS=                      (OpenSSH user credentials)
//STDIN DD *
# This is input to the remote shell: the shell script to be run on the target system
/home/jason/bin/analysisscript.sh \
/home/hadoop/hadoop-2.3.0/bin/hdfs dfs -put - /hadoop/db2table.toz
43

44 Co:Z Dataset Pipes Streaming Example
//COZJOB JOB ...,
//STEP1 EXEC DSNUPROC,UID='USSPIPE.LOAD',
// UTPROC='',SYSTEM='DB2A'
//UTPRINT DD SYSOUT=*
//SYSUT1 ...
//SORTOUT ...
//SYSIN DD *
Use a TEMPLATE to point to a USS pipe:
TEMPLATE DATAFIL PATH='/u/admf001/test1'
  RECFM(VB) LRECL(33) FILEDATA(TEXT)
LOAD, using the TEMPLATE, listens to the pipe:
LOAD DATA FORMAT DELIMITED COLDEL X'6B' CHARDEL X'7F' DECPT X'4B'
  INTO TABLE JASON.TABLE2
  (C1 POSITION(*) INTEGER,
   C2 POSITION(*) VARCHAR)
  INDDN DATAFIL
A separate job writes to the pipe from the remote system:
//COZJOB2 JOB ...
//STEP1 EXEC PROC=COZPROC,
// ARGS='hadoop@kea.svl.ibm.com'
//STDIN DD *
/home/hadoop/hadoop-2.3.0/bin/hdfs dfs -cat /hadoop/db2table.toz | \
tofile '/u/admf001/test1'
44

45 Apache Sqoop
Open source top-level Apache project
Transfers data between HDFS and an RDBMS (like DB2!)
Uses DB2 SQL INSERT and SELECT statements to support export and import
No custom DB2 for z/OS connector, which means default performance
Example (DB2's JDBC driver; my server, credentials, and table; -m specifies parallelism on the remote cluster):
sqoop export --driver com.ibm.db2.jcc.DB2Driver \
  --connect jdbc:db2:// :59000/systmf1 \
  --username JASON --password xxxxxxxx \
  --table SYSADM.NYSETB02 \
  --export-dir "/user/jason/myfile.txt" \
  -m 9
45

46 Veristorm Enterprise (not an IBM product)
zDoop: Veristorm Hadoop distribution on Linux on System z
User-friendly GUI allows point-and-click data movement
Copies z/OS VSAM, DB2, IMS, datasets and log files into zDoop
46

47 IBM InfoSphere System z Connector for Hadoop
[Slide diagram: SMF, DB2, VSAM, IMS, and log data on z/OS flow through the System z Connector for Hadoop to InfoSphere BigInsights (MapReduce, HBase, Hive, HDFS) running on Linux for System z under z/VM on IFLs, or to Hadoop on Power Systems or Intel servers]
Hadoop on your platform of choice; IBM System z for security
Point-and-click or batch self-service data access
Lower cost processing & storage
47

48 Summary
DB2 11 for z/OS enables traditional applications on DB2 for z/OS to access IBM's Hadoop-based BigInsights big data platform for big data analytics.
JAQL_SUBMIT: submit a JAQL script for execution on BigInsights from DB2. Results are stored in the Hadoop Distributed File System (HDFS).
HDFS_READ: a table UDF that reads HDFS files (containing BigInsights analytic results) into DB2 as a table for use in SQL.
48

49 Resources
BigInsights (including JAQL documentation)
JAQL_SUBMIT and HDFS_READ:
Integrate DB2 for z/OS with InfoSphere BigInsights, Part 1: Set up the InfoSphere BigInsights connector for DB2 for z/OS (biginsights/index.html)
Integrate DB2 for z/OS with InfoSphere BigInsights, Part 2: Use the InfoSphere BigInsights connector to perform analysis using Jaql and SQL (biginsights2/index.html)
49

50 InfoSphere BigInsights Quick Start is the newest member of the BigInsights family
What is BigInsights Quick Start?
No-charge, downloadable edition that allows you to experiment with enterprise-grade Hadoop features
Simplifies the complexity of Hadoop with easy-to-follow tutorials and videos
No time or data limitations, to allow you to experiment with a wide range of use cases
Download now! Watch the videos! ibm.co/quickstart ibmurl.hursley.ibm.com/3PLJ
50



More information

IBM Big SQL Partner Application Verification Quick Guide

IBM Big SQL Partner Application Verification Quick Guide IBM Big SQL Partner Application Verification Quick Guide VERSION: 1.6 DATE: Sept 13, 2017 EDITORS: R. Wozniak D. Rangarao Table of Contents 1 Overview of the Application Verification Process... 3 2 Platform

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

Big Data and Hadoop. Course Curriculum: Your 10 Module Learning Plan. About Edureka

Big Data and Hadoop. Course Curriculum: Your 10 Module Learning Plan. About Edureka Course Curriculum: Your 10 Module Learning Plan Big Data and Hadoop About Edureka Edureka is a leading e-learning platform providing live instructor-led interactive online training. We cater to professionals

More information

Big Data Hadoop Course Content

Big Data Hadoop Course Content Big Data Hadoop Course Content Topics covered in the training Introduction to Linux and Big Data Virtual Machine ( VM) Introduction/ Installation of VirtualBox and the Big Data VM Introduction to Linux

More information

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Big Data Connectors: High Performance Integration for Hadoop and Oracle Database Melli Annamalai Sue Mavris Rob Abbott 2 Program Agenda Big Data Connectors: Brief Overview Connecting Hadoop with Oracle

More information

Hortonworks Data Platform

Hortonworks Data Platform Hortonworks Data Platform Workflow Management (August 31, 2017) docs.hortonworks.com Hortonworks Data Platform: Workflow Management Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The Hortonworks

More information

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop Hadoop Open Source Projects Hadoop is supplemented by an ecosystem of open source projects Oozie 25 How to Analyze Large Data Sets in Hadoop Although the Hadoop framework is implemented in Java, MapReduce

More information

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since

More information

Hadoop Overview. Lars George Director EMEA Services

Hadoop Overview. Lars George Director EMEA Services Hadoop Overview Lars George Director EMEA Services 1 About Me Director EMEA Services @ Cloudera Consulting on Hadoop projects (everywhere) Apache Committer HBase and Whirr O Reilly Author HBase The Definitive

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

IBM Data Replication for Big Data

IBM Data Replication for Big Data IBM Data Replication for Big Data Highlights Stream changes in realtime in Hadoop or Kafka data lakes or hubs Provide agility to data in data warehouses and data lakes Achieve minimum impact on source

More information

Hortonworks and The Internet of Things

Hortonworks and The Internet of Things Hortonworks and The Internet of Things Dr. Bernhard Walter Solutions Engineer About Hortonworks Customer Momentum ~700 customers (as of November 4, 2015) 152 customers added in Q3 2015 Publicly traded

More information

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data Introduction to Hadoop High Availability Scaling Advantages and Challenges Introduction to Big Data What is Big data Big Data opportunities Big Data Challenges Characteristics of Big data Introduction

More information

MapR Enterprise Hadoop

MapR Enterprise Hadoop 2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS

More information

IBM InfoSphere BigInsights Version 2.1. Installation Guide GC

IBM InfoSphere BigInsights Version 2.1. Installation Guide GC IBM InfoSphere BigInsights Version 2.1 Installation Guide GC19-4100-00 IBM InfoSphere BigInsights Version 2.1 Installation Guide GC19-4100-00 Note Before using this information and the product that it

More information

How to Modernize the IMS Queries Landscape with IDAA

How to Modernize the IMS Queries Landscape with IDAA How to Modernize the IMS Queries Landscape with IDAA Session C12 Deepak Kohli IBM Senior Software Engineer deepakk@us.ibm.com * IMS Technical Symposium Acknowledgements and Disclaimers Availability. References

More information

BIG DATA COURSE CONTENT

BIG DATA COURSE CONTENT BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data

More information

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Who am I? Zohar Elkayam, CTO at Brillix Programmer, DBA, team leader, database trainer,

More information

microsoft

microsoft 70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series

More information

Hadoop Development Introduction

Hadoop Development Introduction Hadoop Development Introduction What is Bigdata? Evolution of Bigdata Types of Data and their Significance Need for Bigdata Analytics Why Bigdata with Hadoop? History of Hadoop Why Hadoop is in demand

More information

Build and Deploy Stored Procedures with IBM Data Studio

Build and Deploy Stored Procedures with IBM Data Studio Build and Deploy Stored Procedures with IBM Data Studio December 19, 2013 Presented by: Anson Kokkat, Product Manager, Optim Database Tools 1 DB2 Tech Talk series host and today s presenter: Rick Swagerman,

More information

docs.hortonworks.com

docs.hortonworks.com docs.hortonworks.com : Getting Started Guide Copyright 2012, 2014 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing,

More information

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018 Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/

More information

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time

More information

Big Data Architect.

Big Data Architect. Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional

More information

Optimizing Data Transformation with Db2 for z/os and Db2 Analytics Accelerator

Optimizing Data Transformation with Db2 for z/os and Db2 Analytics Accelerator Optimizing Data Transformation with Db2 for z/os and Db2 Analytics Accelerator Maryela Weihrauch, IBM Distinguished Engineer, WW Analytics on System z March, 2017 Please note IBM s statements regarding

More information

Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig?

Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig? Volume: 72 Questions Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig? A. update hdfs set D as./output ; B. store D

More information

Big Data Infrastructure at Spotify

Big Data Infrastructure at Spotify Big Data Infrastructure at Spotify Wouter de Bie Team Lead Data Infrastructure September 26, 2013 2 Who am I? According to ZDNet: "The work they have done to improve the Apache Hive data warehouse system

More information

<Insert Picture Here> Introduction to Big Data Technology

<Insert Picture Here> Introduction to Big Data Technology Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

More information

DATA INTEGRATION PLATFORM CLOUD. Experience Powerful Data Integration in the Cloud

DATA INTEGRATION PLATFORM CLOUD. Experience Powerful Data Integration in the Cloud DATA INTEGRATION PLATFORM CLOUD Experience Powerful Integration in the Want a unified, powerful, data-driven solution for all your data integration needs? Oracle Integration simplifies your data integration

More information

Hadoop course content

Hadoop course content course content COURSE DETAILS 1. In-detail explanation on the concepts of HDFS & MapReduce frameworks 2. What is 2.X Architecture & How to set up Cluster 3. How to write complex MapReduce Programs 4. In-detail

More information

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018 Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning

More information

Exam Questions

Exam Questions Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) https://www.2passeasy.com/dumps/70-775/ NEW QUESTION 1 You are implementing a batch processing solution by using Azure

More information

Saving ETL Costs Through Data Virtualization Across The Enterprise

Saving ETL Costs Through Data Virtualization Across The Enterprise Saving ETL Costs Through Virtualization Across The Enterprise IBM Virtualization Manager for z/os Marcos Caurim z Analytics Technical Sales Specialist 2017 IBM Corporation What is Wrong with Status Quo?

More information

Importing and Exporting Data Between Hadoop and MySQL

Importing and Exporting Data Between Hadoop and MySQL Importing and Exporting Data Between Hadoop and MySQL + 1 About me Sarah Sproehnle Former MySQL instructor Joined Cloudera in March 2010 sarah@cloudera.com 2 What is Hadoop? An open-source framework for

More information

Getting Started with Hadoop and BigInsights

Getting Started with Hadoop and BigInsights Getting Started with Hadoop and BigInsights Alan Fischer e Silva Hadoop Sales Engineer Nov 2015 Agenda! Intro! Q&A! Break! Hands on Lab 2 Hadoop Timeline 3 In a Big Data World. The Technology exists now

More information

Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration

Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration Sina Meraji sinamera@ca.ibm.com Berni Schiefer schiefer@ca.ibm.com Tuesday March 17th at 12:00

More information

Oracle Big Data Cloud Service, Oracle Storage Cloud Service, Oracle Database Cloud Service

Oracle Big Data Cloud Service, Oracle Storage Cloud Service, Oracle Database Cloud Service Demo Introduction Keywords: Oracle Big Data Cloud Service, Oracle Storage Cloud Service, Oracle Database Cloud Service Goal of Demo: Oracle Big Data Preparation Cloud Services can ingest data from various

More information

Chase Wu New Jersey Institute of Technology

Chase Wu New Jersey Institute of Technology CS 644: Introduction to Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Institute of Technology Some of the slides were provided through the courtesy of Dr. Ching-Yung Lin at Columbia

More information

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours) Bigdata Fundamentals Day1: (2hours) 1. Understanding BigData. a. What is Big Data? b. Big-Data characteristics. c. Challenges with the traditional Data Base Systems and Distributed Systems. 2. Distributions:

More information

@Pentaho #BigDataWebSeries

@Pentaho #BigDataWebSeries Enterprise Data Warehouse Optimization with Hadoop Big Data @Pentaho #BigDataWebSeries Your Hosts Today Dave Henry SVP Enterprise Solutions Davy Nys VP EMEA & APAC 2 Source/copyright: The Human Face of

More information

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP 07.29.2015 LANDING STAGING DW Let s start with something basic Is Data Lake a new concept? What is the closest we can

More information

Building an Integrated Big Data & Analytics Infrastructure September 25, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle

Building an Integrated Big Data & Analytics Infrastructure September 25, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Building an Integrated Big Data & Analytics Infrastructure September 25, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise Solutions Group The following is intended to

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

Introduction into Big Data analytics Lecture 3 Hadoop ecosystem. Janusz Szwabiński

Introduction into Big Data analytics Lecture 3 Hadoop ecosystem. Janusz Szwabiński Introduction into Big Data analytics Lecture 3 Hadoop ecosystem Janusz Szwabiński Outlook of today s talk Apache Hadoop Project Common use cases Getting started with Hadoop Single node cluster Further

More information

This is a brief tutorial that explains how to make use of Sqoop in Hadoop ecosystem.

This is a brief tutorial that explains how to make use of Sqoop in Hadoop ecosystem. About the Tutorial Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL, Oracle to Hadoop HDFS, and

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

Hadoop. Introduction to BIGDATA and HADOOP

Hadoop. Introduction to BIGDATA and HADOOP Hadoop Introduction to BIGDATA and HADOOP What is Big Data? What is Hadoop? Relation between Big Data and Hadoop What is the need of going ahead with Hadoop? Scenarios to apt Hadoop Technology in REAL

More information

2013 AWS Worldwide Public Sector Summit Washington, D.C.

2013 AWS Worldwide Public Sector Summit Washington, D.C. 2013 AWS Worldwide Public Sector Summit Washington, D.C. EMR for Fun and for Profit Ben Butler Sr. Manager, Big Data butlerb@amazon.com @bensbutler Overview 1. What is big data? 2. What is AWS Elastic

More information

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance

More information

Microsoft Exam

Microsoft Exam Volume: 42 Questions Case Study: 1 Relecloud General Overview Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred

More information

Streaming Integration and Intelligence For Automating Time Sensitive Events

Streaming Integration and Intelligence For Automating Time Sensitive Events Streaming Integration and Intelligence For Automating Time Sensitive Events Ted Fish Director Sales, Midwest ted@striim.com 312-330-4929 Striim Executive Summary Delivering Data for Time Sensitive Processes

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

HADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation)

HADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation) HADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation) Introduction to BIGDATA and HADOOP What is Big Data? What is Hadoop? Relation between Big

More information

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale

More information

Acquiring Big Data to Realize Business Value

Acquiring Big Data to Realize Business Value Acquiring Big Data to Realize Business Value Agenda What is Big Data? Common Big Data technologies Use Case Examples Oracle Products in the Big Data space In Summary: Big Data Takeaways

More information

C Number: C Passing Score: 800 Time Limit: 120 min File Version:

C Number: C Passing Score: 800 Time Limit: 120 min File Version: C2030-102 Number: C2030-102 Passing Score: 800 Time Limit: 120 min File Version: 1.0 Exam E QUESTION 1 An IBM Big Data platform is well suited to deal which of the following kinds of data types? A. Structured

More information

Rickard Linck Client Technical Professional Core Database and Lifecycle Management Common Analytic Engine Cloud Data Servers On-Premise Data Servers

Rickard Linck Client Technical Professional Core Database and Lifecycle Management Common Analytic Engine Cloud Data Servers On-Premise Data Servers Rickard Linck Client Technical Professional Core Database and Lifecycle Management Common Analytic Engine Cloud Data Servers On-Premise Data Servers Watson Data Platform Reference Architecture Business

More information

IBM BigInsights on Cloud

IBM BigInsights on Cloud Service Description IBM BigInsights on Cloud This Service Description describes the Cloud Service IBM provides to Client. Client means the company and its authorized users and recipients of the Cloud Service.

More information

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context 1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes

More information

VOLTDB + HP VERTICA. page

VOLTDB + HP VERTICA. page VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics

More information

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Blended Learning Outline: Cloudera Data Analyst Training (171219a) Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills

More information

April Copyright 2013 Cloudera Inc. All rights reserved.

April Copyright 2013 Cloudera Inc. All rights reserved. Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and the Virtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here April 2014 Analytic Workloads on

More information

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight ESG Lab Review InterSystems Data Platform: A Unified, Efficient Data Platform for Fast Business Insight Date: April 218 Author: Kerry Dolan, Senior IT Validation Analyst Abstract Enterprise Strategy Group

More information

Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor

Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack Chief Architect RainStor Agenda Importance of Hadoop + data compression Data compression techniques Compression,

More information

Report on The Infrastructure for Implementing the Mobile Technologies for Data Collection in Egypt

Report on The Infrastructure for Implementing the Mobile Technologies for Data Collection in Egypt Report on The Infrastructure for Implementing the Mobile Technologies for Data Collection in Egypt Date: 10 Sep, 2017 Draft v 4.0 Table of Contents 1. Introduction... 3 2. Infrastructure Reference Architecture...

More information

Configuring and Deploying Hadoop Cluster Deployment Templates

Configuring and Deploying Hadoop Cluster Deployment Templates Configuring and Deploying Hadoop Cluster Deployment Templates This chapter contains the following sections: Hadoop Cluster Profile Templates, on page 1 Creating a Hadoop Cluster Profile Template, on page

More information

IBM BigInsights Security Implementation: Part 1 Introduction to Security Architecture

IBM BigInsights Security Implementation: Part 1 Introduction to Security Architecture IBM BigInsights Security Implementation: Part 1 Introduction to Security Architecture Big data analytics involves processing large amounts of data that cannot be handled by conventional systems. The IBM

More information