openSAP: Big Data with SAP HANA Vora
Course Week 03 - Exercises
TABLE OF CONTENTS

1 TABLES AND VIEWS
  Create tables
  Select from the table
  Listing tables and views
  Loading tables from Vora into Spark
  Appending tables
  Dropping tables
  Creating views
    SQL VIEW
    Dimension View
    Cube view
  Drop views
  Check how the view is created

2 DIFFERENT DATA TYPES
  Parquet Files
    Creating a table in Vora loading data from parquet format
    Check the result
  ORC Files
    Check the result

3 HIERARCHIES
  Create Hierarchies
    Check the table
  Joining Hierarchies with regular tables
    Create the ADDRESSES table and check the values
    Create a view of Hierarchy which will make it easier to play with later
    Join the ADDRESSES and OFFICERS tables
  Running UDFs on the Hierarchies
    Returns the rank of the descendants of the root
    Returns the address and the rank for the officers from level 2
1 TABLES AND VIEWS

Start Zeppelin:
- Click the Connect button next to your openSAP instance on CAL.
- A window pops up. Click Open next to Zeppelin to connect to Vora Data Modeler for the purpose of this exercise.
- Zeppelin opens in your browser. You'll have access to the solutions of the exercises and the create table statements via Zeppelin notebooks.
- You can use the solutions as a guide while running the exercises, or create a new notebook by clicking Create new note and write your own solution.
- To run a command in your Zeppelin notebook, click the play button.
Create tables

%vora
CREATE TABLE CUSTOMER (CUSTOMER_ID string, REGION string, LONGITUDE int, LATITUDE int, CUSTOMER_GROUP string, LOCATION string)
USING com.sap.spark.vora
OPTIONS (tablename "CUSTOMER", paths "/user/vora/customer_data.csv")

Select from the table

%vora
SELECT * FROM CUSTOMER

Listing tables and views

%vora
SHOW TABLES using com.sap.spark.vora

** The result may look different on your system.
Loading tables from Vora into Spark

%vora
REGISTER TABLE SALES USING com.sap.spark.vora IGNORING CONFLICTS

or

%vora
REGISTER ALL TABLES USING com.sap.spark.vora IGNORING CONFLICTS

Appending tables

%vora
APPEND TABLE SALES OPTIONS (paths "/user/vora/sales_2015_data.csv,/user/vora/sales_data.csv", eagerload "true")

** The sales data from 2015 is now added to the sales from 2013 and 2014.

Dropping tables

%vora
DROP TABLE CUSTOMER

Creating views

SQL VIEW

%vora
CREATE VIEW SALES_2014 AS (SELECT * FROM SALES WHERE YEAR = 2014) USING com.sap.spark.vora

%vora
SELECT * FROM SALES_2014
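The SQL view above is a stored query over SALES, not a copy of its rows. The following is a minimal sketch of that idea using Python's built-in sqlite3 module rather than Vora. The column types and all row values are hypothetical, and it assumes that standard SQL view semantics carry over to Vora views.

```python
import sqlite3

# Hypothetical rows; the course's SALES data is not shown in this document.
rows = [("C1", 2013, 100), ("C2", 2014, 250), ("C1", 2014, 75), ("C3", 2015, 300)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES (CUSTOMER_ID TEXT, YEAR INTEGER, REVENUE INTEGER)")
conn.executemany("INSERT INTO SALES VALUES (?, ?, ?)", rows)

# Same shape as the exercise's view: a named query that filters SALES by year.
conn.execute("CREATE VIEW SALES_2014 AS SELECT * FROM SALES WHERE YEAR = 2014")
sales_2014 = conn.execute("SELECT * FROM SALES_2014 ORDER BY CUSTOMER_ID").fetchall()

# A row added to SALES afterwards is visible through the view without recreating it.
conn.execute("INSERT INTO SALES VALUES ('C4', 2014, 40)")
sales_2014_after = conn.execute("SELECT * FROM SALES_2014 ORDER BY CUSTOMER_ID").fetchall()

print(sales_2014)
print(sales_2014_after)
```

In this sketch the second query returns one more row than the first, because the view is evaluated against the current contents of SALES each time it is queried.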
Dimension View

Creating a CUSTOMER dimension from the SALES data:

%vora
CREATE DIMENSION VIEW CUSTOMERDIM AS SELECT CUSTOMER_ID, YEAR FROM SALES USING com.sap.spark.vora

%vora
SELECT * FROM CUSTOMERDIM

Cube view

Joining the customer dimension with the sales table to create a sales cube. This shows how you can join different tables and dimensions to create a cube.

%vora
CREATE CUBE VIEW SALESCUBE AS (SELECT * FROM CUSTOMERDIM C JOIN SALES S ON C.CUSTOMER_ID = S.CUSTOMER_ID) USING com.sap.spark.vora
Drop views

%vora
DROP VIEW SALESCUBE using com.sap.spark.vora

Check how the view is created

Checking the initial SQL statement used to create the view:

%vora
DESCRIBE TABLE SALES_2014 USING com.sap.spark.vora
2 DIFFERENT DATA TYPES
Follow the first steps in exercise 1 to open Zeppelin and run the following exercises.

Parquet Files

Creating a table in Vora loading data from parquet format

%vora
CREATE TABLE SALES_P (CUSTOMER_ID string, YEAR string, REVENUE bigint)
USING com.sap.spark.vora
OPTIONS (tablename "SALES_P", paths "/user/vora/sales_p.parquet/*", format "parquet")

Check the result

%vora
SELECT * FROM SALES_P
ORC Files

%vora
CREATE TABLE SALES_O (CUSTOMER_ID string, YEAR string, REVENUE bigint)
USING com.sap.spark.vora
OPTIONS (tablename "SALES_O", paths "/user/vora/sales_o.orc/*", format "orc")

Check the result

%vora
SELECT * FROM SALES_O
3 HIERARCHIES

Start Zeppelin as described in exercise 1, then run the following exercises.
Create Hierarchies

%vora
CREATE TABLE OFFICERS (id int, pred int, ord int, rank string)
USING com.sap.spark.vora
OPTIONS (tablename "OFFICERS", paths "/user/vora/officers.csv")

Check the table

%vora
SELECT * FROM OFFICERS

Joining Hierarchies with regular tables

Create the ADDRESSES table and check the values

%vora
CREATE TABLE ADDRESSES (rank string, address string)
USING com.sap.spark.vora
OPTIONS (tablename "ADDRESSES", paths "/user/vora/addresses.csv")

%vora
SELECT * FROM ADDRESSES
Create a view on the hierarchy, which makes it easier to work with later:

%vora
CREATE VIEW HV AS SELECT * FROM HIERARCHY (
  USING OFFICERS AS child
  JOIN PARENT par ON child.pred = par.id
  SEARCH BY ord ASC
  START WHERE pred=0
  SET node) AS H

%vora
SELECT * FROM HV
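The HIERARCHY clause above turns a parent-child table (each row points to its predecessor via pred) into a tree, starting at the rows with pred = 0 and ordering siblings by ord. The following is a minimal Python sketch of that traversal, not Vora's implementation. The OFFICERS rows are hypothetical, the `path` tuple is only a stand-in for Vora's opaque node column, and counting the root as level 1 is an assumption to verify against the Vora documentation.

```python
from collections import defaultdict

# Hypothetical OFFICERS rows (id, pred, ord, rank); the real officers.csv is not shown here.
officers = [
    (1, 0, 1, "General"),
    (2, 1, 1, "Colonel"),
    (3, 1, 2, "Major"),
    (4, 2, 1, "Captain"),
]

def build_hierarchy(rows):
    """Walk the parent-child rows depth-first, starting where pred == 0."""
    children = defaultdict(list)
    for id_, pred, ord_, rank in rows:
        children[pred].append((ord_, id_, rank))

    result = []

    def visit(pred, level, path):
        # Siblings are visited in ascending ord, mirroring SEARCH BY ord ASC.
        for ord_, id_, rank in sorted(children[pred]):
            node_path = path + (id_,)
            result.append({"id": id_, "rank": rank, "level": level, "path": node_path})
            visit(id_, level + 1, node_path)

    visit(0, 1, ())  # pred = 0 marks the roots (START WHERE pred=0)
    return result

nodes = build_hierarchy(officers)
for n in nodes:
    print(n["level"], n["rank"], n["path"])
```

Each entry carries enough information (its position in the tree) to answer questions such as "is this node a root?" or "which level is it on?", which is the role the node column plays in the HV view.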
Join the ADDRESSES and OFFICERS tables

%vora
SELECT HV.rank, A.address FROM HV, ADDRESSES A WHERE HV.rank = A.rank

Running UDFs on the Hierarchies

Returns the rank of the children (direct descendants) of the root

%vora
SELECT Children.rank
FROM HV Children, HV Parents
WHERE IS_ROOT(Parents.node) AND IS_PARENT(Parents.node, Children.node)
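The last query pairs every node with every other node and keeps the pairs where the first is a root and is the parent of the second. A rough Python sketch of that logic follows. The data is hypothetical, and `is_root` and `is_parent` are illustrative helpers that work on the pred column directly; they only approximate what Vora's IS_ROOT and IS_PARENT do on the node column.

```python
# Hypothetical hierarchy: id -> (pred, rank); pred == 0 marks the root.
officers = {
    1: (0, "General"),
    2: (1, "Colonel"),
    3: (1, "Major"),
    4: (2, "Captain"),
}

def is_root(node_id):
    """True if the node has no predecessor."""
    return officers[node_id][0] == 0

def is_parent(parent_id, child_id):
    """True if parent_id is the direct predecessor of child_id."""
    return officers[child_id][0] == parent_id

# Self-join of the hierarchy with itself, filtered like the WHERE clause of the query.
children_of_root = [
    officers[child][1]
    for parent in officers
    for child in officers
    if is_root(parent) and is_parent(parent, child)
]
print(children_of_root)
```

With this sample data only the two officers directly below the root are returned; the Captain, who is two steps below the root, is not.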
Returns the address and the rank for the officers who are descendants of the level 2 nodes

%vora
SELECT OFFICERS.rank, ADDRESSES.address
FROM (SELECT Descendants.rank AS rank
      FROM HV Parents, HV Descendants
      WHERE IS_DESCENDANT(Descendants.node, Parents.node) AND LEVEL(Parents.node) = 2) OFFICERS, ADDRESSES
WHERE OFFICERS.rank = ADDRESSES.rank
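The query above first collects every node below a level 2 node and then joins the result with ADDRESSES on rank. The sketch below mimics this in plain Python. All data is hypothetical, and the helper functions are illustrative only: they assume the root is level 1 and that a node does not count as its own descendant, both of which should be checked against the Vora documentation.

```python
# Hypothetical hierarchy: id -> (pred, rank); pred == 0 marks the root.
officers = {
    1: (0, "General"),
    2: (1, "Colonel"),
    3: (1, "Major"),
    4: (2, "Captain"),
    5: (4, "Lieutenant"),
}
# Hypothetical ADDRESSES rows: rank -> address.
addresses = {"General": "HQ", "Captain": "Fort A", "Lieutenant": "Fort B"}

def level(node_id):
    """Number of nodes on the path from the root to node_id (root = 1, assumed)."""
    depth = 1
    pred = officers[node_id][0]
    while pred != 0:
        depth += 1
        pred = officers[pred][0]
    return depth

def is_descendant(node_id, ancestor_id):
    """True if ancestor_id lies on the path from node_id up to the root (strict, assumed)."""
    pred = officers[node_id][0]
    while pred != 0:
        if pred == ancestor_id:
            return True
        pred = officers[pred][0]
    return False

# Inner query: ranks of all descendants of level 2 nodes.
ranks = [
    officers[d][1]
    for p in officers
    for d in officers
    if level(p) == 2 and is_descendant(d, p)
]
# Outer query: inner join with ADDRESSES on rank.
result = [(rank, addresses[rank]) for rank in ranks if rank in addresses]
print(result)
```

Note that the level 2 officers themselves are not part of the result; only the officers underneath them are.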
Coding Samples

Any software coding or code lines/strings ("Code") provided in this documentation are only examples and are not intended for use in a production system environment. The Code is only intended to better explain and visualize the syntax and phrasing rules for certain SAP coding. SAP does not warrant the correctness or completeness of the Code provided herein and SAP shall not be liable for errors or damages caused by use of the Code, except where such damages were caused by SAP with intent or with gross negligence.
www.sap.com

© 2016 SAP SE or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.

National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE's or its affiliated companies' strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality.
All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.