PUBLIC SAP Vora Sizing Guide

Similar documents
Master Guide for SAP HANA Smart Data Integration and SAP HANA Smart Data Quality

PUBLIC Rapid Deployment Guide

SAP Workforce Performance Builder 9.5

What's New in SAP HANA Smart Data Streaming (Release Notes)

Afaria Document Version: Windows Phone Enterprise Client Signing

SAP Vora - AWS Marketplace Production Edition Reference Guide

SAP IoT Application Enablement Reuse Components and Templates

ATTP Settings for ATTP to ATTP Connection

PUBLIC DQM Microservices Blueprints User's Guide

Advanced Reporting in the Online Report Designer Administration Guide

Development Information Document Version: CUSTOMER. ABAP for Key Users

SAP Enable Now. Desktop Components (Cloud Edition)

Starting Guide for Data Warehousing Foundation Components on XSA

Secure Login for SAP Single Sign-On Sizing Guide

SAP Workforce Performance Builder

SAP Workforce Performance Builder 9.5

SAP Enable Now. System Requirements

Configuring Client Keystore for Web Services

System Requirements and Technical Prerequisites for SAP SuccessFactors HCM Suite

Onboarding Guide THE BEST RUN. IMPLEMENTATION GUIDE PUBLIC Document Version:

CUSTOMER Upgrade: SAP Mobile Platform SDK for Mac OS

SAP Workforce Performance Builder 9.5

SAP Enable Now. Desktop Assistant

SAP Enable Now What s New. WHAT S NEW PUBLIC Version 1.0, Feature Pack SAP Enable Now What s New. Introduction PUBLIC 1

Week 2 Unit 3: Creating a JDBC Application. January, 2015

SAP Jam for Microsoft Office integration Reference Guide THE BEST RUN

Non-SAP Backend System Readiness Check

Security Guide SAP Supplier InfoNet

Creating RFC Destinations

HA215 SAP HANA Monitoring and Performance Analysis

HA100 SAP HANA Introduction

SAP HANA Operation Expert Summit PLAN - Hardware Landscapes. Addi Brosig, SAP HANA Product Management May 2014

SAP HANA Data Warehousing Foundation Data Distribution Optimizer / Data Life Cycle Manager DWF SP03

SAP HANA tailored data center integration Frequently Asked Questions

Configuring the SAP Cryptolibrary on the ABAP Application Server

1704 SP2 CUSTOMER. What s New SAP Enable Now

HA301. SAP HANA 2.0 SPS03 - Advanced Modeling COURSE OUTLINE. Course Version: 15 Course Duration:

edocument for Hungary Invoice Registration - SAP Cloud Platform Integration Guide (SAP S/ 4HANA Cloud)

Configuring the Web Service Runtime for ATTP

Analyze Big Data Faster and Store It Cheaper

HA100 SAP HANA Introduction

HA215 SAP HANA Monitoring and Performance Analysis

SAP Hybris Billing, Pricing Simulation Extended Functions Release 2.0, SP03

Managing Business Rules THE BEST RUN. PLANNING AND DESIGN PUBLIC SAP Global Track and Trace Document Version: Cloud 2018.

opensap: Big Data with SAP HANA Vora Course Week 03 - Exercises

HA150 SQL Basics for SAP HANA

HA100 SAP HANA Introduction

SAP HANA SPS 08 - What s New? SAP HANA Interactive Education - SHINE (Delta from SPS 07 to SPS 08) SAP HANA Product Management May, 2014

SAP Jam Application Launcher for Microsoft Windows Reference Guide

VERSION 1.0, FEATURE PACK What s New SAP Enable Now

Week 2 Unit 1: Introduction and First Steps with EJB. January, 2015

HA150. SAP HANA 2.0 SPS02 - SQL and SQLScript for SAP HANA COURSE OUTLINE. Course Version: 14 Course Duration: 3 Day(s)

HA150. SAP HANA 2.0 SPS03 - SQL and SQLScript for SAP HANA COURSE OUTLINE. Course Version: 15 Course Duration:

HA300 SAP HANA Modeling

HA300 SAP HANA Modeling

BC403 Advanced ABAP Debugging

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools

DS10. Data Services - Platform and Transforms COURSE OUTLINE. Course Version: 15 Course Duration: 3 Day(s)

SAP EarlyWatch Alert. SAP HANA Deployment Best Practices Active Global Support, SAP AG 2015

BC414. Programming Database Updates COURSE OUTLINE. Course Version: 15 Course Duration: 2 Day(s)

Let s Exploit DITA: How to automate an App Catalog

ADDITIONAL GUIDES Customer SAP Enable Now System Requirements Customer

SAP Jam add-in for Microsoft Office Outlook Administration Guide and Release Notes

edocument for Italy - SAP Cloud Platform Integration Guide

SCM380 SAP MII - Manufacturing Integration and Intelligence Fundamentals

C4C30. SAP Cloud Applications Studio COURSE OUTLINE. Course Version: 21 Course Duration: 4 Day(s)

HA355. SAP HANA Smart Data Integration COURSE OUTLINE. Course Version: 12 Course Duration: 3 Day(s)

HA400 ABAP Programming for SAP HANA

MDG100 Master Data Governance

SAP HANA SPS 08 - What s New? SAP HANA Application Lifecycle Management (Delta from SPS 07 to SPS 08) SAP HANA Product Management June, 2014

HA 450. Application Development for SAP HANA COURSE OUTLINE. Course Version: 12 Course Duration:

SAP HANA SPS 08 - What s New? SAP HANA Modeling (Delta from SPS 07 to SPS 08) SAP HANA Product Management May, 2014

S4D430 Building Views in Core Data Services ABAP (CDS ABAP)

An Approach for Hybrid-Memory Scaling Columnar In-Memory Databases

S4H410. SAP S/4HANA Embedded Analytics and Modeling with Core Data Services (CDS) Views COURSE OUTLINE. Course Version: 05 Course Duration: 2 Day(s)

S4H01. Introduction to SAP S/4HANA COURSE OUTLINE. Course Version: 04 Course Duration: 2 Day(s)

Single Sign-On Extensions Library THE BEST RUN. PUBLIC SAP Single Sign-On 3.0 SP02 Document Version:

Oracle Big Data Connectors

CUSTOMER SAP Afaria Overview

ADM535. DB2 LUW Administration for SAP COURSE OUTLINE. Course Version: Course Duration: 3 Day(s)

SAP Jam Communities What's New 1808 THE BEST RUN. PUBLIC Document Version: August

CA611 Testing with ecatt

CLD100. Cloud for SAP COURSE OUTLINE. Course Version: 16 Course Duration: 2 Day(s)

BOCRC. SAP Crystal Reports Compact Course COURSE OUTLINE. Course Version: 15 Course Duration: 3 Day(s)

UX402 SAP SAPUI5 Development

Complementary Demo Guide

SAP 3D Visual Enterprise 9.0: Localization of Authoring Content

BC405 Programming ABAP Reports

Visual Business Configuration with SAP TM

BW305H. Query Design and Analysis with SAP Business Warehouse Powered by SAP HANA COURSE OUTLINE. Course Version: 15 Course Duration: 5 Day(s)

SLT100. Real Time Replication with SAP LT Replication Server COURSE OUTLINE. Course Version: 13 Course Duration: 3 Day(s)

Data Protection and Privacy for Fraud Watch

SAP Hybris Commerce Leveraging the Public Cloud

HA100 SAP HANA Introduction

BOD410 SAP Lumira 2.0 Designer

UX400. OpenUI5 Development Foundations COURSE OUTLINE. Course Version: 02 Course Duration: 5 Day(s)

SAP HANA SPS 09 - What s New? SAP River

BW405. BW/4HANA Query Design and Analysis COURSE OUTLINE. Course Version: 14 Course Duration: 5 Day(s)

SAP Business One Upgrade Strategy Overview

SAP Business One Hardware Requirements Guide

Transcription:

SAP Vora 2.0 Document Version: 1.1 2017-11-14 PUBLIC

Content 1 Introduction to SAP Vora....3 1.1 System Architecture....5 2 Factors That Influence Performance....6 3 Sizing Fundamentals and Terminology....7 4 Initial Sizing for SAP Vora.... 8 4.1 Engines....8 4.2 Number of Kubernetes Worker Nodes with Vora Containers....11 2 P U B L I C Content

1 Introduction to SAP Vora SAP Vora is a distributed database system for big data processing. SAP Vora can run on a cluster of commodity hardware compute nodes and is built to scale with the size of the data by scaling up the compute cluster. SAP Vora Engines SAP Vora comes with support for various data types, such as relational data, graph data, collections of JSON (Java Script Object Notation) documents, and time series. Each of these data types is managed by a specialized engine, which has tailored internal data structures and algorithms to natively support and efficiently process that type of data. The relational in-memory store allows you to load relational data into main memory for fast access using code generation for query processing. The relational disk engine provides relational data processing for data sets that do not fit into main memory. The time series engine allows you to compress time series data using various compression techniques, while it provides algorithms like cross correlation or histogram computation on the compressed data. The graph engine lets you perform commonly used graph operations on its data and is particularly suited for handling complex read-only analytical queries on very large graphs. The document store supports rich query processing over JSON data. SAP Vora can load and index data from external distributed data stores, such as HDFS and Amazon S3. The data is either kept in main memory for fast processing, or, in the case of the relational disk engine, it is indexed and stored on the hard disks which are locally attached to the compute nodes. Data loaded to SAP Vora can be partitioned by user-defined partitioning schemes, such as range, block, or hash partitioning. SAP Vora contains a distributed query processor, which can evaluate queries on the partitioned data. Metadata (that is, table schemas, partition schemes, and so on) is stored in SAP Vora's own catalog, which persists the catalog entries using SAP Vora's Distributed Log (DLog) infrastructure. Apache Spark Integration The SAP Vora application programming interface (API) has a two-fold integration. Firstly, it provides an SAP Vora client that can be used independently of Spark. This client allows you to create, update, and delete database objects (tables, collections, and graphs). The second part of the API consists of an integration with Apache Spark, which enables you to query data stored in SAP Vora from Spark. Spark queries are either evaluated using Spark mechanisms (after transferring data from the SAP Vora engines to Spark workers), or - wherever possible using the Spark Data Sources API - are pushed for fast processing into the respective SAP Vora engines. The SAP Vora Spark Extension takes care of analyzing Spark plans and pushing the query (or parts of it) into the SAP Vora engines. Introduction to SAP Vora P U B L I C 3

Spark applications can interact with SAP Vora by referencing the SAP Vora Spark Extension JAR file delivered with SAP Vora. External tools that make use of the Java Database Connectivity (JDBC) protocol can interact with SAP Vora through the Spark/Hive Thriftserver shipped with SAP Vora. SAP Vora - SAP HANA Integration SAP Vora integrates with SAP HANA. There are two approaches to combine or federate queries across SAP HANA and SAP Vora: 1. From SAP Vora to SAP HANA via the SAP HANA data source Spark extension The SAP Vora Spark extension contains a Spark data source implementation that allows you to interact with SAP HANA systems. Therefore, Spark applications can combine data from both SAP HANA and SAP Vora. 2. From SAP HANA to SAP Vora via SAP HANA SDA and the VoraODBC protocol SAP HANA SDA allows you to register SAP Vora as a remote source. Tables stored in SAP Vora can be exposed to the SAP HANA catalog as virtual tables. SAP HANA queries that involve virtual tables from the SAP Vora system lead to query execution on the SAP Vora system and an exchange of intermediate data with the SAP HANA system. SAP Vora Tools SAP Vora comes with a graphical UI that contains an SQL console, a graphical view modeler, and a data browser. Kubernetes and Hadoop SAP Vora is deployed to and runs in Kubernetes clusters. All SAP Vora services are containerized using Docker. Based on a cluster description specification (defined in a configuration file), SAP Vora clusters can be booted up in a set of Docker containers and run in a Kubernetes environment. Hardware and software failures are mitigated by failover mechanisms. The Hadoop-related part of SAP Vora (that is, the SAP Vora Spark Extension) is deployed to Hadoop clusters. 4 P U B L I C Introduction to SAP Vora

1.1 System Architecture The SAP Vora engines are services that you add to your existing Kubernetes installation. SAP Vora engines (with the exception of the disk engine) hold data in memory. To increase execution performance on the node level, you add an instance of an SAP Vora engine to each worker node of your Kubernetes cluster. A Kubernetes worker node can run any Vora component, such as the engines, Vora tools, and diagnostic tools. For more information, see the SAP Vora Installation and Administration Guide. Related Information SAP Vora Guides Introduction to SAP Vora P U B L I C 5

2 Factors That Influence Performance A number of factors have an impact on the performance of the system. Data Volume Consider the data volume that the users will process. This will have an impact on how many (worker) nodes you'll need to process the data efficiently. It will also determine the disk and memory size for each node. Workload Characteristics You need to know the workload characteristics to effectively and accurately size your deployment. For SAP Vora this will mainly be the number and size of database objects (tables, time series collections, and so on) managed by SAP Vora, and the number and complexity of concurrent queries. Number and Size of Database Objects Managed by SAP Vora For example, during creation of a table you can define that it will be managed by SAP Vora. Most probably the decision will be based on the query type and the response time expected to execute the query. The number of tables and their size will determine how much memory you need to allocate. Number and Complexity of Concurrent Queries SAP Vora can process queries with a high level of parallelization - up to the maximum number of cores. This number and the expected response time of the defined queries will determine the CPU processing power needed to fulfill this KPI. 6 P U B L I C Factors That Influence Performance

3 Sizing Fundamentals and Terminology SAP provides general sizing information on the SAP Service Marketplace. For the purpose of this guide, we assume that you are familiar with sizing fundamentals. You can find more information at http:// service.sap.com/sizing Sizing Sizing Decision Tree General Sizing Information. This section explains the most important sizing terms, as these terms are used extensively in this document. Sizing Sizing means determining the hardware requirements of an SAP application, such as the network bandwidth, physical memory, CPU processing power, and I/O capacity. The size of the hardware and database is influenced by both business aspects and technological aspects. This means that the number of users using the various application components and the data load they put on the server must be taken into account. Initial Sizing Initial sizing refers to the sizing approach that provides statements about platform-independent requirements of the hardware resources necessary for representative, standard delivery SAP applications. The initial sizing guidelines assume optimal system parameter settings, standard business scenarios, and so on. Expert Sizing This term refers to a sizing exercise where customer-specific data is analyzed and used to put more detail on the sizing result. The main objective is to determine the resource consumption of customized content and applications (not SAP standard delivery) by comprehensive measurements. For more information, see http://service.sap.com/sizing Sizing Sizing Decision Tree Expert Sizing. Sizing Fundamentals and Terminology P U BL IC 7

4 Initial Sizing for SAP Vora Initial sizing guidelines for SAP Vora. Assumptions You have a Hadoop cluster in place for storing data. This Hadoop cluster is separate from the SAP Vora Kubernetes cluster. You know the size of data that is to be processed by SAP Vora. 4.1 Engines Vora Relational In-Memory Engine You can estimate the memory requirements for the relational in-memory engine as follows: Parquet or ORC files: Memory usage = file size * 4 (for storing the data in memory) * 2 (for operations) Memory usage = 8* file size CSV files: Memory usage = uncompressed file size (for storing the data in memory) * 2 (for operations) Memory usage = 2* uncompressed file size Disk usage = <no additional disk space needed> Note that data in memory will not be evicted if memory is scarce. 8 P U B L I C Initial Sizing for SAP Vora

Vora Disk Engine You can use the following parameters to estimate the memory requirements for the disk engine: DataSize: Uncompressed data size on disk LoadDataSize: Uncompressed data size on disk that is to be loaded. Additional memory requirements can be alleviated by splitting large files into smaller chunks. CompressionRatio: Compression ratio of the disk engine (assumed to be 2) NumOfNodes: Number of disk engine nodes in the cluster You can calculate the disk size requirements as follows (bear in mind that the disk space is outside of the Hadoop Distributed File System (HDFS)): Disk usage for storing data: (DataSize / CompressionRatio) * 1.2 Disk usage for select operations: 0.2 * DataSize We assume an additional storage of 20 percent for queries. Maximum additional disk usage for load operations (only needed during the load operation): (200 MB * NumOfNodes * NumOfNodes) + (LoadDataSize / CompressionRatio) Disk usage = 0.8 * DataSize + 0.5 * LoadDataSize + (200 MB * NumOfNodes * NumOfNodes) Memory usage <= 13 GB per node (depending on the Vora Disk Engine settings) Vora Time Series Engine You can estimate the memory consumption of the time series engine as follows: Memory usage = size of the uncompressed data files / 5 (for storing the data in memory) + 0.5 * size of the uncompressed data files (for operations) Memory usage = 0.7 * size of uncompressed data files Disk usage = <no additional disk space needed> Bear in mind that this is a very rough estimation. The size depends heavily on the data. Vora Graph Engine The memory usage of the stored CSV and JSG files (graphs) is roughly three times their uncompressed disk size. Currently, the recommendation is that the graph store should not use more than 1 TB of main memory for performance reasons. Data in memory will not be evicted if memory is scarce. Initial Sizing for SAP Vora P U B L I C 9

Memory usage = 3* size of uncompressed CSV/JSG files (for storing the data in memory) + 1* size of uncompressed CSV/JSG files (for operations) Memory usage = 4* size of uncompressed CSV/JSG files Disk usage = <no additional disk space needed> Vora Document Store The memory usage of the JSON files is roughly the same as their uncompressed disk size. For working on the results, you need the same memory size in addition. Memory usage = uncompressed disk size of JSON files (for storing the data in memory) + uncompressed disk size of JSON files (for operations) Memory usage = 2* uncompressed disk size of JSON files Disk usage = <no additional disk space needed> Additional Requirements Distributed Log (DLog) Sizing for the Distributed Log is only relevant for streaming tables (INSERT INTO). Disk usage = (4 GB + PayloadSize * OverheadFactor (1.05)) * ReplicationFactor (default is 3) = (4 GB + PayloadSize * 1.05) * 3 PayloadSize is the size (in bytes) of the uncompressed data that the customer expects to ingest into SAP Vora. OverheadFactor is DLog s storage overhead (across tail store and stream store for internal data structures and metadata) relative to the overall payload size. ReplicationFactor is the target number of DLog replicas. The default is 3 and can be modified through the catalog. Diagnostics Framework Per node: Memory usage = 256 MB (by FluentD) + 30 MB (by Prometheus node exporter) Per SAP Vora installation: Memory usage = 256 MB (by Elastic Search) + 128 MB (by Kibana) + 128 MB (by Grafana) + 512 MB (by Prometheus server) Disk usage = 20 GB (by Elastic Search) + 10 GB (by Prometheus server) 10 P U B L I C Initial Sizing for SAP Vora

The Elastic Search disk space is calculated as 20 GB. If you change the default log level or the default retention policy of the log, the disk space requirements increase. You should then calculate 500 to 700 GB. Logs The following disk space requirements apply to all engines: Additional disk usage for traces and logs: 5 GB per node Vora Services The memory and disk requirements of the other SAP Vora services, such as the Catalog, Distributed Log, Landscape Server, Thriftserver, and Vora Tools are negligible. Docker Registry Per SAP Vora installation: Disk usage = 50 GB 4.2 Number of Kubernetes Worker Nodes with Vora Containers SAP Vora memory requirements depend heavily on workload characteristics. You can roughly estimate your memory requirements based on the formulas given above. Once you have estimated your memory requirements, you can calculate the number of Kubernetes worker nodes you need for SAP Vora. A 5 percent memory overhead is calculated for Kubernetes cluster management: Number of Kubernetes Worker Nodes = <Overall Memory Usage> : (<Memory of 1 Node> * 0.95) Minimum Kubernetes Sizing for a Production Environment 1 master node and 3 worker nodes (more master nodes are required for high availability (HA) scenarios) Worker node: >= 32 GB main memory, >= 4 cores Master node: Please contact your Kubernetes vendor Minimum Kubernetes Sizing for a Development/Test Environment 1 master node and 1 worker node (recommended: 2 worker nodes) Worker node: >= 32 GB main memory, >= 4 cores Master node: Please contact your Kubernetes vendor Kubernetes Limits Please consider the general Kubernetes limits when setting up a Kubernetes cluster, as described in the Kubernetes documentation: Building Large Clusters Initial Sizing for SAP Vora P U B L I C 11

Example The memory requirements for the individual engines and components are as follows: Engine Data Memory Usage Relational Engine 25 GB uncompressed CSV files 50 GB Time Series Engine 20 GB uncompressed data files 34 GB Graph Engine 10 GB uncompressed CSV files 40 GB Document Store 15 GB uncompressed JSON files 30 GB Diagnostics Framework Logs 1 GB 5 GB The overall main memory usage is 160 GB. Each worker node is equipped with 32 GB of RAM. A 5 percent overhead is calculated for Kubernetes, so for each worker node 30.4 GB of RAM is available for SAP Vora. The number of worker nodes needed is as follows: Worker nodes >= 160 GB / 30.4 GB = 5.2 You therefore need roughly 6 Kubernetes worker nodes with Vora containers. 12 P U B L I C Initial Sizing for SAP Vora

Important Disclaimers and Legal Information Coding Samples Any software coding and/or code lines / strings ("Code") included in this documentation are only examples and are not intended to be used in a productive system environment. The Code is only intended to better explain and visualize the syntax and phrasing rules of certain coding. SAP does not warrant the correctness and completeness of the Code given herein, and SAP shall not be liable for errors or damages caused by the usage of the Code, unless damages were caused by SAP intentionally or by SAP's gross negligence. Accessibility The information contained in the SAP documentation represents SAP's current view of accessibility criteria as of the date of publication; it is in no way intended to be a binding guideline on how to ensure accessibility of software products. SAP in particular disclaims any liability in relation to this document. This disclaimer, however, does not apply in cases of willful misconduct or gross negligence of SAP. Furthermore, this document does not result in any direct or indirect contractual obligations of SAP. Gender-Neutral Language As far as possible, SAP documentation is gender neutral. Depending on the context, the reader is addressed directly with "you", or a gender-neutral noun (such as "sales person" or "working days") is used. If when referring to members of both sexes, however, the third-person singular cannot be avoided or a gender-neutral noun does not exist, SAP reserves the right to use the masculine form of the noun and pronoun. This is to ensure that the documentation remains comprehensible. Internet Hyperlinks The SAP documentation may contain hyperlinks to the Internet. These hyperlinks are intended to serve as a hint about where to find related information. SAP does not warrant the availability and correctness of this related information or the ability of this information to serve a particular purpose. SAP shall not be liable for any damages caused by the use of related information unless damages have been caused by SAP's gross negligence or willful misconduct. All links are categorized for transparency (see: https://help.sap.com/viewer/disclaimer). Important Disclaimers and Legal Information P U BL IC 13

go.sap.com/registration/ contact.html 2017 SAP SE or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies. Please see https://www.sap.com/corporate/en/legal/copyright.html for additional trademark information and notices.