Deploying Apache Cassandra on Oracle Cloud Infrastructure Quick Start White Paper October 2016 Version 1.0

Similar documents
Deploying Oracle NoSQL Database on the Oracle Bare Metal Cloud Platform

Deploying Oracle NoSQL Database on the Oracle Cloud Infrastructure

NOSQL DATABASE CLOUD SERVICE. Flexible Data Models. Zero Administration. Automatic Scaling.

Creating Custom Project Administrator Role to Review Project Performance and Analyze KPI Categories

Deploy VPN IPSec Tunnels on Oracle Cloud Infrastructure. White Paper September 2017 Version 1.0

Veritas NetBackup and Oracle Cloud Infrastructure Object Storage ORACLE HOW TO GUIDE FEBRUARY 2018

Achieving High Availability with Oracle Cloud Infrastructure Ravello Service O R A C L E W H I T E P A P E R J U N E

Oracle CIoud Infrastructure Load Balancing Connectivity with Ravello O R A C L E W H I T E P A P E R M A R C H

Tutorial on How to Publish an OCI Image Listing

Bastion Hosts. Protected Access for Virtual Cloud Networks O R A C L E W H I T E P A P E R F E B R U A R Y

Generate Invoice and Revenue for Labor Transactions Based on Rates Defined for Project and Task

Oracle Cloud Infrastructure Virtual Cloud Network Overview and Deployment Guide ORACLE WHITEPAPER JANUARY 2018 VERSION 1.0

Migrating VMs from VMware vsphere to Oracle Private Cloud Appliance O R A C L E W H I T E P A P E R O C T O B E R

Deploying Custom Operating System Images on Oracle Cloud Infrastructure O R A C L E W H I T E P A P E R M A Y

Oracle Clusterware 18c Technical Overview O R A C L E W H I T E P A P E R F E B R U A R Y

DATA INTEGRATION PLATFORM CLOUD. Experience Powerful Data Integration in the Cloud

Oracle Database Exadata Cloud Service Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE

Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud

VISUAL APPLICATION CREATION AND PUBLISHING FOR ANYONE

RAC Database on Oracle Ravello Cloud Service O R A C L E W H I T E P A P E R A U G U S T 2017

Oracle Secure Backup. Getting Started. with Cloud Storage Devices O R A C L E W H I T E P A P E R F E B R U A R Y

Cloud Operations for Oracle Cloud Machine ORACLE WHITE PAPER MARCH 2017

Oracle Data Provider for.net Microsoft.NET Core and Entity Framework Core O R A C L E S T A T E M E N T O F D I R E C T I O N F E B R U A R Y

JD Edwards EnterpriseOne Licensing

MySQL CLOUD SERVICE. Propel Innovation and Time-to-Market

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

Autonomous Data Warehouse in the Cloud

Siebel CRM Applications on Oracle Ravello Cloud Service ORACLE WHITE PAPER AUGUST 2017

Oracle Exadata Statement of Direction NOVEMBER 2017

August 6, Oracle APEX Statement of Direction

StorageTek ACSLS Manager Software

Load Project Organizations Using HCM Data Loader O R A C L E P P M C L O U D S E R V I C E S S O L U T I O N O V E R V I E W A U G U S T 2018

CONTAINER CLOUD SERVICE. Managing Containers Easily on Oracle Public Cloud

Oracle Service Cloud Agent Browser UI. November What s New

ORACLE FABRIC MANAGER

Application Container Cloud

Oracle WebLogic Server Multitenant:

COMPUTE CLOUD SERVICE. Moving to SPARC in the Oracle Cloud

Extreme Performance Platform for Real-Time Streaming Analytics

Oracle JD Edwards EnterpriseOne Object Usage Tracking Performance Characterization Using JD Edwards EnterpriseOne Object Usage Tracking

Oracle Database Security Assessment Tool

Your New Autonomous Data Warehouse

Automatic Receipts Reversal Processing

TABLE OF CONTENTS DOCUMENT HISTORY 3

Best Practices for Deploying High Availability Architecture on Oracle Cloud Infrastructure

An Oracle White Paper June Enterprise Database Cloud Deployment with Oracle SuperCluster T5-8

INTEGRATION CLOUD SERVICE. Accelerate Your Application Integration Across the Cloud and On Premises

Oracle Enterprise Performance Reporting Cloud. What s New in September 2016 Release (16.09)

Oracle Grid Infrastructure 12c Release 2 Cluster Domains O R A C L E W H I T E P A P E R N O V E M B E R

Establishing secure connectivity between Oracle Ravello and Oracle Cloud Infrastructure Database Cloud ORACLE WHITE PAPER DECEMBER 2017

An Oracle White Paper November Primavera Unifier Integration Overview: A Web Services Integration Approach

Correction Documents for Poland

Repairing the Broken State of Data Protection

Oracle Solaris 11: No-Compromise Virtualization

Oracle Fusion Configurator

April Understanding Federated Single Sign-On (SSO) Process

Protecting Your Investment in Java SE

Oracle JD Edwards EnterpriseOne Object Usage Tracking Performance Characterization Using JD Edwards EnterpriseOne Object Usage Tracking

Key Features. High-performance data replication. Optimized for Oracle Cloud. High Performance Parallel Delivery for all targets

Oracle TimesTen Scaleout: Revolutionizing In-Memory Transaction Processing

Oracle Learn Cloud. Taleo Release 16B.1. Release Content Document

An Oracle White Paper October Minimizing Planned Downtime of SAP Systems with the Virtualization Technologies in Oracle Solaris 10

Oracle Cloud Applications. Oracle Transactional Business Intelligence BI Catalog Folder Management. Release 11+

WebCenter Portal Task Flow Customization in 12c O R A C L E W H I T E P A P E R J U N E

StorageTek ACSLS Manager Software Overview and Frequently Asked Questions

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

Oracle WebLogic Portal O R A C L E S T A T EM EN T O F D I R E C T IO N F E B R U A R Y 2016

Migration Best Practices for Oracle Access Manager 10gR3 deployments O R A C L E W H I T E P A P E R M A R C H 2015

Increasing Network Agility through Intelligent Orchestration

An Oracle Technical White Paper October Sizing Guide for Single Click Configurations of Oracle s MySQL on Sun Fire x86 Servers

Oracle Mobile Application Framework

Sun Fire X4170 M2 Server Frequently Asked Questions

An Oracle White Paper May Oracle VM 3: Overview of Disaster Recovery Solutions

Oracle GoldenGate for Big Data

Oracle Grid Infrastructure Cluster Domains O R A C L E W H I T E P A P E R F E B R U A R Y

Oracle NoSQL Database For Time Series Data O R A C L E W H I T E P A P E R D E C E M B E R

Oracle DIVArchive Storage Plan Manager

ORACLE SOLARIS CLUSTER

APPLICATION BUILDER CLOUD. Application Creation Made Easy

Technical White Paper August Recovering from Catastrophic Failures Using Data Replicator Software for Data Replication

Establishing secure connections between Oracle Ravello and Oracle Database Cloud O R A C L E W H I T E P A P E R N O V E M E B E R

Oracle Java SE Advanced for ISVs

ORACLE SNAP MANAGEMENT UTILITY FOR ORACLE DATABASE

Oracle Learn Cloud. What s New in Release 15B.1

Overview. Implementing Fibre Channel SAN Boot with the Oracle ZFS Storage Appliance. January 2014 By Tom Hanvey; update by Peter Brouwer Version: 2.

Oracle Database Appliance X6-2S / X6-2M ORACLE ENGINEERED SYSTEMS NOW WITHIN REACH FOR EVERY ORGANIZATION

Oracle Utilities CC&B V2.3.1 and MDM V2.0.1 Integrations. Utility Reference Model Synchronize Master Data

Corente Cloud Services Exchange

Oracle Big Data Connectors

October Oracle Application Express Statement of Direction

Using the Oracle Business Intelligence Publisher Memory Guard Features. August 2013

E-BUSINESS SUITE APPLICATIONS R12 (R12.2.5) HR (OLTP) BENCHMARK - USING ORACLE11g ON ORACLE S CLOUD INFRASTRUCTURE

See What's Coming in Oracle CPQ Cloud

See What's Coming in Oracle Taleo Business Edition Cloud Service

Configuring Oracle Business Intelligence Enterprise Edition to Support Teradata Database Query Banding

Benefits of an Exclusive Multimaster Deployment of Oracle Directory Server Enterprise Edition

An Oracle White Paper October The New Oracle Enterprise Manager Database Control 11g Release 2 Now Managing Oracle Clusterware

Loading User Update Requests Using HCM Data Loader

Transitioning from Oracle Directory Server Enterprise Edition to Oracle Unified Directory

An Oracle White Paper October Deploying and Developing Oracle Application Express with Oracle Database 12c

Transcription:

Deploying Apache Cassandra on Oracle Cloud Infrastructure Quick Start White Paper October 2016 Version 1.0

Disclaimer The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle. Questions or comments on this paper? Please email: quickstart_us_grp@oracle.com. 1

Table of Contents Cassandra on Oracle Cloud Infrastructure overview... 2 Assumptions... 3 Instance shape selection... 3 Replication factor and Availability Domains... 3 Configuring local NVMe storage... 4 Create the required network resources... 5 Create the required compute instances... 5 Update, configure, install Cassandra on the three nodes... 7 Create the Cassandra cluster... 8 Start the Cassandra cluster... 9

Cassandra on Oracle Cloud Infrastructure overview This white paper is designed as a quick start reference guide for deploying the Apache Software Foundation's Cassandra NoSQL database on the Oracle Cloud Infrastructure platform. Apache Cassandra is a leading NoSQL database providing scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on cloud infrastructure makes it a compelling solution for mission-critical bid data workloads. Cassandra supports replicating across multiple datacenters, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages. Oracle Cloud Infrastructure offers hourly metered Bare Metal instances; by eliminating the hypervisor, Oracle can deliver better performance at a lower cost than traditional IaaS providers. In addition to compute unencumbered by a hypervisor, Oracle Cloud Infrastructure offers instances with up to 28TBs of locally attached NVMe storage. Each 28TB instance is capable of over 3 million 4K IOPs/sec, the perfect platform for a Cassandra workload. Instances in the Oracle Cloud Infrastructure are attached via a 10Gb non-blocking network with no oversubscription. Each Cassandra node has access to the full performance of the interface, there are no noisy neighbors or a hypervisor to share resources with. Instances in the same region are always less than 1ms from each other, this means your Cassandra queries will be completed in less time, at a lower cost than any other IaaS provider. To support highly available Cassandra deployments, Oracle Cloud Infrastructure builds regions with at least 3 Availability Domains (AD). Each AD is a fully independent datacenter with no fault domains shared across ADs. A Cassandra cluster replicated across two ADs is highly available; a Cassandra cluster replicated across three ADs increases data durability and ensures the reliability of the cluster and data set. 2

Assumptions Consumers of this document should:» Be familiar with the fundamentals of the Oracle Cloud Infrastructure:» https://docs.us-phoenix-1.oraclecloud.com/» The Oracle Cloud Infrastructure walkthrough is highly recommended if this is the first time you have used the platform:» https://docs.us-phoenix-1.oraclecloud.com/content/gsg/reference/overviewworkflow.htm» Have a basic understanding of Cassandra:» Cassandra key concepts» http://cassandra.apache.org/ Planning a Cassandra Deployment on Oracle Cloud Infrastructure A successful Cassandra deployment requires the administrator to weigh multiple factors: initial volume of data, typical workload, replication factor, future data growth. This information is used to select the right instance shape and the number of ADs to deploy across. Instance shape selection For Cassandra workloads, customers should use an instance shape that includes local NVMe storage. There are currently two options: BM.HighIO1.512 12.8TBs local NVMe storage 36 cores (72 threads) 512GBs of memory BM.DenseIO1.512 28.8TBs local NVMe storage 36 cores 512GBs of memory It s common for customers to start on the HighIO instance type (12.8TB) and eventually outgrow them. Running Cassandra on Oracle Cloud Infrastructure makes it trivial to move to the larger DenseIO (28.8TB) instances without any downtime on your Cassandra cluster. If your data size shrinks over time, it s also easy to move to the smaller instances. Replication factor and Availability Domains Cassandra is designed to replicate data across fault domains. On Oracle Cloud Infrastructure, each Availability Domain (AD) is its own fault domain. It s important that customers replicate across at least two ADs. Customers who want to increase their data durability should replicate across three ADs. The number of ADs a Cassandra cluster is replicated across impacts the availability and durability of a cluster as well as driving different storage configuration options. 3

Configuring local NVMe storage The local NVMe storage on each instance is unprotected; by default there is no redundant RAID. Customers are responsible for making sure the data on those drives is properly protected. Using Cassandra to replicate the data to additional nodes is one way to protect that data, customers can add an additional layer of data protection by creating a redundant RAID array across the NVMe drives. On each instance the customer will create a RAID array across the NVMe drives, either RAID 10 or RAID 6. RAID 10 provides protection against the failure of a NVMe device but reduces the available storage by 50%. RAID 6 provides parity based protection and, on the BM.Dense instance type will use fewer drives for redundancy than RAID 10 but won t be as fast. NOTE: I I. Anti-Patterns Don t use Oracle Cloud Infrastructure Remote Block Storage LUNs for primary storage. he Block service can be used effectively as a backup target. Building a Cassandra cluster on Oracle Cloud Infrastructure In this walkthrough, we will build a 3-node Cassandra cluster in the PHX region that spans three ADs and replicates across all three ADs. We will isolate the Cassandra nodes from the Internet by using security lists. 4

Create the required network resources 1. A new Virtual Cloud Network a. Name Cassandra_example b. CIDR block 10.0.0.0/27 2. A new internet gateway a. Name - Cassandra_IG 3. A new route table a. Name - Cassandra_Internet_route b. CIDR block 0.0.0.0/0 c. Target Cassandra_IG 4. Three new subnets a. Name AD1_Cassandra_private b. AD PHX-AD-1 c. CIDR block 10.0.0.0/29 d. Route table Cassandra_Internet_route e. Name AD2_Cassandra_private f. AD PHX-AD-2 g. CIDR block 10.0.0.8/29 h. Route table Cassandra_Internet_route i. Name AD3_Cassandra_private j. AD PHX-AD-3 k. CIDR block 10.0.0.16/29 l. Route table Cassandra_Internet_route Create the required compute instances 1. Three new instances a. Name Cassandra_AD1_0 b. Image Oracle-Linux-7.2-2016.10.05-0 c. Shape BM.DenseIO1.512 d. AD PHX-AD-1 e. Cloud Network Cassandra_example f. Subnet AD1_Cassandra_private g. SSH Key <public half of your key pair> 5

h. Name Cassandra_AD2_0 i. Image Oracle-Linux-7.2-2016.10.05-0 j. Shape BM.DenseIO1.512 k. AD PHX-AD-2 l. Cloud Network Cassandra_example m. Subnet AD2_Cassandra_private n. SSH Key <public half of your key pair> o. Name Cassandra_AD3_0 p. Image Oracle-Linux-7.2-2016.10.05-0 q. Shape BM.DenseIO1.512 r. AD PHX-AD-3 s. Cloud Network Cassandra_example t. Subnet AD3_Cassandra_private u. SSH Key <public half of your key pair> For each instance note the public and RFC1918 IP addresses Instance Public IP RFC1918 IP Cassandra_AD1_0 Cassandra_AD2_0 Cassandra_AD3_0 NOTE: All inter-instance communication should be across the RFC1918 address of the instances, not the public IP. Using the public IP adds latency to the connection and limits the bandwidth. Using the RFC1918 IP guarantees access to the full network bandwidth and the lowest possible latency. 6

Update, configure, install Cassandra on the three nodes Repeat these steps on each instance. 1. SSH into the instance - $ ssh i <private key> opc@<public IP> 2. Patch the instance - $ sudo yum update -y 3. Install Java and a few other tools we ll use - $ sudo yum install java mdadm screen dstat -y 4. Create a RAID 6 array across all 9 NVMe drives, create an XFS filesystem on the array and mount the filesystem - $ sudo mdadm --create /dev/md0 --chunk=256 --raid-devices=9 --level=6 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 $ sudo mdadm --detail --scan sudo tee -a /etc/mdadm.conf >> /dev/null $ screen $ sudo mkfs.xfs -s size=4096 -d su=262144 -d sw=6 /dev/md0 $ sudo mkdir /mnt/cassandra $ sudo mount /dev/md0 /mnt/cassandra NOTE: We use screen here in case we get disconnected from the instance while the mkfs command is running. 5. Open up the operating system firewall to allow Cassandra to communicate between instances. We limit communication on the Cassandra ports to the VCN subnet. $ sudo firewall-cmd --permanent --zone=public \ --add-rich-rule= rule family="ipv4" source address="10.0.0.0/27" \ port protocol="tcp" port="7000-7001" accept $ sudo firewall-cmd --permanent --zone=public \ --add-rich-rule= rule family="ipv4" source address="10.0.0.0/27" \ port protocol="tcp" port="7199" accept 7

6. Add the DataStax repo, using yum to install Cassandra is easier than the alternatives $ echo -e "[DataStax]\nname=DataStax Repo for Apache Cassandra\nbaseurl=http://rpm.datastax.com/datastax-ddc/3.2\nenabled=1\ngpgcheck=0" sudo tee /etc/yum.repos.d/datastax.repo $ sudo yum update -y $ sudo yum install datastax-ddc -y 7. Set the cluster name, use the NVMe backed filesystem for data, and a few more details. $ sudo chown cassandra.cassandra /mnt/cassandra/ $ sudo vi /etc/cassandra/conf/cassandra.yaml line 10: cluster_name: ORCL BMC ROCKS line 71: hints_directory: /mnt/cassandra/hints line 170: - /mnt/cassandra/data line 175: commitlog_directory: /mnt/cassandra/commit_log line 287: saved_caches_directory: /mnt/cassandra/saved_caches line 343: -seeds: <comma delimited list of all three RFC 1918 IPs> line 472: listen_address: <the RFC 1918 IP of the local instance> line 801: endpoint_snitch: GossipingPropertyFileSnitch NOTE: To turn on line numbers in vi - :set number Create the Cassandra cluster Now we configure the Cassandra so that every node belongs to the same cluster. To create a cluster while using the GossipingPropertyFileSnitch endpoint snitch we have to create a different /etc/cassandra/conf/cassandrarackdc.properties file on each instance. By default the GossipingPropertyFileSnitch always loads cassandra-topology.properties when it is present. Remove the file from each node. 1. On Cassandra_AD1_0 - $ sudo rm /etc/cassandra/conf/cassandra-topology.properties $ sudo vi /etc/cassandra/conf/cassandra-rackdc.properties dc=ad1 rack=rac1 8

2. On Cassandra_AD2_0 - $ sudo rm /etc/cassandra/conf/cassandra-topology.properties $ sudo vi /etc/cassandra/conf/cassandra-rackdc.properties dc=ad2 rack=rac1 3. On Cassandra_AD3_0 $ sudo rm /etc/cassandra/conf/cassandra-topology.properties $ sudo vi /etc/cassandra/conf/cassandra-rackdc.properties dc=ad3 rack=rac1 Start the Cassandra cluster On each node $ sudo service cassandra start Check the status of the cluster $ nodetool status Congratulations! You have a triple redundant, high performance Cassandra cluster up and running in the OBMCS. Next steps A Cassandra cluster isn t much good without data. Here s a great list of public data sets - https://www.opensciencedatacloud.org/publicdata/. Your cluster can support about 20TBs of data. The US Census Dataset is a great place to get started https://www.opensciencedatacloud.org/publicdata/us_census/ 9

CONNECT WITH US blogs.oracle.com/oracle facebook.com/oracle twitter.com/oracle oracle.com Copyright 2016, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 0116 Deploying Apache Cassandra on Oracle Cloud Infrastructure October 2016 Author: Craig Carl (craig.carl@oracle.com) Contributing Authors: Shan Gupta (shan.gupta@oracle.com), Bruce Burns (bruce.burns@oracle.com) 10