Software Defined Storage at the Speed of Flash Carlos Carrero Rajagopal Vaideeswaran Symantec
Agenda
- Introduction
- Software Technology
- Architecture Review
- Oracle Configuration & Results
- Storage & Application Resiliency
- An All-Flash Array Comparison
- Other Use Cases
2
INTRODUCTION 3
Solid State Devices Why Solid State? Solid State Drive Usage in Datacenters
Latency is killing application performance: milliseconds for HDDs vs microseconds for SSDs
Flash does more with less: thousands of IOPS vs a few hundred from HDDs; reduced rackspace, power and cooling
Flash is already competitively priced: flash will be less expensive than disks in three years
75% Increased Capacity per Drive
4
Software Defined Storage Why Software Defined Storage?
Faster processors and networks; increased slot availability for in-server storage
Based on cost-effective commodity hardware
Improves automation; provides required SLAs; platform independent
Software-defined storage (SDS) is created by loading software onto any cluster of standard servers and leveraging any type of server-side storage devices, while providing shared storage capabilities. SDS eliminates the need for storage arrays and storage area networks.
Preferred Location for Enterprise SSD
5
SOFTWARE TECHNOLOGY 6
Storage Foundation Cluster File System Veritas Operations Manager Storage Application
Virtual Business Services: recovery orchestration across different application tiers
Veritas Cluster Server: application high availability; Intelligent Monitoring Framework
Cluster File System: concurrent file access from any node; checkpoints, dedup & compression
Flexible Storage Sharing: shared storage capabilities using internal disks; storage resiliency
7
Flexible Storage Sharing Using In-Server Flash
Features: build a true DAS cluster; RDMA interconnect; RAID protection; Cluster Volume Manager; Cluster File System; compression, dedup, snaps, etc.
Benefits: reduced infrastructure; improved flexibility; leverage in-server storage capabilities
Flexible Storage Sharing RDMA over IB/GE Cluster File System Data Protected Across Nodes
8
Concepts
Device Export: an operation to make a device available to all servers in the cluster
Remote Device: a device that is not attached locally to a server and that must be accessed through another server
Example: server_1 exports its local disk_1 to the compute servers as server_1_disk_1
Flexible Storage Sharing uses Low Latency Transport for data transfer (aka I/O shipping) and handles server reconfiguration
9
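A minimal sketch of the export flow described above, assuming a Storage Foundation 6.1+ cluster; the disk and disk group names here are illustrative, and exact options may vary by release (check vxdisk(1M) and vxdg(1M)):

```shell
# Hedged sketch of Flexible Storage Sharing device export; names are
# illustrative, not from the original deck.
vxdisk export disk_1              # make local disk_1 visible to all nodes
vxdisk list                       # on other nodes the remote disk appears
vxdg -s init oradg ssd01=disk_1   # build a shared disk group over it
```
Once exported, the remote device can be used by the Cluster Volume Manager like any locally attached disk, which is what makes the shared namespace on the next slide possible.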
Flexible Storage Sharing - Data Availability Highly Available Applications Global Namespace Server A Server B CVM Volume_1 CVM Volume_1 Mirror-1 Mirror-2 Mirror-1 Mirror-2 Cluster Interconnect 10
Flexible Storage Sharing IO Shipping Internals Server A Remote Server B NIC / RNIC CVM I/O shipping using RDMA NIC / RNIC Low Latency Transport (LLT) Low Latency Transport (LLT) I/O completed in same context APP IO Shipping Request Starts Response to NIC via LLT IO Processing Begins vxiod CVM / CFS vxiod CVM / CFS App I/O CVM Volume IO Complete Flash 1 Mirror Flash 1 Target Device 11
Application High Availability Listener IP NIC OracleSG Oracle Mount Volume
Resource: manages a specific application or service; determines how to start, stop, and monitor each specific service
Service Group: manages an application service as a unit; contains Resources, Dependencies and Attributes
12
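As an illustration of how such a service group could be assembled, here is a hedged sketch using the standard VCS command-line tools; the resource names are hypothetical, and attribute syntax may vary by Veritas Cluster Server version:

```shell
# Hypothetical sketch: an OracleSG service group with IP, Mount and
# Oracle resources. Names are illustrative, not from the original deck.
haconf -makerw                          # open the configuration read/write
hagrp -add OracleSG
hagrp -modify OracleSG SystemList server_a 0 server_b 1
hares -add ora_listener_ip IP OracleSG  # virtual IP for the listener
hares -add ora_mount Mount OracleSG     # file system holding the data files
hares -add ora_db Oracle OracleSG       # the database resource itself
hares -link ora_db ora_mount            # Oracle comes online after Mount
haconf -dump -makero                    # save and close the configuration
```
The resource dependency links are what let VCS start, stop, and fail over the whole stack as a unit.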
Immediate Fault Detection Traditional Monitoring Framework Intelligent Monitoring Framework Polling Asynchronous Most Clustering Solutions Poll based Monitoring Faulted Symantec Cluster Server Intelligent Monitoring Faulted Faulting Resources Being Monitored Registering Resources Being Monitored Immediate fault detection 13
ARCHITECTURE REVIEW 14
Components
Intel SSD DC P3700 Series: capacities 400GB, 800GB, 1.6TB, 2TB; random 4KB 70/30 R/W IOPS up to 460K/175K; direct path to the CPU with NVMe; less than 20µs latency for sequential access; High Endurance Technology
Intel R1208WTTGS 1U Server: 2 x Intel Xeon E5-2697 v3 @ 2.6GHz; 1U rack; up to 8 x 2.5" hot-swap drives; max memory size 3072 GB
Intel Ethernet Converged Network Adapter XL710-QDA2: dual-port 40 GbE
15
Architecture Oracle 11gR2 Single Instance Symantec Storage Foundation Cluster File System 6.2 Red Hat Enterprise Linux 6.5
Two nodes, each with: 2 x Intel Xeon Processor E5-2697 v3 @ 2.6GHz (14 cores); 4 x Intel SSD DC P3700 Series (800 GB); 1 x dual-port 40GbE; 128 GB DDR4 memory
16
Storage Configuration Resiliency Each SSD is mirrored Each write is committed to a different server Avoids single points of failure File System with global namespace across all nodes Performance Stripe across mirrors for performance Oracle Disk Manager 17
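The mirrored-and-striped layout described above could be built roughly as follows; this is a hedged sketch with illustrative names, and volume layout options may vary by Storage Foundation version:

```shell
# Hedged sketch: a striped, two-way mirrored volume over the shared disk
# group, with a cluster-mounted VxFS file system on top. The disk group,
# volume, and mount point names are illustrative.
vxassist -g oradg make oravol 500g layout=stripe-mirror nmirror=2
mkfs -t vxfs /dev/vx/rdsk/oradg/oravol
mount -t vxfs -o cluster /dev/vx/dsk/oradg/oravol /oradata
```
With the mirrors placed on devices exported from different servers, every write lands on two nodes, which is what removes the single point of failure.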
ORACLE CONFIGURATION & RESULTS 18
Oracle Configuration 23 GB SGA per Instance Single Instance Configuration Data files on File System using ODM Oracle High Availability managed by Veritas Cluster Server agents Intelligent Monitoring Framework enabled for Fast Failover 19
Storage Layout Each SSD holds redo log data for one instance plus data for the other three The configuration is mirrored on the other server 20
Storage Virtualization & Application High Availability Storage Virtualization File systems immediately available across all the nodes Data mirrored across nodes in a fully redundant configuration Volume Manager makes any storage failure transparent to the application Application High Availability Fast failover to recover from any application failure Full visibility and control through a single pane of glass 21
TPCC-Like Benchmarks 600K tpmC 1.1M tpmC 1.5M tpmC 22
SGA Size Effects TPCC benchmark results are traditionally influenced by SGA size Decreasing SGA size from 82GB to 23GB increased IOPS from 48K to 142K Transactions per minute only marginally degraded, from 609K to 608K 23
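The trade-off stated above can be checked directly from the slide's own figures (48K to 142K IOPS, 609K to 608K tpm):

```shell
# Arithmetic over the numbers quoted on this slide: shrinking the SGA
# roughly tripled physical IOPS for an almost-zero transaction cost.
iops_gain=$(awk 'BEGIN { printf "%.1f", 142/48 }')
tpm_loss=$(awk 'BEGIN { printf "%.2f", (609-608)/609*100 }')
echo "IOPS increased ${iops_gain}x for a ${tpm_loss}% drop in transactions/min"
```
In other words, the smaller SGA pushes far more work to the flash layer, and the flash absorbs it with essentially no throughput penalty.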
STORAGE & APPLICATION RESILIENCY 24
Fast Failover & Storage Resiliency App Cluster File System
Failover steps: Detect Fault, Import Disks, Mount File System, Check File System, Start Application
Intelligent Monitoring Framework provides fast failure detection
Data on flash accelerates recovery
Volume Manager detaches a plex when a server is offline
No I/O is shipped while the other server is offline
Data is automatically resynchronized when the server is back
Database keeps writing to the underlying storage as normal
25
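The plex detach and resynchronization behavior described above can be observed with the standard Volume Manager tools; a hedged sketch, with an illustrative disk group name:

```shell
# Hedged examples of watching recovery; "oradg" is illustrative.
vxprint -htg oradg     # a detached plex is visible while a node is offline
vxtask -g oradg list   # resynchronization tasks appear when it rejoins
vxrecover -g oradg -s  # manually start recovery of detached plexes if needed
```
Because only the changed regions are resynchronized, the database keeps writing normally while the mirror catches up.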
Fast Failover & Storage Resiliency

                                   RUN1        RUN2        RUN3       RUN4
App Recovery
Oracle starting on the other node  0:00:18     0:00:17     0:00:16    0:00:21
Database mounted                   0:00:07     0:00:01     0:00:06    0:00:06
Crash recovery                     0:02:10     0:02:09     0:00:50    0:02:06
Database online                    0:00:04     0:00:02     0:00:02    0:00:02
Total Recovery Time                0:02:39     0:02:29     0:01:14    0:02:35
Storage Recovery
Read KB redo                       15,970,970  15,984,448  5,654,785  15,854,764
Blocks needing recovery            1,031,287   1,015,626   985,103    1,026,460
Completed redo application (MB)    2,657       2,276       1,033      1,726
26
AN ALL-FLASH ARRAY COMPARISON 27
Oracle Wait Event Histogram
All-Flash Array: SLOB workload (75% reads, 25% updates); 83,114 physical reads/s; 15,754 physical writes/s; redo log 20.8 MB/s
http://xtremio.com/wp-content/uploads/2014/07/h13174-wp-optimized-flash-storage-for-oracle-databases.pdf
Symantec Solution: TPCC-like workload; 81,355 physical reads/s; 45,497 physical writes/s; redo log 75.8 MB/s

          Storage Costs  IOPS     DB Instances  DB seq read latency  DB parallel read  Redo Log   DB writer response time
XtremIO   340,000        98,686   1             1.07 ms              ----              20.8 MB/s  ----
Symantec  56,656         137,852  1             0.277 ms             0.862 ms          75.8 MB/s  0.267 ms
28
Service Times Sustainability
XtremIO Dual X-Brick: scales to two servers (2 & 4 RAC instances) and to two X-Bricks; Oracle RAC with SLOB workload
Symantec/Intel solution: 2 & 4 single instances per cluster; 23GB SGA per instance; TPCC-like workload

                      Storage Costs  IOPS     DB Instances  DB seq read latency  DB parallel read  Redo Log   DB writer response time
XtremIO (2 X-Bricks)  680,000        189,528  2             0.961 ms             ----              41.4 MB/s  ----
Symantec              56,656         256,052  2             0.626 ms             1.578 ms          142 MB/s   0.305 ms
XtremIO (2 X-Bricks)  680,000        182,041  4             1.747 ms             4.298 ms          28 MB/s    0.691 ms
Symantec              56,656         327,614  4             1.423 ms             2.876 ms          187 MB/s   0.424 ms

Throughput over an 8-hour period, 1 instance
http://xtremio.com/wp-content/uploads/2014/07/h13174-wp-optimized-flash-storage-for-oracle-databases.pdf
https://www.emc.com/collateral/white-paper/h12117-high-performance-oracle-xtremio-wp.pdf (OLTP 75%/25% Query/Update Table 14-128 Sessions)
29
All-Flash Performance in a 2U Converged Infrastructure

Configuration                    Violin (1)   XtremIO (3)     Symantec/Intel   XtremIO (3)   Symantec/Intel   XtremIO (8)   Symantec/Intel
Storage HW                       Violin 6616  Single X-Brick  8 x Intel P3700  Dual X-Brick  8 x Intel P3700  Dual X-Brick  8 x Intel P3700
Capacity                         8TB (6)      10TB (7)        8TB (5)          20TB (7)      8TB (5)          14.94TB (6)   8TB (5)
System RAM                       128GB        ----            256GB (4)        ----          256GB (4)        2TB (9)       256GB (4)
Fibre Channel Paths              6            4               N/A              4             N/A              ----          N/A
Storage Costs (MSRP)             540,000      340,000         56,656           680,000       56,656           680,000       56,656
Oracle Instances                 4            1               1                2 (RAC)       2                4 (RAC)       4
Highly Available Config          No           No              Yes              Yes           Yes              Yes           Yes
TPmC                             1,500,000    ----            608,069          ----          1,116,069        ----          1,427,477
IOPS                             ----         98,868          137,852          189,528       256,052          182,041       327,614
DB sequential read latency (ms)  ----         1.07            0.377            0.961         0.626            1.747         1.423
DB parallel read (ms)            ----         ----            0.862            ----          1.578            4.298         2.876
Redo log throughput (MB/s)       ----         20.8            75.8             41.4          142              28            187.6
DB writer response time (ms)     ----         ----            0.267            ----          0.305            0.691         0.424
Log file parallel write (ms)     ----         ----            2.685            1.1           2.4              2.119         2.812

(1) https://access.redhat.com/sites/default/files/attachments/red_hat-violin_perf.brief_v5_0.pdf
(3) http://xtremio.com/wp-content/uploads/2014/07/h13174-wp-optimized-flash-storage-for-oracle-databases.pdf
(4) 128GB per system; 23GB SGA per instance
(5) IOPS using 3.2TB usable capacity; prices for 16TB raw capacity (8TB usable capacity in a high-resilience configuration)
(6) Usable capacity
(7) Raw capacity
(8) https://www.emc.com/collateral/white-paper/h12117-high-performance-oracle-xtremio-wp.pdf (OLTP 75%/25% Query/Update Table 14-128 Sessions)
(9) 4-node configuration with 512GB each node (128GB for each VM; SGA 16GB)
30
OTHER USE CASES 31
Storage & Compute Nodes Scalability using HDDs
Sequential Reads & Writes
Scale on performance and capacity when needed
All servers are writing to and reading from the same file system

Servers  HDDs  Write GBytes/s  Read GBytes/s
2        48    1.71            3.18
4        96    3.4             7.19
6        132   5.04            12.48
8        192   6.62            17.08
32
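A quick way to read the table above is per-server throughput, which shows the scaling stays close to linear as nodes are added; this is simple arithmetic over the slide's own numbers:

```shell
# Per-server sequential read throughput derived from the HDD table above.
awk 'BEGIN {
  n = split("2:3.18 4:7.19 6:12.48 8:17.08", rows, " ")
  for (i = 1; i <= n; i++) {
    split(rows[i], r, ":")
    printf "%s servers: %.2f GB/s read per server\n", r[1], r[2]/r[1]
  }
}'
```
Per-server read throughput actually rises with node count, since every server reads from the same cluster file system in parallel.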
Sequential Workloads on SSD

SSDs  Write GBytes/s  Read GBytes/s
8     3.80            5.47

Two copies of data. A far larger number of HDDs is needed to approach, and still not match, the performance of these SSDs.
33
Other Use Cases
- FINANCIAL: SAS Grid (8 nodes using internal HDDs); reduce costs for SAS Analytics POC
- TELCO: Oracle RAC (4 nodes using internal SSDs); avoid all-flash array costs
- TRANSPORT: TIBCO (2 nodes + 2 nodes at DR site); avoid SAN infrastructure
- FINANCIAL: Reporting system (2 nodes using internal SSDs), embedded in appliance; improve reporting time
- IT PARTNER: Build agile appliance series based on commodity HW + SSDs
- TELCO: Custom Call Detail Record app; relies on internal SSDs and HDDs to avoid SAN
34
Q &A 35
Thank You carlos_carrero@symantec.com rajagopal_vaideeswaran@symantec.com 36