How Data Volume Affects Spark Based Data Analytics on a Scale-up Server

Similar documents

Workload Characterization and Optimization of TPC-H Queries on Apache Spark

BigDataBench-MT: Multi-tenancy version of BigDataBench

ibench: Quantifying Interference in Datacenter Applications

DCBench: a Data Center Benchmark Suite

Multi-tenancy version of BigDataBench

Using Alluxio to Improve the Performance and Consistency of HDFS Clusters

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman

2

Service Oriented Performance Analysis

Performance analysis tools: Intel VTuneTM Amplifier and Advisor. Dr. Luigi Iapichino

StreamBox: Modern Stream Processing on a Multicore Machine

Accelerate Big Data Insights

Understanding the Role of Memory Subsystem on Performance and Energy-Efficiency of Hadoop Applications

GPU ACCELERATED DATABASE MANAGEMENT SYSTEMS

Characterizing and Benchmarking Deep Learning Systems on Modern Data Center Architectures

The Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput

Performance Tools for Technical Computing

Hadoop Workloads Characterization for Performance and Energy Efficiency Optimizations on Microservers

Software and Tools for HPE s The Machine Project

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference

Computational performance and scalability of large distributed enterprise-wide systems supporting engineering, manufacturing and business applications

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

Main-Memory Requirements of Big Data Applications on Commodity Server Platform

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.

Best Practices for Setting BIOS Parameters for Performance

SQL Server Administration 10987: Performance Tuning and Optimizing SQL Databases. Upcoming Dates. Course Description.

DATABASES AND THE CLOUD. Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland

Performance of Multicore LUP Decomposition

Enosis: Bridging the Semantic Gap between

Camdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa

Composite Metrics for System Throughput in HPC

SparkBench: A Comprehensive Spark Benchmarking Suite Characterizing In-memory Data Analytics

Improving CPU Performance of Xen Hypervisor in Virtualized Environment

ArrayUDF Explores Structural Locality for Faster Scientific Analyses

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE

Bolt: I Know What You Did Last Summer In the Cloud

CS Project Report

A Row Buffer Locality-Aware Caching Policy for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu

Accelerating Analytical Workloads

Jackson Marusarz Intel Corporation

Virtuozzo Hyperconverged Platform Uses Intel Optane SSDs to Accelerate Performance for Containers and VMs

HPMMAP: Lightweight Memory Management for Commodity Operating Systems. University of Pittsburgh

BigDataBench: a Big Data Benchmark Suite from Web Search Engines

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach

[MS10987A]: Performance Tuning and Optimizing SQL Databases

Availability and Utility of Idle Memory in Workstation Clusters. Anurag Acharya, UC-Santa Barbara Sanjeev Setia, George Mason Univ

XPU A Programmable FPGA Accelerator for Diverse Workloads

Compiler Optimizations and Auto-tuning. Amir H. Ashouri Politecnico Di Milano -2014

Improving Virtual Machine Scheduling in NUMA Multicore Systems

Big Data Using Hadoop

Composable Infrastructure for Public Cloud Service Providers

JPDM, A Structured approach To Performance Tuning. Copyright 2017 Kirk Pepperdine. All rights reserved

Scaling Up Performance Benchmarking

Architecting For Availability, Performance & Networking With ScaleIO

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink

BOPS, Not FLOPS! A New Metric, Measuring Tool, and Roofline Performance Model For Datacenter Computing. Chen Zheng ICT,CAS

Evaluation of Intel Memory Drive Technology Performance for Scientific Applications

Course Outline. Performance Tuning and Optimizing SQL Databases Course 10987B: 4 days Instructor Led

Accelerating Spark Workloads using GPUs

Scalable Tools - Part I Introduction to Scalable Tools

GPU Consolidation for Cloud Games: Are We There Yet?

Falling Out of the Clouds: When Your Big Data Needs a New Home

Exploiting the Behavior of Generational Garbage Collector

Understanding Application Hiccups

Performance Tuning & Optimizing SQL Databases Microsoft Official Curriculum (MOC 10987)

SoftNAS Cloud Performance Evaluation on Microsoft Azure

Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies

ARMv8 Micro-architectural Design Space Exploration for High Performance Computing using Fractional Factorial

Apache Flink: Distributed Stream Data Processing

Optimize Data Structures and Memory Access Patterns to Improve Data Locality

A Multi-Tiered Optimization Framework for Heterogeneous Computing

Analytics in the cloud

Profiling: Understand Your Application

33% 148% 2. at 4 years. Silo d applications & data pockets. Slow Deployment of new services. Security exploits growing. Network bottlenecks

Decentralized Distributed Storage System for Big Data

VMware and Xen Hypervisor Performance Comparisons in Thick and Thin Provisioned Environments

Scheduling the Intel Core i7

VIProf: A Vertically Integrated Full-System Profiler

JVM and application bottlenecks troubleshooting

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

How Scalable is your SMB?

STORAGE LATENCY x. RAMAC 350 (600 ms) NAND SSD (60 us)

Altair OptiStruct 13.0 Performance Benchmark and Profiling. May 2015

Java 8 Stream Performance Angelika Langer & Klaus Kreft

Scaling DreamFactory

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES

SoftNAS Cloud Performance Evaluation on AWS

Benchmarking and Analysis of Software Network Data Planes

Basics of Performance Engineering

Detecting Memory-Boundedness with Hardware Performance Counters

MARACAS: A Real-Time Multicore VCPU Scheduling Framework

Anastasia Ailamaki. Performance and energy analysis using transactional workloads

Java 8 Stream Performance Angelika Langer & Klaus Kreft

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads

Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation

Staged Memory Scheduling

BigDataBench: a Benchmark Suite for Big Data Application

IBM Education Assistance for z/os V2R2

Transcription:

How Data Volume Affects Spark Based Data Analytics on a Scale-up Server Ahsan Javed Awan EMJD-DC (KTH-UPC) (https://www.kth.se/profile/ajawan/) Mats Brorsson(KTH), Vladimir Vlassov(KTH) and Eduard Ayguade(UPC and BSC), 1

Why should we care about architecture support? Data Growing Faster Than Technology 2 *Source: SGI

Cont... Hadoop, Spark, Flink, etc.. Phoenix ++, Metis, Ostrich, etc.. Our OurFocus Focus Improve the node level performance through architecture support 3 *Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/

Conti... A mismatch between the characteristics of emerging workloads and the underlying hardware. M. Ferdman et-al, Clearing the clouds: A study of emerging scale-out workloads on modern hardware, in ASPLOS 2012. Z. Jia, et-al Characterizing data analysis workloads in data centers, in IISWC 2013. Z. Jia et-al, Characterizing and subsetting big data workloads, in IISWC 2014 A. Yasin et-al, Deep-dive analysis of the data analytics workload in cloudsuite, in IISWC 2014. T. Jiang, et-al, Understanding the behavior of in-memory computing workloads, in IISWC 2014 Existing studies lack quantitative analysis of bottlenecks of scale-out frameworks on single-node 4

Which Scale-out Framework? Progress Meeting 12-12-14 [Picture Courtesy: Amir H. Payberah] 5

What are the major bottlenecks?? Our Approach Performance characterization of in-memory data analytics on a modern cloud server, in 5th International IEEE Conference on Big Data and Cloud Computing, 2015 (Best Paper Award). How Data Volume Affects Spark Based Data Analytics on a Scale-up Server Focus of this talk 6

What are the remaining questions?? Our Approach Do Spark based data analytics benefit from using scale-up servers? How severe is the impact of garbage collection on performance of Spark based data analytics? Is file I/O detrimental to Spark based data analytics performance? How does data size affect the micro-architecture performance of Spark based data analytics? 7

What are the contributions?? Our Approach We evaluate the impact of data volume on the performance of Spark based data analytics running on a scale-up server. We quantify the limitations of using Spark on a scale-up server with large volumes of data. We quantify the variations in micro-architectural performance of applications across different data volumes. 8

Methodology Our Approach Use a subset of benchmarks from BigDataBench Use Big Data Generator Suite (BDGS), to generate synthetic datasets of 6 GB, 12 GB and 24 GB. Configure Spark in local mode and tune its internal Parameters Rely on GC logs to collect garbage collection times. Use Spark logs to gather execution time of benchmarks. Use Concurrency Analysis in Intel Vtune to collect wait time and CPU time of executor pool threads Use General Micro-architectural Exploration in Intel Vtune to analyze impact of data volume on micro-architecture characteristics. 9

What are the characteristics of benchmarks? Our Approach 10

System Details Our Hardware Configuration 11

Machine Details Our Hardware Configuration Hyper Threading and Turbo-boost are disabled Hyper Threading and Turbo-boost are disabled 12

Software Parameters Our Approach 13

Do Spark based data analytics benefit from using larger scale-up servers? Spark applications do not benefit significantly by using more than 12-core executors 14

Is GC detrimental to scalability of Spark applications? The proportion of GC time increases with the number of cores 15

Does performance remain consistent as we enlarge the data size? Decrease in Data processed per second ranges from 11% to 93% ( Parallel Scavenge) 16

Does the choice of Garbage Collector impact the data processing capability of the system?? Improvement in DPS ranges from 1.4x to 3.7x on average in Parallel Scavenge as compared to G1 17

How does GC affect data processing capability of the system?? GC time does not scale linearly with data size. 18

How does CPU utilization scale with data volume? CPU Utilization decreases with increase in input data size 19

Is File I/O detrimental to performance? Fraction of file I/O increases by 6x, 18x and 25x for Word Count, Naive Bayes and Sort respectively when input data is increased by 4x 20

How does data size affects micro-architectural performance? 5 to 10 % better instruction retirement as we enlarge the data size 21

Cont.. Execution units inside the core exhibit improved utilization at larger data sets 22

Cont.. Increase in L1 Bound Stalls implies better utilization of L1 Caches 23

Cont.. Spark benchmarks exhibit reduced memory bandwidth utilization 24

Key Findings Spark workloads do not benefit significantly from executors with more than 12 cores. The performance of Spark workloads degrades with large volumes of data due to substantial increase in garbage collection and file I/O time. With out any tuning, Parallel Scavenge garbage collection scheme outperforms Concurrent Mark Sweep and G1 garbage collectors for Spark workloads. Spark workloads exhibit improved instruction retirement due to lower L1 cache misses and better utilization of functional units inside cores at large volumes of data. Memory bandwidth utilization of Spark benchmarks decreases with large volumes of data and is 3x lower than the available offchip bandwidth on our test machine 25

Future Directions NUMA Aware Task Scheduling Cache Aware Transformations Exploiting Processing In Memory Architectures HW/SW Data Prefectching Rethinking Memory Architectures 26