Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures. Hiroshi Yamaguchi & Hiroyuki Adachi


About Us: Hiroshi Yamaguchi, Hadoop DevOps Engineer; Hiroyuki Adachi, Hadoop Engineer. Copyright 2017 Yahoo Japan Corporation. All Rights Reserved.

Largest Portal Site in Japan: >100 services; >90 million DUB (Daily Unique Browsers); >34 million MAU; >69.8 billion PV/month; 2.7 billion shopping items; No.1 app publisher; 74% smartphone users, 82% PC users.

Overview of Our Data Platform: web server logs flow into the data platform, where they are analyzed for PV, conversion, recommendation, ad targeting, and more.

Agenda 1. Our Hadoop Cluster 2. Issues Involved in Previous Upgrade 3. Upgrade Approach 4. How to Upgrade 5. Results 6. Conclusion

Our Hadoop Cluster

Overview: HDP 2.3 + Ambari 2.2. Almost all core components are configured with HA (NameNode, ResourceManager, JournalNode, ZooKeeper); non-HA: Application Timeline Server. Secured with Kerberos. 800 slave nodes (DataNode/NodeManager; 128 GB memory, Xeon E5-2683 v3). Other components: HiveServer2, Oozie, HttpFS.

Data Volume: 37 PB of total stored data, increasing by about 50-60 TB/day.

Work Load: multi-tenant cluster (ETL / e-commerce / advertising / search...). 20K-30K jobs/day; average resource utilization is over 80% across a day of YARN workload. Data processing needs are growing.

Issues Involved in Previous Upgrade

Previous Upgrade: performed in Q4 2015, without using Ambari, as a manual Express Upgrade consisting of package installation, upgrade, and check steps.

Previous Upgrade: package installation; upgrade (stop all components, then restart them all for the updates to take effect); check (component normality checks, waiting for missing blocks to be fixed). It took 12 hours.

Issue 1: 12 Hours of Cluster Downtime. Why did it take so long? Large number of slave nodes; multiple nodes failed to start after the upgrade; jobs failed due to data loss; missing blocks had to be recovered.

Issue 2: Finding the Right Schedule. Finding a suitable schedule is challenging: the cluster is multi-tenant, shared by hundreds of services, so we must pick a day with minimal impact, e.g. a day without weekly or monthly jobs.

Issue 3: Coordinating Resource Allocation. Post-upgrade, restarting all jobs simultaneously caused resource exhaustion, so jobs were recovered in order of precedence (ETL jobs, then scheduled jobs, then ad hoc jobs) after the cluster came back.

Issues to be Addressed: storage (should not affect HDFS read/write, should prevent missing blocks) and processing (keep components such as Hive and Oozie working). In short, an impact-less upgrade operation.

Upgrade Approach

Possible Upgrading Methods 1. Ambari Provided Express Upgrade 2. Ambari Provided Rolling Upgrade 3. Manual Express Upgrade 4. Manual Rolling Upgrade

Possible Upgrading Methods: 1. Ambari-provided Express Upgrade 2. Ambari-provided Rolling Upgrade 3. Manual Express Upgrade 4. Manual Rolling Upgrade. The 1st and 2nd are not suitable for our environment; the 3rd has the issues explained earlier.

Approaches for Upgrading (install package, upgrade, check): an impact-less rolling upgrade by grouping components (e.g. NameNode, slave nodes, others) and upgrading and checking each component group in turn.

Approaches for Upgrading: the total upgrade time increases, since slave nodes are upgraded one by one for safety.

Approaches for Upgrading: automating the upgrade and check process reduces the operation cost.

Summary: an impact-less rolling upgrade is possible, and automating the upgrade process reduces the operation cost.

How to Upgrade

Target Cluster: HDP 2.3.x + Ambari 2.2.0. Master nodes, HA: NameNode, ResourceManager, JournalNode, ZooKeeper; non-HA: Application Timeline Server. Slave nodes: 800 DataNode/NodeManager. Others: HiveServer2, Oozie, HttpFS.

Apache Ambari: the Ambari server exposes a Web UI and REST API and controls an ambari-agent on each node (NameNode, ResourceManager, DataNode/NodeManager, ...).

Ambari-Provided HDP Upgrade Methods: Express Upgrade brings down the entire cluster; Rolling Upgrade cannot control load-balanced HiveServer2 restarts and restarts DataNodes collectively. We could not adopt either of these methods.

Extending Ambari's Operations: safe restarts plus operations specific to our environment. Ambari custom services add operations such as NameNode failover; API wrapper scripts add extra confirmation of service normality and precise control.

For a Non-stop Upgrade: control the upgrade flow (MapReduce, Hive, Spark, ...) with Ansible, and control replication and missing blocks with custom Ambari services and scripts.

Upgrade Flow Control (controlled with Ansible).

Ansible: a configuration management tool. Why we chose it: easy to learn; agentless, push-based system; can control the upgrade workflow.

Overview of Upgrading with Ansible: ansible-playbook -i hosts/target_cluster play_books/upgrade_master.yml (configuration management plus control of the upgrade sequence).
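The sequencing that such a playbook drives can be sketched as a simple control loop, a hypothetical sketch in which `upgrade_fn` and `check_fn` stand in for the real Ansible tasks and normality checks:

```python
def rolling_upgrade(host_groups, upgrade_fn, check_fn):
    """Upgrade one group of hosts at a time; abort on the first failed
    health check so at most one group is ever degraded."""
    done = []
    for group in host_groups:
        upgrade_fn(group)        # e.g. install new HDP bits and restart daemons
        if not check_fn(group):  # e.g. wait for missing blocks to return to 0
            raise RuntimeError("health check failed after upgrading %s" % group)
        done.append(group)
    return done
```

Aborting on the first failed check is what keeps the blast radius to a single group of nodes.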

Upgrading Flow: Preparing (manual): make backups of critical metadata (FsImage, Hive Metastore, Oozie DB). Registering (Ambari): register the repository for the target HDP version. Installing (Ambari): install the target HDP version on all hosts. Upgrading Each Component (Ansible): upgrade to the target version; some components upgrade in parallel. Finalizing (Ansible): finalize the upgrade. https://docs.hortonworks.com/hdpdocuments/ambari-2.1.1.0/bk_upgrading_ambari/content/_manual_minor_upgrade.html

Upgrading Each Component: Master (3 hours): ZooKeeper; NN, JN, JHS, ATS, RM. Application (4 hours): HttpFS, Oozie; Hive Metastore, HiveServer2. Client (0.5 hours): MapReduce2, Spark, YARN, HDFS, Oozie, Hive, ZooKeeper. Slave (70 hours): DataNode, NodeManager.

Preventing Job Failures (controlled with custom Ambari services and scripts).

Ambari Custom Service: Ambari lets us implement custom services with operational commands, e.g. NameNode failover and load-balancer pool add/remove. They can be operated like existing Ambari services.

Ambari Custom Service: add a service definition XML and Python scripts. No need to manually execute commands on a server, which prevents operational mistakes.

Ambari CLI: an in-house script for cluster admins; a wrapper around the APIs of Ambari, NameNode, ResourceManager, etc. It provides safe operations, with pre- and post-restart checks for each component, preventing data loss.
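A minimal sketch of what such a wrapper does when restarting a component through Ambari's REST API. Ambari changes a host component's state via a PUT on the host_components resource and requires an X-Requested-By header; the server address, cluster name, and credentials below are placeholders, not the talk's actual values:

```python
import base64
import json
import urllib.request

AMBARI = "http://ambari.example.com:8080"  # placeholder Ambari server
CLUSTER = "mycluster"                      # placeholder cluster name

def build_start_request(host, component, user="admin", password="admin"):
    """Build the PUT request that asks Ambari to bring a host component
    to the STARTED state via
    PUT /api/v1/clusters/{c}/hosts/{h}/host_components/{comp}."""
    url = "%s/api/v1/clusters/%s/hosts/%s/host_components/%s" % (
        AMBARI, CLUSTER, host, component)
    body = {
        "RequestInfo": {"context": "rolling upgrade: start %s" % component},
        "Body": {"HostRoles": {"state": "STARTED"}},
    }
    req = urllib.request.Request(url, data=json.dumps(body).encode(),
                                 method="PUT")
    req.add_header("X-Requested-By", "ambari")  # Ambari rejects PUTs without this
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req  # the wrapper would urlopen() this, then poll the returned request id
```

The pre- and post-restart checks the slide mentions would wrap the call to `urlopen()` on this request.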

Safety Restart for HiveServer2: clients connect to HiveServer2 through a load balancer. Before the upgrade, the target HiveServer2 is taken out of the load-balancer pool so it receives no new connections; established connections are left alone while we wait for their jobs to finish. The HiveServer2 is then upgraded and restarted while out of the pool, and finally added back.

Preventing Data Loss (controlled with custom Ambari services and scripts).

Safety Restart for DataNode: upgrade and restart a group of DataNodes at a time, spread across racks, and wait for Missing Blocks to return to 0 (Missing Blocks: 0, Under-Replicated Blocks: 0, Corrupt Blocks: 0) before moving to the next group.

Replication vs. Erasure Coding: with replication, full copies of each block (b1, b2, b3) are stored on different nodes; with erasure coding (EC), a block's data cells (b1) and parity cells (p1) are spread across many nodes.

Safety Restart for DataNode (EC): one DataNode at a time; upgrade, restart, and wait for Missing Blocks to return to 0 (Missing Blocks: 0, Under-Replicated Blocks: 0, Corrupt Blocks: 0).
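Waiting for the block counters to return to zero can be done by polling the NameNode's FSNamesystem JMX bean, which exposes MissingBlocks, UnderReplicatedBlocks, and CorruptBlocks; a sketch (the NameNode hostname is a placeholder):

```python
import json
import urllib.request

# Placeholder NameNode host; the HDP 2.x NameNode web UI listens on port 50070.
JMX_URL = ("http://nn01.example.com:50070/jmx"
           "?qry=Hadoop:service=NameNode,name=FSNamesystem")

def block_counts(jmx_payload):
    """Extract the three counters the upgrade loop watches from the
    FSNamesystem JMX bean (payload shape: {"beans": [{...}]})."""
    bean = jmx_payload["beans"][0]
    return {key: bean[key] for key in
            ("MissingBlocks", "UnderReplicatedBlocks", "CorruptBlocks")}

def blocks_healthy(jmx_payload):
    """True when all three counters are zero, i.e. it is safe to move on
    to the next DataNode (or DataNode group)."""
    return all(v == 0 for v in block_counts(jmx_payload).values())

def poll_once():
    """One real poll against the NameNode (needs a live cluster)."""
    with urllib.request.urlopen(JMX_URL) as resp:
        return blocks_healthy(json.load(resp))
```

The upgrade loop would simply sleep-and-repoll until `blocks_healthy` becomes true before restarting the next node.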

Test Jobs for Each Component: HDFS read/write: hdfs dfs -put /tmp/test; MapReduce: yarn jar hadoop-mapreduce-examples.jar pi 5 10; Hive, Hive on Tez (also Pig, Spark): hive -e 'select x from default.dual group by x'; HttpFS, Oozie: curl --negotiate -u : https://.../webhdfs/v1/?op=liststatus
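These per-component test jobs can be wrapped in a small runner so the automation can gate each upgrade step on them; a sketch (the command list mirrors the slide, and the elided HttpFS URL is kept as-is):

```python
import subprocess

# The slide's per-component test commands, expressed as argv lists.
SMOKE_TESTS = [
    ("hdfs", ["hdfs", "dfs", "-put", "/tmp/test"]),
    ("mapreduce", ["yarn", "jar", "hadoop-mapreduce-examples.jar",
                   "pi", "5", "10"]),
    ("hive", ["hive", "-e", "select x from default.dual group by x"]),
    ("httpfs", ["curl", "--negotiate", "-u", ":",
                "https://.../webhdfs/v1/?op=liststatus"]),
]

def run_smoke_tests(tests=SMOKE_TESTS, run=subprocess.run):
    """Run each test job; return the names of the components whose
    test command exited non-zero."""
    failed = []
    for name, cmd in tests:
        if run(cmd).returncode != 0:
            failed.append(name)
    return failed
```

Injecting `run` keeps the gating logic testable without a live cluster; in production the default `subprocess.run` executes the real commands.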

Results

Results of Non-stop Upgrade: Hadoop 2.3.x to 2.3.y upgrade; 7 minutes of downtime; 0.3% job failures; 0% data loss.

Results of Non-stop Upgrade, per component: HDFS, HttpFS, MapReduce, Hive, Pig: non-stop, successful upgrade without any job failures. Spark: disconnected from the Hive Metastore, a few jobs failed. HiveServer2: checksum error occurred for some jobs. Oozie (non-HA component): upgrade was affected to some extent.

Problems During the Upgrade NameNode restart DataNode restart Spark and Hive job failure

NameNode Restart: restart the standby NN (nn02); fail over from nn01 to nn02 (nn02 now active); restart the now-standby NN (nn01) to upgrade it. During startup we hit "ERROR: not enough memory": the NN metadata was huge and it took a long time for memory to be freed. As a workaround: stop the NN, wait, then start it.

DataNode Restart: restart failed due to the existence of an old pid file. There are two DataNode processes: the parent (/var/run/hdfs/hadoop-hdfs-datanode.pid) and the secure child (/var/run/hdfs/hadoop_secure_dn.pid). The child pid file usually gets deleted automatically, but when it remains, startup fails with "Starting regular datanode initialization ... Still running according to PID file /var/run/hdfs/hadoop_secure_dn.pid".
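A stale-pid-file check for this situation can be sketched as below; the path comes from the slide, but the helper itself is ours, not part of HDP:

```python
import os

SECURE_DN_PID = "/var/run/hdfs/hadoop_secure_dn.pid"  # path from the slide

def pid_file_stale(path):
    """True if the pid file exists but no process with that pid is
    running, i.e. the file is a leftover that would block the DataNode
    from starting. (Hypothetical helper.)"""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no file or unreadable content: nothing to clean up
    try:
        os.kill(pid, 0)  # signal 0 checks existence without killing
    except ProcessLookupError:
        return True      # no such process: the pid file is stale
    except PermissionError:
        return False     # a process exists but belongs to another user
    return False         # process is alive
```

The automation can delete the file only when `pid_file_stale(SECURE_DN_PID)` is true, then retry the DataNode start.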

Spark (v1.5) Job Failure: a running Spark job holds a connection to the Hive Metastore; when the Metastore is restarted for the upgrade, the job is disconnected and does not reconnect.

Hive Job Failure: on restart, HiveServer2 uploads the necessary tarballs to HDFS (def setup_hiveserver2(): ... copy_to_hdfs("mapreduce", ...), updating the uploaded library timestamp from xxxx to yyyy. Clients that cached the old timestamp fail with "java.io.IOException: .../mapreduce.tar.gz changed on src filesystem".

Conclusion

Non-stop Upgrade: Hadoop 2.3.x to 2.3.y upgrade; 7 minutes of downtime; 0.3% job failures; 0% data loss; no need for separate resource scheduling.

Comparison of Upgrade Results: Previous: 12h cluster downtime, 12h maintenance time, 108 man-hours. Proposed without automation: 0 min downtime, 72h maintenance time, 648 man-hours. Proposed with automation: 7 min downtime, 72h maintenance time, 21 man-hours. User impact and operating cost are lowest with the automated approach.

Future Work: improving the non-stop upgrade by bringing cluster downtime and job failures down to 0; supporting major upgrades (upgrades involving NameNode metadata changes); automating the preparation stage and handling failures with recovery operations; contributing to OSS.