CCA-410. Cloudera. Cloudera Certified Administrator for Apache Hadoop (CCAH)


Cloudera CCA-410 Cloudera Certified Administrator for Apache Hadoop (CCAH) Download Full Version : http://killexams.com/pass4sure/exam-detail/cca-410

Reference: configuration parameters, dfs.block.size

QUESTION: 91
Which daemon instantiates JVMs to perform MapReduce processing in a cluster running MapReduce v1 (MRv1)?

A. NodeManager
B. ApplicationManager
C. ApplicationMaster
D. TaskTracker
E. JobTracker
F. DataNode
G. NameNode
H. ResourceManager

Answer: D

A TaskTracker is a slave-node daemon that accepts tasks (Map, Reduce, and Shuffle operations) from a JobTracker. Only one TaskTracker process runs on any Hadoop slave node, and it runs in its own JVM. Every TaskTracker is configured with a set of slots, which indicate the number of tasks it can accept. The TaskTracker starts a separate JVM process for each task (called a Task Instance) so that a task failure does not take down the TaskTracker itself.

QUESTION: 92
You set mapred.tasktracker.reduce.tasks.maximum to a value of 4 on a cluster running MapReduce v1 (MRv1). How many reducers will run for any given job?

A. A maximum of 4 reducers, but the actual number of reducers that run for any given job is based on the volume of input data
B. A maximum of 4 reducers, but the actual number of reducers that run for any given job is based on the volume of intermediate data
C. Four reducers will run. Once set by the cluster administrator, this parameter cannot be overridden
D. The number of reducers for any given job is set by the developer

Answer: D

mapred.tasktracker.reduce.tasks.maximum caps how many reduce tasks run simultaneously on a single TaskTracker; it does not determine how many reducers a job has. The number of reducers for any given job is chosen by the developer (mapred.reduce.tasks, or JobConf.setNumReduceTasks()). A reducer uses intermediate data as input.
Note:
* The mapper output (intermediate data) is stored on the local file system (not HDFS) of each individual mapper node.
* Reducers start copying intermediate key-value pairs from the mappers as soon as they are available. The progress calculation also accounts for the data transfer performed by the reduce process, so reduce progress starts showing up as soon as any intermediate key-value pair from a mapper is available to be transferred to a reducer.
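A minimal MRv1-era Java sketch of the distinction (illustrative only; the class name and job name are ours, not part of the exam material):

    // Hypothetical sketch: the cluster-wide slot limit versus the per-job
    // reducer count in MRv1.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerSettingsSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Cluster-side limit, normally set in mapred-site.xml and read by
            // each TaskTracker at startup: at most 4 reduce tasks run at the
            // same time on any single TaskTracker. Set here only to illustrate.
            conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 4);

            // Per-job count chosen by the developer: this job asks for 10
            // reduce tasks in total. With the limit above, no more than 4 of
            // them run concurrently on any one node; the rest wait for slots.
            Job job = new Job(conf, "reducer-settings-sketch");
            job.setNumReduceTasks(10);
        }
    }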

QUESTION: 93
You configure your Hadoop development cluster with both MapReduce frameworks, MapReduce v1 (MRv1) and MapReduce v2 (MRv2/YARN). You plan to run only one set of MapReduce daemons at a time in this development environment (running both simultaneously results in an unstable cluster, but configuring both and moving between them is fine). Which two MapReduce daemons do you need to configure to run on your master nodes?

A. ContainerManager
B. NodeManager
C. JournalNode
D. ApplicationMaster
E. ResourceManager
F. JobTracker

Answer: E, F

JobTracker for MapReduce v1 (MRv1) and ResourceManager for MapReduce v2 (MRv2/YARN).
* Configure YARN daemons: if you have decided to run YARN, you must configure the following services: ResourceManager (on a dedicated host) and NodeManager (on every host where you plan to run MapReduce v2 jobs).

* JobTracker is the daemon service for submitting and tracking MapReduce v1 (MRv1) jobs in Hadoop.
Note:
* Make sure you are not trying to run MRv1 and YARN on the same set of nodes at the same time. This is not supported; it will degrade performance and may result in an unstable cluster deployment.
* MapReduce has undergone a complete overhaul, and CDH4 includes MapReduce 2.0 (MRv2). The fundamental idea of MRv2's YARN architecture is to split the two primary responsibilities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager (RM) and per-application ApplicationMasters (AMs).
Incorrect:
Not C: JournalNode is part of the Quorum Journal Manager implementation (for NameNode high availability), not a MapReduce daemon.

QUESTION: 94
Using Hadoop's default settings, how much data will you be able to store on your Hadoop cluster if it has 12 nodes with 4 TB of raw disk space per node allocated to HDFS storage?

A. Approximately 3 TB
B. Approximately 12 TB
C. Approximately 16 TB
D. Approximately 48 TB

Answer: C

The default replication factor is 3, so the usable capacity is 12 x 4 TB / 3 = 16 TB.
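The same arithmetic as a runnable sketch (purely illustrative; the variable names are ours):

    // Hypothetical sketch of the Question 94 capacity calculation: usable
    // HDFS capacity is raw capacity divided by the replication factor.
    public class HdfsCapacitySketch {
        public static void main(String[] args) {
            int nodes = 12;          // worker nodes in the cluster
            int rawTbPerNode = 4;    // raw disk per node allocated to HDFS
            int replication = 3;     // dfs.replication default

            double usableTb = (double) nodes * rawTbPerNode / replication;
            System.out.printf("Usable capacity: ~%.0f TB%n", usableTb); // ~16 TB
        }
    }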

QUESTION: 95
When a client requests a file, how does HDFS retrieve the blocks associated with that file?

A. The client polls the DataNodes for the block IDs
B. The NameNode queries the DataNodes for the block IDs
C. The NameNode reads the block IDs from memory
D. The NameNode reads the block IDs from disk

Answer: C

Here is how a client RPC request flows through the Hadoop HDFS NameNode. The NameNode receives requests from HDFS clients in the form of Hadoop RPC requests over a TCP connection. Typical client requests include mkdir, getBlockLocations, create file, and so on. Remember that HDFS separates metadata from actual file data, and that the NameNode is the metadata server. These requests are therefore pure metadata requests; no data transfer is involved. The NameNode keeps the entire namespace and block map in memory, so it answers block-location requests from memory rather than from disk.
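As a minimal illustration of that metadata round trip, the following sketch (hypothetical file path; assumes a reachable HDFS deployment) asks the NameNode for a file's block locations through the standard FileSystem API:

    // Hypothetical sketch: an HDFS client asking the NameNode where a file's
    // blocks live. getFileBlockLocations is a metadata-only RPC; no file
    // data is transferred.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/hadoop/input/sample.txt"); // hypothetical
            FileStatus status = fs.getFileStatus(file);

            // One RPC to the NameNode, answered from its in-memory block map.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println(String.join(",", block.getHosts()));
            }
        }
    }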

QUESTION: 96
What four functions do scheduling algorithms perform on a Hadoop cluster?

A. Run jobs at periodic times of the day
B. Reduce job latencies in environments with multiple jobs of different sizes
C. Allow multiple users to share clusters in a predictable, policy-guided manner
D. Support the implementation of service-level agreements for multiple cluster users
E. Allow short jobs to complete even when large, long jobs (consuming a lot of resources) are running
F. Reduce the total amount of computation necessary to complete a job
G. Ensure data locality by ordering map tasks so that they run on data-local map slots

Answer: B, C, D, E

Hadoop schedulers: since the pluggable scheduler was implemented, several scheduler algorithms have been developed for it (a configuration sketch follows Question 97 below).
* FIFO scheduler: the original scheduling algorithm integrated within the JobTracker. In FIFO scheduling, the JobTracker pulled jobs from a work queue, oldest job first. This schedule had no concept of the priority or size of the job, but the approach was simple to implement and efficient.
* Fair scheduler: the core idea behind the fair scheduler is to assign resources to jobs such that, on average over time, each job gets an equal share of the available resources. The result is that jobs that require less time are able to access the CPU and finish intermixed with the execution of jobs that require more time. This behavior allows for some interactivity among Hadoop jobs and permits greater responsiveness of the cluster to the variety of job types submitted. The fair scheduler was developed by Facebook.
* Capacity scheduler: the capacity scheduler shares some of the principles of the fair scheduler but has distinct differences, too. Capacity scheduling was defined for large clusters, which may have multiple independent consumers and target applications. For this reason, it provides greater control as well as the ability to provide a minimum capacity guarantee and to share excess capacity among users. In capacity scheduling, instead of pools, several queues are created, each with a configurable number of map and reduce slots; each queue is also assigned a guaranteed capacity (the overall capacity of the cluster is the sum of each queue's capacity). The capacity scheduler was developed by Yahoo!.
* The introduction of the pluggable scheduler was yet another evolution in cluster computing with Hadoop. It permits the use (and development) of schedulers optimized for a particular workload and application, and has made it possible to create multi-user data warehouses with Hadoop, given the ability to share the overall Hadoop infrastructure among multiple users and organizations.

QUESTION: 97
What happens when a map task crashes while running a MapReduce job?

A. The job immediately fails
B. The TaskTracker closes the JVM instance and restarts
C. The JobTracker attempts to re-run the task on the same node
D. The JobTracker attempts to re-run the task on a different node

Answer: D

When the JobTracker is notified of a task attempt that has failed (by the TaskTracker's heartbeat call), it reschedules execution of the task. The JobTracker tries to avoid rescheduling the task on a TaskTracker where it has previously failed.
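The configuration sketch promised under Question 96: in MRv1 the JobTracker selects its pluggable scheduler from the mapred.jobtracker.taskScheduler property, which is normally set in mapred-site.xml on the JobTracker host rather than in client code. The Java form below is illustrative only.

    // Hypothetical sketch: selecting an MRv1 pluggable scheduler. In practice
    // this property lives in mapred-site.xml and is read by the JobTracker
    // at startup; it is shown here via the Configuration API for brevity.
    import org.apache.hadoop.conf.Configuration;

    public class SchedulerChoiceSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // FIFO (the default): org.apache.hadoop.mapred.JobQueueTaskScheduler
            // Fair scheduler:     org.apache.hadoop.mapred.FairScheduler
            // Capacity scheduler: org.apache.hadoop.mapred.CapacityTaskScheduler
            conf.set("mapred.jobtracker.taskScheduler",
                     "org.apache.hadoop.mapred.FairScheduler");

            System.out.println(conf.get("mapred.jobtracker.taskScheduler"));
        }
    }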
