HEBR: A High Efficiency Block Reporting Scheme for HDFS

Sumukhi Chandrashekar and Lihao Xu

Abstract: The Hadoop platform is widely used for managing, analyzing and transforming large data sets in various systems. Hadoop has two basic components: 1) a distributed file system (HDFS) and 2) a computation framework (MapReduce). HDFS stores data on simple commodity machines that run DataNode processes (DataNodes). A commodity machine running the NameNode process (the NameNode) maintains the meta data of the file system. Every DataNode periodically sends the NameNode a list of all blocks currently stored on it, known as a block report. The NameNode processes block reports to build a mapping between files and their locations on the various DataNodes. Block reports form a heavy internal load on a Hadoop cluster, as they consume computation resources of the DataNodes and the NameNode as well as network bandwidth of the cluster. Supported by extensive experimental results, this paper proposes a new block report protocol, HEBR, for Hadoop that significantly reduces both computational and communication overhead, and thus greatly improves overall Hadoop system performance.

Index Terms: Hadoop, Hadoop Distributed File System, Efficient distributed file systems, Block reporting scheme.

1 INTRODUCTION

Big Data (the massive generation of content) has been growing at an average rate of 40% per year, 80% of which is unstructured. Big Data is characterized by three Vs: Variety, Velocity, and Volume [3]. Data with Variety can be challenging to categorize, while Velocity, which refers to the speed at which data arrives, can be challenging to process. The Volume, or size, of Big Data tends to fluctuate depending on the problem being assessed [4]. Together, the three Vs make it increasingly challenging to process Big Data, and researchers are therefore on the lookout for efficient solutions. One solution for dealing with Big Data is Apache's Hadoop, an open-source system. In 2006, Yahoo! began investing in its development and used Hadoop as its distributed data processing platform. Since then, Hadoop installations have grown to thousands of nodes in a cluster. For example, the biggest Hadoop clusters at Yahoo! [7], [8] consist of 4000 nodes and have a total storage capacity of 14PB each. eBay is a heavy user of the MapReduce paradigm, Apache Pig, Apache HBase, etc. for search optimization and has a Hadoop cluster comprising 532 nodes maintaining 5.3PB of data [22]. Facebook owns two Hadoop clusters for maintaining internal log and dimension data sources: a 1100-machine cluster with 8800 cores storing 12PB of data, and a 300-machine cluster with 2400 cores storing 3PB of data [22].

Hadoop uses its own distributed file system, the Hadoop Distributed File System (HDFS), which runs on commodity hardware to store files. Although HDFS shares many attributes with other distributed file systems [5], it is designed to be significantly more fault tolerant compared with some dedicated hardware solutions such as Redundant Array of Inexpensive Disks (RAID) [6] or data replication. An HDFS cluster comprises a commodity machine running the NameNode process and many others running DataNode processes.

S. Chandrashekar is with the Department of Computer Science, Wayne State University, Detroit MI, USA. E-mail: sumukhic@wayne.edu. L. Xu is with the Department of Computer Science, Wayne State University, Detroit MI, USA, where he is an associate professor. E-mail: lihao@wayne.edu.
The NameNode manages the name space of the file system, and the DataNodes store the physical data files, which are partitioned into blocks. Each block is replicated (3 replicas by default) and stored on different DataNodes; the number of replicas of a block can be adjusted by a parameter in the HDFS configuration files. A DataNode identifies the block replicas on its disk and sends periodic reports, termed block reports, to the NameNode. With the help of these block reports, the NameNode builds a map, referred to as the BlockMap, that maps blocks of files to their physical locations on the DataNodes. Block reports are essential meta data that let HDFS know the physical locations of files on the DataNodes, and they form a significant portion of the internal load of a cluster [1]. The block report load depends on the number of DataNodes in a cluster; when the load is too high, the cluster may become dysfunctional and process fewer reads and writes. Although block reports are randomized so that they do not accumulate at the NameNode, a normal sized cluster (of about one thousand DataNodes) typically still has to process more than 10 reports per second, each consisting of 60,000 blocks [1]. In such a typical configuration, approximately 30% of the NameNode's total computation capacity is used to handle block reports. Files are uploaded onto HDFS in intervals, and subsequent block reports are identical when no new files have been uploaded between them. When this happens, the NameNode still updates the BlockMap even though the reports carry no new information, wasting a great deal of computational and network resources of the cluster.
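As an illustration of the mapping just described, the following minimal Java sketch shows a block-to-locations map being built from block reports. It is purely illustrative: it is not the actual HDFS implementation of the BlockMap, and the class and method names are our own.

import java.util.*;

// Illustrative only: a simplified stand-in for the NameNode's BlockMap,
// mapping a block id to the set of DataNodes currently hosting a replica.
public class SimpleBlockMap {
    private final Map<Long, Set<String>> blockToDataNodes = new HashMap<>();

    // Process one block report: record that every reported block
    // has a replica on the reporting DataNode.
    public void processBlockReport(String dataNodeId, List<Long> reportedBlockIds) {
        for (long blockId : reportedBlockIds) {
            blockToDataNodes
                .computeIfAbsent(blockId, id -> new HashSet<>())
                .add(dataNodeId);
        }
    }

    // Locations a client would be given for a read request on this block.
    public Set<String> locationsOf(long blockId) {
        return blockToDataNodes.getOrDefault(blockId, Collections.emptySet());
    }

    public static void main(String[] args) {
        SimpleBlockMap map = new SimpleBlockMap();
        map.processBlockReport("datanode-1", Arrays.asList(1001L, 1002L));
        map.processBlockReport("datanode-2", Arrays.asList(1001L));
        System.out.println("Block 1001 replicas: " + map.locationsOf(1001L));
    }
}

Every incoming report touches only this map; the cost the paper is concerned with comes from how many blocks each report carries and how often reports arrive.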

We developed a new block reporting scheme, HEBR (High Efficiency Block Reporting Scheme for HDFS), that sends much smaller lists of blocks more frequently. Throughout this paper, we refer to our scheme as HEBR and to the block reporting scheme implemented in the latest version of Hadoop as CBR. This paper presents HEBR and results from different sets of experiments that evaluate and demonstrate its superior performance compared to CBR. Even in its worst case, on a 4-node cluster, HEBR reduces the number of blocks sent to the NameNode by 59.78%, greatly reducing processing resources and network bandwidth usage and thus improving overall HDFS performance significantly.

The paper is organized as follows: Section 2 briefly describes the architecture of HDFS; Section 3 describes the current approaches to sending and processing block reports and other related research; Section 4 discusses the proposed block reporting scheme, HEBR. The experimental setup and the test benchmarks on which performance evaluations of HEBR were conducted are discussed in Section 5. In Sections 6, 7 and 8, the results from some of the experiments conducted on the benchmarks are presented in detail. Section 9 concludes the paper with the finding that HEBR is a significantly more efficient block reporting scheme than CBR and is no worse than CBR under any experimental setting.

2 ARCHITECTURE OF HDFS

HDFS follows a master-slave architecture, with a single machine running the NameNode software component acting as the master server and many machines running DataNode processes acting as slaves. The DataNodes are typically arranged in racks and connected to a switch, usually with one or two bonded Gigabit Ethernet links. The switch in turn has up-link connections to another tier of switches, thus connecting all DataNodes and forming a real-time HDFS cluster. The NameNode manages the entire file system name space and the list of blocks belonging to each file in the FsImage meta data, which is persisted by the NameNode. Each physical data block of a file written by a client application is stored in the local file system of a DataNode. A block of a file is represented by two physical files: one that holds the content of the block and another that holds meta data, including the checksum for the block and its generation stamp.

A client application contacts the NameNode to get the information needed to serve read/write requests. To serve a read request, the NameNode returns a list of DataNodes that host blocks of the requested file, and the client contacts the closest DataNode for the data blocks. To serve a write request, the NameNode selects a set of DataNodes, based on a proximity algorithm, to host blocks of the file and returns this list to the client. The client then writes the blocks onto those DataNodes. The DataNodes may then write the blocks onto other DataNodes in the same or different racks in order to maintain the minimum number of replicas of each block.
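For context, the read and write paths described above are hidden from applications behind the standard Hadoop FileSystem API. The short Java sketch below simply writes and reads back a small file; the cluster address (hdfs://namenode:9000) and the path are placeholders, and the example is illustrative rather than part of HEBR.

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Illustrative client: the NameNode supplies block locations, the client
// streams data to/from DataNodes; none of that is visible at this level.
public class HdfsClientExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/tmp/hebr-demo.txt");

            // Write: the NameNode chooses DataNodes for each block,
            // and the write pipeline replicates blocks (3 copies by default).
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }

            // Read: the NameNode returns block locations and the client
            // reads from the closest DataNode.
            try (FSDataInputStream in = fs.open(path)) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }
}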
3 RELATED WORK

CBR comprises Full Block Reports (full-block) and Intermediate Block Reports (intermediate-block). This section describes the details of CBR, research that has evaluated the performance of HDFS (evaluation strategy, metrics and experiment setup), and research results on improving the performance of Hadoop.

3.1 CBR

A full-block report reports all valid blocks on a disk of a DataNode, while an intermediate-block report reports one block that is being received by, or deleted from, the disk of a DataNode.

3.1.1 Full Block Reports

Upon registration to a cluster, a process scans the disk of a DataNode to report all the stored blocks for the first time (the first-time block report). Subsequently, whenever a configurable timer expires (set by default to 6 hours), the DataNode runs the same background scan process. This time, the scanner collects all the currently valid blocks on its disk and reconciles the differences between the blocks that reside on the disk and a map, the volumeMap, that lists the blocks in every storage volume of the disk; the differences are used to update the volumeMap. A storage volume in the volumeMap is associated with a DataStorage id, and a block is associated with a block id, a generation stamp and its length. A block report is a map, represented as a HashMap, from the blocks to the DataStorage id of the volume on which they reside. The BlockMap, which maps blocks to the DataNodes they reside on, is not stored persistently on the disk of the NameNode [20]; the NameNode learns the locations of blocks only when it processes first-time, full-block and intermediate-block reports.

An interface in the NameNode processes block reports based on the implementation in the method reportDiff(). When a block reported by a DataNode has the same generation stamp and length as recorded in FsImage, the physical location of the block from the block report is updated in the BlockMap. When a reported block is not in FsImage, the associated DataNode is notified to invalidate it. If a reported block is invalid, the NameNode triggers the replication process and commands the DataNode to delete it. Every block hosted on a DataNode, even those that are unchanged, is processed every time a block report is processed, and no reads or writes can be served during this processing.
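To make the processing rules above concrete, the following simplified Java sketch mirrors the per-block decisions of full-block report processing. It is our own illustrative rendering, not the actual reportDiff() code; the Block class and the method names are invented for this sketch.

import java.util.*;

// Simplified, illustrative rendering of CBR full-block report processing.
// Every reported block is examined, even if nothing about it has changed.
public class FullBlockReportProcessor {

    // Minimal stand-in for a block entry in a report or in FsImage.
    public static class Block {
        final long id;
        final long generationStamp;
        final long length;
        Block(long id, long generationStamp, long length) {
            this.id = id; this.generationStamp = generationStamp; this.length = length;
        }
    }

    private final Map<Long, Block> fsImage;          // blocks known to the namespace
    private final Map<Long, Set<String>> blockMap;   // block id -> DataNode locations

    public FullBlockReportProcessor(Map<Long, Block> fsImage, Map<Long, Set<String>> blockMap) {
        this.fsImage = fsImage;
        this.blockMap = blockMap;
    }

    // Returns block ids the reporting DataNode should invalidate/delete.
    public List<Long> process(String dataNodeId, List<Block> report) {
        List<Long> toInvalidate = new ArrayList<>();
        for (Block reported : report) {
            Block known = fsImage.get(reported.id);
            if (known != null
                    && known.generationStamp == reported.generationStamp
                    && known.length == reported.length) {
                // Matches the namespace record: record/refresh its location.
                blockMap.computeIfAbsent(reported.id, id -> new HashSet<>()).add(dataNodeId);
            } else {
                // Unknown or stale replica: tell the DataNode to remove it.
                toInvalidate.add(reported.id);
            }
        }
        // In CBR, blocks expected on this DataNode but absent from the report
        // would additionally be treated as lost and queued for re-replication
        // (omitted here for brevity).
        return toInvalidate;
    }
}

Note that every reported block is touched on every report, regardless of whether anything changed; that is precisely the overhead HEBR aims to avoid.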

3.1.2 Intermediate Block Reports

In practice, files may be written onto the DataNodes before a scheduled full-block report is sent to the NameNode. With only full-block reports to convey the physical locations of blocks, a client read request may not be fulfilled if the blocks of the files to be read have not yet been processed at the NameNode. One solution to this problem is the intermediate-block reporting scheme discussed here. Any change (addition or deletion of a data block) on the disk of a DataNode triggers an intermediate block report: a map from the added or deleted block to the DataStorage id of the volume it was uploaded to or deleted from is sent to the NameNode [21]. The physical locations of blocks that are being received, or that have already been received and validated at the DataNodes, are updated in the BlockMap. The NameNode processes blocks that were deleted at a DataNode differently: it verifies whether such blocks are valid and triggers replication tasks.

3.2 Performance Evaluation of HDFS

Researchers have explored evaluating the performance of Hadoop and other file systems [11], [12], [13] using standard test benchmarks. Some of these are [11]: 1) TeraGen, which generates a file of a desired size, usually ranging between 500GB and 3TB; 2) TeraSort, which sorts an input file across a Hadoop cluster; 3) TeraValidate, which verifies the sorted data for accuracy; and 4) RandomWriter, which generates files of different sizes based on a size parameter and uploads them onto HDFS. Several research papers have focused on evaluating read/write performance with small and big data sets [10], [14], [15], [16], computing statistics such as data throughput and the average and standard deviation of the I/O rate. The authors of [10] integrated the parallel virtual file system (PVFS) into Hadoop and compared its performance with HDFS using a set of data-intensive computing benchmarks; they observed that the consistency, durability and persistence trade-offs made by HDFS affect application performance. Another paper analyzed the performance of HDFS in terms of the throughput achieved [15]. Yet another work [17] evaluated cluster configurations using Hadoop in order to check parallelism performance and scalability; this evaluation established capabilities from the perspective of storage, indexing techniques, query distribution, parallelism, scalability and performance in heterogeneous environments on HDFS.

3.3 Performance Improvement in HDFS

In the current implementation of Hadoop, data locality is not taken into account when launching map tasks, as it is assumed that most maps are data-local. One paper [17] addresses the problem of placing data across DataNodes; its approach ensures that every DataNode in the cluster has a balanced data processing load. Yet another group [18] shows that a simple data placement scheme that considers several aspects of the computing platform and the nature of submitted jobs can increase the throughput of completed jobs by several orders of magnitude in the map-reduce paradigm. They also conducted a performance study of MapReduce on a 100-node Amazon EC2 cluster with various levels of parallelism, identified five design factors that affect the performance of Hadoop, investigated alternative methods for each of them, and claimed that careful tuning of these factors improves the overall performance of Hadoop.

4 HEBR: HIGH EFFICIENCY BLOCK REPORTING SCHEME

To the best of our knowledge, no previous research effort has focused on improving the performance of HDFS through a more efficient block reporting scheme. In this section, we present HEBR, our new block reporting scheme, which is significantly more efficient in both computation and communication than CBR. Processing full-block reports that are similar or, worse still, identical to each other places a high computational load on the NameNode as well as a high communication overhead on the cluster. Since intermediate-block reports carry only one block each and the NameNode processes every report independently, the processing load on the NameNode increases further. We therefore designed a highly efficient block reporting scheme that reduces network traffic between the DataNodes and the NameNode and also reduces the block report load on the NameNode. We call it HEBR (High Efficiency Block Reporting Scheme).
It should be noted that in many applications, additions or deletions of blocks are usually correlated, e.g., blocks of the same file. The idea of HEBR is therefore to send smaller block reports, each of which may contain the blocks associated with one file, to the NameNode. The efficiency gains of HEBR result from sending: 1) fewer block reports than the many intermediate-block reports, and 2) smaller block reports than a full-block report. HEBR sends a first-time block report, which is essentially a full-block report, at the time a DataNode registers with the cluster. This is the only time a DataNode running HEBR sends a full-block report; subsequent reports contain only the blocks newly uploaded onto the disk of the DataNode. CBR and HEBR are compared in Table 1.

TABLE 1
Comparison of the Two Block Reporting Schemes

Sends                                       CBR   HEBR
A first-time full-block report              Yes   Yes
Full-block reports periodically             Yes   No
Intermediate-block reports with one block   Yes   No
Groups of newly uploaded blocks             No    Yes
Deleted blocks                              No    Yes

A background disk scanning process is scheduled when a timer expires; this process scans the disks of the DataNodes. HEBR introduces a configurable time interval for scheduling block reports: the current time minus this interval marks the time at which the previous block report was sent. When the scanner identifies blocks that were uploaded onto the disk after the previous block report was sent, based on the time stamps of the blocks, it adds them to the next block report. Although the number of blocks written onto the disk depends on the rate at which they are written and on the interval between two reports, usually more than one block is written onto HDFS between two block reports; thus, each HEBR block report contains more blocks than an intermediate-block report. But, as HEBR reports only the most recently uploaded blocks to the NameNode, every block report except the first-time report carries fewer blocks than a full-block report. Since the difference in processing and communication time between a block report with a single block and one with a few blocks is negligible, the reduction in the number of block reports sent to the NameNode increases reporting efficiency.
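A minimal sketch of the DataNode-side collection step described above is given below: blocks whose on-disk timestamps are newer than the previous report time are gathered, and deleted blocks are flagged with a sentinel value standing in for the 000 time stamp. This is an illustration under the assumptions of this section, not the actual HEBR implementation; the block-file naming and directory layout are hypothetical.

import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative DataNode-side scan for an HEBR-style incremental report.
public class IncrementalBlockScanner {

    public static final long DELETED_MARKER = 0L; // stands in for the "000" time stamp

    // Collect block files written after the previous report was sent.
    // 'volumeDir' is a hypothetical directory holding block files named "blk_<id>".
    public List<Long> collectNewBlocks(File volumeDir, long lastReportTimeMillis) {
        List<Long> newBlocks = new ArrayList<>();
        File[] files = volumeDir.listFiles((dir, name) ->
                name.startsWith("blk_") && !name.endsWith(".meta"));
        if (files == null) {
            return newBlocks;
        }
        for (File f : files) {
            if (f.lastModified() > lastReportTimeMillis) {
                newBlocks.add(Long.parseLong(f.getName().substring("blk_".length())));
            }
        }
        return newBlocks;
    }

    // The report pairs each block id with a time stamp; deleted blocks are
    // flagged with DELETED_MARKER so the NameNode can treat them separately.
    public List<long[]> buildReport(List<Long> newBlocks, List<Long> deletedBlocks) {
        List<long[]> report = new ArrayList<>();
        for (long id : newBlocks) {
            report.add(new long[] { id, System.currentTimeMillis() });
        }
        for (long id : deletedBlocks) {
            report.add(new long[] { id, DELETED_MARKER });
        }
        return report;
    }
}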

Since HEBR accumulates a few blocks (rather than just one) before a DataNode reports to the NameNode, the number of block reports communicated over the network is much smaller than the number of intermediate-block reports; the fewer block reports the NameNode needs to process, the lower its block reporting load.

In CBR, all the blocks on the disks of the DataNodes are reported. If the reported blocks belong to files in FsImage, their physical locations are updated in the BlockMap when the block reports are processed. If blocks that belong to a file are not reported in a block report, the NameNode assumes those blocks have been deleted and issues replication commands. HEBR does not report all the blocks on the disks of the DataNodes, so the strategy for processing block reports has been slightly modified. Along with the recently uploaded blocks, blocks that have been deleted from a DataNode are also reported; to distinguish them from the blocks still on disk, their time stamps are marked as 000. Under the assumption that the number of blocks deleted from a DataNode is much smaller than the number of all blocks stored on it, HEBR reports far fewer, and only the most recently uploaded, blocks to the NameNode compared to CBR. The recovery mechanism of CBR has been retained: the locations of blocks that are not mentioned in a recent report, but whose locations are already recorded in the BlockMap, do not change. Since such blocks are not assumed to have been deleted from the disks of the DataNodes, no replication process is triggered by the NameNode. For new blocks whose locations are not yet in the BlockMap, the physical locations are added to the map. Only in rare cases does a DataNode have more deleted blocks to report than stored blocks; in such cases a full block report is sent instead (see Section 9), which ensures that an HEBR report is never larger than a CBR report.

The implementation details of HEBR are as follows. The architecture of HDFS (version 2.7.0) has been retained. We: 1) modified the configuration file to add an interval that triggers HEBR reports; 2) updated the background scanning process of CBR so that it collects blocks that were recently uploaded to, or deleted from, the disks of the DataNodes; 3) altered the time stamps of blocks that are deleted from the disks of DataNodes; 4) modified the code that adds blocks into a block report, so that the blocks collected by the scanner make up the block report; and 5) modified the block report processing mechanism on the NameNode (a sketch of this step follows below). Overall, we added or altered 300 LOC (lines of Java code) in HDFS. These changes do not break any running cluster and can be easily adopted.
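The sketch below illustrates, in simplified form, the modified NameNode-side processing of item 5: newly reported blocks are added to the block map, blocks flagged as deleted are verified and queued for re-replication, and blocks merely absent from a report are left untouched. The types and method names are our own and do not reflect the actual HDFS code.

import java.util.*;

// Illustrative NameNode-side handling of an HEBR-style report.
// Unlike full-block report processing, blocks not mentioned in the report
// are NOT assumed deleted; only explicitly flagged deletions are acted on.
public class HebrReportProcessor {

    public static final long DELETED_MARKER = 0L; // stands in for the "000" time stamp

    private final Map<Long, Set<String>> blockMap;   // block id -> DataNode locations
    private final Set<Long> validBlocks;             // block ids known to FsImage

    public HebrReportProcessor(Map<Long, Set<String>> blockMap, Set<Long> validBlocks) {
        this.blockMap = blockMap;
        this.validBlocks = validBlocks;
    }

    // Each entry is {blockId, timestamp}; a DELETED_MARKER timestamp flags a deletion.
    // Returns the block ids that need re-replication elsewhere.
    public List<Long> process(String dataNodeId, List<long[]> report) {
        List<Long> needReplication = new ArrayList<>();
        for (long[] entry : report) {
            long blockId = entry[0];
            long timestamp = entry[1];
            if (timestamp == DELETED_MARKER) {
                // A replica was removed on this DataNode: drop the location and,
                // if the block still belongs to a file, trigger re-replication.
                Set<String> locations = blockMap.get(blockId);
                if (locations != null) {
                    locations.remove(dataNodeId);
                }
                if (validBlocks.contains(blockId)) {
                    needReplication.add(blockId);
                }
            } else {
                // Newly uploaded replica: just record its location.
                blockMap.computeIfAbsent(blockId, id -> new HashSet<>()).add(dataNodeId);
            }
        }
        return needReplication;
    }
}

Because only the blocks named in the report are examined, the per-report processing cost scales with the number of recent changes rather than with the total number of blocks on the DataNode.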
5 EVALUATION

The evaluation strategy for HEBR and the corresponding results are discussed in the following sections. To assess the performance of HDFS using HEBR, we chose two easily accessible benchmarks that were developed to evaluate the performance of HDFS: Word-Count [23] and Random-Writer [24]. These benchmarks have often been chosen by researchers to study read/write throughput in HDFS. They consist of programs that upload only one file onto HDFS per run [2], [12], whereas a map-reduce program analyzing Big Data files uploads multiple files per run. To evaluate HEBR on workloads closer to a real map-reduce scenario, we modified the benchmark programs to upload a specified number of copies of the same file (50 by default) onto HDFS for every run. In addition, we built our own benchmark, APIRandomWrite, with some new features to further evaluate HEBR. The experiments using the benchmark programs were conducted on both CBR and HEBR to compare their performance under various workloads.

5.1 Word-Count [23]

In this benchmark, the program uses a map function that counts the number of words in each input file (at most four standard files) and a reduce function that accumulates the counts of words from the individual files (a minimal sketch of these functions is given after Section 5.3). The cumulative count of every word in the input files is written to a file and uploaded onto HDFS. We use a modified version that uploads multiple replicas of the output file during an experiment set, resembling a real workload.

5.2 Random-Writer [24]

This benchmark consists of a program that uses a map function to create and upload a file onto HDFS; the reduce function is not implemented. The file size is a required parameter of the program. We use a modified version that uploads multiple copies of the same file during one experiment set, a scenario similar to a real-world workload.

5.3 APIRandomWrite

Since HEBR reports blocks deleted from DataNodes, APIRandomWrite consists of a program with the following features not found in the other standard benchmarks: 1) it can upload multiple files and blocks of specified sizes onto HDFS, and 2) it can delete a block and all its replicas in HDFS according to the user's specification (the number of blocks and the file from which the blocks are to be deleted).
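For reference, the map and reduce functions underlying the Word-Count benchmark (Section 5.1) follow the standard Hadoop WordCount pattern. The listing below is a minimal version of that pattern, not the exact benchmark program used in our experiments (which additionally uploads multiple copies of the output file).

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts for each word; the result is written to HDFS.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}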

5.4 Experiment Setup

The experimental setup for the three benchmarks is discussed in this section. Each experiment has a list of parameters whose default values are given in Table 2.

TABLE 2
Experimental Parameters and Default Values

Parameter    Role                                               Default Value
Interval     Triggers the generation of block reports           1 minute
Block Size   Determines the number of blocks per file           128MB
File Size    Determines the size of the file to be uploaded     Varied

Experiments are performed across the benchmarks by keeping two of the above parameters fixed and varying the third. The experiments are labeled Different-File-Sizes, Different-Block-Sizes and Different-Intervals to indicate the varied parameter. The block sizes and the intervals are adjusted to three different values from their defaults (block size: 128MB; interval: 1 minute) in the configuration files of both versions of HDFS, running CBR and HEBR. Thus, we conducted many subsets of trials with variations of block sizes and intervals during each experiment set. The adjusted block sizes depend on the file sizes uploaded onto HDFS during a trial, while the interval variations (3, 5 and 10 minutes) are standard across all trials. At the start of each experiment, HDFS is reformatted and all previous input files are deleted. The number of blocks in the block reports sent from each DataNode running HEBR or CBR to the NameNode during every experiment was recorded and saved to a file. We collect these files from all the DataNodes that report to the NameNode in order to analyze the performance of the two block reporting schemes.

HEBR was examined with the programs of the three benchmarks executed on single-node, 3-node and 4-node clusters. The configurations of the machines in the 3-node cluster are given in Table 3. The master node, which runs both the NameNode and a DataNode process, is labeled Master; one of the DataNodes is labeled Slave1 and the other Slave2. The commodity machines that constitute the single-node and 3-node clusters were old and identical. Since a real-world HDFS cluster may comprise commodity machines of mixed configurations, a 4-node cluster was also set up with machines of varying configurations. The node that runs both the NameNode and a DataNode process is labeled Master, and the other three DataNodes are labeled Slave1, Slave2 and Slave3; their configurations are listed in Table 4.

TABLE 3: Configurations of Machines in the 3-node Cluster (Architecture, CPU Modes, CPU MHz and Total Available Memory for Master, Slave1 and Slave2).

TABLE 4: Configurations of Machines in the 4-node Cluster (Architecture, CPU Modes, CPU MHz and Total Available Memory for Master, Slave1, Slave2 and Slave3).

With the data collected from the DataNodes, simple statistics such as the average and maximum number of blocks sent to the NameNode were computed and analyzed to compare the performance of CBR and HEBR. Fewer blocks in a block report reduce the processing time and network bandwidth usage in the cluster; thus, the lower the average, the more efficient the block reporting scheme. An experiment is terminated when four consecutive block reports contain no blocks. In the following sections, we present and discuss the results of experiments running the benchmark programs; the sketch below shows how these statistics are computed from the recorded data.
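The statistics themselves are simple to compute; the following small Java helper shows how the averages, maxima, and percentage reductions reported in the tables can be derived from the per-report block counts recorded at each DataNode. The numbers in main() are made-up example values, not measured data.

import java.util.Arrays;
import java.util.List;

// Illustrative computation of the per-DataNode statistics used in the tables.
public class BlockReportStats {

    public static double average(List<Integer> blocksPerReport) {
        long sum = 0;
        for (int n : blocksPerReport) {
            sum += n;
        }
        return blocksPerReport.isEmpty() ? 0.0 : (double) sum / blocksPerReport.size();
    }

    public static int maximum(List<Integer> blocksPerReport) {
        int max = 0;
        for (int n : blocksPerReport) {
            max = Math.max(max, n);
        }
        return max;
    }

    // Percentage reduction of HEBR relative to CBR, based on average blocks per report.
    public static double reductionPercent(double cbrAverage, double hebrAverage) {
        return cbrAverage == 0.0 ? 0.0 : 100.0 * (cbrAverage - hebrAverage) / cbrAverage;
    }

    public static void main(String[] args) {
        List<Integer> cbr = Arrays.asList(60, 60, 62, 61);   // example counts, not measured data
        List<Integer> hebr = Arrays.asList(8, 5, 0, 3);      // example counts, not measured data
        System.out.printf("CBR avg=%.2f max=%d%n", average(cbr), maximum(cbr));
        System.out.printf("HEBR avg=%.2f max=%d%n", average(hebr), maximum(hebr));
        System.out.printf("Reduction=%.2f%%%n",
                reductionPercent(average(cbr), average(hebr)));
    }
}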
6 BENCHMARK 1: WORD-COUNT

Four files of sizes approximately 700KB, 1.5MB, 650KB and 1.5MB were provided as inputs to the word-count program, though not all at the same time. The number of input files and their sizes determine the size of the generated output file; the largest output file, 926KB, was produced when all four files were input simultaneously. The experiments were run on single-node and 3-node clusters. The experimental results are very rich, and we discuss only the results of one experiment (Different-Intervals), which is a superset of the other two, running on the 3-node cluster.

6.1 Different-Intervals on 3-node Cluster

The block sizes were set to 128KB, 256KB and 512KB, and the intervals were set to 1, 3 and 5 minutes for each block size. Under these settings, 50 files each of sizes 708KB, 825KB, 860KB and 926KB were uploaded onto the file system. Since the experiments uploaded small files, the block sizes were set in units of KB. In total, 36 experiment trials were conducted; a small portion of the results (one block size and interval for the smallest file, 708KB, and the largest file, 926KB) is discussed in this section, while detailed results are presented in Appendix A. Table 5 shows the average and maximum number of blocks sent to the NameNode by all the DataNodes. Since the rates of writing blocks onto the disks of the DataNodes are roughly the same, the maximum number of blocks sent by the DataNodes may be identical. The results show that HEBR outperforms CBR consistently: it sends significantly fewer blocks in every report whether the interval is set to 1, 3 or 5 minutes. Figure 1 shows partial results from the experiment (one block size for every interval and two file sizes).

6.2 Observation and Remarks

Table 6 shows the average percentage reduction in the number of blocks sent to the NameNode in block reports across all the experiments conducted with the word-count program on single-node and 3-node clusters. The average number of blocks sent to the NameNode from the DataNodes running HEBR is lower than with CBR in every experimental setting. The results of the various single-node and 3-node experiments suggest that HEBR reduces the number of blocks reported to the NameNode by up to 90%, and by as much as 62% even in the most conservative results. This follows from the fact that HEBR collects into its block reports only the most recent uploads from the disk, not all the blocks. The meta data that represents a block in a block report takes 3 bytes [21]; thus, when the average number of blocks in a block report is smaller, the block report sent to the NameNode is smaller in terms of bytes, which reduces the bandwidth consumption and communication overhead between the DataNodes and the NameNode.

TABLE 5: Average and Maximum Number of Blocks Sent to the NameNode in Different-Intervals, Running the Word-Count Benchmark on the 3-node Cluster (per file size, block size, interval and node, with CBR/HEBR averages, maxima and percentage reduction).

TABLE 6: Percentage Reduction in the Number of Blocks Sent from DataNodes Running HEBR, Word-Count Benchmark (per cluster size and experiment).

When the files are larger, the blocks are written over a longer period of time, so HEBR sends fewer blocks per block report when large files are uploaded. The larger the block size in the configuration file, the fewer blocks each file has; when the blocks are large, only a few of them are written within a given period, and hence the average number of blocks sent to the NameNode is reduced. We also notice that the average number of blocks per report is lower when blocks are reported more frequently: if the timer between reports exceeds 5 minutes, more blocks are reported per report, but when block reports are sent very frequently (every 1 minute), the processing resources of the NameNode may be overloaded. Based on these results, the optimal interval for the best performance of HEBR is 3 minutes. A similar trend was observed in the experiments run on the single-node cluster.

7 BENCHMARK 2: RANDOM-WRITER

Similar experiments were conducted using the Random-Writer program on the single-node and 3-node clusters; the results are discussed in this section. In real-life scenarios, files of varying sizes are stored on the file system, so we upload files of different sizes ranging from KBs to MBs. The results of one experiment (Different-Block-Sizes) conducted on the 3-node cluster are presented here. Although Different-Block-Sizes is not a superset of the other two experiments, the results were impressive.

7.1 Different-Block-Sizes on 3-node Cluster

The program uploads large files (100MB and 120MB) and small files (150KB). The pre-processing time associated with the map-reduce paradigm causes delays in writing blocks. As large files were being uploaded, we set the block sizes to 50, 128 and 256 MB in the configuration files of both HDFS versions, running CBR and HEBR; a 100MB or 120MB file is thus partitioned into 2 or 3 blocks. The interval was set to 1 minute. DataNodes running HEBR collect and send significantly fewer blocks on average. The maximum number of blocks sent in a single block report to the NameNode by the DataNodes running HEBR was also much lower than for the DataNodes running CBR, as shown in Table 7. Figure 2 shows results from the experiment with a randomly chosen block size for each file size.

7.2 Observation and Remarks

Table 10 shows the percentage reduction in the number of blocks sent from DataNodes running HEBR in the experiments conducted with the random-writer program. These experiments produced results similar to those of the word-count benchmark: the average number of blocks sent to the NameNode from DataNodes running HEBR is lower than with CBR in every experimental setting. The results suggest that HEBR reduces the number of blocks reported to the NameNode by up to 98%, and by as much as 52% even in the most conservative results.

8 BENCHMARK 3: APIRANDOMWRITE

Similar experiments were conducted using the APIRandomWrite program. Since this program does not use the map-reduce paradigm to accomplish its task, the processing time for writing files onto the disks of the DataNodes is negligible. Experiments were run on single-node, 3-node and 4-node clusters, varying the parameters in the same order as in the previous benchmarks. We evaluated HEBR when both large files (120MB) and small files (926KB) were uploaded onto HDFS. We present the results from one of the experiments (Different-Intervals).

Fig. 1. Average and Maximum Number of Blocks Sent to the NameNode in Different-Intervals, Running Word-Count on the 3-node Cluster.

TABLE 7: Average and Maximum Number of Blocks Sent to the NameNode in Different-Block-Sizes, Running the Random-Writer Benchmark on the 3-node Cluster (file sizes 150KB, 926KB, 100MB and 120MB; block sizes 50MB, 128MB and 256MB; per node, with CBR/HEBR averages, maxima and percentage reduction).

This experiment uploaded files of different sizes, with both block sizes and intervals varied in the configuration files, on the 3-node and 4-node clusters. It is the most significant of all the experiments, as it is a superset of the other two.

8.1 Different-Intervals on 3-node Cluster

To allow a valid comparison, the same file size (926KB) was uploaded for both word-count and APIRandomWrite; as an additional validation, larger files (in MB) were uploaded using APIRandomWrite. A batch file with instructions to upload 50 files each of 150KB, 926KB, 100MB and 120MB was used as input to the benchmark program. For each file size, the block size was varied from the default (128MB) to 256MB and 50MB, and for every block size the interval was varied from 1 minute to 3 or 5 minutes. The experiment has rich results; however, we present only a small portion of them (for the two larger files, one interval for each block size), with the entire results presented in Appendix B. Table 8 and Figure 3 show the results.

8.2 Different-Intervals on 4-node Cluster

We explored the advantages of HEBR on a larger cluster of four nodes by repeating the experiments that were run on the single-node and 3-node clusters. We upload files of the same sizes as in the previous experiment, partitioned into blocks based on three different block sizes: 128MB, 256MB and 512MB. Since 3 minutes was determined to be the optimal interval, the 5-minute interval was not tested on the 4-node cluster; thus, for each experiment set, the intervals were varied between 1 and 3 minutes. Only a portion of the results of the different trials is presented here, with details in Appendix C. Results for a randomly chosen interval for each pair of file size and block size are shown in Table 9. This experiment again illustrates that HEBR sends a much smaller number of blocks on average and saves processing time at the NameNode. The files are written at different rates on every DataNode, and thus the maximum number of blocks sent by each of them differs. The results conclusively prove the advantages of using HEBR. We present complete results from the experiment in which 50 files of 180MB were uploaded in Figure 4.

8.3 Observation and Remarks

We consolidate the results from all the experiments conducted using APIRandomWrite, as percentages indicating the decrease in the number of blocks sent to the NameNode when HEBR is used, in Table 11. These experiments also produced results similar to those of the word-count and random-writer benchmarks: the average number of blocks sent to the NameNode from DataNodes running HEBR is lower than with CBR in every experimental setting. The results suggest that HEBR reduces the number of blocks reported to the NameNode by up to 95%, and by as much as 62% even in the most conservative results.

TABLE 10: Percentage Reduction in the Number of Blocks Sent from DataNodes Running HEBR, Random-Writer Benchmark (per cluster size and experiment).

TABLE 11: Percentage Reduction in the Number of Blocks Sent from DataNodes Running HEBR, APIRandomWrite Benchmark (per cluster size and experiment).

Fig. 2. Average and Maximum Number of Blocks Sent to the NameNode in Different-Block-Sizes, Running the Random-Writer Benchmark on the 3-node Cluster.

TABLE 8: Average and Maximum Number of Blocks Sent to the NameNode in Different-Intervals, Running the APIRandomWrite Benchmark on the 3-node Cluster (file sizes 100MB and 120MB; block sizes 50MB, 128MB and 256MB; per interval and node, with CBR/HEBR averages, maxima and percentage reduction).

Fig. 3. Average and Maximum Number of Blocks Sent to the NameNode in Different-Intervals, Running the APIRandomWrite Benchmark on the 3-node Cluster.

TABLE 9: Average and Maximum Number of Blocks Sent to the NameNode in Different-Intervals, Running the APIRandomWrite Benchmark on the 4-node Cluster (file sizes 100MB, 120MB and 180MB; per block size, interval and node, with CBR/HEBR averages, maxima and percentage reduction).

Fig. 4. Average and Maximum Number of Blocks Sent to the NameNode in Different-Intervals, Running the APIRandomWrite Benchmark on the 4-node Cluster, Uploading 50 Files of 180MB.

9 CONCLUSION

To obtain meta data information, the current implementation of HDFS scans the disks of the DataNodes, collecting all the blocks irrespective of when they were written to disk, and dumps them into a list known as a block report. This block reporting scheme increases the computation load on the NameNode and the network bandwidth usage in a Hadoop cluster, as many unnecessary, identical block reports are processed. This paper presents a new and much more efficient block reporting scheme called HEBR (High Efficiency Block Reporting). In HEBR, a full block report containing all the blocks hosted on a DataNode is sent only once, at the very first registration; subsequent reports contain only the blocks that were written between two consecutive reports. The current implementation does not report blocks that are deleted from the DataNodes, but HEBR does. When a large number of blocks are deleted from the disks, it is possible for an HEBR report to contain more blocks than a full block report; in such cases, a full block report is sent, ensuring that HEBR always performs better. HEBR has been evaluated using three workload benchmarks on various cluster configurations. All our experiments show that HEBR achieves much better performance than the current block reporting scheme: at its best, HEBR reduces the number of blocks sent to the NameNode by about 97.52%, and at its worst by about 59.78%, compared to the current implementation. Since HEBR does not change the overall architecture of HDFS, it can easily be integrated into existing Hadoop systems. The experiments conducted so far give us confidence that, when these changes are deployed in large-scale clusters, they will lead to even more significant improvements in efficiency and performance, directly impacting the bottom line of business enterprises using Hadoop. We anticipate its adoption by the Hadoop community.

APPENDIX A
STATISTICS FROM DIFFERENT-INTERVALS RUNNING THE WORD-COUNT BENCHMARK ON THE 3-NODE CLUSTER

We present the results from experiments in which four files of different sizes were partitioned into blocks and uploaded onto the file system. We re-adjust the intervals from 1 minute to 3 and 5 minutes for every experiment set in the configuration files of HDFS running both block reporting schemes (CBR and HEBR). The average and maximum number of blocks sent from all three DataNodes in every experiment set are presented in Tables 12 and 13.

APPENDIX B
STATISTICS FROM DIFFERENT-INTERVALS RUNNING THE APIRANDOMWRITE BENCHMARK ON THE 3-NODE CLUSTER

We present the entire results from experiments in which files of four different sizes, partitioned into blocks based on three different block sizes, were uploaded onto the file system. We adjust the intervals from 1 minute to 3 and 5 minutes for every experiment set in the configuration files of HDFS running both block reporting schemes (CBR and HEBR). The average and maximum number of blocks sent from all three DataNodes in every experiment set are presented in Table 14.

APPENDIX C
STATISTICS FROM DIFFERENT-INTERVALS RUNNING THE APIRANDOMWRITE BENCHMARK ON THE 4-NODE CLUSTER

We present the entire results from experiments in which four files of different sizes, partitioned into blocks based on three different block sizes, were uploaded onto the file system. We adjust the intervals from 1 minute to 3 and 5 minutes for every experiment set in the configuration files of HDFS running CBR and HEBR. The average and maximum number of blocks sent from all four DataNodes in every experiment set are presented in Table 15.

TABLE 12: Average and Maximum Number of Blocks Sent to the NameNode in Different-Intervals, Running the Word-Count Benchmark on the 3-node Cluster, Part 1 (file sizes 708KB and 825KB; per block size, interval and node, with CBR/HEBR averages, maxima and percentage reduction).

REFERENCES
[1] H. Weatherspoon and J. D. Kubiatowicz, Erasure Coding vs. Replication: A Quantitative Comparison, IPTPS.
[2] D. Ismail and S. Harris, Performance Comparison of Big Data Analysis using Hadoop in Physical and Virtual Servers.
[3] T. Ivanov, N. Korfiatis and R. V. Zicari, On the Inequality of the 3V's of Big Data Architectural Paradigms: A Case for Heterogeneity, CoRR.
[4] P. Zikopoulos, C. Eaton, D. DeRoos, T. Deutsch and G. Lapis, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, McGraw-Hill.
[5] B. Dhruba, The Hadoop Distributed File System: Architecture and Design, Apache Hadoop Documentation.
[6] C. Fay et al., Bigtable: A Distributed Storage System for Structured Data, TOCS.
[7] K. V. Shvachko and A. C. Murthy, Scaling Hadoop to 4000 Nodes at Yahoo!, Yahoo! Developer Network Blog.
[8] O. O'Malley and A. C. Murthy, Hadoop Sorts a Petabyte in Hours and a Terabyte in 62 Seconds, Yahoo! Developer Network Blog.
[9] K. V. Shvachko, HDFS Scalability: The Limits to Growth, Hadoop Wiki.
[10] J. Shafer, A Storage Architecture for Data-Intensive Computing, Ph.D. thesis, Rice University.
[11] Performance Measurement of a Hadoop Cluster.
[12] Performance Evaluation of Read and Write Operations in Hadoop Distributed File System.
[13] Hadoop Performance Evaluation.
[14] W. Tantisiriroj, S. Patil, G. Gibson, S. W. Son, S. J. Lang, and R. B. Ross, On the Duality of Data-Intensive File System Design: Reconciling HDFS and PVFS, SC11.
[15] B. Nicolae, D. Moise, G. Antoniu, L. Bougé, and M. Dorier, BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map/Reduce Applications, IPDPS.
[16] Hadoop Scalability and Performance Testing in Heterogeneous Clusters.
[17] Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters.
[18] A Framework for Performance Analysis and Tuning in Hadoop Based Clusters.
[19] A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker, A Comparison of Approaches to Large-Scale Data Analysis, SIGMOD.
[20] T. White, HDFS Reliability, 2009.
[21] M. Foley, Consider Redesign of Block Report Processing.
[22] K. V. Shvachko, Apache Hadoop: The Scalability Updates, Hadoop Wiki.
[23] M. B. Alam, M. Hasan and Md. K. Uddin, A New HDFS Structure Model to Evaluate the Performance of Word Count Application on Different File Size.
[24]

TABLE 13: Average and Maximum Number of Blocks Sent to the NameNode in Different-Intervals, Running the Word-Count Benchmark on the 3-node Cluster, Part 2 (file sizes 860KB and 926KB; per block size, interval and node, with CBR/HEBR averages, maxima and percentage reduction).

TABLE 14: Average and Maximum Number of Blocks Sent to the NameNode in Different-Intervals, Running the APIRandomWrite Benchmark on the 3-node Cluster (per file size, block size, interval and node, with CBR/HEBR averages, maxima and percentage reduction).


More information

2/26/2017. The amount of data increases every day Some numbers ( 2012):

2/26/2017. The amount of data increases every day Some numbers ( 2012): The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect to

More information

CSE 124: Networked Services Fall 2009 Lecture-19

CSE 124: Networked Services Fall 2009 Lecture-19 CSE 124: Networked Services Fall 2009 Lecture-19 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa09/cse124 Some of these slides are adapted from various sources/individuals including but

More information

Sinbad. Leveraging Endpoint Flexibility in Data-Intensive Clusters. Mosharaf Chowdhury, Srikanth Kandula, Ion Stoica. UC Berkeley

Sinbad. Leveraging Endpoint Flexibility in Data-Intensive Clusters. Mosharaf Chowdhury, Srikanth Kandula, Ion Stoica. UC Berkeley Sinbad Leveraging Endpoint Flexibility in Data-Intensive Clusters Mosharaf Chowdhury, Srikanth Kandula, Ion Stoica UC Berkeley Communication is Crucial for Analytics at Scale Performance Facebook analytics

More information

Staggeringly Large File Systems. Presented by Haoyan Geng

Staggeringly Large File Systems. Presented by Haoyan Geng Staggeringly Large File Systems Presented by Haoyan Geng Large-scale File Systems How Large? Google s file system in 2009 (Jeff Dean, LADIS 09) - 200+ clusters - Thousands of machines per cluster - Pools

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

Data Analysis Using MapReduce in Hadoop Environment

Data Analysis Using MapReduce in Hadoop Environment Data Analysis Using MapReduce in Hadoop Environment Muhammad Khairul Rijal Muhammad*, Saiful Adli Ismail, Mohd Nazri Kama, Othman Mohd Yusop, Azri Azmi Advanced Informatics School (UTM AIS), Universiti

More information

Top 25 Big Data Interview Questions And Answers

Top 25 Big Data Interview Questions And Answers Top 25 Big Data Interview Questions And Answers By: Neeru Jain - Big Data The era of big data has just begun. With more companies inclined towards big data to run their operations, the demand for talent

More information

Embedded Technosolutions

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

More information

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on

More information

MapReduce, Hadoop and Spark. Bompotas Agorakis

MapReduce, Hadoop and Spark. Bompotas Agorakis MapReduce, Hadoop and Spark Bompotas Agorakis Big Data Processing Most of the computations are conceptually straightforward on a single machine but the volume of data is HUGE Need to use many (1.000s)

More information

HADOOP 3.0 is here! Dr. Sandeep Deshmukh Sadepach Labs Pvt. Ltd. - Let us grow together!

HADOOP 3.0 is here! Dr. Sandeep Deshmukh Sadepach Labs Pvt. Ltd. - Let us grow together! HADOOP 3.0 is here! Dr. Sandeep Deshmukh sandeep@sadepach.com Sadepach Labs Pvt. Ltd. - Let us grow together! About me BE from VNIT Nagpur, MTech+PhD from IIT Bombay Worked with Persistent Systems - Life

More information

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja

More information

Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies

Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay 1 Apache Spark - Intro Spark within the Big Data ecosystem Data Sources Data Acquisition / ETL Data Storage Data Analysis / ML Serving 3 Apache

More information

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09 Presented by: Daniel Isaacs It all starts with cluster computing. MapReduce Why

More information

Decentralized Distributed Storage System for Big Data

Decentralized Distributed Storage System for Big Data Decentralized Distributed Storage System for Big Presenter: Wei Xie -Intensive Scalable Computing Laboratory(DISCL) Computer Science Department Texas Tech University Outline Trends in Big and Cloud Storage

More information

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 15. Distributed File Systems Paul Krzyzanowski Rutgers University Fall 2017 1 Google Chubby ( Apache Zookeeper) 2 Chubby Distributed lock service + simple fault-tolerant file system

More information

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation

More information

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic WHITE PAPER Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive

More information

A Fast and High Throughput SQL Query System for Big Data

A Fast and High Throughput SQL Query System for Big Data A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190

More information

The Google File System. Alexandru Costan

The Google File System. Alexandru Costan 1 The Google File System Alexandru Costan Actions on Big Data 2 Storage Analysis Acquisition Handling the data stream Data structured unstructured semi-structured Results Transactions Outline File systems

More information

A Plugin-based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS

A Plugin-based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS A Plugin-based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS Adithya Bhat, Nusrat Islam, Xiaoyi Lu, Md. Wasi- ur- Rahman, Dip: Shankar, and Dhabaleswar K. (DK) Panda Network- Based Compu2ng

More information

Improved MapReduce k-means Clustering Algorithm with Combiner

Improved MapReduce k-means Clustering Algorithm with Combiner 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Improved MapReduce k-means Clustering Algorithm with Combiner Prajesh P Anchalia Department Of Computer Science and Engineering

More information

Big Data Programming: an Introduction. Spring 2015, X. Zhang Fordham Univ.

Big Data Programming: an Introduction. Spring 2015, X. Zhang Fordham Univ. Big Data Programming: an Introduction Spring 2015, X. Zhang Fordham Univ. Outline What the course is about? scope Introduction to big data programming Opportunity and challenge of big data Origin of Hadoop

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing ΕΠΛ 602:Foundations of Internet Technologies Cloud Computing 1 Outline Bigtable(data component of cloud) Web search basedonch13of thewebdatabook 2 What is Cloud Computing? ACloudis an infrastructure, transparent

More information

BIG DATA TESTING: A UNIFIED VIEW

BIG DATA TESTING: A UNIFIED VIEW http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation

More information

High Performance Computing on MapReduce Programming Framework

High Performance Computing on MapReduce Programming Framework International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming

More information

CSE 124: Networked Services Lecture-17

CSE 124: Networked Services Lecture-17 Fall 2010 CSE 124: Networked Services Lecture-17 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/30/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments

More information

WHITEPAPER. Improve Hadoop Performance with Memblaze PBlaze SSD

WHITEPAPER. Improve Hadoop Performance with Memblaze PBlaze SSD Improve Hadoop Performance with Memblaze PBlaze SSD Improve Hadoop Performance with Memblaze PBlaze SSD Exclusive Summary We live in the data age. It s not easy to measure the total volume of data stored

More information

CS /30/17. Paul Krzyzanowski 1. Google Chubby ( Apache Zookeeper) Distributed Systems. Chubby. Chubby Deployment.

CS /30/17. Paul Krzyzanowski 1. Google Chubby ( Apache Zookeeper) Distributed Systems. Chubby. Chubby Deployment. Distributed Systems 15. Distributed File Systems Google ( Apache Zookeeper) Paul Krzyzanowski Rutgers University Fall 2017 1 2 Distributed lock service + simple fault-tolerant file system Deployment Client

More information

CS60021: Scalable Data Mining. Sourangshu Bhattacharya

CS60021: Scalable Data Mining. Sourangshu Bhattacharya CS60021: Scalable Data Mining Sourangshu Bhattacharya In this Lecture: Outline: HDFS Motivation HDFS User commands HDFS System architecture HDFS Implementation details Sourangshu Bhattacharya Computer

More information

Introduction to Hadoop and MapReduce

Introduction to Hadoop and MapReduce Introduction to Hadoop and MapReduce Antonino Virgillito THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Large-scale Computation Traditional solutions for computing large

More information

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423

More information

PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP

PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP ISSN: 0976-2876 (Print) ISSN: 2250-0138 (Online) PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP T. S. NISHA a1 AND K. SATYANARAYAN REDDY b a Department of CSE, Cambridge

More information

Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop

Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop K. Senthilkumar PG Scholar Department of Computer Science and Engineering SRM University, Chennai, Tamilnadu, India

More information

Enhanced Hadoop with Search and MapReduce Concurrency Optimization

Enhanced Hadoop with Search and MapReduce Concurrency Optimization Volume 114 No. 12 2017, 323-331 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Enhanced Hadoop with Search and MapReduce Concurrency Optimization

More information

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2016 Distributed Systems 15. Distributed File Systems Paul Krzyzanowski Rutgers University Fall 2016 1 Google Chubby 2 Chubby Distributed lock service + simple fault-tolerant file system Interfaces File access

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2013/14

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2013/14 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2013/14 MapReduce & Hadoop The new world of Big Data (programming model) Overview of this Lecture Module Background Cluster File

More information

Dynamic Data Placement Strategy in MapReduce-styled Data Processing Platform Hua-Ci WANG 1,a,*, Cai CHEN 2,b,*, Yi LIANG 3,c

Dynamic Data Placement Strategy in MapReduce-styled Data Processing Platform Hua-Ci WANG 1,a,*, Cai CHEN 2,b,*, Yi LIANG 3,c 2016 Joint International Conference on Service Science, Management and Engineering (SSME 2016) and International Conference on Information Science and Technology (IST 2016) ISBN: 978-1-60595-379-3 Dynamic

More information

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud HiTune Dataflow-Based Performance Analysis for Big Data Cloud Jinquan (Jason) Dai, Jie Huang, Shengsheng Huang, Bo Huang, Yan Liu Intel Asia-Pacific Research and Development Ltd Shanghai, China, 200241

More information

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The

More information

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing Big Data Analytics Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Big Data "The world is crazy. But at least it s getting regular analysis." Izabela

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

Hadoop Virtualization Extensions on VMware vsphere 5 T E C H N I C A L W H I T E P A P E R

Hadoop Virtualization Extensions on VMware vsphere 5 T E C H N I C A L W H I T E P A P E R Hadoop Virtualization Extensions on VMware vsphere 5 T E C H N I C A L W H I T E P A P E R Table of Contents Introduction... 3 Topology Awareness in Hadoop... 3 Virtual Hadoop... 4 HVE Solution... 5 Architecture...

More information

10 Million Smart Meter Data with Apache HBase

10 Million Smart Meter Data with Apache HBase 10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on

More information

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information Subject 10 Fall 2015 Google File System and BigTable and tiny bits of HDFS (Hadoop File System) and Chubby Not in textbook; additional information Disclaimer: These abbreviated notes DO NOT substitute

More information

A Survey on Big Data

A Survey on Big Data A Survey on Big Data D.Prudhvi 1, D.Jaswitha 2, B. Mounika 3, Monika Bagal 4 1 2 3 4 B.Tech Final Year, CSE, Dadi Institute of Engineering & Technology,Andhra Pradesh,INDIA ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads WHITE PAPER Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads December 2014 Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents

More information

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc Fuxi Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc {jiamang.wang, yongjun.wyj, hua.caihua, zhipeng.tzp, zhiqiang.lv,

More information

Cloud Computing CS

Cloud Computing CS Cloud Computing CS 15-319 Programming Models- Part III Lecture 6, Feb 1, 2012 Majd F. Sakr and Mohammad Hammoud 1 Today Last session Programming Models- Part II Today s session Programming Models Part

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in

More information

DiskReduce: Making Room for More Data on DISCs. Wittawat Tantisiriroj

DiskReduce: Making Room for More Data on DISCs. Wittawat Tantisiriroj DiskReduce: Making Room for More Data on DISCs Wittawat Tantisiriroj Lin Xiao, Bin Fan, and Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University GFS/HDFS Triplication GFS & HDFS triplicate

More information

Jumbo: Beyond MapReduce for Workload Balancing

Jumbo: Beyond MapReduce for Workload Balancing Jumbo: Beyond Reduce for Workload Balancing Sven Groot Supervised by Masaru Kitsuregawa Institute of Industrial Science, The University of Tokyo 4-6-1 Komaba Meguro-ku, Tokyo 153-8505, Japan sgroot@tkl.iis.u-tokyo.ac.jp

More information

MapReduce & HyperDex. Kathleen Durant PhD Lecture 21 CS 3200 Northeastern University

MapReduce & HyperDex. Kathleen Durant PhD Lecture 21 CS 3200 Northeastern University MapReduce & HyperDex Kathleen Durant PhD Lecture 21 CS 3200 Northeastern University 1 Distributing Processing Mantra Scale out, not up. Assume failures are common. Move processing to the data. Process

More information

Introduction to MapReduce. Instructor: Dr. Weikuan Yu Computer Sci. & Software Eng.

Introduction to MapReduce. Instructor: Dr. Weikuan Yu Computer Sci. & Software Eng. Introduction to MapReduce Instructor: Dr. Weikuan Yu Computer Sci. & Software Eng. Before MapReduce Large scale data processing was difficult! Managing hundreds or thousands of processors Managing parallelization

More information

Apache Flink: Distributed Stream Data Processing

Apache Flink: Distributed Stream Data Processing Apache Flink: Distributed Stream Data Processing K.M.J. Jacobs CERN, Geneva, Switzerland 1 Introduction The amount of data is growing significantly over the past few years. Therefore, the need for distributed

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

Apache Hadoop 3. Balazs Gaspar Sales Engineer CEE & CIS Cloudera, Inc. All rights reserved.

Apache Hadoop 3. Balazs Gaspar Sales Engineer CEE & CIS Cloudera, Inc. All rights reserved. Apache Hadoop 3 Balazs Gaspar Sales Engineer CEE & CIS balazs@cloudera.com 1 We believe data can make what is impossible today, possible tomorrow 2 We empower people to transform complex data into clear

More information

The Google File System

The Google File System The Google File System By Ghemawat, Gobioff and Leung Outline Overview Assumption Design of GFS System Interactions Master Operations Fault Tolerance Measurements Overview GFS: Scalable distributed file

More information