Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms


Intel Cloud Builders Guide
Intel Xeon Processor-based Servers
Apache* Hadoop*

Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms
Apache* Hadoop*
Intel Xeon Processor 5600 Series
February 2012

Audience and Purpose

This reference architecture is for companies that want to build their own cloud computing infrastructure, including enterprise IT organizations, cloud service providers, and cloud hosting providers. The decision to use a cloud for the delivery of IT services is best made by starting with the knowledge and experience gained from previous work. This reference architecture gathers into one place the essentials of an Apache* Hadoop* cluster build-out, complete with benchmarking using the TeraSort workload. This paper defines easy-to-use steps to replicate the deployment in your data center lab environment. The installation is based on Intel-powered servers and creates a multi-node, optimized Hadoop environment. The reference architecture contains details on the Hadoop topology, the hardware and software deployed, installation and configuration steps, and tests for real-world use cases that should significantly reduce the learning curve for building and operating your first Hadoop infrastructure.

It is not expected that this paper can be used as-is. For example, adapting to an existing network and identifying specific management requirements are out of scope for this paper. The user of this paper is therefore expected to make significant adjustments to the design presented here in order to meet the specific requirements of their own data center or lab environment.

This paper also assumes that the reader has basic knowledge of computing infrastructure components and services. Intermediate knowledge of the Linux* operating system, Python*, and the Hadoop framework, along with basic system administration skills, is assumed.

Table of Contents

Executive Summary
Hadoop* Overview
Hadoop System Architecture
Operation of a Hadoop Cluster
TeraSort Workload
TeraSort Workflow
Test Methodology
Intel Benchmark Install and Test Tool (Intel BITT)
Intel BITT Features
Configuring the Setup
Running TeraSort
Results
Conclusion

Executive Summary

MapReduce technology is gaining popularity among enterprises for a variety of large-scale, data-intensive jobs. MapReduce based on Apache* Hadoop* is rapidly emerging as a preferred technology for big data processing and management. Enterprises are deploying clusters of standard commodity servers and using business intelligence tools along with Apache Hadoop to obtain high-performing solutions for their large-scale data processing requirements.

The motivation to deploy Hadoop comes from the fact that enterprises are gathering huge unstructured data sets generated by their business processes, and they want to extract the most value from this data to support decision making. Hadoop infrastructure moves compute closer to the data to achieve high processing throughput.

For this paper we built a small commodity server cluster based on an Apache Hadoop distribution and ran a sort benchmark to measure how fast the cluster can process data. This reference architecture explains how to set up the cluster, tune parameters, and run the sort benchmark. It provides a blueprint for building a cluster with standard server platforms based on the Intel Xeon processor and the open source Apache Hadoop distribution. The paper further describes the parameters used for tuning and for running the sort benchmark to measure performance.

Figure 1: Hadoop* stack

Hadoop* Overview

Apache Hadoop is a framework for running applications on large clusters built from standard hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named MapReduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.
In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the Hadoop Distributed File System (HDFS) are designed so that node failures are automatically tolerated by the framework.

The Hadoop framework consists of three major components:

Common: Hadoop Common is a set of utilities that support the Hadoop subprojects. Hadoop Common includes FileSystem, RPC, and serialization libraries.

HDFS: The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data, is suitable for applications that have large data sets, and can stream file system data.

MapReduce: MapReduce was first developed by Google to process large datasets. MapReduce has two functions, map and reduce, and a framework for running a large number of instances of these programs on commodity hardware. The map function reads a set of records from an input file, processes these records, and outputs a set of intermediate records. As part of the map function, a split function distributes the intermediate records across many buckets using a hash function. The reduce function then processes the intermediate records.

The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node. The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them, and re-executing failed tasks. The slaves execute the tasks as directed by the master.
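The map, split, and reduce roles described above can be illustrated with a small single-process sketch in Python. This is not Hadoop code: the function names (`map_fn`, `partition`, `reduce_fn`, `run_job`), the word-count task, and the CRC32 hash are assumptions chosen only to make the data flow visible.

```python
import zlib
from collections import defaultdict


def map_fn(record):
    """Map: read one input record (a line of text) and emit intermediate (key, value) pairs."""
    for word in record.split():
        yield word.lower(), 1


def partition(key, num_buckets):
    """Split: assign an intermediate key to a bucket with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def reduce_fn(key, values):
    """Reduce: combine all intermediate values that share a key."""
    return key, sum(values)


def run_job(records, num_buckets=4):
    """Run map, split, and reduce in one process (a real cluster spreads this over nodes)."""
    buckets = [defaultdict(list) for _ in range(num_buckets)]
    for record in records:
        for key, value in map_fn(record):
            buckets[partition(key, num_buckets)][key].append(value)

    results = {}
    for bucket in buckets:  # each bucket would be handled by one reduce task
        for key, values in bucket.items():
            out_key, out_value = reduce_fn(key, values)
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    print(run_job(["big data big cluster", "data moves to compute"]))
```

Because every occurrence of a key hashes to the same bucket, each reduce task sees all values for the keys it owns, which is what lets the reduce phase run in parallel.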

Hadoop* System Architecture

The Hadoop framework works on the principle of "moving compute closer to the data." Figure 2 shows a typical deployment of the Hadoop framework on multiple standard server nodes. The computation occurs on the same node where the data resides, which enables Hadoop to deliver better performance than storing the data on the network. A combination of standard server platforms and Hadoop infrastructure provides a cost-efficient, high-performance platform for data-parallel applications.

Each Hadoop cluster has one Master node and multiple Slave nodes. The Master node runs the NameNode and JobTracker functions and coordinates with the Slave nodes to complete the jobs fed to the cluster. The Slave nodes run the TaskTracker, store data in HDFS, and run the Map and Reduce functions that perform the data computations.

Figure 2: Hadoop* deployment on standard server nodes

Operation of a Hadoop* Cluster

Figure 3 shows the operation of a Hadoop cluster. The client submits the job to the Master node, which orchestrates the Slave nodes to complete the job. The JobTracker on the Master node is responsible for controlling the MapReduce job. The slaves run a TaskTracker, which keeps track of the MapReduce tasks and reports their status to the JobTracker at frequent intervals. In the event of a task failure, the JobTracker reschedules the task on the same slave node or on a different slave node.

HDFS is a location-aware (rack-aware) file system that manages the data in a Hadoop cluster. HDFS replicates the data on various nodes in the cluster to attain data reliability; however, HDFS has a single point of failure in the NameNode function. If the NameNode fails, the file system and its data become inaccessible.

Because the JobTracker knows where the data is located, it schedules each task on the node where its data resides, which reduces the need to move data from one node to another and saves network bandwidth. Once the map function is complete, the data is transferred to a different node to perform the reduce function.

The MapReduce framework provides an efficient way to scale the size of the cluster through a modular scale-out strategy. The cluster is scaled out by adding one or more nodes, and HDFS and MapReduce support the new nodes as they are added.

Figure 3: Operation of Hadoop* cluster
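The data-local scheduling idea described above can be sketched as follows. This is a simplified, hypothetical illustration in Python, not JobTracker code: `pick_node`, the node names, and the fallback rule (first free node in sorted order) are assumptions made for the example.

```python
def pick_node(block_replicas, free_nodes):
    """Choose a node for a map task.

    block_replicas: nodes that hold a replica of the task's input block.
    free_nodes: set of nodes that currently have a free task slot.
    Returns (node, is_data_local); node is None when no slot is free.
    """
    # Prefer a node that already stores the block, so no data crosses the network.
    for node in block_replicas:
        if node in free_nodes:
            return node, True
    if not free_nodes:
        return None, False
    # Otherwise fall back to any free node; the block must then be read remotely.
    return sorted(free_nodes)[0], False


if __name__ == "__main__":
    # With a replication factor of 3, each block has three candidate data-local nodes.
    print(pick_node(["node3", "node7", "node9"], {"node7", "node12"}))
    print(pick_node(["node3", "node7", "node9"], {"node10", "node12"}))
```

With three replicas per block, the chance that at least one replica holder has a free slot is much higher than with a single copy, which is one reason replication helps scheduling as well as reliability.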

Cluster hardware setup:

Total of 17 nodes in the cluster: one Master node and 16 Slave nodes.
Data network: Arista 7124 switch connected to an Intel Ethernet Server Adapter X520-DA2 dual 10GbE NIC on every node.
Management network: each server has an internal private Intel dual 1GbE NIC connected to a top-of-rack switch that is used for management tasks.
Storage: each node has a disk enclosure populated with SATA II 7.2K, 2TB hard disk drives, for a total of 24TB of raw storage per enclosure.
Platform: dual-socket Intel 5520 Chipset platform.
Processors: two Intel Xeon processor X5680 at 3.33GHz, 12MB cache.
Memory: 48GB 1333MHz DDR3.
Operating system: Red Hat Enterprise Linux* 6.0 (RHEL 6.0) (Kernel: el6..x86_64)
Hadoop* Framework v

Figure 4: Cluster hardware setup

TeraSort Workload

TeraSort is a popular Hadoop benchmarking workload. Although the name refers to sorting 1TB of data, the 1TB size is not a hard limit: TeraSort allows the user to sort a dataset of any size by changing various parameters. The TeraSort benchmark tests the HDFS and MapReduce functions of the Hadoop cluster. TeraSort is part of the Hadoop framework and is included in the standard Apache Hadoop installation package. It is widely used to benchmark and tune large Hadoop clusters with hundreds of nodes.

TeraSort works in two steps:

TeraGen: Generates random data based on the dataset size set by the user. This dataset is used as the input data for the sort benchmark.
TeraSort: Sorts the input data generated by TeraGen and stores the output data on HDFS.

An optional third step, called TeraValidate, validates the sorted data. This paper does not discuss this optional third step.

TeraSort Workflow

Figure 5 shows the workflow of the TeraSort workload tested on our cluster. The flow chart shows the workload starting at one control node: the master node kicks off the job, and the 16 slave nodes divide 8192 map tasks among them. Once the map phase is complete, the cluster starts the reduce phase with 243 tasks. When the reduce phase is complete, the output data is stored on the file system.

Test Methodology

To run the workload we used the Intel Benchmark Install and Test Tool (Intel BITT). The workload was scripted to kick-start the job on the cluster, run TeraGen to generate the test data, and run the TeraSort task to sort the generated data. The script also starts a series of counters on the slave nodes to gather performance metrics on each node. Key hardware metrics such as processor utilization, network bandwidth consumption, memory utilization, and disk bandwidth consumption are captured on each node at 30-second intervals.

Once the job is complete, the counters are stopped on all slave nodes and the log files containing the performance data are copied to the master node, where the utilization of the cluster is calculated. This data is plotted into graphs using gnuplot and presented for further analysis. We also recorded the job completion time reported by the Hadoop management user interface. The lower the time, the better the performance.

Figure 5: TeraSort workflow
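The task counts in the workflow above translate into per-task and per-node figures with some simple arithmetic. The sketch below uses Python and takes the 8192 map tasks, 243 reduce tasks, and 16 slave nodes from this paper, plus the 24 map slots and 12 reduce slots per node from mapred-site-template.xml. It assumes that 1TB means 10^12 bytes and that each record is 100 bytes; the exact record count configured for the test is not reproduced here.

```python
import math

DATASET_BYTES = 10**12       # assumption: 1TB taken as 10^12 bytes
RECORD_BYTES = 100           # one TeraSort record is 100 bytes
MAP_TASKS = 8192
REDUCE_TASKS = 243
SLAVE_NODES = 16
MAP_SLOTS_PER_NODE = 24      # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS_PER_NODE = 12   # mapred.tasktracker.reduce.tasks.maximum

records = DATASET_BYTES // RECORD_BYTES
bytes_per_map = DATASET_BYTES / MAP_TASKS
maps_per_node = MAP_TASKS // SLAVE_NODES

# A "wave" is one round in which every available slot in the cluster runs one task.
map_waves = math.ceil(MAP_TASKS / (SLAVE_NODES * MAP_SLOTS_PER_NODE))
reduce_waves = math.ceil(REDUCE_TASKS / (SLAVE_NODES * REDUCE_SLOTS_PER_NODE))

if __name__ == "__main__":
    print(f"records to sort:      {records:,}")
    print(f"input per map task:   {bytes_per_map / 1e6:.1f} MB")
    print(f"map tasks per node:   {maps_per_node}")
    print(f"map waves needed:     {map_waves}")
    print(f"reduce waves needed:  {reduce_waves}")
```

Under these assumptions each map task handles roughly 122MB of input, and the map phase needs about 22 waves of tasks across the 384 available map slots.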

Intel Benchmark Install and Test Tool

Intel Benchmark Install and Test Tool (Intel BITT) provides tools to install, configure, run, and analyze benchmark programs on small test clusters. The installcli tool installs tar files on a cluster. The moncli tool monitors the performance of the cluster nodes and provides options to start monitoring, stop monitoring, and generate CPU, disk I/O, memory, and network performance plots for the nodes and the cluster. The hadoopcli tool provides an automated Hadoop test environment. Intel BITT templates enable configurable plot generation, and Intel BITT command scripts enable configurable control of monitoring actions. Benchmark configuration is implemented with XML files. Configurable properties include the installation location, the monitoring directories, the monitoring sampling duration, the list of cluster nodes, and the list of tar files to install. Intel BITT is implemented in Python* and uses gnuplot to generate performance plots. Intel BITT currently runs on Linux*.

Intel BITT Features

Intel Benchmark Install and Test Tool provides the following tools:

installcli: Used to install a specified list of tar files on a specified list of nodes.
moncli: Used to monitor performance metrics locally and/or remotely. It can be used to monitor the performance of a cluster. The tool currently supports the sar and iostat monitoring tools.
hadoopcli: Used to install, configure, and test Hadoop clusters.

Intel BITT is implemented in an object-oriented fashion. It can be extended to support other performance monitoring tools, such as vmstat and mpstat, if needed. The toolkit includes the following building blocks:

XML parser: Parses the XML properties, including the name, value, and description fields. The install and monitor configuration is defined using XML properties. Tool-specific options are passed as command line options.
Log file parser: Parses log files that take the form of tables with rows and columns, and generates a CSV file for each column. The column items on each row are separated by whitespace. The column header names are used to create the CSV file names.
Plot generator: Uses gnuplot to plot the contents of the CSV files by means of templates. The templates define the list of CSV files that are used as inputs to generate the plots, as well as the labels and titles of the plots.
Sar monitoring tool
Iostat monitoring tool
VTune™ monitoring tool
Emon monitoring tool
installcli, which is used to install Intel BITT
moncli, which is used to monitor local or cluster nodes
hadoopcli, which is implemented using the building blocks above and is used to create and test Hadoop clusters

Configuring the Setup

We installed RHEL 6.0 on all 17 nodes with the default configuration and configured passphraseless SSH access between the nodes, so that they can communicate without a password login for every transaction between them.

1. Install the Intel BITT tar file.

    cd
    mkdir bitt
    cp bitt-1.0.tar bitt
    cd bitt
    # extract the archive (this step is implied by the cd that follows)
    tar -xvf bitt-1.0.tar
    cd bitt-1.0

The following is the list of subdirectories under Intel BITT home:

    cmd
    conf
    samples
    scripts
    templates

2. Create a release directory under Intel BITT home and copy the tar files to it.

    mkdir -p ~/bitt/bitt-1.0/release
    cp bitt-1.0.tar ~/bitt/bitt-1.0/release

If you are planning to test Hadoop, also download the Hadoop tar file and copy it to the release directory.

    cp hadoop-<version>.tar.gz ~/bitt/bitt-1.0/release

3. Download the JDK and create a tar file from the installed JDK tree. For example:

    mkdir jdk
    cp jdk-6u23-linux-x64.bin jdk
    cd jdk
    chmod +x jdk-6u23-linux-x64.bin
    ./jdk-6u23-linux-x64.bin
    rm jdk-6u23-linux-x64.bin
    tar -cvf ~/bitt/bitt-1.0/release/jdk1.6.0_23.tar .

4. Download gnuplot and create a tar file from the installed gnuplot tree. For example:

    mkdir myinstall
    cp gnuplot-<version>-rc1.tar myinstall
    cd myinstall/
    tar -xvf gnuplot-<version>-rc1.tar
    mkdir -p install/gnuplot-<version>
    cd gnuplot-<version>-rc1
    ./configure --prefix=/home/<user>/myinstall/install/gnuplot-<version>
    make
    make install
    cd ../install
    tar -cvf ~/bitt/bitt-1.0/release/gnuplot-<version>.tar .

5. Download Python and create a tar file from the installed Python tree for your platform. For example:

    mkdir myinstall
    cp Python-<version>.tgz myinstall
    cd myinstall/
    tar -xvf Python-<version>.tgz
    mkdir -p install/Python-<version>
    cd Python-<version>
    ./configure --prefix=/home/<user>/myinstall/install/Python-<version>
    make
    make install
    cd ../install
    tar -cvf ~/bitt/bitt-1.0/release/Python-<version>.tar .

6. Run TeraSort by running terasort.sh. Before you do, update the corresponding configuration files as described in step 7, install gnuplot and Python on your client system, and make sure python3 and gnuplot are on your path on the client system. For example:

    cd ~/bitt/bitt-1.0/conf
    # edit the configuration files here (see step 7)
    cd ~/bitt/bitt-1.0/scripts
    ./terasort.sh

7. Configuration file edits. All configuration files are found under ~/bitt/bitt-1.0/conf.

a. hadoopnodelist: Configuration file that lists the cluster nodes. Any node added to or removed from the cluster must be registered here to be recognized by the load generator tool.

    node1.domain.com
    node2.domain.com
    node3.domain.com
    node4.domain.com
    ..
    node17.domain.com

b. hadooptarlist: Configuration file that lists the tar files to be installed on the cluster.

    ../release/bitt-1.0.tar.gz
    ../release/python-3.2.tar.gz
    ../release/jdk1.6.0_25.tar.gz
    ../release/hadoop-<version>.tar.gz
    ../release/gnuplot-<version>.tar.gz

c. hadoop-env.sh: Main Hadoop environment configuration file.

    # Set Hadoop-specific environment variables here.

    # The only required environment variable is JAVA_HOME. All others are
    # optional. When running a distributed configuration it is best to
    # set JAVA_HOME in this file, so that it is correctly defined on
    # remote nodes.

    # The java implementation to use. Required.
    # export JAVA_HOME=/usr/lib/j2sdk1.5-sun

    # Extra Java CLASSPATH elements. Optional.
    # export HADOOP_CLASSPATH=

    # The maximum amount of heap to use, in MB. Default is
    # export HADOOP_HEAPSIZE=2000

    # Extra Java runtime options. Empty by default.
    # export HADOOP_OPTS=-server

    # Command specific options appended to HADOOP_OPTS when specified
    export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
    export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
    export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
    export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
    export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"

d. hadoopcloudconf.xml: Custom XML configuration file that defines key parameters for how the test is executed and where the data is stored.

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>

    <name>cloudtemplateloc</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf</value>
    <description>cloud conf template file location</description>

    <name>cloudtemplatevars</name>
    <value>all</value>
    <description>the list of template variables to copy</description>

    <name>jobtrackerport</name>
    <value>8021</value>
    <description>jobtracker port</description>

    <name>namenodeport</name>
    <value>8020</value>
    <description>namenode port</description>

    <name>cloudconfdir</name>
    <value>/tmp/hadoopconf</value>
    <description>generated cloud conf file</description>

    <name>cloudtmpdir</name>
    <value>hadoop-${user.name}</value>
    <description>cloud tmp dir</description>

    <name>cloudinstalldir</name>
    <value>/usr/local/hadoop/install</value>
    <description>cloud install dir</description>

    <name>cloudnodelist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadoopnodelist</value>
    <description>cluster nodes</description>

    <name>monnodelist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadoopmonnodelist</value>
    <description>cluster monitor nodes</description>

    <name>cloudtarlist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadooptarlist</value>
    <description>cluster nodes</description>

    <name>moninterval</name>
    <value>30</value>
    <description>sampling duration</description>

    <name>moncount</name>
    <value>0</value>
    <description>number of samples</description>

    <name>monresults</name>
    <value>/tmp/monhadres</value>
    <description>cloud monitor log files location</description>

    <name>monsummary</name>
    <value>/tmp/monhadsum</value>
    <description>cloud monitor log files location</description>

    <name>mondir</name>
    <value>/tmp/monhadloc</value>
    <description>cloud monitor log files location</description>

    <name>gnucmd</name>
    <value>/usr/local/hadoop/install/gnuplot-4.4.3/bin/gnuplot</value>
    <description>none</description>

    </configuration>
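Reading name, value, and description fields out of a configuration file like the one above can be sketched in a few lines of Python. This is an illustrative assumption about how such a parser might work, not Intel BITT source code; `parse_properties` and the `SAMPLE` text are hypothetical. The function walks the elements in document order, so it accepts the flat layout shown here as well as a layout in which each group is wrapped in its own parent element.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the same shape as hadoopcloudconf.xml.
SAMPLE = """<configuration>
  <name>moninterval</name>
  <value>30</value>
  <description>sampling duration</description>
  <name>moncount</name>
  <value>0</value>
  <description>number of samples</description>
</configuration>"""


def parse_properties(xml_text):
    """Return {name: {"value": ..., "description": ...}} for every <name> element."""
    props = {}
    current = None
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()
        if elem.tag == "name":
            # A <name> element starts a new property.
            current = text
            props[current] = {"value": "", "description": ""}
        elif elem.tag in ("value", "description") and current is not None:
            # <value> and <description> belong to the most recent <name>.
            props[current][elem.tag] = text
    return props


if __name__ == "__main__":
    for name, fields in parse_properties(SAMPLE).items():
        print(name, "=", fields["value"], "#", fields["description"])
```

Values are returned as strings; a caller that needs the 30-second sampling interval as a number would convert it explicitly, for example with `int(props["moninterval"]["value"])`.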

e. hdfs-site-template.xml: Hadoop configuration file where the HDFS parameters are set. Please note that the optimization values we used to run the test are shown in bold font.

    <?xml version="1.0" encoding="utf-8"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>

    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>

    <name>dfs.datanode.max.xcievers</name>
    <value>655360</value>
    <description>Number of files Hadoop serves at one time</description>

    <name>dfs.data.dir</name>
    <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data,/mnt/disk4/hdfs/data,/mnt/disk5/hdfs/data,/mnt/disk6/hdfs/data,/mnt/disk7/hdfs/data,/mnt/disk8/hdfs/data,/mnt/disk9/hdfs/data,/mnt/disk10/hdfs/data,/mnt/disk11/hdfs/data,/mnt/disk12/hdfs/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>

    <name>dfs.block.size</name>
    <value> </value>
    <description>The default block size for new files.</description>

    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description> </description>

    <name>ipc.server.tcpnodelay</name>
    <value>true</value>
    <description> </description>

    <name>ipc.client.tcpnodelay</name>
    <value>true</value>
    <description> </description>

    <name>dfs.namenode.handler.count</name>
    <value>40</value>
    <description> </description>

    <name>io.sort.factor</name>
    <value>100</value>
    <description> </description>

    <name>io.sort.mb</name>
    <value>220</value>
    <description> </description>

    </configuration>

f. mapred-site-template.xml: Hadoop configuration file that defines the key MapReduce parameters. Values used in our testing are highlighted in bold font.

    <?xml version="1.0" encoding="utf-8"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>

    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>24</value>
    <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>

    <name>io.sort.record.percent</name>
    <value>0.3</value>
    <description>added as per ssg reco</description>

    <name>io.sort.spill.percent</name>
    <value>0.9</value>
    <description>added as per ssg reco</description>

    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>12</value>
    <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>

    <name>mapred.reduce.tasks</name>
    <value>64</value>
    <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Assume 10 nodes, 10*2-2</description>

    <name>mapred.local.dir</name>
    <value>/mnt/disk1/hdfs/mapred,/mnt/disk2/hdfs/mapred,/mnt/disk3/hdfs/mapred,/mnt/disk4/hdfs/mapred,/mnt/disk5/hdfs/mapred,/mnt/disk6/hdfs/mapred,/mnt/disk7/hdfs/mapred,/mnt/disk8/hdfs/mapred,/mnt/disk9/hdfs/mapred,/mnt/disk10/hdfs/mapred,/mnt/disk11/hdfs/mapred,/mnt/disk12/hdfs/mapred</value>
    <description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.</description>

    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m -Djava.net.preferIPv4Stack=true</value>
    <description>Java opts for the task tracker child processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc The configuration variable mapred.child.ulimit can be used to control the maximum virtual memory of the child processes.</description>

    <name>mapred.output.compress</name>
    <value>false</value>
    <description>Should the job outputs be compressed?</description>

    <name>mapred.compress.map.output</name>
    <value>false</value>
    <description>Should the outputs of the maps be compressed before being sent across the network. Uses SequenceFile compression.</description>

    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
    <description>If the job outputs are compressed, how should they be compressed?</description>

    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
    <description>If the job outputs are compressed, how should they be compressed?</description>

    <name>mapred.map.tasks.speculative.execution</name>
    <value>true</value>
    <description> </description>

    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>true</value>
    <description> </description>

    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
    <description> </description>

    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
    <description> </description>

    <name>mapred.min.split.size</name>
    <value>65536</value>
    <description> </description>

    <name>mapred.reduce.copy.backoff</name>
    <value>5</value>
    <description> </description>

    <name>mapred.job.shuffle.merge.percent</name>
    <value>0.7</value>
    <description> </description>

    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.66</value>
    <description> </description>

    <name>mapred.job.reduce.input.buffer.percent</name>
    <value>0.90</value>
    <description> </description>

    </configuration>

g. hadoop-terasort.xml: Intel BITT configuration file from which the parameters are read before the test runs. Parameters in this configuration file override the values in the other configuration files mentioned above. This makes it quick to change parameter values between test runs without editing the individual configuration files.

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>

    <name>mapred.map.tasks</name>
    <value>8192</value>
    <description>Total Map task number</description>

    <name>mapred.reduce.tasks</name>
    <value>243</value>
    <description>Total Reduce task number</description>

    <name>dfs.replication</name>
    <value>3</value>
    <description>Number of copies to replicate</description>

    <name>mapred.compress.map.output</name>
    <value>true</value>
    <description>compress map output</description>

    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.66</value>
    <description>none</description>

    <name>datasetsizesmall</name>
    <value> </value>
    <description>Total Record Number about 1T data. 1 record has 100 bytes</description>

    <name>datasetsize</name>
    <value> </value>
    <description>Total Record Number about 1T data. 1 record has 100 bytes</description>

    <name>datasetname</name>
    <value>tera</value>
    <description>none</description>

    <name>outputdataname</name>
    <value>tera-sort2</value>
    <description>none</description>

    <name>jarfile</name>
    <value>hadoop examples.jar</value>
    <description>none</description>

    </configuration>
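The override behavior described for hadoop-terasort.xml can be pictured as a simple dictionary merge. The Python sketch below is an assumption about the effect, not Intel BITT's implementation; `effective_config` and `changed_keys` are hypothetical helpers, and the values are copied from the configuration listings in this paper.

```python
# Values from mapred-site-template.xml and hdfs-site-template.xml (site defaults).
SITE_DEFAULTS = {
    "mapred.reduce.tasks": "64",
    "mapred.compress.map.output": "false",
    "mapred.job.shuffle.input.buffer.percent": "0.66",
    "dfs.replication": "3",
}

# Values from hadoop-terasort.xml (per-run overrides).
RUN_OVERRIDES = {
    "mapred.map.tasks": "8192",
    "mapred.reduce.tasks": "243",
    "dfs.replication": "3",
    "mapred.compress.map.output": "true",
    "mapred.job.shuffle.input.buffer.percent": "0.66",
}


def effective_config(site, overrides):
    """Return the settings a run would see: site values, replaced by any override."""
    merged = dict(site)       # copy, so the site defaults are left untouched
    merged.update(overrides)  # later values win
    return merged


def changed_keys(site, overrides):
    """List the parameters whose value differs from (or is missing in) the site files."""
    return sorted(k for k, v in overrides.items() if site.get(k) != v)


if __name__ == "__main__":
    config = effective_config(SITE_DEFAULTS, RUN_OVERRIDES)
    for key in changed_keys(SITE_DEFAULTS, RUN_OVERRIDES):
        print(f"{key}: {SITE_DEFAULTS.get(key, '(unset)')} -> {config[key]}")
```

Seen this way, switching between the compressed and uncompressed test runs only requires changing `mapred.compress.map.output` in the per-run file.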

Running TeraSort

TeraSort can be started by running terasort.sh. The script runs the commands involved in starting the test, starting the performance counters, ending the test, and gathering the performance counter data for analysis. Below is the list of commands executed when the script runs, each with a brief explanation of what it does.

    #!/usr/bin/env bash
    ###########################################################
    # Intel Benchmark Install and Test Tool (BITT) Use Cases
    # Typical sequence for hadoop terasort benchmark:
    ###########################################################
    echo "START: terasort benchmark..."
    date

    # Stop any current running test on the cluster.
    ../scripts/hadoopcli -a stop -c ../conf/hadoopcloudconf.xml
    # Kill Java* processes on the nodes.
    ./runkill.sh
    # Install fresh copy of executables on the slave nodes.
    ../scripts/hadoopcli -a install -c ../conf/hadoopcloudconf.xml
    # Format the HDFS to store the data.
    ../scripts/hadoopcli -a format -c ../conf/hadoopcloudconf.xml
    # Start Java processes on all slave nodes.
    ../scripts/hadoopcli -a start -c ../conf/hadoopcloudconf.xml
    # 2 minutes delay to get the processes started on the slave nodes.
    sleep 120
    # Generate 1TB of data which will be used for sorting.
    ../scripts/hadoopcli -a data -c ../conf/hadoopcloudconf.xml

    # Create monitoring directories.
    ../scripts/moncli -r clean -c ../conf/hadoopcloudconf.xml
    # Start iostat utility to monitor disk usage on the slave nodes.
    ../scripts/moncli -m iostat -a run -c ../conf/hadoopcloudconf.xml -s run_iostat.sh
    # Start sar utility on all the slave nodes to monitor CPU, network, and memory utilization.
    ../scripts/moncli -m sar -a run -c ../conf/hadoopcloudconf.xml -s run_sar2.sh

    # Start the sort activity on the 1TB data generated in the earlier step.
    ../scripts/hadoopcli -a run -c ../conf/hadoopcloudconf.xml

    # Stop sar monitoring utility.
    ../scripts/moncli -m sar -a kill -s run_sar_kill.sh -c ../conf/hadoopcloudconf.xml
    # Stop iostat utility.
    ../scripts/moncli -m iostat -a kill -s run_iostat_kill.sh -c ../conf/hadoopcloudconf.xml

    # Convert iostat generated data to CSV file format.
    ../scripts/moncli -m iostat -a csv -c ../conf/hadoopcloudconf.xml
    # Convert data generated from sar utility to CSV format.
    ../scripts/moncli -m sar -a csv -c ../conf/hadoopcloudconf.xml -s run_sar_gen.sh

    # Using gnuplot to generate image containing graph of iostat data.
    ../scripts/moncli -m iostat -a plot -t iostat -c ../conf/hadoopcloudconf.xml
    # Using gnuplot to generate image containing CPU graph from sar data.
    ../scripts/moncli -m sar -a plot -t cpu -c ../conf/hadoopcloudconf.xml
    # Using gnuplot to generate image containing memory graph from sar data.
    ../scripts/moncli -m sar -a plot -t mem -c ../conf/hadoopcloudconf.xml
    # Using gnuplot to generate image containing network graph from sar data.
    ../scripts/moncli -m sar -a plot -t nw -c ../conf/hadoopcloudconf.xml

    # Archive logfiles on all the slave nodes.
    ../scripts/moncli -r tar -c ../conf/hadoopcloudconf.xml
    # Copy archived logfiles from slave nodes to master node.
    ../scripts/moncli -r collect -c ../conf/hadoopcloudconf.xml
    # Stop running processes on slave nodes.
    ../scripts/hadoopcli -a stop -c ../conf/hadoopcloudconf.xml

    # Creates folder called cluster in head node /tmp/monhadsum.
    ../scripts/moncli -r cluster -c ../conf/hadoopcloudconf.xml
    # Calculate average CPU utilization of the cluster.
    ../scripts/moncli -r average -m sar -t cpu -c ../conf/hadoopcloudconf.xml
    # Calculate average memory throughput of the cluster.
    ../scripts/moncli -r throughput -m sar -t mem -c ../conf/hadoopcloudconf.xml
    # Calculate average network utilization for the cluster.
    ../scripts/moncli -r throughput -m sar -t nw -c ../conf/hadoopcloudconf.xml
    # Calculate average disk throughput of the cluster.
    ../scripts/moncli -r throughput -m iostat -t iostat -c ../conf/hadoopcloudconf.xml

    # Copy contents of hadoopconf folder.
    cp -r /tmp/hadoopconf /tmp/monhadsum
    # Copy performance data gathering templates.
    cp -r ../templates /tmp/monhadsum
    # Create archive with all the logfiles and graph images.
    tar -cvf /tmp/baselinemaprtera.tar /tmp/monhadsum

    # End script.
    echo "END: terasort benchmark..."
    date
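The CSV conversion steps in the script rely on the log file parser described earlier, which splits a whitespace-separated table into one CSV file per column and names each file after its column header. A minimal Python sketch of that idea follows. It is not the Intel BITT parser: `split_columns`, `csv_name`, `write_csv_files`, the sample log text, and the rule for turning a header into a file name are all assumptions for illustration.

```python
import csv
import os
import tempfile

# Hypothetical log excerpt: a header row followed by whitespace-separated data rows.
SAMPLE_LOG = """Device tps kB_read/s kB_wrtn/s
sdb 120.5 1024.0 20480.0
sdc 118.0 990.5 19876.2
"""


def split_columns(log_text):
    """Parse a whitespace-separated table into {column header: [values as strings]}."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    header = lines[0].split()
    columns = {name: [] for name in header}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows that do not match the header width
        for name, field in zip(header, fields):
            columns[name].append(field)
    return columns


def csv_name(column):
    """Derive a file name from a column header (assumed rule: '/' becomes '_')."""
    return column.replace("/", "_") + ".csv"


def write_csv_files(columns, out_dir):
    """Write one single-column CSV file per column and return the file paths."""
    paths = []
    for name, values in columns.items():
        path = os.path.join(out_dir, csv_name(name))
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow([name])
            writer.writerows([value] for value in values)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        for path in write_csv_files(split_columns(SAMPLE_LOG), out_dir):
            print(os.path.basename(path))
```

Keeping one file per column means a plot template only has to name the CSV files it needs, which matches the way the plot generator is described.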

Results

Figure 6 shows the test results from running TeraSort on the 17-node Hadoop cluster. The figure shows two test runs, one with data compression turned on and one with data compression turned off. The lower the time to complete, which is measured in seconds, the better the result. The figures that follow show resource utilization in terms of processor, memory, disk throughput, and network throughput for both test runs.

Figure 6: Time taken to complete TeraSort.

In Figure 6, the blue bar shows the time taken by TeraSort with the output of the map phase compressed before the data was stored for the reduce phase. In our test the cluster sorted 1TB of data in 1207 seconds with data compression. The red bar shows the time taken by the TeraSort run without data compression. As the graph shows, that run completes in 1040 seconds, faster than the run with data compression.
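The two completion times can be turned into sort throughput figures with a short calculation. The Python sketch below uses the 1207-second and 1040-second results and the 16 slave nodes reported in this paper, and assumes 1TB means 10^12 bytes; the derived rates are arithmetic on those inputs, not additional measurements.

```python
DATASET_BYTES = 10**12  # assumption: 1TB taken as 10^12 bytes
SLAVE_NODES = 16
RUN_SECONDS = {"compression on": 1207, "compression off": 1040}


def throughput_mb_s(seconds, dataset_bytes=DATASET_BYTES):
    """Average sort throughput in MB/s (1MB = 10^6 bytes) for the whole cluster."""
    return dataset_bytes / seconds / 1e6


def relative_slowdown(slow_seconds, fast_seconds):
    """Fraction by which the slower run exceeds the faster run."""
    return (slow_seconds - fast_seconds) / fast_seconds


if __name__ == "__main__":
    for label, seconds in RUN_SECONDS.items():
        cluster = throughput_mb_s(seconds)
        print(f"{label}: {cluster:.1f} MB/s cluster, {cluster / SLAVE_NODES:.1f} MB/s per slave node")
    slowdown = relative_slowdown(RUN_SECONDS["compression on"], RUN_SECONDS["compression off"])
    print(f"compressed run is {slowdown:.1%} slower")
```

Under these assumptions the uncompressed run sorts at roughly 960MB/s across the cluster, and enabling map output compression lengthens the run by about 16 percent on this hardware.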

The following graphs show resource utilization with data compression enabled.

Figure 7: Processor utilization with data compression enabled

Figure 7 shows the average processor utilization of the cluster with data compression. In this run the Intel Xeon processor X5680 handles the additional task of compressing the data, which makes it an excellent choice for setting up the Hadoop cluster.

Figure 8: Network throughput with data compression enabled

Figure 8 shows the average network throughput of the cluster with data compression enabled. Because the data is compressed before it is transmitted, the amount of data sent over the network is reduced.

Figure 9: Memory usage

Figure 9 shows the average percentage of cluster memory used. Since we allocated almost 2GB of memory per task, the entire 48GB of memory on the server is utilized when the TeraSort benchmark is running.

Figure 10: Disk throughput with data compression enabled

Figure 10 shows the average disk throughput of the cluster. Since the data is compressed, writes are minimal during the map phase and peak at nearly 600MB/s when the sorted data is committed to disk.

The following graphs show resource utilization with data compression disabled.

Figure 11: Processor utilization with data compression disabled

Figure 12: Network throughput with data compression disabled

Figure 12 shows the network throughput reaching almost 300MB/s, or close to 3Gb/s, when TeraSort is run with data compression disabled. The Intel Ethernet server adapter X520, based on 10GbE, provides the bandwidth needed to handle the data transfer between the nodes of the cluster.
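The unit conversion behind the figures above can be sketched as follows. The sketch assumes decimal units (1MB = 10^6 bytes, 1Gb = 10^9 bits) and treats the 300MB/s value as traffic on a single 10GbE link, which the guide does not state explicitly. The helper names are hypothetical.

```python
def megabytes_to_gigabits_per_s(mb_per_s: float) -> float:
    """Convert decimal MB/s to Gb/s: 8 bits per byte, 1000 Mb per Gb."""
    return mb_per_s * 8 / 1000


def link_utilization_percent(gbit_per_s: float,
                             link_gbit_per_s: float = 10.0) -> float:
    """Share of the link's nominal line rate that the traffic occupies."""
    return gbit_per_s / link_gbit_per_s * 100


if __name__ == "__main__":
    payload = megabytes_to_gigabits_per_s(300)
    print(f"300 MB/s is {payload:.1f} Gb/s of payload")
    print(f"{link_utilization_percent(payload):.0f}% of a 10GbE link")
```

A payload rate of 300MB/s works out to 2.4Gb/s, so the "close to 3Gb/s" figure is a rounded value; in either case the traffic stays well under the nominal rate of a 10GbE link.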

Figure 13: Memory usage

Figure 14: Disk throughput with data compression disabled

With compression disabled, disk usage is higher, as is network usage. Peak writes reached 620MB/s and remained above 400MB/s for the entire run. The total cluster throughput, including reads and writes, was close to 1GB/s at the peaks.

Conclusion

Hadoop clusters benefit a great deal from servers based on the Intel Xeon processor X5680; the dual-socket servers are well suited to Hadoop deployments ranging from a few nodes to hundreds of nodes. In our test runs we were able to drive the cluster to its maximum utilization. With the cluster 100 percent utilized, jobs complete faster, making way for other job sets to run on the cluster. With data centers aiming to get the most performance per watt, the energy-efficient Intel Xeon processor 5600 series provides cost benefits on a per-node basis.

In distributed workloads, high-throughput network connections are key to handling large datasets. In the test, Intel Ethernet server adapters X520-DA2 based on 10GbE achieved data rates of close to 3Gb/s during the workload execution.

While compressing the data substantially reduces the amount of data transferred over the network, the time to complete increases compared to test runs without compression. System administrators and application developers have to decide whether to enable data compression based on their specific requirements. Intel has published a set of guidelines on tuning Hadoop clusters. Using LZO-based compression codecs may alleviate some of the bottlenecks found with the default Zlib compression codecs.
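As an illustration of the compression choice discussed above, the following sketch renders the job configuration properties that control map output compression. It is not taken from the guide: it assumes the Hadoop 1.x property names mapred.compress.map.output and mapred.map.output.compression.codec, and it assumes the LZO codec class com.hadoop.compression.lzo.LzoCodec from the separately installed hadoop-lzo library. Check both against the Hadoop version actually deployed.

```python
import xml.etree.ElementTree as ET

# Assumed codec class names; neither appears in the guide itself.
ZLIB_CODEC = "org.apache.hadoop.io.compress.DefaultCodec"  # Hadoop default
LZO_CODEC = "com.hadoop.compression.lzo.LzoCodec"          # needs hadoop-lzo


def map_output_compression_properties(enabled, codec=ZLIB_CODEC):
    """Return Hadoop 1.x style properties for map output compression."""
    props = {"mapred.compress.map.output": "true" if enabled else "false"}
    if enabled:
        # The codec only matters when compression is switched on.
        props["mapred.map.output.compression.codec"] = codec
    return props


def render_configuration(props):
    """Render properties as a Hadoop <configuration> XML fragment."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_configuration(
        map_output_compression_properties(True, LZO_CODEC)))
```

Running the benchmark once with each setting, as the guide does for compression ON and OFF, is the practical way to see which trade-off suits a given workload.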

For more information:

Disclaimers

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See products/processor_number for details.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined. Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling , or by visiting Intel's Web site at

Copyright 2012 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon inside, and Intel Intelligent Power Node Manager are trademarks of Intel Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others.


More information

Intel Atom Processor E6xx Series Embedded Application Power Guideline Addendum January 2012

Intel Atom Processor E6xx Series Embedded Application Power Guideline Addendum January 2012 Intel Atom Processor E6xx Series Embedded Application Power Guideline Addendum January 2012 Document Number: 324956-003 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information
