Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms
- Millicent Fletcher
Intel Cloud Builders Guide
Intel Xeon Processor-based Servers
Apache* Hadoop*
Intel Xeon Processor 5600 Series

Audience and Purpose
This reference architecture is for companies that are looking to build their own cloud computing infrastructure, including both enterprise IT organizations and cloud service providers or cloud hosting providers. The decision to use a cloud for the delivery of IT services is best made by starting with the knowledge and experience gained from previous work. This reference architecture gathers into one place the essentials of an Apache* Hadoop* cluster build-out, complete with benchmarking using the TeraSort workload. This paper defines easy-to-use steps to replicate the deployment in your data center or lab environment. The installation is based on Intel-powered servers and creates a multi-node, optimized Hadoop environment. The reference architecture contains details on the Hadoop topology, the hardware and software deployed, installation and configuration steps, and tests for real-world use cases that should significantly reduce the learning curve for building and operating your first Hadoop infrastructure.
It is not expected that this paper can be used as-is. For example, adapting to an existing network and identifying specific management requirements are out of scope for this paper. Therefore, the user of this paper is expected to make significant adjustments to the design presented in order to meet the specific requirements of their own data center or lab environment. This paper also assumes that the reader has basic knowledge of computing infrastructure components and services. Intermediate knowledge of the Linux* operating system, Python*, and the Hadoop framework, as well as basic system administration skills, is assumed.
February 2012
Table of Contents
Executive Summary
Hadoop* Overview
Hadoop System Architecture
Operation of a Hadoop Cluster
TeraSort Workload
TeraSort Workflow
Test Methodology
Intel Benchmark Install and Test Tool (Intel BITT)
Intel BITT Features
Configuring the Setup
Running TeraSort
Results
Conclusion
Executive Summary
MapReduce technology is gaining popularity among enterprises for a variety of large-scale, data-intensive jobs. MapReduce based on Apache* Hadoop* is rapidly emerging as a preferred technology for big data processing and management. Enterprises are deploying clusters of commodity standard servers and using business intelligence tools along with Apache Hadoop to obtain high-performing solutions for their large-scale data processing requirements.

The motivation to deploy Hadoop comes from the huge unstructured data sets that enterprises gather from their business processes. Enterprises want to exploit this data to get the most value from it in their decision making. Hadoop moves computation closer to the data to achieve high processing throughput.

In this paper we built a small cluster of commodity servers based on an Apache Hadoop distribution. We then ran a sort benchmark to measure how fast the cluster can process data. This reference architecture explains how to set up the cluster, tune its parameters, and run the sort benchmark.
Figure 1: Hadoop* stack
This reference architecture provides a blueprint for building a cluster with Intel Xeon processor-based standard server platforms and the open source Apache Hadoop distribution. The paper further describes the parameters for tuning and executing the sort benchmark to measure performance.

Hadoop* Overview
Apache Hadoop is a framework for running applications on large clusters built from standard hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named MapReduce. In MapReduce, the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.
In addition, Hadoop provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the Hadoop Distributed File System (HDFS) are designed so that the framework automatically tolerates node failures. The Hadoop framework consists of three major components:
Common: Hadoop Common is a set of utilities that support the Hadoop subprojects, including the FileSystem, RPC, and serialization libraries.
HDFS: The Hadoop Distributed File System is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS provides streaming access to file system data.
MapReduce: MapReduce was first developed by Google to process large datasets. It consists of two functions, map and reduce, plus a framework for running a large number of instances of these functions on commodity hardware. The map function reads a set of records from an input file, processes these records, and outputs a set of intermediate records. As part of the map phase, a partition function distributes the intermediate records across many buckets using a hash function. The reduce function then processes the intermediate records. The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node. The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them, and re-executing failed tasks. The slaves execute the tasks as directed by the master.
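The map, partition, and reduce flow described above can be illustrated with a small in-memory Python sketch. It is not Hadoop code and runs on a single machine. The word-count job and the helper names (map_fn, partition, reduce_fn, run_job) are hypothetical. zlib.crc32 stands in for Hadoop's hash partitioner so that bucket assignment is stable across runs.

```python
import zlib
from collections import defaultdict


def map_fn(record):
    """Map: emit (word, 1) for every word in one input record (a line of text)."""
    for word in record.split():
        yield word, 1


def partition(key, num_reducers):
    """Hash partitioning: every occurrence of a key lands in the same reducer bucket."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_fn(key, values):
    """Reduce: combine all intermediate values that share one key."""
    return key, sum(values)


def run_job(records, num_reducers=3):
    # Map phase: intermediate (key, value) pairs are spread across buckets by the partitioner.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for record in records:
        for key, value in map_fn(record):
            buckets[partition(key, num_reducers)][key].append(value)
    # Reduce phase: each bucket is processed independently, keys in sorted order.
    output = {}
    for bucket in buckets:
        for key in sorted(bucket):
            k, v = reduce_fn(key, bucket[key])
            output[k] = v
    return output


if __name__ == "__main__":
    print(run_job(["big data on hadoop", "hadoop moves compute to data"]))
```

Every occurrence of a key hashes to the same bucket, so each reducer sees all the values for its keys. This is why the result does not depend on the number of reducers.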
Hadoop* System Architecture
The Hadoop framework works on the principle of "moving compute closer to the data." Figure 2 shows a typical deployment of the Hadoop framework on multiple standard server nodes. Computation occurs on the same node where the data resides. This enables Hadoop to deliver better performance than when data must be fetched over the network from separate storage. The combination of standard server platforms and Hadoop infrastructure provides a cost-efficient, high-performance platform for data-parallel applications.
Each Hadoop cluster has one master node and multiple slave nodes. The master node runs the NameNode and JobTracker functions, coordinating with the slave nodes to complete the jobs submitted to the cluster. The slave nodes run the TaskTracker and HDFS (which stores the data) and execute the map and reduce functions that perform the data computations.
Figure 2: Hadoop* deployment on standard server nodes
Operation of a Hadoop* Cluster
Figure 3 shows the operation of a Hadoop cluster. The client submits the job to the master node, which orchestrates the slave nodes to complete the job. The JobTracker on the master node is responsible for controlling the MapReduce job. Each slave runs a TaskTracker, which tracks its MapReduce tasks and reports their status to the JobTracker at frequent intervals. If a task fails, the JobTracker reschedules it on the same slave node or on a different slave node.
HDFS is a location-aware (rack-aware) file system that manages the data in a Hadoop cluster. HDFS replicates data across various nodes in the cluster to attain data reliability. However, the NameNode is a single point of failure: if the NameNode fails, the file system and its data become inaccessible. The JobTracker knows where each task's input data is located, so it schedules the task on a node where that data resides. This reduces the need to move data from one node to another and saves network bandwidth. Once the map function is complete, its output is transferred to other nodes to perform the reduce function.
The MapReduce framework scales the cluster efficiently through a modular scale-out strategy. The cluster grows by adding one or more nodes, and HDFS and MapReduce put the new nodes to use as they are added.
Figure 3: Operation of Hadoop* cluster
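The locality-aware scheduling described above can be sketched as follows. This is a simplified illustration, not the JobTracker's actual algorithm. The task names, node names, block-location map, and the choose_task helper are all hypothetical. The real scheduler also considers rack locality, free task slots, and speculative execution.

```python
from typing import Dict, List, Optional, Set


def choose_task(node: str,
                pending: List[str],
                block_locations: Dict[str, Set[str]]) -> Optional[str]:
    """Pick a pending task for a node that is asking for work.

    Prefer a data-local task (its input block has a replica on this node);
    otherwise fall back to the first pending task, which will then read
    its input over the network.
    """
    for task in pending:
        if node in block_locations.get(task, set()):
            return task  # data-local: no network transfer of the input block
    return pending[0] if pending else None  # remote read as a fallback


if __name__ == "__main__":
    # Each hypothetical map task reads one HDFS block stored on three nodes (replication 3).
    locations = {
        "map-0": {"node2", "node5", "node9"},
        "map-1": {"node3", "node4", "node7"},
    }
    print(choose_task("node4", ["map-0", "map-1"], locations))  # map-1 is local to node4
```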
Cluster hardware setup:
Total of 17 nodes in the cluster: one master node and 16 slave nodes.
Data network: Arista 7124 switch connected to an Intel Ethernet Server Adapter X520-DA2 dual-port 10GbE NIC on every node.
Each server has an internal, private Intel dual-port 1GbE NIC connected to a top-of-rack switch that is used for management tasks.
Each node has a disk enclosure populated with SATA II 7.2K RPM, 2TB hard disk drives, for a total of 24TB of raw storage per enclosure.
Dual-socket Intel 5520 Chipset platform.
Two Intel Xeon processor X5680 at 3.33GHz, 12MB cache.
48GB 1333MHz DDR3 memory.
Red Hat Enterprise Linux* 6.0 (RHEL 6.0) (Kernel: el6..x86_64)
Hadoop* Framework v
Figure 4: Cluster hardware setup
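The storage figures above translate into cluster capacity with simple arithmetic. The sketch rests on three assumptions:
- decimal units (1TB = 1000GB);
- only the 16 slave nodes hold HDFS data;
- the replication factor of 3 used in this paper's HDFS configuration.

It ignores file system overhead and the local space MapReduce needs for intermediate data, so treat the usable figure as an upper bound.

```python
DISK_TB = 2            # 2TB SATA II drives
RAW_TB_PER_NODE = 24   # raw storage per disk enclosure
DATA_NODES = 16        # slave nodes that store HDFS blocks
REPLICATION = 3        # dfs.replication used in this paper


def cluster_capacity(raw_tb_per_node, data_nodes, replication):
    """Return (raw_tb, usable_tb): total raw HDFS space and the space left after replication."""
    raw_tb = raw_tb_per_node * data_nodes
    return raw_tb, raw_tb / replication


disks_per_node = RAW_TB_PER_NODE // DISK_TB
raw, usable = cluster_capacity(RAW_TB_PER_NODE, DATA_NODES, REPLICATION)
print(f"{disks_per_node} disks per node, {raw} TB raw, {usable:.0f} TB usable")
# prints: 12 disks per node, 384 TB raw, 128 TB usable
```

By the same arithmetic, the 1TB TeraGen input occupies about 3TB of raw disk when stored with three replicas.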
TeraSort Workload
TeraSort is a popular Hadoop benchmarking workload. Its name comes from sorting 1TB of data, but 1TB is not a hard limit: TeraSort can sort a dataset of any size by changing various parameters. The TeraSort benchmark exercises both the HDFS and MapReduce functions of the Hadoop cluster. TeraSort is included in the standard Apache Hadoop installation package and is widely used to benchmark and tune large Hadoop clusters with hundreds of nodes. TeraSort works in two steps:
TeraGen: Generates random data of the dataset size set by the user. This dataset is used as the input data for the sort benchmark.
TeraSort: Sorts the input data generated by TeraGen and stores the output data on HDFS.
An optional third step, called TeraValidate, allows validation of the sorted data. This paper does not discuss this optional third step.

TeraSort Workflow
Figure 5 shows the workflow of the TeraSort workload tested on our cluster. The workload starts at a control node. The master node kicks off the job, and the 8192 map tasks are divided among the 16 slave nodes. Once the map phase is complete, the cluster starts the reduce phase with 243 reduce tasks. When the reduce phase is completed, the sorted output is stored on the file system.

Test Methodology
To run the workload we used the Intel Benchmark Install and Test Tool (Intel BITT). The workload was scripted to kick-start the job on the cluster, run TeraGen to generate the test data, and run TeraSort to sort the generated data. The script also starts a series of counters on the slave nodes to gather performance metrics on each node. Key hardware metrics are captured on each node at 30-second intervals: processor utilization, network bandwidth consumption, memory utilization, and disk bandwidth consumption.
Once the job is complete, the counters are stopped on all slave nodes. The log files containing the performance data are then copied to the master node to calculate the utilization of the cluster. This data is plotted into graphs using gnuplot and presented for further analysis. We also recorded the time taken to complete the job, as reported by the Hadoop management user interface. The lower the time, the better the performance.
Figure 5: TeraSort workflow
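Computing a cluster-wide figure from the per-node samples can be sketched as below. This paper does not show how BITT does its averaging, so the method here is an assumption:
- each node contributes one utilization series sampled every 30 seconds;
- samples at the same index are averaged across nodes;
- the per-interval values are then averaged over the run.

The node names and numbers are made up for illustration.

```python
from statistics import mean
from typing import Dict, List

SAMPLE_INTERVAL_S = 30  # sampling interval used in this paper


def cluster_series(per_node: Dict[str, List[float]]) -> List[float]:
    """Average the nodes' samples at each interval (truncated to the shortest series)."""
    length = min(len(series) for series in per_node.values())
    return [mean(series[i] for series in per_node.values()) for i in range(length)]


def run_average(per_node: Dict[str, List[float]]) -> float:
    """Average cluster utilization over the whole run."""
    return mean(cluster_series(per_node))


if __name__ == "__main__":
    cpu = {  # hypothetical %CPU samples from two slave nodes
        "node2": [40.0, 80.0, 90.0, 30.0],
        "node3": [50.0, 70.0, 95.0, 25.0],
    }
    for i, value in enumerate(cluster_series(cpu)):
        print(f"t={i * SAMPLE_INTERVAL_S:>4}s  cpu={value:.1f}%")
    print(f"run average: {run_average(cpu):.1f}%")
```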
Intel Benchmark Install and Test Tool
Intel Benchmark Install and Test Tool (Intel BITT) provides tools to install, configure, run, and analyze benchmark programs on small test clusters. The installcli tool is used to install tar files on a cluster. moncli is used to monitor the performance of the cluster nodes. It provides options to start monitoring, stop monitoring, and generate CPU, disk I/O, memory, and network performance plots for the nodes and the cluster. hadoopcli provides an automated Hadoop test environment. The Intel BITT templates enable configurable plot generation, and Intel BITT command scripts provide configurable control of monitoring actions. Benchmark configuration is implemented using XML files. Configurable properties include:
- the installation location;
- the monitoring directories;
- the monitoring sampling duration;
- the list of cluster nodes;
- the list of tar files to be installed.

Intel BITT is implemented in Python* and uses gnuplot to generate performance plots. Intel BITT currently runs on Linux*.

Intel BITT Features
Intel Benchmark Install and Test Tool provides the following tools:
installcli: Installs a specified list of tar files on a specified list of nodes.
moncli: Monitors performance metrics locally and/or remotely; it can be used to monitor the performance of a cluster. The tool currently supports the sar and iostat monitoring tools.
hadoopcli: Installs, configures, and tests Hadoop clusters.
Intel BITT is implemented in an object-oriented fashion and can be extended to support other performance monitoring tools, such as vmstat and mpstat, if needed. The toolkit includes the following building blocks:
XML parser: Parses XML properties, including their name, value, and description fields. The install and monitor configuration is defined using XML properties; tool-specific options are passed as command-line options.
Log file parser: Parses log files in the form of tables with rows and columns and generates a CSV file for each column. The column items on each row are separated by whitespace, and the column header names are used to create the CSV file names.
Plot generator: Uses gnuplot to plot the contents of the CSV files by using templates. The templates define the list of CSV files used as inputs to generate the plots, as well as the labels and titles of the plots.
Sar monitoring tool
Iostat monitoring tool
VTune™ monitoring tool
Emon monitoring tool
installcli is used to install Intel BITT.
moncli is used to monitor local or cluster nodes.
hadoopcli is implemented using the building blocks defined above and is used to create and test Hadoop clusters.

Configuring the Setup
We installed RHEL 6.0 on all 17 nodes with the default configuration. We then configured passphraseless SSH access between the nodes, so that they can communicate without logging in with a password for every transaction.
1. Install the Intel BITT tar file:
cd
mkdir bitt
cp bitt-1.0.tar bitt
cd bitt
tar -xvf bitt-1.0.tar
cd bitt-1.0
The following is the list of subdirectories under the Intel BITT home directory:
cmd
conf
samples
scripts
templates
2. Create a release directory under the Intel BITT home directory to hold the tar files:
mkdir -p bitt/bitt-1.0/release
cp bitt-1.0.tar bitt/bitt-1.0/release
If you are planning to test Hadoop, also download the Hadoop tar file and copy it to the release directory:
cp hadoop tar.gz ~/bitt/bitt-1.0/release
3. Download the JDK and create a tar file from the installed JDK tree. For example:
mkdir jdk
cp jdk-6u23-linux-x64.bin jdk
cd jdk
chmod +x jdk-6u23-linux-x64.bin
./jdk-6u23-linux-x64.bin
rm jdk-6u23-linux-x64.bin
tar -cvf ~/bitt/bitt-1.0/release/jdk1.6.0_23.tar .
4. Download gnuplot and create a tar file from the installed gnuplot tree. For example:
mkdir myinstall
cp gnuplot rc1.tar myinstall
cd myinstall/
tar -xvf gnuplot rc1.tar
mkdir -p install/gnuplot
cd gnuplot rc1
./configure --prefix=/home/<user>/myinstall/install/gnuplot
make
make install
cd ../install
tar -cvf ~/bitt/bitt-1.0/release/gnuplot tar .
5. Download Python and create a tar file from the installed Python tree for your platform. For example:
mkdir myinstall
cp Python tgz myinstall
cd myinstall/
tar -xvf Python tgz
mkdir -p install/Python
cd Python
./configure --prefix=/home/<user>/myinstall/install/Python
make
make install
cd ../install
tar -cvf ~/bitt/bitt-1.0/release/Python tar .
6. Run TeraSort using terasort.sh. Before running the script, complete these preparations:
- update the configuration files in ~/bitt/bitt-1.0/conf as described in step 7;
- install gnuplot and Python on your client system;
- make sure python3 and gnuplot are on your path on the client system.

Then run:
cd ~/bitt/bitt-1.0/scripts
./terasort.sh
7. Edit the configuration files. All configuration files are found under ~/bitt/bitt-1.0/conf.
a. hadoopnodelist: Configuration file that lists the cluster nodes. Any node added to or removed from the cluster must be registered here to be recognized by the load generator tool.
node1.domain.com
node2.domain.com
node3.domain.com
node4.domain.com
...
node17.domain.com
b. hadooptarlist: Configuration file that lists the tar files of the executables to be installed.
../release/bitt-1.0.tar.gz
../release/python-3.2.tar.gz
../release/jdk1.6.0_25.tar.gz
../release/hadoop tar.gz
../release/gnuplot tar.gz
c. hadoop-env.sh: Main Hadoop environment configuration file.
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is
# export HADOOP_HEAPSIZE=2000

# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
d. hadoopcloudconf.xml: Custom XML configuration file that defines key parameters for how the test is executed and where the data is stored.
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <property>
    <name>cloudtemplateloc</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf</value>
    <description>cloud conf template file location</description>
  </property>
  <property>
    <name>cloudtemplatevars</name>
    <value>all</value>
    <description>the list of template variables to copy</description>
  </property>
  <property>
    <name>jobtrackerport</name>
    <value>8021</value>
    <description>jobtracker port</description>
  </property>
  <property>
    <name>namenodeport</name>
    <value>8020</value>
    <description>namenode port</description>
  </property>
  <property>
    <name>cloudconfdir</name>
    <value>/tmp/hadoopconf</value>
    <description>generated cloud conf file</description>
  </property>
  <property>
    <name>cloudtmpdir</name>
    <value>hadoop-${user.name}</value>
    <description>cloud tmp dir</description>
  </property>
  <property>
    <name>cloudinstalldir</name>
    <value>/usr/local/hadoop/install</value>
    <description>cloud install dir</description>
  </property>
  <property>
    <name>cloudnodelist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadoopnodelist</value>
    <description>cluster nodes</description>
  </property>
  <property>
    <name>monnodelist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadoopmonnodelist</value>
    <description>cluster monitor nodes</description>
  </property>
  <property>
    <name>cloudtarlist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadooptarlist</value>
    <description>cluster tar file list</description>
  </property>
  <property>
    <name>moninterval</name>
    <value>30</value>
    <description>sampling duration</description>
  </property>
  <property>
    <name>moncount</name>
    <value>0</value>
    <description>number of samples</description>
  </property>
  <property>
    <name>monresults</name>
    <value>/tmp/monhadres</value>
    <description>cloud monitor log files location</description>
  </property>
  <property>
    <name>monsummary</name>
    <value>/tmp/monhadsum</value>
    <description>cloud monitor log files location</description>
  </property>
  <property>
    <name>mondir</name>
    <value>/tmp/monhadloc</value>
    <description>cloud monitor log files location</description>
  </property>
  <property>
    <name>gnucmd</name>
    <value>/usr/local/hadoop/install/gnuplot-4.4.3/bin/gnuplot</value>
    <description>none</description>
  </property>
</configuration>
e. hdfs-site-template.xml: Hadoop configuration file where the HDFS parameters are set. The optimized values we used to run the test are shown below.
<?xml version="1.0" encoding="utf-8"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>655360</value>
    <description>Number of files Hadoop serves at one time</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data,/mnt/disk4/hdfs/data,/mnt/disk5/hdfs/data,/mnt/disk6/hdfs/data,/mnt/disk7/hdfs/data,/mnt/disk8/hdfs/data,/mnt/disk9/hdfs/data,/mnt/disk10/hdfs/data,/mnt/disk11/hdfs/data,/mnt/disk12/hdfs/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value> </value>
    <description>The default block size for new files.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description> </description>
  </property>
  <property>
    <name>ipc.server.tcpnodelay</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>ipc.client.tcpnodelay</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
    <description> </description>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
    <description> </description>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>220</value>
    <description> </description>
  </property>
</configuration>
f. mapred-site-template.xml: Hadoop configuration file that defines key MapReduce parameters. The values used in our testing are shown below.
<?xml version="1.0" encoding="utf-8"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>24</value>
    <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.3</value>
    <description>Added as per SSG recommendation</description>
  </property>
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.9</value>
    <description>Added as per SSG recommendation</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>12</value>
    <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>64</value>
    <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Assume 10 nodes, 10*2-2</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/mnt/disk1/hdfs/mapred,/mnt/disk2/hdfs/mapred,/mnt/disk3/hdfs/mapred,/mnt/disk4/hdfs/mapred,/mnt/disk5/hdfs/mapred,/mnt/disk6/hdfs/mapred,/mnt/disk7/hdfs/mapred,/mnt/disk8/hdfs/mapred,/mnt/disk9/hdfs/mapred,/mnt/disk10/hdfs/mapred,/mnt/disk11/hdfs/mapred,/mnt/disk12/hdfs/mapred</value>
    <description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m -Djava.net.preferIPv4Stack=true</value>
    <description>Java opts for the task tracker child processes. The symbol @taskid@, if present, is replaced by the current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc The configuration variable mapred.child.ulimit can be used to control the maximum virtual memory of the child processes.</description>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
    <description>Should the job outputs be compressed?</description>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>false</value>
    <description>Should the outputs of the maps be compressed before being sent across the network. Uses SequenceFile compression.</description>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
    <description>If the job outputs are compressed, how should they be compressed?</description>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
    <description>If the map outputs are compressed, how should they be compressed?</description>
  </property>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.min.split.size</name>
    <value>65536</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.reduce.copy.backoff</name>
    <value>5</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.shuffle.merge.percent</name>
    <value>0.7</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.66</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.reduce.input.buffer.percent</name>
    <value>0.90</value>
    <description> </description>
  </property>
</configuration>
g. hadoop-terasort.xml: Intel BITT configuration file from which the parameters are read before the test runs. Parameters in this file override the values in the other configuration files mentioned above. This makes it quick to change parameter values between test runs without editing the individual configuration files.
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>8192</value>
    <description>Total number of map tasks</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>243</value>
    <description>Total number of reduce tasks</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Number of copies to replicate</description>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <description>Compress map output</description>
  </property>
  <property>
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.66</value>
    <description>none</description>
  </property>
  <property>
    <name>datasetsizesmall</name>
    <value> </value>
    <description>Total number of records, about 1TB of data. One record is 100 bytes.</description>
  </property>
  <property>
    <name>datasetsize</name>
    <value> </value>
    <description>Total number of records, about 1TB of data. One record is 100 bytes.</description>
  </property>
  <property>
    <name>datasetname</name>
    <value>tera</value>
    <description>none</description>
  </property>
  <property>
    <name>outputdataname</name>
    <value>tera-sort2</value>
    <description>none</description>
  </property>
  <property>
    <name>jarfile</name>
    <value>hadoop examples.jar</value>
    <description>none</description>
  </property>
</configuration>
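BITT's XML parser and the override behavior of hadoop-terasort.xml can be approximated with Python's standard xml.etree.ElementTree module. This is a sketch, not BITT's code. It assumes each setting is wrapped in a <property> element with <name> and <value> children, as in standard Hadoop configuration files. The load_properties and merge helpers are hypothetical. The property values are taken from the files above.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def load_properties(xml_text: str) -> Dict[str, str]:
    """Return {name: value} for every <property> in a Hadoop-style configuration."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def merge(base: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Values from the override file (like hadoop-terasort.xml) win over template values."""
    merged = dict(base)
    merged.update(overrides)
    return merged


TEMPLATE = """<configuration>
  <property><name>mapred.reduce.tasks</name><value>64</value></property>
  <property><name>mapred.compress.map.output</name><value>false</value></property>
</configuration>"""

TERASORT = """<configuration>
  <property><name>mapred.map.tasks</name><value>8192</value></property>
  <property><name>mapred.reduce.tasks</name><value>243</value></property>
  <property><name>mapred.compress.map.output</name><value>true</value></property>
</configuration>"""

if __name__ == "__main__":
    effective = merge(load_properties(TEMPLATE), load_properties(TERASORT))
    for name, value in sorted(effective.items()):
        print(f"{name} = {value}")
```

Keeping the overrides in a separate dictionary means a test run's settings can be swapped without touching the templates. The same rationale is given for hadoop-terasort.xml above.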
Running TeraSort
TeraSort can be started by running terasort.sh. The script runs the commands that start the test, start the performance counters, end the test, and gather the performance counter data for analysis. The script is listed below, with a comment before each command briefly explaining what it does.
#!/usr/bin/env bash
###########################################################
# Intel Benchmark Install and Test Tool (BITT) Use Cases
# Typical sequence for hadoop terasort benchmark:
###########################################################
echo "START: terasort benchmark..."
date
# Stop any currently running test on the cluster.
../scripts/hadoopcli -a stop -c ../conf/hadoopcloudconf.xml
# Kill Java* processes on the nodes.
./runkill.sh
# Install a fresh copy of the executables on the slave nodes.
../scripts/hadoopcli -a install -c ../conf/hadoopcloudconf.xml
# Format HDFS to store the data.
../scripts/hadoopcli -a format -c ../conf/hadoopcloudconf.xml
# Start the Java processes on all slave nodes.
../scripts/hadoopcli -a start -c ../conf/hadoopcloudconf.xml
# Wait 2 minutes for the processes to start on the slave nodes.
sleep 120
# Generate 1TB of data to be used for sorting.
../scripts/hadoopcli -a data -c ../conf/hadoopcloudconf.xml
# Create monitoring directories.
../scripts/moncli -r clean -c ../conf/hadoopcloudconf.xml
# Start the iostat utility to monitor disk usage on the slave nodes.
../scripts/moncli -m iostat -a run -c ../conf/hadoopcloudconf.xml -s run_iostat.sh
# Start the sar utility on all slave nodes to monitor CPU, network, and memory utilization.
../scripts/moncli -m sar -a run -c ../conf/hadoopcloudconf.xml -s run_sar2.sh
# Sort the 1TB of data generated in the earlier step.
../scripts/hadoopcli -a run -c ../conf/hadoopcloudconf.xml
# Stop the sar monitoring utility.
../scripts/moncli -m sar -a kill -s run_sar_kill.sh -c ../conf/hadoopcloudconf.xml
# Stop the iostat utility.
../scripts/moncli -m iostat -a kill -s run_iostat_kill.sh -c ../conf/hadoopcloudconf.xml
# Convert iostat-generated data to CSV format.
../scripts/moncli -m iostat -a csv -c ../conf/hadoopcloudconf.xml
# Convert data generated by the sar utility to CSV format.
../scripts/moncli -m sar -a csv -c ../conf/hadoopcloudconf.xml -s run_sar_gen.sh
# Use gnuplot to generate an image containing a graph of the iostat data.
../scripts/moncli -m iostat -a plot -t iostat -c ../conf/hadoopcloudconf.xml
# Use gnuplot to generate an image containing the CPU graph from the sar data.
../scripts/moncli -m sar -a plot -t cpu -c ../conf/hadoopcloudconf.xml
# Use gnuplot to generate an image containing the memory graph from the sar data.
../scripts/moncli -m sar -a plot -t mem -c ../conf/hadoopcloudconf.xml
# Use gnuplot to generate an image containing the network graph from the sar data.
../scripts/moncli -m sar -a plot -t nw -c ../conf/hadoopcloudconf.xml
# Archive the log files on all slave nodes.
../scripts/moncli -r tar -c ../conf/hadoopcloudconf.xml
# Copy the archived log files from the slave nodes to the master node.
../scripts/moncli -r collect -c ../conf/hadoopcloudconf.xml
# Stop running processes on the slave nodes.
../scripts/hadoopcli -a stop -c ../conf/hadoopcloudconf.xml
# Create a folder called cluster under /tmp/monhadsum on the head node.
../scripts/moncli -r cluster -c ../conf/hadoopcloudconf.xml
# Calculate the average CPU utilization of the cluster.
../scripts/moncli -r average -m sar -t cpu -c ../conf/hadoopcloudconf.xml
# Calculate the average memory throughput of the cluster.
../scripts/moncli -r throughput -m sar -t mem -c ../conf/hadoopcloudconf.xml
# Calculate the average network utilization of the cluster.
../scripts/moncli -r throughput -m sar -t nw -c ../conf/hadoopcloudconf.xml
# Calculate the average disk throughput of the cluster.
../scripts/moncli -r throughput -m iostat -t iostat -c ../conf/hadoopcloudconf.xml
# Copy the contents of the hadoopconf folder.
cp -r /tmp/hadoopconf /tmp/monhadsum
# Copy the performance data gathering templates.
cp -r ../templates /tmp/monhadsum
# Create an archive with all the log files and graph images.
tar -cvf /tmp/baselinemaprtera.tar /tmp/monhadsum
# End script.
echo "END: terasort benchmark..."
date
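The CSV conversion step can be sketched in Python. It follows the log file parser described among the Intel BITT building blocks: whitespace-separated tables become one CSV file per column, named after the column header. This is an assumption about how such a parser might work, not BITT's implementation. The parse_table and write_column_csvs names are hypothetical, and the sample text only imitates iostat-style output.

```python
import csv
import os
import tempfile
from typing import Dict, List


def parse_table(text: str) -> Dict[str, List[str]]:
    """Split a whitespace-separated table into {column header: [values...]}.

    The first non-empty line is the header row; rows whose field count
    does not match the header are skipped.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if not lines:
        return {}
    header, rows = lines[0], lines[1:]
    columns: Dict[str, List[str]] = {name: [] for name in header}
    for row in rows:
        if len(row) != len(header):
            continue
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns


def write_column_csvs(columns: Dict[str, List[str]], out_dir: str) -> List[str]:
    """Write one CSV per column as (sample index, value) rows; the file name comes from the header."""
    paths = []
    for name, values in columns.items():
        safe = name.replace("/", "_").replace("%", "pct_")  # make the header usable as a file name
        path = os.path.join(out_dir, f"{safe}.csv")
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            for index, value in enumerate(values):
                writer.writerow([index, value])
        paths.append(path)
    return paths


SAMPLE = """
Device  rMB/s  wMB/s  %util
sdb     12.5   40.1   35.0
sdc     11.9   38.7   33.2
"""

if __name__ == "__main__":
    cols = parse_table(SAMPLE)
    with tempfile.TemporaryDirectory() as out_dir:
        for path in write_column_csvs(cols, out_dir):
            print(os.path.basename(path))
```

One file per column keeps each gnuplot template simple: a template only needs to name the CSV files it plots.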
Results
Figure 6 shows the test results from running TeraSort on the 17-node Hadoop cluster. The images show two rounds of testing: one with data compression turned on and one with data compression turned off. Time to complete is measured in seconds; the lower the time, the better the result. The images also show resource utilization in terms of processor, memory, disk throughput, and network throughput for both test runs.
Figure 6: Time taken to complete TeraSort
In Figure 6, the blue bar shows the time taken by TeraSort when the output of the map phase was compressed before the data was stored for the reduce phase. In our test the cluster sorted 1TB of data in 1207 seconds with data compression. The red bar shows the time taken by TeraSort to complete without data compression: 1040 seconds, about 14 percent less time than the run with data compression.
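To put the two completion times on a common scale, they can be converted into sort throughput. The sketch assumes 1TB means 10^12 bytes, that is, 10^10 TeraGen records of 100 bytes each. It divides by the 16 slave nodes that did the work, so the per-node number is an average, not a measured value.

```python
DATA_BYTES = 10**12   # assumption: 1TB = 10^12 bytes
RECORD_BYTES = 100    # TeraGen record size
SLAVE_NODES = 16

RUNS_S = {"compression on": 1207, "compression off": 1040}  # seconds, from Figure 6


def sort_rate_mb_s(seconds, data_bytes=DATA_BYTES):
    """Average sort throughput in MB/s (1MB = 10^6 bytes)."""
    return data_bytes / seconds / 1e6


print(f"records sorted: {DATA_BYTES // RECORD_BYTES:,}")
for label, seconds in RUNS_S.items():
    rate = sort_rate_mb_s(seconds)
    print(f"{label}: {rate:.1f} MB/s for the cluster, "
          f"{rate / SLAVE_NODES:.1f} MB/s per slave node")

saving = 1 - RUNS_S["compression off"] / RUNS_S["compression on"]
print(f"time saved without compression: {saving:.1%}")
```

Under these assumptions the cluster sorted about 829MB/s with compression and about 962MB/s without it. That is roughly 52MB/s versus 60MB/s per slave node.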
The following graphs show resource utilization with data compression enabled.
Figure 7: Processor utilization with data compression enabled
Figure 7 shows the average processor utilization of the cluster with data compression enabled. In this run the Intel Xeon processor X5680 also carries the additional work of compressing the data, and it makes an excellent choice for setting up the Hadoop cluster.
Figure 8: Network throughput with data compression enabled
Figure 8 shows the average network throughput of the cluster with data compression enabled. Because the map output is compressed before it is transmitted over the network, less data is sent between the nodes.
Figure 9: Memory usage
Figure 9 shows the average percentage of cluster memory used. Because almost 2GB of memory was allocated per task, the entire 48GB of memory on each server is used while the TeraSort benchmark is running.
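The memory figure follows from simple arithmetic: with almost 2GB per task and 48GB per server, about 24 tasks fit in memory at once. The sketch below is an illustration only. The memory reserve for the operating system and Hadoop daemons is a hypothetical parameter, and although Hadoop 1.x commonly sets the per-task heap through `mapred.child.java.opts` (for example `-Xmx2048m`), the guide does not state the exact settings used.

```shell
#!/bin/sh
# How many task JVMs of a given size fit in a node's memory.
# Arguments (whole GB): total memory, memory reserved for the OS and the
# Hadoop daemons (hypothetical value), memory per task.
task_slots() {
  total_gb="$1"; reserved_gb="$2"; per_task_gb="$3"
  echo $(( (total_gb - reserved_gb) / per_task_gb ))
}

task_slots 48 0 2   # prints 24: 48GB fully used by ~2GB tasks
task_slots 48 4 2   # prints 22 with a hypothetical 4GB reserve
```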
Figure 10: Disk throughput with data compression enabled
Figure 10 shows the average disk throughput of the cluster. Because the data is compressed, writes are minimal during the map phase and peak at nearly 600Mb/s when the sorted data is committed to disk.
The following graphs show resource utilization with data compression disabled.
Figure 11: Processor utilization with data compression disabled
Figure 12: Network throughput with data compression disabled
Figure 12 shows network throughput reaching almost 300MB/s (roughly 2.4Gb/s) when TeraSort is run with data compression disabled. The Intel Ethernet Server Adapter X520, based on 10GbE, provides ample bandwidth for the data transfer between the nodes and handles the cluster's data throughput efficiently.
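Converting the observed network rate into link utilization shows how much headroom 10GbE leaves. This is a minimal sketch. It assumes the plotted value is per node and uses decimal megabytes (1MB = 10^6 bytes), and it ignores Ethernet framing and TCP/IP overhead. The function name `link_utilization` is illustrative.

```shell
#!/bin/sh
# Convert a throughput in MB/s to Gb/s and to a percentage of a link's speed.
# Assumes decimal units (8 bits per byte, 1000Mb per Gb); ignores framing and
# TCP/IP overhead.
link_utilization() {
  awk -v mb="$1" -v link="$2" 'BEGIN {
    gbps = mb * 8 / 1000
    printf "%.1fGb/s, %.0f%% of a %gGb/s link\n", gbps, 100 * gbps / link, link
  }'
}

link_utilization 300 10   # prints: 2.4Gb/s, 24% of a 10Gb/s link
```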
Figure 13: Memory usage
Figure 14: Disk throughput with data compression disabled
With compression disabled, disk usage is higher, as was network usage. Peak writes reached 620MB/s and remained above 400MB/s for the entire run. Total cluster disk throughput, including reads and writes, approached 1GB/s at the peaks.
Conclusion
Hadoop clusters benefit a great deal from servers based on the Intel Xeon processor X5680; these dual-socket servers are well suited to Hadoop deployments ranging from a few nodes to hundreds of nodes. In our test runs we were able to drive the cluster to its maximum utilization. With the cluster 100 percent utilized, jobs complete faster, making way for other job sets to run on the cluster. As data centers aim to get the most performance per watt, the energy-efficient Intel Xeon processor 5600 series provides cost benefits on a per-node basis.
In distributed workloads, high-throughput network connections are key to handling workloads with large datasets. In the test, Intel Ethernet Server Adapters X520-DA2 based on 10GbE sustained data rates of almost 300MB/s (roughly 2.4Gb/s) during the workload execution.
While compressing the data substantially reduces data transfer over the network, it increases the time to complete compared with test runs without compression. System administrators and application developers have to decide whether to enable data compression based on their specific requirements. Intel has published a set of guidelines on tuning Hadoop clusters which can be found at
Using LZO-based compression codecs may alleviate some of the bottlenecks found with the default zlib compression codec.
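Whether the map output is compressed is a per-job setting. As a sketch, assuming a Hadoop 0.20/1.x-era cluster, the two TeraSort runs could be launched with the map-output compression properties passed as generic `-D` options. The function `terasort_cmd`, the examples jar name, and the HDFS paths are hypothetical placeholders. The LZO codec class `com.hadoop.compression.lzo.LzoCodec` comes from the separately installed hadoop-lzo package, not from Hadoop itself.

```shell
#!/bin/sh
# Hypothetical helper: print the TeraSort command line for a run with map-output
# compression "on" or "off". Property names are the Hadoop 0.20/1.x ones;
# HADOOP_EXAMPLES_JAR and the HDFS paths are placeholders.
terasort_cmd() {
  jar="${HADOOP_EXAMPLES_JAR:-hadoop-examples.jar}"
  # Default codec is the zlib-based DefaultCodec; pass e.g.
  # com.hadoop.compression.lzo.LzoCodec if hadoop-lzo is installed.
  codec="${2:-org.apache.hadoop.io.compress.DefaultCodec}"
  if [ "$1" = "on" ]; then
    opts="-D mapred.compress.map.output=true -D mapred.map.output.compression.codec=$codec"
  else
    opts="-D mapred.compress.map.output=false"
  fi
  echo "hadoop jar $jar terasort $opts /terasort/in /terasort/out"
}

terasort_cmd on                                       # zlib-compressed map output
terasort_cmd on com.hadoop.compression.lzo.LzoCodec   # LZO, needs hadoop-lzo
terasort_cmd off                                      # no compression
```

Because the helper only prints the command, it has no side effects. Once the output looks right, it can be run with `sh -c "$(terasort_cmd on)"`. The output directory must not already exist, or Hadoop will refuse to start the job.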
For more information:
Disclaimers
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See products/processor_number for details. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined. Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling , or by visiting Intel's Web site at
Copyright 2012 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon inside, and Intel Intelligent Power Node Manager are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.
Upgrading Intel Server Board Set SE8500HW4 to Support Intel Xeon Processors 7000 Sequence January 2006 Enterprise Platforms and Services Division - Marketing Revision History Upgrading Intel Server Board
More information