High Performance Computing using a Parallella Board Cluster PROJECT PROPOSAL. March 24, 2015


Michael Johan Kruger
Rhodes University Computer Science Department
g12k5549@campus.ru.ac.za

Principal Investigator
Michael Johan Kruger
3 St Aidens, Grahamstown, 6139
079 982 8583
g12k5549@campus.ru.ac.za

Supervised by: Dr. Karen Bradshaw

Project Title
The proposed title of the project is: High Performance Computing using a Parallella Board Cluster

Problem Statement
Is it possible to build a high performance computer using a cluster of Parallella boards? The release of the Parallella board provides a cheap and energy-efficient computer that requires minimal setup. Cluster computing is becoming increasingly popular as a low-cost way to create high performance computers: multiple separate computers connected over a network can work together as one system to solve problems [5].

Objectives
The proposed objectives of this project are as follows:
- Build a high performance computer using multiple Parallella boards connected over a network
- Install operating systems and software to set up the cluster
- Compare the performance of the Parallella cluster with that of other similarly priced systems
- Discover the limitations of the Parallella cluster

Background

High Performance Computing
High Performance Computing (HPC) is the term for very fast systems aimed at processing large amounts of information quickly. High performance computers are built from multiple processors, as the speed of a single processor has reached its physical limit [6]. HPC is most cheaply obtained through cluster computing, as most places needing large amounts of processing power already have multiple computers available [5]. HPC is used to run simulations that would be too expensive or difficult to perform physically; these require large amounts of computational power to complete in a timely manner [11] [10], so ever more powerful machines are being built. High performance computing has evolved to the point where the number of cores in a single machine runs into the millions, with performance in the petaFLOPS range (10^15 floating-point operations per second) [3].

Parallel and Concurrent Computing
Owing to the current size and speed constraints on single-core processors, it is more efficient and faster to run multiple slower cores. To achieve faster speeds on this architecture, algorithms must take into account how to split the workload evenly between multiple processors [6]. Many compilers have special flags that make them attempt to optimise for parallel computation [9], but this gives only a minor speed boost compared with an algorithm that explicitly splits up the work. According to David Geer [6], to take advantage of multiple cores, programs need to be rewritten to run on multiple threads, and each thread can then be assigned to a processor.

Cluster Computing
With the need for large amounts of processing power, ways of building supercomputers cheaply have appeared. Clusters of computers connected by a network can be made to work together as a single supercomputer. With the increased speed and decreased latency of the Internet, it is even possible to create a cluster using computers from all over the world; this has led to programs and applications that allow a computer to join a pool of other computers and contribute its processing power to a computation. Factors limiting the effectiveness of cluster computing include building a switch that can keep up with the speed of a single-core processor and creating compilers that make good use of multiple processors.
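The claim above, that the workload must be split evenly to gain speed, can be made quantitative with Amdahl's law, a standard model of parallel speedup that is not stated in the proposal itself but underlies it. A minimal sketch:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Predicted speedup from Amdahl's law when a fraction p of the
    work can be split evenly across n processors; the remaining
    (1 - p) must run serially."""
    return 1.0 / ((1.0 - p) + p / n)

# Even a 16-core machine gives well under 16x speedup
# unless almost all of the program can run in parallel.
print(round(amdahl_speedup(0.90, 16), 2))  # 6.4
print(round(amdahl_speedup(1.00, 16), 2))  # 16.0
```

This is why compiler flags alone give only a minor boost: they typically parallelise a small fraction of the program, and the serial remainder dominates the running time.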
Clusters communicate with each other using a message passing interface (MPI), which provides a thread-safe application programming interface (API) that allows work to be delegated effectively to multiple computers on the network [7]. MPI allows each individual processor to communicate with the others, letting them pass information to one another [7].

Parallella Board
The Parallella board is an affordable, energy-efficient, high-performance, credit-card-sized computer [2]. Its function is to provide a platform for developing and implementing high performance parallel processing. The 66-core version of the Parallella board manages over 90 GFLOPS (10^9 floating-point operations per second), while the 16-core version can reach 32 GFLOPS using only about 5 watts. The Parallella is supplied with an SD card containing a customised ARM implementation of Linux (Ubuntu 14.04), which is also available on the Parallella website [2]. The Parallella has a 1 Gbps Ethernet port allowing large amounts of information to be passed quickly over the network; this increases its ability to work in a cluster, as it can pass information to its peers faster, provided that the switch can handle the 1 Gbps bandwidth.

Approach
The first step will be to find information on high performance computing (HPC) with clusters. This will require a literature review, which will provide a better understanding of how HPC is done and how cluster computing is used to create high performance systems. Information on the Parallella boards and their capabilities will also need to be reviewed so that they can be used correctly.

The second step will be the physical building of the Parallella cluster. This involves setting up cooling for all the Parallella boards using a fan, providing power to all the boards via the mounting-hole pads, setting up a router or switch to facilitate communication between the boards, and mounting and connecting all the boards to power and the switch. After setting up the components, an operating system and software will need to be installed on each board, or network booting set up for each. The software will have to be configured so that the boards are aware of each other and work together.

Once the Parallella cluster is operational, comparisons against other systems can be made using benchmarking software to measure each system's performance. Once comparisons and benchmarks have been taken, ways to optimise the Parallella cluster can be explored and tested; this involves changing the cluster configuration, using different compiler flags, and applying any other optimisations discovered through further research. Results and any limitations of the Parallella boards are to be recorded.

Requirements
Hardware required to meet the project goals:
- Multiple Parallella boards
- Switch and Ethernet cables
- Fans to cool the Parallella boards
- Desktop with a screen to interact with the headless Parallella cluster
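As an illustration of the kind of measurement the benchmarking step involves, achieved floating-point throughput can be estimated by timing a known amount of arithmetic. The project itself would use established benchmark suites; the harness below, its function name, and the matrix size are illustrative assumptions only.

```python
import time

def measure_gflops(n: int = 200) -> float:
    """Time a naive n x n matrix multiply and report achieved GFLOPS.

    A toy stand-in for a real benchmark suite: an n x n matrix
    multiply performs one multiply and one add per inner step,
    for 2 * n**3 floating-point operations in total.
    """
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    start = time.perf_counter()
    for i in range(n):
        for k in range(n):
            aik, row_b, row_c = a[i][k], b[k], c[i]
            for j in range(n):
                row_c[j] += aik * row_b[j]
    elapsed = time.perf_counter() - start
    flops = 2.0 * n**3            # total floating-point operations
    return flops / elapsed / 1e9  # achieved GFLOPS

print(f"{measure_gflops(100):.3f} GFLOPS")
```

Running the same harness on the Parallella cluster and on a similarly priced desktop yields directly comparable throughput figures; dividing each result by measured power draw gives GFLOPS per watt, the metric on which the 16-core Parallella's quoted 32 GFLOPS at about 5 watts stands out.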

Progression Timeline

Friday 27th February: Formal Written Proposal completed and accepted (submit to Supervisor and cc Project Coordinator)
Tuesday 3rd March: Seminar Series 1: oral presentations, 8-10 mins each
Sunday 29th March: Literature Review and Plan of Action (email to Supervisor and cc Project Coordinator)
Monday 20th April: Parallella boards powered and mounted
Saturday 25th April: Operating system installed on the Parallella boards
Monday 20th April: Parallella boards working as a cluster
Monday 1st June: Run benchmarks and comparisons on the cluster and similar systems
Tuesday 28th July, 4th and 11th August: Seminar Series 2: oral presentations, 15 mins each
Monday 14th September: Short Paper submitted (Supervisors to confirm)
26th, 27th and 28th October: Seminar Series 3: final oral presentations, 20 mins max (assessed)
Friday 30th October: Project deadline, 12 noon (submitted to Project Coordinator)
Friday 6th November: Research website complete
Wednesday 18th November: Final research oral examination

Week 1, 4th Term: Skeleton outline of all chapters, including major section and subsection headings
Week 2, 4th Term: Complete structure of final thesis, including summary of contents of all sections and subsections
Week 3, 4th Term: At least one or two chapter drafts completed
Week 4, 4th Term: Additional drafts of new chapters, etc.

References

[1] Network boot Parallella board. https://plus.google.com/+shogunx/posts/K5taMhPKurp. Accessed: -02-27.
[2] The Parallella board. https://www.parallella.org/board/. Accessed: -03-01.
[3] Top 500 supercomputers. http://www.top500.org/lists/2014/11/.
[4] George S. Almasi and Allan Gottlieb. Highly Parallel Computing. 1988.
[5] Rajkumar Buyya. High Performance Cluster Computing. New Jersey: Prentice Hall, 1999.
[6] David Geer. Chip makers turn to multicore processors. Computer, 38(5):11-13, 2005.
[7] William Gropp, Ewing Lusk, Nathan Doss, and Anthony Skjellum. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, 22(6):789-828, 1996.
[8] Mark F. Mergen, Volkmar Uhlig, Orran Krieger, and Jimi Xenidis. Virtualization for high-performance computing. ACM SIGOPS Operating Systems Review, 40(2):8-11, 2006.
[9] David A. Padua and Michael J. Wolfe. Advanced compiler optimizations for supercomputers. Communications of the ACM, 29(12):1184-1201, 1986.
[10] K. Y. Sanbonmatsu and C.-S. Tung. High performance computing in biology: multimillion atom simulations of nanoscale systems. Journal of Structural Biology, 157(3):470-480, 2007.
[11] T. Tezduyar, S. Aliabadi, M. Behr, A. Johnson, V. Kalro, and M. Litke. Flow simulation and high performance computing. Computational Mechanics, 18(6):397-412, 1996.