LINUX & Parallel Processing


LINUX & Parallel Processing
Wouter Brissinck
Vrije Universiteit Brussel, Dienst INFO
wouter@info.vub.ac.be
+32 2 629 2965

Overview
- The Dream
- The Facts
- The Tools
- The Real Thing
- The Conclusion

History
- '60s: 1 computer, N users
- '80s: N computers, N users
- '90s: N computers, 1 user
- Past: problem = f(computer)
- Future: computer = f(problem)

The Sequential Frustration
- Faster CPU: high development cost and time
- Speed of light: faster means smaller
- Size of the Si atom is the ultimate limit

Parallelism
- Offers scalable performance
- Solves as yet unsolvable problems (problem size, computation time)
- Hardware cost
- Software development

Scalability (diagram): a problem on 1 CPU versus a p-times larger problem (PROBLEM * p) on p CPUs.

Scalability
- A problem on 1 CPU takes T time
- A p times larger problem on p CPUs takes T time
- Same technology (only more of it)
- Same code (if...)
- Economy of scale

Hardware cost?
- Parallel machines: expensive to buy, expensive to maintain, expensive to upgrade, unreliable, lousy operating systems
- N.O.W.: you own one! It's maintained for you. You just upgraded it, didn't you? You rely on it every day! And who would call LINUX a lousy operating system?
- A network-of-workstations is a virtually free parallel machine!

Overview
- The Dream
- The Facts
- The Tools
- The Real Thing
- The Conclusion

Problems
- 1 painter: 100 days; 100 painters: 1 day?
- Synchronization: shared resources, dependencies
- Communication

Amdahl's Law

    T_p = α·T_1 + ((1-α)/p)·T_1 + β·p·T_1

    Speedup = S = T_1 / T_p = 1 / (α + (1-α)/p + β·p)

    α = sequential fraction, β = communication cost, p = #processors

Amdahl's Law (plot): speedup S versus #processors p, for given α and β; the communication term β·p makes the curve flatten and eventually drop.

Granularity
- Granularity = Computation / Communication
- Coarse: N.O.W.
- Finer: dedicated MIMD
- Fine: pipelining in a CPU

Network latency (diagram: proc1 sends a message to proc2)

    T = L + len/B

    L = latency (s), len = message length (bits), B = bandwidth (bits/s)

The N of N.O.W?

Software Design
- Sequential: specification -> implementation
- Parallel: specification -> parallel implementation -> (less) parallel implementation
- Preferably: 1 process/processor

Software Design: classification
- Master-Slave
- Pipelining
- Systolic Array
- Functional parallelism
- Data parallelism

Overview
- The Dream
- The Facts
- The Tools
- The Real Thing
- The Conclusion

P.V.M.
- PVM: overview
- PVM installation
- Using PVM: console, compiling, writing code
- Message passing: in C, in C++
- gdb and PVM

Parallel Virtual Machine
- Most popular message passing engine
- Works on MANY platforms (LINUX, NT, SUN, CRAY, ...)
- Supports C, C++, Fortran
- Allows heterogeneous virtual machines
- FREE software (ftp.netlib.org)

Heterogeneous NOWs
- PVM provides data format translation of messages
- Must compile the application for each kind of machine
- Can take advantage of specific resources in the LAN

PVM features
- User-configured host pool
- Application = set of (unix) processes, automatically mapped or forced by the programmer
- Message passing model: strongly typed messages, asynchronous (blocking/non-blocking receive)
- Multiprocessor support

How PVM works
- Console for resource management
- Daemon for each host in the virtual machine: message passing, fault tolerance
- PVM library provides primitives for task spawning, send/receive, VM info, ...

How PVM works
- Tasks are identified by TIDs (pvmd supplied)
- Group server provides for groups of tasks: tasks can join and leave a group at will; groups can overlap; useful for multicasts

N.O.W. setup (diagram): one Master and several Slave tasks, with PVM running over LINUX on a PC for each host.

Example: master

#include "pvm3.h"

main()
{
    int cc, tid;
    char buf[100];

    cc = pvm_spawn("hello_other", (char**)0, 0, "", 1, &tid);
    if (cc == 1) {
        cc = pvm_recv(-1, -1);
        pvm_bufinfo(cc, (int*)0, (int*)0, &tid);
        pvm_upkstr(buf);
        printf("from t%x: %s\n", tid, buf);
    }
    pvm_exit();
}

Example: slave

#include "pvm3.h"

main()
{
    int ptid;
    char buf[100];

    ptid = pvm_parent();
    strcpy(buf, "hello, world from ");
    gethostname(buf + strlen(buf), 64);
    pvm_initsend(PvmDataDefault);
    pvm_pkstr(buf);
    pvm_send(ptid, 1);
    pvm_exit();
}

PVM: installing
- Set environment variables (before build): PVM_ROOT, PVM_ARCH
- make default
- Make sure you can rsh without authentication

PVM and rsh
- PVM starts tasks using rsh
- Fails if a password is required
- .rhosts on the remote account must mention the machine where the console is running
- Typical error: "Can't start pvmd"

Typical situation
- All machines share the same user accounts
- Account has these directories/files:
    /shared/pvm3/lib/LINUX/pvmd3
    /shared/pvm3/lib/SUN4/pvmd3
    ~/pvm3/bin/LINUX/eg.app
    ~/pvm3/bin/SUN4/eg.app
- .rhosts file looks like this (mymachine is the machine which runs the console):
    mymachine
    mymachine.my.domain

PVM Console
- add hostname: add a host to the VM
- conf: show the list of hosts
- ps -a: show all pvm processes running (manually started, i.e. not spawned, tasks are shown with a -)
- reset: kill all pvm tasks
- quit: leave the console, but keep the daemon alive
- halt: kill the daemon

PVM Console: hostfile
- The hostfile contains the hosts to be added, with options:
    mymachine ep=$(SIM_ROOT)/bin/$(PVM_ARCH)
    faraway lo=woutie pw=whoknows
    * lo=wouter
- On a machine without network (just loopback):
    * ip=127.0.0.1

Compiling a program
- The application must be recompiled for each architecture: the binary goes in ~/pvm3/bin/$(PVM_ARCH); aimk determines the architecture, then calls make
- Must be linked with -lpvm3 and -lgpvm3
- Libraries are in $(PVM_ROOT)/lib/$(PVM_ARCH)

The PVM library
- Enrolling and TIDs:
    int tid = pvm_mytid()
    int tid = pvm_parent()
    int info = pvm_exit()

The PVM library
- Spawning a task:
    int numt = pvm_spawn(char *task, char **argv, int flag,
                         char *where, int ntask, int *tids)
- flag = PvmTaskDefault, PvmTaskHost (where is a host), PvmTaskDebug (use gdb to spawn)
- Typical error: tids[i] = -7: executable not found

The PVM library
- Message sending: initialize a buffer (initsend), pack the data structure, send (send or mcast)
- Message receiving: blocking/non-blocking receive, unpack (the way you packed)

The PVM library
- Use pvm_initsend() to initiate a send:
    int bufid = pvm_initsend(int encoding)
- Use one of the several pack functions to pack:
    int info = pvm_pkint(int *np, int nitem, int stride)
- To send:
    int info = pvm_send(int tid, int msgtag)
    int info = pvm_mcast(int *tids, int ntask, int msgtag)

The PVM library
- pvm_recv waits till a message with a given source tid and tag arrives (-1 is a wildcard):
    int bufid = pvm_recv(int tid, int msgtag)
- pvm_nrecv returns if no appropriate message has arrived:
    int bufid = pvm_nrecv(int tid, int msgtag)
- Use one of the several unpack functions to unpack (exactly the way you packed):
    int info = pvm_upkint(int *np, int nitem, int stride)

Message passing: an example

struct picture {
    int sizex, sizey;
    char* name;
    colourmap* cols;
};

Message passing: an example

void pkpicture(picture* p)
{
    pvm_pkint(&p->sizex, 1, 1);
    pvm_pkint(&p->sizey, 1, 1);
    pvm_pkstr(p->name);
    pkcolourmap(p->cols);
}

Message passing: an example

picture* upkpicture()
{
    picture* p = new picture;
    pvm_upkint(&p->sizex, 1, 1);
    pvm_upkint(&p->sizey, 1, 1);
    p->name = new char[20];
    pvm_upkstr(p->name);
    p->cols = upkcolourmap();
    return p;
}

Message passing: an example

// I'm A
picture mypic;
...
pvm_initsend(PvmDataDefault);
pkpicture(&mypic);
pvm_send(b, pictag);

Message passing: an example

// I'm B
picture* mypic;
pvm_recv(a, pictag);
mypic = upkpicture();

In C++?
- Messages contain data, not code
- No cute way to send and receive objects
- Provide constructors that build from the receive buffer
- The receiving end must have a list of possible objects

PVM and GDB
- Spawn tasks with PvmTaskDebug
- pvmd will call $PVM_ROOT/lib/debugger taskname
- Make sure this script starts an xterm with a debugger running taskname
- GREAT debugging tool!!

Overview
- The Dream
- The Facts
- The Tools
- The Real Thing
- The Conclusion

Ray-Tracing (diagram): master-slave, data parallel; one master M distributes work to several slaves S.

Simulation: the Problem (diagram): an unknown system with parameters par1, par2, par3 and inputs input1, input2, input3; what is its behaviour?

Modeling: example (diagram): entrance -> shopping area -> queues and cash desks -> exit. How many queues to achieve an average queue length < 4?

Modeling: views
- Modeling viewpoints: event view, process view
- Model classification: logic gates, queueing nets, Petri nets, ...

Simulation: Design Cycle (diagram): real system -> (modeling) -> model -> (simulation) -> results -> (validation) -> real system.

Simulation: Time Investment
- Modeling & validation = human thinking
- Simulation = machine thinking

Simulation: Time Investment (2)
- Simple model: short simulation time, large modeling time, large validation time
- Complex model: large simulation time, short modeling time, short validation time
- Solution??? Solution!!

Parallel Simulators
- Two methods:
  - Multiple-run: same simulation, different parameters/input stimuli; easy (stupid but effective)
  - Event-parallelism: distribute the model over processors; hard: fine granular

Modeling: causality (diagram): global event ordering; if event e1 precedes event e2 in a process (state), then t_e1 < t_e2.

Parallel Simulators: event parallelism
- Idea: simulate different events in parallel
- Condition: causal independence!
- (Diagram: events e1, e2 with local virtual times lvt1, lvt2 versus global time)

Parallel Simulators: event parallelism
- Very fine grain
- Strong synchronization (= limited parallelism)
- Strongly model dependent performance
- Hard to implement: calls for a software TOOL

Parallel Simulators: Event Parallelism (diagram): fine grain specification.

Parallel Simulators: Event Parallelism (diagram): coarse grain implementation (top view).

Parallel Simulators: Event Parallelism (diagram): coarse grain implementation (side view); several sequential simulators combined into one parallel simulator.

Parallel Simulators: some results (plot): speed (ev/s, up to 400000) versus number of processors (2 to 8), for seq and for da=10, da=5, da=0.

Overview
- The Dream
- The Facts
- The Tools
- The Real Thing
- The Conclusion

Conclusion
- NOW + LINUX + PVM = scalability for free!
- Drawbacks: software is hard to implement; performance is sensitive to problem, machine and implementation
- Solution: software tools

References
- See our web page at http://infoweb.vub.ac.be/~parallel