CSC630/CSC730: Parallel & Distributed Computing
Parallel Computing Platforms
Chapter 2 (2.3)

Content
Logical organization (a programmer's view)
- Control structure
- Communication model
Physical organization (actual hardware)
- PRAM
- Interconnection networks
- Network topologies
- Characteristics
Example of Parallelism
Parallelism from a single instruction on multiple processors:
for (i = 0; i < 1000; i++)
    c[i] = a[i] + b[i];
The iterations of the loop are independent, so all processors can execute the same instruction, add, on different data: single instruction stream, multiple data streams (SIMD).

Flynn Classification
Flynn's classification is based on the instruction stream and the data stream (control structure):
- A single instruction stream generated from the program operates on a single data stream (SISD)
- A single instruction stream generated from the program operates on multiple data streams (SIMD)
- Multiple instruction streams generated from the program operate on a single data stream (MISD) (does not exist in practice)
- Multiple instruction streams generated from the program operate on multiple data streams (MIMD)
Diagram Comparing Classifications

SISD
The simplest architecture is single-instruction single-data (SISD). The first extension to CPUs for speedup was pipelining: the circuits of the CPU are split into functional units arranged in a pipeline, and each functional unit operates on the result of the previous one during a clock cycle.
SIMD
Vector processors perform the same operation on several inputs simultaneously; the basic instruction is issued only once for several operands. Pure SIMD systems have a single CPU devoted to control and a large collection of subordinate processors, each with its own registers. The control CPU broadcasts an instruction to all of the subordinates; each subordinate either executes the instruction or sits idle.

SIMD (Fortran Example)
Compare the Fortran 77 code (sequential):
      do 100 i = 1, 100
         z(i) = x(i) + y(i)
100   continue
with the equivalent Fortran 90 code (vector):
z(1:100) = x(1:100) + y(1:100)
MIMD
All the autonomous processors in an MIMD machine operate on their own data, and each processor proceeds at its own pace. There is often no global clock and no explicit synchronization. MIMD machines come in two flavors: shared-memory systems and distributed-memory systems.

SIMD and MIMD
Comparison of SIMD and MIMD
SIMD computers require less hardware: there is only one global control unit.
SIMD requires less memory: only one copy of the program needs to be stored.
SIMD is not very popular: it needs a specialized hardware architecture and extensive design effort, and resource utilization is poor in the case of conditional execution.

Conditional Execution on SIMD
SPMD
Single program, multiple data (SPMD):
- A simple variant of the MIMD model
- Relies on multiple instances of the same program executing on different data
SPMD is widely used by many parallel platforms:
- Sun Ultra Servers
- Multiprocessor PCs
- Workstation clusters
- IBM SP
It requires minimal architectural support.

Communication Model
There are two primary forms of data exchange:
- Accessing a shared memory space
- Message passing
Shared vs. Distributed Memory
Shared memory: a single address space; all processors have access to a pool of shared memory. (Ex: SGI Origin, Sun E10000)
Distributed memory: each processor has its own local memory and must use message passing to exchange data with other processors. (Ex: CRAY T3E, IBM SP, clusters)

Shared-Address-Space Platforms
- A common data space is accessible to all processors
- Processors interact by modifying data objects stored in the shared address space
- Memory can be local or global (common to all processors)
- Accessing local memory is cheaper; with a global memory, programming is easier
Multiprocessors: shared-address-space platforms supporting SPMD programming.
Shared Memory: UMA vs. NUMA
Uniform memory access (UMA): each processor has uniform access time to memory. Also known as symmetric multiprocessors, or SMPs. (Sun E10000)
Non-uniform memory access (NUMA): the time for a memory access depends on the location of the data; local access is faster than non-local access. Easier to scale than SMPs. (SGI Origin)

UMA and NUMA with/without caches
Message-Passing Platforms
A message-passing platform:
- Consists of p processing nodes, each with its own exclusive address space
- Each node can be either a single processor or a shared-address-space multiprocessor (e.g., clustered workstations)
- Interactions must be accomplished using messages: send and receive

Platforms and Programming
Platforms that support the message-passing paradigm:
- IBM SP
- SGI Origin 2000
- Workstation clusters
Programming: the Message Passing Interface (MPI) (Chapter 6)
Summary
Flynn classification: SISD, SIMD, MISD, MIMD; SPMD
Communication model:
- Accessing a shared memory space (UMA and NUMA)
- Message passing

Questions?