Useful Links. CS 6213 Fall ASU


Emily Bradley
6 years ago

1 Useful Links
- DBLP
- CiteSeer
- Computer Science Directory
- Wikipedia
- Google Scholar
- Microsoft Academic Search
- ACM Digital Library (on campus)
- IEEE Computer Society Digital Library (on campus)
- IEEE Transactions on Computers (print in library)
- IEEE Transactions on Parallel and Distributed Systems (on campus)
- Journal of Parallel and Distributed Computing (on campus)

2 Chapter 2: Parallel Programming Platforms

3 Topic Overview
- Implicit Parallelism: Trends in Microprocessor Architectures
- Limitations of Memory System Performance
- Dichotomy of Parallel Computing Platforms
- Communication Model of Parallel Platforms
- Physical Organization of Parallel Platforms
- Communication Costs in Parallel Machines
- Messaging Cost Models and Routing Mechanisms
- Mapping Techniques
- Case Studies


4 von Neumann Architecture
- Computer model: the von Neumann computer, named after the Hungarian mathematician John von Neumann
- A von Neumann computer uses the stored-program concept: the CPU executes a stored program that specifies a sequence of read and write operations on the memory
- Three components: processor, memory, and datapath
- All three components present bottlenecks
- Solution: multiplicity (implicit parallelism)
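The stored-program concept can be illustrated with a toy machine. This is only a sketch: the opcodes (`LOAD`, `ADD`, `STORE`, `HALT`), the single accumulator, and the memory layout are hypothetical choices made for illustration and do not come from the slides. The point it shows is that instructions and data share one memory, and every step is a read or write on that memory.

```python
# Toy stored-program machine: instructions and data live in the same memory.
# The opcodes and the accumulator design are hypothetical, for illustration only.

def run(memory):
    """Fetch-decode-execute loop over a single shared memory list."""
    pc = 0   # program counter: address of the next instruction
    acc = 0  # single accumulator register
    while True:
        op, addr = memory[pc]  # fetch: the instruction itself is read from memory
        pc += 1
        if op == "LOAD":       # read an operand from memory
            acc = memory[addr]
        elif op == "ADD":      # read a second operand and add it
            acc += memory[addr]
        elif op == "STORE":    # write the result back to memory
            memory[addr] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r}")

# Addresses 0-3 hold the program, addresses 4-6 hold the data.
memory = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", 0),
    2, 3, 0,
]
result = run(memory)
print(result[6])  # sum of the values stored at addresses 4 and 5
```

Every instruction fetch and every operand access crosses the processor-memory datapath, which is why each of the three components can limit the speed of the whole machine.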

5 Trend in Microprocessor Architectures
- Implicit parallelism: techniques that enable execution of multiple threads or instructions in a single clock cycle have become popular
- Some example microprocessors with instruction-level parallelism capability: Itanium, Sparc Ultra, MIPS, and Power4
- A traditional OS supports multitasking: process-level parallelism (time slicing)

6 Implicit Parallelism: Trends in Microprocessor Architectures
- Microprocessor clock speeds have posted impressive gains over the past two decades (two to three orders of magnitude).
- Higher levels of device integration have made available a large number of transistors.
- The question of how best to utilize these resources is an important one.
- Current processors use these resources in multiple functional units and execute multiple instructions in the same cycle.
- The precise manner in which these instructions are selected and executed provides impressive diversity in architectures.

7 Pipelining
- Pipelining enables faster execution by overlapping the various stages of instruction execution (fetch, schedule, decode, operand fetch, execute, store, among others)
- Analogy, the assembly of a car: one car takes 100 time units in total; with the work split into 10 pipelined stages of 10 units each, a full pipeline produces a car every 10 time units
[Figure: pipeline timing diagrams, stages S1 to S6 against cycles]
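The car-assembly arithmetic can be checked with a short sketch. The function names are hypothetical, and the model assumes equal stage times and no stalls: the first item needs all stages, and each later item completes one stage time after the previous one.

```python
def sequential_time(n_items, n_stages, stage_time):
    """Total time when each item passes through all stages before the next starts."""
    return n_items * n_stages * stage_time

def pipelined_time(n_items, n_stages, stage_time):
    """Total time for an ideal pipeline with equal stage times and no stalls."""
    if n_items == 0:
        return 0
    # The first item fills the pipeline; every later item finishes one stage time later.
    return (n_stages + n_items - 1) * stage_time

# Car example from the slide: 10 stages of 10 time units each.
cars = 100
t_seq = sequential_time(cars, 10, 10)
t_pipe = pipelined_time(cars, 10, 10)
print(t_seq, t_pipe, t_seq / t_pipe)
```

For a single car the pipeline gives no benefit (100 time units either way); the gain appears only as the number of cars grows, with the speedup approaching the number of stages.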

8 Pipelining (cont'd)
- The speed of a pipeline is limited by its largest atomic task
- Predicting branch destinations: the misprediction penalty depends on the number of instructions that need to be flushed
[Figure: pipeline timing diagram, stages S1 to S6 against cycles]
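Both limits can be put into a rough back-of-the-envelope model. This is a sketch under stated assumptions, not a description of any particular processor: the function names, the stage times, the misprediction rate, and the flush penalty below are all hypothetical values chosen for illustration.

```python
def cycle_time(stage_times):
    """The clock period of a pipeline is set by its slowest (largest atomic) stage."""
    return max(stage_times)

def average_cpi(branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction for an otherwise ideal pipeline (base CPI of 1).

    Each mispredicted branch costs flush_penalty extra cycles, one per
    instruction that has to be flushed.
    """
    return 1.0 + branch_fraction * mispredict_rate * flush_penalty

# Hypothetical stage times: one slow stage dictates the clock for all stages.
period = cycle_time([10, 10, 25, 10])

# Assumed workload: one branch in five instructions, 10% of them mispredicted,
# 6 instructions flushed per misprediction.
cpi = average_cpi(branch_fraction=0.2, mispredict_rate=0.1, flush_penalty=6)
print(period, round(cpi, 2))
```

The model makes the trade-off visible: splitting the slow stage shortens the clock period, but a deeper pipeline means more instructions in flight and therefore a larger flush penalty per mispredicted branch.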

9 Superscalar Execution
- One simple way of alleviating these bottlenecks is to use multiple pipelines
- Super-pipelined processor: issue multiple instructions in the same cycle (superscalar execution)
- 2-way vs. 4-way
[Figure: superscalar pipeline diagram, stages against cycles]

10 Two-way Superscalar Execution
- True data dependence
- Poor resource utilization
- Requires reordering instructions

11 Dependencies
- True data dependency: the result of an instruction may be required by subsequent instructions
- Resource dependency: multiple instructions compete for a single processor resource
- Branch or procedural dependencies: a conditional branch instruction is encountered roughly every five to six instructions
  - Handled by speculative scheduling and rolling back
- The ability of a processor to detect and schedule concurrent instructions is critical to superscalar performance
  - Dynamic instruction issue, a window of instructions, look-ahead
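The effect of true data dependencies on a two-way superscalar processor can be sketched with a small in-order issue model. The instruction format `(name, destination, sources)`, the register names, and the single-cycle latency are assumptions made for illustration; the model checks only true (read-after-write) dependencies and ignores resource and branch dependencies.

```python
def schedule(instructions, width=2):
    """In-order issue of up to `width` instructions per cycle.

    Each instruction is (name, dest, srcs). An instruction cannot issue in the
    same cycle as an earlier instruction that writes one of its sources.
    Returns a list of cycles, each a list of instruction names.
    """
    cycles = []
    i = 0
    while i < len(instructions):
        issued = []
        written = set()  # registers written by instructions issued this cycle
        while i < len(instructions) and len(issued) < width:
            name, dest, srcs = instructions[i]
            if any(s in written for s in srcs):
                break  # true data dependence: wait for the next cycle
            issued.append(name)
            written.add(dest)
            i += 1
        cycles.append(issued)
    return cycles

# Two independent loads can issue together; the add needs both results.
mostly_parallel = [
    ("load r1", "r1", []),
    ("load r2", "r2", []),
    ("add r3", "r3", ["r1", "r2"]),
    ("store r3", "mem", ["r3"]),
]

# Every instruction depends on the previous one, so the second issue slot idles.
chained = [
    ("load r1", "r1", []),
    ("add r1, a", "r1", ["r1"]),
    ("add r1, b", "r1", ["r1"]),
    ("store r1", "mem", ["r1"]),
]

print(len(schedule(mostly_parallel)), len(schedule(chained)))
```

The chained sequence occupies only one of the two issue slots in every cycle, which is the poor resource utilization the slides refer to. A processor with dynamic issue would look ahead in a window of instructions to find independent work for the idle slot; this sketch deliberately does not reorder.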
