Memory Systems in Pipelined Processors

Advanced Computer Architecture (0630561), Lecture 12: Memory Systems in Pipelined Processors. Prof. Kasim M. Al-Aubidy, Computer Eng. Dept.

Interleaved Memory: In a pipelined processor, data is required every processor clock cycle. The memory system is usually slower than the processor and may only be able to deliver data every n processor clock cycles. To overcome this limitation, it is necessary to operate n memory units in parallel to maintain the bandwidth match between the processor and memory. Bandwidth is defined as the number of bits that can be transferred between two units every second. The performance gain, or speedup, of the memory system is defined as the ratio of the bandwidth of the interleaved system to the bandwidth of a single memory unit. A good approximation to the speedup is: speedup ≈ (π(n/2))^0.5 = √(πn/2), i.e. it grows with the square root of the degree of interleaving n.
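This square-root behavior can be checked empirically. Below is a minimal sketch, assuming the classic random-address model in which requests are issued to random banks until one collides with a busy bank, ending the memory cycle; the average run length tracks √(πn/2) reasonably well for moderate n.

```python
import math
import random

def avg_run_length(n_banks, trials=10000):
    """Average number of consecutive random requests that hit
    distinct banks before the first bank conflict (a simple
    model of words delivered per memory cycle)."""
    total = 0
    for _ in range(trials):
        busy = set()
        while True:
            bank = random.randrange(n_banks)
            if bank in busy:          # conflict: cycle ends here
                break
            busy.add(bank)
        total += len(busy)
    return total / trials

for n in (4, 8, 16, 32, 64):
    measured = avg_run_length(n)
    predicted = math.sqrt(math.pi * n / 2)
    print(f"n={n:3d}  measured={measured:5.2f}  sqrt(pi*n/2)={predicted:5.2f}")
```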

Memory Access: The memory system can be organized in two basic forms: 1. Synchronous access organization. 2. Asynchronous access organization.

Synchronous Access Organization: All banks are driven in lockstep from a common address, so it works well only for requests having consecutive addresses.

Asynchronous Access Organization: Address latches are provided to all the banks. This allows individual banks to hold the addresses of the requests being served, letting them carry out their memory cycles independently.

Synchronous or Asynchronous Access Organization? Memory system performance is characterized by the following theorems: For addresses spaced at a distance of m, the average data access time (t) per word access in a synchronous organization is: t = m·T/n for m << n, and t = T for m >> n, where T = bank cycle time and n = number of banks. The average data access time (t) per element, with requests spaced m addresses apart, in an asynchronous organization is: t = gcd(m, n)·T/n, where T = memory cycle time. If n and m are relatively prime, then gcd(m, n) = 1 and the asynchronous organization delivers one word every T/n, the best possible rate. The asynchronous memory system therefore performs at least as well as the synchronous one, and better most of the time.
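As a quick numeric check of the two theorems, the sketch below tabulates per-word access times for several strides m. T and n are illustrative values, and the min() blend used for the synchronous case between the m << n and m >> n regimes is an assumption, since the theorem only fixes the two extremes.

```python
from math import gcd

T = 8   # bank cycle time in processor clocks (illustrative)
n = 8   # number of banks

def t_sync(m):
    """Synchronous: m*T/n when m << n, saturating at T when m >> n."""
    return min(m * T / n, T)

def t_async(m):
    """Asynchronous: gcd(m, n) * T / n per word (exact theorem)."""
    return gcd(m, n) * T / n

print(" m   t_sync  t_async")
for m in (1, 2, 3, 4, 5, 7, 8, 16):
    print(f"{m:2d}   {t_sync(m):5.2f}    {t_async(m):5.2f}")
```

Note how any stride m that is relatively prime to n (here m = 3, 5, 7) gives the asynchronous organization its best rate of T/n per word, while the synchronous time degrades steadily with the stride.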

Performance Enhancement: The average performance of an interleaved memory grows with the square root of the degree of interleaving. Performance falls below the ideal n-fold speedup because time slots are lost whenever successive requests collide on the same bank. Example: consider a 4-way interleaved memory with an FCFS (first-come, first-served) schedule serving a simple request queue of 12 memory requests, as played out in the sketch below.
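A minimal sketch of this example follows; the 12 request addresses are hypothetical, chosen so that several of them collide on the same bank. With 4 banks and a 4-clock bank cycle, the FCFS schedule finishes well after the conflict-free lower bound because colliding requests waste time slots.

```python
def fcfs_schedule(addresses, n_banks=4, cycle=4):
    """Serve requests strictly in FCFS order, issuing at most one
    per clock; a request stalls until its bank is free again."""
    bank_free = [0] * n_banks          # clock at which each bank frees up
    clock = 0
    for addr in addresses:
        bank = addr % n_banks          # low-order interleaving
        start = max(clock, bank_free[bank])
        bank_free[bank] = start + cycle
        clock = start + 1              # next request issues one clock later
    return max(bank_free)

queue = [0, 1, 2, 3, 4, 8, 5, 6, 7, 9, 13, 17]   # 12 hypothetical requests
total = fcfs_schedule(queue)
ideal = (len(queue) - 1) + 4   # conflict-free: one issue per clock + last cycle
print(f"FCFS total: {total} clocks (conflict-free bound: {ideal})")
```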

Multiple Processor Organizations: A traditional way to increase system performance is to use multiple processors that execute in parallel. The most common multiple-processor organizations are: 1. Symmetric multiprocessors (SMPs), 2. Clusters, and 3. Nonuniform memory access (NUMA).

Symmetric Multiprocessors (SMPs): The term SMP refers both to a computer hardware architecture and to the OS behavior that reflects that architecture. An SMP has the following characteristics: 1. There are two or more similar processors. 2. These processors share the same main memory and I/O facilities and are interconnected by a bus or other interconnection scheme, such that memory access time is approximately the same for each processor. 3. All processors share access to I/O devices, either through the same channels or through different channels. 4. All processors can perform the same functions. 5. The system is controlled by an integrated OS that provides interaction between processors and their programs at the job, task, file, and data element levels. 6. The OS of an SMP schedules processes or threads across all of the processors.

SMP Advantages: An SMP organization has a number of potential advantages over a uniprocessor organization, including the following: 1. Performance: if some portions of the work can be done in parallel, then a system with multiple processors will yield greater performance than one with a single processor of the same type, as the sketch below illustrates.
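A small illustration of this point; the workload (a sum of squares split four ways) is invented, and the actual speedup depends on the machine and on how parallelizable the work really is.

```python
import multiprocessing as mp
import time

def work(chunk):
    """CPU-bound kernel: sum of squares over a range (stand-in workload)."""
    return sum(i * i for i in chunk)

if __name__ == "__main__":
    data = range(10_000_000)
    # Split the same work into four interleaved chunks, one per processor.
    chunks = [range(i, 10_000_000, 4) for i in range(4)]

    t0 = time.perf_counter()
    serial = work(data)
    t1 = time.perf_counter()

    with mp.Pool(processes=4) as pool:
        parallel = sum(pool.map(work, chunks))
    t2 = time.perf_counter()

    print(f"serial:   {t1 - t0:.2f} s")
    print(f"parallel: {t2 - t1:.2f} s  (same result: {serial == parallel})")
```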

SMP Advantages: (cont.) 2. Availability: in an SMP, because all processors can perform the same functions, the failure of a single processor does not halt the system. Instead, the system can continue to function at reduced performance. 3. Incremental growth: a user can enhance system performance by adding an additional processor. 4. Scaling: vendors can offer a range of products with different price and performance characteristics based on the number of processors in the system.

Organization of a Multiprocessor System: 1. Tightly Coupled Multiprocessor: There are two or more processors. Each processor is self-contained, including a control unit, ALU, registers, and one or more levels of cache. Each processor has access to a shared main memory and to the I/O devices through an interconnection mechanism. The processors can communicate with each other through memory (messages and status information left in common data areas).
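The sketch below imitates this style of communication, using OS processes as stand-ins for processors: each one updates status information held in a common data area, and every update is visible to all the others.

```python
from multiprocessing import Process, Value

def worker(counter):
    """Each 'processor' increments a counter in shared memory."""
    for _ in range(100_000):
        with counter.get_lock():       # serialize access to the shared word
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)            # lives in memory shared by all processes
    procs = [Process(target=worker, args=(counter,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)               # 400000: every update visible to all
```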

2. Time-Shared Bus Organization: This is the most common organization for PCs, workstations, and servers, and the simplest mechanism for constructing a multiprocessor system. The bus consists of data, address, and control lines. To facilitate DMA transfers from I/O processors, the following features are provided: Addressing: it must be possible to distinguish modules on the bus, to determine the source and destination of data. Arbitration: any I/O module can temporarily function as master; a mechanism is provided to arbitrate competing requests for bus control using a priority scheme. Time-sharing: when one module is controlling the bus, other modules are locked out and must suspend operation until bus access is achieved.
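A minimal sketch of one possible arbitration policy follows: fixed priority, lowest module ID wins. The specific scheme and module IDs are assumptions for illustration, since the slide does not fix a particular priority rule.

```python
def arbitrate(requests):
    """Grant the bus to the highest-priority requester (lowest ID wins).
    'requests' maps module ID -> True if the module wants the bus."""
    wanting = [m for m, req in sorted(requests.items()) if req]
    return wanting[0] if wanting else None

# Three modules compete; module 0 (e.g., an I/O processor doing DMA)
# has the highest priority and is granted the bus first.
pending = {0: True, 1: True, 2: False}
while any(pending.values()):
    master = arbitrate(pending)
    print(f"bus granted to module {master}; other modules are locked out")
    pending[master] = False        # module releases the bus when done
```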

2. Time-Shared Bus Organization: (cont.) The time-shared bus organization has several attractive features: Simplicity: the physical interface and the addressing, arbitration, and time-sharing logic of each processor remain the same as in a single-processor system. Flexibility: it is easy to expand the system by attaching more processors to the bus. Reliability: the bus is essentially a passive medium, and the failure of any attached device should not cause failure of the whole system. The main drawback of the bus organization is performance: all memory references pass through the common bus. To improve performance, it is desirable to equip each processor with a cache memory. Typically, workstations and SMPs have two levels of cache, and some processors now employ a third level of cache as well.

Uniform Memory Access (UMA): All processors have access to all parts of main memory using loads and stores. The memory access time of a processor to all regions of memory is the same, and the access times experienced by different processors are the same.

Nonuniform Memory Access (NUMA): All processors have access to all parts of main memory using loads and stores, but the memory access time of a processor differs depending on which region of main memory is accessed. For different processors, which memory regions are slower and which are faster differ. Cache-coherent NUMA (CCNUMA): a NUMA system in which cache coherence is maintained among the caches of the various processors.
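The practical consequence is that average access time depends on where data is placed. A small worked sketch, with hypothetical latencies of 80 ns local and 240 ns remote:

```python
def avg_access_time(local_ns, remote_ns, local_fraction):
    """Average memory access time when a fraction of references stay
    in the local node's memory and the rest go to a remote node."""
    return local_fraction * local_ns + (1 - local_fraction) * remote_ns

# The better the data placement, the closer NUMA gets to local speed.
for f in (0.5, 0.8, 0.95):
    print(f"{f:.0%} local -> {avg_access_time(80, 240, f):6.1f} ns")
```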

CCNUMA Organization: There are multiple independent nodes, each of which is an SMP organization. Each node contains multiple processors, each with its own L1 and L2 caches, plus main memory.