Normal computer: 1 CPU & 1 memory. The problem of the Von Neumann bottleneck: slow processing because the CPU is faster than memory
- Cornelius Hines
- 6 years ago
Transcription
1 Parallel Machine
2 CPU Usage
Normal computer: 1 CPU and 1 memory.
Problem (Von Neumann bottleneck): processing is slow because the CPU is faster than memory.
Solution: use multiple CPUs or multiple ALUs for simultaneous processing. Such machines are known as parallel computers or multiprocessor computers.
To improve the efficiency of a computer:
- Increase the speed of the processor
- Improve memory access
3 Type of Processing: Flynn Taxonomy
- Single Instruction: SISD, SIMD
- Multiple Instruction: MISD, MIMD
4 Type of Processing
SISD (Single Instruction, Single Data): a single processor executes a single instruction stream to operate on data stored in a single memory. Example: Von Neumann machine.
SIMD (Single Instruction, Multiple Data): a single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis. Each processing element has an associated data memory, so each instruction is executed on a different set of data. Categories: vector, parallel.
5 Type of Processing
MISD (Multiple Instruction, Single Data): a sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence. This structure is not commercially implemented. Category: extraordinary.
MIMD (Multiple Instruction, Multiple Data): a set of processors simultaneously executes different instruction sequences on different data sets. Examples: SMP (Symmetric Multiprocessor) and NUMA (Non-Uniform Memory Access). Categories:
- Shared memory: switch, bus
- Distributed / local memory: switch, bus
6 MIMD Distributed Memory
- Uses many connected CPUs
- Each CPU controls the execution of its own operations separately
- Can perform various tasks simultaneously
Two techniques of connection between CPU and memory:
- Direct connection
- Net/grid connection
Problem: the path from one corner to the opposite corner is long.
Solution: use the hypercube (n-cube).
7 Direct Connection
8 Net Connection
9 Hypercube Connection
Route from 100 to 111: 100 XOR 111 = 011, so the two addresses differ in two bit positions. Therefore, the possible routes are through 110 and 101.
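The XOR rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `hypercube_routes` is hypothetical, and it assumes nodes are labelled with equal-length bit strings and that a message flips one differing bit per hop.

```python
from itertools import permutations


def hypercube_routes(src: str, dst: str) -> list[list[str]]:
    """List every shortest route between two hypercube nodes.

    Nodes are bit strings of equal length (an assumption of this sketch).
    Each hop flips exactly one of the bits in which src and dst differ.
    """
    n = len(src)
    start, end = int(src, 2), int(dst, 2)
    diff = start ^ end  # set bits mark the dimensions that must be crossed
    bits = [1 << i for i in range(n) if diff & (1 << i)]
    routes = []
    for order in permutations(bits):
        node, path = start, [src]
        for bit in order:
            node ^= bit  # move along one dimension of the cube
            path.append(format(node, f"0{n}b"))
        routes.append(path)
    return sorted(routes)


for route in hypercube_routes("100", "111"):
    print(" -> ".join(route))
```

For the slide's example the two routes pass through 101 and 110, because 100 and 111 differ in exactly two bit positions.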
10 MIMD Shared Memory - Bus
Using a bus is simple and easy.
[Figure: CPU1, CPU2 and CPU3, each with its own cache memory, sharing one memory]
11 MIMD Shared Memory - Bus
Problem with using a bus: Von Neumann bottleneck.
Solution: use a cache memory in each CPU.
Problem: cache memory coherence. Two processors read the same data. When one of them changes the data, the other processor assumes its copy is still the original and does not know about the change.
Solution: software or hardware.
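The coherence problem described on this slide can be imitated with plain dictionaries. This is a toy sketch, not a model of real hardware: the variable names are hypothetical, and it assumes CPU1 writes through to memory while nothing notifies CPU2.

```python
# Shared main memory holding one data item.
memory = {"x": 1}

# Both CPUs read x, so each private cache now holds a copy.
cache_cpu1 = {"x": memory["x"]}
cache_cpu2 = {"x": memory["x"]}

# CPU1 changes the data (written through to memory in this sketch),
# but nothing tells CPU2 that its cached copy is out of date.
cache_cpu1["x"] = 2
memory["x"] = 2

# CPU2 still sees the original value: the caches are incoherent.
stale_value = cache_cpu2["x"]
print(stale_value, memory["x"])
```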
12 MIMD Shared Memory - Bus
Solution in software: classify the data.
- Shared: read-only or read-write
- Unshared
The problem arises with shared read-write data. Solution: do not allow such data to be cached.
13 MIMD Shared Memory - Bus
Solution in hardware: use a cache memory controller and a cache memory resolution protocol. The required block of words is loaded into the cache memory.
14 MIMD Shared Memory - Switch
Crossbar switch connecting n CPUs with k memories.
Advantage: network without barriers (non-blocking).
Disadvantage: uses many crosspoints (the count grows as n^2).
15 Omega Network
Has log2 n stages (levels) with n/2 switches in each stage.
Example: omega network, 8 CPUs x 8 memories
- Stages: log2 8 = 3
- Switches per stage: 8/2 = 4
- Total number of switches: 3 * 4 = 12
Crossbar switch for comparison: 8 CPUs x 8 memories = 64 switches.
Advantage: fewer crosspoints.
Disadvantage: blocking network.
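The switch counts on this slide can be checked with a short calculation. The helper names below are hypothetical, and the sketch assumes n is a power of two and that every omega switch is a 2 x 2 element.

```python
def omega_network_size(n: int) -> tuple[int, int, int]:
    """Return (stages, switches per stage, total switches) for n CPUs x n memories."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = n.bit_length() - 1      # log2(n) for a power of two
    switches_per_stage = n // 2      # each 2 x 2 switch serves two lines
    return stages, switches_per_stage, stages * switches_per_stage


def crossbar_crosspoints(n_cpus: int, n_memories: int) -> int:
    """A crossbar needs one crosspoint for every CPU/memory pair."""
    return n_cpus * n_memories


print(omega_network_size(8))       # stages, switches per stage, total
print(crossbar_crosspoints(8, 8))  # crosspoints in the equivalent crossbar
```

The omega count grows as (n/2) * log2 n, while the crossbar count grows as n^2, which is why the omega network needs fewer switches at the cost of blocking.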
16 Omega Network
[Figure: three-stage omega network with switches 1A-1D, 2A-2D and 3A-3D]
17 Benes Network
- Resolves the blocking of the omega network
- Uses more switches and more stages
- Provides more route options from CPU to memory
18 SIMD Parallel Computer
- Executes the same instruction on many data elements simultaneously
- Simpler, cheaper and very fast
- Example: Connection Machine
19 Connection Machine
Consists of:
- 4 quadrants, which can be operated separately
- 1 quadrant = 2 parts of 8KPE (8192 processors)
Each quadrant has:
- ALU
- 8 Kb memory
- 4 flag bits
- Interface with memory and I/O system
- 1 route determinant
20 Connection Machine
- The compiler is written in C or LISP
- Each 8KPE sub-cube section of a quadrant is divided into 2 parts of 4KPE (256 processor chips)
- Each 4KPE sub-cube has its own I/O system
- I/O bus width = 64 bits
- Has 39 I/O disk drives (1 disk per bit)
21 SIMD Computer Vector
- The Connection Machine is only suitable for solving artificial intelligence problems
- It is not suitable for floating-point arithmetic that involves vectors, such as graphics processing
- Example of a SIMD vector computer: the CRAY-1 supercomputer
22 CRAY-1
Consists of multiple ALUs that can operate simultaneously:
- 2 addressing units to compute addresses
- 4 scalar integer units for arithmetic operations
- 6 vector integer units for vector operations
23 Cache Memory: Characteristics of Memory System
- Location: whether memory is internal or external to the computer. Example: main memory, cache (internal); optical disk, magnetic disk (external)
- Capacity: number of words or number of bytes
- Unit of transfer: word or block
- Access method: sequential, direct, random, associative
- Performance: access time, cycle time and transfer time
- Physical type: semiconductor, magnetic, optical
- Physical characteristics: volatile or erasable
- Organization: memory modules
24 Cache Memory Principles
Cache memory is intended to give memory speed approaching that of the fastest memory available and, at the same time, provide a large memory size at the price of less expensive types of semiconductor memory.
The cache contains a copy of portions of main memory. When the processor attempts to read a word of memory:
- A check is made to determine whether the word is in the cache.
- If so, the word is delivered to the processor.
- If not, a block of main memory is read into the cache and the word is delivered to the processor.
Because of the phenomenon of locality of reference, it is likely that there will be future references to that same memory location or to other words in the block.
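The read procedure above can be sketched as a small Python class. This is a simplified illustration only: `SimpleCache` is a hypothetical name, the cache is assumed to have unlimited capacity (no replacement), and the block size of 4 words is chosen just for the demonstration.

```python
class SimpleCache:
    """Toy cache that loads whole blocks from main memory on a miss."""

    def __init__(self, memory: list[int], block_size: int) -> None:
        self.memory = memory
        self.block_size = block_size
        self.blocks: dict[int, list[int]] = {}  # block number -> cached words
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        block_no, offset = divmod(address, self.block_size)
        if block_no in self.blocks:
            self.hits += 1    # the word is already in the cache
        else:
            self.misses += 1  # read the whole block into the cache first
            start = block_no * self.block_size
            self.blocks[block_no] = self.memory[start:start + self.block_size]
        return self.blocks[block_no][offset]


memory = list(range(100, 116))  # 16 words of made-up data
cache = SimpleCache(memory, block_size=4)
values = [cache.read(address) for address in range(8)]
print(values, cache.hits, cache.misses)
```

Reading eight consecutive words touches only two blocks, so only the first access to each block misses. That is the locality-of-reference effect the slide describes.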
25 Cache/Main Memory Structure
26 Cache/Main Memory Principles
Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address. For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each. That is, there are M = 2^n / K blocks in main memory. The cache consists of m blocks, called lines. Each line contains K words, plus a tag of a few bits. Each line also includes control bits.
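As a quick check of the formula M = 2^n / K, the sketch below plugs in numbers. The function name is hypothetical, and the values (16-bit addresses, K = 16 words per block, m = 128 cache lines) are assumptions borrowed from the direct-mapping example later in these slides.

```python
def memory_blocks(address_bits: int, words_per_block: int) -> int:
    """Return M = 2^n / K, the number of blocks in main memory."""
    total_words = 2 ** address_bits
    if total_words % words_per_block:
        raise ValueError("K must divide 2^n evenly")
    return total_words // words_per_block


M = memory_blocks(address_bits=16, words_per_block=16)
cache_lines = 128                    # m, assumed for illustration
blocks_per_line = M // cache_lines   # memory blocks competing for each line
print(M, blocks_per_line)
```

With these numbers there are 4096 blocks and 32 = 2^5 of them share each cache line, which is why a 5-bit tag is enough to tell them apart under direct mapping.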
27 Cache Read Operation
28 Cache Mapping Function
An algorithm is needed for mapping main memory blocks to cache lines, because there are fewer cache lines than main memory blocks. The choice of the mapping function dictates how the cache is organized. Three techniques can be used: direct, associative and set associative.
29 Example
A line is an adjacent series of bytes in main memory (that is, their addresses are contiguous). Suppose a line is 16 bytes in size. For example, suppose we have a 2^12 = 4K-byte cache with 2^8 = 256 16-byte lines; a 2^24 = 16M-byte main memory, which is 2^12 = 4K times the size of the cache; and a 400-line program, which will not all fit into the cache at once.
30 Direct Mapping
Under this mapping scheme, each memory line j maps to cache line j mod 128, so the memory address looks like this:
Tag (5 bits) | Line (7 bits) | Word (4 bits)
Here:
- The "Word" field selects one from among the 16 addressable words in a line.
- The "Line" field defines the cache line where this memory line should reside.
- The "Tag" field of the address is then compared with that cache line's 5-bit tag to determine whether there is a hit or a miss. If there's a miss, we need to swap out the memory line that occupies that position in the cache and replace it with the desired memory line.
31 Direct Mapping
E.g., suppose that we want to read or write a word at the address 357A (hexadecimal), whose 16 bits are 0011 0101 0111 1010. This translates to Tag = 6, Line = 87, and Word = 10 (all in decimal). If line 87 in the cache has the same tag (6), then memory address 357A is in the cache. Otherwise, a miss has occurred and the contents of cache line 87 must be replaced by memory line 357 (hexadecimal) = 855 (decimal) before the read or write is executed.
Direct mapping is the most efficient cache mapping scheme, but it is also the least effective in its utilization of the cache; that is, it may leave some cache lines unused.
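The decomposition of address 357A can be reproduced with shifts and masks. This is a minimal sketch: the function name `split_direct` is hypothetical, and the field widths (5-bit tag, 7-bit line, 4-bit word) are inferred from the 16-bit address, the 128 cache lines and the 16 words per line.

```python
def split_direct(address: int, line_bits: int = 7, word_bits: int = 4) -> tuple[int, int, int]:
    """Split an address into (tag, line, word) fields for direct mapping."""
    word = address & ((1 << word_bits) - 1)
    line = (address >> word_bits) & ((1 << line_bits) - 1)
    tag = address >> (word_bits + line_bits)
    return tag, line, word


tag, line, word = split_direct(0x357A)
memory_line = 0x357A >> 4  # drop the 4-bit word field to get the memory line number
print(tag, line, word, memory_line, memory_line % 128)
```

The last value printed confirms the mapping rule: memory line 855 lands in cache line 855 mod 128 = 87.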
32 Associative Mapping
This mapping scheme attempts to improve cache utilization, but at the expense of speed. Here, the cache line tags are 12 bits, rather than 5, and any memory line can be stored in any cache line. The memory address looks like this:
Tag (12 bits) | Word (4 bits)
Here:
- The "Tag" field identifies one of the 2^12 = 4096 memory lines; all the cache tags are searched to find out whether or not the Tag field matches one of the cache tags. If so, we have a hit; if not, there's a miss and we need to replace one of the cache lines by this line before reading or writing into the cache.
- The "Word" field again selects one from among 16 addressable words (bytes) within the line.
33 Associative Mapping
For example, suppose again that we want to read or write a word at the address 357A (hexadecimal), whose 16 bits are 0011 0101 0111 1010. Under associative mapping, this translates to Tag = 855 and Word = 10 (in decimal). So we search all of the 128 cache tags to see if any one of them matches 855. If not, there's a miss and we need to replace one of the cache lines with line 855 from memory before completing the read or write.
The search of all 128 tags in the cache is time-consuming. However, the cache is fully utilized, since none of its lines will be unused prior to a miss (recall that direct mapping may detect a miss even though the cache is not completely full of active lines).
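The tag search can be sketched as a linear scan over all 128 stored tags. The names below are hypothetical, and the scan is sequential only for illustration; the cache contents (one valid line at index 5) are made up for the demonstration.

```python
def split_associative(address: int, word_bits: int = 4) -> tuple[int, int]:
    """Split an address into (tag, word) fields for associative mapping."""
    word = address & ((1 << word_bits) - 1)
    tag = address >> word_bits
    return tag, word


def find_line(cache_tags: list, tag: int):
    """Compare the tag with every cache line; return the matching index or None."""
    for index, stored_tag in enumerate(cache_tags):
        if stored_tag == tag:
            return index
    return None


tag, word = split_associative(0x357A)
cache_tags = [None] * 128  # 128 cache lines, all empty to begin with
cache_tags[5] = 855        # pretend memory line 855 was loaded into line 5
print(tag, word, find_line(cache_tags, tag), find_line(cache_tags, 100))
```

Because any memory line may sit in any cache line, every tag has to be examined, which is the speed cost the slide mentions.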
34 Set Associative Mapping
This scheme is a compromise between the direct and associative schemes described above. Here, the cache is divided into sets of tags, and the set number is directly mapped from the memory address (e.g., memory line j is mapped to cache set j mod 64).
[Figure: set-associative cache organization]
35 Set Associative Mapping
The memory address is now partitioned like this:
Tag (6 bits) | Set (6 bits) | Word (4 bits)
Here:
- The "Tag" field identifies one of the 2^6 = 64 different memory lines in each of the 2^6 = 64 different "Set" values. Since each cache set has room for only two lines at a time, the search for a match is limited to those two lines (rather than the entire cache). If there's a match, we have a hit and the read or write can proceed immediately. Otherwise, there's a miss and we need to replace one of the two cache lines by this line before reading or writing into the cache.
- The "Word" field again selects one from among 16 addressable words inside the line.
36 Set Associative Mapping
In set-associative mapping, when the number of lines per set is n, the mapping is called n-way associative. For instance, the above example is 2-way associative.
Example: again, suppose that we want to read or write a word at the memory address 357A (hexadecimal), whose 16 bits are 0011 0101 0111 1010. Under set-associative mapping, this translates to Tag = 13, Set = 23, and Word = 10 (all in decimal). So we search only the two tags in cache set 23 to see if either one matches tag 13. If so, we have a hit. Otherwise, one of these two must be replaced by the memory line being addressed (good old line 855) before the read or write can be executed.
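A 2-way set-associative lookup for the same address might look like the sketch below. The function names are hypothetical, and the replacement policy (evict the older of the two lines, i.e. FIFO) is an assumption, since the slides do not say which line is replaced.

```python
def split_set_associative(address: int, set_bits: int = 6, word_bits: int = 4) -> tuple[int, int, int]:
    """Split an address into (tag, set, word) fields."""
    word = address & ((1 << word_bits) - 1)
    set_index = (address >> word_bits) & ((1 << set_bits) - 1)
    tag = address >> (word_bits + set_bits)
    return tag, set_index, word


def access(cache_sets: list[list[int]], address: int) -> bool:
    """Return True on a hit; on a miss, load the line into its set."""
    tag, set_index, _ = split_set_associative(address)
    ways = cache_sets[set_index]  # only the (at most) two tags of this set are searched
    if tag in ways:
        return True
    if len(ways) == 2:
        ways.pop(0)               # assumed FIFO replacement: evict the older line
    ways.append(tag)
    return False


cache_sets = [[] for _ in range(64)]        # 64 sets, 2 lines each, initially empty
first_access = access(cache_sets, 0x357A)   # miss: tag 13 is loaded into set 23
second_access = access(cache_sets, 0x357A)  # hit: tag 13 is now in set 23
print(split_set_associative(0x357A), first_access, second_access)
```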
More informationParallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple
More informationCache memory. Lecture 4. Principles, structure, mapping
Cache memory Lecture 4 Principles, structure, mapping Computer memory overview Computer memory overview By analyzing memory hierarchy from top to bottom, the following conclusions can be done: a. Cost
More informationCPS 303 High Performance Computing. Wensheng Shen Department of Computational Science SUNY Brockport
CPS 303 High Performance Computing Wensheng Shen Department of Computational Science SUNY Brockport Chapter 2: Architecture of Parallel Computers Hardware Software 2.1.1 Flynn s taxonomy Single-instruction
More informationUnit 2. Chapter 4 Cache Memory
Unit 2 Chapter 4 Cache Memory Characteristics Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics Organisation Location CPU Internal External Capacity Word
More informationLecture 2. Memory locality optimizations Address space organization
Lecture 2 Memory locality optimizations Address space organization Announcements Office hours in EBU3B Room 3244 Mondays 3.00 to 4.00pm; Thurs 2:00pm-3:30pm Partners XSED Portal accounts Log in to Lilliput
More informationChapter 1. Introduction: Part I. Jens Saak Scientific Computing II 7/348
Chapter 1 Introduction: Part I Jens Saak Scientific Computing II 7/348 Why Parallel Computing? 1. Problem size exceeds desktop capabilities. Jens Saak Scientific Computing II 8/348 Why Parallel Computing?
More informationComputer System Overview
Computer System Overview Operating Systems 2005/S2 1 What are the objectives of an Operating System? 2 What are the objectives of an Operating System? convenience & abstraction the OS should facilitate
More informationParallel Computing Platforms
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationCOSC 6385 Computer Architecture - Thread Level Parallelism (I)
COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month
More informationCS4230 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/28/12. Homework 1: Parallel Programming Basics
CS4230 Parallel Programming Lecture 3: Introduction to Parallel Architectures Mary Hall August 28, 2012 Homework 1: Parallel Programming Basics Due before class, Thursday, August 30 Turn in electronically
More informationChapter 4. Cache Memory. Yonsei University
Chapter 4 Cache Memory Contents Computer Memory System Overview Cache Memory Principles Elements of Cache Design Pentium 4 and Power PC Cache 4-2 Key Characteristics 4-3 Location Processor Internal (main)
More informationParallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?
Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing
More informationFlynn s Taxonomy of Parallel Architectures
Flynn s Taxonomy of Parallel Architectures Stefano Markidis, Erwin Laure, Niclas Jansson, Sergio Rivas-Gomez and Steven Wei Der Chien 1 Sequential Architecture The von Neumann architecture was conceived
More informationOverview IN this chapter we will study. William Stallings Computer Organization and Architecture 6th Edition
William Stallings Computer Organization and Architecture 6th Edition Chapter 4 Cache Memory Overview IN this chapter we will study 4.1 COMPUTER MEMORY SYSTEM OVERVIEW 4.2 CACHE MEMORY PRINCIPLES 4.3 ELEMENTS
More informationMemory hierarchy and cache
Memory hierarchy and cache QUIZ EASY 1). What is used to design Cache? a). SRAM b). DRAM c). Blend of both d). None. 2). What is the Hierarchy of memory? a). Processor, Registers, Cache, Tape, Main memory,
More informationA Multiprocessor system generally means that more than one instruction stream is being executed in parallel.
Multiprocessor Systems A Multiprocessor system generally means that more than one instruction stream is being executed in parallel. However, Flynn s SIMD machine classification, also called an array processor,
More information