CPS 303 High Performance Computing. Wensheng Shen Department of Computational Science SUNY Brockport


Chapter 1: Introduction to High Performance Computing
von Neumann architecture
CPU and memory speed
Motivation of parallel computing
Applications of parallel computing

1.1 von Neumann Architecture
A stored-program computer, as opposed to a fixed-program computer (like a numeric calculator)
The dominant computer model for more than 40 years
The CPU executes a stored program: a sequence of read and write operations on the memory
The operation is sequential
von Neumann proposed the use of ROM (read-only memory) to hold the stored program

John von Neumann
Born on December 28, 1903; died on February 8, 1957
Mastered calculus at the age of 8
Did graduate-level math at the age of 12
Obtained his Ph.D. at the age of 23
Known for the stored-program concept

A Typical Example of von Neumann Architecture
[Block diagram: the CPU (control unit and arithmetic logic unit) connected to memory, input devices, output devices, and external storage]

Modern Personal Computers
1. Monitor
2. Motherboard
3. CPU (microprocessor)
4. Primary storage (RAM)
5. Expansion cards (graphics cards, sound cards, network cards, modems), typically on the Peripheral Component Interconnect (PCI) bus
6. Power supply
7. Optical disc drive
8. Secondary storage (hard disk)
9. Keyboard
10. Mouse
http://en.wikipedia.org/wiki/personal_computer

CISC and RISC machines
CISC stands for complex instruction set computer. A CISC machine uses a single bus system. Each individual instruction can execute several low-level operations, such as a memory load, an arithmetic operation, and a memory store.
RISC stands for reduced instruction set computer. A RISC machine uses a two-bus system, with separate data and address buses.
Both are SISD machines: a Single Instruction stream operating on a Single Data stream.

1.2 CPU and memory speed
Cray 1: 12 ns (1975)
Cray 2: 6 ns (1986)
Cray T-90: 2 ns (1997)
Intel PC: 1 ns (2000)
Today's PC: 0.3 ns (2006, Pentium 4)

Moore's Law
Moore's law (1965): the number of transistors per square inch on integrated circuits has doubled roughly every two years since the integrated circuit was invented.
How about the future? (Will the price of computers with the same computing power fall by half every two years?)
In a 2008 article in InfoWorld, Randall C. Kennedy, formerly of Intel, used successive versions of Microsoft Office between 2000 and 2007 as his premise: despite the gains in computational performance during this period predicted by Moore's law, Office 2007 performed the same task at half the speed on a prototypical 2007 computer as Office 2000 did on a year-2000 computer.

CPU and memory speed comparison
In 20 years, CPU speed (clock rate) has increased by a factor of 1000, while DRAM speed has increased by a factor of less than 4.
CPU cycle time: 1-2 ns
Cache speed: 10 ns
DRAM speed: 50-60 ns
How do we feed data fast enough to keep the CPU busy?

Possible Solutions
A hierarchy of successively faster memory devices (multilevel caches)
Locality of data reference
Efficient programming can be an issue
Parallel systems may provide (1) a large aggregate cache and (2) high aggregate bandwidth to the memory system

1.3 Price and Performance Comparison
The price of a high-end CPU rises sharply with performance (Intel processor price/performance chart).

1.4 Computation for Special Purposes
Weather forecasting
Information retrieval
Car and aircraft design
NASA space discovery

Problems: insufficient memory and insufficient speed.

Example: predicting the weather of the US and Canada for the next two days
Domain: 5,000 kilometers by 4,000 kilometers in area (20 million square kilometers = 2.0 × 10^7 km^2), and 20 kilometers high
Mesh size: Δx = Δy = Δz = 0.1 kilometer

Mesh size
Number of cells: n = (5000/0.1) × (4000/0.1) × (20/0.1) = 4 × 10^11
Assuming it takes 100 calculations to determine the weather at a typical grid point, and that we want to predict the weather at each hour for the next 48 hours, the total number of calculations is
4 × 10^11 × 100 × 48 ≈ 2 × 10^15

Assuming that our computer can execute one billion (10^9) calculations per second, it will take
2 × 10^15 / 10^9 = 2 × 10^6 seconds, or about 23 days.
Increase the CPU speed to one trillion calculations per second? We still need more than half an hour.
What happens if we want to predict the weather for the whole earth, or if we want to use a smaller grid size, Δx = Δy = Δz = 0.05 kilometer, for better accuracy?

The memory requirement
If we need 7 variables (u, v, w, p, T, ρ, ω) at each location, the memory cost (at 4 bytes per word) is
7 × 4 × 10^11 words = 1.12 × 10^13 bytes = 11,200 Gbytes.
There is also data transfer latency among the CPU, registers, and memory.

Possible solution: build a processor executing 1 trillion (10^12) operations per second.

    for (i = 0; i < one_trillion; i++)
        z[i] = x[i] + y[i];

Each iteration must fetch x[i] and y[i], add them, and store z[i]. So at least 3 × 10^12 words must be transferred between the registers and memory per second. Data travel at most at the speed of light, 3 × 10^8 m/s.

Let r be the average distance of a word of memory from the CPU. Then r must satisfy
3 × 10^12 × r meters = 3 × 10^8 meters/second × 1 second,
which gives r = 10^-4 meters.
We need at least three trillion words of memory to store x, y, and z. Memory words are typically arranged in a rectangular grid in hardware. If we use a square grid with side length s and connect the CPU to the center of the square, then the average distance from a memory location to the CPU is about s/2 = r, so s = 2 × 10^-4 meters. For a square grid, a typical row of memory words will contain
√(3 × 10^12) ≈ 1.7 × 10^6 words.

Therefore, we need to fit a single word of memory into a square with a side length of
(2 × 10^-4) / (1.7 × 10^6) ≈ 10^-10 meters, the size of an atom.
That is to say, we would need to figure out how to represent a 32-bit word with a single atom. Building a single computer that performs one trillion operations per second is therefore extremely difficult. Other solutions?

To invite one hundred people for a dinner, should we build one big table to seat everyone? No: more tables.
How do we perform the task of 10^15 calculations in minutes? More computers: we divide one big problem into many small subproblems.

1.5 Challenges: Communication
At a dinner, people sitting at different tables can walk around to talk to each other. How about data on different processors? How do processors communicate?

Tasks:
Decide on and implement an interconnection network for the processors and memory modules
Design and implement system software for the hardware
Devise algorithms and data structures for solving our problem
Divide the algorithms and data structures into subproblems
Identify the communications that will be needed among the subproblems
Assign subproblems to processors and memory modules

1.6 Topics covered in the course
The architecture, interconnection networks, and system software for parallel computing
Message Passing Interface (MPI) libraries for parallel computing
Basic communication in MPI
Applications of MPI in numerical computation
Collective communication in MPI
Design and coding issues in parallel computing
The performance of parallel computing