Final Lecture. A few minutes to wrap up and add some perspective

Instant replay

The quarter was split into roughly three parts and a coda. The first part covered instruction set architectures: the connection between software and hardware. In the second part of the course we discussed processor design, focusing on pipelining, one of the most important ways of improving processor performance. The third part focused on building large and fast memory systems (via caching), virtual memory, and I/O. Finally, we briefly discussed performance tuning, including profiling and exploiting data parallelism via SIMD and multi-core processors. Along the way we introduced many performance metrics to estimate the actual benefits of all of these fancy designs.

Some recurring themes

There were several recurring themes throughout the quarter:
- Instruction set and processor designs are intimately related.
- Parallel processing can often make systems faster.
- Performance: Amdahl's Law quantifies performance limitations.
- Hierarchical designs combine the different parts of a system.
- Hardware and software depend on each other.

Instruction sets and processor designs

The MIPS instruction set was designed for pipelining:
- All instructions are the same length, which makes instruction fetch and jump and branch address calculations simpler.
- Opcode and operand fields appear in the same place in each of the three instruction formats, making instruction decoding easier.
- Only relatively simple arithmetic and data transfer instructions are supported.

These decisions have multiple advantages. They lead to shorter pipeline stages and higher clock rates, and they result in simpler hardware, leaving room for other performance enhancements like forwarding, branch prediction, and on-die caches.
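
To see why fixed field positions make decoding easy, here is a minimal sketch of extracting the fields that every MIPS format keeps in the same bit positions. The `decode` helper is illustrative, not from the lecture; the field layout follows the standard MIPS32 encoding.

```python
def decode(word):
    """Extract the fields that sit in the same place in every MIPS format."""
    opcode = (word >> 26) & 0x3F   # bits 31-26: opcode, identical in R/I/J formats
    rs     = (word >> 21) & 0x1F   # bits 25-21: first source register
    rt     = (word >> 16) & 0x1F   # bits 20-16: second source / destination
    return opcode, rs, rt

# add $t2, $t0, $t1 encodes as the R-type word 0x01095020:
# opcode 0, rs = $t0 (8), rt = $t1 (9)
fields = decode(0x01095020)
```

Because the opcode and register specifiers can be pulled out before the instruction type is even known, the register file can be read in parallel with the rest of decoding.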

Parallel processing

One way to improve performance is to do more processing at once. There were several examples of this in our CPU designs:
- Multiple functional units can be included in a datapath to let single instructions execute faster. For example, we can calculate a branch target while reading the register file.
- Pipelining allows us to overlap the execution of several instructions.
- SIMD performs operations on multiple data items simultaneously.
- Multi-core processors enable thread-level parallel processing.

Memory and I/O systems also provide many good examples:
- A wider bus can transfer more data per clock cycle.
- Memory can be split into banks that are accessed simultaneously. Similar ideas may be applied to hard disks, as with RAID systems.
- A direct memory access (DMA) controller performs I/O operations while the CPU works on compute-intensive tasks instead.
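
The bank-splitting idea above can be sketched in a few lines. This is a toy model (a hypothetical 4-bank interleaved layout, not any specific memory system): consecutive addresses land in different banks, so a sequential burst of accesses can proceed in parallel, one per bank.

```python
NUM_BANKS = 4

def bank_of(addr):
    """Low-order address bits select the bank (interleaved layout)."""
    return addr % NUM_BANKS

# A sequential burst of 8 words spreads evenly: each bank is touched twice,
# so up to NUM_BANKS accesses can be serviced at the same time.
burst = [bank_of(a) for a in range(8)]
```

The same interleaving trick underlies RAID striping across disks: spread consecutive blocks over independent devices so they can all work at once.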

Performance and Amdahl's Law

The First Law of Performance: make the common case fast! But performance is limited by the slowest component of the system. We've seen this in regard to cycle times in our CPU implementations:
- Single-cycle clock times are limited by the slowest instruction.
- Pipelined cycle times depend on the slowest individual stage.

Amdahl's Law also holds true outside the processor itself. Slow memory or bad cache designs can hamper overall performance, and I/O-bound workloads depend on the I/O system's performance.

Hierarchical designs

Hierarchies separate the fast and slow parts of a system, and minimize the interference between them.
- Caches are fast memories which speed up access to frequently used data and reduce traffic to slower main memory. (Registers are even faster.)
- Buses can also be split into several levels, allowing higher-bandwidth devices like the CPU, memory, and video card to communicate without affecting, or being affected by, slower peripherals.
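
The cache idea can be captured in a tiny simulation. This is a minimal sketch of a direct-mapped cache (a hypothetical 4-line, one-word-per-block configuration; the class and its names are illustrative), showing how repeated accesses hit while conflicting addresses evict each other.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each address maps to exactly one line."""
    def __init__(self, n_lines=4):
        self.lines = [None] * n_lines      # each line remembers one tag
        self.hits = self.misses = 0

    def access(self, addr):
        index = addr % len(self.lines)     # low bits pick the line
        tag = addr // len(self.lines)      # high bits identify the block
        if self.lines[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.lines[index] = tag        # fetch block from "main memory"

cache = DirectMappedCache()
for addr in [0, 1, 0, 1, 4, 0]:            # 0 and 4 conflict in line 0
    cache.access(addr)
```

Reusing addresses 0 and 1 produces hits (temporal locality), but address 4 maps to the same line as 0, so the last access to 0 misses again: a conflict miss.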

Architecture and software

Computer architecture plays a vital role in many areas of software.
- Compilers are critical to achieving good performance. They must take full advantage of a CPU's instruction set, and their optimizations can reduce stalls and flushes, or arrange code and data accesses for optimal use of system caches.
- Operating systems interact closely with hardware. They should take advantage of CPU features like support for virtual memory, and of I/O capabilities for device drivers. The OS handles exceptions and interrupts together with the CPU.

Five things that I hope you will remember

- Abstraction: the separation of interface from implementation. ISAs specify what the processor does, not how it does it.
- Locality: temporal locality means that if you used it, you'll use it again; spatial locality means that if you used it, you'll use something near it.
- Caching: buffering a subset of something nearby for quicker access, typically used to exploit locality.
- Indirection: adding a flexible mapping from names to things. Virtual memory's page table maps virtual to physical addresses.
- Throughput vs. latency: (# of things / time) vs. (time to do one thing). Improving one does not necessarily improve the other.
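
Caching and temporal locality show up in software too, not just in hardware. As a sketch (the function and workload here are hypothetical), memoization buffers results so that repeated requests for the same value skip the slow path entirely:

```python
cache = {}
slow_path_calls = 0

def slow_square(x):
    """Return x squared, caching results to exploit temporal locality."""
    global slow_path_calls
    if x not in cache:
        slow_path_calls += 1    # only the first access pays the full cost
        cache[x] = x * x        # buffer the result for quicker re-access
    return cache[x]

# Five requests, but only two distinct values, so only two slow computations:
results = [slow_square(n) for n in [3, 3, 3, 5, 5]]
```

The same pattern, a small fast buffer in front of a slow source, is exactly what hardware caches, TLBs, and disk page caches do.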