CS 2630 Computer Organization. What did we accomplish in 8 weeks? Brandon Myers University of Iowa


Course evaluations: ICON > Student (Course?) Tools > Evaluations

Why take 2630? Brandon's esoteric answer: graduates of a Computer Science program should have an appreciation for how real computers work. ACM and IEEE's answer: Computer Organization and Architecture is a topic in Computer Science Curricula 2013 (https://www.acm.org/education/cs2013-final-report.pdf). But more concretely...

Why take 2630? 1. It will be up to you to design our new computer systems (software AND hardware)... computer architects have been panicking for nearly a decade, and they are not calming down.

Why take 2630? 2. At some point you will have to measure a system you've built: performance (latency & throughput), energy usage, reliability, ... To be able to measure/interpret/improve your system, it helps to understand how more of the computer works. What metrics would you measure to know how good a car is? What can the engineer do to change the values of those metrics? https://commons.wikimedia.org/wiki/file:toyota_mirai_trimmed.jpg

The insights you brought to the course: CA Topics

App > High-level language (e.g., C, Java) > Compiler > Operating system (e.g., Linux, Windows) > Instruction set architecture (e.g., MIPS) > Memory system / Processor / I/O system > Datapath & Control > Digital logic > Circuits > Devices (e.g., transistors) > Physics

rug: you don't need to write assembly code for a particular architecture; instead you write portable Java/Python/C code
bumps: some C code isn't portable; some programmers write snippets of assembly code when the compiler doesn't do the best thing
lw $t0, 4($s0)
addi $t0, $t0, 10
sw $t0, 8($s0)
You learned how to write assembly code in HW2 (usually the compiler does the work for us)

lw $t0, 4($s0)
addi $t0, $t0, 10
sw $t0, 8($s0)
rug: we can write MIPS programs in a language made of human-readable characters, use pseudoinstructions, and refer to labels, even though the machine reads binary numbers
10001110000010000000000000000100
00100001000010000000000000001010
10101110000010000000000000001000
Project 1 MiniMA: the MIPS assembler
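The core of what an assembler like MiniMA does can be sketched in a few lines of Python. The field widths come from the MIPS I-format; the register numbers ($s0 = 16, $t0 = 8) are the standard MIPS register conventions:

```python
def encode_itype(opcode, rs, rt, imm):
    # MIPS I-format: opcode(6 bits) | rs(5) | rt(5) | immediate(16)
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# lw $t0, 4($s0): opcode 0x23, rs = $s0 = register 16, rt = $t0 = register 8
word = encode_itype(0x23, 16, 8, 4)
print(format(word, "032b"))  # 10001110000010000000000000000100
```

Note that the printed bit pattern is exactly the first machine-code line above: the assembler is "just" a translator from mnemonics to packed bit fields.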

lw $t0, 4($s0)
addi $t0, $t0, 10
sw $t0, 8($s0)
rug: the linker allows us to write our programs modularly
(figure: object files stdlib.a and math.a are linked together with our assembled code to produce browser.exe)

PEER INSTRUCTION (actually, a survey)
a) strongly agree  b) agree  c) neutral  d) disagree  e) strongly disagree
1. I understand the relationship between bits, numbers, and information.
2. I understand the stored program concept.
3. I understand the role of the instruction set architecture in a computer.
4. I understand why abstractions are essential for building complex systems.
5. I understand why the digital abstraction is important.
6. I understand why the synchronous abstraction is important.
7. I understand the tradeoffs in the memory hierarchy.
8. I understand how problems can be decomposed into a datapath and a control.
9. I appreciate the layers of the computing stack and why they may need to change in the near future.

rug: our program has the illusion of having access to the entire address space (e.g., all 2^32 bytes) of the computer
browser.exe
10001110000010000000000000000100
00100001000010000000000000001010
10101110000010000000000000001000
In Project 2-2 you played the role of the Loader by loading your hex image into the Instruction memory
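The loader's job, as in Project 2-2, is mostly mechanical: read an image file and copy its words into memory. A minimal Python sketch, where the hex words are the three machine instructions above and a list of ints stands in for Logisim's instruction memory:

```python
# hex image: one 32-bit word per line (the three instructions above, in hex)
hex_image = ["8e080004", "2108000a", "ae080008"]

# "load" the image: parse each word and place it at consecutive addresses
instruction_memory = [int(word, 16) for word in hex_image]

print(format(instruction_memory[0], "032b"))
# 10001110000010000000000000000100
```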

rug: machine code for the MIPS architecture ought to run on any MIPS processor, regardless of its design (its microarchitecture)
bumps: choices about the architecture are sometimes based on assumptions about the microarchitecture (e.g., the MIPS branch delay slot)
In Project 2-2, you designed, built, and tested a processor that runs assembled MIPS programs: datapath and control

rug: we can build a complex system out of basic components. The synchronous abstraction lets us not worry about timing at the interfaces between components.
In Project 2-1 and HW4 you built components (like register files and finite state machines) from sequential logic

You don't have to build the 5-stage pipelined MIPS processor; in fact, you don't have to build a MIPS processor at all. The layer between digital logic/circuits and software can be filled by other designs: an instruction set architecture (e.g., MIPS), a neural network's structure and weights on a GPU instruction set or a systolic array and its control, or the MIPS R10000 (out-of-order superscalar): http://people.cs.pitt.edu/~cho/cs2410/papers/yeagermicromag96.pdf

Every component is made of logic gates; you learned how to build logic circuits in HW3
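For example, the standard 1-bit full adder from HW3-style gate networks can be expressed directly with Boolean operators. A sketch, checked exhaustively against integer addition:

```python
def full_adder(a, b, cin):
    # sum = a XOR b XOR cin; carry-out comes from two AND gates and an OR gate
    p = a ^ b
    s = p ^ cin
    cout = (a & b) | (p & cin)
    return s, cout

# exhaustively check all 8 input combinations against arithmetic
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
```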

rug: it is a bit easier to think of gates as functions, as opposed to transistors attaching the output to Vsource or Vground. A logic gate is made of pMOS and nMOS transistors arranged in a CMOS configuration.
rug: CMOS ensures every gate has a pure 0 or 1 output! This idea is the digital abstraction, which lets the layers above compose two electrical circuits without worrying about how they affect each other.
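The "pure 0 or 1 output" property can be seen by modeling the two switch networks of a CMOS NAND gate. A sketch that treats each transistor as an ideal switch (an idealized model, ignoring timing and analog effects):

```python
def cmos_nand(a, b):
    # pull-down network: two nMOS in series conduct only when both inputs are 1
    pull_down = (a == 1) and (b == 1)
    # pull-up network: two pMOS in parallel conduct when either input is 0
    pull_up = (a == 0) or (b == 0)
    # exactly one network conducts, so the output is a clean 0 or 1
    assert pull_up != pull_down
    return 1 if pull_up else 0

print([cmos_nand(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# [1, 1, 1, 0]
```

For every input combination one network connects the output to VDD or GND and the other is off, which is exactly why the layers above can treat the output as a digital 0 or 1.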

rug: when building a functional digital logic circuit, there is no need to worry about how it is arranged on the silicon. Layout engineer's view of a NOR gate: https://commons.wikimedia.org/wiki/file:nor_gate_layout.png

rug: device engineers provide layout engineers with design rules. If the layout obeys the required spacing between components, then the transistors will work.
nMOS cross section: https://commons.wikimedia.org/wiki/file:mosfet_functioning_body.svg

rug: when operating a transistor in the saturation regime, it looks like an electrical switch (OFF/ON) between voltages GND and VDD. This is part of supporting the digital abstraction.

It's not enough to just build software these days (all of these are in production, not just research): computer vision processors running Google's augmented reality platform Tango; the Google TPU, built for machine learning; Project Catapult, custom hardware running part of Bing search; a holographic processor for Microsoft's augmented reality platform HoloLens; the Sparc M7 chip, built specifically for accelerating database queries

Life beyond Logisim? Logisim's main mode of input is schematic entry. Much digital logic design uses hardware description languages (HDLs) like Verilog (look up Verilog in your textbook index). An HDL is not much different from what you did, except it is textual instead of graphical. HDLs typically have powerful compilers that make development easier than using Logisim, e.g., write a statement like
case (ALUCtrl)
  0: R = X + Y;
  1: R = X - Y;
endcase
And you get an ALU!

Did we really build a real processor? Yes! You implemented much of the MIPS Instruction Set Architecture. Your Project 2-2 processor could run Linux (at a 4 kHz clock frequency) given a bootloader program and Linux compiled for MIPS. No, I mean like real hardware...

No, I mean like real hardware: if we use a hardware compiler, we could turn your Logisim files (look inside; it's just some XML listing a bunch of components and wires) into an FPGA design or a standard-cell VLSI design. There's more to learn about how to deal with the details of these design flows, but you have a good starting point.

Administrivia: Final Exam Friday, 10am-12 in 205 MLH! Open notes/book, no electronics. Reminder: practice materials are on ICON in Files > Exams. Review tomorrow in class: take at least one of the previous final exams and grade yourself; come prepared with a list of problems or specific questions.

Stuff in your textbook I recommend you look at! Ch 3.4 Finite State Machines: state transition diagrams are to sequential logic as truth tables are to combinational logic; involves ideas that relate to other parts of CS, like finite automata. Ch 8 Memory systems: How do we deal with the tradeoff between capacity and speed in memory technologies? (one answer: caches) How do we run two programs on the computer simultaneously if both think they own the whole address space? (one answer: virtual memory)
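As a small taste of Ch 3.4, here is a tiny Mealy-style state machine in Python (a hypothetical example, not from the textbook): its one bit of state is the previous input, and it outputs 1 exactly on a 0-to-1 transition, assuming the previous input starts at 0.

```python
def detect_rising_edges(bits):
    state = 0  # the single state bit: the previous input
    out = []
    for b in bits:
        # output logic: 1 only on a 0 -> 1 transition
        out.append(1 if (state == 0 and b == 1) else 0)
        state = b  # next-state logic: remember the current input
    return out

print(detect_rising_edges([0, 1, 1, 0, 1]))  # [0, 1, 0, 0, 1]
```

The dictionary of (state, input) -> (next state, output) pairs hiding inside that loop is exactly what a state transition diagram draws.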

What courses next? CS:3620 Operating Systems CS:3210 Programming languages and Tools (in C++) CS:3640 Introduction to Networks and Their Applications CS:3820 Programming Language Concepts CS:4640 Computer Security CS:4700 High Performance and Parallel Computing CS:5610: High performance computer architecture CS:4980 Topics in CS (Compiler Construction on raspberry pi) CS:4980 Topics in CS (Advanced Computer Architecture this Spring!)

What's to learn next: operating systems

Questions we didn't get to answer fully in CS2630. Operating systems: how do multiple programs share the computer? (2-64 processors, 1 network interface, 1 memory, 1 keyboard/mouse/screen, 100s of running programs) How do you keep programs isolated from each other, or keep one program from consuming all resources? How do you implement syscalls? How do you load the OS code into memory when you power on the computer?

What's to learn next: computer architecture

What's to learn next: computer architecture. The role of parallelism in microarchitectures; every implementation and its effect on performance...
seconds/program = (seconds/cycle) x (cycles/instruction) x (instructions/program)
...and cost and energy
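Plugging hypothetical numbers into the performance equation shows how the three factors multiply out (all values below are made up for illustration):

```python
# hypothetical program and machine
instructions_per_program = 1_500_000   # dynamic instruction count
cycles_per_instruction = 2.0           # average CPI
clock_hz = 2e9                         # 2 GHz, i.e., seconds/cycle = 1/clock_hz

# seconds/program = (instructions/program) x (cycles/instruction) x (seconds/cycle)
seconds_per_program = instructions_per_program * cycles_per_instruction / clock_hz
print(seconds_per_program)  # 0.0015
```

Each factor is owned by a different layer: the compiler and ISA set the instruction count, the microarchitecture sets CPI, and the circuits set the clock period.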

Parallelism in architectures: pipelining, superscalar, dataflow, vector, and others... multicore, VLIW, multithreading, ...

Vector machines, found in... early supercomputers, Intel AVX, GPUs. SIMD: single instruction, multiple data.

Superscalar machines, found in... most CPUs in servers and smartphones (superscalar + pipelining). Replicate resources, e.g.: two decoders and a 2-wide instruction cache read port to fetch two instructions at a time; two ALUs to execute two instructions at a time; more register file write ports to write back two registers in one cycle.

Dataflow machines: a processor needs to get the instruction and the input data to the same physical place at the same time (known as "dataflow locality"). Dataflow machines have a bunch of execution units of various kinds; the data flows through the operators. Challenges?

What's to learn next: parallel computing

Metric for performance comparison: time. My program runs in 100 seconds. If I parallelize it on 10 processors, I see that it runs in 12 seconds. What is the speedup? T_serial / T_parallel = 100/12 = 8.33X
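The same arithmetic as a one-line function:

```python
def speedup(t_serial, t_parallel):
    # speedup = T_serial / T_parallel
    return t_serial / t_parallel

print(round(speedup(100, 12), 2))  # 8.33
```

Note the speedup (8.33x) is less than the processor count (10): the question of where the missing factor went is exactly what Amdahl's law answers.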

Predicting parallel running time (T_par) from serial running time. My program runs in T_ser = 100 seconds. If I parallelize it on 10 processors, how fast will it run (i.e., what is T_par)?
T_improved = T_original x ((1 - r) + r/s)
r = fraction of the program that is able to be improved
s = speedup when applying the improvement
T_original / T_improved = 1 / ((1 - r) + r/s)
In this form, it is called Amdahl's law: it says your speedup is limited by how much of the program is improved (e.g., parallelized)
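Amdahl's law as a function. The first call assumes the entire program parallelizes (r = 1) across s = 10 processors; the second shows how a mere 10% serial fraction caps the speedup well below 10:

```python
def amdahl_speedup(r, s):
    # r: fraction of the program that is improved; s: speedup of that fraction
    return 1.0 / ((1.0 - r) + r / s)

print(round(amdahl_speedup(1.0, 10), 2))  # 10.0 (everything parallelizes)
print(round(amdahl_speedup(0.9, 10), 2))  # 5.26 (10% serial caps the speedup)
```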

Amdahl's law applied to parallelization: https://en.wikipedia.org/wiki/amdahl's_law#/media/file:amdahlslaw.svg

A sequential abstract machine model you already know: RAM, the random access machine. Just like any other computational step, accessing memory has a cost of 1. (figure: a processor attached to a memory)

One of the foundational parallel machine models: the Parallel Random Access Machine (PRAM). All processors are attached to a shared memory, and a memory access takes 1 step. More realistic variants of PRAM incur greater cost for conflicting memory accesses. PRAM is used very often for understanding the speedup limits of parallel algorithms, but it is not very realistic.

One of the foundational parallel machine models: Bulk Synchronous Parallel (BSP). This abstract machine does not support as many algorithms as CTA, but it is simpler (see blackboard notes). (figure: a superstep with per-processor local work w_1 ... w_p, communication of h messages, and a barrier of cost l) https://en.wikipedia.org/wiki/bulk_synchronous_parallel
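The standard BSP cost for one superstep is the maximum local work, plus h messages each costing g, plus the barrier latency l. A sketch with hypothetical numbers:

```python
def bsp_superstep_cost(work, h, g, l):
    # work: per-processor local work w_1..w_p
    # h: max messages sent/received by any processor
    # g: cost per message; l: barrier synchronization cost
    return max(work) + h * g + l

print(bsp_superstep_cost([40, 55, 30], h=5, g=2, l=20))  # 85
```

The max() term is why load balance matters in BSP: the superstep is only as fast as its slowest processor.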

The future of CS2630. Stay in touch! Tell others how awesome CS2630 is! Sign up to be an approved tutor: https://cs.uiowa.edu/resources/approved-tutors. CS2630 is continuing in the TILE classrooms; thank you for being the first to participate in the TILE version of CS2630 and the new lab assignments. Working on: potentially a future opportunity to include lab assistants, who help students but do not grade work, gain experience with teaching, and learn the material even better by teaching others.