What is This Course About? CS 356 Unit 0. Today's Digital Environment. Why is System Knowledge Important?


CS 356 Unit 0
Class Introduction
Basic Hardware Organization

What is This Course About?
- Introduction to Computer Systems, a.k.a. Computer Organization or Architecture
- Filling in the "systems" details
  - How is software generated (compilers, libraries) and executed (OS, etc.)?
  - How does computer hardware work and how does it execute the software I write?
- Lays a foundation for future CS courses
  - CS 350 (Operating Systems), ITP/CS 439 (Compilers), CS 353/EE 450 (Networks), EE 457 (Computer Architecture)

Today's Digital Environment
[Figure: stack of abstraction layers labeled Applications; Networks; Algorithms; C++ / Java / Python; OS / Libraries; Assembly / Machine Code; Processor / Memory / GPU / FPGAs; Digital Logic; Transistors / Circuits; Voltage / Currents. Part of the stack is marked "Our Focus in CS 356".]

Why is System Knowledge Important?
- Increase productivity
  - Debugging
  - Build/compilation
- High-level language abstractions break down at certain points
- Improve performance
  - Take advantage of hardware features
  - Avoid pitfalls presented by the hardware
- Basis of understanding security and exploits

Syllabus

What Will You Learn
- Binary representation systems
- Assembly
- Processor organization
- Memory subsystems (caching, virtual memory)
- Compiler optimization and linking

20-Second Timeout

Who Am I?
- Teaching faculty in EE and CS
- Undergrad at USC in CECS
- Grad at USC in EE
- Work(ed) at Raytheon
- Learning Spanish (and Chinese?)
- Sports enthusiast!
  - Basketball
  - Baseball
  - Ultimate Frisbee?

ABSTRACTIONS & REALITY

Abstraction vs. Reality
- Abstraction is good until reality intervenes
  - Bugs can result
  - It is important to understand the underlying HW implementations
  - Sometimes abstractions don't provide the control or performance you need

Reality 1
- ints are not integers and floats aren't reals
- Is x^2 >= 0?
  - Floats: Yes
  - Ints: Not always (with 32-bit ints)
    - 40,000 * 40,000 = 1,600,000,000
    - 50,000 * 50,000 = -1,794,967,296
- Is (x+y)+z = x+(y+z)?
  - Ints: Yes
  - Floats: Not always
    - (1e20 + -1e20) + 3.14 = 3.14
    - 1e20 + (-1e20 + 3.14) is around 0

Reality 2
- Knowing some assembly is critical
- You'll probably never write much (any?) code in assembly, as compilers are often better than even humans at optimizing code
- But knowing assembly is critical when
  - Tracking down some bugs
  - Taking advantage of certain HW features that a compiler may not be able to use
  - Implementing system software (OS/compilers/libraries)
  - Understanding security and vulnerabilities

Reality 3
- Memory matters!
- Memory is not infinite
- Memory can impact performance more than computation for many applications
- Source of many bugs, both for single-threaded and especially parallel programs
- Source of many security vulnerabilities
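The Reality 1 arithmetic above can be reproduced with a short sketch. Python integers never overflow, so the helper `wrap_int32` (a hypothetical name, not part of the course material) emulates a 32-bit two's-complement `int` as found in C, C++, or Java. The floating-point lines use Python's native `float`, which is an IEEE 754 double on common platforms.

```python
def wrap_int32(value):
    """Reduce an arbitrary Python int to a 32-bit two's-complement value.

    Hypothetical helper for illustration: it emulates what a 32-bit
    hardware register keeps after an overflowing operation.
    """
    return (value + 2**31) % 2**32 - 2**31


def mul_int32(a, b):
    """Multiply two ints and keep only the low 32 bits, as signed."""
    return wrap_int32(a * b)


# Integer case: x * x is not always >= 0 once results wrap around.
small_square = mul_int32(40_000, 40_000)
big_square = mul_int32(50_000, 50_000)

# Floating-point case: addition is not associative.
left_grouping = (1e20 + -1e20) + 3.14   # the large terms cancel first
right_grouping = 1e20 + (-1e20 + 3.14)  # 3.14 is lost next to 1e20

print(small_square, big_square)
print(left_grouping, right_grouping)
```

The true product 2,500,000,000 exceeds the largest 32-bit signed value (2,147,483,647), so only its low 32 bits survive and the result reads as negative. In the second float expression, 3.14 is far smaller than the spacing between representable doubles near 1e20, so it is rounded away before the cancellation happens.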

Reality 4
- There's more to performance than asymptotic complexity
- Constant factors matter!
- Even operation counts do not predict performance
  - How long an instruction takes to execute is not deterministic; it depends on what other instructions have been executed before it
- Understanding how to optimize for the processor organization and memory can lead to up to an order of magnitude performance increase

COMPUTER ORGANIZATION AND ARCHITECTURE
Drivers and Trends

Computer Components
- Processor
  - Executes the program and performs all the operations
- Main Memory
  - Stores data and program (instructions)
  - Different forms:
    - RAM = read and write but volatile (loses values when power is off)
    - ROM = read-only but non-volatile (maintains values when power is off)
  - Significantly slower than the processor
- Input / Output Devices
  - Generate and consume data from the system
  - MUCH, MUCH slower than the processor

[Figure: block diagram with labels Software Program (Instructions such as "Combine 2c. Flour" and "Mix in 3 eggs", plus Data), Disk Drive, Input Devices, Memory (RAM) holding Program (Instructions) and Data (Operands), Processor (Arithmetic + Logic + Control Circuitry; reads instructions, operates on data), and Output Devices.]

Architecture Issues
- Fundamentally, computer architecture is all about the different ways of answering the question: "What do we do with the ever-increasing number of transistors available to us?"
- The goal of a computer architect is to take the increasing transistor budget of a chip (i.e. Moore's Law) and produce an equivalent increase in computational ability

Moore's Law, Computer Architecture & Real-Estate Planning
- Moore's Law = the number of transistors able to be fabricated on a chip grows exponentially with time
- Computer architects decide, "What should we do with all of this capability?"
- Similarly, real-estate developers ask, "How do we make best use of the land area given to us?"

[Figure: USC University Park Development Master Plan, http://re.usc.edu/docs/university%20park%20development%20project.pdf]

Transistor Physics
- Cross-section of transistors on an IC
- Moore's Law is founded on our ability to keep shrinking transistor sizes
  - Gate/channel width shrinks
  - Gate oxide shrinks
- Transistor feature size is referred to as the implementation "technology node"

Technology Nodes

Growth of Transistors on Chip
[Figure: transistor count (thousands, log scale from 1 to 1,000,000) versus year (1975 to 2010). Labeled points: Intel 8086 (29K), Intel '286 (134K), Intel '386 (275K), Intel '486 (1.2M), Pentium (3.1M), Pentium Pro (5.5M), Pentium 2 (7M), Pentium 3 (28M), Pentium 4 Northwood (42M), Pentium 4 Prescott (125M), Pentium D (230M), Core 2 Duo (291M).]
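As a rough check of the exponential-growth claim, the sketch below counts how many doublings separate the first and last labeled points of the transistor chart, Intel 8086 (29K) and Core 2 Duo (291M). The release years used (1978 and 2006) are an assumption taken from public product history rather than from the slides, and `doublings_to_reach` is a hypothetical helper name.

```python
def doublings_to_reach(start_count, target_count):
    """Count whole doublings needed for start_count to reach target_count."""
    doublings = 0
    count = start_count
    while count < target_count:
        count *= 2
        doublings += 1
    return doublings


# Chart endpoints from the slide.
transistors_8086 = 29_000
transistors_core2duo = 291_000_000

# Assumed release years (not stated on the slide).
year_8086 = 1978
year_core2duo = 2006

doublings = doublings_to_reach(transistors_8086, transistors_core2duo)
years_per_doubling = (year_core2duo - year_8086) / doublings

print(doublings, years_per_doubling)
```

Because the loop counts whole doublings, it rounds up slightly; under these assumptions the result is still consistent with a doubling roughly every two years.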

Implications of Moore's Law
- What should we do with all these transistors?
  - Put additional simple cores on a chip
  - Use transistors to make cores execute instructions faster
  - Use transistors for more on-chip cache memory
    - Cache is an on-chip memory used to store data the processor is likely to need
    - Faster than main memory (RAM), which is on a separate chip and much larger (thus slower)

Memory Wall Problem
- Processor performance is increasing much faster than memory performance

[Figure: Processor-Memory Performance Gap, with processor performance growing about 55%/year and memory performance about 7%/year. Source: Hennessy and Patterson, Computer Architecture: A Quantitative Approach (2003).]

Cache Example
- Small, fast, on-chip memory to store copies of recently-used data
- When the processor attempts to access data, it will check the cache first
  - If the cache has the desired data, it can supply it quickly
  - If the cache does not have the data, it must go to the main memory (RAM) to access it

[Figure: two diagrams of Processor, Cache, System Bus, and RAM, one captioned "Cache has desired data" and one "Cache does not have desired data".]

Pentium 4
[Figure: chip layout with labels L2 Cache, L1 Data, L1 Instruc.]
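The check-the-cache-first behavior described in the Cache Example can be sketched as a tiny simulation. This is a minimal sketch under stated assumptions: the slides do not say how the cache is organized, so a direct-mapped layout (each address maps to exactly one line) is assumed here for simplicity, and the class name `DirectMappedCache`, the 4-line size, and the RAM contents are all hypothetical.

```python
class DirectMappedCache:
    """Toy cache: each address maps to one line, chosen by address % num_lines."""

    def __init__(self, ram, num_lines=4):
        self.ram = ram                    # backing "main memory" (a list)
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # each entry: (address, value) or None
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines
        line = self.lines[index]
        # Check the cache first.
        if line is not None and line[0] == address:
            self.hits += 1
            return line[1]
        # Miss: go to main memory, then keep a copy for next time.
        self.misses += 1
        value = self.ram[address]
        self.lines[index] = (address, value)
        return value


ram = [10 * address for address in range(16)]
cache = DirectMappedCache(ram, num_lines=4)

# Addresses 0 and 4 map to the same line, so they evict each other.
values = [cache.read(address) for address in [0, 1, 0, 1, 4, 0]]

print(values, cache.hits, cache.misses)
```

The repeated reads of addresses 0 and 1 hit because their copies are still in the cache. The read of address 4 replaces the copy of address 0, so the final read of 0 has to go back to RAM.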

Increase in Clock Frequency
[Figure: clock frequency (MHz, log scale from 1 to 10000) versus year (1975 to 2010). Labeled points: Intel 8086 (8), Intel '286 (12.5), Intel '386 (20), Intel '486 (25), Pentium (60), Pentium Pro (200), Pentium 2 (266), Pentium 3 (700), Pentium 4 Willamette (1500), Pentium 4 Prescott (3600), Pentium D (2800), Core 2 Duo (2400), Intel Nehalem Quad Core.]

Progression to Parallel Systems
- If power begins to limit clock frequency, how can we continue to achieve more and more operations per second?
  - By running several processor cores in parallel at lower frequencies
  - Two cores @ 2 GHz vs. 1 core @ 4 GHz yield the same theoretical maximum ops./sec.
- For various applications, like graphics and computationally intensive workloads, this is taken to an extreme by GPUs

GPU Chip Layout
- 2560 small cores
- Upwards of 7.2 billion transistors
- 8.2 TFLOPS
- 320 Gbytes/sec
Source: NVIDIA
Photo: http://www.theregister.co.uk/2010/01/19/nvidia_gf100/
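The comparison of two cores at 2 GHz with one core at 4 GHz is simple arithmetic, sketched below. The function name `peak_ops_per_second` and the default of one operation per core per clock cycle are assumptions for illustration; real processors can issue more or fewer operations per cycle, and real programs rarely reach the theoretical peak.

```python
def peak_ops_per_second(cores, clock_hz, ops_per_cycle=1):
    """Theoretical peak: every core completes ops_per_cycle on every cycle."""
    return cores * clock_hz * ops_per_cycle


two_cores_at_2ghz = peak_ops_per_second(cores=2, clock_hz=2_000_000_000)
one_core_at_4ghz = peak_ops_per_second(cores=1, clock_hz=4_000_000_000)

print(two_cores_at_2ghz, one_core_at_4ghz)
```

Both configurations come out to the same peak, but the two-core version only reaches it when the work can actually be split across both cores.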

Intel Haswell Quad Core