1 The Many-Core Revolution: Understanding Change
Alejandro Cabrera
January 29, 2009
2 Disclaimer
This presentation currently contains several claims that require proper citations, and a few images that may or may not be licensed under Creative Commons. The dwarf twins later on are CC-compatible. In short, it is not ready for production use. You've been warned.
3 Acknowledgements
Berkeley View: the bulk of this presentation is based on this paper ("The Landscape of Parallel Computing Research: A View from Berkeley", 2006).
NVIDIA: their GPUs and spec sheets provide some exciting numbers.
Google: image searching made easy.
Tilera: many-core CPUs.
4 Overview
Exciting Pictures, Exciting Numbers
State of the Core
Why Many-Core?
Common Wisdoms Refuted
Dwarves and Applications
Programming Many-Core
Discussion
12 Many-Core GPU: Nvidia GTX 295
GPU Engine Specs
Cores: 480
Graphics Clock: 576 MHz
Processor Clock: 1242 MHz
Texture Fill Rate: 92.2 billion pixels/sec
92.2 billion pixels per second... A high-end monitor has a resolution of 2560 x 1600. That's about 4.1 million pixels.
4.1 million << 92.2 billion: you could redraw an entire scene about 22,500 times per second! (This assumes a trivial, flat scene; strictly, texture fill rate counts texels, not finished screen pixels.)
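The fill-rate arithmetic can be checked with a short Python sketch. It uses only the two figures quoted on the slide (92.2 billion/sec and a 2560 x 1600 monitor) and, like the slide, assumes every texel maps to exactly one screen pixel. The function names are illustrative, not from any library.

```python
# Figures quoted on the slide; everything else is derived from them.
FILL_RATE_PER_SEC = 92.2e9   # GTX 295 texture fill rate (billion/sec)
WIDTH, HEIGHT = 2560, 1600   # high-end monitor resolution


def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels in one full-screen frame."""
    return width * height


def redraws_per_second(fill_rate: float, width: int, height: int) -> float:
    """Full-screen redraws per second, assuming each texel lands on exactly
    one screen pixel (the slide's trivial, flat scene)."""
    return fill_rate / pixels_per_frame(width, height)


if __name__ == "__main__":
    print(pixels_per_frame(WIDTH, HEIGHT))                              # 4096000
    print(round(redraws_per_second(FILL_RATE_PER_SEC, WIDTH, HEIGHT)))  # 22510
```

Any real scene with overdraw, filtering, or multiple texture layers per pixel would divide that figure down considerably.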
20 Many-Core GPU: Nvidia GTX 295 Numbers
WARNING: The following slides feature assumptions that have no basis in reality.*
*They assume data can be written to disk at the card's 223.8 GB/s memory bandwidth. No disk can store data that fast (yet).
21 Many-Core GPU: Nvidia GTX 295
Seconds of data processing: ~1 TB (at 223.8 GB/s, about 4.5 seconds' worth)
22 Many-Core GPU: Nvidia GTX 295
One minute of data processing: 223.8 GB/s x 60 s = ~13.4 TB
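23 Many-Core GPU: Nvidia GTX 295
One hour of data processing: 223.8 GB/s x 3,600 s = ~806 TB
A 3.5" drive form factor is 4 x 5.75 x 1 in = 23 in³
15 x 60 = 900 disks filled in one hour
23 in³ x 900 = 20,700 in³ = ~12 ft³ (1 ft³ = 1,728 in³), a cube only about 27.5 in on a side
Stacked flat, those 900 one-inch-tall drives would form a column 75 ft tall, about 1/23 of the height of the Sears (Willis) Tower (about 1,730 ft to the tips of its antennas).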
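A minimal sketch for checking these slides, under the same unrealistic assumption that data streams to disk at 223.8 GB/s. The 15 disks per minute and the 4 x 5.75 x 1 in drive size come from the slides; the helper names are hypothetical, and terabytes are decimal (1 TB = 1,000 GB).

```python
# Assumption from the slides: data reaches disk at 223.8 GB/s.
BANDWIDTH_GB_PER_S = 223.8
DISKS_PER_MINUTE = 15                  # taken from the slide
DRIVE_IN = (4.0, 5.75, 1.0)            # 3.5" drive: width, depth, height in inches
CUBIC_INCHES_PER_CUBIC_FOOT = 12 ** 3  # 1,728


def data_tb(seconds: float) -> float:
    """Decimal terabytes produced in `seconds` at the assumed bandwidth."""
    return BANDWIDTH_GB_PER_S * seconds / 1000


def drive_stack(disks: int) -> dict:
    """Total volume and flat-stacked height of `disks` 3.5-inch drives."""
    width, depth, height = DRIVE_IN
    volume_in3 = width * depth * height * disks
    return {
        "volume_in3": volume_in3,
        "volume_ft3": volume_in3 / CUBIC_INCHES_PER_CUBIC_FOOT,
        "cube_side_in": volume_in3 ** (1 / 3),
        "stack_height_ft": height * disks / 12,
    }


if __name__ == "__main__":
    print(f"1 minute: {data_tb(60):.1f} TB")    # 13.4 TB
    print(f"1 hour:   {data_tb(3600):.0f} TB")  # 806 TB
    stack = drive_stack(DISKS_PER_MINUTE * 60)
    print(f"{stack['volume_in3']:,.0f} in^3 = {stack['volume_ft3']:.1f} ft^3")
    print(f"cube side {stack['cube_side_in']:.1f} in, "
          f"stacked height {stack['stack_height_ft']:.0f} ft")
```

Converting cubic inches to cubic feet divides by 12 cubed, not 12, which is why the volume comes out to roughly 12 ft³.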
24 State of the Core: Where Are We Now?
Sequential processors aren't getting any faster: no more free lunch via a Moore's Law voucher.
Quad-core is now a commodity.
Parallel applications are few and far between in consumer markets; look to scientific and enterprise computing instead. (A consumer exception: the multi-process Google Chrome browser.)
Vendors want more performance, and they want it yesterday; it is often a selling point.
25 State of the Core: Where Are We Now?
Graphics processors no longer take the back seat: GPGPU (2002), CUDA v1.1 (2007), CUDA v3.0b (2010).
Powerful accelerators are mingling with CPUs, e.g., the IBM Cell.
26 State of the Core: Where Are We Now?
Biggest problem: How do we develop efficient, correct, scalable parallel components (data structures, algorithms)? How do we develop highly parallel applications composed of those components?
27 Why Many-Core? Beyond Exciting Numbers
It's not a victory parade towards a bright new idea. It's a retreat from an even greater challenge. We can't make sequential processors faster without melting them (or exploding our energy bills). We still want to get faster as quickly as possible, so we pursue the most immediate solution towards that end. As a result, many of the conventional wisdoms acquired over previous decades of computing have been overturned.
28 Common Wisdoms (Refuted): Power vs. Transistors
Old Wisdom: Power is free, but transistors are expensive.
New Wisdom: Power is expensive, but transistors are free. We can put more transistors on a chip than we have the power to turn on!
29 Common Wisdoms (Refuted): Dynamic vs. Static Power
Old Wisdom: You only need to worry about dynamic power (voltage scaling).
New Wisdom: For desktops and servers, static power due to leakage can be 40% of total power.
30 Common Wisdoms (Refuted): Hardware Errors
Old Wisdom: Uniprocessors are reliable internally; errors occur only at the pins.
New Wisdom: As feature sizes drop below 65 nm, chips will suffer high rates of soft and hard errors.
31 Common Wisdoms (Refuted): Scaling Designs
Old Wisdom: Old successes guide future successes, so we need only build upon prior designs.
New Wisdom: As feature size (in nanometers) drops, wire delay, noise, manufacturing variability, reliability, and design validation all stretch development time and cost.
32 Common Wisdoms (Refuted): Architecture Research
Old Wisdom: Let academia evaluate experimental designs; researchers can build their own chips.
New Wisdom: Academia can no longer afford the masks and design tools required to build believable prototype chips.
33 Common Wisdoms (Refuted): Bandwidth vs. Latency
Old Wisdom: Performance improvements bring both lower latency and higher bandwidth.
New Wisdom: Bandwidth improves by at least the square of the improvement in latency (memory wall).
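34 Common Wisdoms (Refuted): Computation vs. Memory Access
Old Wisdom: Store common computations in tables; arithmetic is slow.
New Wisdom: Recompute needed results; memory access is slow (a DRAM access can take around 200 clock cycles, while a floating-point multiply may take only four).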
35 Common Wisdoms (Refuted): Instruction-Level Parallelism
Old Wisdom: There's an abundance of ILP waiting to be found by compilers and architecture designs (branch prediction, out-of-order execution, speculation, VLIW...).
New Wisdom: There are diminishing returns on finding more ILP.
36 Common Wisdoms (Refuted): Moore's Law
Old Wisdom: Uniprocessor performance doubles every 18 months.
New Wisdom: Power Wall + Memory Wall + ILP Wall = Brick Wall. Doubling uniprocessor performance may now take more than 60 months.
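To see how much the longer doubling time costs, here is a small illustrative calculation. The five-year horizon is an arbitrary choice for the example, not a figure from the slides, and `speedup` is a hypothetical helper.

```python
def speedup(years: float, doubling_months: float) -> float:
    """Cumulative performance factor after `years` if performance doubles
    every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)


if __name__ == "__main__":
    for months in (18, 60):
        print(f"doubling every {months} months: {speedup(5, months):.1f}x after 5 years")
```

Over five years the old 18-month pace compounds to roughly 10x, while a 60-month pace yields only 2x.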
37 Common Wisdoms (Refuted): Why Parallelize?
Old Wisdom: Don't bother parallelizing; Moore's Law promises it'll run faster, unmodified, in a couple of years.
New Wisdom: It'll be a long time before an unmodified program gets faster.
38 Common Wisdoms (Refuted): Parallel Performance Value
Old Wisdom: If it doesn't scale linearly, trash it.
New Wisdom: Any performance scaling is better than none; it's the only way to get faster now!
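Amdahl's law (not named on the slide, added here for illustration) quantifies why sublinear scaling is still worth having. The sketch assumes a program whose parallel fraction p runs perfectly on n cores while the rest stays serial, giving speedup = 1 / ((1 - p) + p / n). With p = 0.9, eight cores give about a 4.7x speedup: far short of linear, yet still a large win. No core count can push it past 1 / (1 - p) = 10x.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup on `cores` cores when `parallel_fraction`
    of the original run time parallelizes perfectly."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.9, n), 2))
```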
39 Common Wisdoms: Moore's Law II
New Wisdom: The number of cores available on a chip doubles every 18 months.
40 Classifying Parallelism: Dwarves
To better understand parallel applications, the Berkeley View authors analyzed a series of areas where parallelism is commonly exploited. These areas are dwarves: patterns of computation and communication, each of which identifies a category of applications.
41 Classifying Parallelism: The Seven Dwarves
42 The Seven Dwarves
Dense Linear Algebra (BLAS)
Sparse Linear Algebra (conjugate gradient)
Spectral Methods (FFT, DSP)
N-Body Simulation
Structured Grid (PDE solvers)
Unstructured Grid
Monte Carlo (embarrassingly parallel. Why? Its trials are independent events.)
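Monte Carlo's independence is easy to see in code. This sketch estimates pi by throwing random darts; each worker gets its own seeded generator, so workers share nothing until the final sum. It uses Python threads via `concurrent.futures` only to stay short and self-contained: under CPython's global interpreter lock this shows the structure, not a real speedup (a process pool would be needed for that). Worker and trial counts are arbitrary.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed: int, trials: int) -> int:
    """Throw `trials` random darts at the unit square and count how many land
    inside the quarter circle. Each call owns its own generator, so no state
    is shared between workers."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(workers: int = 4, trials_per_worker: int = 100_000) -> float:
    # Each worker runs independently; the only communication is the final sum.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(count_hits, range(workers), [trials_per_worker] * workers))
    return 4.0 * hits / (workers * trials_per_worker)


if __name__ == "__main__":
    print(estimate_pi())
```

Because each worker is seeded by its index, the result does not depend on how the threads happen to be scheduled.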
43 The Six (Other) Dwarves
Why more dwarves? Up-and-coming algorithmic techniques and application domains require parallelization.
Observed domains:
Combinational logic (SHA, MD5, AES, cryptography)
Graph traversal (BFS, A*, maximum network flow)
Finite state machines
Graphical models (Bayesian networks, Hidden Markov Models), as used in machine learning
Dynamic programming
Backtrack, branch-and-bound
44 The Six (Other) Dwarves
In particular, no efficient general approach is known for evaluating a finite state machine in parallel. How can a system be in multiple states at once? FSMs are thought to be embarrassingly sequential.
This brings us to our crux: how do we develop new parallel applications?
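One partial answer to "how can a system be in multiple states at once?", not taken from the original slides, is to run each chunk of input from every possible start state, then stitch the chunks together once the true states are known. The sketch below uses a hypothetical two-state parity machine and Python threads for brevity. It also shows why this is no free lunch: each chunk does |states| times the sequential work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

# Hypothetical example machine: tracks the parity of the '1' bits seen so far.
Table = Dict[str, Dict[str, str]]
PARITY: Table = {
    "even": {"0": "even", "1": "odd"},
    "odd": {"0": "odd", "1": "even"},
}


def run_sequential(table: Table, start: str, text: str) -> str:
    """Ordinary FSM evaluation: every step needs the previous state."""
    state = start
    for symbol in text:
        state = table[state][symbol]
    return state


def chunk_map(table: Table, chunk: str) -> Dict[str, str]:
    """Run one chunk from *every* possible start state, so the chunk can be
    processed before its real start state is known. Costs |states| times
    the sequential work for that chunk."""
    return {state: run_sequential(table, state, chunk) for state in table}


def run_parallel(table: Table, start: str, text: str, chunks: int = 4) -> str:
    size = max(1, -(-len(text) // chunks))  # ceiling division
    pieces = [text[i:i + size] for i in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:
        maps = list(pool.map(lambda piece: chunk_map(table, piece), pieces))
    # Cheap sequential stitch: follow the real state through each chunk's map.
    state = start
    for mapping in maps:
        state = mapping[state]
    return state
```

With two states the overhead is only 2x, but a machine with hundreds of states would multiply the work by hundreds, which helps explain why FSMs are thought to be embarrassingly sequential.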
45 Programming Many-Core
We have exciting equipment: GPUs, CPUs, accelerators...
We have clear application areas: linear algebra, graph traversal, dynamic programming...
How do we use those exciting machines to satisfy those application requirements?
46 Programming Many-Core
#include <???>
from ??? import ???
How did I get here...? And who is that behind me?
49 Parallel Programming Models
Producing an efficient parallel application can require managing many more details than a sequential application does. Programming models seek to simplify the management of one or more of the following:
Task identification
Task mapping
Data distribution
Communication mapping
Synchronization
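To make the five concerns concrete, here is a hedged sketch of a parallel sum using Python's `concurrent.futures`, with comments marking where each concern shows up. The chunking scheme and worker count are arbitrary choices, and with CPython threads this illustrates structure rather than speed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence


def parallel_sum(values: Sequence[int], workers: int = 4) -> int:
    # Task identification: one task per contiguous chunk of the input.
    size = max(1, -(-len(values) // workers))  # ceiling division
    # Data distribution: each task receives only its own slice.
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Task mapping: the executor decides which worker thread runs which task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(sum, chunk) for chunk in chunks]
        # Communication: partial sums travel back to the caller through futures.
        # Synchronization: result() blocks until that task has finished.
        return sum(future.result() for future in futures)


if __name__ == "__main__":
    print(parallel_sum(list(range(1001))))
```

Here the executor handles task mapping implicitly, while task identification and data distribution are spelled out by hand.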
50 Parallel Programming Models: A Few Existing Examples
MPI
Pthreads
MapReduce
OpenMP
CUDA
OpenCL
51 Parallel Programming Models: A Few Existing Examples
Model      Task Identification  Task Mapping  Data Distribution  Communication Mapping  Synchronization
MPI        explicit             explicit      explicit           implicit               implicit
Pthreads   explicit             explicit      implicit           implicit               explicit
MapReduce  explicit             implicit      implicit           implicit               explicit
OpenMP     implicit             implicit      implicit           implicit               implicit
CUDA       explicit             explicit      explicit           implicit               explicit
OpenCL     N/A                  N/A           N/A                N/A                    N/A
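A toy, single-machine imitation of the MapReduce style (word count) might look like this. It is not Google's MapReduce or Hadoop's API: `map_phase`, `reduce_phase`, and `word_count` are hypothetical names, and a thread pool stands in for a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_phase(document: str) -> Counter:
    """Map: count the words in one document."""
    return Counter(document.lower().split())


def reduce_phase(a: Counter, b: Counter) -> Counter:
    """Reduce: merge partial counts that share the same keys."""
    return a + b


def word_count(documents):
    # The programmer identifies the map and reduce tasks; the pool decides
    # which thread runs each map. list(...) collects every map result
    # before the reduce begins.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_phase, documents))
    return reduce(reduce_phase, partials, Counter())


if __name__ == "__main__":
    print(word_count(["the cat", "The dog", "a cat"]).most_common(2))
```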
52 Parallel Programming Models
Message passing
Pros: harder to make mistakes; forces much upfront planning; highly verifiable model
Cons: difficult to learn; hardware is not yet widely thought of as networked components
Shared memory
Pros: friendly learning curve; matches hardware layout
Cons: may not scale (cache coherence...); easy to make mistakes
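The trade-off can be sketched with Python threads (an assumption for brevity; real message-passing systems such as MPI run in separate address spaces). Both functions count to the same total. The shared-memory version must guard one shared variable with a lock, while the message-passing version keeps each worker's data private and sends one message per worker. Function names and counts are illustrative.

```python
import queue
import threading


def shared_memory_count(threads: int = 4, increments: int = 10_000) -> int:
    """Shared memory: all threads update one variable, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            with lock:  # leaving out this lock is the classic shared-memory bug
                total += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return total


def message_passing_count(threads: int = 4, increments: int = 10_000) -> int:
    """Message passing: each worker owns its data and sends one result
    message; only the receiver combines them, so user code needs no lock."""
    inbox: "queue.Queue[int]" = queue.Queue()

    def worker() -> None:
        local = 0
        for _ in range(increments):
            local += 1
        inbox.put(local)  # the only interaction with other threads

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    total = sum(inbox.get() for _ in range(threads))
    for t in pool:
        t.join()
    return total


if __name__ == "__main__":
    print(shared_memory_count(), message_passing_count())
```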
53 Looking to the Many-Core Future
We need:
Better parallel programming models
Better compilers
A set of primitives to build APIs from
Thoroughly tested parallel components
Libraries
Tools
54 Looking to the Many-Core Future
Many challenges await. However, thanks to the revised Moore's Law, we have much to look forward to. Now, more than ever before, we'll be able to make great advances in sciences that depend on extensive computation. All we have to do is add another core.