VIRTUALIZATION IN THE AGE OF HETEROGENEOUS MACHINES

1 VIRTUALIZATION IN THE AGE OF HETEROGENEOUS MACHINES David F. Bacon The ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '11) Keynote - March 9, 2011

2 WHAT HETEROGENEOUS MACHINES?

4 WHAT HETEROGENEOUS MACHINES? It's The Multicore Era!

5 THE PATH OF LEAST IMAGINATION "For years, computer architects have been saying that a big new idea in computing was needed. Indeed, as transistors have continued to shrink, rather than continuing to innovate, computer designers have simply adopted a so-called multicore approach, where multiple processors are added as more chip real estate became available." Source: "Remapping Computer Circuitry to Avert Impending Bottlenecks", February 28, 2011

6 MEANWHILE... GPU, FPGA, Tilera 64, Cell BE, IBM WSP

7 WHAT IS PERFORMANCE? Peak Performance Source: Brodtkorb et al, The State-of-the-Art in Heterogeneous Computing, 2010

8 PERFORMANCE, TAKE 2 Actual Performance (Random Number Generation) Source: Thomas et al, A comparison of CPUs, GPUs, FPGAs and massively parallel processor arrays for random number generation, 2009

9 WHAT'S THE BEST USE OF TRANSISTORS? Homogeneous vs. Heterogeneous

10 WHAT'S THE BEST USE OF TRANSISTORS? Homogeneous vs. Heterogeneous Keep Transistors Busy vs. Turn Transistors Off

11 Should We Virtualize Heterogeneous Architectures?

12 How?

13 At the Language or System Level? Or Both?

14 What is Virtualization Anyway?

15 THE ONLY 2 IDEAS IN COMPUTER SCIENCE

16 THE ONLY 2 IDEAS IN COMPUTER SCIENCE Hashing f(x)

17 THE ONLY 2 IDEAS IN COMPUTER SCIENCE Hashing f(x) Indirection

19 SYSTEM VS. LANGUAGE VMS User-Mode ISA Supervisor-Mode ISA Device Drivers

20 SYSTEM VS. LANGUAGE VMS Supervisor-Mode ISA Device Drivers x86 User-Mode ISA x86

21 SYSTEM VS. LANGUAGE VMS Supervisor-Mode ISA Device Drivers x86 Supervisor-Mode ISA Device Drivers x86 User-Mode ISA x86 Supervisor-Mode ISA Device Drivers x86

22 SYSTEM VS. LANGUAGE VMS Supervisor-Mode ISA Device Drivers x86 Supervisor-Mode ISA User-Mode ISA User-Mode ISA User-Mode ISA Device Drivers x86 ARM POWER x86 Supervisor-Mode ISA Device Drivers x86

23 SYSTEM VS. LANGUAGE VMS Virtualize Environment Hold ISA Constant Virtualize Processor Vary ISA

24 OVERLAP: SYNERGY AND CONFLICT

25 OVERLAP: SYNERGY AND CONFLICT Memory

26 OVERLAP: SYNERGY AND CONFLICT Memory Threads

27 OVERLAP: SYNERGY AND CONFLICT Memory Threads User-Mode ISA Supervisor-Mode ISA

28 DEVICE OR PROCESSOR? GPU FPGA

29 THE ACCELERATOR MODEL Main Program GPU CPU FPGA

30 THE ACCELERATOR MODEL Force Calc Main Program GPU CPU FPGA

31 THE ACCELERATOR MODEL Force Calc Main Program FFT GPU CPU FPGA

32 THE REALITY

40 Language VMs for Heterogeneous Systems

41 Liquid Metal: Joshua Auerbach, Perry Cheng, Rodric Rabbah, David F. Bacon, Stephen Fink, Sunil Shukla, Christophe Dubach, Yu Zhang

42 HETEROGENEOUS PROGRAMMING
    CPU:  Java, C++, Python -> CPU Compiler -> binary
    GPU:  CUDA, OpenCL -> GPU Compiler -> binary
    WSP:  C+Intrinsics -> Node Compiler -> binary
    FPGA: VHDL, Verilog, SystemC -> Synthesis -> bitfile

43 CPU Compiler GPU Compiler Node Compiler Synthesis binary binary binary bitfile CPU GPU WSP FPGA

44 CPU Backend GPU Backend Node Backend Verilog Backend bytecode binary binary bitfile CPU GPU WSP FPGA

45 THE LIQUID METAL PROGRAMMING LANGUAGE Lime CPU Backend GPU Backend Node Backend Verilog Backend Lime Compiler bytecode binary binary bitfile CPU GPU WSP FPGA

46 THE ARTIFACT STORE & EXCLUSION Lime Lime Compiler bytecode binary binary bitfile CPU GPU WSP FPGA

47 THE ARTIFACT STORE & EXCLUSION Lime Lime Compiler bytecode binary binary bitfile Artifact Store CPU GPU WSP FPGA

48 THE ARTIFACT STORE & EXCLUSION Lime Lime Compiler bytecode binary binary bitfile bitfile bitfile Artifact Store CPU GPU WSP FPGA

49 THE ARTIFACT STORE & EXCLUSION Lime Lime Compiler bytecode binary bitfile bitfile bitfile Artifact Store CPU GPU WSP FPGA

51 HETEROGENEOUS EXECUTION OF LIME Lime Lime Compiler bytecode binary bitfile bitfile bitfile Artifact Store CPU GPU WSP FPGA

52 HETEROGENEOUS EXECUTION OF LIME Lime Lime Compiler bytecode binary bitfile bitfile bitfile Artifact Store Lime Virtual Machine (LmVM) CPU GPU WSP FPGA

53 HETEROGENEOUS EXECUTION OF LIME Lime Lime Compiler bytecode binary bitfile bitfile bitfile Artifact Store Lime Virtual Machine (LmVM) CPU GPU Bus WSP FPGA

54 HETEROGENEOUS EXECUTION OF LIME Lime Lime Compiler bytecode binary bitfile bitfile bitfile Artifact Store Lime Virtual Machine (LmVM) CPU bytecode GPU Bus WSP FPGA

55 HETEROGENEOUS EXECUTION OF LIME Lime Lime Compiler bytecode binary bitfile bitfile bitfile Artifact Store Lime Virtual Machine (LmVM) CPU bytecode GPU Bus WSP FPGA bitfile

56 THE LIME LANGUAGE: VIRTUALIZING HETEROGENEOUS COMPUTATION

57-61 LIME: JAVA IS (ALMOST) A SUBSET
    % javac MyClass.java
    % java MyClass
    % mv MyClass.java MyClass.lime
    % limec MyClass.lime
    % java MyClass
INCREMENTALLY USE LIME FEATURES

62 LIME LANGUAGE OVERVIEW BIT-LEVEL PARALLELISM Core Features Programmable Primitives Stream Programming Collective Operations PIPELINE PARALLELISM Immutable Types Bounded Types Bounded Arrays Primitive Supertypes DATA PARALLELISM Graph Construction Isolation Enforcement Closed World Support Rate Matching Messaging Reifiable Generics Ranges, Bounded for User-defined operators Supporting Features Typedefs Local type inference Tuples

63 VIRTUALIZATION COMPARISON FPGA CPU Physical Virtual Physical Virtual Core Thread LUTs bit RAM Heap CLBs User Primitives byte word float dfloat Compare&Swap byte float int Slices double Block RAM Bounded Arrays Routing Network Streams Reduce DMA, SerDes Streams DSP Multipliers integer<n> synchronized tasks Map

64 STREAMING COMPUTATION PIPELINE PARALLELISM

65-66 TASK CREATION (STATELESS)
    var uppercaser = task work;
    local char work(char c) { return toupper(c); }
Diagram labels: primitive filter; stream (char); port (char); stream type; port type; worker method. Callouts: Task Creation; Isolation Keywords

67-68 PIPELINES
    var pipeline = task worker1 => task worker2 => task worker3;
Diagram labels: worker1( ) { }, worker2( ) { }, worker3( ) { } joined by port-to-stream connections; stream types char, char, int, int[[5]]; compound filter. Callout: Graph Construction
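For readers more familiar with Java than Lime, the `=>` connection can be loosely approximated by function composition. The sketch below is only an analogy and does not use any Lime API: the class name `PipelineSketch` and the worker bodies are hypothetical, chosen so that the types match the slide (char, char, int, int[[5]]), and it runs the stages one element at a time rather than as concurrent streaming tasks.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical Java analogy of "task worker1 => task worker2 => task worker3".
// Not Lime's runtime: stages are composed as plain functions.
final class PipelineSketch {
    // Stateless workers (assumed bodies; the slide leaves them empty).
    static char worker1(char c) { return Character.toUpperCase(c); }

    static int worker2(char c) { return (int) c; }

    static int[] worker3(int code) {
        int[] out = new int[5];            // stands in for Lime's int[[5]]
        for (int k = 0; k < out.length; k++) {
            out[k] = code + k;
        }
        return out;
    }

    // Each andThen plays the role of one port-to-stream connection.
    static Function<Character, int[]> pipeline() {
        Function<Character, Character> t1 = PipelineSketch::worker1;
        Function<Character, Integer> t2 = PipelineSketch::worker2;
        Function<Integer, int[]> t3 = PipelineSketch::worker3;
        return t1.andThen(t2).andThen(t3);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pipeline().apply('a')));
    }
}
```

The composed function is itself a single value, which mirrors the slide's "compound filter": the pipeline can be passed around and connected further like any individual task.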

69 SOURCES AND SINKS source filter sink char char reader( ) { } worker( ) { } writer( ) { } Heap /tmp/mydata File System /dev/tty

70 HELLO WORLD, STREAMING STYLE
    public static void main(string[[]] args) {
        char[[]] msg = { 'H','E','L','L','O',',',' ', 'W','O','R','L','D','!','\n' };
        var hello = msg.source(1) => task Character.toLowerCase(char) => task System.out.print(char);
        hello.finish();
    }
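The same source, filter, sink shape can be approximated in plain Java with `java.util.stream`. This is a rough analogy under stated assumptions, not a translation of Lime semantics: `HelloStreamSketch` and `run` are hypothetical names, and a `StringBuilder` stands in for the printing sink so the result can be inspected.

```java
// Hypothetical Java analogy of the streaming Hello World:
// source (chars of msg) => filter (toLowerCase) => sink (collect, then print).
final class HelloStreamSketch {
    static String run(char[] msg) {
        StringBuilder sink = new StringBuilder();
        new String(msg).chars()                     // source: one char per element
            .map(Character::toLowerCase)            // filter: stateless, char to char
            .forEach(c -> sink.append((char) c));   // sink: consumes every element
        return sink.toString();
    }

    public static void main(String[] args) {
        char[] msg = { 'H','E','L','L','O',',',' ','W','O','R','L','D','!','\n' };
        System.out.print(run(msg));
    }
}
```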

71 DEMO HELLO WORLD LIME/ECLIPSE ENVIRONMENT

72

73 STATEFUL TASKS
    var averager = task Averager().avg;
    double total; long count;
    double avg(double x) { total += x; return total/++count; }
Diagram labels: instance variables (local state); primitive filter; stream types double, double
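The worker on this slide is already almost valid Java. A minimal runnable version is sketched below; the wrapper class `AveragerSketch` and the use of `DoubleUnaryOperator` as a stand-in for a task handle are assumptions made for illustration.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical Java rendering of the stateful Averager task.
final class AveragerSketch {
    static final class Averager {
        private double total;   // instance variables hold the task's local state
        private long count;

        // Running mean of all values seen so far.
        double avg(double x) {
            total += x;
            return total / ++count;
        }
    }

    public static void main(String[] args) {
        // Binding the method to one instance loosely mirrors "task Averager().avg".
        DoubleUnaryOperator averager = new Averager()::avg;
        System.out.println(averager.applyAsDouble(2.0));
        System.out.println(averager.applyAsDouble(4.0));
        System.out.println(averager.applyAsDouble(6.0));
    }
}
```

Because the state lives in a private instance created at task-construction time, no other code can reach it, which is the property the isolation keywords enforce in Lime.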

74 RATE MATCHING
    var matchedpipe = task AddStuff().work1 => # => task work2;
    int x;
    int[[2]] work1(int i) { int r = i+x; x = i/2; return { r, i }; }
    int work2(int i) { return i*3; }
Diagram labels: rate matcher (rate 1:2); Buffer; stream types int, int[[2]], int, int; compound filter (rate 1:2)
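To make the 1:2 rate concrete, the sketch below imitates the rate matcher in plain Java: each call to `work1` yields two ints, and the inner loop hands them to `work2` one at a time. The class `RateMatchSketch` and the method `run` are hypothetical, and the loop is only an assumption about what the `#` operator's buffer accomplishes, not a description of its implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java analogy of "task AddStuff().work1 => # => task work2".
final class RateMatchSketch {
    private int x;                         // local state of work1

    int[] work1(int i) {                   // one int in, two ints out
        int r = i + x;
        x = i / 2;
        return new int[] { r, i };
    }

    static int work2(int i) {              // one int in, one int out
        return i * 3;
    }

    // The inner loop plays the role of the rate matcher: it unpacks each
    // two-element result so that work2 fires twice per work1 firing.
    List<Integer> run(int[] inputs) {
        List<Integer> out = new ArrayList<>();
        for (int i : inputs) {
            for (int v : work1(i)) {
                out.add(work2(v));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new RateMatchSketch().run(new int[] { 4, 6 }));
    }
}
```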

75 DEMO N-BODY SIMULATION: CPU VS. GPU

76 9X SPEEDUP (9.26 GFLOPS) ON LAPTOP

77 VIRTUALIZATION OF DATA MOVEMENT =>

78 VIRTUALIZATION OF DATA MOVEMENT => +

79 VIRTUALIZATION OF DATA MOVEMENT => + x 3 months

80 WHAT'S SO HARD?
    Projects routinely spend 60-80% of their time on data transfer
    Verilog is hard, but driving devices and buses is harder
    Motherboard chips, BIOS settings: 4x slowdown each
    Signal transitions are considered an interface spec
    IP is extremely sensitive to configuration and often breaks when the configuration differs from the original
    Device drivers are also highly sensitive
        Often specialized to one I/O style; fall off a cliff otherwise
    Hardware design culture accepts this as C.O.D.B.

81 COLLECTIVE OPERATIONS DATA PARALLELISM

82-91 ARRAY PARALLELISM
    float[ ] a =...;
    float[ ] b =...;
    float[ ] c = b;
Diagram labels, in build order: a, b; Indexable<int,float>; index; domain( ); 0 :: 2, 0 :: 2; ==?; Collectable<int,float>; collector; temp; c
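The build sequence suggests three steps: obtain the index domain of each operand, check that the domains match, and collect the per-index results into a new array. A plain-Java sketch of that shape follows. The operator applied to `a` and `b` did not survive in the transcript, so element-wise addition is assumed here purely for illustration, and `ArrayParallelSketch` and `add` are hypothetical names.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical Java analogy of a collective (element-wise) array operation.
// Assumption: the lost operator is addition.
final class ArrayParallelSketch {
    static float[] add(float[] a, float[] b) {
        // Domain check: both operands must cover the same index range.
        if (a.length != b.length) {
            throw new IllegalArgumentException("domains differ");
        }
        float[] c = new float[a.length];            // the collector's target
        // Each index is independent, so the iterations may run in parallel.
        IntStream.range(0, a.length).parallel().forEach(i -> c[i] = a[i] + b[i]);
        return c;
    }

    public static void main(String[] args) {
        float[] a = { 1f, 2f, 3f };                 // domain 0 :: 2
        float[] b = { 10f, 20f, 30f };              // domain 0 :: 2
        System.out.println(Arrays.toString(add(a, b)));
    }
}
```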

92 MANY THINGS ARE COLLECTABLE
    var a = new Matrix<3,quaternion>(100,200,100);
    var b = new Matrix<3,quaternion>(100,200,100);
    var c = + b;
    Domain: ( 0::99, 0::199, 0::99 )
    Map<string,List<int>> a =...
    Map<string,List<int>> b =...
    Map<string,List<int>> c = append(b);
    Domain: { foo, bar }
    string foo = Character.toLowerCase(@ FOO );
    Domain: 0 :: 2

93 COLLECTIVE OPERATIONS IN ACTION
    package lime.util.synthesizable;
    public class FixedHashMap<K extends Value, V extends Value,
                              BUCKETS extends ordinal<buckets>,
                              LINKS extends ordinal<links>>
            extends AbstractMap<K,V> {
        protected final nodes = new Node<K,V>[BUCKETS][LINKS];
Diagram labels: nodes; BUCKETS = 3 rows of LINKS = 2 Node entries (each marked 1 or 0, with K and V)

94-95 GET OPERATION, STEP 1: SELECT ROW
    public local V get(k key) {
        Node[LINKS] row = nodes[hash(key)];
        boolean[links] selections = comparekey(key);
        V[LINKS] vals = getvalueordefault(selections);
        return! vals;
    }
Diagram labels: key; row; nodes; BUCKETS = 3 rows of LINKS = 2 Node entries (each marked 1 or 0, with K and V)

96-100 STEP 2: COMPARE ALL KEYS (code as on slide 94) Diagram labels: key; row; selections = true, false

101-103 STEP 3: GET VALUES/ZEROS (code as on slide 94) Diagram labels: row; selections = true, false; vals = V, 0

104-107 STEP 4: OR-REDUCE FOR RESULT (code as on slide 94) Diagram labels: vals = V, 0; result label 0V
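The four steps can be traced in ordinary Java. The sketch below is a hypothetical, sequential rendering and not the Lime `FixedHashMap`: it fixes keys and values to `int`, uses `Math.floorMod` as an assumed hash, adds a simple `put` so the lookup can be exercised, and replaces the collective operations with an explicit loop over the row. Like the slide, it relies on at most one node in a row matching, so OR-ing the selected value with zeros yields that value.

```java
// Hypothetical Java trace of the four-step get: select row, compare all keys,
// take value-or-zero per node, OR-reduce. int keys/values are an assumption.
final class FixedHashMapSketch {
    static final int BUCKETS = 3;
    static final int LINKS = 2;

    private final boolean[][] used = new boolean[BUCKETS][LINKS];
    private final int[][] keys = new int[BUCKETS][LINKS];
    private final int[][] vals = new int[BUCKETS][LINKS];

    static int hash(int key) { return Math.floorMod(key, BUCKETS); }

    // Returns false when the row is full (fixed capacity, no resizing).
    boolean put(int key, int value) {
        int b = hash(key);
        for (int l = 0; l < LINKS; l++) {          // overwrite an existing key
            if (used[b][l] && keys[b][l] == key) { vals[b][l] = value; return true; }
        }
        for (int l = 0; l < LINKS; l++) {          // otherwise take a free slot
            if (!used[b][l]) {
                used[b][l] = true; keys[b][l] = key; vals[b][l] = value;
                return true;
            }
        }
        return false;
    }

    int get(int key) {
        int b = hash(key);                                         // step 1: select row
        int result = 0;
        for (int l = 0; l < LINKS; l++) {
            boolean selected = used[b][l] && keys[b][l] == key;    // step 2: compare key
            int v = selected ? vals[b][l] : 0;                     // step 3: value or zero
            result |= v;                                           // step 4: OR-reduce
        }
        return result;                                             // 0 if key is absent
    }

    public static void main(String[] args) {
        FixedHashMapSketch m = new FixedHashMapSketch();
        m.put(4, 40);
        m.put(7, 70);
        System.out.println(m.get(4) + " " + m.get(7) + " " + m.get(1));
    }
}
```

No iteration of the loop depends on another except through the final OR, which is why the comparison and selection steps lend themselves to being evaluated for all links at once in hardware.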

108 VIRTUALIZATION OF DATA

109 CURRENT RESULTS

110 HOW DO WE EVALUATE PERFORMANCE?
    Speedup for Naïve Users
        How much faster than Java?
    Slowdown for Expert Users
        How much slower than hand-tuned low-level code?
    Our methodology:
        Write/tune/compare 4 versions of each benchmark: Java, Lime, OpenCL, Verilog
        Doesn't address flops/watt, flops/watt/$, productivity

111 EXPERT VS NAÏVE SPEEDUP: END-TO-END (JAVA BASELINE) [Bar chart: Expert vs. Lime versions on GPU and FPGA. Bar labels as extracted: 6.67x, 136x, 1.06x, 1.0x, 6.10x, 1.06x, 66x, 0.04x]

112 EXPERT VS NAÏVE SPEEDUP: KERNEL TIME (JAVA BASELINE) [Bar chart: Expert vs. Lime versions on GPU and FPGA. Bar labels as extracted: 37x, 193x, 10x, 150x, 10x, 26x, 126x, 0.02x]

113 Should We Virtualize Heterogeneous Architectures?

114 Yes We Can!

115 USER VS. SUPERVISOR

116 ACCELERATOR VS. FIRST-CLASS DEVICE

117 ANOTHER UNSOLVED PROBLEM

118-123 SUMMARY
    Heterogeneity is Here to Stay
    Multicore Isn't Dead - Just Less Important
    HLLs Can Insulate Programmer from Low-Level Details
        But not fundamental computational structure
    System-Level Virtualization is an Open (Hard) Problem

124 REAL-TIME PHOTOMOSAIC

126 Questions?

LIQUID METAL Taming Heterogeneity

LIQUID METAL Taming Heterogeneity LIQUID METAL Taming Heterogeneity Stephen Fink IBM Research! IBM Research Liquid Metal Team (IBM T. J. Watson Research Center) Josh Auerbach Perry Cheng 2 David Bacon Stephen Fink Ioana Baldini Rodric

More information

FPGA & Hybrid Systems in the Enterprise Drivers, Exemplars and Challenges

FPGA & Hybrid Systems in the Enterprise Drivers, Exemplars and Challenges Bob Blainey IBM Software Group 27 Feb 2011 FPGA & Hybrid Systems in the Enterprise Drivers, Exemplars and Challenges Workshop on The Role of FPGAs in a Converged Future with Heterogeneous Programmable

More information

AND THEN THERE WERE NONE

AND THEN THERE WERE NONE ND THEN THERE WERE NONE Stall-Free Real-Time Garbage Collector for Reconfigurable Hardware David F. acon Perry Cheng Sunil Shukla IM Research IMPLEMENTING PROGRMMING LNGUGE Program Source Code Interpreter

More information

General Purpose GPU Programming. Advanced Operating Systems Tutorial 9

General Purpose GPU Programming. Advanced Operating Systems Tutorial 9 General Purpose GPU Programming Advanced Operating Systems Tutorial 9 Tutorial Outline Review of lectured material Key points Discussion OpenCL Future directions 2 Review of Lectured Material Heterogeneous

More information

Experts in Application Acceleration Synective Labs AB

Experts in Application Acceleration Synective Labs AB Experts in Application Acceleration 1 2009 Synective Labs AB Magnus Peterson Synective Labs Synective Labs quick facts Expert company within software acceleration Based in Sweden with offices in Gothenburg

More information

Trends in the Infrastructure of Computing

Trends in the Infrastructure of Computing Trends in the Infrastructure of Computing CSCE 9: Computing in the Modern World Dr. Jason D. Bakos My Questions How do computer processors work? Why do computer processors get faster over time? How much

More information

Functioning Hardware from Functional Specifications

Functioning Hardware from Functional Specifications Functioning Hardware from Functional Specifications Stephen A. Edwards Martha A. Kim Richard Townsend Kuangya Zhai Lianne Lairmore Columbia University IBM PL Day, November 18, 2014 Where s my 10 GHz processor?

More information

FABRICATION TECHNOLOGIES

FABRICATION TECHNOLOGIES FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general

More information

TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT

TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT Eric Kelmelis 28 March 2018 OVERVIEW BACKGROUND Evolution of processing hardware CROSS-PLATFORM KERNEL DEVELOPMENT Write once, target multiple hardware

More information

Gzip Compression Using Altera OpenCL. Mohamed Abdelfattah (University of Toronto) Andrei Hagiescu Deshanand Singh

Gzip Compression Using Altera OpenCL. Mohamed Abdelfattah (University of Toronto) Andrei Hagiescu Deshanand Singh Gzip Compression Using Altera OpenCL Mohamed Abdelfattah (University of Toronto) Andrei Hagiescu Deshanand Singh Gzip Widely-used lossless compression program Gzip = LZ77 + Huffman Big data needs fast

More information

Using FPGAs as Microservices

Using FPGAs as Microservices Using FPGAs as Microservices David Ojika, Ann Gordon-Ross, Herman Lam, Bhavesh Patel, Gaurav Kaul, Jayson Strayer (University of Florida, DELL EMC, Intel Corporation) The 9 th Workshop on Big Data Benchmarks,

More information

Hardware-Supported Pointer Detection for common Garbage Collections

Hardware-Supported Pointer Detection for common Garbage Collections 2013 First International Symposium on Computing and Networking Hardware-Supported Pointer Detection for common Garbage Collections Kei IDEUE, Yuki SATOMI, Tomoaki TSUMURA and Hiroshi MATSUO Nagoya Institute

More information

Growing a Software Language for Hardware Design

Growing a Software Language for Hardware Design Growing a Software Language for Hardware Design Joshua Auerbach, David F. Bacon, Perry Cheng, Stephen J. Fink, Rodric Rabbah, and Sunil Shukla IBM Thomas J. Watson Research Center Yorktown Heights, NY,

More information

ClearSpeed Visual Profiler

ClearSpeed Visual Profiler ClearSpeed Visual Profiler Copyright 2007 ClearSpeed Technology plc. All rights reserved. 12 November 2007 www.clearspeed.com 1 Profiling Application Code Why use a profiler? Program analysis tools are

More information

General Purpose GPU Programming. Advanced Operating Systems Tutorial 7

General Purpose GPU Programming. Advanced Operating Systems Tutorial 7 General Purpose GPU Programming Advanced Operating Systems Tutorial 7 Tutorial Outline Review of lectured material Key points Discussion OpenCL Future directions 2 Review of Lectured Material Heterogeneous

More information

Introduction to Visual Basic and Visual C++ Introduction to Java. JDK Editions. Overview. Lesson 13. Overview

Introduction to Visual Basic and Visual C++ Introduction to Java. JDK Editions. Overview. Lesson 13. Overview Introduction to Visual Basic and Visual C++ Introduction to Java Lesson 13 Overview I154-1-A A @ Peter Lo 2010 1 I154-1-A A @ Peter Lo 2010 2 Overview JDK Editions Before you can write and run the simple

More information

Object-oriented programming. and data-structures CS/ENGRD 2110 SUMMER 2018

Object-oriented programming. and data-structures CS/ENGRD 2110 SUMMER 2018 Object-oriented programming 1 and data-structures CS/ENGRD 2110 SUMMER 2018 Lecture 1: Types and Control Flow http://courses.cs.cornell.edu/cs2110/2018su Lecture 1 Outline 2 Languages Overview Imperative

More information

CS 152, Spring 2012 Section 11

CS 152, Spring 2012 Section 11 CS 152, Spring 2012 Section 11 Christopher Celio University of California, Berkeley Agenda Quiz 3 Lab 4 results Lab 5 MI protocol MSI protocol http://spectrum.ieee.org/semiconductors/processors/ multicore-cpus-processor-proliferation/0

More information

Parallelism. CS6787 Lecture 8 Fall 2017

Parallelism. CS6787 Lecture 8 Fall 2017 Parallelism CS6787 Lecture 8 Fall 2017 So far We ve been talking about algorithms We ve been talking about ways to optimize their parameters But we haven t talked about the underlying hardware How does

More information

SP3Q.3. What makes it a good idea to put CRC computation and error-correcting code computation into custom hardware?

SP3Q.3. What makes it a good idea to put CRC computation and error-correcting code computation into custom hardware? Part II CST: SoC D/M: Quick exercises S3-S4 (examples sheet) Feb 2018 (rev a). This sheet contains short exercises for quick revision. Please also look at past exam questions and/or try some of the longer

More information

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package High Performance Machine Learning Workshop Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package Matheus Souza, Lucas Maciel, Pedro Penna, Henrique Freitas 24/09/2018 Agenda Introduction

More information

Higher Level Programming Abstractions for FPGAs using OpenCL

Higher Level Programming Abstractions for FPGAs using OpenCL Higher Level Programming Abstractions for FPGAs using OpenCL Desh Singh Supervising Principal Engineer Altera Corporation Toronto Technology Center ! Technology scaling favors programmability CPUs."#/0$*12'$-*

More information

Chapter 2 Parallel Hardware

Chapter 2 Parallel Hardware Chapter 2 Parallel Hardware Part I. Preliminaries Chapter 1. What Is Parallel Computing? Chapter 2. Parallel Hardware Chapter 3. Parallel Software Chapter 4. Parallel Applications Chapter 5. Supercomputers

More information

Re-architecting Virtualization in Heterogeneous Multicore Systems

Re-architecting Virtualization in Heterogeneous Multicore Systems Re-architecting Virtualization in Heterogeneous Multicore Systems Himanshu Raj, Sanjay Kumar, Vishakha Gupta, Gregory Diamos, Nawaf Alamoosa, Ada Gavrilovska, Karsten Schwan, Sudhakar Yalamanchili College

More information

Crash Course in Java. Why Java? Java notes for C++ programmers. Network Programming in Java is very different than in C/C++

Crash Course in Java. Why Java? Java notes for C++ programmers. Network Programming in Java is very different than in C/C++ Crash Course in Java Netprog: Java Intro 1 Why Java? Network Programming in Java is very different than in C/C++ much more language support error handling no pointers! (garbage collection) Threads are

More information

Array. Prepared By - Rifat Shahriyar

Array. Prepared By - Rifat Shahriyar Java More Details Array 2 Arrays A group of variables containing values that all have the same type Arrays are fixed length entities In Java, arrays are objects, so they are considered reference types

More information

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs October 13, 2009 Overview Presenting: Alex Papakonstantinou, Karthik Gururaj, John Stratton, Jason Cong, Deming Chen, Wen-mei Hwu. FCUDA:

More information

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Instruction Set Principles The Role of Compilers MIPS 2 Main Content Computer

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

The Virtualized Virtual Machine:

The Virtualized Virtual Machine: he Virtualized Virtual Machine: he Next Generation of Virtual Machine echnology David F Bacon IBM.J. Watson Research Center November 5, 2003 2003 IBM Corporation he Virtual Machine Virtual Machine Definition

More information

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture

More information

ECE 8823: GPU Architectures. Objectives

ECE 8823: GPU Architectures. Objectives ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading

More information

Lift: a Functional Approach to Generating High Performance GPU Code using Rewrite Rules

Lift: a Functional Approach to Generating High Performance GPU Code using Rewrite Rules Lift: a Functional Approach to Generating High Performance GPU Code using Rewrite Rules Toomas Remmelg Michel Steuwer Christophe Dubach The 4th South of England Regional Programming Language Seminar 27th

More information

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,

More information

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs October 13, 2009 Overview Presenting: Alex Papakonstantinou, Karthik Gururaj, John Stratton, Jason Cong, Deming Chen, Wen-mei Hwu. FCUDA:

More information

Modern Processor Architectures. L25: Modern Compiler Design

Modern Processor Architectures. L25: Modern Compiler Design Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions

More information

M 3 Microkernel-based System for Heterogeneous Manycores

M 3 Microkernel-based System for Heterogeneous Manycores M 3 Microkernel-based System for Heterogeneous Manycores Nils Asmussen MKC, 06/29/2017 1 / 35 Outline 1 Introduction 2 Data Transfer Unit 3 Prototype Platforms 4 M 3 5 Summary 2 / 35 Outline 1 Introduction

More information

GPUs and Emerging Architectures

GPUs and Emerging Architectures GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

More information

Functioning Hardware from Functional Specifications

Functioning Hardware from Functional Specifications Functioning Hardware from Functional Specifications Stephen A. Edwards Columbia University Chalmers, Göteborg, Sweden, December 17, 2013 (λx.?)f = FPGA Where s my 10 GHz processor? From Sutter, The Free

More information

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1 Agenda CSE P 501 Compilers Java Implementation JVMs, JITs &c Hal Perkins Summer 2004 Java virtual machine architecture.class files Class loading Execution engines Interpreters & JITs various strategies

More information

Advanced CUDA Optimization 1. Introduction

Advanced CUDA Optimization 1. Introduction Advanced CUDA Optimization 1. Introduction Thomas Bradley Agenda CUDA Review Review of CUDA Architecture Programming & Memory Models Programming Environment Execution Performance Optimization Guidelines

More information

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano Modeling and Simulation of System-on on-chip Platorms Donatella Sciuto 10/01/2007 Politecnico di Milano Dipartimento di Elettronica e Informazione Piazza Leonardo da Vinci 32, 20131, Milano Key SoC Market

More information

Automatic Intra-Application Load Balancing for Heterogeneous Systems

Michael Boyer, Shuai Che, and Kevin Skadron Department of Computer Science University of Virginia Jayanth Gummaraju and Nuwan Jayasena

Trends and Challenges in Multicore Programming

Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter 2008

3/11/2008 2002-08 Hal Perkins & UW CSE Agenda Java virtual machine architecture .class files Class loading Execution engines

Subset Sum Problem Parallel Solution

Project Report Harshit Shah hrs8207@rit.edu Rochester Institute of Technology, NY, USA 1. Overview The subset sum problem is an NP-complete problem which can be solved in

William Stallings Computer Organization and Architecture. Chapter 11 CPU Structure and Function

CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data Registers

HSA Foundation. Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano. Seminar Room (Bld 20). 15 December, 2017.

Antonio R. Miele, Marco D. Santambrogio, Politecnico di Milano

Modern Processor Architectures (A compiler writer's perspective) L25: Modern Compiler Design

The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant

Hybrid Threading: A New Approach for Performance and Productivity

Glen Edwards, Convey Computer Corporation 1302 East Collins Blvd Richardson, TX 75081 (214) 666-6024 gedwards@conveycomputer.com Abstract:

Concurrent Programming with Threads: Why you should care deeply

Don Porter Portions courtesy Emmett Witchel Uniprocessor Performance Not Scaling [Figure: performance (vs. VAX-11/780), annotated with growth rates of 52%/year and 20%/year]

CSC573: TSHA Introduction to Accelerators

Sreepathi Pai September 5, 2017 URCS Outline Introduction to Accelerators GPU Architectures GPU Programming Models

Communication Library to Overlap Computation and Communication for OpenCL Application

Toshiya Komoda, Shinobu Miwa, Hiroshi Nakamura Univ. Tokyo What is today's talk about? Heterogeneous Computing System

Lec 3. Compilers, Debugging, Hello World, and Variables

Announcements First book reading due tonight at midnight Complete 80% of all activities to get 100% HW1 due Saturday at midnight Lab hours posted

Chapter 5. A Closer Look at Instruction Set Architectures

Chapter 5 Objectives Understand the factors involved in instruction set architecture design. Gain familiarity with memory addressing modes. Understand

Chapter 5. A Closer Look at Instruction Set Architectures. Chapter 5 Objectives. 5.1 Introduction. 5.2 Instruction Formats

Chapter 5 Objectives Understand the factors involved in instruction set architecture design. Gain familiarity with memory addressing modes. Understand

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck

Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find

GPU Memory Model Overview

John Owens University of California, Davis Department of Electrical and Computer Engineering Institute for Data Analysis and Visualization SciDAC Institute for Ultrascale Visualization

M³ Microkernel-based System for Heterogeneous Manycores

Nils Asmussen MKC, 06/29/2017 Overall System Design Prototype Platforms Capabilities OS Services Context Switching Evaluation Heterogeneous

Heterogenous Computing

Fall 2018 CS, SE - Freshman Seminar 11:00a - 11:50a Computer Architecture What are the components of a computer? How do these components work together to perform computations? How

GPU Fundamentals Jeff Larkin November 14, 2016

Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate

Exploiting High-Performance Heterogeneous Hardware for Java Programs using Graal

James Clarkson ±, Juan Fumero, Michalis Papadimitriou, Foivos S. Zakkak, Christos Kotselidis and Mikel Luján ± Dyson, The

COMP 530: Operating Systems Concurrent Programming with Threads: Why you should care deeply

Don Porter Portions courtesy Emmett Witchel Uniprocessor Performance Not Scaling [Figure: performance (vs. VAX-11/780) on a log scale, annotated with a growth rate of 20%/year]

HKG18-417 OpenCL Support by NNVM & TVM. Jammy Zhou - Linaro

Agenda OpenCL Overview OpenCL in NNVM & TVM Current Status OpenCL Introduction Open Computing Language Open standard maintained by Khronos with

On the Portability and Performance of Message-Passing Programs on Embedded Multicore Platforms

Shih-Hao Hung, Po-Hsun Chiu, Chia-Heng Tu, Wei-Ting Chou and Wen-Long Yang Graduate Institute of Networking

NVIDIA's Compute Unified Device Architecture (CUDA)

Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability History of GPU

NVIDIA's Compute Unified Device Architecture (CUDA)

Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability History of GPU

Java Programming. Atul Prakash

Java Language Fundamentals The language syntax is similar to C/C++ If you know C/C++, you will have no trouble understanding Java's syntax If you don't, it will be easier

Introduction to CELL B.E. and GPU Programming. Agenda

Department of Electrical & Computer Engineering Rutgers University Agenda Background CELL B.E. Architecture Overview CELL B.E. Programming Environment GPU

Porting Performance across GPUs and FPGAs

Deming Chen, ECE, University of Illinois In collaboration with Alex Papakonstantinou 1, Karthik Gururaj 2, John Stratton 1, Jason Cong 2, Wen-Mei Hwu 1 1: ECE

Stream Computing using Brook+

School of Electrical Engineering and Computer Science University of Central Florida Slides courtesy of P. Bhaniramka Outline Overview of Brook+ Brook+ Software Architecture

Introduction to Java

Module 1: Getting started, Java Basics 22/01/2010 Prepared by Chris Panayiotou for EPL 233 Lab Objectives Objective: Learn how to write, compile and execute HelloWorld.java Learn

learning objectives understand the role of an operating system understand the role of interpreters and compilers

system software learning objectives algorithms your software system software hardware understand the role of an operating system understand the role of interpreters and compilers understand the role

CS 231 Data Structures and Algorithms, Fall 2016

Dr. Bruce A. Maxwell Department of Computer Science Colby College Course Description Focuses on the common structures used to store data and the standard

Accelerated Test Execution Using GPUs

Vanya Yaneva Supervisors: Ajitha Rajan, Christophe Dubach Mathworks May 27, 2016 The Problem Software testing is time consuming Functional testing The Problem Software

A Deterministic Concurrent Language for Embedded Systems

Stephen A. Edwards Columbia University Joint work with Olivier Tardieu SHIM: A Deterministic Concurrent Language for Embedded Systems Definition

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

References This set of slides is mainly based on: CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory Slide of Applied

Design issues for object-oriented languages. Objects-only "pure" language vs mixed. Are subclasses subtypes of the superclass?

Encapsulation grouping of subprograms and the data they manipulate Information hiding abstract data types type definition is hidden from the user variables of the type can be declared variables

Agenda. The main body and cout. Fundamental data types. Declarations and definitions. Control structures

The main body and cout Agenda Fundamental data types Declarations and definitions Control structures References, pass-by-value vs pass-by-references The main body and cout C++ IS AN OO EXTENSION OF

Introduction to Java https://tinyurl.com/y7bvpa9z

Eric Newhall - Laurence Meyers Team 2849 Alumni Java Object-Oriented Compiled Garbage-Collected WORA - Write Once, Run Anywhere IDE Integrated Development

What is This Course About? CS 356 Unit 0. Today's Digital Environment. Why is System Knowledge Important?

What is This Course About? CS 356 Unit 0 Class Introduction Basic Hardware Organization Introduction to Computer Systems a.k.a. Computer Organization or Architecture Filling in the "systems" details

StackVsHeap SPL/2010 SPL/20

Objectives Memory management central shared resource in multiprocessing RTE memory models that are used in Java and C++ services for Java/C++ programmer from RTE (JVM / OS). Perspectives of

Main Points of the Computer Organization and System Software Module

You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a

Altera SDK for OpenCL

A novel SDK that opens up the world of FPGAs to today's developers Altera Technology Roadshow 2013 Today's News Altera today announces its SDK for OpenCL Altera Joins Khronos Group

Lecture 1 - Introduction (Class Notes)

Outline: How does a computer work? Very brief! What is programming? The evolution of programming languages Generations of programming languages Compiled vs. Interpreted

Outline. Parts 1 to 3 introduce and sketch out the ideas of OOP. Part 5 deals with these ideas in closer detail.

OOP in Java Outline 1. Getting started, primitive data types and control structures 2. Classes and objects 3. Extending classes 4. Using some standard packages 5. OOP revisited Parts 1 to 3 introduce

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

What is a compiler? Traditionally: Program that analyzes and translates from a high level language (e.g.,

Chapter 5. A Closer Look at Instruction Set Architectures. Chapter 5 Objectives. 5.1 Introduction. 5.2 Instruction Formats

Chapter 5 Objectives Understand the factors involved in instruction set architecture design. Gain familiarity with memory addressing modes. Understand

Programmable NICs. Lecture 14, Computer Networks (198:552)

Network Interface Cards (NICs) The physical interface between a machine and the wire Life of a transmitted packet Userspace application NIC Transport

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions

CMSC 330: Organization of Programming Languages Multithreaded Programming Patterns in Java Multiprocessors Description Multiple processing units (multiprocessor) From single microprocessor to

Eventrons: A Safe Programming Construct for High-Frequency Hard Real-Time Applications

Daniel Spoonhower Carnegie Mellon University Joint work with Joshua Auerbach, David F. Bacon, Perry Cheng, David Grove

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Outline Introduction to GPGPUs and CUDA Programming Model The CUDA Thread Hierarchy / Memory Hierarchy

Chapter 3 Parallel Software

Part I. Preliminaries Chapter 1. What Is Parallel Computing? Chapter 2. Parallel Hardware Chapter 3. Parallel Software Chapter 4. Parallel Applications Chapter 5. Supercomputers

RISC Architecture Ch 12

Some History Instruction Usage Characteristics Large Register Files Register Allocation Optimization RISC vs. CISC Original Ideas Behind CISC (Complex Instruction Set Comp.)

Chapter 5. A Closer Look at Instruction Set Architectures

Chapter 5 Objectives Understand the factors involved in instruction set architecture design. Gain familiarity with memory addressing modes. Understand

CS 5523 Operating Systems: Midterm II - review Instructor: Dr. Tongping Liu Department Computer Science The University of Texas at San Antonio

Fall 2017 Outline Inter-Process Communication (20) Threads

Fall 2017 CISC124 9/16/2017

CISC124 Labs start this week in JEFF 155: Meet your TA. Check out the course web site, if you have not already done so. Watch lecture videos if you need to review anything we have already done. Problems

Technology for a better society. hetcomp.com

J. Seland, C. Dyken, T. R. Hagen, A. R. Brodtkorb, J. Hjelmervik, E. Bjønnes GPU Computing USIT Course Week 16th November 2011 9:30 10:15 Introduction