Evaluating Orthogonality between Application Auto-tuning and Run-Time Resource Management for Adaptive OpenCL Applications


1 Evaluating Orthogonality between Application Auto-tuning and Run-Time Resource Management for Adaptive OpenCL Applications
Edoardo Paone, Davide Gadioli, Gianluca Palermo, Vittorio Zaccaria, Cristina Silvano
Politecnico di Milano

2 Computer Architecture Evolution
[Figure: processor generations over time; recoverable labels: Pentium, Core2 Duo (65nm), Nehalem (45nm).]
"The number of transistors incorporated in a chip will approximately double every two years." (Gordon Moore, Intel co-founder)

3 Moore's Law on Performance
[Figure: performance across processor generations; recoverable labels: Pentium 4, 0.8um, Core2 Duo (65nm), Nehalem (45nm).]

4 Moore's Law on Performance
The Golden Era:
- Single-processor
- 1st Power Wall

5 Moore's Law on Performance
The Multicore Era:
- 2 to 16 cores
- On-chip shared LL$
- Programmability challenge

6 Moore's Law on Performance
Performance?
The Manycore Era:
- Larger # of cores
- Networks on-chip
- Programmability challenge + Dynamic Resource Management

7-8 Main Idea
In the context of resource consolidation, analyze the orthogonal effects of:
- Resource Management
- Application Auto-Tuning
- Approximate computing
- Target Platforms (slide 8: Multicore Platform)

9 Run-Time Resource Management
[Diagram labels: App1, App2, App3, RTRM, Target Platform.]
Amit Kumar Singh, Muhammad Shafique, Akash Kumar, and Jörg Henkel. Mapping on multi/many core systems: survey of current and emerging trends. In Proceedings of the 50th Annual Design Automation Conference (DAC)

10 RTRM Overview
[Diagram labels: App1, App2, App3, Accounting, Mapping, RTRM, Target Platform.]

11-12 RTRM Overview
- The resource accounting phase grants resources to critical workloads while optimizing resource usage by best-effort workloads.
- The mapping phase maps virtual resources onto physical resources to achieve optimal platform usage and to handle run-time variations.
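
The two phases above can be pictured with a minimal sketch. It is not the RTRM used in the talk: the `Request` type, the first-come policy for best-effort workloads, and the contiguous core assignment are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    app: str        # application name (hypothetical)
    cores: int      # number of virtual cores requested
    critical: bool  # critical workloads are served before best-effort ones


def account(requests, total_cores):
    """Accounting phase: decide how many cores each application is granted."""
    grants = {}
    free = total_cores
    # Critical workloads are served first, capped by what is still free.
    for r in (r for r in requests if r.critical):
        grants[r.app] = min(r.cores, free)
        free -= grants[r.app]
    # Best-effort workloads share whatever is left, in arrival order.
    for r in (r for r in requests if not r.critical):
        grants[r.app] = min(r.cores, free)
        free -= grants[r.app]
    return grants


def map_to_physical(grants, nodes, cores_per_node):
    """Mapping phase: bind granted (virtual) cores to physical (node, core) pairs."""
    physical = [(n, c) for n in range(nodes) for c in range(cores_per_node)]
    mapping, cursor = {}, 0
    for app, count in grants.items():
        mapping[app] = physical[cursor:cursor + count]
        cursor += count
    return mapping


requests = [
    Request("App1", 6, critical=True),
    Request("App2", 8, critical=False),
    Request("App3", 8, critical=False),
]
grants = account(requests, total_cores=16)
mapping = map_to_physical(grants, nodes=4, cores_per_node=4)
print(grants)
```

A real resource manager would also re-run both phases when applications start or stop; the sketch only shows one scheduling round.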

13-17 Application Auto-Tuning
The key idea is that most applications are configurable through a set of parameters.
Parameters: Color, Shape, Size

18 Application Auto-Tuning
The key idea is that most applications are configurable through a set of parameters.
Run-Time Knobs

19 Why Run-Time Knobs & Auto-Tuning?
Some applications expose internal knobs that can be used to trade off application quality of results (QoR) against performance.

20-26 Why Run-Time Knobs & Auto-Tuning?
Example: Autonomous Video-surveillance System
Knobs: Video Frame Rate, Video Resolution
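
As a rough illustration of the trade-off, the sketch below picks the knob setting with the best quality that still fits a performance budget. The operating points, their quality scores, and the pixels-per-second budget are made-up values, not measurements from the talk.

```python
from typing import NamedTuple, Optional


class OperatingPoint(NamedTuple):
    fps: int        # video frame rate knob
    width: int      # video resolution knob (pixels)
    height: int
    quality: float  # quality of results, higher is better (made-up numbers)

    @property
    def pixels_per_second(self) -> int:
        # Stand-in for the performance the setting demands from the platform.
        return self.fps * self.width * self.height


# Hypothetical operating points for a video-surveillance application.
POINTS = [
    OperatingPoint(10, 320, 240, 0.30),
    OperatingPoint(30, 320, 240, 0.45),
    OperatingPoint(10, 640, 480, 0.55),
    OperatingPoint(30, 640, 480, 0.80),
    OperatingPoint(30, 1280, 720, 1.00),
]


def select(points, budget_pixels_per_second) -> Optional[OperatingPoint]:
    """Return the highest-quality point that fits the budget, or None."""
    feasible = [p for p in points if p.pixels_per_second <= budget_pixels_per_second]
    return max(feasible, key=lambda p: p.quality, default=None)


print(select(POINTS, 5_000_000))
```

Lowering the budget moves the selection toward lower frame rates or resolutions, which is the quality-for-performance trade the slides describe.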

27-30 Application Auto-Tuning Framework
- Execution Loop
- Monitoring
- Re-Configure
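
The three framework stages (execute, monitor, re-configure) can be sketched as a simple loop. This is an assumption-laden illustration, not the framework from the talk: `process_frame`, the configuration names, and the 1.5x headroom threshold are all hypothetical.

```python
def autotune_loop(process_frame, configurations, target_fps, frames, headroom=1.5):
    """Run `frames` iterations of execute / monitor / re-configure.

    `configurations` is ordered from fastest (lowest quality) to slowest
    (highest quality). `process_frame(cfg)` returns the seconds spent on one frame.
    """
    level = len(configurations) - 1  # start from the highest quality
    history = []
    for _ in range(frames):
        elapsed = process_frame(configurations[level])  # execution
        observed_fps = 1.0 / elapsed                    # monitoring
        history.append((configurations[level], observed_fps))
        # Re-configure: step down when the goal is missed, step up when
        # there is clear headroom above the goal.
        if observed_fps < target_fps and level > 0:
            level -= 1
        elif observed_fps > headroom * target_fps and level < len(configurations) - 1:
            level += 1
    return history


# Simulated per-frame processing times (seconds) for each configuration.
FRAME_TIME = {"low": 0.02, "mid": 0.04, "high": 0.08}

history = autotune_loop(
    process_frame=lambda cfg: FRAME_TIME[cfg],
    configurations=["low", "mid", "high"],
    target_fps=20,
    frames=4,
)
print([cfg for cfg, _ in history])
```

The headroom factor keeps the loop from oscillating between two neighbouring configurations when the observed frame rate sits just above the goal.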

31 Orthogonality Concept
[Diagram labels: Requests, Resources, Platform OS, Target HW Platform.]

32-33 Orthogonality Concept
- Run-Time Resource Manager
- Exploitation of OpenCL Device Fission to limit resource requests
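
In OpenCL 1.2, device fission is exposed through `clCreateSubDevices`, which takes a zero-terminated property list such as `CL_DEVICE_PARTITION_BY_COUNTS, n, CL_DEVICE_PARTITION_BY_COUNTS_LIST_END, 0`. The sketch below only builds that list symbolically to show how a request could be limited; it does not call OpenCL, and the clamping policy is an assumption (the real API reports an error when the requested count exceeds the device's compute units).

```python
# Symbolic names of the OpenCL 1.2 partition constants; no OpenCL runtime is used here.
BY_COUNTS = "CL_DEVICE_PARTITION_BY_COUNTS"
LIST_END = "CL_DEVICE_PARTITION_BY_COUNTS_LIST_END"


def partition_by_counts(max_compute_units, requested_units):
    """Build the property list for one sub-device with a limited number of compute units.

    The request is clamped to the device size (a policy chosen for this sketch).
    """
    if requested_units < 1:
        raise ValueError("at least one compute unit must be requested")
    granted = min(requested_units, max_compute_units)
    # The trailing 0 terminates the property list, as the specification requires.
    return [BY_COUNTS, granted, LIST_END, 0]


print(partition_by_counts(max_compute_units=16, requested_units=4))
```

An application would pass the equivalent list to `clCreateSubDevices` and build its context and command queue on the resulting sub-device instead of the full device.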

34 The Multi-View Case Study
2 eyes = 3 dimensions

35 Implementation [1]
[Figure: stereo geometry of the left and right cameras; recoverable labels: CAM LEFT, CAM RIGHT, P, Q, P L, P R, Q L, Q R, D.]
[1] Ke Zhang, Jiangbo Lu, and Gauthier Lafruit, Cross-Based Local Stereo Matching Using Orthogonal Integral Images, IEEE Transactions On Circuits and Systems For Video Technology, Vol. 19, No. 7, July

36 Pixel disparity
[Figure labels: Left camera, Right camera, reference disparity.]

37 Pixel disparity
[Figure labels: Left camera, Right camera, reference disparity, markers 1, 2, 3.]
5 Application Knobs
QoR: Disparity Error
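
The slides do not define how the disparity error is computed. One common choice in stereo matching is the fraction of "bad" pixels, meaning pixels whose computed disparity differs from the reference by more than a tolerance. The sketch below assumes that definition; the function name and the default tolerance are illustrative only.

```python
def disparity_error(computed, reference, tolerance=1):
    """Fraction of pixels whose disparity is off by more than `tolerance`.

    Both maps are lists of rows with one disparity value per pixel.
    """
    total = 0
    bad = 0
    for row_computed, row_reference in zip(computed, reference):
        for d_computed, d_reference in zip(row_computed, row_reference):
            total += 1
            if abs(d_computed - d_reference) > tolerance:
                bad += 1
    return bad / total if total else 0.0


computed = [[10, 12, 7], [3, 3, 9]]
reference = [[10, 10, 7], [3, 5, 9]]
print(disparity_error(computed, reference))
```

With this definition a lower value means a better quality of results, which matches the use of the error as an inverse QoR measure.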

38-40 Experimental Setup
Target Platform:
- AMD NUMA architecture: 4 nodes, 4 cores
- OpenCL 1.2 run-time provided by AMD
Workload Definition:
- Single application, multiple instances
- Dynamic workload in terms of start time, amount of data to process, and frame rate goal
Evaluation Metrics:
- Normalized Actual Penalty (performance/quality metric): user satisfaction in terms of application frame rate
- Normalized Application Error (quality metric): user satisfaction in terms of the quality of the resulting image (1/QoR)
- Difference w.r.t. off-line profiling (predictability metric)
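
The slides name the metrics but do not give their formulas. Purely as an assumption, a frame-rate penalty normalized to the goal could look like the sketch below: zero when the goal is met and approaching one as the achieved frame rate drops to zero. The `Instance` fields mirror the workload description (start time, amount of data, frame rate goal), with hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    start_time: float  # seconds after the experiment begins (hypothetical)
    frames: int        # amount of data to process
    goal_fps: float    # frame rate goal
    actual_fps: float  # frame rate observed at run time


def frame_rate_penalty(goal_fps, actual_fps):
    """Assumed penalty: 0.0 when the goal is met, up to 1.0 at zero frame rate."""
    if goal_fps <= 0:
        raise ValueError("the frame rate goal must be positive")
    return max(0.0, (goal_fps - actual_fps) / goal_fps)


def mean_penalty(instances):
    """Average penalty over all application instances in the workload."""
    return sum(frame_rate_penalty(i.goal_fps, i.actual_fps) for i in instances) / len(instances)


workload = [
    Instance(start_time=0.0, frames=300, goal_fps=30.0, actual_fps=30.0),
    Instance(start_time=2.5, frames=150, goal_fps=30.0, actual_fps=15.0),
    Instance(start_time=4.0, frames=600, goal_fps=20.0, actual_fps=25.0),
]
print(mean_penalty(workload))
```

Exceeding the goal earns no credit in this sketch, so one fast instance cannot hide the penalty of a slow one.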

41-43 Application Auto-Tuning Effects

44 Comparative Analysis

                           Run-Time Resource Management
Application Auto-tuning    OFF                                ON
OFF                        PLAIN-LINUX (No Device Fission)    PLAIN-RTRM
ON                         ADAPTIVE-LINUX                     ADAPTIVE-RTRM

45-49 Run-Time Results
[Plot labels: Missed Deadlines, Design-Time vs Run-Time Profiling, QoR (Disparity Error), # Multi-View Apps.]

50 Run-Time Results
[Plot label: APPS.]

51-52 Resource-Aware AS-RTM
[Diagram labels: Requests, Resources, Resource Availability, Platform OS, Target HW Platform.]
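
One way to read the "Resource Availability" input is that the application-level manager rescales its profiled expectations to the resources it actually has before choosing a configuration. The sketch below assumes throughput scales linearly with the number of available cores, which is a simplification; the operating points and their numbers are hypothetical.

```python
def pick_configuration(points, available_cores, goal_fps):
    """Choose a configuration given how many cores are currently available.

    `points` holds (name, quality, fps_per_core) tuples from off-line profiling.
    Linear scaling with the core count is an assumption of this sketch.
    """
    feasible = [p for p in points if p[2] * available_cores >= goal_fps]
    if feasible:
        # Among configurations that meet the goal, prefer the best quality.
        return max(feasible, key=lambda p: p[1])[0]
    # Nothing meets the goal: fall back to the fastest configuration.
    return max(points, key=lambda p: p[2])[0]


# Hypothetical profiled operating points: (name, quality, fps per core).
PROFILED = [("low", 0.4, 10.0), ("mid", 0.7, 5.0), ("high", 1.0, 2.5)]

for cores in (8, 4, 1):
    print(cores, pick_configuration(PROFILED, cores, goal_fps=20.0))
```

Without the availability input, the manager would keep trusting the off-line profile and could pick a configuration that misses its deadline once other applications take cores away.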

53 Conclusions
- We considered the problem of managing multiple OpenCL applications for server consolidation on multicore platforms.
- We implemented an approach exploiting run-time management frameworks operating at both the application level and the OS/resource level.
- Analysis of results:
  - Auto-tuning is necessary to modulate performance and QoR.
  - Resource awareness is needed for predictability, by means of resource isolation (RTRM) or a simple monitor (RA AS-RTM).

Design Space Exploration and Application Autotuning for Runtime Adaptivity in Multicore Architectures

Design Space Exploration and Application Autotuning for Runtime Adaptivity in Multicore Architectures Design Space Exploration and Application Autotuning for Runtime Adaptivity in Multicore Architectures Cristina Silvano Politecnico di Milano cristina.silvano@polimi.it Outline Research challenges in multicore

More information

An Evaluation of Autotuning Techniques for the Compiler Optimization Problems

An Evaluation of Autotuning Techniques for the Compiler Optimization Problems An Evaluation of Autotuning Techniques for the Compiler Optimization Problems Amir Hossein Ashouri, Gianluca Palermo and Cristina Silvano Politecnico di Milano, Milan, Italy {amirhossein.ashouri,ginaluca.palermo,cristina.silvano}@polimi.it

More information

Challenges of Heterogeneous MPSoC for Image Processing

Challenges of Heterogeneous MPSoC for Image Processing Challenges of Heterogeneous MPSoC for Image Processing DGLR 2017 Walter Stechele Institute for Integrated Systems Technische Universität München Overview Reconfigurable hardware Hardware software

More information

CS671 Parallel Programming in the Many-Core Era

CS671 Parallel Programming in the Many-Core Era CS671 Parallel Programming in the Many-Core Era Lecture 1: Introduction Zheng Zhang Rutgers University CS671 Course Information Instructor information: instructor: zheng zhang website: www.cs.rutgers.edu/~zz124/

More information

Microprocessor Trends and Implications for the Future

Microprocessor Trends and Implications for the Future Microprocessor Trends and Implications for the Future John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 4 1 September 2016 Context Last two classes: from

More information

OpenCL Application Auto-Tuning and Run-Time Resource Management for Multi-Core Platforms

OpenCL Application Auto-Tuning and Run-Time Resource Management for Multi-Core Platforms 214 IEEE International Symposium on Parallel and Distributed Processing with Applications OpenCL Application Auto-Tuning and Run-Time Resource Management for Multi-Core Platforms Davide Gadioli, Simone

More information

Automatic Pruning of Autotuning Parameter Space for OpenCL Applications

Automatic Pruning of Autotuning Parameter Space for OpenCL Applications Automatic Pruning of Autotuning Parameter Space for OpenCL Applications Ahmet Erdem, Gianluca Palermo 6, and Cristina Silvano 6 Department of Electronics, Information and Bioengineering Politecnico di

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1. Introduction João Bispo 1, Pedro Pinto 1, João M.P. Cardoso 1, Jorge G. Barbosa 1, Hamid Arabnejad 1, Davide Gadioli 2, Emanuele Vitali 2, Gianluca Palermo 2, Cristina Silvano 2, Stefano Cherubin

More information

What is This Course About? CS 356 Unit 0. Today's Digital Environment. Why is System Knowledge Important?

What is This Course About? CS 356 Unit 0. Today's Digital Environment. Why is System Knowledge Important? 0.1 What is This Course About? 0.2 CS 356 Unit 0 Class Introduction Basic Hardware Organization Introduction to Computer Systems a.k.a. Computer Organization or Architecture Filling in the "systems" details

More information

CSE 291: Mobile Application Processor Design

CSE 291: Mobile Application Processor Design CSE 291: Mobile Application Processor Design Mobile Application Processors are where the action are The evolution of mobile application processors mirrors that of microprocessors mirrors that of mainframes..

More information

Trends and Challenges in Multicore Programming

Trends and Challenges in Multicore Programming Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores

More information

Computer Architecture

Computer Architecture Informatics 3 Computer Architecture Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh (thanks to Prof. Nigel Topham) General Information Instructor

More information

HW Trends and Architectures

HW Trends and Architectures Pavel Tvrdík, Jiří Kašpar (ČVUT FIT) HW Trends and Architectures MI-POA, 2011, Lecture 1 1/29 HW Trends and Architectures prof. Ing. Pavel Tvrdík CSc. Ing. Jiří Kašpar Department of Computer Systems Faculty

More information

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Today Non-Uniform

More information

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed

More information

The ANTAREX Approach to AutoTuning and Adaptivity for Energy efficient HPC systems

The ANTAREX Approach to AutoTuning and Adaptivity for Energy efficient HPC systems The ANTAREX Approach to AutoTuning and Adaptivity for Energy efficient HPC systems The ANTAREX Team Nesus Fifth Working Group Meeting Ljubljana, July 8 th, 2016 ANTAREX AutoTuning and Adaptivity approach

More information

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Systems Group Department of Computer Science ETH Zürich SMP architecture

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Vijay Nagarajan and Prof. Nigel Topham! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors

More information

Multicore Hardware and Parallelism

Multicore Hardware and Parallelism Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3

More information

Performance of computer systems

Performance of computer systems Performance of computer systems Many different factors among which: Technology Raw speed of the circuits (clock, switching time) Process technology (how many transistors on a chip) Organization What type

More information

Parallelism in Hardware

Parallelism in Hardware Parallelism in Hardware Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3 Moore s Law

More information

The Art of Parallel Processing

The Art of Parallel Processing The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a

More information

Asymmetry-aware execution placement on manycore chips

Asymmetry-aware execution placement on manycore chips Asymmetry-aware execution placement on manycore chips Alexey Tumanov Joshua Wise, Onur Mutlu, Greg Ganger CARNEGIE MELLON UNIVERSITY Introduction: Core Scaling? Moore s Law continues: can still fit more

More information

ECE 2162 Intro & Trends. Jun Yang Fall 2009

ECE 2162 Intro & Trends. Jun Yang Fall 2009 ECE 2162 Intro & Trends Jun Yang Fall 2009 Prerequisites CoE/ECE 0142: Computer Organization; or CoE/CS 1541: Introduction to Computer Architecture I will assume you have detailed knowledge of Pipelining

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

Fundamentals of Computer Design

Fundamentals of Computer Design Fundamentals of Computer Design Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

Microelettronica. J. M. Rabaey, "Digital integrated circuits: a design perspective" EE141 Microelettronica

Microelettronica. J. M. Rabaey, Digital integrated circuits: a design perspective EE141 Microelettronica Microelettronica J. M. Rabaey, "Digital integrated circuits: a design perspective" Introduction Why is designing digital ICs different today than it was before? Will it change in future? The First Computer

More information

EN105 : Computer architecture. Course overview J. CRENNE 2015/2016

EN105 : Computer architecture. Course overview J. CRENNE 2015/2016 EN105 : Computer architecture Course overview J. CRENNE 2015/2016 Schedule Cours Cours Cours Cours Cours Cours Cours Cours Cours Cours 2 CM 1 - Warmup CM 2 - Computer architecture CM 3 - CISC2RISC CM 4

More information

VLSI Design Automation

VLSI Design Automation VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology The Computer Revolution Progress in computer technology Underpinned by Moore

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

A Simple Model for Estimating Power Consumption of a Multicore Server System

A Simple Model for Estimating Power Consumption of a Multicore Server System , pp.153-160 http://dx.doi.org/10.14257/ijmue.2014.9.2.15 A Simple Model for Estimating Power Consumption of a Multicore Server System Minjoong Kim, Yoondeok Ju, Jinseok Chae and Moonju Park School of

More information

Chapter 1. Computer Abstractions and Technology. Lesson 2: Understanding Performance

Chapter 1. Computer Abstractions and Technology. Lesson 2: Understanding Performance Chapter 1 Computer Abstractions and Technology Lesson 2: Understanding Performance Indeed, the cost-performance ratio of the product will depend most heavily on the implementer, just as ease of use depends

More information

Wisconsin Computer Architecture. Nam Sung Kim

Wisconsin Computer Architecture. Nam Sung Kim Wisconsin Computer Architecture Mark Hill Nam Sung Kim Mikko Lipasti Karu Sankaralingam Guri Sohi David Wood Technology & Moore s Law 35nm Transistor 1947 Moore s Law 1964: Integrated Circuit 1958 Transistor

More information

Network Swapping. Outline Motivations HW and SW support for swapping under Linux OS

Network Swapping. Outline Motivations HW and SW support for swapping under Linux OS Network Swapping Emanuele Lattanzi, Andrea Acquaviva and Alessandro Bogliolo STI University of Urbino, ITALY Outline Motivations HW and SW support for swapping under Linux OS Local devices (CF, µhd) Network

More information

Elettronica T moduli I e II

Elettronica T moduli I e II Elettronica T moduli I e II Docenti: Massimo Lanzoni, Igor Loi Massimo.lanzoni@unibo.it igor.loi@unibo.it A.A. 2015/2016 Scheduling MOD 1 (Prof. Loi) Weeks 39,40,41,42, 43,44» MOS transistors» Digital

More information

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi University of Kansas 1 Why? High-Performance Multicores for Real-Time Systems

More information

ECE 486/586. Computer Architecture. Lecture # 2

ECE 486/586. Computer Architecture. Lecture # 2 ECE 486/586 Computer Architecture Lecture # 2 Spring 2015 Portland State University Recap of Last Lecture Old view of computer architecture: Instruction Set Architecture (ISA) design Real computer architecture:

More information

ECE 154A. Architecture. Dmitri Strukov

ECE 154A. Architecture. Dmitri Strukov ECE 154A Introduction to Computer Architecture Dmitri Strukov Lecture 1 Outline Admin What this class is about? Prerequisites ii Simple computer Performance Historical trends Economics 2 Admin Office Hours:

More information

Introduction to cache memories

Introduction to cache memories Course on: Advanced Computer Architectures Introduction to cache memories Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Summary Summary Main goal Spatial and temporal

More information

Getting Ready for Approximate Computing: Trading Parallelism for Accuracy for DSS Workloads. Department of Computer Science University of Cyprus

Getting Ready for Approximate Computing: Trading Parallelism for Accuracy for DSS Workloads. Department of Computer Science University of Cyprus Getting Ready for Approximate Computing: Trading Parallelism for Accuracy for DSS Workloads Pedro Moura Trancoso CASPER Research Group Department of Computer Science University of Cyprus In: Computing

More information

VLSI Design Automation

VLSI Design Automation VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,

More information

VLSI Design Automation. Calcolatori Elettronici Ing. Informatica

VLSI Design Automation. Calcolatori Elettronici Ing. Informatica VLSI Design Automation 1 Outline Technology trends VLSI Design flow (an overview) 2 IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing

More information

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM I5 AND I7 PROCESSORS Juan M. Cebrián 1 Lasse Natvig 1 Jan Christian Meyer 2 1 Depart. of Computer and Information

More information

UPCRC Overview. Universal Computing Research Centers launched at UC Berkeley and UIUC. Andrew A. Chien. Vice President of Research Intel Corporation

UPCRC Overview. Universal Computing Research Centers launched at UC Berkeley and UIUC. Andrew A. Chien. Vice President of Research Intel Corporation UPCRC Overview Universal Computing Research Centers launched at UC Berkeley and UIUC Andrew A. Chien Vice President of Research Intel Corporation Announcement Key Messages Microsoft and Intel are announcing

More information

Quantifying Load Imbalance on Virtualized Enterprise Servers

Quantifying Load Imbalance on Virtualized Enterprise Servers Quantifying Load Imbalance on Virtualized Enterprise Servers Emmanuel Arzuaga and David Kaeli Department of Electrical and Computer Engineering Northeastern University Boston MA 1 Traditional Data Centers

More information

CS 3410: Computer System Organization and Programming

CS 3410: Computer System Organization and Programming CS 3410: Computer System Organization and Programming Anne Bracy Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy,

More information

Lecture 2: Performance

Lecture 2: Performance Lecture 2: Performance Today s topics: Technology wrap-up Performance trends and equations Reminders: YouTube videos, canvas, and class webpage: http://www.cs.utah.edu/~rajeev/cs3810/ 1 Important Trends

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Computer Organization and Design The Hardware / Software Interface David A. Patterson and John L. Hennessy Course based on the

More information

Technologies and application performance. Marc Mendez-Bermond HPC Solutions Expert - Dell Technologies September 2017

Technologies and application performance. Marc Mendez-Bermond HPC Solutions Expert - Dell Technologies September 2017 Technologies and application performance Marc Mendez-Bermond HPC Solutions Expert - Dell Technologies September 2017 The landscape is changing We are no longer in the general purpose era the argument of

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #21: Caches 3 2005-07-27 CS61C L22 Caches III (1) Andy Carle Review: Why We Use Caches 1000 Performance 100 10 1 1980 1981 1982 1983

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Classes of Computers Personal computers General purpose, variety of software

More information

Advances of parallel computing. Kirill Bogachev May 2016

Advances of parallel computing. Kirill Bogachev May 2016 Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being

More information

Performance & Scalability Testing in Virtual Environment Hemant Gaidhani, Senior Technical Marketing Manager, VMware

Performance & Scalability Testing in Virtual Environment Hemant Gaidhani, Senior Technical Marketing Manager, VMware Performance & Scalability Testing in Virtual Environment Hemant Gaidhani, Senior Technical Marketing Manager, VMware 2010 VMware Inc. All rights reserved About the Speaker Hemant Gaidhani Senior Technical

More information

Fundamentals of Computers Design

Fundamentals of Computers Design Computer Architecture J. Daniel Garcia Computer Architecture Group. Universidad Carlos III de Madrid Last update: September 8, 2014 Computer Architecture ARCOS Group. 1/45 Introduction 1 Introduction 2

More information

CLASS 1 TUESDAY, JAN 3RD

CLASS 1 TUESDAY, JAN 3RD Don t Panic! An Introduction to. Fixing & Preventing Computer Problems CLASS 1 TUESDAY, JAN 3 RD Computer Basics Startup Problems Hardware Maintenance Ange Rapa January 2018 Don t Panic! An Introduction

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Boris Grot and Dr. Vijay Nagarajan!! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors:!

More information

The Computer Revolution. Classes of Computers. Chapter 1

The Computer Revolution. Classes of Computers. Chapter 1 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition 1 Chapter 1 Computer Abstractions and Technology 1 The Computer Revolution Progress in computer technology Underpinned by Moore

More information

Chapter 0 Introduction

Chapter 0 Introduction Chapter 0 Introduction Jin-Fu Li Laboratory Department of Electrical Engineering National Central University Jhongli, Taiwan Applications of ICs Consumer Electronics Automotive Electronics Green Power

More information

By Charvi Dhoot*, Vincent J. Mooney &,

By Charvi Dhoot*, Vincent J. Mooney &, By Charvi Dhoot*, Vincent J. Mooney &, -Shubhajit Roy Chowdhury*, Lap Pui Chau # *International Institute of Information Technology, Hyderabad, India & School of Electrical and Computer Engineering, Georgia

More information

COMPUTING ELEMENT EVOLUTION AND ITS IMPACT ON SIMULATION CODES

COMPUTING ELEMENT EVOLUTION AND ITS IMPACT ON SIMULATION CODES COMPUTING ELEMENT EVOLUTION AND ITS IMPACT ON SIMULATION CODES P(ND) 2-2 2014 Guillaume Colin de Verdière OCTOBER 14TH, 2014 P(ND)^2-2 PAGE 1 CEA, DAM, DIF, F-91297 Arpajon, France October 14th, 2014 Abstract:

More information

Amir H. Ashouri University of Toronto Canada

Amir H. Ashouri University of Toronto Canada Compiler Autotuning using Machine Learning: A State-of-the-art Review Amir H. Ashouri University of Toronto Canada 4 th July, 2018 Politecnico di Milano, Italy Background 2 Education B.Sc (2005-2009):

More information
