Heterogeneous Computing Architecture. Adaptive Systems Laboratory

Heterogeneous Computing Architecture
Mitsuhiro Nakamura, Achraf Ben Ahmed, Maiko Tanaka

Contents
CPU periods: Background; Single core; Multi core; Heterogeneous multi core
Heterogeneous computing: Type; Problem; Heterogeneous System Architecture (HSA)

Moore's law [figure; source missing]

Moore's law [figure, annotated with the single-core and multi-core periods; source missing]

Single core
One core in one package.
Age: 1971 (Intel 4004) to 2003 (Pentium M)
Clock rate: 740 kHz to 2.26 GHz
FLOPS: 7.6 G (Pentium 4)
Price: $56 (Pentium 4)

Single core: performance increase
Clock rate: a more finely segmented pipeline, for example F/E/WB stages split into F/D/E/M/W stages [pipeline diagram]
IPC (instructions per cycle): superscalar execution, with several F/E/WB pipelines running side by side [pipeline diagram]
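To make the effect of pipelining and superscalar issue concrete, here is a minimal sketch of an idealized cycle-count model. It is not taken from the slides: the function names are hypothetical, and the model assumes one cycle per stage with no stalls or hazards.

```python
import math


def unpipelined_cycles(n_instructions, n_stages):
    """Without pipelining, each instruction runs all of its stages alone."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages, issue_width=1):
    """Idealized pipeline: issue_width instructions enter per cycle, no stalls."""
    if n_instructions == 0:
        return 0
    # Instructions enter in groups of issue_width; the last group still
    # needs n_stages cycles to drain through the pipeline.
    groups = math.ceil(n_instructions / issue_width)
    return n_stages + groups - 1


three_stage = pipelined_cycles(3, 3)                  # F, E, WB
five_stage = pipelined_cycles(3, 5)                   # F, D, E, M, W
superscalar = pipelined_cycles(6, 3, issue_width=2)   # two pipelines side by side
```

In this model the five-stage pipeline needs more cycles than the three-stage one for the same three instructions, but each stage does less work, so the clock period can be shorter. The superscalar case finishes six instructions in the same number of cycles the scalar pipeline needs for three.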

Single core: performance limit
Raising performance increases complexity, area, power consumption, and cost.

Multi core
The same cores in one package (homogeneous).
Age: 2006 (Core Duo) onward
Clock rate: 1.66 GHz to 3.6 GHz
FLOPS: 26.64 G (Core 2 Duo), 53.28 G (Core i7 Nehalem), 384 G (Core i7 Haswell)
Price: $48 (Core 2 Duo) to $282 (Core i7 Haswell) and up
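The FLOPS figures on this slide are peak values. A common estimate of peak throughput is cores × clock rate × floating-point operations per cycle per core. The sketch below applies that formula. The core counts, clock rates, and per-cycle widths are assumptions chosen because they reproduce the slide's numbers; the slides do not state them.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Peak GFLOPS = cores * clock (GHz) * floating-point ops per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


# Assumed configurations (not from the slides):
core2_duo = peak_gflops(cores=2, clock_ghz=3.33, flops_per_cycle=4)
nehalem = peak_gflops(cores=4, clock_ghz=3.33, flops_per_cycle=4)
haswell = peak_gflops(cores=4, clock_ghz=3.0, flops_per_cycle=32)
```

Under these assumptions, the jump from Nehalem to Haswell comes almost entirely from wider per-cycle floating-point throughput rather than from clock rate or core count.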

Multi core: performance increase
TLP (thread-level parallelism)
Simple cores (Pollack's rule)
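Pollack's rule says that the performance of a single core grows roughly with the square root of its complexity (often measured as area or transistor count). The sketch below compares one large core against several simple cores using the same total area. The function name and the area units are illustrative assumptions, and the multi-core figure assumes a perfectly parallel workload.

```python
import math


def pollack_performance(area):
    """Relative single-core performance under Pollack's rule: sqrt(area)."""
    return math.sqrt(area)


total_area = 4.0

# One complex core that uses the whole area budget.
one_big_core = pollack_performance(total_area)

# Four simple cores, each using a quarter of the budget; throughput adds up
# only if the workload can keep all four cores busy.
four_small_cores = 4 * pollack_performance(total_area / 4)
```

With the same area, the four simple cores offer twice the aggregate throughput of the single complex core in this model, which is the argument for spending transistors on more cores rather than on a more complex one.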

Multi core: performance limit
All cores are the same, so it is difficult to handle different kinds of operations efficiently, such as a mix of sequential and parallel work.
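The cost of the sequential part is commonly quantified with Amdahl's law, which the slides do not name but which fits the point made here. A minimal sketch, with a hypothetical function name and an assumed 90% parallel fraction:

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / n_cores)


speedup_4 = amdahl_speedup(0.9, 4)       # about 3.08
speedup_1024 = amdahl_speedup(0.9, 1024)  # approaches, but never reaches, 10
```

Even with 1024 identical cores the speedup stays below 10 in this example, because the sequential 10% of the work runs no faster on a simple core.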

Heterogeneous multi core
Different cores in one package: core A (CPU) and core B (GPU, etc.).
Age: 2006 (Cell Broadband Engine) onward
Clock rate: various (3.7 GHz + 720 MHz)
FLOPS: 200 G and up
Price: $129

Heterogeneous multi core: performance increase
Single thread + multi thread (SIMD)
General processing (control, OS) + data processing (stream)
Each core is optimized for its own kind of work.

Heterogeneous multi core: example
Cell Broadband Engine in the PS3 [figure: commons/e/e0/cell-processor.jpg]

Heterogeneous multi core: example
AMD APU (Accelerated Processing Unit) in the PS4

Heterogeneous multi core: example
Fast CPU + slow CPU
CPU + FPGA
CPU + photonic
CPU + matrix processor (Kazuki Kobayashi, "Matrix Multiplication for Hierarchical Memory Interface on Array Processor", p. 5)

CPU Architecture Trends

Heterogeneous Computing

Type
Type     Device                          Area
OpenCL   CPU, GPU, Cell processor, DSP   Parallel computing
OpenGL   GPU                             2D, 3D
CUDA     GPU (NVIDIA)                    Parallel computing

Problem
The CPU and the GPU differ in two ways:
Programs: C/C++, Java on the CPU; CUDA, OpenCL on the GPU
Memory spaces

Problem [figure: using the GPU without HSA]

HSA
Framework to simplify GPU computing.
H/W: shared memory space (hUMA)
S/W: the same programming language

HSA
H/W: shared memory space [figure: HSA-enabled virtual memory with distinct graphics card]

HSA
H/W: shared memory space [figure: using the GPU with HSA]
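The difference between separate and shared memory spaces can be illustrated with a small host-side analogy. This is only a sketch: it runs entirely on the CPU, the function names are hypothetical, and a Python loop stands in for the GPU kernel. The first flow copies the data into a separate buffer and back, as with a GPU that has its own memory; the second works on the same storage through a memoryview, as a device sharing the address space would.

```python
def double_in_place(buf):
    """Stand-in for a GPU kernel: doubles every byte (modulo 256)."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256


def run_with_copies(host_data):
    """Separate memory spaces: copy in, compute, copy back."""
    device_buf = bytearray(host_data)   # host -> device copy
    double_in_place(device_buf)         # compute on the device copy
    host_data[:] = device_buf           # device -> host copy
    return 2 * len(host_data)           # bytes moved between the two spaces


def run_shared(host_data):
    """Shared memory space: the 'device' works on the host's own storage."""
    view = memoryview(host_data)        # same storage, nothing is copied
    double_in_place(view)
    return 0                            # no bytes moved


copied = bytearray([1, 2, 3])
bytes_moved_copy = run_with_copies(copied)

shared = bytearray([1, 2, 3])
bytes_moved_shared = run_shared(shared)
```

Both flows produce the same result, but the copying flow moves every byte twice. For large data sets that transfer cost, and the code needed to manage it, is what a shared memory space is meant to remove.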

HSA
S/W: the same programming language

BOLT

Language comparison

Summary
CPU architecture trends: single core, multi core, heterogeneous multi core
Heterogeneous computing: OpenCL, HSA, BOLT


More information

Parallelism. CS6787 Lecture 8 Fall 2017

Parallelism. CS6787 Lecture 8 Fall 2017 Parallelism CS6787 Lecture 8 Fall 2017 So far We ve been talking about algorithms We ve been talking about ways to optimize their parameters But we haven t talked about the underlying hardware How does

More information

Antonio R. Miele Marco D. Santambrogio

Antonio R. Miele Marco D. Santambrogio Advanced Topics on Heterogeneous System Architectures GPU Politecnico di Milano Seminar Room A. Alario 18 November, 2015 Antonio R. Miele Marco D. Santambrogio Politecnico di Milano 2 Introduction First

More information

MatCL - OpenCL MATLAB Interface

MatCL - OpenCL MATLAB Interface MatCL - OpenCL MATLAB Interface MatCL - OpenCL MATLAB Interface Slide 1 MatCL - OpenCL MATLAB Interface OpenCL toolkit for Mathworks MATLAB/SIMULINK Compile & Run OpenCL Kernels Handles OpenCL memory management

More information

Parallel Programming

Parallel Programming Parallel Programming Introduction Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 Acknowledgements Prof. Felix Wolf, TU Darmstadt Prof. Matthias

More information

State-of-the-art in Heterogeneous Computing

State-of-the-art in Heterogeneous Computing State-of-the-art in Heterogeneous Computing Guest Lecture NTNU Trond Hagen, Research Manager SINTEF, Department of Applied Mathematics 1 Overview Introduction GPU Programming Strategies Trends: Heterogeneous

More information

Fundamentals of Computers Design

Fundamentals of Computers Design Computer Architecture J. Daniel Garcia Computer Architecture Group. Universidad Carlos III de Madrid Last update: September 8, 2014 Computer Architecture ARCOS Group. 1/45 Introduction 1 Introduction 2

More information

HPC with Multicore and GPUs

HPC with Multicore and GPUs HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville COSC 594 Lecture Notes March 22, 2017 1/20 Outline Introduction - Hardware

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Graphics Processor Acceleration and YOU

Graphics Processor Acceleration and YOU Graphics Processor Acceleration and YOU James Phillips Research/gpu/ Goals of Lecture After this talk the audience will: Understand how GPUs differ from CPUs Understand the limits of GPU acceleration Have

More information

INFRAGISTICS WPF 13.2 サービスリリースノート 2014 年 12 月

INFRAGISTICS WPF 13.2 サービスリリースノート 2014 年 12 月 INFRAGISTICS WPF 13.2 サービスリリースノート 2014 年 12 月 Infragistics WPF で実現する高度な BI ときれいなデスクトップ UI Infragistics WPF コントロールは タッチサポート機能 動的なテーマ 高パフォーマンスなアプリケーションを最小限の工数で 実現できる画期的なコントロールです インストール ダウンロード NetAdvantage

More information

The Art of Parallel Processing

The Art of Parallel Processing The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a

More information

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano Modeling and Simulation of System-on on-chip Platorms Donatella Sciuto 10/01/2007 Politecnico di Milano Dipartimento di Elettronica e Informazione Piazza Leonardo da Vinci 32, 20131, Milano Key SoC Market

More information

Introduction to Embedded Systems

Introduction to Embedded Systems Introduction to Embedded Systems Minsoo Ryu Hanyang University Outline 1. Definition of embedded systems 2. History and applications 3. Characteristics of embedded systems Purposes and constraints User

More information

フラクタル 1 ( ジュリア集合 ) 解説 : ジュリア集合 ( 自己平方フラクタル ) 入力パラメータの例 ( 小さな数値の変化で模様が大きく変化します. Ar や Ai の数値を少しずつ変化させて描画する. ) プログラムコード. 2010, AGU, M.

フラクタル 1 ( ジュリア集合 ) 解説 : ジュリア集合 ( 自己平方フラクタル ) 入力パラメータの例 ( 小さな数値の変化で模様が大きく変化します. Ar や Ai の数値を少しずつ変化させて描画する. ) プログラムコード. 2010, AGU, M. フラクタル 1 ( ジュリア集合 ) PictureBox 1 TextBox 1 TextBox 2 解説 : ジュリア集合 ( 自己平方フラクタル ) TextBox 3 複素平面 (= PictureBox1 ) 上の点 ( に対して, x, y) 初期値 ( 複素数 ) z x iy を決める. 0 k 1 z k 1 f ( z) z 2 k a 写像 ( 複素関数 ) (a : 複素定数

More information

AMD s Unified CPU & GPU Processor Concept

AMD s Unified CPU & GPU Processor Concept Advanced Seminar Computer Engineering Institute of Computer Engineering (ZITI) University of Heidelberg February 5, 2014 Overview 1 2 Current Platforms: 3 4 5 Architecture 6 2/37 Single-thread Performance

More information