Realtime Photometry System for AST3 (AST3_RPS)

1 Realtime Photometry System for AST3 (AST3_RPS)
Yu Ce (于策), yuce@tju.edu.cn
School of Computer Science and Technology, Tianjin University (天津大学)
Under the direction of Zhaohui Shang
Xi'an, Aug. 2010

2 Team Members
Astronomers: Zhaohui Shang, Jun Pan, Qiang Liu, Bin Ma
Members from TJU (HPC and Software):
Director: Prof. Jizhou Sun (孙济洲)
Wei Guo, Ce Yu, Jiyan Chen, Jizeng Wei
Wei Cao, Xuming Zhang, Jiliang Li, Wendong Kang, Mujin Yang, Shaofei Shi
/8/24

3 Outline
AST3_RPS Summary Report
Related R&D Works

5 AST3_RPS Overview
Supported by NSFC ( , )
Build a realtime photometry system for AST3
Image subtraction photometry (based on ISIS, etc.)

6 Provided by Prof. Shang

7 Challenges of AST3_RPS
Processing speed: realtime
The telescope produces a new image (200 MB) every 2.4 minutes, so the processing of each image must finish before the next image is produced.
Robustness: high reliability
AST3_RPS will run throughout the whole winter at Kunlun Station without human intervention, so the robustness of the system must be ensured.
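
As a rough illustration of the realtime constraint above, the following Python sketch derives the time budget and the minimum sustained data rate implied by the stated cadence (one 200 MB image every 2.4 minutes). The function name is ours and is not part of AST3_RPS; the calculation ignores any overhead outside the pipeline itself.

```python
def realtime_budget(image_mb: float, cadence_min: float) -> tuple[float, float]:
    """Return (seconds available per image, minimum sustained MB/s)."""
    seconds = cadence_min * 60.0
    return seconds, image_mb / seconds

# Figures taken from the slide: 200 MB per image, one image every 2.4 minutes.
budget_s, rate_mb_s = realtime_budget(200.0, 2.4)
print(f"budget: {budget_s:.0f} s per image, at least {rate_mb_s:.2f} MB/s sustained")
```

Any stage that cannot keep up with this rate on average would cause images to queue, which is why the later slides focus on the slowest operation.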

8 Work Summary
AST3_RPS (AST3 Realtime Photometry System)
Daemon
AST3 Pipelines
Algorithm Parallelization: GPU, SoC (System on Chip)
[Architecture diagram labels: Output; Photometry Pipelines; Parallel Processing; Input Image; AST3 Daemon Server; GPU/SoC; Runtime Environment (CPU)]

9 AST3_RPS
Daemon
AST3 Pipelines
Algorithm Parallelization: GPU, SoC (System on Chip)

10 AST3 Daemon Server (ADS)
[Flowchart labels: Loop and monitor; All ISIS Flow; China Research Laboratory; Periodic check; Timer; All jobs ok; Init runtime parameters; A new image; Add; Start a new job; Processes pool; Init; Abnormal job; Kill job; Log; Remove; Fork(); Log; Communication with daemon: this job terminated normally; End]
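
The flowchart labels above suggest a loop that starts one job per new image, checks the job pool periodically, kills abnormal jobs, and logs the outcome. The sketch below is one possible reading of that flow, not the ADS implementation: the class name `JobPool`, the timeout-based definition of an "abnormal" job, and the use of `subprocess.Popen` in place of the `Fork()` call shown on the slide are all assumptions made for illustration.

```python
import subprocess
import sys
import time

class JobPool:
    """Track one child process per image and kill jobs that overrun (hypothetical)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.jobs = {}   # image name -> (Popen handle, start time)
        self.log = []    # (image name, outcome) records

    def start(self, image: str, argv: list) -> None:
        # The slide shows Fork(); subprocess.Popen is used here instead.
        self.jobs[image] = (subprocess.Popen(argv), time.monotonic())

    def check(self) -> None:
        """One periodic check: log and remove finished jobs, kill overdue ones."""
        for image, (proc, started) in list(self.jobs.items()):
            code = proc.poll()
            if code is None:
                if time.monotonic() - started <= self.timeout_s:
                    continue              # still within its time budget
                proc.kill()               # abnormal job: kill it
                proc.wait()
                outcome = "killed"
            else:
                outcome = "ok" if code == 0 else "failed"
            self.log.append((image, outcome))
            del self.jobs[image]

def demo():
    """Run one quick job and one hung job; return the resulting log."""
    pool = JobPool(timeout_s=2.0)
    pool.start("img_ok", [sys.executable, "-c", "pass"])
    pool.start("img_hung", [sys.executable, "-c", "import time; time.sleep(60)"])
    while pool.jobs:                      # loop and monitor
        time.sleep(0.1)
        pool.check()
    return pool.log
```

In a deployment the timeout would presumably be tied to the 2.4-minute image cadence, and the child command would be the photometry pipeline rather than a placeholder Python one-liner.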

11 AST3 Daemon Server: self-protection
Cross protection between the daemon and a protect service: if one is terminated, the other restarts it.
[Diagram labels: Daemon; Protect service; Cron Daemon]
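
A minimal sketch of the cross-protection idea, assuming each process knows its partner's PID and has some way to relaunch it. The helper names `is_alive` and `guard` are hypothetical, and the liveness test relies on POSIX signal semantics; the slides do not say how ADS actually detects a dead partner.

```python
import os

def is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)          # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True              # the process exists but belongs to another user
    return True

def guard(partner_pid, restart_partner, alive=is_alive):
    """Restart the partner if it has died; return the PID to watch next."""
    if alive(partner_pid):
        return partner_pid
    return restart_partner()     # assumed to launch the partner and return its new PID
```

Both the daemon and the protect service would call `guard` on each other from their periodic timers, so that either one can bring the other back.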

12 AST3_RPS
Daemon
AST3 Pipelines
Algorithm Parallelization: GPU, SoC (System on Chip)

13 AST3 Pipeline
Supports two observing modes: tracking and drift scan
Two pipelines:
Produce the reference image
Find the variable objects

14 Produce the reference image
1. Flat-field correction
2. Use SExtractor to generate the star catalog
3. Use SCAMP to generate an ASCII file
4. Use CFITSIO to generate the FITS header
5. Co-add the reference images covering the same sky area
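
To make the final co-add step concrete, here is a small pure-Python sketch that combines already-aligned frames pixel by pixel. The slides do not state which combination rule AST3_RPS uses; the median is chosen here only as an assumption because it suppresses single-frame outliers, and the function name `coadd` is ours.

```python
from statistics import median

def coadd(frames):
    """Pixel-wise median of equally sized, aligned 2-D frames (lists of rows)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]

# Three tiny aligned frames; the third has two outlier pixels.
frames = [
    [[1, 2], [3, 4]],
    [[1, 2], [3, 4]],
    [[9, 2], [3, 100]],
]
reference = coadd(frames)
```

A production pipeline would operate on FITS arrays after astrometric registration rather than on nested lists.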

15 Find the variable objects
1. Flat-field correction
2. Use SExtractor to generate the star catalog
3. Use SCAMP to generate an ASCII file
4. Use CFITSIO to generate the FITS header
5. Cut the corresponding area out of the references to serve as the reference image
6. Image registration
7. Image subtraction
8. Find the variable objects
9. Photometry
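
The subtraction and detection steps can be illustrated with a deliberately simplified sketch. It assumes the new image and the reference are already registered and have matching point-spread functions; ISIS-style subtraction first convolves one image with a fitted kernel to achieve that match, which is omitted here. The function name and the fixed threshold are illustrative assumptions.

```python
def find_variables(new, ref, threshold):
    """Return (row, col, diff) for pixels where |new - ref| exceeds threshold."""
    hits = []
    for r, (new_row, ref_row) in enumerate(zip(new, ref)):
        for c, (n, m) in enumerate(zip(new_row, ref_row)):
            diff = n - m                  # image subtraction, pixel by pixel
            if abs(diff) > threshold:
                hits.append((r, c, diff))
    return hits

new_image = [[10, 10], [10, 50]]
ref_image = [[10, 11], [10, 10]]
candidates = find_variables(new_image, ref_image, threshold=5)
```

Photometry would then be measured on the difference image at each candidate position.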

16 AST3_RPS
Daemon
AST3 Pipelines
Algorithm Parallelization: GPU, SoC (System on Chip)

17 Performance bottleneck
In image subtraction photometry, the kernel_convolve operation is the performance bottleneck.
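
To show why a convolution step tends to dominate the run time, here is a direct 2-D convolution written in plain Python. It is a generic textbook formulation with zero padding, not the kernel_convolve routine from ISIS, and the function name `convolve2d` is ours.

```python
def convolve2d(image, kernel):
    """Direct 2-D convolution with zero padding; output has the image's shape."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2             # kernel centre offsets
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y + oy - j, x + ox - i   # flipped kernel indexing
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[j][i]
            out[y][x] = acc
    return out

# Convolving a unit impulse with a kernel reproduces the kernel.
impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = convolve2d(impulse, kernel)
```

The work grows as image height × image width × kernel height × kernel width, and every output pixel is independent of the others, which is what makes this operation a natural candidate for GPU or FPGA parallelization.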

18 GPU Result
GPU: NV GTX GFLOPS 1792MB Mem
CPU: i7 920, 4 cores, 2.67 GHz, 8 MB cache
Mem: 12 GB
The GPU can meet the processing-speed requirement, but its power consumption is a big problem.

19 SoC solution
[Block diagram labels: CPU; DMA; SRAM; FPGA; Bus; T*core; PCIE; DDR2 Controller; PCIE PHY; DDR2; Host]

20 GPU vs SoC
                     GPU                    SoC
Cost                 Low                    High
Development cycle    Short                  Long
Customization        No                     Yes
Power consumption    High (NV T2, 200W+)    Low (T*core, <2W)
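
The power figures in the comparison above can be turned into a rough energy cost per image. The sketch assumes, purely for illustration, that each device draws its quoted power for one full 2.4-minute cadence; the slides give only bounds ("200W+" and "<2W"), so the results are bounds as well. The function name is ours.

```python
def energy_per_image_j(power_w: float, cadence_min: float = 2.4) -> float:
    """Energy in joules if the device draws power_w for one full cadence."""
    return power_w * cadence_min * 60.0

gpu_j = energy_per_image_j(200.0)   # slide gives "200W+", so this is a lower bound
soc_j = energy_per_image_j(2.0)     # slide gives "<2W", so this is an upper bound
print(f"GPU: at least {gpu_j:.0f} J per image; SoC: under {soc_j:.0f} J per image")
```

Under these assumptions the gap is at least a factor of 100, which matters at an unattended station where power is scarce.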

21 Outline
AST3_RPS Summary Report
Related R&D Works

22 Research background
NAOC-TJU Joint Laboratory in Astro-Informatics
Founded in Nov members
Director: Jizhou Sun
Vice Directors: Ce Yu, Chenzhou Cui

23 Research topics
Astronomical computing
  Realtime photometry system for AST3 (AST3_RPS)
  China-VO
  Large-scale cross matching (MapReduce)
  Indexing massive astronomy data
  National Astronomical Observatories Science Data Center
Parallel programming theory and methods
  Visual modeling of parallel applications
  Source code skeleton generation

24 Computing resources
Tianhe-1, the first petascale supercomputer in China
National Supercomputing Center in Tianjin
Its first sub-center was founded at TJU in Jul. 2010
Daily management: HPC Lab
HPC Cloud of TJU
Under construction

26 Cooperation with IT companies
IBM: SUR projects and awards, on parallel computing
Google: course development, on cloud computing
Intel: joint lab and course development, on multicore
Dawning: R&D on parallel programming

27 Our goal: HPC support for astronomers
We are here

28 Thanks

Exploration of Cache Coherent CPU- FPGA Heterogeneous System

Exploration of Cache Coherent CPU- FPGA Heterogeneous System Exploration of Cache Coherent CPU- FPGA Heterogeneous System Wei Zhang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology 1 Outline ointroduction to FPGA-based

More information

XPU A Programmable FPGA Accelerator for Diverse Workloads

XPU A Programmable FPGA Accelerator for Diverse Workloads XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

More information

GPU > CPU. FOR HIGH PERFORMANCE COMPUTING PRESENTATION BY - SADIQ PASHA CHETHANA DILIP

GPU > CPU. FOR HIGH PERFORMANCE COMPUTING PRESENTATION BY - SADIQ PASHA CHETHANA DILIP GPU > CPU. FOR HIGH PERFORMANCE COMPUTING PRESENTATION BY - SADIQ PASHA CHETHANA DILIP INTRODUCTION or With the exponential increase in computational power of todays hardware, the complexity of the problem

More information

The Mont-Blanc approach towards Exascale

The Mont-Blanc approach towards Exascale http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are

More information

Godson Processor and its Application in High Performance Computers

Godson Processor and its Application in High Performance Computers Godson Processor and its Application in High Performance Computers Weiwu Hu Institute of Computing Technology, Chinese Academy of Sciences Loongson Technologies Corporation Limited hww@ict.ac.cn 1 Contents

More information

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM I5 AND I7 PROCESSORS Juan M. Cebrián 1 Lasse Natvig 1 Jan Christian Meyer 2 1 Depart. of Computer and Information

More information

Design of Large-scale Wire-speed Multicast Switching Fabric Based on Distributive Lattice

Design of Large-scale Wire-speed Multicast Switching Fabric Based on Distributive Lattice Design of Large-scale Wire-speed Multicast Switching Fabric Based on Distributive Lattice 1 CUI Kai, 2 LI Ke-dan, 1 CHEN Fu-xing, 1 ZHU Zhi-pu, 1 ZHU Yue-sheng 1. Shenzhen Eng. Lab of Converged Networks

More information

When MPPDB Meets GPU:

When MPPDB Meets GPU: When MPPDB Meets GPU: An Extendible Framework for Acceleration Laura Chen, Le Cai, Yongyan Wang Background: Heterogeneous Computing Hardware Trend stops growing with Moore s Law Fast development of GPU

More information

HPC and Big Data: Updates about China. Haohuan FU August 29 th, 2017

HPC and Big Data: Updates about China. Haohuan FU August 29 th, 2017 HPC and Big Data: Updates about China Haohuan FU August 29 th, 2017 1 Outline HPC and Big Data Projects in China Recent Efforts on Tianhe-2 Recent Efforts on Sunway TaihuLight 2 MOST HPC Projects 2016

More information

arxiv: v1 [physics.comp-ph] 4 Nov 2013

arxiv: v1 [physics.comp-ph] 4 Nov 2013 arxiv:1311.0590v1 [physics.comp-ph] 4 Nov 2013 Performance of Kepler GTX Titan GPUs and Xeon Phi System, Weonjong Lee, and Jeonghwan Pak Lattice Gauge Theory Research Center, CTP, and FPRD, Department

More information

Fast Snippet Generation. Hybrid System

Fast Snippet Generation. Hybrid System Huazhong University of Science and Technology Fast Snippet Generation Approach Based On CPU-GPU Hybrid System Ding Liu, Ruixuan Li, Xiwu Gu, Kunmei Wen, Heng He, Guoqiang Gao, Wuhan, China Outline Background

More information

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Juan C. Pichel Centro de Investigación en Tecnoloxías da Información (CITIUS) Universidade de Santiago de Compostela, Spain

More information

Parallel waveform extraction algorithms for the Cherenkov Telescope Array Real-Time Analysis

Parallel waveform extraction algorithms for the Cherenkov Telescope Array Real-Time Analysis Parallel waveform extraction algorithms for the Cherenkov Telescope Array Real-Time Analysis, a, Andrea Bulgarelli a, Adriano De Rosa a, Alessio Aboudan a, Valentina Fioretti a, Giovanni De Cesare a, Ramin

More information

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs IBM Research AI Systems Day DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs Xiaofan Zhang 1, Junsong Wang 2, Chao Zhu 2, Yonghua Lin 2, Jinjun Xiong 3, Wen-mei

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Seventh Edition By William Stallings Course Outline & Marks Distribution Hardware Before mid Memory After mid Linux

More information

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection Hiroyuki Usui, Jun Tanabe, Toru Sano, Hui Xu, and Takashi Miyamori Toshiba Corporation, Kawasaki, Japan Copyright 2013,

More information

Advanced School in High Performance and GRID Computing November Introduction to Grid computing.

Advanced School in High Performance and GRID Computing November Introduction to Grid computing. 1967-14 Advanced School in High Performance and GRID Computing 3-14 November 2008 Introduction to Grid computing. TAFFONI Giuliano Osservatorio Astronomico di Trieste/INAF Via G.B. Tiepolo 11 34131 Trieste

More information

Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud

Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud Doug Burger Director, Hardware, Devices, & Experiences MSR NExT November 15, 2015 The Cloud is a Growing Disruptor for HPC Moore s

More information

A MPI-based parallel pyramid building algorithm for large-scale RS image

A MPI-based parallel pyramid building algorithm for large-scale RS image A MPI-based parallel pyramid building algorithm for large-scale RS image Gaojin He, Wei Xiong, Luo Chen, Qiuyun Wu, Ning Jing College of Electronic and Engineering, National University of Defense Technology,

More information

Big Data Systems on Future Hardware. Bingsheng He NUS Computing

Big Data Systems on Future Hardware. Bingsheng He NUS Computing Big Data Systems on Future Hardware Bingsheng He NUS Computing http://www.comp.nus.edu.sg/~hebs/ 1 Outline Challenges for Big Data Systems Why Hardware Matters? Open Challenges Summary 2 3 ANYs in Big

More information

Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin

Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan Abstract String matching is the most

More information

Hardware Acceleration of Feature Detection and Description Algorithms on Low Power Embedded Platforms

Hardware Acceleration of Feature Detection and Description Algorithms on Low Power Embedded Platforms Hardware Acceleration of Feature Detection and Description Algorithms on LowPower Embedded Platforms Onur Ulusel, Christopher Picardo, Christopher Harris, Sherief Reda, R. Iris Bahar, School of Engineering,

More information

There s STILL plenty of room at the bottom! Andreas Olofsson

There s STILL plenty of room at the bottom! Andreas Olofsson There s STILL plenty of room at the bottom! Andreas Olofsson 1 Richard Feynman s Lecture (1959) There's Plenty of Room at the Bottom An Invitation to Enter a New Field of Physics Why cannot we write the

More information

HPC with GPU and its applications from Inspur. Haibo Xie, Ph.D

HPC with GPU and its applications from Inspur. Haibo Xie, Ph.D HPC with GPU and its applications from Inspur Haibo Xie, Ph.D xiehb@inspur.com 2 Agenda I. HPC with GPU II. YITIAN solution and application 3 New Moore s Law 4 HPC? HPC stands for High Heterogeneous Performance

More information

OpenPOWER Performance

OpenPOWER Performance OpenPOWER Performance Alex Mericas Chief Engineer, OpenPOWER Performance IBM Delivering the Linux ecosystem for Power SOLUTIONS OpenPOWER IBM SOFTWARE LINUX ECOSYSTEM OPEN SOURCE Solutions with full stack

More information

Can Memory-Less Network Adapters Benefit Next-Generation InfiniBand Systems?

Can Memory-Less Network Adapters Benefit Next-Generation InfiniBand Systems? Can Memory-Less Network Adapters Benefit Next-Generation InfiniBand Systems? Sayantan Sur, Abhinav Vishnu, Hyun-Wook Jin, Wei Huang and D. K. Panda {surs, vishnu, jinhy, huanwei, panda}@cse.ohio-state.edu

More information

Goro Watanabe. Bill King. OOW 2013 The Best Platform for Big Data and Oracle Database 12c. EVP Fujitsu R&D Center North America

Goro Watanabe. Bill King. OOW 2013 The Best Platform for Big Data and Oracle Database 12c. EVP Fujitsu R&D Center North America OOW 2013 The Best Platform for Big Data and Oracle Database 12c Goro Watanabe EVP Fujitsu R&D Center North America Bill King EVP Platform Products Group Fujitsu America, Inc. Overview 1. Fujitsu: Quick

More information

Overview of Tianhe-2

Overview of Tianhe-2 Overview of Tianhe-2 (MilkyWay-2) Supercomputer Yutong Lu School of Computer Science, National University of Defense Technology; State Key Laboratory of High Performance Computing, China ytlu@nudt.edu.cn

More information

Intel MIC Programming Workshop, Hardware Overview & Native Execution LRZ,

Intel MIC Programming Workshop, Hardware Overview & Native Execution LRZ, Intel MIC Programming Workshop, Hardware Overview & Native Execution LRZ, 27.6.- 29.6.2016 1 Agenda Intro @ accelerators on HPC Architecture overview of the Intel Xeon Phi Products Programming models Native

More information

Tiny GPU Cluster for Big Spatial Data: A Preliminary Performance Evaluation

Tiny GPU Cluster for Big Spatial Data: A Preliminary Performance Evaluation Tiny GPU Cluster for Big Spatial Data: A Preliminary Performance Evaluation Jianting Zhang 1,2 Simin You 2, Le Gruenwald 3 1 Depart of Computer Science, CUNY City College (CCNY) 2 Department of Computer

More information

Spiral. Computer Generation of Performance Libraries. José M. F. Moura Markus Püschel Franz Franchetti & the Spiral Team. Performance.

Spiral. Computer Generation of Performance Libraries. José M. F. Moura Markus Püschel Franz Franchetti & the Spiral Team. Performance. Spiral Computer Generation of Performance Libraries José M. F. Moura Markus Püschel Franz Franchetti & the Spiral Team Platforms Performance Applications What is Spiral? Traditionally Spiral Approach Spiral

More information

MultiDroid: A Novel Solution to Consolidate Interactive Physical Android Clients on One Single Computing Platform

MultiDroid: A Novel Solution to Consolidate Interactive Physical Android Clients on One Single Computing Platform MultiDroid: A Novel Solution to Consolidate Interactive Physical Android Clients on One Single Computing Platform Bin Yang Shoumeng, Yan Intel R&D Center Intel Labs Agenda Background and Scenarios Solution

More information

Advances of parallel computing. Kirill Bogachev May 2016

Advances of parallel computing. Kirill Bogachev May 2016 Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being

More information

A Preliminary evalua.on of OpenPOWER through op.mizing stencil based algorithms

A Preliminary evalua.on of OpenPOWER through op.mizing stencil based algorithms A Preliminary evalua.on of OpenPOWER through op.mizing stencil based algorithms Speaker: Jingheng Xu Tsinghua University Revolu'onizing the Datacenter Join the Conversa'on #OpenPOWERSummit Contents 1 About

More information

An approach to provide remote access to GPU computational power

An approach to provide remote access to GPU computational power An approach to provide remote access to computational power University Jaume I, Spain Joint research effort 1/84 Outline computing computing scenarios Introduction to rcuda rcuda structure rcuda functionality

More information

GPU for HPC. October 2010

GPU for HPC. October 2010 GPU for HPC Simone Melchionna Jonas Latt Francis Lapique October 2010 EPFL/ EDMX EPFL/EDMX EPFL/DIT simone.melchionna@epfl.ch jonas.latt@epfl.ch francis.lapique@epfl.ch 1 Moore s law: in the old days,

More information

Lecture 1: Gentle Introduction to GPUs

Lecture 1: Gentle Introduction to GPUs CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 1: Gentle Introduction to GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Who Am I? Mohamed

More information

HW Trends and Architectures

HW Trends and Architectures Pavel Tvrdík, Jiří Kašpar (ČVUT FIT) HW Trends and Architectures MI-POA, 2011, Lecture 1 1/29 HW Trends and Architectures prof. Ing. Pavel Tvrdík CSc. Ing. Jiří Kašpar Department of Computer Systems Faculty

More information

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Network

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Network Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Network Chen Zhang 1, Peng Li 3, Guangyu Sun 1,2, Yijin Guan 1, Bingjun Xiao 3, Jason Cong 1,2,3 1 Peking University 2 PKU/UCLA Joint

More information

Studying GPU based RTC for TMT NFIRAOS

Studying GPU based RTC for TMT NFIRAOS Studying GPU based RTC for TMT NFIRAOS Lianqi Wang Thirty Meter Telescope Project RTC Workshop Dec 04, 2012 1 Outline Tomography with iterative algorithms on GPUs Matri vector multiply approach Assembling

More information

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of

More information

Intel MIC Programming Workshop, Hardware Overview & Native Execution. IT4Innovations, Ostrava,

Intel MIC Programming Workshop, Hardware Overview & Native Execution. IT4Innovations, Ostrava, , Hardware Overview & Native Execution IT4Innovations, Ostrava, 3.2.- 4.2.2016 1 Agenda Intro @ accelerators on HPC Architecture overview of the Intel Xeon Phi (MIC) Programming models Native mode programming

More information

Placement de processus (MPI) sur architecture multi-cœur NUMA

Placement de processus (MPI) sur architecture multi-cœur NUMA Placement de processus (MPI) sur architecture multi-cœur NUMA Emmanuel Jeannot, Guillaume Mercier LaBRI/INRIA Bordeaux Sud-Ouest/ENSEIRB Runtime Team Lyon, journées groupe de calcul, november 2010 Emmanuel.Jeannot@inria.fr

More information

Design of analog acquisition and storage system about airborne flight data recorder

Design of analog acquisition and storage system about airborne flight data recorder 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Design of analog acquisition and storage system about airborne flight data recorder Changyou Li 1, a, Pengfei Sun 1, b

More information

How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC

How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC Three Consortia Formed in Oct 2016 Gen-Z Open CAPI CCIX complex to rack scale memory fabric Cache coherent accelerator

More information

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Li Chen, Staff AE Cadence China Agenda Performance Challenges Current Approaches Traffic Profiles Intro Traffic Profiles Implementation

More information

Accelerating Business Analytics with Flash Storage and FPGAs

Accelerating Business Analytics with Flash Storage and FPGAs Accelerating Business Analytics with Flash Storage and FPGAs Satoru Watanabe Center for Technology Innovation - Information and Telecommunications Hitachi, Ltd., Research and Development Group Aug.10 2016

More information

Outline. The demand The San Jose NAP. What s the Problem? Most things. Time. Part I AN OVERVIEW OF HARDWARE ISSUES FOR IP AND ATM.

Outline. The demand The San Jose NAP. What s the Problem? Most things. Time. Part I AN OVERVIEW OF HARDWARE ISSUES FOR IP AND ATM. Outline AN OVERVIEW OF HARDWARE ISSUES FOR IP AND ATM Name one thing you could achieve with ATM that you couldn t with IP! Nick McKeown Assistant Professor of Electrical Engineering and Computer Science

More information

Adaptive selfcalibration for Allen Telescope Array imaging

Adaptive selfcalibration for Allen Telescope Array imaging Adaptive selfcalibration for Allen Telescope Array imaging Garrett Keating, William C. Barott & Melvyn Wright Radio Astronomy laboratory, University of California, Berkeley, CA, 94720 ABSTRACT Planned

More information

GPU-ACCELERATED SPECKLE MASKING RECONSTRUCTION ALGORITHM

GPU-ACCELERATED SPECKLE MASKING RECONSTRUCTION ALGORITHM Journal of the Korean Astronomical Society https://doi.org/10.5303/jkas.2018.51.3.65 51: 65 71, 2018 June pissn: 1225-4614 eissn: 2288-890X c 2018. The Korean Astronomical Society. All rights reserved.

More information

INSPUR HPC Development

INSPUR HPC Development INSPUR HPC Development -Industrial Perspective of China Zilong (DAVID) XU Contents 1. Introduction of INSPUR 2. Introduction of INSPUR HPC 3. Being A Responsible HPC Company 2 Contents 1. Introduction

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

Trends in HPC (hardware complexity and software challenges)

Trends in HPC (hardware complexity and software challenges) Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18

More information

Jacquard Control System of Warp Knitting Machine Based on Embedded System

Jacquard Control System of Warp Knitting Machine Based on Embedded System IOP Conference Series: Materials Science and Engineering PAPER OPEN ACCESS Control System of Warp Knitting Machine Based on Embedded System To cite this article: Li Ce et al 2018 IOP Conf. Ser.: Mater.

More information

Building Energy Saving Configuration Software Data Processing System

Building Energy Saving Configuration Software Data Processing System 2017 3rd International Conference on Electronic Information Technology and Intellectualization (ICEITI 2017) ISBN: 978-1-60595-512-4 Building Energy Saving Configuration Software Data Processing System

More information

The Future of High- Performance Computing

The Future of High- Performance Computing Lecture 26: The Future of High- Performance Computing Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2017 Comparing Two Large-Scale Systems Oakridge Titan Google Data Center Monolithic

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Ninth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Maximizing heterogeneous system performance with ARM interconnect and CCIX

Maximizing heterogeneous system performance with ARM interconnect and CCIX Maximizing heterogeneous system performance with ARM interconnect and CCIX Neil Parris, Director of product marketing Systems and software group, ARM Teratec June 2017 Intelligent flexible cloud to enable

More information

NVIDIA s Compute Unified Device Architecture (CUDA)

NVIDIA s Compute Unified Device Architecture (CUDA) NVIDIA s Compute Unified Device Architecture (CUDA) Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability 1 History of GPU

More information

NVIDIA s Compute Unified Device Architecture (CUDA)

NVIDIA s Compute Unified Device Architecture (CUDA) NVIDIA s Compute Unified Device Architecture (CUDA) Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability History of GPU

More information

PARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort

PARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort PARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort rob@cs.vu.nl Schedule 2 1. Introduction, performance metrics & analysis 2. Many-core hardware 3. Cuda class 1: basics 4. Cuda class

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

The Impact of Inter-node Latency versus Intra-node Latency on HPC Applications The 23 rd IASTED International Conference on PDCS 2011

The Impact of Inter-node Latency versus Intra-node Latency on HPC Applications The 23 rd IASTED International Conference on PDCS 2011 The Impact of Inter-node Latency versus Intra-node Latency on HPC Applications The 23 rd IASTED International Conference on PDCS 2011 HPC Scale Working Group, Dec 2011 Gilad Shainer, Pak Lui, Tong Liu,

More information

International Conference on Information Sciences, Machinery, Materials and Energy (ICISMME 2015)

International Conference on Information Sciences, Machinery, Materials and Energy (ICISMME 2015) International Conference on Information Sciences, Machinery, Materials and Energy (ICISMME 2015) ARINC - 429 airborne communications transceiver system based on FPGA implementation Liu Hao 1,Gu Cao 2,MA

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Outline History & Motivation Architecture Core architecture Network Topology Memory hierarchy Brief comparison to GPU & Tilera Programming Applications

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers William Stallings Computer Organization and Architecture 8 th Edition Chapter 18 Multicore Computers Hardware Performance Issues Microprocessors have seen an exponential increase in performance Improved

More information

NAMD GPU Performance Benchmark. March 2011

NAMD GPU Performance Benchmark. March 2011 NAMD GPU Performance Benchmark March 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Dell, Intel, Mellanox Compute resource - HPC Advisory

More information

MANY-CORE COMPUTING. 7-Oct Ana Lucia Varbanescu, UvA. Original slides: Rob van Nieuwpoort, escience Center

MANY-CORE COMPUTING. 7-Oct Ana Lucia Varbanescu, UvA. Original slides: Rob van Nieuwpoort, escience Center MANY-CORE COMPUTING 7-Oct-2013 Ana Lucia Varbanescu, UvA Original slides: Rob van Nieuwpoort, escience Center Schedule 2 1. Introduction, performance metrics & analysis 2. Programming: basics (10-10-2013)

More information

Arguably one of the most fundamental discipline that touches all other disciplines and people

Arguably one of the most fundamental discipline that touches all other disciplines and people The scientific and mathematical approach in information technology and computing Started in the 1960s from Mathematics or Electrical Engineering Today: Arguably one of the most fundamental discipline that

More information
