End User Update: High-Performance Reconfigurable Computing
|
|
- Felicia Banks
- 6 years ago
- Views:
Transcription
1 End User Update: High-Performance Reconfigurable Computing Tarek El-Ghazawi Director, GW Institute for Massively Parallel Applications and Computing Technologies(IMPACT) Co-Director, NSF Center for High-Performance Reconfigurable Computing (CHREC) The George Washington University hpcl.gwu.edu
2 Paul Muzio s Outline! Performance What hardware accelerators are you using/evaluating? Describe the applications that you are porting to accelerators? What kinds of speed-ups are you seeing (provide the basis for the comparison)? How does it compare to scaling out (i.e., just using more X86 processors)? What are the bottlenecks to further performance improvements? Economics Describe the programming effort required to make use of the accelerator. Amortization Compare accelerator cost to scaling out cost Ease of use issues Futures What is the future direction of hardware based accelerators? Software futures? What are your thoughts on what the vendors need to do to ensure wider acceptance of accelerators? 2
3 Why Accelerators: A Historical Perspective Performance Vector Machines Massively Parallel Processors MPPs with Multicores and Heterogeneous Accelerators HPCC End of Moore s Law in Clocking! Hopes are in Architecture! Time 3
4 Which Accelerators? We considered HPRCs more than anything else To be addressed today We are increasingly using GPUs Some Cell 4
5 High-Performance Reconfigurable Computing (HPRC) 5 IEEE Computer, March 2007 High-Performance Reconfigurable Computers are parallel computing systems that contain multiple microprocessors and multiple FPGAs. In current settings, the design uses FPGAs as coprocessors that are deployed to execute the small portion of the application that takes most of the time under the rule, the 10 percent of code that takes 90 percent of the execution time.
6 Evaluated FPGA-Accelerated Systems Altix-4700 XD1 SRC- 6E Altix-350 SRC- 6 HC-36 6
7 An Architectural Classification for Hardware Accelerated High-Performance Computers El-Ghazawi et. al. The Performance Potential of HPRCs. IEEE Computer, February 2008
8 Uniform Nodes Non-Uniform System (UNNS) μp 1 μp N μp 1 μp N RP 1 RP N RP 1 RP N μp Node μp Node RP Node RP Node IN and/or GSM HPRC Examples: SRC 6/7, SGI Altix/RC100 Systems 8
9 Non-Uniform Nodes Uniform System (NNUS) μp RP μp RP IN and/or GSM HPRC Example: Cray XD1, Cray XT5h 9
10 Applications and Performance Cryptography, Remote Sensing and Bioinformatics
11 Hyperspectral Dimension Reduction Multi-Spectral Imagery 10 s of bands (MODIS 36 bands, SeaWiFS 8 bands, IKONOS 5 bands) Hyperspectral Imagery 100 s-1000 s of bands (AVIRIS 224 bands, AIRS 2378 bands) Challenges (Curse of Dimensionality) Solution Dimension Reduction Multispectral / Hyperspectral Imagery Comparison 11
12 Hyperspectral Dimension Reduction (Techniques) Principal Component Analysis (PCA): Most Common Method Dimension Reduction Complex and Global computations: difficult for parallel processing and hardware implementations Multi-Resolution Wavelet Decomposition of Each Pixel 1-D Spectral Signature (Preservation of Spectral Locality) Wavelet-Based Dimension Reduction*: Simple and Local Operations High-Performance Implementation * S. Kaewpijit, J. Le Moigne, T. El-Ghazawi, Automatic Reduction of Hyperspectral Imagery Using Wavelet Spectral Analysis, IEEE Transactions on Geoscience and Remote Sensing, Vol. 41, No. 4, April, 2003, pp
13 Wavelet-Based Dimension Reduction (Execution Profiles on SRC) Total Execution Time = sec (Pentium4, 1.8GHz) Total Execution Time = 1.67 sec (SRC-6E, P3) Speedup = x (without-streaming) Speedup = x (with-streaming) Total Execution Time = 0.84 sec (SRC-6) Speedup = x (without-streaming) Speedup = x (with-streaming) 13
14 Cloud Detection Band 2 (Green Band) Band 3 (Red Band) Band 4 (Near-IR Band) Band 5 (Mid-IR Band) Band 6 (Thermal IR Band) Software/Reference Mask Hardware Floating-Point Mask (Approximate Normalization) Hardware Fixed-Point Mask (Approximate Normalization) 14
15 15 G C T A T T G G - 0 G A T A C T T T - Protein and DNA Matching-The Scoring Matrix G C T A T T G G G A T A C T T T - 0 _ 1, _ 1,, 1 1, max, penalty gap j i F penalty gap j i F y x s j i F j i F j i
16 Application Smith-Waterman (DNA Sequencing) DES Breaker IDEA Breaker RC5(32/12/16) Breaker Savings of HPRC (Based on one Altix U rack) Speedup Cost Savings 22x 96x 2x 17x Assumptions 100% cluster efficiency Cost Factor P : RP 1 : 400 Power Factor P : RP 1 : U Rack: 1230 W µp board (with two µps): 220 W Size Factor P : RP 1 : 34.5 Cluster of 100 µps = four 19-inch racks» footprint = 6 square feet Reconfigurable computer (10U)» footprint = 2.07 square feet SAVINGS Power Savings 779x 3439x Tarek El-Ghazawi, GWU 16 HPC User Forum, Roanoke April 21, x 610x Size Reduction 253x 1116x 28x 198x
17 Smith-Waterman (DNA Sequencing) IDEA Breaker RC5(32/8/8) Breaker Hyperspectral Dimension Reduction Application DES Breaker Savings of HPRC (Based on one Cray-XD1 chassis) Speedup Cloud Detection 110 Assumptions 100% cluster efficiency Cost Factor P : RP 1 : 100 Power Factor P : RP 1 : 20 Cost Savings 28x 122x 24x 23x 1x 1x SAVINGS Power Savings 140x 608x 120x 116x Reconfigurable processor (based on one XD1 Chassis): 2200 W µp board (with two µps): 220 W Size Factor P : RP 1 : 95.8 Cluster of 100 µps = four 19-inch racks» footprint = 6 square feet Reconfigurable computer (one XD1 Chassis)» footprint = 5.75 square feet Tarek El-Ghazawi, GWU 17 HPC User Forum, Roanoke April 21, x 5x Size Reduction 29x 127x 25x 24x 1x 1x
18 Application Smith-Waterman (DNA Sequencing) DES Breaker IDEA Breaker RC5(32/12/16) Breaker Hyperspectral Dimension Reduction Cloud Detection 28 Assumptions 100% cluster efficiency Cost Factor P : RP 1 : 200 Power Factor P : RP 1 : 3.64 Savings of HPRC (Based on SRC-6) Speedup Cost Savings 6x 34x 3x 6x 0.16x 0.14x Reconfigurable processor (based on SRC-6): 200 W µp board (with two µps): 220 W Size Factor P : RP 1 : 33.3 Cluster of 100 µps = four 19-inch racks» footprint = 6 square feet SAVINGS Power Savings 313x 1856x 176x 313x Reconfigurable computer (SRC MAPstation TM ) Tarek El-Ghazawi,» footprint GWU = 1 square feet 18 HPC User Forum, Roanoke April 21, x 8x Size Reduction 34x 203x 19x 34x 0.96x 0.84x
19 Programming
20 Historical Perspective Technology In-Socket Accelerators (65 nm) Circuit Designers Users Tools Evolution New Methodologies, Programming Models and IDEs Platform Specifications & Parallel SW Languages Improved-HLLs Domain Scientists RC-Aware Domain Scientists Embedded Processors & Transceivers (90 nm) Embedded Software Engineers Embedded System Designers DSP Slice & Dual Port Block RAM (130 nm) HW/SW Codesign Embedded & DSP IDEs HLLs IP Core Generators HDLs Schematics RTL DSP & Networking Designers Logic Fabric (180 nm) HPRC PSoC Custom Comp. Glue Logic Glue Logic Applications Custom Comp. PSoC HPRC 20 Tarek El-Ghazawi, GWU 20 HPC User Forum, Roanoke April 21, 2008
21 Programming HPRCs 21
22 Productivity Analysis of Existing Tools Tools considered Impulse-C Handel-C Carte-C Mitrion-C SysGen RC-Toolbox HDLs Utility Frequency Area Cost Acquisition time Learning time Development time Results excerpted from GWU papers in SPL 07 and FPT 07 conferences. 22 Tarek El-Ghazawi, GWU 22 HPC User Forum, Roanoke April 21, 2008
23 Future Hardware Development More use of socket-based integration Better integration with memory hierarchy Better accelerators, in the FPGA side More computationally oriented/floating-point cores? Coarser grain FPGAs? On-chip FPGAs and accelerators? 23
24 Parallelism Concepts: From Systems to Commodity Chips Early 1970s First Vector and SIMD Systems CDC STAR-100 TI ASC ILLIAC IV First MIMD System CMU C.mmp (16 PDP11s) 1985 FPGA Xilinx 1998 HPRC SRC SIMD Altivec By Apple, IBM, and Motorola 2001 Vector Processor/ SIMD CELL BE GPGPUs NVIDIA and AMD Mulicore CPUs IBM Power 4 Time Coming soon? Hybrid- Reconfigurable/ Chip? Accelerators as cores? 24
25 Future Software More user/application-centric programming Unified parallel programming interface? More efficient compiling Tools for accelerator-gpp application co-design Virtualization for ease-of-use and portability HELP MAY BE COMING? 25
26 DARPA Studies DARPA is looking at bridging productivity gap for FPGAs NSF CHREC Schools (UF, GWU) and (BYU, VT) conducted a DARPA study DARPA has at least one more ongoing study Are we going to see any BAAs? 26
27 27
28 Conclusions Lots of common issues among accelerators For the applications that they can do well, they do really well! FPGAs were not built originally for computing Limited applications Less than user friendly interfaces Very long time for compiling Programming languages expose a restrictive view of the system and are often hardware oriented, Need a single system wide language paradigm A major bottleneck is data transfer rates between the microprocessor and the FPGA More work is needed on how to manage heterogeneity Virtualization for portability and ease of use Advanced programming models based on parallel computing New tools for performance tuning and debugging in heterogeneous environments Better integration into memory hierarchy The above requires fundamental work that will be unlikely supported by vendors alone, it needs a for example DARPA Driven Industry/University effort (like HPCS) Tarek El-Ghazawi, GWU 28 HPC User Forum, Roanoke April 21, 2008
Developing Applications for HPRCs
Developing Applications for HPRCs Esam El-Araby The George Washington University Acknowledgement Prof.\ Tarek El-Ghazawi Mohamed Taher ARSC SRC SGI Cray 2 Outline Background Methodology A Case Studies
More informationIn the past few years, high-performance computing. The Promise of High-Performance Reconfigurable Computing
R E S E A R C H F E A T U R E The Promise of High-Performance Reconfigurable Computing Tarek El-Ghazawi, Esam El-Araby, and Miaoqing Huang, George Washington University Kris Gaj, George Mason University
More informationIn the past few years, high-performance computing. The Promise of High-Performance Reconfigurable Computing
R E S E A R C H F E A T U R E The Promise of High-Performance Reconfigurable Computing Tarek El-Ghazawi, Esam El-Araby, and Miaoqing Huang, George Washington University Kris Gaj, George Mason University
More informationIntroduction to GPU computing
Introduction to GPU computing Nagasaki Advanced Computing Center Nagasaki, Japan The GPU evolution The Graphic Processing Unit (GPU) is a processor that was specialized for processing graphics. The GPU
More informationMaster s Thesis Presentation Hoang Le Director: Dr. Kris Gaj
Master s Thesis Presentation Hoang Le Director: Dr. Kris Gaj Outline RSA ECM Reconfigurable Computing Platforms, Languages and Programming Environments Partitioning t ECM Code between HDLs and HLLs Implementation
More informationAn Implementation Comparison of an IDEA Encryption Cryptosystem on Two General-Purpose Reconfigurable Computers
An Implementation Comparison of an IDEA Encryption Cryptosystem on Two General-Purpose Reconfigurable Computers Allen Michalski 1, Kris Gaj 1, Tarek El-Ghazawi 2 1 ECE Department, George Mason University
More informationWhat does Heterogeneity bring?
What does Heterogeneity bring? Ken Koch Scientific Advisor, CCS-DO, LANL LACSI 2006 Conference October 18, 2006 Some Terminology Homogeneous Of the same or similar nature or kind Uniform in structure or
More informationReconfigurable Hardware Implementation of Mesh Routing in the Number Field Sieve Factorization
Reconfigurable Hardware Implementation of Mesh Routing in the Number Field Sieve Factorization Sashisu Bajracharya, Deapesh Misra, Kris Gaj George Mason University Tarek El-Ghazawi The George Washington
More informationBig Data Meets High-Performance Reconfigurable Computing
Big Data Meets High-Performance Reconfigurable Computing UF Workshop on Dense, Intense, and Complex Data Alan George CHREC Center Director Herman Lam CHREC Center Associate Director June 19, 2013 What
More informationA Framework to Improve IP Portability on Reconfigurable Computers
A Framework to Improve IP Portability on Reconfigurable Computers Miaoqing Huang, Ivan Gonzalez, Sergio Lopez-Buedo, and Tarek El-Ghazawi NSF Center for High-Performance Reconfigurable Computing (CHREC)
More informationThis article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution
More informationRequirements for Scalable Application Specific Processing in Commercial HPEC
Requirements for Scalable Application Specific Processing in Commercial HPEC Steven Miller Silicon Graphics, Inc. Phone: 650-933-1899 Email Address: scm@sgi.com Abstract: More and more High Performance
More informationTrends and Challenges in Multicore Programming
Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores
More informationTools for Reconfigurable Supercomputing. Kris Gaj George Mason University
Tools for Reconfigurable Supercomputing Kris Gaj George Mason University 1 Application Development for Reconfigurable Computers Program Entry Platform mapping Debugging & Verification Compilation Execution
More information"On the Capability and Achievable Performance of FPGAs for HPC Applications"
"On the Capability and Achievable Performance of FPGAs for HPC Applications" Wim Vanderbauwhede School of Computing Science, University of Glasgow, UK Or in other words "How Fast Can Those FPGA Thingies
More informationParallel Programming of High-Performance Reconfigurable Computing Systems with Unified Parallel C
Parallel Programming of High-Performance Reconfigurable Computing Systems with Unified Parallel C Tarek El-Ghazawi, Olivier Serres, Samy Bahra, Miaoqing Huang and Esam El-Araby Department of Electrical
More informationIntegrating FPGAs in High Performance Computing A System, Architecture, and Implementation Perspective
Integrating FPGAs in High Performance Computing A System, Architecture, and Implementation Perspective Nathan Woods XtremeData FPGA 2007 Outline Background Problem Statement Possible Solutions Description
More informationImplementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer
Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Sashisu Bajracharya, Chang Shu, Kris Gaj George Mason University Tarek El-Ghazawi The George
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationComputing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany
Computing on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Summary: The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been
More informationHeterogenous Computing
Heterogenous Computing Fall 2018 CS, SE - Freshman Seminar 11:00 a 11:50a Computer Architecture What are the components of a computer? How do these components work together to perform computations? How
More informationIMPLICIT+EXPLICIT Architecture
IMPLICIT+EXPLICIT Architecture Fortran Carte Programming Environment C Implicitly Controlled Device Dense logic device Typically fixed logic µp, DSP, ASIC, etc. Implicit Device Explicit Device Explicitly
More informationWhite Paper Assessing FPGA DSP Benchmarks at 40 nm
White Paper Assessing FPGA DSP Benchmarks at 40 nm Introduction Benchmarking the performance of algorithms, devices, and programming methodologies is a well-worn topic among developers and research of
More informationECE 699: Lecture 12. Introduction to High-Level Synthesis
ECE 699: Lecture 12 Introduction to High-Level Synthesis Required Reading The ZYNQ Book Chapter 14: Spotlight on High-Level Synthesis Chapter 15: Vivado HLS: A Closer Look S. Neuendorffer and F. Martinez-Vallina,
More informationCSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
More informationMulticore Computing and Scientific Discovery
scientific infrastructure Multicore Computing and Scientific Discovery James Larus Dennis Gannon Microsoft Research In the past half century, parallel computers, parallel computation, and scientific research
More informationCASE STUDY: Using Field Programmable Gate Arrays in a Beowulf Cluster
CASE STUDY: Using Field Programmable Gate Arrays in a Beowulf Cluster Mr. Matthew Krzych Naval Undersea Warfare Center Phone: 401-832-8174 Email Address: krzychmj@npt.nuwc.navy.mil The Robust Passive Sonar
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationThe future is parallel but it may not be easy
The future is parallel but it may not be easy Michael J. Flynn Maxeler and Stanford University M. J. Flynn 1 HiPC Dec 07 Outline I The big technology tradeoffs: area, time, power HPC: What s new at the
More informationReconfigurable Computing - (RC)
Reconfigurable Computing - (RC) Yogindra S Abhyankar Hardware Technology Development Group, C-DAC Outline Motivation Architecture Applications Performance Summary HPC Fastest Growing Sector HPC, the massive
More informationHow to Write Fast Code , spring th Lecture, Mar. 31 st
How to Write Fast Code 18-645, spring 2008 20 th Lecture, Mar. 31 st Instructor: Markus Püschel TAs: Srinivas Chellappa (Vas) and Frédéric de Mesmay (Fred) Introduction Parallelism: definition Carrying
More information! Readings! ! Room-level, on-chip! vs.!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads
More informationNCBI BLAST accelerated on the Mitrion Virtual Processor
NCBI BLAST accelerated on the Mitrion Virtual Processor Why FPGAs? FPGAs are 10-30x faster than a modern Opteron or Itanium Performance gap is likely to grow further in the future Full performance at low
More informationChallenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008
Michael Doggett Graphics Architecture Group April 2, 2008 Graphics Processing Unit Architecture CPUs vsgpus AMD s ATI RADEON 2900 Programming Brook+, CAL, ShaderAnalyzer Architecture Challenges Accelerated
More informationModern Processor Architectures. L25: Modern Compiler Design
Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationReconfigurable Computing. Introduction
Reconfigurable Computing Tony Givargis and Nikil Dutt Introduction! Reconfigurable computing, a new paradigm for system design Post fabrication software personalization for hardware computation Traditionally
More informationExperts in Application Acceleration Synective Labs AB
Experts in Application Acceleration 1 2009 Synective Labs AB Magnus Peterson Synective Labs Synective Labs quick facts Expert company within software acceleration Based in Sweden with offices in Gothenburg
More informationModern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design
Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant
More informationSupport for Programming Reconfigurable Supercomputers
Support for Programming Reconfigurable Supercomputers Miriam Leeser Nicholas Moore, Albert Conti Dept. of Electrical and Computer Engineering Northeastern University Boston, MA Laurie Smith King Dept.
More informationOverview. CS 472 Concurrent & Parallel Programming University of Evansville
Overview CS 472 Concurrent & Parallel Programming University of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information Science, University
More informationHeterogeneous Processing
Heterogeneous Processing Maya Gokhale maya@lanl.gov Outline This talk: Architectures chip node system Programming Models and Compiler Overview Applications Systems Software Clock rates and transistor counts
More informationEvolution of Computers & Microprocessors. Dr. Cahit Karakuş
Evolution of Computers & Microprocessors Dr. Cahit Karakuş Evolution of Computers First generation (1939-1954) - vacuum tube IBM 650, 1954 Evolution of Computers Second generation (1954-1959) - transistor
More informationParallel Computing Why & How?
Parallel Computing Why & How? Xing Cai Simula Research Laboratory Dept. of Informatics, University of Oslo Winter School on Parallel Computing Geilo January 20 25, 2008 Outline 1 Motivation 2 Parallel
More informationThe Heterogeneous Programming Jungle. Service d Expérimentation et de développement Centre Inria Bordeaux Sud-Ouest
The Heterogeneous Programming Jungle Service d Expérimentation et de développement Centre Inria Bordeaux Sud-Ouest June 19, 2012 Outline 1. Introduction 2. Heterogeneous System Zoo 3. Similarities 4. Programming
More informationSupercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?
Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, Mateo Valero SC 13, November 19 th 2013, Denver, CO, USA
More informationProgramming Soft Processors in High Performance Reconfigurable Computing
Programming Soft Processors in High Performance Reconfigurable Computing Andrew W. H. House and Paul Chow Department of Electrical and Computer Engineering University of Toronto Toronto, ON, Canada M5S
More informationAddressing Heterogeneity in Manycore Applications
Addressing Heterogeneity in Manycore Applications RTM Simulation Use Case stephane.bihan@caps-entreprise.com Oil&Gas HPC Workshop Rice University, Houston, March 2008 www.caps-entreprise.com Introduction
More informationMetropolitan Road Traffic Simulation on FPGAs
Metropolitan Road Traffic Simulation on FPGAs Justin L. Tripp, Henning S. Mortveit, Anders Å. Hansson, Maya Gokhale Los Alamos National Laboratory Los Alamos, NM 85745 Overview Background Goals Using the
More informationUsing FPGAs in Supercomputing Reconfigurable Supercomputing
Using FPGAs in Supercomputing Reconfigurable Supercomputing Why FPGAs? FPGAs are 10 100x faster than a modern Itanium or Opteron Performance gap is likely to grow further in the future Several major vendors
More informationTutorial. Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers
Tutorial Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers Dan Stanzione, Lars Koesterke, Bill Barth, Kent Milfeld dan/lars/bbarth/milfeld@tacc.utexas.edu XSEDE 12 July 16, 2012
More informationVLSI Design Automation
VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,
More informationThe Art of Parallel Processing
The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a
More informationProgramming Models for Multi- Threading. Brian Marshall, Advanced Research Computing
Programming Models for Multi- Threading Brian Marshall, Advanced Research Computing Why Do Parallel Computing? Limits of single CPU computing performance available memory I/O rates Parallel computing allows
More informationIntroduction to Embedded Systems
Introduction to Embedded Systems Minsoo Ryu Hanyang University Outline 1. Definition of embedded systems 2. History and applications 3. Characteristics of embedded systems Purposes and constraints User
More informationAn innovative compilation tool-chain for embedded multi-core architectures M. Torquati, Computer Science Departmente, Univ.
An innovative compilation tool-chain for embedded multi-core architectures M. Torquati, Computer Science Departmente, Univ. Of Pisa Italy 29/02/2012, Nuremberg, Germany ARTEMIS ARTEMIS Joint Joint Undertaking
More informationHPC future trends from a science perspective
HPC future trends from a science perspective Simon McIntosh-Smith University of Bristol HPC Research Group simonm@cs.bris.ac.uk 1 Business as usual? We've all got used to new machines being relatively
More informationImplementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer
Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Sashisu Bajracharya 1, Chang Shu 1, Kris Gaj 1, Tarek El-Ghazawi 2 1 ECE Department, George
More informationCenter Extreme Scale CS Research
Center Extreme Scale CS Research Center for Compressible Multiphase Turbulence University of Florida Sanjay Ranka Herman Lam Outline 10 6 10 7 10 8 10 9 cores Parallelization and UQ of Rocfun and CMT-Nek
More informationDid I Just Do That on a Bunch of FPGAs?
Did I Just Do That on a Bunch of FPGAs? Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto About the Talk Title It s the measure
More informationMore Course Information
More Course Information Labs and lectures are both important Labs: cover more on hands-on design/tool/flow issues Lectures: important in terms of basic concepts and fundamentals Do well in labs Do well
More informationFPGA & Hybrid Systems in the Enterprise Drivers, Exemplars and Challenges
Bob Blainey IBM Software Group 27 Feb 2011 FPGA & Hybrid Systems in the Enterprise Drivers, Exemplars and Challenges Workshop on The Role of FPGAs in a Converged Future with Heterogeneous Programmable
More informationEarly Experiences with the Naval Research Laboratory XD1. Wendell Anderson Dr. Robert Rosenberg Dr. Marco Lanzagorta Dr.
Your corporate logo here Early Experiences with the Naval Research Laboratory XD1 Wendell Anderson Dr. Robert Rosenberg Dr. Marco Lanzagorta Dr. Jeanie Osburn Who we are Naval Research Laboratory Navy
More informationHigher Level Programming Abstractions for FPGAs using OpenCL
Higher Level Programming Abstractions for FPGAs using OpenCL Desh Singh Supervising Principal Engineer Altera Corporation Toronto Technology Center ! Technology scaling favors programmability CPUs."#/0$*12'$-*
More informationCMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN
CMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN Graphics Processing Unit Accelerate the creation of images in a frame buffer intended for the output
More informationMohamed Taher The George Washington University
Experience Programming Current HPRCs Mohamed Taher The George Washington University Acknowledgements GWU: Prof.\Tarek El-Ghazawi and Esam El-Araby GMU: Prof.\Kris Gaj ARSC SRC SGI CRAY 2 Outline Introduction
More informationThread and Data parallelism in CPUs - will GPUs become obsolete?
Thread and Data parallelism in CPUs - will GPUs become obsolete? USP, Sao Paulo 25/03/11 Carsten Trinitis Carsten.Trinitis@tum.de Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR) Institut für
More informationConcepts from High-Performance Computing
Concepts from High-Performance Computing Lecture A - Overview of HPC paradigms OBJECTIVE: The clock speeds of computer processors are topping out as the limits of traditional computer chip technology are
More informationFPGA-based Supercomputing: New Opportunities and Challenges
FPGA-based Supercomputing: New Opportunities and Challenges Naoya Maruyama (RIKEN AICS)* 5 th ADAC Workshop Feb 15, 2018 * Current Main affiliation is Lawrence Livermore National Laboratory SIAM PP18:
More informationHakam Zaidan Stephen Moore
Hakam Zaidan Stephen Moore Outline Vector Architectures Properties Applications History Westinghouse Solomon ILLIAC IV CDC STAR 100 Cray 1 Other Cray Vector Machines Vector Machines Today Introduction
More informationAdvanced Imaging Applications on Smart-phones Convergence of General-purpose computing, Graphics acceleration, and Sensors
Advanced Imaging Applications on Smart-phones Convergence of General-purpose computing, Graphics acceleration, and Sensors Sriram Sethuraman Technologist & DMTS, Ittiam 1 Overview Imaging on Smart-phones
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationVLSI Design Automation. Calcolatori Elettronici Ing. Informatica
VLSI Design Automation 1 Outline Technology trends VLSI Design flow (an overview) 2 IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing
More informationDeveloping a Data Driven System for Computational Neuroscience
Developing a Data Driven System for Computational Neuroscience Ross Snider and Yongming Zhu Montana State University, Bozeman MT 59717, USA Abstract. A data driven system implies the need to integrate
More informationWorkload Optimized Systems: The Wheel of Reincarnation. Michael Sporer, Netezza Appliance Hardware Architect 21 April 2013
Workload Optimized Systems: The Wheel of Reincarnation Michael Sporer, Netezza Appliance Hardware Architect 21 April 2013 Outline Definition Technology Minicomputers Prime Workstations Apollo Graphics
More informationAltera SDK for OpenCL
Altera SDK for OpenCL A novel SDK that opens up the world of FPGAs to today s developers Altera Technology Roadshow 2013 Today s News Altera today announces its SDK for OpenCL Altera Joins Khronos Group
More informationBuilding supercomputers from embedded technologies
http://www.montblanc-project.eu Building supercomputers from embedded technologies Alex Ramirez Barcelona Supercomputing Center Technical Coordinator This project and the research leading to these results
More informationThe WINLAB Cognitive Radio Platform
The WINLAB Cognitive Radio Platform IAB Meeting, Fall 2007 Rutgers, The State University of New Jersey Ivan Seskar Software Defined Radio/ Cognitive Radio Terminology Software Defined Radio (SDR) is any
More informationAnalyzing the Performance of IWAVE on a Cluster using HPCToolkit
Analyzing the Performance of IWAVE on a Cluster using HPCToolkit John Mellor-Crummey and Laksono Adhianto Department of Computer Science Rice University {johnmc,laksono}@rice.edu TRIP Meeting March 30,
More informationVLSI Design Automation
VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,
More informationThe Mont-Blanc approach towards Exascale
http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are
More informationWhat Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others
What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University * slides thanks to Kavita Bala & many others Final Project Demo Sign-Up: Will be posted outside my office after lecture today.
More informationCray events. ! Cray User Group (CUG): ! Cray Technical Workshop Europe:
Cray events! Cray User Group (CUG):! When: May 16-19, 2005! Where: Albuquerque, New Mexico - USA! Registration: reserved to CUG members! Web site: http://www.cug.org! Cray Technical Workshop Europe:! When:
More informationParallelism and Concurrency. COS 326 David Walker Princeton University
Parallelism and Concurrency COS 326 David Walker Princeton University Parallelism What is it? Today's technology trends. How can we take advantage of it? Why is it so much harder to program? Some preliminary
More informationAccelerating CFD with Graphics Hardware
Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery
More informationThe Stampede is Coming Welcome to Stampede Introductory Training. Dan Stanzione Texas Advanced Computing Center
The Stampede is Coming Welcome to Stampede Introductory Training Dan Stanzione Texas Advanced Computing Center dan@tacc.utexas.edu Thanks for Coming! Stampede is an exciting new system of incredible power.
More informationMotivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism
Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the
More informationParticle-in-Cell Simulations on Modern Computing Platforms. Viktor K. Decyk and Tajendra V. Singh UCLA
Particle-in-Cell Simulations on Modern Computing Platforms Viktor K. Decyk and Tajendra V. Singh UCLA Outline of Presentation Abstraction of future computer hardware PIC on GPUs OpenCL and Cuda Fortran
More informationFPGAs as Components in Heterogeneous HPC Systems: Raising the Abstraction Level of Heterogeneous Programming
FPGAs as Components in Heterogeneous HPC Systems: Raising the Abstraction Level of Heterogeneous Programming Wim Vanderbauwhede School of Computing Science University of Glasgow A trip down memory lane
More informationINTRODUCTION TO FPGA ARCHITECTURE
3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)
More informationPark Sung Chul. AE MentorGraphics Korea
PGA Design rom Concept to Silicon Park Sung Chul AE MentorGraphics Korea The Challenge of Complex Chip Design ASIC Complex Chip Design ASIC or FPGA? N FPGA Design FPGA Embedded Core? Y FPSoC Design Considerations
More informationIntroduction to High-Performance Computing
Introduction to High-Performance Computing Simon D. Levy BIOL 274 17 November 2010 Chapter 12 12.1: Concurrent Processing High-Performance Computing A fancy term for computers significantly faster than
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations
More informationDIGITAL DESIGN TECHNOLOGY & TECHNIQUES
DIGITAL DESIGN TECHNOLOGY & TECHNIQUES CAD for ASIC Design 1 INTEGRATED CIRCUITS (IC) An integrated circuit (IC) consists complex electronic circuitries and their interconnections. William Shockley et
More informationSDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center
SDAccel Environment The Xilinx SDAccel Development Environment Bringing The Best Performance/Watt to the Data Center Introduction Data center operators constantly seek more server performance. Currently
More informationMassively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain
Massively Parallel Computing on Silicon: SIMD Implementations V.M.. Brea Univ. of Santiago de Compostela Spain GOAL Give an overview on the state-of of-the- art of Digital on-chip CMOS SIMD Solutions,
More informationDistributed systems: paradigms and models Motivations
Distributed systems: paradigms and models Motivations Prof. Marco Danelutto Dept. Computer Science University of Pisa Master Degree (Laurea Magistrale) in Computer Science and Networking Academic Year
More informationHPC Architectures. Types of resource currently in use
HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationComputer Architecture!
Informatics 3 Computer Architecture! Dr. Vijay Nagarajan and Prof. Nigel Topham! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors
More informationhigh performance medical reconstruction using stream programming paradigms
high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming
More information