Efficient Application Mapping on CGRAs Based on Backward Simultaneous Scheduling / Binding and Dynamic Graph Transformations
1 Efficient Application Mapping on CGRAs Based on Backward Simultaneous Scheduling / Binding and Dynamic Graph Transformations
T. Peyret (1), G. Corre (1), M. Thevenin (1), K. Martin (2), P. Coussy (2)
(1) CEA, LIST, Electronic Architectures and Sensors Laboratory (LCAE), F Gif-sur-Yvette, France
(2) Université de Bretagne-Sud, Lab-STICC, Lorient, France
ASAP 2014 Conference
2 COARSE-GRAINED RECONFIGURABLE ARCHITECTURE (CGRA)
- Processing Elements / Tiles
  - Homogeneous/heterogeneous
  - Register Files (RF)
  - Operators
- Interconnection network
  - Mesh 1D, 2D, Torus, Segmented
- Example: 4×4 CGRA, torus 2D mesh
[Figure: 4×4 array of PEs; each PE contains an FU and a local RF, with inputs from neighbours & memory and outputs to neighbours & memory]
ASAP2014 Peyret Thomas
3 MAPPING ON CGRA
- Scheduling & binding are two NP-complete problems
- Separate resolution
  - Heuristic and meta-heuristic (e.g. EMS, VPR)
  - Heuristic and exact method (e.g. EPIMap, REGIMap)
- Merged resolution
  - Exact methods (e.g. ILP-based)
  - Meta-heuristic (e.g. DRESC)
- Purpose: a mapping flow that deeply explores the solution space for the entire application code
4 MAPPING FLOW
[Flowchart: C Code, Compilation, CDFG and CGRA Model (application & CGRA models) feed the mapping tool, whose steps are: Schedule & Binding of highest-priority node; Solutions? (Yes/No); Graph Transformation; Changes? (Yes/No); Mapping Pruning; Last Node? (Yes/No); output: List of Mappings]
5 APPLICATION & CGRA MODELS
- Compilation: C to Control Data Flow Graph (CDFG) with GCC
- The CDFG is composed of basic blocks and a control part
- Basic blocks are represented by Data Flow Graphs (DFG)
- New kind of node: memorization operation nodes
6 APPLICATION & CGRA MODELS
- Example of a 2-tile CGRA with RF
- Memorization operators are introduced to cope with the RF
[Figure: DFG operations 1 to 4 bound to tiles A and B and their RFs over cycles i, i + 1 and i + 2]
7 APPLICATION & CGRA MODELS
- Homomorphic CGRA and DFG models
- Memorization nodes: to keep data dependencies
- Equivalence between nodes:
  - Operators and operations
  - Registers and data
- Binding: finding the DFG in the CGRA model
[Figure: DFG operations 1 to 4 bound onto tiles A and B and the RF over cycles i, i + 1 and i + 2]
8 MAPPING FLOW
[Flowchart: mapping flow as on slide 4, with an additional Fail outcome]
- Application & CGRA models
- Simultaneous scheduling and binding
  - Binding method
  - Backward list-scheduling-based scheduling
- Formal graph transformations
- Pruning step
9 SIMULTANEOUS SCHEDULING/BINDING
- Purpose: check whether at least one binding solution exists for each node schedule
  - Avoids dead-ends due to the dependence between these two problems
  - Allows the graph to be transformed only when needed, and with the right transformation
- Based on Levi's algorithm
  - Solves the maximum subgraph problem for homomorphic graphs
- Incremental version
  - Relies on previously found partial bindings
  - Adds the newly scheduled node (and its data node) to the previously considered subgraph
- Finds every possible partial mapping
  - Exhaustive method
- If no binding solution exists, a graph transformation is required
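The incremental idea on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation and it does not use Levi's algorithm: it simply tries every free tile for the newly scheduled operation and keeps each placement that can still reach its already-bound consumers. The function name `extend_bindings`, the `(tile, cycle)` slot model, and the rule that data can only move to the same or a neighbouring tile are all assumptions made for illustration; register files and routing are ignored.

```python
def extend_bindings(partial_bindings, new_op, cycle, consumers, neighbours):
    """Extend every partial binding with all feasible placements of new_op.

    partial_bindings: list of dicts {operation: (tile, cycle)}
    consumers: dict {operation: [operations that read its result]}
               (with backward traversal these are already bound)
    neighbours: dict {tile: set of directly connected tiles}
    Hypothetical model: one operation per tile per cycle; a result can only
    be consumed on the producing tile or on one of its neighbours.
    """
    extended = []
    for binding in partial_bindings:
        busy = set(binding.values())  # occupied (tile, cycle) slots
        for tile in neighbours:
            if (tile, cycle) in busy:
                continue  # operator already used in this cycle
            reachable = neighbours[tile] | {tile}
            if all(binding[c][0] in reachable
                   for c in consumers.get(new_op, []) if c in binding):
                extended.append({**binding, new_op: (tile, cycle)})
    return extended


# Three tiles in a line: A - B - C
line = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
bindings = extend_bindings([{}], "add", 1, {}, line)
bindings = extend_bindings(bindings, "mul", 0, {"mul": ["add"]}, line)
print(len(bindings))
```

An empty result list is the situation the slide describes as "no binding solution": under this sketch's assumptions, it is the signal that a graph transformation would be needed before continuing.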
10 GRAPH TRANSFORMATIONS
Three dynamic transformations are proposed:
- Operation splitting
- Simple routing
- Memorization splitting
11 PRUNING STEP
- Idea: remove mappings with the same operator utilization to limit the number of partial mappings
- Executed at the end of each scheduling cycle
- Removes redundant partial mappings
- Still exhaustive
- Example: on a 2-tile CGRA
[Figure: two partial mappings of operations 1 to 4 on tiles A and B over cycles i, i + 1 and i + 2]
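One plausible reading of "same operator utilization" can be sketched in Python as follows. The slide does not define the exact equivalence used, so the signature chosen here (the set of occupied `(tile, cycle)` slots) is an assumption, as are the function name `prune` and the mapping representation.

```python
def prune(mappings):
    """Keep one partial mapping per operator-utilization signature.

    mappings: list of dicts {operation: (tile, cycle)}
    Assumption: two partial mappings are redundant when they occupy exactly
    the same (tile, cycle) slots, whatever operation sits in each slot.
    """
    seen = set()
    kept = []
    for mapping in mappings:
        signature = frozenset(mapping.values())
        if signature not in seen:
            seen.add(signature)
            kept.append(mapping)
    return kept


candidates = [
    {"op1": ("A", 0), "op2": ("B", 0)},
    {"op1": ("B", 0), "op2": ("A", 0)},  # same slots as the first mapping
    {"op1": ("A", 0), "op2": ("A", 1)},
]
pruned = prune(candidates)
print(len(pruned))
```

Keeping the first mapping seen for each signature is an arbitrary choice in this sketch; the slides do not say which representative is retained.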
12 EXPERIMENTS & RESULTS
Compared with two other methods:
- Method 1: forward list scheduling with only the routing transformation, using Levi's algorithm to bind.
- Method 2: the heuristic described in EPIMap, which applies static a priori transformations (routing & splitting) to schedule, using Levi's algorithm to bind.
4 metrics:
- Success rate
- Latency
- Exploration quality
- Exploration efficiency
9 application codes (FFT, DCT, ...)
16 constraint sets per code
13 EXPERIMENTS & RESULTS
Success rate
[Bar chart: success rate (0 to 1) of Method 1, Method 2 and the Proposed Approach for DC Filter, DCT 2D, Elliptic Filter, EMA Filter, FFT, Manhattan Distance, Matrix Product, MWD Filter, Unsharp Mask and the average]
99% for the Proposed Approach (vs 37% and 62%)
14 EXPERIMENTS & RESULTS
Percentage of time a mapping has the best latency
[Bar chart: best latency rate (0 to 1) of Method 1, Method 2 and the Proposed Approach for DC Filter, DCT 2D, Elliptic Filter, EMA Filter, FFT, Manhattan Distance, Matrix Product, MWD Filter, Unsharp Mask and the average]
90% for the Proposed Approach (vs 31% and 42%)
15 CONCLUSION & PROSPECTS
- Mapping flow: C to DFGs to CGRA
  - Simultaneous scheduling / exhaustive-based binding
  - Dynamic graph transformations
- Very promising results
  - Success rate
  - Latency
  - Exploration quality and efficiency
- Future work
  - Improve the pruning step
  - Improve scalability
16 Thank you for your attention
Commissariat à l'énergie atomique et aux énergies alternatives, Institut Carnot CEA LIST, Sensors and Electronic Architectures Laboratory, Centre de Saclay, Gif-sur-Yvette Cedex
Thomas.peyret@cea.fr
17 INTRODUCTION
Performance vs flexibility vs design cost
Raffin E., Déploiement d'applications multimédia sur architecture reconfigurable à gros grain : modélisation avec la programmation par contraintes, 2011
18 INTRODUCTION
- Many architectures: Morphosys, DART, MORA, ADRES, etc.
- Less automated compilation flows
  - Dedicated to one architecture
  - Not scalable (e.g. ILP-based)
  - Not versatile, or with limitations (e.g. no RF or manual partitioning)
  - Only for kernel loop acceleration
19 SIMULTANEOUS SCHEDULING/BINDING
- Purpose: check whether at least one binding solution exists for each node schedule
  - Avoids dead-ends due to the dependence between these two problems
  - Allows the graph to be transformed only when needed, and with the right transformation
- Example: map this DFG on this CGRA
[Figure: example DFG and CGRA]
20 SIMULTANEOUS SCHEDULING/BINDING
Schedule example:
[Table: operations scheduled per cycle]
21 SIMULTANEOUS SCHEDULING/BINDING
Binding is impossible:
[Table: cycles against tiles A to E]
? & 14 are conflicting on tile C
22 SIMULTANEOUS SCHEDULING/BINDING
Other example:
[Table: cycles against tiles A to E]
? & 14 are conflicting on tile C
23 BACKWARD TRAVERSING
- Makes it possible to know whether a transformation is relevant, and which one
  - The scheduling and binding of successor nodes are already done
  - So the real needs of the current node are known
- Example: forward (without a priori transformations) vs backward
[Tables: cycles and operations for the forward and backward schedules]
24 BACKWARD / FORWARD TRAVERSING
- Example: forward (a priori transformations) vs backward
[Tables: cycles and operations for the forward and backward schedules]
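A generic backward list scheduler can be sketched in Python to make the traversal order concrete. This is a textbook-style illustration, not the scheduler from the talk: the function name `backward_list_schedule`, the priority values, and the single resource limit `ops_per_cycle` are assumptions, and no binding or graph transformation is performed.

```python
def backward_list_schedule(succs, priority, ops_per_cycle):
    """Schedule a DAG from its sinks towards its sources.

    succs: dict {node: [successor nodes]} (every node must be a key)
    priority: dict {node: number}, higher is scheduled first among ready nodes
    ops_per_cycle: maximum number of operations per cycle (hypothetical limit)
    Returns {node: cycle}, with cycle 0 being the first cycle.
    """
    level = {}  # distance from the end of the schedule (0 = last cycle)
    load = {}   # number of operations already placed at each level
    unscheduled = set(succs)
    while unscheduled:
        # A node is ready once all of its successors have been scheduled.
        ready = [n for n in unscheduled if all(s in level for s in succs[n])]
        node = sorted(ready, key=lambda n: (-priority[n], n))[0]
        lvl = max((level[s] + 1 for s in succs[node]), default=0)
        while load.get(lvl, 0) >= ops_per_cycle:
            lvl += 1  # level is full: move one cycle earlier in time
        level[node] = lvl
        load[lvl] = load.get(lvl, 0) + 1
        unscheduled.remove(node)
    last = max(level.values())
    return {n: last - l for n, l in level.items()}


# DFG: 1 -> 3, 2 -> 3, 3 -> 4
dfg = {"1": ["3"], "2": ["3"], "3": ["4"], "4": []}
prio = {"1": 1, "2": 1, "3": 2, "4": 3}
schedule = backward_list_schedule(dfg, prio, ops_per_cycle=2)
print(schedule)
```

Because each node is placed only after its successors, the scheduler already knows where and when its result is needed, which is the property the slides rely on to decide whether a transformation is relevant.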
25 LEVI'S ALGORITHM
- Determining the maximum subgraph between 2 graphs is NP-complete
- Based on characteristic matrices of the graphs
  - Adjacency matrix, compatibility matrix
- Example of adjacency matrix
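The two kinds of matrix named on this slide can be illustrated with a short Python sketch. The compatibility rule used here (two node pairings are compatible when they use distinct nodes on both sides and preserve the presence or absence of edges in both directions) is the usual one for common-subgraph search and is assumed rather than taken from the slides; the function name `compatibility_matrix` is hypothetical.

```python
def compatibility_matrix(adj_g, adj_h):
    """Build the pairing list and the compatibility matrix of two digraphs.

    adj_g, adj_h: adjacency matrices (lists of lists of 0/1).
    A pairing (u, v) maps node u of G onto node v of H. Two pairings are
    compatible when they involve different nodes on both sides and agree
    on the edges between them (assumed rule, see the text above).
    """
    pairs = [(u, v) for u in range(len(adj_g)) for v in range(len(adj_h))]

    def compatible(p, q):
        (u1, v1), (u2, v2) = p, q
        if u1 == u2 or v1 == v2:
            return False
        return (adj_g[u1][u2] == adj_h[v1][v2]
                and adj_g[u2][u1] == adj_h[v2][v1])

    return pairs, [[compatible(p, q) for q in pairs] for p in pairs]


# G: single edge 0 -> 1; H: chain 0 -> 1 -> 2
adj_g = [[0, 1],
         [0, 0]]
adj_h = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
pairs, compat = compatibility_matrix(adj_g, adj_h)
print(len(pairs))
```

In this example the pairings (0, 0) and (1, 1) are compatible because the edge 0 -> 1 exists in both graphs, whereas (0, 0) and (1, 2) are not, since H has no edge 0 -> 2.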
26 LEVI'S ALGORITHM Complete example: adjacency matrix
27 LEVI'S ALGORITHM Complete example: reduced compatibility matrix
28 LEVI'S ALGORITHM Complete example: maximum compatibility classes
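Maximum compatibility classes correspond to maximal sets of pairwise compatible pairings, that is, maximal cliques of the compatibility relation. The sketch below enumerates them with the basic Bron-Kerbosch recursion; this is a standard clique-enumeration technique chosen for illustration, not necessarily the procedure used in the talk, and the function name `maximal_classes` and the example matrix are assumptions.

```python
def maximal_classes(compat):
    """Enumerate maximal sets of pairwise compatible indices.

    compat: symmetric boolean matrix with a False diagonal.
    Uses the basic Bron-Kerbosch recursion (no pivoting).
    """
    n = len(compat)
    classes = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            classes.append(sorted(current))  # cannot be extended: maximal
            return
        for v in list(candidates):
            nbrs = {w for w in range(n) if compat[v][w]}
            expand(current | {v}, candidates & nbrs, excluded & nbrs)
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(range(n)), set())
    return sorted(classes)


# Pairings 0, 1, 2 are mutually compatible; pairing 3 is compatible with 2 only.
T, F = True, False
compat = [[F, T, T, F],
          [T, F, T, F],
          [T, T, F, T],
          [F, F, T, F]]
classes = maximal_classes(compat)
print(classes)
```

Each class is a candidate common subgraph; the later slides go on to keep only the connected ones.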
29 LEVI'S ALGORITHM Complete example: connected maximum subgraphs
30 LEVI'S ALGORITHM Complete example: result
31 EXPERIMENTS & RESULTS
Number of different mappings found
[Bar chart: number of different mappings found by Method 1, Method 2 and the Proposed Approach for DC Filter, DCT 2D, Elliptic Filter, EMA Filter, FFT, Manhattan Distance, Matrix Product, MWD Filter, Unsharp Mask and the average]
3.7 and 2.4 times higher for the Proposed Approach
32 EXPERIMENTS & RESULTS
Number of different mappings found per second
[Bar chart: number of different mappings generated per second by Method 1, Method 2 and the Proposed Approach for DC Filter, DCT 2D, Elliptic Filter, EMA Filter, FFT, Manhattan Distance, Matrix Product, MWD Filter, Unsharp Mask and the average]
2.6 and 2.2 times higher for the Proposed Approach
More informationCOARSE GRAINED RECONFIGURABLE ARCHITECTURES FOR MOTION ESTIMATION IN H.264/AVC
COARSE GRAINED RECONFIGURABLE ARCHITECTURES FOR MOTION ESTIMATION IN H.264/AVC 1 D.RUKMANI DEVI, 2 P.RANGARAJAN ^, 3 J.RAJA PAUL PERINBAM* 1 Research Scholar, Department of Electronics and Communication
More informationPOLYMORPHIC PIPELINE ARRAY: A FLEXIBLE MULTICORE ACCELERATOR FOR MOBILE MULTIMEDIA APPLICATIONS. Hyunchul Park
POLYMORPHIC PIPELINE ARRAY: A FLEXIBLE MULTICORE ACCELERATOR FOR MOBILE MULTIMEDIA APPLICATIONS by Hyunchul Park A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor
More informationClaude TADONKI. MINES ParisTech PSL Research University Centre de Recherche Informatique
Got 2 seconds Sequential 84 seconds Expected 84/84 = 1 second!?! Got 25 seconds MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Séminaire MATHEMATIQUES
More informationHYRISE In-Memory Storage Engine
HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University
More informationHigh-Level Synthesis (HLS)
Course contents Unit 11: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 11 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis
More informationExtraction of tiled top-down irregular pyramids from large images
Extraction of tiled top-down irregular pyramids from large images Romain Goffe 1 Guillaume Damiand 2 Luc Brun 3 1 SIC-XLIM, Université de Poitiers, CNRS, UMR6172, Bâtiment SP2MI, F-86962, Futuroscope Chasseneuil,
More informationECE 669 Parallel Computer Architecture
ECE 669 Parallel Computer Architecture Lecture 23 Parallel Compilation Parallel Compilation Two approaches to compilation Parallelize a program manually Sequential code converted to parallel code Develop
More informationChronological Backtracking Conflict Directed Backjumping Dynamic Backtracking Branching Strategies Branching Heuristics Heavy Tail Behavior
PART III: Search Outline Depth-first Search Chronological Backtracking Conflict Directed Backjumping Dynamic Backtracking Branching Strategies Branching Heuristics Heavy Tail Behavior Best-First Search
More informationFundamentals of. Parallel Computing. Sanjay Razdan. Alpha Science International Ltd. Oxford, U.K.
Fundamentals of Parallel Computing Sanjay Razdan Alpha Science International Ltd. Oxford, U.K. CONTENTS Preface Acknowledgements vii ix 1. Introduction to Parallel Computing 1.1-1.37 1.1 Parallel Computing
More informationMemory Management Algorithms on Distributed Systems. Katie Becker and David Rodgers CS425 April 15, 2005
Memory Management Algorithms on Distributed Systems Katie Becker and David Rodgers CS425 April 15, 2005 Table of Contents 1. Introduction 2. Coarse Grained Memory 2.1. Bottlenecks 2.2. Simulations 2.3.
More informationCoarse-Grained Reconfigurable Array Architectures
Coarse-Grained Reconfigurable Array Architectures Bjorn De Sutter, Praveen Raghavan, Andy Lambrechts Abstract Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that
More information