Connected Component Labelling, an embarrassingly sequential algorithm
1 Connected Component Labelling, an embarrassingly sequential algorithm
Platform Parallel Netherlands GPGPU-day, 20 June 2013
Jaap van de Loosdrecht
NHL Centre of Expertise in Computer Vision
Van de Loosdrecht Machine Vision BV
Limerick Institute of Technology
2 Overview
- Introduction and background
- Connected Component Labelling
- Sequential
- Few-core
- Many-core
- Kalentev et al. approach
- Suggestions for extending
- Suggestions for optimizing
- Summary and conclusions
- Future work on CCL
- References
- Future of intelligent cameras
- Questions
3 Introduction
- Manager, NHL Centre of Expertise in Computer Vision, University of Applied Sciences, Leeuwarden
  - 4 FTE
  - Since 1996: 80 industrial projects
- Managing director, Van de Loosdrecht Machine Vision BV
  - VisionLab: development environment for computer vision with pattern matching, neural networks and genetic algorithms
  - Portable library (ANSI C++)
  - Windows, Linux and Android
  - x86, x64, ARM and PowerPC
- Student, Limerick Institute of Technology (Ireland)
  - Research master project, September 2011 to September 2013
4 Research master project
- Accelerating sequential computer vision algorithms using commodity parallel hardware
- Apply parallel programming techniques to meet the challenges posed in computer vision by the limits of sequential architectures
- Distinctive: investigate how to speed up a whole library by parallelizing the algorithms in an economical way and execute them on multiple platforms
  - Generic library, lines of ANSI C++
  - Portability and vendor independence
  - OpenMP for CPU, OpenCL for GPU
  - Variance in execution times
  - Run-time prediction of whether parallelization is beneficial
5 Computer vision algorithms and parallelization
- Classification of image operators
  - Low level image operators
    - Point operators
    - Local neighbour operators
    - Global operators
    - Connectivity based operators
  - High level image operators
    - Often built on the low level operators
  - Specials
    - Pattern matcher, neural network, genetic algorithm, etc.
- Idea: start with the low level image operators, then design and implement skeletons for parallelizing representatives of each class
6 Demonstration Label Blobs
- Open image cells.jl, show image contents
- ThresholdIsoData
  - Show image contents
  - Explain background/objects, white/black and 0/1
- LabelBlobs, show image contents
  - Show image contents
  - Explain 3 used colours
- BlobAnalyse
  - Explain table
  - Explain successive label numbering
7 Screen shot demo
8 Label blobs iterative algorithm
- Classical sequential approach: Haralick and Shapiro (1992)
- Binary image: give each object pixel a unique positive value
9 Label blobs iterative algorithm
- Repeat until no changes:
  - Down pass (top left to bottom right): give each pixel the minimum value of its 8 neighbours
  - Up pass (bottom right to top left): give each pixel the minimum value of its 8 neighbours
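The down pass and up pass described above can be sketched in a few lines of Python. This is a minimal illustration, not the VisionLab implementation: the function name `label_blobs_iterative`, the list-of-lists image layout and the choice of "linear index + 1" as the unique initial value are all assumptions made for the example, and only non-background neighbours are taken into account when computing the minimum.

```python
def label_blobs_iterative(binary):
    """Iterative labelling sketch: repeat down and up passes until nothing changes.

    binary is a list of rows with 0 for background and non-zero for object pixels.
    """
    h, w = len(binary), len(binary[0])
    # Give each object pixel a unique positive value (assumed here: linear index + 1).
    labels = [[y * w + x + 1 if binary[y][x] else 0 for x in range(w)]
              for y in range(h)]

    def update(y, x):
        """Replace an object pixel by the minimum label in its 3x3 neighbourhood."""
        if labels[y][x] == 0:
            return False
        best = labels[y][x]
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx]:
                    best = min(best, labels[ny][nx])
        if best < labels[y][x]:
            labels[y][x] = best
            return True
        return False

    changed = True
    while changed:
        changed = False
        # Down pass: top left to bottom right.
        for y in range(h):
            for x in range(w):
                changed |= update(y, x)
        # Up pass: bottom right to top left.
        for y in reversed(range(h)):
            for x in reversed(range(w)):
                changed |= update(y, x)
    return labels
```

Each blob ends up carrying the smallest initial value found inside it, so the resulting label numbers are unique per blob but not successive.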
10 Sequential version
- He, Chao, and Suzuki (2008): the two-pass approach has the best performance
  - Pass 1: equivalent labels are stored in an equivalence table (neighbourhood search)
  - Resolving equivalences with a search algorithm
  - Pass 2: assign label to pixel (lookup table)
- Analysis of execution time (VisionLab) on a Core i7-2640M for the typical image cells.jl
  - [Table, values not preserved: Size image | Pass 1 | Resolving equivalences | Pass 2 | Total | Pass 1/Total]
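The three stages named on this slide (first pass with an equivalence table, resolving the equivalences, second pass with a lookup table) can be illustrated with the classical pixel-based two-pass scheme. Note the substitution: He, Chao and Suzuki describe a run-based algorithm, whereas the sketch below works pixel by pixel and resolves equivalences with a simple union-find search. The function name `label_two_pass` and the data layout are assumptions for the example.

```python
def label_two_pass(binary):
    """Two-pass labelling sketch with an equivalence table (8-connectivity)."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # equivalence table; entry 0 is reserved for the background

    def find(label):
        """Follow the equivalence table to the representative label."""
        while parent[label] != label:
            parent[label] = parent[parent[label]]  # path halving
            label = parent[label]
        return label

    # First pass: provisional labels plus recorded equivalences.
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            roots = []
            # Neighbours that were already visited: W, NW, N, NE.
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                ny, nx = y + dy, x + dx
                if ny >= 0 and 0 <= nx < w and labels[ny][nx]:
                    roots.append(find(labels[ny][nx]))
            if not roots:
                parent.append(len(parent))  # new provisional label
                labels[y][x] = len(parent) - 1
            else:
                smallest = min(roots)
                labels[y][x] = smallest
                for root in roots:
                    parent[root] = smallest  # record the equivalence

    # Resolve equivalences into a lookup table with successive final labels.
    lookup = [0] * len(parent)
    next_label = 0
    for label in range(1, len(parent)):
        root = find(label)
        if lookup[root] == 0:
            next_label += 1
            lookup[root] = next_label
        lookup[label] = lookup[root]

    # Second pass: assign the final label to each pixel via the lookup table.
    return [[lookup[label] for label in row] for row in labels]
```

Because the lookup table is built in ascending label order, the final labels come out successive (1, 2, 3, ...) without a separate renumbering step.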
11 Parallel version
- Rosenfeld and Pfaltz (1966): CCL cannot be implemented with parallel local operations
- Hawick, Leist and Playne (2010): Label Equivalence has the best performance
- Kalentev, Rai, Kemnitz, and Schneider (2011): alternative Label Equivalence approach
  - Store equivalence table in image
  - No atomic operations
  - Claimed to be efficient in terms of number of iterations needed: on average 5 iterations on their test set
- Algorithm
  - Initial pass
  - Multiple iterations
    - Link pass (neighbourhood search)
    - Label equalize pass (neighbourhood search)
  - Final pass
12 Kalentev et al. approach
- It is expected that
  - Both passes of an iteration have similar complexity to Pass 1
  - Initial and final pass have similar complexity to Pass 2
- Analysis
  - On average the Kalentev et al. approach needs 5 iterations
  - One simple initial pass
  - 10 neighbourhood search passes
  - One simple final pass
  - Extra post processing step with two simple passes
- Estimation
  - Sequential version: 1 unit of execution time
  - Kalentev et al.: 8.2 units of (sequential) execution time
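The arithmetic behind such an estimate can be written out as follows. This is only a sketch of the reasoning: it assumes that every neighbourhood search pass costs as much as the first sequential pass, that every simple pass (initial, final and the two post processing passes) costs as much as the second sequential pass, and that both costs are expressed as fractions of the total sequential time. The function name and the fractions in the usage line are hypothetical and are not the measured values from the talk.

```python
def estimate_units(iterations, pass1_fraction, pass2_fraction):
    """Estimated cost in units of total sequential execution time.

    pass1_fraction and pass2_fraction are the shares of the sequential total
    spent in the first and second pass (hypothetical inputs).
    """
    neighbourhood_passes = 2 * iterations  # one Link and one LabelEqualize pass each
    simple_passes = 1 + 1 + 2              # initial, final, two post processing passes
    return neighbourhood_passes * pass1_fraction + simple_passes * pass2_fraction


# Illustrative call with made-up fractions, 5 iterations as on the slide.
estimate = estimate_units(5, pass1_fraction=0.6, pass2_fraction=0.2)
```

The point of the formula is that the many-core approach performs several times the work of the sequential version, so it only pays off when enough parallel hardware is available to absorb that factor.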
13 Kalentev et al. approach
- Different approaches are needed for the few-core CPU and the many-core GPU
- The GPU approach will suffer from branch divergence
14 Few-core approach on Core i GHz (quad-core)
15 Host code framework suggested by Kalentev et al.
    WriteBuffer(image);
    int notDone = 1;
    RunKernel(InitLabels, image);
    WriteBuffer(notDone);
    while (notDone == 1) {
        notDone = 0;
        WriteBuffer(notDone);
        RunKernel(Link, image, notDone);
        RunKernel(LabelEqualize, image);
        ReadBuffer(notDone);
    } // while notDone
    ReadBuffer(image);
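To make the control flow of this host loop concrete, here is a sequential Python emulation of the three kernels. It is a sketch under assumptions, not the OpenCL code from the talk: the kernels run as plain loops instead of in parallel, the label of an object pixel is taken to be its linear index plus one so that the image itself can serve as the equivalence table, and all names (`label_blobs_label_equivalence`, `link`, `label_equalize`) are chosen for the example.

```python
def label_blobs_label_equivalence(binary, connectivity=8):
    """Sequential emulation of the InitLabels / Link / LabelEqualize loop."""
    h, w = len(binary), len(binary[0])
    if connectivity == 4:
        offsets = [(-1, 0), (0, -1), (0, 1), (1, 0)]
    else:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]

    # InitLabels: label = linear index + 1, so label L refers to pixel L - 1.
    labels = [y * w + x + 1 if binary[y][x] else 0
              for y in range(h) for x in range(w)]

    def link():
        """Neighbourhood search; lowers the entry a pixel's label refers to."""
        not_done = False
        for i, label in enumerate(labels):
            if label == 0:
                continue
            y, x = divmod(i, w)
            smallest = label
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    neighbour = labels[ny * w + nx]
                    if neighbour and neighbour < smallest:
                        smallest = neighbour
            if smallest < label:
                root = label - 1
                labels[root] = min(labels[root], smallest)
                not_done = True
        return not_done

    def label_equalize():
        """Follow the chain of references until a self-referencing label is found."""
        for i, label in enumerate(labels):
            if label == 0:
                continue
            while labels[label - 1] != label:
                label = labels[label - 1]
            labels[i] = label

    not_done = True
    while not_done:          # host loop
        not_done = link()
        label_equalize()
    return [labels[y * w:(y + 1) * w] for y in range(h)]
```

In this emulation every blob ends with the label of its first pixel in scan order. The loop stops after the first Link pass that finds no neighbour with a smaller label.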
16 Suggestions for extending Kalentev et al. approach
- The InitLabel kernel is extended to set the border pixels of the image to the background value
- Link kernels are implemented for both four and eight connectivity
- A post processing step with two passes is added in order to make the labelling of the blobs successive
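The slide does not say how the two post processing passes work, so the following is just one possible sequential sketch: a first pass that assigns successive numbers to the labels in the order they are first met, and a second pass that rewrites every pixel through that table. The function name `make_labels_successive` is an assumption for the example.

```python
def make_labels_successive(labels):
    """Renumber arbitrary positive blob labels to 1, 2, 3, ... in scan order."""
    lookup = {0: 0}  # background stays 0
    # First pass: give every label a successive number when it is first seen.
    for row in labels:
        for label in row:
            if label not in lookup:
                lookup[label] = len(lookup)
    # Second pass: replace each pixel's label by its successive number.
    return [[lookup[label] for label in row] for row in labels]


# Labels such as 1 and 8 (pixel-index based) become 1 and 2.
renumbered = make_labels_successive([[1, 1, 0, 0],
                                     [0, 0, 0, 8],
                                     [0, 0, 0, 8]])
```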
17 Suggestions for optimizing Kalentev et al. approach
- Each iteration has a Link pass and a LabelEqualize pass. For the last iteration the LabelEqualize pass is redundant
- Many of the kernel execute, read buffer and write buffer commands can be started asynchronously and synchronized using events
- The write to the IsNotDone buffer can be done in parallel with the LabelEqualize pass
- Except for the second pass of the post processing step, all kernels can be vectorized
  - InitLabel kernel: straightforward
  - Other kernels: a quick test whether all pixels in the vector are background pixels
    - Beneficial for processing background pixels
    - Little extra overhead for object pixels
18 Core i with GTX 560 Ti (OEM)
19 Core i with GTX 560 Ti (OEM)
20 Core i with GTX 560 Ti (OEM)
21 Summary and conclusions
- Connected component labelling
  - Different approaches are needed for few-core and many-core
  - Few-core approach: reasonable speedups on CPUs
  - Many-core approach: reasonable speedups on GPUs
- Suggestions for extending Kalentev et al. approach
- Suggestions for optimizing Kalentev et al. approach
22 Future work on Connected Component Labelling
- Parallelize the few-core label repair step
- Implement and benchmark an OpenCL implementation of the few-core approach
- Research to find the break-even point between the few-core and many-core approaches
- Implement and benchmark the approach suggested by Stava and Benes (2011), only H/W ^2
23 References
- Van de Loosdrecht, J., 2013. Accelerating sequential computer vision algorithms using commodity parallel hardware. Research master project at Limerick Institute of Technology. Expected to be published in autumn 2013.
- Haralick, R.M. and Shapiro, L.G., 1992. Computer and Robot Vision. Volume I and Volume II. Reading: Addison-Wesley Publishing Company.
- He, L., Chao, Y. and Suzuki, K., 2008. A Run-Based Two-Scan Labeling Algorithm. IEEE Transactions on Image Processing, 17(5).
- Rosenfeld, A. and Pfaltz, J.L., 1966. Sequential Operations in Digital Picture Processing. Journal of the ACM, 13(4).
- Hawick, K.A., Leist, A. and Playne, D.P., 2010. Parallel graph component labeling with GPUs and CUDA. Parallel Computing, 36(12).
- Kalentev, O., Rai, A., Kemnitz, S. and Schneider, R., 2011. Connected component labeling on a 2D grid using CUDA. Journal of Parallel and Distributed Computing, 71(4).
- Stava, O. and Benes, B., 2011. Connected Component Labeling in CUDA. In: Hwu, W.W. ed. 2011. GPU Computing Gems, Emerald Edition. Burlington: Morgan Kaufmann. Ch.35.
24 Future: Intelligent camera with heterogeneous computing
- XIMEA Currera G
- AMD T-56N
  - Dual-core x64, 1.6 GHz
  - 80 core GPU, 500 MHz
- 2 GB DDR3
- 32 GB SSD
- 4 USB-3, USB-2
- HDMI
- PoE Gigabit ethernet
- Micro PLC
- 8 digital I/Os
- Many image sensors <= 5M pixel
25 Prototype XIMEA Currera G
26 Prototype XIMEA Currera G
27 Questions?
Jaap van de Loosdrecht
NHL Centre of Expertise in Computer Vision
Van de Loosdrecht Machine Vision BV
More informationRecent Advances in Heterogeneous Computing using Charm++
Recent Advances in Heterogeneous Computing using Charm++ Jaemin Choi, Michael Robson Parallel Programming Laboratory University of Illinois Urbana-Champaign April 12, 2018 1 / 24 Heterogeneous Computing
More informationProject Proposals. Advanced Operating Systems / Embedded Systems (2016/2017)
Project Proposals / Embedded Systems (2016/2017) Giuseppe Massari, Federico Terraneo giuseppe.massari@polimi.it federico.terraneo@polimi.it Project Rules 2/40 General rules Two types of project: Code development
More informationEyeCheck Smart Cameras
EyeCheck Smart Cameras 2 3 EyeCheck 9xx & 1xxx series Technical data Memory: DDR RAM 128 MB FLASH 128 MB Interfaces: Ethernet (LAN) RS422, RS232 (not EC900, EC910, EC1000, EC1010) EtherNet / IP PROFINET
More informationGeneral Purpose GPU Computing in Partial Wave Analysis
JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data
More informationPredicting GPU Performance from CPU Runs Using Machine Learning
Predicting GPU Performance from CPU Runs Using Machine Learning Ioana Baldini Stephen Fink Erik Altman IBM T. J. Watson Research Center Yorktown Heights, NY USA 1 To exploit GPGPU acceleration need to
More informationDIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka
USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods
More informationMost real programs operate somewhere between task and data parallelism. Our solution also lies in this set.
for Windows Azure and HPC Cluster 1. Introduction In parallel computing systems computations are executed simultaneously, wholly or in part. This approach is based on the partitioning of a big task into
More informationHow GPUs can find your next hit: Accelerating virtual screening with OpenCL. Simon Krige
How GPUs can find your next hit: Accelerating virtual screening with OpenCL Simon Krige ACS 2013 Agenda > Background > About blazev10 > What is a GPU? > Heterogeneous computing > OpenCL: a framework for
More informationContour Detection on Mobile Platforms
Contour Detection on Mobile Platforms Bor-Yiing Su, subrian@eecs.berkeley.edu Prof. Kurt Keutzer, keutzer@eecs.berkeley.edu Parallel Computing Lab, University of California, Berkeley 1/26 Diagnosing Power/Performance
More informationTowards Breast Anatomy Simulation Using GPUs
Towards Breast Anatomy Simulation Using GPUs Joseph H. Chui 1, David D. Pokrajac 2, Andrew D.A. Maidment 3, and Predrag R. Bakic 4 1 Department of Radiology, University of Pennsylvania, Philadelphia PA
More informationM100 GigE Series. Multi-Camera Vision Controller. Easy cabling with PoE. Multiple inspections available thanks to 6 GigE Vision ports and 4 USB3 ports
M100 GigE Series Easy cabling with PoE Multiple inspections available thanks to 6 GigE Vision ports and 4 USB3 ports Maximized acquisition performance through 6 GigE independent channels Common features
More informationReducing Time-to-Market with i.mx6-based Qseven Modules
Reducing Time-to-Market with i.mx6-based Qseven Modules congatec Facts The preferred global vendor for innovative embedded solutions to enable competitive advantages for our customers. Founded December
More information