2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems
Accelerating a computer vision algorithm on a mobile SoC using CPU-GPU co-processing - A case study on face detection

Youngwan Lee, Department of Information and Communication Engineering, Inha University, Incheon, Korea, youngwan88@gmail.com
Cheolyong Jang, Department of Information and Communication Engineering, Inha University, Incheon, Korea, cyjang@gmail.com
Hakil Kim, Department of Information and Communication Engineering, Inha University, Incheon, Korea, hikim@inha.ac.kr

ABSTRACT
Recently, mobile devices have become equipped with sophisticated hardware components such as a heterogeneous multi-core SoC that consists of a CPU, GPU, and DSP. This provides opportunities to realize computationally intensive computer vision applications using General Purpose GPU (GPGPU) programming tools such as Open Graphics Library for Embedded Systems (OpenGL ES) and Open Computing Language (OpenCL). As a case study, the aim of this research was to accelerate the Viola-Jones face detection algorithm, which is computationally expensive and of limited use on mobile devices because irregular memory access and imbalanced workloads lead to long processing times. To solve these challenges, the proposed method adopts CPU-GPU task parallelism, sliding window parallelism, scale image parallelism, dynamic allocation of threads, and local memory optimization to reduce the computational time. The experimental results show that the proposed method achieved a speedup of 3.3 to 6.29 times in processing time compared to the well-optimized OpenCV implementation on a CPU. The proposed method can be adapted to other applications using mobile GPUs and CPUs.

Keywords
Computer vision; Mobile GPGPU; OpenGL ES 2.0; OpenCL; CPU-GPU co-processing

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. MobileSoft '16, May 16-17, 2016, Austin, TX, USA. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM /16/05 $ DOI:

1. INTRODUCTION
In recent years, the number of mobile devices with high-definition displays, high-resolution cameras, and application processors has increased exponentially, which has facilitated practical computer vision applications such as face detection, mobile visual search, 3-D games, and augmented reality on mobile devices [7,14,15,19]. However, computationally intensive computer vision applications for practical use on mobile devices are limited because of computational restrictions and limited performance compared to computers. To address this limitation, many researchers have tried to use GPUs as general purpose GPUs (GPGPUs), performing computations usually handled by CPUs, to accelerate image processing and computer vision algorithms [5,20,21] using GPU programming models such as Open Graphics Library for Embedded Systems 2.0 (OpenGL ES 2.0) [8] and Open Computing Language (OpenCL) [9]. However, many studies and advancements applied to desktop GPUs (dGPUs) are not suitable for mobile applications because of the difference between dGPUs and the mobile hardware architecture, namely a System-on-Chip (SoC) with a CPU and GPU. To achieve good performance, it is important to analyze the algorithms and workloads on a mobile phone and to redesign an efficient workload partitioning policy for the mobile hardware architecture.
In this study, we present an acceleration and optimization method on mobile devices for an exemplary computer vision application, the widely used Viola-Jones face detection algorithm [17,18], to exploit the capability of mobile CPUs and GPUs using OpenGL ES and OpenCL. Because of irregular memory access and an imbalanced workload, it is challenging to optimize the Viola-Jones face detection algorithm on a mobile SoC. This paper addresses the problem of making full use of the computing power of both the mobile CPU and GPU. The rest of the paper is organized as follows: Section 2 explains the GPGPU image processing framework on a mobile device. Section 3 discusses related works on accelerating face detection algorithms with GPUs. Section 4 briefly describes the Viola-Jones face detection algorithm. Section 5 presents the proposed accelerated face detection algorithm based on CPU-GPU co-processing. The experimental results are shown in Section 6. Finally, Section 7 concludes this paper.

2. MOBILE GPGPU IMAGE PROCESSING FRAMEWORK
There are several differences between a mobile GPU and a dGPU. First, because a mobile GPU and CPU are both integrated into the application processor (the SoC), they can save data transfer time by sharing the same memory bus. Second, the memory bandwidth of the mobile GPU is much lower than that of a dGPU. Additionally, the mobile GPU has far fewer compute units than a dGPU. For these reasons, it is necessary to carefully analyze specific algorithms and find an efficient mapping of them to the mobile SoC. OpenGL ES and OpenCL support mobile SoCs. OpenGL ES is an embedded version of OpenGL, a standard graphics API that provides a graphics rendering pipeline and can also serve as a GPGPU tool. OpenCL is an open parallel computing framework which can be used on heterogeneous platforms including CPUs, GPUs, and even DSPs. Because of the shared memory on a mobile SoC, OpenGL ES and OpenCL can both access the same data in memory without any data copying, so processing takes place in the same memory rather than requiring additional separate allocations. The mobile GPU acts as both the main rendering device for OpenGL ES and the main compute device for OpenCL, so these two functionalities time-share the GPU. When a video stream provider such as the mobile device camera supplies frame data as a GLES texture in global memory, the GPU can use it in the cl_mem format for an OpenCL compute kernel. After the OpenCL compute kernel is executed, the computed result is stored as a GLES texture that the GPU can render to the display.

Figure 1. Mobile GPGPU Image Processing Framework.

3. RELATED WORKS
Many works have accelerated Viola-Jones face detection on a dGPU rather than on a mobile GPU, using the Compute Unified Device Architecture (CUDA) [10] or OpenCL. Sharma et al. [16] presented a face detection and tracking algorithm based on Haar-like features on the GTX285 and achieved more than 20 times the processing performance for VGA images. Oro et al. [12,13] also proposed a Haar-like feature based face detection algorithm for HD video on the GTX470 and achieved a speedup of 2.5 times. However, they used CUDA, which is a GPGPU programming tool for NVIDIA GPUs only. Unlike OpenCL, which can be used on several kinds of compute components, it is unable to deal with the imbalanced workload problem encountered when implementing the Viola-Jones face detection algorithm on GPUs. Several studies have attempted to address the imbalanced computation problem [2,3,6,11].
Hefenbrock et al. [2] presented a multi-GPU solution that evaluates each detection window in a different thread and computes each scaled window in parallel on a different GPU. Obukhov [11] proposed another solution that consists of a stage-parallel and pixel-parallel implementation. Jia et al. [3] resolved the irregular workload problem on the GPU by using an Uberkernel and persistent threads. However, these studies do not utilize CPU resources because most computations are executed on the GPU. Although Wang et al. [21] made use of the computational capability of both the CPU and GPU cores, their algorithm is only optimized for the Intel Sandy Bridge chipset. Making full use of the computing power of both the CPU and GPU on a mobile SoC, this paper presents a solution to the imbalanced computation problem of Viola-Jones face detection on a mobile device using OpenCL.

4. VIOLA-JONES ALGORITHM
The Viola-Jones object detection framework was proposed by Paul Viola and Michael Jones for face detection. Its cascade classifier is a particular case of ensemble learning that speeds up detection enough to achieve real-time processing. The classifier is trained with AdaBoost, a variant of boosting, which weights the Haar-like features so that they become suitable for face detection. However, we only discuss the detection process because the training process does not affect the speed of face detection.

4.1 Haar-like features
Haar-like features in the Viola-Jones algorithm are used to judge whether a region of an image contains a face. Using Haar-like features makes it easier to find the edges, lines, and salient regions of a face. As shown in Figure 5, Haar-like features, which consist of rectangular areas, are calculated as the difference between the summed intensity of the white areas and that of the black areas.

4.2 Integral image
As mentioned above, calculating Haar-like features is very time-consuming because it is repeated for every position of a sliding window.
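To make the role of the integral image in this subsection concrete, the following minimal Python sketch builds a summed-area table and uses it to evaluate a rectangle sum and a simple two-rectangle Haar-like feature. It is an illustration only: the function names, the list-of-lists image representation, and the left-minus-right feature layout are assumptions and are not taken from the paper's OpenCL kernels.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros.

    ii[y][x] holds the sum of img over rows [0, y) and columns [0, x).
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Hypothetical two-rectangle feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
```

Because every rectangle sum costs four table lookups regardless of the rectangle size, the cost of one feature no longer depends on the size of the detection window.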
With an integral image, the sum of the intensity values within any rectangular area can be obtained using only the values at its four corner points.

4.3 Cascade classifier
A cascade can be seen as a strong classifier structure that consists of a sequence of stages, each containing a number of weak classifiers. The weak classifiers of the early stages have a simple structure because they contain only a few features; as the stages progress, the weak classifiers become more complex, making it more difficult to proceed to the next stage. As shown in Figure 2, if a sub-window cannot pass the initial classifier, it is immediately judged to contain no face and does not proceed to the next stage. In contrast, if a sub-window successfully passes every stage up to the last one, it is determined to be a face. Thus, the advantage is that because a sub-window can fail at any stage, most sub-windows are discarded at little cost, which saves much processing time.

Figure 2. Cascade classifier.

4.4 Scaling & Exhaustive sliding window
Detection is carried out in each sliding window, called a detection window, which scans the whole image as shown in Figure 5. After all the sliding windows in an image are evaluated, the same process is repeated for rescaled images to detect faces of different sizes.

5. PROPOSED METHOD
In this section, the parallel implementation of the face detection algorithm is presented first, followed by the optimization techniques.

5.1 Implementation

Skin color filtering
This paper applies skin color filtering, which reduces the detection region and thereby accelerates the face detection algorithm. Skin color filtering is robust to rotation, scale, and occlusion of a face. In particular, we use an effective pixel-based skin detection method so that it can run in real time [4]. Examples are shown in Figure 3. The skin-colored image is obtained from a color image with the color channels (R, G, B) by applying the color threshold (1):

\[
R > 95 \;\land\; G > 40 \;\land\; B > 20 \;\land\; \max\{R,G,B\} - \min\{R,G,B\} > 15 \;\land\; |R - G| > 15 \;\land\; R > G \;\land\; R > B \tag{1}
\]

Figure 3. (a) Skin color filtering. (b) Detected image.

Because the method is based on a fixed color threshold, non-skin pixels whose values are similar to skin will also be considered skin candidates. However, skin color filtering is still an effective way to reduce the overall amount of processing. Even in real skin-colored areas, there are still some pixel values that do not satisfy the threshold, resulting in black holes, which affect detection performance. To solve this problem, this paper adopts the dilation technique, which fills in the holes in the skin-colored areas.

Design for parallelism

CPU-GPU task-level parallelism
Figure 4 shows a flow diagram of the proposed face detection algorithm based on CPU-GPU co-processing. OpenCL GPU kernels are executed in the right box, and CPU serial computations are carried out in the left box. The 2-dimensional image memory object that was converted from the texture data by the OpenGL ES pipeline is delivered to the OpenCL compute units. In the first step, image scaling and skin color filtering, which screens for skin-colored pixels, are carried out. After dilation of the skin-colored mask in the GPU kernel, the CPU acts as the host and reads the skin-colored mask from the GPU. It should be noted that when the CPU reads an image object from the GPU, the data transfer overhead between the CPU and GPU is negligible because of the shared memory system on a mobile SoC. The collection of the skin-colored pixels' coordinates on the CPU can be executed concurrently with the Integral kernel on the GPU, which enables the computing resources of both the CPU and GPU to be fully used at the same time. Finally, in the cascade GPU kernel, detection window computations are executed for the skin-colored pixels delivered from the CPU.

Figure 4. Flow diagram of the proposed face detection algorithm.

Sliding window parallelism
Data parallelism means that the same task is executed simultaneously on multiple processors across different pieces of distributed data. In particular, there should be no data dependencies affecting the execution order among the processors. As mentioned above, to implement face detection, a cascade classifier is computed to determine whether a face is in the sliding window. Data parallelism is very efficient here because the same process is executed independently for millions of detection windows.

Scale image parallelism
The Viola-Jones face detection algorithm achieves scale invariance by processing the image at several scales. A naive implementation performs the face detection algorithm iteratively over the scaled-down images, so that almost all kernels are launched repeatedly. In such a process, the kernels in the loop waste computation resources because of barrier synchronization. In addition, repeatedly launching the same kernel causes kernel launch overhead. To solve this problem, as shown in Figure 5, we merge the scaled-down images into a single image. This method reduces the waste of computing resources by eliminating kernel iterations.

Figure 5. Combined image for the GPU kernel.
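One way to picture the merged image is as a set of scaled copies placed at known offsets inside a single 2-D canvas, so that one kernel launch can cover every scale. The Python sketch below only computes such a layout and maps a canvas pixel back to its scale level. It is an assumption for illustration: the vertical stacking, the 1.25 scale factor, the 24-pixel window size, and the names `pack_pyramid` and `locate` are hypothetical and are not taken from the layout used in Figure 5.

```python
from typing import List, NamedTuple, Optional, Tuple


class Level(NamedTuple):
    scale: float  # downscale factor relative to the input image
    x: int        # left edge of this level inside the combined image
    y: int        # top edge of this level inside the combined image
    w: int        # width of the scaled image
    h: int        # height of the scaled image


def pack_pyramid(width: int, height: int, factor: float = 1.25,
                 window: int = 24) -> Tuple[List[Level], Tuple[int, int]]:
    """Stack successively smaller images vertically in one canvas.

    Stops when a level becomes smaller than the detection window.
    Returns the levels and the (width, height) of the combined canvas.
    """
    levels = []
    scale, y = 1.0, 0
    while True:
        w, h = int(width / scale), int(height / scale)
        if w < window or h < window:
            break
        levels.append(Level(scale, 0, y, w, h))
        y += h
        scale *= factor
    return levels, (width, y)


def locate(levels: List[Level], px: int, py: int) -> Optional[Tuple[int, int, int]]:
    """Map a canvas pixel to (level index, local x, local y), or None for padding."""
    for i, lv in enumerate(levels):
        if lv.x <= px < lv.x + lv.w and lv.y <= py < lv.y + lv.h:
            return i, px - lv.x, py - lv.y
    return None


levels, canvas = pack_pyramid(100, 50, factor=2.0, window=24)
```

In a kernel, a lookup like `locate` would let each work-item decide from its global ID which scale it belongs to, and padding pixels outside every level can simply return early.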
When we build a unified single image by combining all the scaled-down images, a 2-dimensional image memory object has advantages over the 1-dimensional global memory buffer that is commonly used in OpenCL. It not only allows faster data access but also makes boundary conditions easier to handle than a global memory buffer.

5.2 Optimization

Dynamic allocation of work-items
Because a GPU uses the SIMT (Single Instruction Multiple Thread) programming model, work is scheduled and executed on the GPU in units of work-groups. The global work size refers to the total number of work-items (threads) in a GPU and is set to the size of the image. Each pixel of an image is computed by a work-item in the GPU. The local work size indicates the number of work-items included in a work-group. As mentioned in Section 4, in the serial CPU version of the Viola-Jones algorithm, faces are detected in the cascade step by sliding a detection window over the image. In the Cascade GPU kernel, however, each work-item has its own detection window and all windows are processed in parallel, which means it is not necessary to slide the detection window. Windows at non-face pixels are rejected at stage 1 or 2, where simpler classifiers are used to reject the majority of sub-windows. As shown in Figure 6 (a), work-items that are rejected early need to wait until all work-items in the same work-group finish their detection window computation, because the GPU executes in units of work-groups, which results in idle work-items. If only one work-item keeps working until the final stage, the other work-items in its work-group are idle. This is a serious imbalanced computation problem that leads to poor performance. To address the imbalanced workload problem, this study presents a new approach that dynamically allocates the global work size according to the number of skin-colored pixels. In other words, work-items only compute the detection windows of skin-colored pixels, which are less likely to be rejected early; non-skin pixels are not computed at all, which prevents idle threads from occurring and takes full advantage of the GPU resources. Figure 6 (b) shows that the global work size is allocated according to the number of skin-colored pixels and that there are few idle work-items in the GPU.

Figure 6. Reduction of idle work-items in a GPU. (a) Original NDRange. (b) Optimized NDRange.

Local memory optimization
Similar to a dGPU, a mobile GPU also has performance bottlenecks caused by global memory access.
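Before going further into memory behavior, the dynamic allocation of work-items described in the preceding subsection can be sketched on the host side. The Python code below applies the skin-color threshold (1), collects the coordinates of skin-colored pixels as the CPU step does, and sizes the NDRange from that count. Several points are assumptions: each comparison in (1) is taken as a strict greater-than (the commonly cited form of this rule), the local work size of 64 is arbitrary, the global size is rounded up to a multiple of the local size because OpenCL 1.x expects the two to divide evenly, and all function names are hypothetical.

```python
def is_skin(r, g, b):
    """Fixed-threshold skin test, assuming strict '>' for every term of (1)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def collect_skin_coords(image):
    """Return (x, y) for every skin-colored pixel, in row-major order.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    """
    return [(x, y)
            for y, row in enumerate(image)
            for x, px in enumerate(row)
            if is_skin(*px)]


def global_work_size(n_items, local_size=64):
    """Round the number of work-items up to a whole number of work-groups."""
    groups = (n_items + local_size - 1) // local_size
    return groups * local_size


frame = [[(200, 120, 90), (10, 10, 10)],
         [(90, 60, 50), (220, 150, 120)]]
coords = collect_skin_coords(frame)
ndrange = global_work_size(len(coords))
```

With this arrangement, work-item `i` of the cascade kernel would read `coords[i]` and evaluate only that detection window, and work-items with `i >= len(coords)` introduced by the rounding would return immediately.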
A mobile GPU suffers from a longer latency for off-chip global memory access than a dGPU. Therefore, memory optimization is essential in parallel image processing on a mobile GPU. Local memory, in which the work-items of the same work-group can share data, has a lower latency than global memory. Thus, loading shared data into local memory can reduce global memory access and improve processing performance. However, one should note that the more local memory a kernel requires, the fewer work-items are available to execute it. Therefore, it is important to analyze whether the data are suitable for sharing among the work-items in a work-group and to find the optimal size of the data to load. When each work-item computes a detection window in the cascade kernel, the same classifier data, trained in advance, are used by all work-items. Thus, this study tried to find the optimal size of the classifier data to load and loads only part of the classifier data into local memory. We load the 3 features of the classifier data of cascade stage 1, which most work-items share, because as higher stages are reached, more work-items have returned and fewer classifier data are shared. An average reduction of 12% in execution time was observed after using local memory.

6. EXPERIMENTAL RESULTS

6.1 Experiment set-up
For the experiment, we chose as a test platform the Galaxy S5 LTE-A smartphone, which is driven by a Qualcomm application processor. Qualcomm is the clear leader in the smartphone application processor market with the Snapdragon series. The Galaxy S5 LTE-A is powered by a Snapdragon 801 SoC with a 2.45 GHz quad-core Krait 400 CPU and a 578 MHz Adreno 330 quad-core GPU. The Adreno 330 GPU supports advanced graphics APIs, including OpenGL ES 3.0 and the OpenCL 1.2 library. The mobile operating system was Android 5.0. The OpenCV library was used to implement face detection for the CPU version.
In the performance evaluation, this paper experimented with two different datasets. The first is the Images of Groups dataset [1], which contains color frontal face images and group images composed of a number of people. This dataset also covers varied illumination conditions, faces of various races, and different face sizes. We collected 60 images containing 622 faces from the dataset. The test images were resized to HD (720p), maintaining a fixed aspect ratio, to fit the output size of the mobile display. The other dataset is INHA FACE, whose images are of HD resolution and show people at different distances (1 m, 3 m, and 5 m). We used this dataset to evaluate the relationship between the processing time and the amount of skin-colored pixels.

Figure 7. Execution time in each kernel (Scaling & Skin Color Filtering, Dilation, Integral, Cascade) for the CPU only, GPU only, and CPU-GPU versions.

6.2 Accuracy & Execution time
We used the cascade classification from the OpenCV library, which is well known in the field of computer vision. We set the same configuration parameters and then compared the CPU implementation of the well-optimized OpenCV library, which is widely used and considered accurate, with our CPU-GPU implementation. The detection results of both versions were the same, which means that the acceleration in our CPU-GPU implementation causes no accuracy penalty. In the first experiment, we measured the processing time within each kernel and compared the proposed CPU-GPU version with the CPU only and GPU only versions. As shown in Figure 7, most of the time was spent in the cascade kernel, which evaluates the detection windows, because of its computational complexity. Compared to the other versions, the proposed CPU-GPU version spent less time in the cascade kernel because it reduces the idleness of the work-items. In addition, the CPU-GPU version had the lowest time cost and was 3.22 times faster than the CPU only version. In the second experiment, we measured the execution time according to the amount of skin-colored pixels. As shown in Figure 10, the shorter the distance between the camera and the people, the more skin-colored pixels are found, which results in more computational effort. In contrast, as the distance to the camera becomes longer, fewer skin-colored pixels are detected. Our experiments were carried out under different scenarios with distances of 1, 3, and 5 m. At 1 m, the images contained the largest amount of skin-colored pixels, so the processing time was the longest.

Figure 8. Average execution time according to distance.
It was also observed that an increase in distance causes a decrease in the number of skin-colored pixels, thereby reducing processing time. Finally, the processing time at 5 m is the shortest because of the smallest amount of skin-colored pixels. Compared with the other implementations, when the distance was 1 m, 3 m and 5 m, the processing time of the proposed CPU-GPU method was ms, 35.16 ms, and ms, respectively, which shows that the CPU-GPU method had the best performance regarding processing time.

Figure 9. Results from the Images of the Groups dataset.

Figure 10. Results from the INHA_FACE dataset.

Table 1. Comparison of the performance of different methods with the Images of Groups dataset (columns: Method, Execution time (ms), fps, Speedup; rows: CPU only, GPU only, CPU-GPU).

Table 2. Comparison of the performance of different methods with the INHA FACE dataset (columns: Method, Execution time (ms), fps, Speedup; rows: CPU only, GPU only, CPU-GPU).

Tables 1 and 2 compare the performance of each method using the Images of Groups and the INHA FACE datasets, respectively. The method proposed in this study achieves speedups of 3.3 times and 6.29 times over the CPU only method with the Images of Groups and INHA FACE datasets, respectively. Additionally, note that real-time processing was obtained with the INHA FACE dataset.

7. CONCLUSIONS
This paper presents an optimized parallel implementation of the Viola-Jones face detection algorithm as a case study in mapping a computer vision application onto a mobile SoC using CPU-GPU co-processing. To exploit the computational power of both the CPU and GPU, we discussed several parallelization and optimization methods to accelerate the algorithm: CPU-GPU task parallelism, sliding window parallelism, scale image parallelism, dynamic allocation of work-items, and local memory optimization. These methods resolved the imbalanced workload problem and improved the processing time on mobile SoCs, giving a speedup of 3.3 to 6.29 times over a well-optimized CPU implementation from the OpenCV library. Finally, for future work, we plan to measure power consumption and port this algorithm to other mobile devices to validate and optimize our work.

8. ACKNOWLEDGEMENTS
This work was supported by the Industrial Strategic Technology Development Program ( , The Development of Fusion Processor based on Multi-Shader GPU) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).

9. REFERENCES
[1] Gallagher, A.C. and Chen, T. Understanding Images of Groups of People. Computer Vision and Pattern Recognition (CVPR). (2009).
[2] Hefenbrock, D., Oberg, J., Thanh, N.T.N., Kastner, R. and Baden, S.B. Accelerating Viola-Jones face detection to FPGA-level using GPUs. Proceedings - IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM (2010).
[3] Jia, H., Zhang, Y., Wang, W. and Xu, J. Accelerating Viola-Jones Face Detection Algorithm on GPUs. IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems. (2012).
[4] Kakumanu, P., Makrogiannis, S. and Bourbakis, N. A survey of skin-color modeling and detection methods. Pattern Recognition. 40, 3 (2007).
[5] Kang, S.H., Lee, S. and Park, I.K. Parallelization and Optimization of Feature Detection Algorithms on Embedded GPU. (2014).
[6] Li, E., Wang, B., Yang, L., Peng, Y., Du, Y., Zhang, Y. and Chiu, Y.-J. GPU and CPU Cooperative Acceleration for Face Detection on Modern Processors. IEEE International Conference on Multimedia and Expo. (2012).
[7] Liu, X., Lou, Y., Yu, A. and Lang, B. Search by mobile image based on visual and spatial consistency. Multimedia and Expo (ICME), (2011), 1-6.
[8] Munshi, A. and Leech, J. OpenGL ES common profile specification version (full specification). Khronos Group.
[9] Munshi, A. OpenCL specification 1.1. Khronos OpenCL Working Group.
[10] Nvidia. CUDA RUNTIME API, March
[11] Obukhov, A. Haar classifiers for object detection with CUDA. GPU Computing Gems Emerald Edition.
[12] Oro, D., Fernández, C., Segura, C., Martorell, X. and Hernando, J. Accelerating Boosting-Based Face Detection on GPUs. International Conference on Parallel Processing. (2012).
[13] Oro, D., Fernández, C., Saeta, J.R., Martorell, X. and Hernando, J. Real-time GPU-based face detection in HD video sequences. Proceedings of the IEEE International Conference on Computer Vision. (2011).
[14] Pulli, K., Baksheev, A., Kornyakov, K. and Eruhimov, V. Real-time computer vision with OpenCV. Communications of the ACM. 55, 6 (2012), 61.
[15] Rahman, M., Ren, J. and Kehtarnavaz, N. Real-time implementation of robust face detection on mobile platforms. Acoustics, Speech and Signal Processing, ICASSP. IEEE International Conference on. (2009).
[16] Sharma, B., Thota, R., Vydyanathan, N. and Kale, A. Towards a robust, real-time face processing system using CUDA-enabled GPUs. International Conference on High Performance Computing (HiPC). (2009).
[17] Viola, P. and Jones, M. Rapid object detection using a boosted cascade of simple features. Computer Vision and Pattern Recognition (CVPR) 1, I-511 to I-518.
[18] Viola, P. and Jones, M. Robust real-time face detection. International Journal of Computer Vision 57, 2.
[19] Wagner, D. and Schmalstieg, D. History and future of tracking for mobile phone augmented reality. IEEE International Symposium on Ubiquitous Virtual Reality, 7-10.
[20] Wang, G., Rister, B. and Cavallaro, J.R. Workload analysis and efficient OpenCL-based implementation of SIFT algorithm on a smartphone. IEEE Global Conference on Signal and Information Processing (December 2013).
[21] Wang, G., Xiong, Y., Yun, J. and Cavallaro, J.R. Accelerating computer vision algorithms using OpenCL framework on the mobile GPU - A case study. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. (2013).
More informationHiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.
HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation
More informationVehicle Detection Method using Haar-like Feature on Real Time System
Vehicle Detection Method using Haar-like Feature on Real Time System Sungji Han, Youngjoon Han and Hernsoo Hahn Abstract This paper presents a robust vehicle detection approach using Haar-like feature.
More informationFast Face Detection Assisted with Skin Color Detection
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 4, Ver. II (Jul.-Aug. 2016), PP 70-76 www.iosrjournals.org Fast Face Detection Assisted with Skin Color
More informationCPU-GPU hybrid computing for feature extraction from video stream
LETTER IEICE Electronics Express, Vol.11, No.22, 1 8 CPU-GPU hybrid computing for feature extraction from video stream Sungju Lee 1, Heegon Kim 1, Daihee Park 1, Yongwha Chung 1a), and Taikyeong Jeong
More informationStorage Architecture and Software Support for SLC/MLC Combined Flash Memory
Storage Architecture and Software Support for SLC/MLC Combined Flash Memory Soojun Im and Dongkun Shin Sungkyunkwan University Suwon, Korea {lang33, dongkun}@skku.edu ABSTRACT We propose a novel flash
More informationUtilizing Graphics Processing Units for Rapid Facial Recognition using Video Input
Utilizing Graphics Processing Units for Rapid Facial Recognition using Video Input Charles Gala, Dr. Raj Acharya Department of Computer Science and Engineering Pennsylvania State University State College,
More informationASYNCHRONOUS SHADERS WHITE PAPER 0
ASYNCHRONOUS SHADERS WHITE PAPER 0 INTRODUCTION GPU technology is constantly evolving to deliver more performance with lower cost and lower power consumption. Transistor scaling and Moore s Law have helped
More informationParallel Processing of Multimedia Data in a Heterogeneous Computing Environment
Parallel Processing of Multimedia Data in a Heterogeneous Computing Environment Heegon Kim, Sungju Lee, Yongwha Chung, Daihee Park, and Taewoong Jeon Dept. of Computer and Information Science, Korea University,
More informationAdvanced Imaging Applications on Smart-phones Convergence of General-purpose computing, Graphics acceleration, and Sensors
Advanced Imaging Applications on Smart-phones Convergence of General-purpose computing, Graphics acceleration, and Sensors Sriram Sethuraman Technologist & DMTS, Ittiam 1 Overview Imaging on Smart-phones
More informationMediaTek Video Face Beautify
MediaTek Video Face Beautify November 2014 2014 MediaTek Inc. Table of Contents 1 Introduction... 3 2 The MediaTek Solution... 4 3 Overview of Video Face Beautify... 4 4 Face Detection... 6 5 Skin Detection...
More informationReal-time Background Subtraction Based on GPGPU for High-Resolution Video Surveillance
Real-time Background Subtraction Based on GPGPU for High-Resolution Video Surveillance Sunhee Hwang sunny16@yonsei.ac.kr Youngjung Uh youngjung.uh@yonsei.ac.kr Minsong Ki kms2014@yonsei.ac.kr Kwangyong
More informationFace tracking. (In the context of Saya, the android secretary) Anton Podolsky and Valery Frolov
Face tracking (In the context of Saya, the android secretary) Anton Podolsky and Valery Frolov Introduction Given the rather ambitious task of developing a robust face tracking algorithm which could be
More informationFace detection and recognition. Many slides adapted from K. Grauman and D. Lowe
Face detection and recognition Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Detection Recognition Sally History Early face recognition systems: based on features and distances
More informationFast Natural Feature Tracking for Mobile Augmented Reality Applications
Fast Natural Feature Tracking for Mobile Augmented Reality Applications Jong-Seung Park 1, Byeong-Jo Bae 2, and Ramesh Jain 3 1 Dept. of Computer Science & Eng., University of Incheon, Korea 2 Hyundai
More informationGPU Programming Using NVIDIA CUDA
GPU Programming Using NVIDIA CUDA Siddhante Nangla 1, Professor Chetna Achar 2 1, 2 MET s Institute of Computer Science, Bandra Mumbai University Abstract: GPGPU or General-Purpose Computing on Graphics
More informationPortland State University ECE 588/688. Graphics Processors
Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly
More informationClassifier Case Study: Viola-Jones Face Detector
Classifier Case Study: Viola-Jones Face Detector P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001. P. Viola and M. Jones. Robust real-time face detection.
More informationLarge-Scale Traffic Sign Recognition based on Local Features and Color Segmentation
Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation M. Blauth, E. Kraft, F. Hirschenberger, M. Böhm Fraunhofer Institute for Industrial Mathematics, Fraunhofer-Platz 1,
More informationDesign of a Dynamic Data-Driven System for Multispectral Video Processing
Design of a Dynamic Data-Driven System for Multispectral Video Processing Shuvra S. Bhattacharyya University of Maryland at College Park ssb@umd.edu With contributions from H. Li, K. Sudusinghe, Y. Liu,
More informationAdaptive Feature Extraction with Haar-like Features for Visual Tracking
Adaptive Feature Extraction with Haar-like Features for Visual Tracking Seunghoon Park Adviser : Bohyung Han Pohang University of Science and Technology Department of Computer Science and Engineering pclove1@postech.ac.kr
More informationCopyright Khronos Group Page 1. Vulkan Overview. June 2015
Copyright Khronos Group 2015 - Page 1 Vulkan Overview June 2015 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon Open Consortium creating OPEN STANDARD APIs for hardware acceleration
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationDuksu Kim. Professional Experience Senior researcher, KISTI High performance visualization
Duksu Kim Assistant professor, KORATEHC Education Ph.D. Computer Science, KAIST Parallel Proximity Computation on Heterogeneous Computing Systems for Graphics Applications Professional Experience Senior
More informationMouse Pointer Tracking with Eyes
Mouse Pointer Tracking with Eyes H. Mhamdi, N. Hamrouni, A. Temimi, and M. Bouhlel Abstract In this article, we expose our research work in Human-machine Interaction. The research consists in manipulating
More informationIntroduction. How? Rapid Object Detection using a Boosted Cascade of Simple Features. Features. By Paul Viola & Michael Jones
Rapid Object Detection using a Boosted Cascade of Simple Features By Paul Viola & Michael Jones Introduction The Problem we solve face/object detection What's new: Fast! 384X288 pixel images can be processed
More informationGPGPU on Mobile Devices
GPGPU on Mobile Devices Introduction Addressing GPGPU for very mobile devices Tablets Smartphones Introduction Why dedicated GPUs in mobile devices? Gaming Physics simulation for realistic effects 3D-GUI
More informationIntroduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono
Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied
More informationhigh performance medical reconstruction using stream programming paradigms
high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming
More informationPortable GPU-Based Artificial Neural Networks For Data-Driven Modeling
City University of New York (CUNY) CUNY Academic Works International Conference on Hydroinformatics 8-1-2014 Portable GPU-Based Artificial Neural Networks For Data-Driven Modeling Zheng Yi Wu Follow this
More informationImage Processing Pipeline for Facial Expression Recognition under Variable Lighting
Image Processing Pipeline for Facial Expression Recognition under Variable Lighting Ralph Ma, Amr Mohamed ralphma@stanford.edu, amr1@stanford.edu Abstract Much research has been done in the field of automated
More informationFace Recognition for Mobile Devices
Face Recognition for Mobile Devices Aditya Pabbaraju (adisrinu@umich.edu), Srujankumar Puchakayala (psrujan@umich.edu) INTRODUCTION Face recognition is an application used for identifying a person from
More informationComputing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany
Computing on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Summary: The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been
More informationLearning to Detect Faces. A Large-Scale Application of Machine Learning
Learning to Detect Faces A Large-Scale Application of Machine Learning (This material is not in the text: for further information see the paper by P. Viola and M. Jones, International Journal of Computer
More informationOcclusion Detection of Real Objects using Contour Based Stereo Matching
Occlusion Detection of Real Objects using Contour Based Stereo Matching Kenichi Hayashi, Hirokazu Kato, Shogo Nishida Graduate School of Engineering Science, Osaka University,1-3 Machikaneyama-cho, Toyonaka,
More informationUse cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games
Viewdle Inc. 1 Use cases Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games 2 Why OpenCL matter? OpenCL is going to bring such
More informationGPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC
GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of
More informationNext Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1
Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Ecosystem @neilt3d Copyright Khronos Group 2015 - Page 1 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon
More informationPowerVR Hardware. Architecture Overview for Developers
Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.
More informationMultimedia in Mobile Phones. Architectures and Trends Lund
Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson
More informationUsing Graphics Chips for General Purpose Computation
White Paper Using Graphics Chips for General Purpose Computation Document Version 0.1 May 12, 2010 442 Northlake Blvd. Altamonte Springs, FL 32701 (407) 262-7100 TABLE OF CONTENTS 1. INTRODUCTION....1
More informationMixing Graphics and Compute for Real-Time Multiview Human Body Tracking
Mixing Graphics and Compute for Real-Time Multiview Human Body Tracking Boguslaw Rymut 2 and Bogdan Kwolek 1 1 AGH University of Science and Technology 30 Mickiewicza Av., 30-059 Krakow, Poland bkw@agh.edu.pl
More informationGraphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university
Graphics Architectures and OpenCL Michael Doggett Department of Computer Science Lund university Overview Parallelism Radeon 5870 Tiled Graphics Architectures Important when Memory and Bandwidth limited
More informationB. Tech. Project Second Stage Report on
B. Tech. Project Second Stage Report on GPU Based Active Contours Submitted by Sumit Shekhar (05007028) Under the guidance of Prof Subhasis Chaudhuri Table of Contents 1. Introduction... 1 1.1 Graphic
More informationWindow based detectors
Window based detectors CS 554 Computer Vision Pinar Duygulu Bilkent University (Source: James Hays, Brown) Today Window-based generic object detection basic pipeline boosting classifiers face detection
More informationPerformance impact of dynamic parallelism on different clustering algorithms
Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu
More informationFace Detection on OpenCV using Raspberry Pi
Face Detection on OpenCV using Raspberry Pi Narayan V. Naik Aadhrasa Venunadan Kumara K R Department of ECE Department of ECE Department of ECE GSIT, Karwar, Karnataka GSIT, Karwar, Karnataka GSIT, Karwar,
More informationImproved Integral Histogram Algorithm. for Big Sized Images in CUDA Environment
Contemporary Engineering Sciences, Vol. 7, 2014, no. 24, 1415-1423 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.49174 Improved Integral Histogram Algorithm for Big Sized Images in CUDA
More informationGPU-based pedestrian detection for autonomous driving
Procedia Computer Science Volume 80, 2016, Pages 2377 2381 ICCS 2016. The International Conference on Computational Science GPU-based pedestrian detection for autonomous driving V. Campmany 1,2, S. Silva
More informationGPGPU on ARM. Tom Gall, Gil Pitney, 30 th Oct 2013
GPGPU on ARM Tom Gall, Gil Pitney, 30 th Oct 2013 Session Description This session will discuss the current state of the art of GPGPU technologies on ARM SoC systems. What standards are there? Where are
More informationAn Acceleration Scheme to The Local Directional Pattern
An Acceleration Scheme to The Local Directional Pattern Y.M. Ayami Durban University of Technology Department of Information Technology, Ritson Campus, Durban, South Africa ayamlearning@gmail.com A. Shabat
More informationIntroduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono
Introduction to CUDA Algoritmi e Calcolo Parallelo References This set of slides is mainly based on: CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory Slide of Applied
More informationEfficient and Fast Multi-View Face Detection Based on Feature Transformation
Efficient and Fast Multi-View Face Detection Based on Feature Transformation Dongyoon Han*, Jiwhan Kim*, Jeongwoo Ju*, Injae Lee**, Jihun Cha**, Junmo Kim* *Department of EECS, Korea Advanced Institute
More informationSkin and Face Detection
Skin and Face Detection Linda Shapiro EE/CSE 576 1 What s Coming 1. Review of Bakic flesh detector 2. Fleck and Forsyth flesh detector 3. Details of Rowley face detector 4. Review of the basic AdaBoost
More informationROBUST REAL TIME FACE RECOGNITION AND TRACKING ON GPU USING FUSION OF RGB AND DEPTH IMAGE.
ROBUST REAL TIME FACE RECOGNITION AND TRACKING ON GPU USING FUSION OF RGB AND DEPTH IMAGE. Narmada Naik 1 and Dr.G.N Rathna 2 1 Department of Electrical Engineering, Indian Institute of science, Bangalore,
More informationEmbedded Face Detection Application based on Local Binary Patterns
Embedded Face Detection Application based on Local Binary Patterns Laurentiu Acasandrei Instituto de Microelectrónica de Sevilla IMSE-CNM-CSIC Sevilla, Spain laurentiu@imse-cnm.csic.es Angel Barriga Instituto
More informationCriminal Identification System Using Face Detection and Recognition
Criminal Identification System Using Face Detection and Recognition Piyush Kakkar 1, Mr. Vibhor Sharma 2 Information Technology Department, Maharaja Agrasen Institute of Technology, Delhi 1 Assistant Professor,
More informationFace detection and recognition. Detection Recognition Sally
Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification
More informationCSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
More informationFace Detection and Alignment. Prof. Xin Yang HUST
Face Detection and Alignment Prof. Xin Yang HUST Many slides adapted from P. Viola Face detection Face detection Basic idea: slide a window across image and evaluate a face model at every location Challenges
More informationad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors Weifeng Liu and Brian Vinter Niels Bohr Institute University of Copenhagen Denmark {weifeng, vinter}@nbi.dk March 1, 2014 Weifeng
More informationDeep Learning Based Real-time Object Recognition System with Image Web Crawler
, pp.103-110 http://dx.doi.org/10.14257/astl.2016.142.19 Deep Learning Based Real-time Object Recognition System with Image Web Crawler Myung-jae Lee 1, Hyeok-june Jeong 1, Young-guk Ha 2 1 Department
More informationDisguised Face Identification Based Gabor Feature and SVM Classifier
Disguised Face Identification Based Gabor Feature and SVM Classifier KYEKYUNG KIM, SANGSEUNG KANG, YUN KOO CHUNG and SOOYOUNG CHI Department of Intelligent Cognitive Technology Electronics and Telecommunications
More informationExploiting scene constraints to improve object detection algorithms for industrial applications
Exploiting scene constraints to improve object detection algorithms for industrial applications PhD Public Defense Steven Puttemans Promotor: Toon Goedemé 2 A general introduction Object detection? Help
More informationCS427 Multicore Architecture and Parallel Computing
CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:
More informationEmerging Vision Technologies: Enabling a New Era of Intelligent Devices
Emerging Vision Technologies: Enabling a New Era of Intelligent Devices Computer vision overview Computer vision is being integrated in our daily lives Acquiring, processing, and understanding visual data
More informationAccelerating MapReduce on a Coupled CPU-GPU Architecture
Accelerating MapReduce on a Coupled CPU-GPU Architecture Linchuan Chen Xin Huo Gagan Agrawal Department of Computer Science and Engineering The Ohio State University Columbus, OH 43210 {chenlinc,huox,agrawal}@cse.ohio-state.edu
More informationMobile Face Recognization
Mobile Face Recognization CS4670 Final Project Cooper Bills and Jason Yosinski {csb88,jy495}@cornell.edu December 12, 2010 Abstract We created a mobile based system for detecting faces within a picture
More informationProject Report for EE7700
Project Report for EE7700 Name: Jing Chen, Shaoming Chen Student ID: 89-507-3494, 89-295-9668 Face Tracking 1. Objective of the study Given a video, this semester project aims at implementing algorithms
More informationParallel Tracking. Henry Spang Ethan Peters
Parallel Tracking Henry Spang Ethan Peters Contents Introduction HAAR Cascades Viola Jones Descriptors FREAK Descriptor Parallel Tracking GPU Detection Conclusions Questions Introduction Tracking is a
More informationTUNING CUDA APPLICATIONS FOR MAXWELL
TUNING CUDA APPLICATIONS FOR MAXWELL DA-07173-001_v7.0 March 2015 Application Note TABLE OF CONTENTS Chapter 1. Maxwell Tuning Guide... 1 1.1. NVIDIA Maxwell Compute Architecture... 1 1.2. CUDA Best Practices...2
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationParallelizing Inline Data Reduction Operations for Primary Storage Systems
Parallelizing Inline Data Reduction Operations for Primary Storage Systems Jeonghyeon Ma ( ) and Chanik Park Department of Computer Science and Engineering, POSTECH, Pohang, South Korea {doitnow0415,cipark}@postech.ac.kr
More informationLocal Difference Binary for Ultrafast and Distinctive Feature Description
Local Difference Binary for Ultrafast and Distinctive Feature Description Xin Yang, K.-T. Tim Cheng IEEE Trans. on Pattern Analysis and Machine Intelligence, 2014, January *Source code has been released
More informationPerformance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference
The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee
More informationBifrost - The GPU architecture for next five billion
Bifrost - The GPU architecture for next five billion Hessed Choi Senior FAE / ARM ARM Tech Forum June 28 th, 2016 Vulkan 2 ARM 2016 What is Vulkan? A 3D graphics API for the next twenty years Logical successor
More informationAn Approach for Real Time Moving Object Extraction based on Edge Region Determination
An Approach for Real Time Moving Object Extraction based on Edge Region Determination Sabrina Hoque Tuli Department of Computer Science and Engineering, Chittagong University of Engineering and Technology,
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More informationGPU Based Face Recognition System for Authentication
GPU Based Face Recognition System for Authentication Bhumika Agrawal, Chelsi Gupta, Meghna Mandloi, Divya Dwivedi, Jayesh Surana Information Technology, SVITS Gram Baroli, Sanwer road, Indore, MP, India
More informationA robust method for automatic player detection in sport videos
A robust method for automatic player detection in sport videos A. Lehuger 1 S. Duffner 1 C. Garcia 1 1 Orange Labs 4, rue du clos courtel, 35512 Cesson-Sévigné {antoine.lehuger, stefan.duffner, christophe.garcia}@orange-ftgroup.com
More informationEnergy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package
High Performance Machine Learning Workshop Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package Matheus Souza, Lucas Maciel, Pedro Penna, Henrique Freitas 24/09/2018 Agenda Introduction
More information