2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems


Accelerating a computer vision algorithm on a mobile SoC using CPU-GPU co-processing: A case study on face detection

Youngwan Lee, Department of Information and Communication Engineering, Inha University, Incheon, Korea, youngwan88@gmail.com
Cheolyong Jang, Department of Information and Communication Engineering, Inha University, Incheon, Korea, cyjang@gmail.com
Hakil Kim, Department of Information and Communication Engineering, Inha University, Incheon, Korea, hikim@inha.ac.kr

ABSTRACT
Recently, mobile devices have become equipped with sophisticated hardware components such as a heterogeneous multi-core SoC that consists of a CPU, GPU, and DSP. This provides opportunities to realize computationally intensive computer vision applications using General Purpose GPU (GPGPU) programming tools such as Open Graphics Library for Embedded Systems (OpenGL ES) and Open Computing Language (OpenCL). As a case study, this research aimed to accelerate the Viola-Jones face detection algorithm, which is computationally expensive and of limited use on mobile devices because its irregular memory access and imbalanced workloads lead to long processing times. To address these challenges, the proposed method adopts CPU-GPU task parallelism, sliding window parallelism, scale image parallelism, dynamic allocation of threads, and local memory optimization to reduce the computation time. The experimental results show that the proposed method achieved a 3.3 to 6.29 times speedup compared to the well-optimized OpenCV implementation on a CPU. The proposed method can be adapted to other applications using mobile GPUs and CPUs.

Keywords
Computer vision; Mobile GPGPU; OpenGL ES 2.0; OpenCL; CPU-GPU co-processing

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. MobileSoft '16, May 16-17, 2016, Austin, TX, USA. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM /16/05 $ DOI:

1. INTRODUCTION
In recent years, the number of mobile devices with high-definition displays, high-resolution cameras, and powerful application processors has increased rapidly, which has made practical computer vision applications such as face detection, mobile visual search, 3-D games, and augmented reality feasible on mobile devices [7,14,15,19]. However, computationally intensive computer vision applications remain of limited practical use on mobile devices because of their computational restrictions and limited performance compared to desktop computers. To address this limitation, many researchers have used GPUs as general purpose GPUs (GPGPUs), performing computations usually handled by CPUs to accelerate image processing and computer vision algorithms [5,20,21] with GPU programming models such as Open Graphics Library for Embedded Systems 2.0 (OpenGL ES 2.0) [8] and Open Computing Language (OpenCL) [9]. However, many techniques developed for desktop GPUs (dGPUs) are not suitable for mobile applications because of the differences between dGPUs and the mobile hardware architecture, namely a System-on-Chip (SoC) that integrates a CPU and a GPU. To achieve good performance, it is essential to analyze the algorithms and workloads on a mobile phone and to design an efficient workload partitioning policy for the mobile hardware architecture.
In this study, we present an acceleration and optimization method on mobile devices for an exemplar computer vision application, the widely used Viola-Jones face detection algorithm [17,18], that exploits the capabilities of mobile CPUs and GPUs using OpenGL ES and OpenCL. Because of irregular memory access and an imbalanced workload, it is challenging to optimize the Viola-Jones face detection algorithm on a mobile SoC. This paper addresses these problems by making full use of the computing power of both the mobile CPU and GPU.

The rest of the paper is organized as follows: Section 2 explains the GPGPU image processing framework on a mobile device. Section 3 discusses related work on accelerating face detection algorithms with GPUs. Section 4 briefly describes the Viola-Jones face detection algorithm. Section 5 presents the proposed acceleration of face detection based on CPU-GPU co-processing. The experimental results are shown in Section 6. Finally, Section 7 concludes this paper.

2. MOBILE GPGPU IMAGE PROCESSING FRAMEWORK
There are several differences between a mobile GPU and a dGPU. First, because the mobile GPU and CPU are both integrated into the application processor (the SoC), they share the same memory bus and can therefore avoid data transfer time. Second, the memory bandwidth of a mobile GPU is much lower than that of a dGPU. Third, a mobile GPU has far fewer compute units than a dGPU. For these reasons, it is necessary to analyze specific algorithms carefully and to find an efficient mapping of them onto the mobile SoC.

OpenGL ES and OpenCL both support mobile SoCs. OpenGL ES is an embedded version of OpenGL, a standard graphics API that provides a graphics rendering pipeline and can also serve as a GPGPU tool. OpenCL is an open parallel computing framework that can be used on heterogeneous platforms including CPUs, GPUs, and even DSPs.

Figure 1. Mobile GPGPU Image Processing Framework.

Because memory is shared on a mobile SoC, OpenGL ES and OpenCL can both access the same data in memory without any copying, so processing takes place on the same buffer instead of requiring separate allocations. The mobile GPU acts both as the main rendering device for OpenGL ES and as the main compute device for OpenCL, and these two roles time-share the GPU. When a video stream provider such as the device camera supplies frame data as an OpenGL ES (GLES) texture in global memory, the GPU can use it in the cl_mem format for an OpenCL compute kernel. After the compute kernel has executed, the result data are stored as a GLES texture that the GPU can render on the display.

3. RELATED WORKS
Many works have accelerated Viola-Jones face detection on a dGPU, rather than on a mobile GPU, using the Compute Unified Device Architecture (CUDA) [10] or OpenCL. Sharma et al. [16] presented a face detection and tracking algorithm based on Haar-like features on the GTX285 and achieved more than 20 times higher processing performance on VGA images. Oro et al. [12,13] also proposed a Haar-like feature based face detection algorithm for HD video on the GTX470 and achieved a 2.5 times speedup. However, they used CUDA, a GPGPU programming tool available only for NVIDIA GPUs, whereas OpenCL can target several kinds of compute devices. Moreover, these implementations do not deal with the imbalanced workload problem encountered when implementing the Viola-Jones face detection algorithm on GPUs. Several studies have attempted to address this imbalanced computation problem [2,3,6,11].
Hefenbrock et al. [2] presented a multi-GPU solution that evaluates each detection window in a different thread and processes each image scale in parallel on a different GPU. Obukhov [11] proposed another solution consisting of a stage-parallel and a pixel-parallel implementation. Jia et al. [3] resolved the irregular workload problem on the GPU by using an Uberkernel and persistent threads. However, these studies do not utilize CPU resources because most computations are executed on the GPU. Although Li et al. [6] made use of the computational capability of both the CPU and GPU cores, their algorithm is optimized only for the Intel Sandy Bridge chipset. To make full use of the computing power of both the CPU and GPU on a mobile SoC, this paper presents a solution to the imbalanced computation problem of Viola-Jones face detection on a mobile device using OpenCL.

4. VIOLA-JONES ALGORITHM
The Viola-Jones object detection framework was proposed by Paul Viola and Michael Jones for face detection. Its cascade classifier is a particular case of ensemble learning that is fast enough for real-time processing. The classifier is trained with AdaBoost, a variant of boosting, which weights the Haar-like features so as to select those most suitable for face detection. We discuss only the detection process, because the training process does not affect the speed of face detection.

4.1 Haar-like features
Haar-like features are used in the Viola-Jones algorithm to decide whether an image region contains a face. They make it easy to capture the edges, lines, and salient regions of a face. As shown in Figure 5, Haar-like features consist of rectangular areas, and each feature value is calculated as the difference between the intensity sums of the white areas and the black areas.

4.2 Integral image
As mentioned above, calculating Haar-like features is very time-consuming because they must be evaluated for every sliding window.
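To make Sections 4.1 and 4.2 concrete, the Python sketch below builds a zero-padded integral (summed-area) image and evaluates a two-rectangle Haar-like feature as white-area sum minus black-area sum. Summing a w-by-h rectangle directly costs w*h additions per window, whereas the integral image needs only four lookups regardless of the rectangle size. The function names, the zero-padded layout, and the particular vertical-edge feature are illustrative assumptions, not the authors' OpenCL Integral or Cascade kernels.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Zero-padded integral image: ii[y][x] = sum of img[r][c] for all r < y, c < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), from four corner lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def naive_rect_sum(img: Image, x: int, y: int, w: int, h: int) -> int:
    """Reference O(w*h) sum: the per-window cost without an integral image."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def two_rect_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Illustrative vertical-edge feature: left half white, right half black.
    Feature value = (white-area sum) - (black-area sum)."""
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return white - black


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    ii = integral_image(img)
    print(rect_sum(ii, 1, 1, 2, 2), two_rect_feature(ii, 0, 0, 4, 3))  # prints: 34 -12
```

Because the integral image is computed once per (scaled) frame, every detection window can then evaluate each of its features in constant time, which is what makes exhaustive window scanning affordable.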
The integral image removes this cost: once it has been computed, the sum of the intensity values within any rectangular area can be obtained from the values at only four points, the corners of the rectangle.

4.3 Cascade classifier
A cascade is a strong classifier structure consisting of a sequence of stages, each built from a number of weak classifiers. The early stages have a simple structure because they contain only a few features; as the stages progress, they become more complex, making it harder for a sub-window to proceed to the next stage. As shown in Figure 2, if a sub-window cannot pass the initial classifier, it is immediately declared a non-face and does not proceed to the next stage. In contrast, if a sub-window successfully passes every stage up to the last one, it is classified as a face. The advantage is that a sub-window can be rejected at any stage, so most non-face sub-windows are discarded at little cost, which saves a large amount of processing time.

Figure 2. Cascade classifier.

4.4 Scaling & exhaustive sliding window
Detection is carried out in each sliding window, called a detection window, which scans the whole image as shown in Figure 5. After all the sliding windows in an image have been evaluated, the same process is repeated on rescaled images to detect faces of different sizes.

5. PROPOSED METHOD
In this section, the parallel implementation of the face detection algorithm is presented first, followed by the optimization techniques.

5.1 Implementation
Reducing search area. The proposed method adopts skin color filtering, which reduces the detection region to accelerate the face detection algorithm. Skin color filtering is robust to rotation, scale, and occlusion of the face. In particular, we use an effective pixel-based skin detection method that enables real-time processing [4]. Examples are shown in Figure 3. The skin-colored image is obtained from a color image with channels (R, G, B) by applying the color threshold in (1):

$$
R > 95 \;\wedge\; G > 40 \;\wedge\; B > 20 \;\wedge\; \max\{R,G,B\} - \min\{R,G,B\} > 15 \;\wedge\; |R - G| > 15 \;\wedge\; R > G \;\wedge\; R > B \tag{1}
$$

Figure 3. (a) Skin color filtering. (b) Detected image.

Because the method is based on a fixed color threshold, non-skin pixels whose values are similar to skin are also considered skin candidates. Nevertheless, skin color filtering is still an effective way to reduce the overall processing. Conversely, even in genuine skin-colored areas, some pixel values do not satisfy the threshold, resulting in holes that degrade detection performance. To solve this problem, this paper applies dilation, which fills in the holes in the skin-colored areas.

Design for parallelism
CPU-GPU task-level parallelism. Figure 4 shows a flow diagram of the proposed face detection algorithm based on CPU-GPU co-processing. OpenCL GPU kernels are executed in the right box, and serial CPU computations are carried out in the left box. The 2-dimensional image memory object, converted from texture data by the OpenGL ES pipeline, is delivered to the OpenCL compute units. In the first step, image scaling and skin color filtering, which screens for skin-colored pixels, are carried out. After the skin-colored mask is dilated in a GPU kernel, the CPU, acting as the host, reads the mask from the GPU. It should be noted that when the CPU reads an image object from the GPU, the data transfer overhead between the CPU and GPU is negligible because of the shared memory system of a mobile SoC. The collection of skin-colored pixel coordinates on the CPU runs concurrently with the Integral kernel on the GPU, which keeps the computing resources of both the CPU and GPU fully used at the same time. Finally, in the Cascade GPU kernel, detection windows are computed for the skin-colored pixels delivered from the CPU.

Figure 4. Flow diagram of the proposed face detection algorithm.

Sliding window parallelism. Data parallelism means that the same task is executed simultaneously on multiple processors across different pieces of distributed data, with no data dependencies constraining the execution order among the processors. As mentioned above, face detection evaluates a cascade classifier to determine whether a face is present in each sliding window. Because the same process is executed independently for millions of detection windows, data parallelism is very efficient here.

Scale image parallelism. The Viola-Jones face detection algorithm achieves scale invariance by processing several scales of the image. A naïve implementation runs the face detection algorithm iteratively over the scaled-down images, so almost every kernel is launched repeatedly. In such a loop, the kernels waste computing resources because of barrier synchronization, and repeatedly launching the same kernel adds kernel launch overhead. To solve this problem, as shown in Figure 5, we merge the scaled-down images into a single image. This eliminates the kernel iterations and thereby reduces the waste of computing resources.

Figure 5. Combined image for the GPU kernel.
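Scale image parallelism can be sketched as follows: the Python code below computes the sizes of a downscaled image pyramid, packs all levels into one combined image, and maps a work-item's pixel in the combined image back to a pyramid level and local coordinates. The paper does not specify how the levels are arranged in Figure 5, so the simple shelf layout here is our assumption, as are the default scale step of 1.1 and the 24x24 window (commonly used OpenCV values, not values taken from this paper). The names `Level`, `pyramid_sizes`, `pack_levels`, and `locate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Level:
    scale: float  # downscaling factor relative to the input frame
    x: int        # left offset of this level in the combined image
    y: int        # top offset of this level in the combined image
    w: int
    h: int


def pyramid_sizes(width: int, height: int, factor: float = 1.1,
                  min_size: int = 24) -> List[Tuple[float, int, int]]:
    """(scale, w, h) of every downscaled image that can still hold a min_size window."""
    if factor <= 1.0:
        raise ValueError("factor must be > 1")
    sizes, scale = [], 1.0
    while True:
        w, h = int(width / scale), int(height / scale)
        if w < min_size or h < min_size:
            return sizes
        sizes.append((scale, w, h))
        scale *= factor


def pack_levels(sizes: List[Tuple[float, int, int]],
                atlas_width: int) -> Tuple[List[Level], int]:
    """Shelf packing: place levels left to right and open a new row (shelf)
    when the next level does not fit. Returns the levels and the combined height."""
    levels: List[Level] = []
    x = y = shelf_h = 0
    for scale, w, h in sizes:
        if w > atlas_width:
            raise ValueError("level wider than the combined image")
        if x + w > atlas_width:  # start a new shelf below the current one
            y, x, shelf_h = y + shelf_h, 0, 0
        levels.append(Level(scale, x, y, w, h))
        x += w
        shelf_h = max(shelf_h, h)
    return levels, y + shelf_h


def locate(levels: List[Level], px: int, py: int,
           window: int = 24) -> Optional[Tuple[int, int, int]]:
    """Map a pixel of the combined image to (level index, local x, local y).
    Returns None for padding pixels or if the window would cross the level border."""
    for i, lv in enumerate(levels):
        if lv.x <= px < lv.x + lv.w and lv.y <= py < lv.y + lv.h:
            lx, ly = px - lv.x, py - lv.y
            if lx + window <= lv.w and ly + window <= lv.h:
                return i, lx, ly
            return None
    return None


if __name__ == "__main__":
    levels, height = pack_levels(pyramid_sizes(1280, 720), atlas_width=1280)
    print(len(levels), height)
```

With such a layout, a single kernel launch over the combined image can replace the per-scale loop, and a work-item whose pixel maps to `None` would simply return. The shelf layout leaves some padding to the right of narrower levels; that waste is the price of keeping every level in one 2-D image object.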
When all the scaled-down images are combined into a single image, a 2-dimensional image memory object has advantages over the 1-dimensional global memory buffer commonly used in OpenCL: it provides faster data access and makes boundary conditions easier to handle.

5.2 Optimization
Dynamic allocation of work-items. Because a GPU uses the SIMT (Single Instruction, Multiple Thread) programming model, work is scheduled and executed on the GPU in units of work-groups. The global work size is the total number of work-items (threads) on the GPU and is normally set to the size of the image, so that each pixel is computed by one work-item. The local work size is the number of work-items in a work-group. As mentioned in Section 4, in the serial CPU version of the Viola-Jones algorithm, faces are detected by sliding a detection window over the image in the cascade stage. In the Cascade GPU kernel, however, each work-item evaluates its own detection window in parallel, so the window does not need to slide. Most non-face windows are rejected at stage 1 or 2, where simple classifiers reject the majority of candidates. As shown in Figure 6(a), work-items rejected early must wait until all work-items in the same work-group finish their detection window computation, because the GPU executes a whole work-group as a unit; this leaves work-items idle. If only one work-item continues to the final stage, all the other work-items in its work-group sit idle. This severe computational imbalance leads to poor performance.

Figure 6. Reduction of idle work-items in a GPU. (a) Original NDRange. (b) Optimized NDRange.

To address the imbalanced workload problem, this study presents a new approach that dynamically sets the global work size according to the number of skin-colored pixels. In other words, work-items compute detection windows only at skin-colored pixels, which are less likely to be rejected early, while non-skin pixels are not computed at all. This prevents idle work-items and takes full advantage of the GPU resources. Figure 6(b) shows that when the global work size is allocated according to the number of skin-colored pixels, there are few idle work-items in the GPU.

Local memory optimization. Similar to a dGPU, a mobile GPU has performance bottlenecks caused by global memory access.
A mobile GPU suffers from longer off-chip global memory access latency than a dGPU, so memory optimization is essential for parallel image processing on a mobile GPU. Local memory, through which work-items in the same work-group can share data, has lower latency than global memory. Loading shared data into local memory can therefore reduce global memory accesses and improve performance. However, the more local memory a kernel requires, the fewer work-items are available to execute it. It is therefore important to analyze whether the data are suitable for sharing among the work-items of a work-group and to find the optimal amount of data to load. When each work-item computes a detection window in the cascade kernel, all work-items use the same pre-trained classifier data. This study therefore searched for the optimal amount of classifier data and loaded part of it into local memory. We loaded 3 features of the stage-1 classifier data, because stage 1 is shared by most work-items; at higher stages, more work-items have already returned, so fewer work-items share the classifier data. Using local memory reduced the execution time by 12% on average.

6. EXPERIMENTAL RESULTS
6.1 Experiment set-up
For the experiment, we chose as a test platform the Galaxy S5-LTEA smartphone, which is driven by a Qualcomm application processor. Qualcomm is a market leader in smartphone application processors with its Snapdragon series. The Galaxy S5-LTEA is powered by a Snapdragon 808 SoC with a 2.45 GHz quad-core Krait 400 CPU and a 578 MHz quad-core Adreno 330 GPU. The Adreno 330 GPU supports advanced graphics and compute APIs, including OpenGL ES 3.0 and OpenCL 1.2. The mobile operating system was Android 5.0. The OpenCV library was used to implement face detection for the CPU version.
In the performance evaluation, we used two different datasets. The first is the Images of Groups dataset [1], which contains color frontal face images and group images composed of a number of people, covering varied illumination conditions, races, and face sizes. We collected 60 images containing 622 faces from this dataset. The test images were resized to HD (720p) while maintaining their aspect ratio, to fit the output size of the mobile display. The second dataset is INHA FACE, whose images are at HD resolution and show people at different distances (1 m, 3 m, and 5 m). We used this dataset to evaluate the relationship between the processing time and the number of skin-colored pixels.

Figure 7. Execution time in each kernel (versions: CPU only, GPU only, CPU-GPU; kernels: Scaling & Skin Color Filtering, Dilation, Integral, Cascade).

6.2 Accuracy & execution time
We used the cascade classifier from the OpenCV library, which is well known in the computer vision field. We set the same configuration parameters and compared the well-optimized OpenCV CPU implementation, which is widely used and considered accurate, with our CPU-GPU implementation. Both versions produced the same detection results, which means that our CPU-GPU acceleration incurs no accuracy penalty.

In the first experiment, we measured the processing time within each kernel and compared the proposed CPU-GPU version with the CPU-only and GPU-only versions. As shown in Figure 7, the cascade kernel accounted for most of the time because of the computational complexity of the detection window. The proposed CPU-GPU version spent less time in the cascade kernel than the other versions because it reduces the idleness of the work-items. Overall, the CPU-GPU version had the lowest time cost and ran 3.22 times faster than the CPU-only version.

Figure 8. Average execution time according to distance.

In the second experiment, we measured the execution time as a function of the number of skin-colored pixels. As shown in Figure 10, the shorter the distance between the camera and the people, the more skin-colored pixels are found, which requires more computation; as the distance to the camera grows, fewer skin-colored pixels are detected. Our experiments considered distances of 1, 3, and 5 m. At 1 m, the images contained the largest number of skin-colored pixels, so the processing time was the longest.
As the distance increased, the number of skin-colored pixels decreased and the processing time dropped accordingly; the processing time at 5 m was the shortest because those images contained the fewest skin-colored pixels. At distances of 1 m, 3 m, and 5 m, the processing time of the proposed CPU-GPU method was ms, 35.16 ms, and ms, respectively, the best among the compared implementations.

Figure 9. Results from the Images of Groups dataset.
Figure 10. Results from the INHA_FACE dataset.

Table 1. Comparison of the performance of different methods on the Images of Groups dataset
| Method   | Execution time (ms) | fps | Speedup (x) |
| CPU only |                     |     |             |
| GPU only |                     |     |             |
| CPU-GPU  |                     |     |             |

Table 2. Comparison of the performance of different methods on the INHA FACE dataset
| Method   | Execution time (ms) | fps | Speedup (x) |
| CPU only |                     |     |             |
| GPU only |                     |     |             |
| CPU-GPU  |                     |     |             |

Tables 1 and 2 compare the performance of each method on the Images of Groups and INHA FACE datasets, respectively. The proposed method achieves speedups of 3.3 times and 6.29 times over the CPU-only method on the Images of Groups and INHA FACE datasets, respectively. Additionally, real-time processing was achieved on the INHA FACE dataset.

7. CONCLUSIONS
This paper presents an optimized parallel implementation of the Viola-Jones face detection algorithm as a case study in mapping a computer vision application onto a mobile SoC using CPU-GPU co-processing. To exploit the computational power of both the CPU and GPU, we applied several parallelization and optimization methods: CPU-GPU task parallelism, sliding window parallelism, scale image parallelism, dynamic allocation of work-items, and local memory optimization. These methods resolved the imbalanced workload problem and reduced the processing time on a mobile SoC, yielding much better performance than the well-optimized CPU implementation in the OpenCV library. In future work, we plan to measure power consumption and to port this algorithm to other mobile devices to validate and further optimize our work.

8. ACKNOWLEDGEMENTS
This work was supported by the Industrial Strategic Technology Development Program ( , The Development of Fusion Processor based on Multi-Shader GPU) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).

9. REFERENCES
[1] Gallagher, A.C. and Chen, T. Understanding images of groups of people. Computer Vision and Pattern Recognition (CVPR) (2009).
[2] Hefenbrock, D., Oberg, J., Thanh, N.T.N., Kastner, R. and Baden, S.B. Accelerating Viola-Jones face detection to FPGA-level using GPUs. Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM) (2010).
[3] Jia, H., Zhang, Y., Wang, W. and Xu, J. Accelerating Viola-Jones face detection algorithm on GPUs. IEEE 14th International Conference on High Performance Computing and Communication & IEEE 9th International Conference on Embedded Software and Systems (2012).
[4] Kakumanu, P., Makrogiannis, S. and Bourbakis, N. A survey of skin-color modeling and detection methods. Pattern Recognition 40, 3 (2007).
[5] Kang, S.H., Lee, S. and Park, I.K. Parallelization and optimization of feature detection algorithms on embedded GPU. (2014).
[6] Li, E., Wang, B., Yang, L., Peng, Y., Du, Y., Zhang, Y. and Chiu, Y.-J. GPU and CPU cooperative acceleration for face detection on modern processors. IEEE International Conference on Multimedia and Expo (ICME) (2012).
[7] Liu, X., Lou, Y., Yu, A. and Lang, B. Search by mobile image based on visual and spatial consistency. Multimedia and Expo (ICME) (2011), 1-6.
[8] Munshi, A. and Leech, J. OpenGL ES common profile specification version (full specification). Khronos Group.
[9] Munshi, A. OpenCL specification 1.1. Khronos OpenCL Working Group.
[10] NVIDIA. CUDA Runtime API, March.
[11] Obukhov, A. Haar classifiers for object detection with CUDA. GPU Computing Gems Emerald Edition.
[12] Oro, D., Fernández, C., Segura, C., Martorell, X. and Hernando, J. Accelerating boosting-based face detection on GPUs. International Conference on Parallel Processing (2012).
[13] Oro, D., Fernández, C., Saeta, J.R., Martorell, X. and Hernando, J. Real-time GPU-based face detection in HD video sequences. Proceedings of the IEEE International Conference on Computer Vision (2011).
[14] Pulli, K., Baksheev, A., Kornyakov, K. and Eruhimov, V. Real-time computer vision with OpenCV. Communications of the ACM 55, 6 (2012), 61.
[15] Rahman, M., Ren, J. and Kehtarnavaz, N. Real-time implementation of robust face detection on mobile platforms. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2009).
[16] Sharma, B., Thota, R., Vydyanathan, N. and Kale, A. Towards a robust, real-time face processing system using CUDA-enabled GPUs. International Conference on High Performance Computing (HiPC) (2009).
[17] Viola, P. and Jones, M. Rapid object detection using a boosted cascade of simple features. Computer Vision and Pattern Recognition (CVPR) 1, I-511-I-518.
[18] Viola, P. and Jones, M. Robust real-time face detection. International Journal of Computer Vision 57, 2.
[19] Wagner, D. and Schmalstieg, D. History and future of tracking for mobile phone augmented reality. IEEE International Symposium on Ubiquitous Virtual Reality, 7-10.
[20] Wang, G., Rister, B. and Cavallaro, J.R. Workload analysis and efficient OpenCL-based implementation of SIFT algorithm on a smartphone. IEEE Global Conference on Signal and Information Processing (December 2013).
[21] Wang, G., Xiong, Y., Yun, J. and Cavallaro, J.R. Accelerating computer vision algorithms using OpenCL framework on the mobile GPU: A case study. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2013).


More information

Neural Network Implementation using CUDA and OpenMP

Neural Network Implementation using CUDA and OpenMP Neural Network Implementation using CUDA and OpenMP Honghoon Jang, Anjin Park, Keechul Jung Department of Digital Media, College of Information Science, Soongsil University {rollco82,anjin,kcjung}@ssu.ac.kr

More information

Detection of a Single Hand Shape in the Foreground of Still Images

Detection of a Single Hand Shape in the Foreground of Still Images CS229 Project Final Report Detection of a Single Hand Shape in the Foreground of Still Images Toan Tran (dtoan@stanford.edu) 1. Introduction This paper is about an image detection system that can detect

More information

REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS

REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS BeBeC-2014-08 REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS Steffen Schmidt GFaI ev Volmerstraße 3, 12489, Berlin, Germany ABSTRACT Beamforming algorithms make high demands on the

More information

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Yijie Huangfu and Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University {huangfuy2,wzhang4}@vcu.edu

More information

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes. HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation

More information

Vehicle Detection Method using Haar-like Feature on Real Time System

Vehicle Detection Method using Haar-like Feature on Real Time System Vehicle Detection Method using Haar-like Feature on Real Time System Sungji Han, Youngjoon Han and Hernsoo Hahn Abstract This paper presents a robust vehicle detection approach using Haar-like feature.

More information

Fast Face Detection Assisted with Skin Color Detection

Fast Face Detection Assisted with Skin Color Detection IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 4, Ver. II (Jul.-Aug. 2016), PP 70-76 www.iosrjournals.org Fast Face Detection Assisted with Skin Color

More information

CPU-GPU hybrid computing for feature extraction from video stream

CPU-GPU hybrid computing for feature extraction from video stream LETTER IEICE Electronics Express, Vol.11, No.22, 1 8 CPU-GPU hybrid computing for feature extraction from video stream Sungju Lee 1, Heegon Kim 1, Daihee Park 1, Yongwha Chung 1a), and Taikyeong Jeong

More information

Storage Architecture and Software Support for SLC/MLC Combined Flash Memory

Storage Architecture and Software Support for SLC/MLC Combined Flash Memory Storage Architecture and Software Support for SLC/MLC Combined Flash Memory Soojun Im and Dongkun Shin Sungkyunkwan University Suwon, Korea {lang33, dongkun}@skku.edu ABSTRACT We propose a novel flash

More information

Utilizing Graphics Processing Units for Rapid Facial Recognition using Video Input

Utilizing Graphics Processing Units for Rapid Facial Recognition using Video Input Utilizing Graphics Processing Units for Rapid Facial Recognition using Video Input Charles Gala, Dr. Raj Acharya Department of Computer Science and Engineering Pennsylvania State University State College,

More information

ASYNCHRONOUS SHADERS WHITE PAPER 0

ASYNCHRONOUS SHADERS WHITE PAPER 0 ASYNCHRONOUS SHADERS WHITE PAPER 0 INTRODUCTION GPU technology is constantly evolving to deliver more performance with lower cost and lower power consumption. Transistor scaling and Moore s Law have helped

More information

Parallel Processing of Multimedia Data in a Heterogeneous Computing Environment

Parallel Processing of Multimedia Data in a Heterogeneous Computing Environment Parallel Processing of Multimedia Data in a Heterogeneous Computing Environment Heegon Kim, Sungju Lee, Yongwha Chung, Daihee Park, and Taewoong Jeon Dept. of Computer and Information Science, Korea University,

More information

Advanced Imaging Applications on Smart-phones Convergence of General-purpose computing, Graphics acceleration, and Sensors

Advanced Imaging Applications on Smart-phones Convergence of General-purpose computing, Graphics acceleration, and Sensors Advanced Imaging Applications on Smart-phones Convergence of General-purpose computing, Graphics acceleration, and Sensors Sriram Sethuraman Technologist & DMTS, Ittiam 1 Overview Imaging on Smart-phones

More information

MediaTek Video Face Beautify

MediaTek Video Face Beautify MediaTek Video Face Beautify November 2014 2014 MediaTek Inc. Table of Contents 1 Introduction... 3 2 The MediaTek Solution... 4 3 Overview of Video Face Beautify... 4 4 Face Detection... 6 5 Skin Detection...

More information

Real-time Background Subtraction Based on GPGPU for High-Resolution Video Surveillance

Real-time Background Subtraction Based on GPGPU for High-Resolution Video Surveillance Real-time Background Subtraction Based on GPGPU for High-Resolution Video Surveillance Sunhee Hwang sunny16@yonsei.ac.kr Youngjung Uh youngjung.uh@yonsei.ac.kr Minsong Ki kms2014@yonsei.ac.kr Kwangyong

More information

Face tracking. (In the context of Saya, the android secretary) Anton Podolsky and Valery Frolov

Face tracking. (In the context of Saya, the android secretary) Anton Podolsky and Valery Frolov Face tracking (In the context of Saya, the android secretary) Anton Podolsky and Valery Frolov Introduction Given the rather ambitious task of developing a robust face tracking algorithm which could be

More information

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Detection Recognition Sally History Early face recognition systems: based on features and distances

More information

Fast Natural Feature Tracking for Mobile Augmented Reality Applications

Fast Natural Feature Tracking for Mobile Augmented Reality Applications Fast Natural Feature Tracking for Mobile Augmented Reality Applications Jong-Seung Park 1, Byeong-Jo Bae 2, and Ramesh Jain 3 1 Dept. of Computer Science & Eng., University of Incheon, Korea 2 Hyundai

More information

GPU Programming Using NVIDIA CUDA

GPU Programming Using NVIDIA CUDA GPU Programming Using NVIDIA CUDA Siddhante Nangla 1, Professor Chetna Achar 2 1, 2 MET s Institute of Computer Science, Bandra Mumbai University Abstract: GPGPU or General-Purpose Computing on Graphics

More information

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 588/688. Graphics Processors Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly

More information

Classifier Case Study: Viola-Jones Face Detector

Classifier Case Study: Viola-Jones Face Detector Classifier Case Study: Viola-Jones Face Detector P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001. P. Viola and M. Jones. Robust real-time face detection.

More information

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation M. Blauth, E. Kraft, F. Hirschenberger, M. Böhm Fraunhofer Institute for Industrial Mathematics, Fraunhofer-Platz 1,

More information

Design of a Dynamic Data-Driven System for Multispectral Video Processing

Design of a Dynamic Data-Driven System for Multispectral Video Processing Design of a Dynamic Data-Driven System for Multispectral Video Processing Shuvra S. Bhattacharyya University of Maryland at College Park ssb@umd.edu With contributions from H. Li, K. Sudusinghe, Y. Liu,

More information

Adaptive Feature Extraction with Haar-like Features for Visual Tracking

Adaptive Feature Extraction with Haar-like Features for Visual Tracking Adaptive Feature Extraction with Haar-like Features for Visual Tracking Seunghoon Park Adviser : Bohyung Han Pohang University of Science and Technology Department of Computer Science and Engineering pclove1@postech.ac.kr

More information

Copyright Khronos Group Page 1. Vulkan Overview. June 2015

Copyright Khronos Group Page 1. Vulkan Overview. June 2015 Copyright Khronos Group 2015 - Page 1 Vulkan Overview June 2015 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon Open Consortium creating OPEN STANDARD APIs for hardware acceleration

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization Duksu Kim Assistant professor, KORATEHC Education Ph.D. Computer Science, KAIST Parallel Proximity Computation on Heterogeneous Computing Systems for Graphics Applications Professional Experience Senior

More information

Mouse Pointer Tracking with Eyes

Mouse Pointer Tracking with Eyes Mouse Pointer Tracking with Eyes H. Mhamdi, N. Hamrouni, A. Temimi, and M. Bouhlel Abstract In this article, we expose our research work in Human-machine Interaction. The research consists in manipulating

More information

Introduction. How? Rapid Object Detection using a Boosted Cascade of Simple Features. Features. By Paul Viola & Michael Jones

Introduction. How? Rapid Object Detection using a Boosted Cascade of Simple Features. Features. By Paul Viola & Michael Jones Rapid Object Detection using a Boosted Cascade of Simple Features By Paul Viola & Michael Jones Introduction The Problem we solve face/object detection What's new: Fast! 384X288 pixel images can be processed

More information

GPGPU on Mobile Devices

GPGPU on Mobile Devices GPGPU on Mobile Devices Introduction Addressing GPGPU for very mobile devices Tablets Smartphones Introduction Why dedicated GPUs in mobile devices? Gaming Physics simulation for realistic effects 3D-GUI

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied

More information

high performance medical reconstruction using stream programming paradigms

high performance medical reconstruction using stream programming paradigms high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming

More information

Portable GPU-Based Artificial Neural Networks For Data-Driven Modeling

Portable GPU-Based Artificial Neural Networks For Data-Driven Modeling City University of New York (CUNY) CUNY Academic Works International Conference on Hydroinformatics 8-1-2014 Portable GPU-Based Artificial Neural Networks For Data-Driven Modeling Zheng Yi Wu Follow this

More information

Image Processing Pipeline for Facial Expression Recognition under Variable Lighting

Image Processing Pipeline for Facial Expression Recognition under Variable Lighting Image Processing Pipeline for Facial Expression Recognition under Variable Lighting Ralph Ma, Amr Mohamed ralphma@stanford.edu, amr1@stanford.edu Abstract Much research has been done in the field of automated

More information

Face Recognition for Mobile Devices

Face Recognition for Mobile Devices Face Recognition for Mobile Devices Aditya Pabbaraju (adisrinu@umich.edu), Srujankumar Puchakayala (psrujan@umich.edu) INTRODUCTION Face recognition is an application used for identifying a person from

More information

Computing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany

Computing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany Computing on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Summary: The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been

More information

Learning to Detect Faces. A Large-Scale Application of Machine Learning

Learning to Detect Faces. A Large-Scale Application of Machine Learning Learning to Detect Faces A Large-Scale Application of Machine Learning (This material is not in the text: for further information see the paper by P. Viola and M. Jones, International Journal of Computer

More information

Occlusion Detection of Real Objects using Contour Based Stereo Matching

Occlusion Detection of Real Objects using Contour Based Stereo Matching Occlusion Detection of Real Objects using Contour Based Stereo Matching Kenichi Hayashi, Hirokazu Kato, Shogo Nishida Graduate School of Engineering Science, Osaka University,1-3 Machikaneyama-cho, Toyonaka,

More information

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games Viewdle Inc. 1 Use cases Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games 2 Why OpenCL matter? OpenCL is going to bring such

More information

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of

More information

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1 Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Ecosystem @neilt3d Copyright Khronos Group 2015 - Page 1 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon

More information

PowerVR Hardware. Architecture Overview for Developers

PowerVR Hardware. Architecture Overview for Developers Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

Using Graphics Chips for General Purpose Computation

Using Graphics Chips for General Purpose Computation White Paper Using Graphics Chips for General Purpose Computation Document Version 0.1 May 12, 2010 442 Northlake Blvd. Altamonte Springs, FL 32701 (407) 262-7100 TABLE OF CONTENTS 1. INTRODUCTION....1

More information

Mixing Graphics and Compute for Real-Time Multiview Human Body Tracking

Mixing Graphics and Compute for Real-Time Multiview Human Body Tracking Mixing Graphics and Compute for Real-Time Multiview Human Body Tracking Boguslaw Rymut 2 and Bogdan Kwolek 1 1 AGH University of Science and Technology 30 Mickiewicza Av., 30-059 Krakow, Poland bkw@agh.edu.pl

More information

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university Graphics Architectures and OpenCL Michael Doggett Department of Computer Science Lund university Overview Parallelism Radeon 5870 Tiled Graphics Architectures Important when Memory and Bandwidth limited

More information

B. Tech. Project Second Stage Report on

B. Tech. Project Second Stage Report on B. Tech. Project Second Stage Report on GPU Based Active Contours Submitted by Sumit Shekhar (05007028) Under the guidance of Prof Subhasis Chaudhuri Table of Contents 1. Introduction... 1 1.1 Graphic

More information

Window based detectors

Window based detectors Window based detectors CS 554 Computer Vision Pinar Duygulu Bilkent University (Source: James Hays, Brown) Today Window-based generic object detection basic pipeline boosting classifiers face detection

More information

Performance impact of dynamic parallelism on different clustering algorithms

Performance impact of dynamic parallelism on different clustering algorithms Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu

More information

Face Detection on OpenCV using Raspberry Pi

Face Detection on OpenCV using Raspberry Pi Face Detection on OpenCV using Raspberry Pi Narayan V. Naik Aadhrasa Venunadan Kumara K R Department of ECE Department of ECE Department of ECE GSIT, Karwar, Karnataka GSIT, Karwar, Karnataka GSIT, Karwar,

More information

Improved Integral Histogram Algorithm. for Big Sized Images in CUDA Environment

Improved Integral Histogram Algorithm. for Big Sized Images in CUDA Environment Contemporary Engineering Sciences, Vol. 7, 2014, no. 24, 1415-1423 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.49174 Improved Integral Histogram Algorithm for Big Sized Images in CUDA

More information

GPU-based pedestrian detection for autonomous driving

GPU-based pedestrian detection for autonomous driving Procedia Computer Science Volume 80, 2016, Pages 2377 2381 ICCS 2016. The International Conference on Computational Science GPU-based pedestrian detection for autonomous driving V. Campmany 1,2, S. Silva

More information

GPGPU on ARM. Tom Gall, Gil Pitney, 30 th Oct 2013

GPGPU on ARM. Tom Gall, Gil Pitney, 30 th Oct 2013 GPGPU on ARM Tom Gall, Gil Pitney, 30 th Oct 2013 Session Description This session will discuss the current state of the art of GPGPU technologies on ARM SoC systems. What standards are there? Where are

More information

An Acceleration Scheme to The Local Directional Pattern

An Acceleration Scheme to The Local Directional Pattern An Acceleration Scheme to The Local Directional Pattern Y.M. Ayami Durban University of Technology Department of Information Technology, Ritson Campus, Durban, South Africa ayamlearning@gmail.com A. Shabat

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References This set of slides is mainly based on: CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory Slide of Applied

More information

Efficient and Fast Multi-View Face Detection Based on Feature Transformation

Efficient and Fast Multi-View Face Detection Based on Feature Transformation Efficient and Fast Multi-View Face Detection Based on Feature Transformation Dongyoon Han*, Jiwhan Kim*, Jeongwoo Ju*, Injae Lee**, Jihun Cha**, Junmo Kim* *Department of EECS, Korea Advanced Institute

More information

Skin and Face Detection

Skin and Face Detection Skin and Face Detection Linda Shapiro EE/CSE 576 1 What s Coming 1. Review of Bakic flesh detector 2. Fleck and Forsyth flesh detector 3. Details of Rowley face detector 4. Review of the basic AdaBoost

More information

ROBUST REAL TIME FACE RECOGNITION AND TRACKING ON GPU USING FUSION OF RGB AND DEPTH IMAGE.

ROBUST REAL TIME FACE RECOGNITION AND TRACKING ON GPU USING FUSION OF RGB AND DEPTH IMAGE. ROBUST REAL TIME FACE RECOGNITION AND TRACKING ON GPU USING FUSION OF RGB AND DEPTH IMAGE. Narmada Naik 1 and Dr.G.N Rathna 2 1 Department of Electrical Engineering, Indian Institute of science, Bangalore,

More information

Embedded Face Detection Application based on Local Binary Patterns

Embedded Face Detection Application based on Local Binary Patterns Embedded Face Detection Application based on Local Binary Patterns Laurentiu Acasandrei Instituto de Microelectrónica de Sevilla IMSE-CNM-CSIC Sevilla, Spain laurentiu@imse-cnm.csic.es Angel Barriga Instituto

More information

Criminal Identification System Using Face Detection and Recognition

Criminal Identification System Using Face Detection and Recognition Criminal Identification System Using Face Detection and Recognition Piyush Kakkar 1, Mr. Vibhor Sharma 2 Information Technology Department, Maharaja Agrasen Institute of Technology, Delhi 1 Assistant Professor,

More information

Face detection and recognition. Detection Recognition Sally

Face detection and recognition. Detection Recognition Sally Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

Face Detection and Alignment. Prof. Xin Yang HUST

Face Detection and Alignment. Prof. Xin Yang HUST Face Detection and Alignment Prof. Xin Yang HUST Many slides adapted from P. Viola Face detection Face detection Basic idea: slide a window across image and evaluate a face model at every location Challenges

More information

ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors

ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors Weifeng Liu and Brian Vinter Niels Bohr Institute University of Copenhagen Denmark {weifeng, vinter}@nbi.dk March 1, 2014 Weifeng

More information

Deep Learning Based Real-time Object Recognition System with Image Web Crawler

Deep Learning Based Real-time Object Recognition System with Image Web Crawler , pp.103-110 http://dx.doi.org/10.14257/astl.2016.142.19 Deep Learning Based Real-time Object Recognition System with Image Web Crawler Myung-jae Lee 1, Hyeok-june Jeong 1, Young-guk Ha 2 1 Department

More information

Disguised Face Identification Based Gabor Feature and SVM Classifier

Disguised Face Identification Based Gabor Feature and SVM Classifier Disguised Face Identification Based Gabor Feature and SVM Classifier KYEKYUNG KIM, SANGSEUNG KANG, YUN KOO CHUNG and SOOYOUNG CHI Department of Intelligent Cognitive Technology Electronics and Telecommunications

More information

Exploiting scene constraints to improve object detection algorithms for industrial applications

Exploiting scene constraints to improve object detection algorithms for industrial applications Exploiting scene constraints to improve object detection algorithms for industrial applications PhD Public Defense Steven Puttemans Promotor: Toon Goedemé 2 A general introduction Object detection? Help

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

Emerging Vision Technologies: Enabling a New Era of Intelligent Devices

Emerging Vision Technologies: Enabling a New Era of Intelligent Devices Emerging Vision Technologies: Enabling a New Era of Intelligent Devices Computer vision overview Computer vision is being integrated in our daily lives Acquiring, processing, and understanding visual data

More information

Accelerating MapReduce on a Coupled CPU-GPU Architecture

Accelerating MapReduce on a Coupled CPU-GPU Architecture Accelerating MapReduce on a Coupled CPU-GPU Architecture Linchuan Chen Xin Huo Gagan Agrawal Department of Computer Science and Engineering The Ohio State University Columbus, OH 43210 {chenlinc,huox,agrawal}@cse.ohio-state.edu

More information

Mobile Face Recognization

Mobile Face Recognization Mobile Face Recognization CS4670 Final Project Cooper Bills and Jason Yosinski {csb88,jy495}@cornell.edu December 12, 2010 Abstract We created a mobile based system for detecting faces within a picture

More information

Project Report for EE7700

Project Report for EE7700 Project Report for EE7700 Name: Jing Chen, Shaoming Chen Student ID: 89-507-3494, 89-295-9668 Face Tracking 1. Objective of the study Given a video, this semester project aims at implementing algorithms

More information

Parallel Tracking. Henry Spang Ethan Peters

Parallel Tracking. Henry Spang Ethan Peters Parallel Tracking Henry Spang Ethan Peters Contents Introduction HAAR Cascades Viola Jones Descriptors FREAK Descriptor Parallel Tracking GPU Detection Conclusions Questions Introduction Tracking is a

More information

TUNING CUDA APPLICATIONS FOR MAXWELL

TUNING CUDA APPLICATIONS FOR MAXWELL TUNING CUDA APPLICATIONS FOR MAXWELL DA-07173-001_v7.0 March 2015 Application Note TABLE OF CONTENTS Chapter 1. Maxwell Tuning Guide... 1 1.1. NVIDIA Maxwell Compute Architecture... 1 1.2. CUDA Best Practices...2

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

Parallelizing Inline Data Reduction Operations for Primary Storage Systems

Parallelizing Inline Data Reduction Operations for Primary Storage Systems Parallelizing Inline Data Reduction Operations for Primary Storage Systems Jeonghyeon Ma ( ) and Chanik Park Department of Computer Science and Engineering, POSTECH, Pohang, South Korea {doitnow0415,cipark}@postech.ac.kr

More information

Local Difference Binary for Ultrafast and Distinctive Feature Description

Local Difference Binary for Ultrafast and Distinctive Feature Description Local Difference Binary for Ultrafast and Distinctive Feature Description Xin Yang, K.-T. Tim Cheng IEEE Trans. on Pattern Analysis and Machine Intelligence, 2014, January *Source code has been released

More information

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee

More information

Bifrost - The GPU architecture for next five billion

Bifrost - The GPU architecture for next five billion Bifrost - The GPU architecture for next five billion Hessed Choi Senior FAE / ARM ARM Tech Forum June 28 th, 2016 Vulkan 2 ARM 2016 What is Vulkan? A 3D graphics API for the next twenty years Logical successor

More information

An Approach for Real Time Moving Object Extraction based on Edge Region Determination

An Approach for Real Time Moving Object Extraction based on Edge Region Determination An Approach for Real Time Moving Object Extraction based on Edge Region Determination Sabrina Hoque Tuli Department of Computer Science and Engineering, Chittagong University of Engineering and Technology,

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

GPU Based Face Recognition System for Authentication

GPU Based Face Recognition System for Authentication GPU Based Face Recognition System for Authentication Bhumika Agrawal, Chelsi Gupta, Meghna Mandloi, Divya Dwivedi, Jayesh Surana Information Technology, SVITS Gram Baroli, Sanwer road, Indore, MP, India

More information

A robust method for automatic player detection in sport videos

A robust method for automatic player detection in sport videos A robust method for automatic player detection in sport videos A. Lehuger 1 S. Duffner 1 C. Garcia 1 1 Orange Labs 4, rue du clos courtel, 35512 Cesson-Sévigné {antoine.lehuger, stefan.duffner, christophe.garcia}@orange-ftgroup.com

More information

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package High Performance Machine Learning Workshop Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package Matheus Souza, Lucas Maciel, Pedro Penna, Henrique Freitas 24/09/2018 Agenda Introduction

More information