AN HARDWARE ALGORITHM FOR REAL TIME IMAGE IDENTIFICATION

1 BHUVANESH KUMAR HALAN, 2 MANIKANDABABU.C.S
1 ME VLSI DESIGN Student, SRI RAMAKRISHNA ENGINEERING COLLEGE, COIMBATORE, India (Member of IEEE)
2 Assistant Professor, Department of ECE (PG - ME VLSI DESIGN), SRI RAMAKRISHNA ENGINEERING COLLEGE, COIMBATORE, India
1 bhuvihalanjoghee25@gmail.com, 2 manikandababu.srec@gmail.com

ABSTRACT
Real-time object detection is important for surveillance applications. Here we describe a high-performance object detector built on a commercially available FPGA. Major bottlenecks in the Real AdaBoost classifier are resolved. A new FIR-filter-like hardware architecture takes advantage of an FPGA's hardware parallelism and block RAM structure. The resulting design uses a Xilinx Virtex-5 and achieves real-time processing performance of 220 f/s at 201 MHz, with recognition performance that can be adjusted through a variable number of weak classifiers. This is the first demonstration of a histogram of oriented gradients (HOG) and Real AdaBoost detector on an FPGA. The new hardware algorithm for real-time object detection is based on the Real AdaBoost classifier and HOG features. In surveillance applications, multiple network cameras provide multiple video streams simultaneously, and the system must continuously detect various objects, such as pedestrians, vehicles and animals. Both the speed of the detectors and the flexible adaptability of their functions are therefore important.

Keywords: HOG and Real AdaBoost Algorithms, Real AdaBoost Scanning, Summation Mechanism.

1. INTRODUCTION
We propose a new hardware algorithm for real-time object detection based on the Real AdaBoost classifier and histograms of oriented gradients (HOG) features. Real-time object detection is important for various surveillance applications. Multiple network cameras provide multiple video streams simultaneously.
The system must continuously detect various objects, such as pedestrians, vehicles and animals. The speed of detectors and the flexible adaptability of their functions are therefore attracting much attention. An object image detection system is constructed of two parts: a feature extractor, which extracts features from a given image, and a classifier, which classifies features on the basis of their values. Viola and Jones [1] investigated a detector framework that uses Haar-like features and a cascade classifier based on AdaBoost learning. This framework reduces the calculation effort effectively, and there have been many implementation studies [3-7] based on it.

The performance of the classification algorithm strongly depends on the performance of the preceding feature extraction algorithm. The HOG feature algorithm [8] performs well under various illumination conditions. In addition, R. Schapire et al. [9] have improved and generalized the learning and classification algorithm to Real AdaBoost, which can achieve faster convergence of learning and better recognition rates than the conventional AdaBoost algorithm.

We present an implementation of a HOG and Real AdaBoost detector on an FPGA with a maximum performance of over 220 f/s. If camera images can be multiplexed, this frame rate will enable seven streams of 30 f/s VGA video to be recognized on a single FPGA. The proposed architecture is therefore suitable for surveillance applications that have to process images from many distributed cameras simultaneously. In addition, the detector is easily adapted to different search targets by reconfiguring the FPGA.

2. HOG AND REAL ADABOOST ALGORITHMS
The detection algorithm is based on the theory of a Real AdaBoost classifier using a HOG feature extractor. The overall algorithm is described below. In the cascade framework [1], each stage is trained using AdaBoost until the required detection performance is achieved.
Viola and Jones [1] present a training algorithm designed specifically for a classifier cascade, called asymmetric AdaBoost. The algorithm is a generalization of that given by Schapire and Singer [9], and many of the formal guarantees presented by Schapire and Singer also hold for it. That paper concludes with a set of experiments in the domain of face detection demonstrating that asymmetric AdaBoost yields a significant improvement in detection performance over conventional boosting.

// HOG feature extraction
for (all pixels) { calculate intensity gradient }
for (all cells)  { construct histogram of gradients }
for (all blocks) { concatenate and normalize all cell histograms in the block }

// Real AdaBoost classification: sub-window scanning loop
for (all sub-windows) {
    set and pick up new sub-window
    Real AdaBoost classification
}

Fig. 1. HOG AND REAL ADABOOST ALGORITHM

Hardware design techniques are used to accelerate the processing speed of face detection. The face detection system generates an integral image window so that a feature classification can be performed in one clock cycle. It then performs classification operations in parallel, using classifiers to detect a face in the image sequence. The main contribution of the work described in this paper is the design and implementation of a physically feasible hardware system that accelerates the operations required for real-time face detection. This work has resulted in a real-time face detection system implemented on an FPGA and designed in Verilog HDL. Its performance has been measured and compared with an equivalent software implementation.

3. PROPOSED HOG FEATURE EXTRACTOR AND ADABOOST SCANNING
HOG features consist of accumulated histograms sorted by the orientation of the intensity gradient. The first step is the calculation of the intensity gradient, where $I(x,y)$ is the intensity of the pixel at $(x,y)$. The upper-left corner of a given image is the origin; the x-axis runs left to right and the y-axis runs top to bottom. With $d_x(x,y)$ and $d_y(x,y)$ the gradient components along x and y, $m(x,y)$ is the magnitude of the intensity gradient and $\theta(x,y)$ is its orientation:

m(x,y) = \sqrt{ d_x(x,y)^2 + d_y(x,y)^2 } \qquad (1)

\theta(x,y) = \tan^{-1}\left( \frac{d_y(x,y)}{d_x(x,y)} \right) \qquad (2)

In the second step, a histogram is constructed for each cell. A cell is a set of horizontally and vertically neighboring pixels. The cell $C_{i,j}$ is placed at row i and column j, where rows and columns are counted in units of cells. The histogram consists of several bins sorted by orientation. The number of bins is Q, and $u_{i,j,q}$ ($q = 0, 1, \ldots, Q-1$) represents the height of an individual bin, that is, the accumulated magnitude of all pixels in the cell whose orientation matches the bin.
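The first two steps can be illustrated with a short software sketch. This is not the authors' hardware design: it assumes centered differences for the gradient components (the text does not give their definition), unsigned orientations folded into [0, pi), and hypothetical parameters `cell=8` and `bins=9`. The function names are illustrative only.

```python
import math


def gradient(image, x, y):
    """Return (magnitude, orientation) of the intensity gradient at (x, y).

    Assumption: d_x and d_y are centered differences, clamped at the border.
    """
    h, w = len(image), len(image[0])
    dx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
    dy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
    m = math.hypot(dx, dy)                 # Eq. (1)
    theta = math.atan2(dy, dx) % math.pi   # Eq. (2), folded into [0, pi)
    return m, theta


def cell_histogram(image, ci, cj, cell=8, bins=9):
    """Accumulate gradient magnitudes of cell (row ci, column cj) into bins."""
    hist = [0.0] * bins
    for y in range(ci * cell, (ci + 1) * cell):
        for x in range(cj * cell, (cj + 1) * cell):
            m, theta = gradient(image, x, y)
            q = min(int(theta / math.pi * bins), bins - 1)  # orientation bin
            hist[q] += m
    return hist


# Usage: an 8x8 horizontal ramp has a purely horizontal gradient,
# so all magnitude lands in the first orientation bin.
ramp = [[x for x in range(8)] for _ in range(8)]
hist = cell_histogram(ramp, 0, 0)
```

A hardware implementation would typically replace the square root and arctangent with cheaper approximations; the sketch keeps the exact formulas for clarity.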
The third step is to concatenate and normalize the histograms in each block. A HOG feature vector $v_{i,j}$ is obtained for the block $B_{i,j}$ positioned at row i and column j.

3.1 REAL ADABOOST SCANNING
The Real AdaBoost classification process includes scanning. In the scanning loop, the Real AdaBoost classifier picks up a portion of the frame of HOG feature vectors as a sub-window. Within the sub-window, the k-th component of each feature vector is evaluated by its associated weak classifier, and the outputs of the weak classifiers are summed. The sub-window moves slightly at the end of each calculation, and the scanning loop is repeated for all possible sub-window positions in the frame of HOG feature vectors. The classifier is composed of a sufficient number of weak classifiers, as shown in Fig. 1(b). This set of weak classifiers is obtained in advance through a learning procedure, written in software and based on the Real AdaBoost [9] learning algorithm.

Fig. 2. REAL ADABOOST CLASSIFIER AND SCANNING

Each weak classifier has a function $h_{r,s,k}$. The k-th component of the feature vector at position (r,s) in the sub-window is the argument of $h_{r,s,k}$. The classification result $D(i,j)$ for the sub-window placed at row i and column j is obtained from the summed outputs of these weak classifiers. In our system each classifier in the cascade is a single

layer perceptron whose input is a set of computationally efficient binary features. The computational cost of each classifier is then simply the number of input features.

3.1.1 SCANNING MECHANISM
Scanning is the procedure of selecting each sub-window from the frame sequentially and transferring the feature values of the sub-window to the weak classifiers, like video raster scanning. Since the overlap between successive sub-windows is large, naive scanning causes many duplicate references to features at the same position. As a result, there is a huge data transfer from feature memory to the many weak classifiers.

The integral image window buffer stores integral pixel values moved from the image window buffer, and its controller generates the control signals for moving and calculating the integral pixel values. Since the pixels of the integral image window buffer are stored in registers, all integral pixels in the buffer can be accessed simultaneously to perform the Haar feature classification. For an incoming pixel with coordinate (i, j), the integral image window buffer controller performs the operation in (3), where n is the row and column size of the integral image window buffer, II(s, t) represents each of the integral pixels in the integral image window buffer, and I(i, j) represents each of the pixels in the image window buffer.

Valuable computational time and power are wasted on potentially non-promising regions. An efficient way to eliminate windows prior to the classification process, in a manner that can be parallelized in hardware and does not require many hardware resources, is to use edge information. Edges provide information about visual features in an image, so the number of edge pixels in an image region gives an indication of the useful information in that region.
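The idea of using the edge-pixel count of a region as a cheap pre-filter can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the method of any cited work: it assumes a binary edge map has already been produced by some edge detector, counts edge pixels per window with a summed-area table (the same integral-image idea mentioned above), and keeps a window only when its count exceeds a hypothetical threshold `min_edges`. All function names are illustrative.

```python
def integral_image(img):
    """Summed-area table with one extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii, top, left, height, width):
    """Sum of the original image over a window, from four table look-ups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def candidate_windows(edge_map, height, width, min_edges):
    """Yield (top, left) of windows whose edge-pixel count exceeds min_edges."""
    ii = integral_image(edge_map)
    rows, cols = len(edge_map), len(edge_map[0])
    for top in range(rows - height + 1):
        for left in range(cols - width + 1):
            if window_sum(ii, top, left, height, width) > min_edges:
                yield top, left


# Usage: a 4x4 edge map with a 2x2 block of edge pixels in the middle.
edge_map = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
kept = list(candidate_windows(edge_map, 2, 2, 3))
```

Only the window that covers the whole block survives the threshold; all other windows would be skipped before classification.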
An early approach that eliminates windows from the classification process, in addition to constructing the windows, was proposed in [Anilla and Devarajan, 2010]. Overall, the edge-based window rejection process involves obtaining the edge image with an edge detection algorithm, scanning the image, and rejecting windows depending on the number of edge pixels. Specifically, after the edges of a particular image region (window) are obtained, the edge pixels are counted. If the count exceeds a certain threshold, which is assigned according to the object of interest, the window is considered for classification; otherwise it is not classified and is treated as a non-object. In [Anilla and Devarajan, 2010] a window is discarded.

Fig. 3. RENUMBERING HOG FEATURES

It is natural to assume that the sub-window scanning is raster scanning, because HOG feature vectors are extracted in raster scan order. Let us renumber the feature vectors in a HOG frame in one dimension, that is, in raster scan order. The HOG feature of the left-most cell in the top line of the HOG frame is denoted as v(0). Starting from this cell, we give a one-dimensional number to each cell in the frame in raster scan order, as shown in Fig. 3. We express these numbers by the variable t and let t correspond to the cycle time of the scan process described below. As shown in Fig. 3, the feature at the marked position, which is the input of weak classifier $h_{r,s}$ in the k-th component, can be rewritten as v(t-(n-s)-(m-r)w) using the cycle time t. The architecture shown in Fig. 3 is obtained. Every register output in the sub-window is connected to the input of its associated weak classifier. In every cycle in which a component of a feature vector reaches the starting register, the components of the feature vectors in this register chain are shifted. The feature vector component that reaches the end of the sub-window is discarded. Note that frame buffers are not necessary; only line buffers are.
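The raster-scan indexing can be checked with a small software model of the register chain. The sketch below is illustrative only and rests on assumptions the text does not state explicitly: the sub-window has m rows and n columns, w is the width of the HOG frame in cells, and r and s are 1-based so that the bottom-right tap (r = m, s = n) sees the newest feature v(t). The delay line holds m-1 full lines plus n values, which mirrors the remark that only line buffers are needed.

```python
from collections import deque


def scan_subwindows(features, w, m, n):
    """Stream features in raster order through a delay line.

    Yields (t, window) whenever a complete m-by-n sub-window is available.
    window[r-1][s-1] is the tap v(t-(n-s)-(m-r)w) with 1-based r and s.
    """
    depth = (m - 1) * w + n           # m-1 line buffers plus n registers
    line = deque(maxlen=depth)        # the oldest value is discarded on append
    for t, v in enumerate(features):
        line.append(v)                # shift the register chain by one
        col = t % w
        # Skip positions where the window would wrap around a line end.
        if len(line) == depth and col >= n - 1:
            window = [
                [line[depth - 1 - (n - s) - (m - r) * w] for s in range(1, n + 1)]
                for r in range(1, m + 1)
            ]
            yield t, window


# Usage: a 3x4 frame numbered 0..11 in raster order, scanned with a 2x2 window.
results = list(scan_subwindows(list(range(12)), w=4, m=2, n=2))
```

Each feature enters the chain exactly once, so the duplicate memory references of naive scanning do not occur in this model.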
Therefore, the heavy transfer caused by duplicate references to feature vector components is completely resolved. However, the operating frequency is still low because of the delay of summing all the weak classifier outputs.

3.2 SUMMATION MECHANISM
Each term of the summation, which is given as a function value of a component of a feature vector, can be regarded as a term of the representation of an FIR filter. We found that a technique known for FIR filters can be used to reduce this delay. The faster architecture is obtained by rewriting the naive architecture in Fig. 4 using this technique. The component of a new feature vector is given to all weak classifiers in the same cycle. The output of each weak classifier is added to the value of the corresponding register by an adder, and the output of each adder is stored in the next register. The final result of the summation is given at the end of the

register chain corresponding to the bottom-right corner of the sub-window area. As a result, each adder performs only one addition within a cycle.

Fig. 4. A NAIVE IMPLEMENTATION OF REAL ADABOOST

This naive technique is only somewhat effective. The main reason is AdaBoost's balanced reweighting scheme. As a result, the initially asymmetric example weights are immediately lost. Essentially, the AdaBoost process is too greedy. The first classifier selected absorbs the entire effect of the initial asymmetric weights, and the remaining rounds are entirely symmetric. This preserves the bound on asymmetric loss, but the effect on the training process is quite different. To demonstrate this approach, we generated an artificial data set and learned strong classifiers containing 4 weak classifiers. All but the first weak classifier learned by the naive rule are poor, since they each balance positive and negative errors. The final combination of these classifiers cannot yield high detection rates without introducing many false positives. All the weak classifiers generated by the proposed asymmetric AdaBoost rule are consistent with asymmetric loss, and the final strong classifier yields very high detection rates and modest false positive rates.

4. RESULTS
We built a prototype of the object image detector using a commercially available board with a Virtex-5 LX330 and ISE 12.3. Specifications of the prototyped image detector are shown in Table 1. The weak classifiers are implemented as look-up tables. The set of tables, one for every $h_{r,s,k}$, is obtained in advance by the learning software based on Real AdaBoost. Two sets of samples were processed for learning. One was a set of vehicles that contains 550 positive samples and 500 negative samples. The other was a set of pedestrians that contains 4800 positive samples and 5000 negative samples. The prototype is implemented in one Virtex-5 chip. The implementation results are shown in Table 2.
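To illustrate how look-up-table weak classifiers combine with the summation mechanism of Section 3.2, the sketch below compares a direct-form sum (all weak classifier outputs added in one step) with a transposed-form register chain (one addition per register per cycle). It is a software model under assumptions, not the prototype's RTL: features are small integers used directly as table indices, the tables hold integers, r and s are 0-based, and `make_taps`, `direct_form` and `transposed_form` are hypothetical names. Like a plain FIR filter, the model also produces outputs for windows that wrap around a line end; a real design would mask those.

```python
def make_taps(luts, w):
    """Map each weak classifier to its delay in the raster-ordered stream.

    luts[r][s] is the look-up table of the weak classifier at row r,
    column s of the sub-window (0-based); w is the frame width in cells.
    """
    m, n = len(luts), len(luts[0])
    return {
        (n - 1 - s) + (m - 1 - r) * w: (lambda x, tab=luts[r][s]: tab[x])
        for r in range(m)
        for s in range(n)
    }


def direct_form(features, taps):
    """Naive summation: every output adds all weak classifier results."""
    max_delay = max(taps)
    return [
        sum(h(features[t - k]) for k, h in taps.items())
        for t in range(max_delay, len(features))
    ]


def transposed_form(features, taps):
    """Register chain: the new feature is broadcast to all weak classifiers,
    and each register stores its neighbour's value plus one classifier output."""
    depth = max(taps) + 1
    regs = [0] * depth
    out = []
    for t, x in enumerate(features):
        nxt = [0] * depth
        for k in range(depth):
            carry = regs[k + 1] if k + 1 < depth else 0
            nxt[k] = carry + (taps[k](x) if k in taps else 0)  # one addition
        regs = nxt
        if t >= depth - 1:
            out.append(regs[0])        # end of the chain: complete sum
    return out


# Usage: a 2x2 sub-window on a frame that is 4 cells wide.
luts = [[[0, 1, 2, 3], [0, -1, -2, -3]],
        [[1, 1, 1, 1], [0, 2, 4, 6]]]
taps = make_taps(luts, w=4)
stream = [0, 1, 2, 3, 3, 2, 1, 0, 1, 3, 0, 2]
sums_direct = direct_form(stream, taps)
sums_chain = transposed_form(stream, taps)
```

Both forms give the same sums; the difference is that the chain needs only a single adder delay between registers, which is what allows a higher clock frequency in hardware.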
The RAM usage is dominant because of the many weak classifiers, whereas the register and LUT usage is fairly low. In general, real-time video processing needs to operate at 30 f/s or more. This is easily achieved with the proposed architecture, which can process up to 220 f/s at 201 MHz, the maximum operating frequency. In the output images, detection results are shown by small white rectangles at the upper-left corner of each detected area. In this experiment, we switched the target object from pedestrians to vehicles by reprogramming the entire FPGA configuration through the JTAG chain while the system was running. This took about 30 seconds.

Compared to the previous Haar-like-feature-based cascade classifier, the Real AdaBoost classifier based on HOG features has a very large number of weak classifiers in a single stage. We designed these weak classifiers as look-up tables using RAM on the FPGA. The implementation of RAM-based weak classifiers gives detection systems an advantage in flexibility: by rewriting the contents of these RAMs, the classifier can change target objects quickly if the size of the sub-window for the new target is the same or smaller.
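The throughput figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the paper (201 MHz, 220 f/s, 30 f/s per stream) plus the standard VGA resolution of 640 x 480; the function name and the per-pixel cycle budget are illustrative and are not claims about the internal pipeline.

```python
def throughput_budget(clock_hz=201e6, frame_rate=220,
                      width=640, height=480, stream_rate=30):
    """Return (cycles per frame, cycles per pixel, number of 30 f/s streams)."""
    cycles_per_frame = clock_hz / frame_rate
    cycles_per_pixel = cycles_per_frame / (width * height)
    streams = int(frame_rate // stream_rate)   # whole streams only
    return cycles_per_frame, cycles_per_pixel, streams


cycles_per_frame, cycles_per_pixel, streams = throughput_budget()
```

With these inputs the budget is roughly 914 thousand clock cycles per frame, a little under three cycles per VGA pixel, and 220 / 30 rounds down to seven multiplexed streams, consistent with the figure given in the text.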

Fig. 4. A DETECTED FRAME OF PEDESTRIANS

All weak classifiers have a common input. This means that the classifiers can be implemented as a single RAM with a long word width. Compared to an ASIC implementation, implementing this architecture on an FPGA is very easy, because memory with a long and variable word width is naturally provided by the block RAM or distributed RAM structure. This allows the size and aspect ratio of the sub-window to be fitted flexibly to the target object. The block RAM usage is fairly high. However, since the proposed implementation has a repetitive structure and simple memory access logic, the operating frequency is not degraded.

5. CONCLUSION
We have presented a novel hardware algorithm for real-time object detection based on the Real AdaBoost classifier and the HOG feature extractor. The proposed architecture has novel scanning and summation mechanisms that resolve the bottlenecks. The obtained performance meets real-time requirements: the maximum operating frequency of the designed and prototyped system is 201 MHz, which corresponds to a maximum performance of 220 f/s for real VGA-sized video streams. If camera images can be multiplexed, this frame rate will enable seven streams of 30 f/s VGA video to be processed on a single FPGA. We conclude that the proposed architecture is effective for HOG and Real AdaBoost object detection and suitable for surveillance services that have to process images from many distributed cameras simultaneously.

REFERENCES
[1] P. A. Viola and M. J. Jones, "Fast and robust classification using asymmetric AdaBoost and a detector cascade," in Proc. NIPS 2001, pp. 1311-1318.
[2] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139, Aug. 1997.
[3] Y. Wei, X. Bing and C. Chereonsak, "FPGA implementation of AdaBoost algorithm for detection of face biometrics," International Workshop on Biomedical Circuits and Systems, pp. S1/6-17-20, 1-3 Dec. 2004.
[4] H. Lai, M. Savvides and T. Chen, "Proposed FPGA hardware architecture for high frame rate (>>100 fps) face detection using feature cascade classifiers," International Conference on Biometrics: Theory, Applications, and Systems, pp. 1-6, 27-29 Sept. 2007.
[5] M. Hiromoto, K. Nakahara, H. Sugano, U. Nakamura and R. Miyamoto, "A specialized processor suitable for AdaBoost-based detection with Haar-like features," IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1-8, 17-22 June 2007.
[6] R. C. Luo and H. Liu, "Design and implementation of efficient hardware solution based sub-window architecture of Haar classifiers for real-time detection of face biometrics," International Conference on Mechatronics and Automation (ICMA), pp. 1563-1568, 4-7 Aug. 2010.
[7] C. Kyrkou and T. Theocharides, "A flexible parallel hardware architecture for AdaBoost-based real-time object detection," IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 6, pp. 1034-1047, June 2011.
[8] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 886-893, 25 June 2005.
[9] R. E. Schapire and Y. Singer, "Improved boosting algorithms using confidence-rated predictions," Machine Learning, vol. 37, no. 3, pp. 297-336, 1999.
[10] C. Matsushima, Y. Yamauchi, T. Yamashita and H. Fujiyoshi, "A method for reducing number of HOG features based on Real AdaBoost" (in Japanese), 2009-CVIM-167(32), pp. 1-8, June 2009.