Fast Component-Based QR Code Detection in Arbitrarily Acquired Images


The final publication is available at springerlink.com

Fast component-based QR code detection in arbitrarily acquired images

Luiz F. F. Belussi · Nina S. T. Hirata

Received: date / Accepted: date

Abstract Quick Response (QR) codes are a type of 2D barcode that is becoming very popular, with several application possibilities. Since they can encode alphanumeric characters, a rich set of information can be made available through encoded URL addresses. In particular, QR codes could be used to help visually impaired and blind people access web-based voice information systems and services, and to help autonomous robots acquire context-relevant information. However, in order to be decoded, QR codes need to be properly framed, something that robots and visually impaired or blind people cannot do easily without guidance. Therefore, any application that aims at assisting robots or visually impaired people must be able to detect QR codes and guide its users to frame the code properly. A fast component-based two-stage approach for detecting QR codes in arbitrarily acquired images is proposed in this work. In the first stage, regular components present at three corners of the code are detected, and in the second stage geometrical restrictions among detected components are verified to confirm the presence of a code. Experimental results show a high detection rate, superior to 90%, at a speed compatible with real-time applications.

Keywords QR code · 2D barcode · component-based detection · Haar-like features · cascade classifier · accessibility

N. S. T. H. is partly supported by CNPq (The National Council for Scientific and Technological Development, Brazil)

L. F. F. Belussi and N. S. T. Hirata
Department of Computer Science
Institute of Mathematics and Statistics
University of São Paulo
Rua do Matão, São Paulo, Brazil
belussi,nina@ime.usp.br

1 Introduction

Compared to traditional 1D barcodes, 2D barcodes can encode a larger amount of data, including alphanumeric characters. QR code, which stands for Quick Response Code, is a type of two-dimensional code introduced by Denso Wave in 1994 [12]. QR codes have been designed to be easily found and to have their size and orientation determined even under bad imaging conditions. In addition, the ISO/IEC 18004 standard specifies an error correction scheme that can recover up to 30% of occluded or damaged symbol area. These features make the QR code an extremely successful technology in the area of barcodes. Figure 1 shows some examples of QR codes.

Fig. 1 Samples of QR code.

QR codes were initially used by the automotive industry to track vehicle parts during the manufacturing process [12]. Nowadays, QR codes are most commonly

used as physical hyperlinks (encoded hyperlinks), notably in the advertising industry, to connect places and objects to websites containing additional context-relevant information. Applications in education and entertainment [25,8,23], data and system security [9,16,19], specific service offers [27,7], among others, are also emerging. Another possible application consists in helping visually impaired and blind people, or even robots, in several tasks such as indoor navigation, shopping, reading, and much more [13,11,2].

The existing decoders, easily found for mobile devices, are able to work correctly only if codes are properly framed, with the code region corresponding to at least 30% of the image. When exploring an environment, visually impaired people or robots will not be able to capture such images unless they are told where the codes are located. Thus, in order to make useful applications for them viable, detecting the presence of a code in an image is a necessary step prior to the decoding process.

In the related literature, the majority of works that mention QR code recognition or detection are actually concerned with improving image quality or determining the exact contours of the code, rather than finding (i.e., deciding whether there is or there is not) a QR code symbol in an image [22,10,20]. In fact, most of the images considered in those works were acquired with the specific intent of capturing only the code symbol. The few works that deal with the problem of finding QR codes propose solutions that rely on auxiliary information such as visual cues or RFID tags [11,28]. Although such an approach is possible in controlled environments, it is not practical in general contexts.

This work addresses the problem of detecting QR codes in arbitrarily acquired images without relying on auxiliary cues. The aim is not only to detect the presence of code symbols, but also to delimit their position within an image as accurately as possible. That would additionally allow an application to instruct a user or robot to properly approach the camera towards the code. To that end, a component-based, two-stage detection approach is proposed. In the first stage, a square pattern, called the finder pattern, located at three corners of any QR code, is detected using a cascaded classifier trained according to the rapid object detection method proposed in [26]. In the second stage, geometrical relationships among detected candidate finder patterns are analyzed in order to verify whether there are subgroups of three of them spatially arranged as three corners of a square region.

A report on preliminary results of this approach has been previously published in [3]. In this paper, a reformulated description of the approach and an extended set of results are presented. In particular, this includes a detailed account of the second stage, with a formal description of the component aggregation algorithm, new results obtained with larger training and test sets, discussions concerning additional parameters that were not examined in the previous work, and quantitative results, not presented before, on QR code detection performance.

This text is organized as follows. In Section 2, important structural aspects of QR codes, which will be explored in the proposed approach, are first described. Then, an overview of the proposed two-stage approach is presented, together with a brief review of the Viola-Jones method for rapid object detection, used in the first stage of the approach.
In Section 3, the problem of aggregating the information obtained in the first stage of the detection process is expressed in a graph-based formalism and a simple algorithm to solve it is proposed. The procedure to determine appropriate values for the training parameters of the FIP detector, in an OpenCV implementation of the Viola-Jones method, is described in Section 4. Then, in Section 5, experimental results that assess the influence of some parameters on FIP as well as QR code detection, and the frame rates achieved in video processing, are reported. Although extensive experimental variation was carried out, not all possibilities arising from parameter variations were evaluated. Nevertheless, a QR code detection rate superior to 90% was observed on test images, suggesting that even better rates can be achieved. Finally, in Section 6, a summary of the main contributions of this work and some issues for future investigation are listed.

2 Component-based approach for QR code detection

In this section, some structural aspects of QR codes and the detection requirements in a real-time application scenario that have motivated the proposed two-stage approach are first described. Then, an overview of the proposed approach is presented, followed by a brief review of the Viola-Jones method for object detection.

2.1 QR code

A QR code symbol, according to the ISO/IEC standard 18004, has a general structure that comprises data, version information, error correction code words, and the following regions, illustrated in Fig. 2: a quiet zone around the symbol, three finder patterns (FIP) in the corners, two timing patterns (TP) between the finder patterns, and

a certain number of alignment patterns (AP) inside the data area.

Fig. 2 QR code structure (FIP: Finder Pattern / TP: Timing Pattern / AP: Alignment Pattern).

The internal organization of a code with respect to its structural components may vary depending on the amount of data encoded. The smallest units (black or white squares) that compose a QR code are called modules. There are various versions of the symbols, from 1 to 40, with distinct information storage capabilities that are determined by a predefined number of modules in the symbol area. Figure 3 shows an example of a version 1 code, highlighting the number of modules. Figure 4 shows additional examples that illustrate the relation between the number of modules and the number of encoded characters. Note that alignment patterns occur more often in larger versions of the code.

Fig. 3 QR code version 1 is composed of 21 × 21 modules (each FIP is 7 modules wide, so 21 − 7 = 14 modules separate the centers of adjacent FIPs).

Fig. 4 Different versions of QR code symbols and respective numbers of encoded alphanumeric characters: (a) 10, (b) 100, (c) 500, (d) 1000.

There is a visually distinctive pattern at the three corners of QR codes, known as the finder pattern (FIP). FIPs have been specially designed to be found in any search direction: the sequence of black (b) and white (w) pixels along any scan line that passes through the center of a FIP preserves the special sequence and size ratio w:b:w:bbb:w:b:w, as can be seen in Fig. 5 (a code sketch of this classical run-length check is given just before Section 2.3).

Fig. 5 Black and white pixel proportions in finder patterns for diagonal (d), vertical (v) and horizontal (h) scan lines.

2.2 Detection requirements

One of the first requirements considering real-time applications is fast detection. Moreover, since there is no prior knowledge about the size of a code, another important feature of the detector is scale invariance. In addition, in order to succeed in uncontrolled environments, the detector should also be robust with respect to variations in illumination, rotation, and perspective distortion. In order to establish some limits to the detection task, in this work attention is paid to scale invariance and robustness with respect to variations in illumination, while rotations and perspective distortions are assumed to be small. Furthermore, it is assumed that QR codes are printed in black and white (or at least dark foreground and light background). Color is not an issue since all processing considers gray tone images.
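For illustration, the run-length test implied by the scan-line property of Section 2.1 can be sketched as follows. This is a hypothetical helper of ours, not part of the paper's method; as discussed in Section 2.3, the proposed detector deliberately avoids relying on this fine-grained regularity.

def has_fip_ratio(runs, tol=0.5):
    """Check five consecutive run lengths (black, white, black, white, black)
    along a scan line against the 1:1:3:1:1 FIP core ratio; the name, the
    argument layout and the tolerance value are illustrative choices."""
    if len(runs) != 5:
        return False
    module = sum(runs) / 7.0  # the five runs span 1 + 1 + 3 + 1 + 1 = 7 modules
    return all(abs(r - e * module) <= tol * module
               for r, e in zip(runs, (1, 1, 3, 1, 1)))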

2.3 Overview of the proposed approach

Direct detection of the whole code may not be an easy task, because the internal structural organization varies from code to code, depending on the code version. Moreover, features extracted from the code area, such as histograms or frequency information, can be greatly affected by image resolution, code scale, or noise. For instance, algorithms for the recognition of QR codes frequently rely on the regularity of scan lines over the FIPs (see Fig. 5) to find them and then determine the exact contours of the code. However, that regularity may be disrupted in low-quality images. Thus, in order to deal with unconstrained images, the processing algorithm should not depend on fine details.

On the other hand, FIPs are typically the largest structures within a code, have a fixed shape, and appear in exactly three corners of all codes. These characteristics of FIPs suggest a component-based approach to the problem of detecting QR codes. The main idea of component-based detection [21] is to detect parts of the object and then analyze the detected parts in order to find a coherent arrangement forming the whole object. The arrangement of the parts can take geometrical restrictions into consideration. When the relation between parts cannot be easily stated, machine learning based approaches can be used, as in [1,14].

Taking the above observations into consideration, the proposed approach for detecting QR codes consists of a two-stage method. In the first stage the aim is to detect FIPs, and in the second stage the goal is to integrate information about the detected FIPs and decide whether subsets of three of them correspond to a QR code or not.

In the first stage, it is desirable to maximize true positives (TP) while maintaining a controlled number of false positives (FP). Thus, for the detection of FIPs, some mechanism to control TP and FP should be available, and the detection process should fulfill the requirements listed above. In particular, it should be fast and invariant to scale and to variations in illumination. To meet these requirements, the Viola-Jones rapid object detection method [26] is proposed for the first stage.

In the second stage, the algorithm should be able to find sets of three FIPs that are the corners of a same QR code. To that end, geometrical restrictions as well as size-related restrictions are considered, as detailed in Section 3.

2.4 A review of the Viola-Jones object detection framework

The Viola-Jones object detection framework has the following characteristics:

- it uses simple (Haar-like) features;
- it performs fast computation of features using the integral image;
- it is based on a cascaded approach to train a classifier: each stage is trained in such a way as to achieve fixed TP and false alarm (FA) rates, using a boosting algorithm;
- it is very fast in practice, since features are computed rapidly and the cascade is designed to discard the majority of negative samples in its early stages, eliminating the need to compute the responses of all stages for every sample considered.

Concisely, the detection process consists in acquiring a complete sample set from the input image and submitting each sample to a cascade classifier whose stages have been trained by a boosting scheme that aggregates weak classifiers made of Haar-like features. These features are calculated in constant time using a matrix called the integral image.
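To make the cascade's early-rejection behavior concrete, here is a schematic sketch of ours; `stages` stands for the trained stage classifiers, which the framework obtains by boosting.

def cascade_accepts(sample, stages):
    # `stages` is a list of stage classifiers, each a function returning
    # True (pass) or False (reject) for the given sample.
    for stage in stages:
        if not stage(sample):
            # Most negative samples are rejected by the early, cheap stages,
            # so the later, more complex stages rarely need to run.
            return False
    return True  # accepted by every stage: a detection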
Below, some key concepts are reviewed and the detection process is explained.

Sample set: A sample is a sub-region of the image restricted to a window, and the complete sample set of the image is obtained by sliding the window over the image and varying its size from a given minimum to the largest size possible in the image.

Cascade classifier: A cascade consists of a series of consecutive classifiers (stages) trained to reject samples that do not match the searched pattern while accepting the majority of positive samples and passing them on to the next stage. A sample is said to be detected when it is accepted by every stage, from the first to the last, without rejection. Each stage is built from a set of weak classifiers aggregated into a committee by a boosting algorithm, and can be seen as an independent classifier designed to obtain a very high hit rate (typically 99% or more) with an acceptable false alarm rate (typically between 40% and 60%). Figure 6 illustrates the concept of a cascade classifier.

Boosting: Boosting works in rounds, iteratively training a set of weak classifiers while reweighting the training samples so that hard samples have increased

relative importance in the set. In each round, the best weak classifier is selected to compose the resulting classifier [15]. Thus, boosting can achieve the specified hit rate and false alarm rate as it adds weak classifiers to the combination (as long as the features used have enough representational power). Every weak classifier is typically the result of the computation of a Haar-like feature followed by a binary threshold, although weak classifiers can have more sophisticated forms, such as simple trees.

Fig. 6 Illustration of the cascade approach to detection.

Haar-like features: The features used by the classifiers proposed in [26], inspired by [24], are based on the feature prototypes shown in Fig. 7. In [18], this set has been extended with additional 45-degree rotated prototypes, shown in Fig. 8.

Fig. 7 Basic feature prototype set (prototypes a to h).

Fig. 8 Additional rotated feature prototypes in the extended set (ar, br, cr, dr, er, fr, hr).

Given a square window, all features used are derived from these prototypes by defining their width, height and relative position within the window. Some examples are shown in Fig. 9. Note, for instance, that the features in Fig. 9(a) and Fig. 9(b) are both based on the same prototype.

Fig. 9 Examples of Haar-like features: (a) based on prototype a, (b) based on prototype a, (c) based on prototype fr, (d) based on prototype d.

Feature values are computed with respect to a sample that corresponds to a sub-region of the image under a sliding evaluation window. By definition, the value of a feature for a sample is the sum of the pixel values in the black rectangle area minus the corresponding sum in the white rectangle area.

Integral image: Haar-like feature values are efficiently computed using the integral image [26], a matrix with the same dimensions as the input image in which every (x, y) position stores the sum of all pixel values in the rectangle between (0, 0) and (x, y). Given an image i of dimension h × w, the corresponding integral image ii can be computed in approximately 2·(w·h) memory accesses based on the following recurrences:

s(x, y) = s(x, y − 1) + i(x, y)    (1)
ii(x, y) = ii(x − 1, y) + s(x, y)    (2)

where s(x, y) is the cumulative row sum, with s(x, −1) = 0 and ii(−1, y) = 0.
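A minimal NumPy sketch of these recurrences and of the constant-time rectangle sum they enable; the function names are ours, and rect_sum anticipates the four-access computation described next (Fig. 10).

import numpy as np

def integral_image(i):
    # Cumulative sums along both axes; equivalent to applying the
    # recurrences s(x, y) = s(x, y-1) + i(x, y) and ii(x, y) = ii(x-1, y) + s(x, y).
    return i.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    # Sum of pixels in rows top..bottom and columns left..right, using
    # four accesses in the spirit of Fig. 10: ii(C) - ii(B) - ii(D) + ii(A).
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def haar_two_rect(ii, top, left, h, w):
    # Value of a two-rectangle feature in the style of prototype (a):
    # black half minus white half, each obtained with one rect_sum call.
    white = rect_sum(ii, top, left, top + h - 1, left + w // 2 - 1)
    black = rect_sum(ii, top, left + w // 2, top + h - 1, left + w - 1)
    return black - white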

Once the integral image is calculated, the sum of the pixel values in any rectangle within the image can be determined in exactly four memory accesses, as shown in Fig. 10. Thus, feature values of the type above can be computed in constant time. For the extended feature prototype set, a similar fast calculation is possible using triangles instead of rectangles [18].

Fig. 10 The sum of pixels in the shaded area can be computed as ii(C) − ii(B) − ii(D) + ii(A).

3 Component aggregation method

Once FIP candidates are detected in the first stage, arrangements of three candidate FIPs that are likely to correspond to the corners of a QR code must be identified in the second stage. In this section, an algorithm that considers size, distance and angle restrictions to perform such examination is proposed. Note that, besides these geometrical restrictions, additional information such as the number of overlapping detections or the texture around the FIPs could also be used to rule out some subsets or support others. However, taking into consideration the requirement for fast processing, only size, distance, and angle information are considered.

3.1 Problem formulation

Each FIP candidate from stage one is represented as a triple (T, x, y), where T is the size of a square region, and x and y are the coordinates of the center of that square in the image. In an ideal scenario with no noise and no deformation, two FIPs of a same QR code have the same size. Moreover, the maximum distance between them is limited by a factor related to the FIP size. A FIP corresponds to 9 × 9 modules, considering the white region around it. The smallest version of QR code (version 1) is 21 modules wide (see Fig. 3), while the largest one (version 40) is 177 modules wide. Thus, the minimum number of modules between the centers of two adjacent FIPs is 21 − 7 = 14, and the maximum is 177 − 7 = 170. Therefore, the distance between two FIPs in adjacent corners is at least 14·(T/9) ≈ 1.6·T and at most 170·(T/9) ≈ 19·T. Since two FIPs may also lie on opposite corners, across the diagonal, the maximum distance is 19·√2·T. Moreover, two edges linking the three FIPs of a QR code are either orthogonal to each other (two sides of the square region) or form an angle of 45° (one of the sides and the diagonal of the square region). Given two FIPs of a QR code, there are six possible positions for the third FIP, as shown in Fig. 11.

Fig. 11 Given two FIPs (indicated as 1 and 2) of a QR code, the six possible positions for the third FIP are represented by dark circles.

Based on these observations, the following criteria are defined.

Definition 1 (Size criterion) Two FIP candidates (T1, x1, y1) and (T2, x2, y2) satisfy the size criterion if and only if T1 = T2.

Definition 2 (Distance criterion) Two FIP candidates (T1, x1, y1) and (T2, x2, y2) of same size (T1 = T2 = T) satisfy the distance criterion if and only if 1.6·T ≤ dist((x1, y1), (x2, y2)) ≤ 19·√2·T.

Let F1 = (T1, x1, y1) and F2 = (T2, x2, y2) be two FIPs. Let F1F2 denote the line segment that links (x1, y1) and (x2, y2), and |F1F2| denote its length. Note that |F2F1| = |F1F2|.

Definition 3 (Orientation criterion) Let F1F2 and F1F3 be two line segments with common extremity (x1, y1), let θ be the angle between them, and let P = min{|F1F2|, |F1F3|} / max{|F1F2|, |F1F3|}. Then, {F1F2, F1F3} satisfies the orientation criterion if and only if (i) θ = 90° and P = 1 (i.e., |F1F2| = |F1F3|), or (ii) θ = 45° and P = √2/2.
Given that three candidate FIPs are actually corners of a QR code, then pairwise they must satisfy the size and distance criteria, and the edges of the triangle they form must also satisfy, pairwise, the orientation criterion. Therefore, finding QR code candidates can be cast as the problem of finding subsets of three FIPs satisfying the three necessary conditions above.

The formulation proposed for solving this problem is based on graphs. FIP candidates are represented as vertices, and two vertices are linked if the respective pair of FIPs satisfies the size and the distance criteria. Note, however, that not all triplets of candidate FIPs satisfying pairwise the size and distance criteria are corners of a QR code. Therefore, the solution of the problem does not correspond to all cliques of size three in the graph. It is additionally necessary to verify whether the edges satisfy the orientation criterion.

Proposition 1 Let {F1, F2, F3} be a set of FIP candidates that pairwise satisfy the size and distance criteria. Then, {F1F2, F1F3} satisfies the orientation criterion if and only if {F2F1, F2F3} and {F3F1, F3F2} also satisfy the orientation criterion.

Proof: If {F1F2, F1F3} satisfies the orientation criterion, then either they are two sides of a square or they are one side and the diagonal of a square. If they are two sides of a square, then the third edge F2F3 is the diagonal, and it obviously satisfies condition (ii) of the orientation criterion with F1F2 and also with F1F3. If they are one side and the diagonal of a square, the third edge is obviously another side of the square, which satisfies condition (i) of the orientation criterion with the edge that is a side of the square, and condition (ii) with the edge that is the diagonal.

An important consequence of this proposition is that, given three FIP candidates that pairwise satisfy the size and distance criteria, one pair of edges satisfying the orientation criterion is sufficient to determine that the three FIPs correspond to a QR code candidate. Moreover, in an ideal condition, a FIP that is part of a QR code cannot be part of another QR code. Thus, once a triplet satisfying the code condition is found, its vertices could be removed from the graph together with all adjacent edges. However, there may be situations such as the one shown in Fig. 12, in which the first detected triplet is a false positive. In order to avoid losing the other triplet, an edge should be removed only after all edges adjacent to it have been analyzed.

Fig. 12 An edge that may be part of two subsets of three FIP candidates satisfying the conditions.

In practice, the size and distance values obtained from an image are not precise due to, for instance, imaging conditions, image resolution, rotation and perspective distortions. Therefore, the criteria should allow some tolerance to accommodate imprecisions due to these conditions.

(Size) Let T = max{T1, T2} and t = min{T1, T2}. The size difference between two FIP candidates is allowed to vary up to a given percentage, i.e., T − t ≤ ε·T. For instance, ε = 0.2 means that the allowed size difference is at most 20%.

(Distance) Large version codes are rarely found in practice, and therefore the maximum distance could be established based on the maximum code version to be detected. If the maximum code version to be detected is the largest one, then the distance should satisfy 1.6·T ≤ dist((x1, y1), (x2, y2)) ≤ 19·√2·T, where T = max{T1, T2}.

(Orientation) A small variation in the angle, as well as in the relative length of the edges, should be allowed. Both can be parametrized by a single parameter d, by rewriting the orientation condition as (i) |θ − 90°| ≤ d and 1 − (cos(45° − d) − cos(45° + d))/cos(45°) ≤ P ≤ 1, or (ii) |θ − 45°| ≤ d and cos(45° + d) ≤ P ≤ cos(45° − d).
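The three toleranced criteria can be sketched as predicates as follows. The names are ours, and the default tolerances are set to the values chosen later, in Section 5.2 (ε = 0.40 and d = 0.25 radians).

import math

def size_ok(f1, f2, eps=0.40):
    # f1, f2 are FIP candidate triples (T, x, y).
    T, t = max(f1[0], f2[0]), min(f1[0], f2[0])
    return T - t <= eps * T

def dist_ok(f1, f2):
    T = max(f1[0], f2[0])
    dist = math.hypot(f1[1] - f2[1], f1[2] - f2[2])
    return 1.6 * T <= dist <= 19 * math.sqrt(2) * T

def orient_ok(f1, f2, f3, d_tol=0.25):
    # Checks segments F1F2 and F1F3 (common extremity F1) against the
    # toleranced orientation criterion; d_tol is the angular tolerance in radians.
    v1 = (f2[1] - f1[1], f2[2] - f1[2])
    v2 = (f3[1] - f1[1], f3[2] - f1[2])
    l1, l2 = math.hypot(*v1), math.hypot(*v2)
    if l1 == 0 or l2 == 0:
        return False
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (l1 * l2)
    theta = math.acos(max(-1.0, min(1.0, cos_t)))
    P = min(l1, l2) / max(l1, l2)
    q = math.pi / 4  # 45 degrees
    if abs(theta - math.pi / 2) <= d_tol:  # case (i)
        lo = 1 - (math.cos(q - d_tol) - math.cos(q + d_tol)) / math.cos(q)
        return lo <= P <= 1
    if abs(theta - q) <= d_tol:  # case (ii)
        return math.cos(q + d_tol) <= P <= math.cos(q - d_tol)
    return False

For instance, size_ok((20, 0, 0), (27, 0, 0)) holds with the default tolerance, since 27 − 20 ≤ 0.40 · 27.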
3.2 Aggregation algorithm

The proposed solution for aggregating FIP candidates is presented in Algorithm 1 (a Python transcription is given after the listing). Lines 1 to 3 correspond to the creation of the vertex set. Lines 4 to 13 correspond to the creation of the edges: an edge is created whenever a pair of vertices satisfies the size and the distance criteria, with the respective tolerances. The loop from line 14 to line 23 is the main part of the algorithm, where cliques of size three whose edges pairwise satisfy the orientation criterion are found. For each edge (u, v), all edges (u, v′) adjacent to one of its extremities (u) are examined. For each such pair of edges (i.e., {(u, v), (u, v′)}), the existence of the third edge (v, v′) is verified first. If the third edge is absent, then either the two vertices v and v′ do not satisfy the size or distance criteria, or the third edge has already been processed (in which case the triplet {u, v, v′} is already in the output list). If the third edge (v, v′) is present, and the first two edges satisfy the orientation criterion, then the subset of three vertices {u, v, v′} is added to the output list. Note that there is no need to examine edges adjacent to the other extremity (v), since those will be evaluated later. The algorithm ends when all edges have been processed.

Algorithm 1: FIP candidate aggregation algorithm.
Input: a list of FIP candidates, each represented by a triple Fi = (Ti, xi, yi), and tolerance parameters ε and d.
Output: a list of subsets of the form {Fi, Fj, Fk} satisfying the size, distance and orientation criteria, with the respective allowed tolerances.
1  foreach Fi do
2      create a vertex vi and mark it as unprocessed
3  end
4  while unprocessed vertices u remain do
5      foreach unprocessed vertex v, v ≠ u, do
6          if {u, v} satisfies the size criterion with tolerance ε then
7              if {u, v} satisfies the distance criterion then
8                  create edge (u, v)
9              end
10         end
11     end
12     mark vertex u as processed
13 end
14 while edges (u, v) remain do
15     foreach edge (u, v′), v′ ≠ v, do
16         if there exists edge (v, v′) then
17             if edges (u, v) and (u, v′) satisfy the orientation criterion with tolerance d then
18                 add subset {u, v, v′} to the output list
19             end
20         end
21     end
22     remove edge (u, v)
23 end
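A straightforward Python transcription of Algorithm 1, assuming the predicates size_ok, dist_ok and orient_ok from the sketch in Section 3.1 are in scope; the deferred edge removal of lines 14 to 23 is emulated here with a set of already-emitted triplets.

from itertools import combinations

def aggregate(fips, eps=0.40, d_tol=0.25):
    """fips: list of (T, x, y) candidate triples. Returns index triplets
    that satisfy the size, distance and orientation criteria."""
    n = len(fips)
    # Lines 1-13: build the graph; an edge links two candidates that
    # satisfy the size and distance criteria with their tolerances.
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if size_ok(fips[i], fips[j], eps) and dist_ok(fips[i], fips[j]):
            adj[i].add(j)
            adj[j].add(i)
    # Lines 14-23: for each edge (u, v), examine edges (u, w) sharing the
    # extremity u; if the closing edge (v, w) exists and the pair {uv, uw}
    # satisfies the orientation criterion, emit the triplet (by
    # Proposition 1, checking a single pair of edges suffices).
    out, seen = [], set()
    for u in adj:
        for v in adj[u]:
            for w in adj[u]:
                if w == v:
                    continue
                key = frozenset((u, v, w))
                if key in seen or w not in adj[v]:
                    continue
                if orient_ok(fips[u], fips[v], fips[w], d_tol):
                    seen.add(key)
                    out.append(tuple(sorted(key)))
    return out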

4 Procedure for training and evaluating FIP detectors

Classifier training and evaluation tools that implement the Viola-Jones method are available, respectively, as the opencv-haartraining and opencv-performance utilities in OpenCV 2.0 [4]. For the training of a cascaded classifier, a set of positive samples (cropped image regions containing only a target object) and negative background images (with no target objects) must be provided. For each stage, the training algorithm randomly selects negative samples from the background images that have been misclassified (as positives) by the preceding stages. Another utility available in OpenCV is opencv-createsamples, which applies transformations to positive samples with controlled random variations in intensity, rotation and perspective distortion. Examples of transformed samples obtained from a positive sample are shown in Fig. 13.

Fig. 13 Examples of samples generated by random transformations on a positive sample.

4.1 Main training parameters

The training process implemented in opencv-haartraining involves several parameters that need to be tuned for each application. Among them, the following parameters have been considered in this work:

Feature set: Either the basic set shown in Fig. 7 (MODE=Basic) or the basic set in conjunction with the extended set shown in Fig. 8 (MODE=Extended/All).

Symmetry: When the target pattern is symmetric, the training algorithm can be restricted to consider only half of each positive sample (SYM=Symmetric/Y) in order to reduce processing time during training, or not (SYM=Asymmetric/N).

Classifier topology: Rather than cascaded stages (MTS=Cascade), it is possible to allow splits (MTS=Tree), which turns the classifier into a simple CART tree [5].

Weak classifier number of splits: A weak classifier in its simplest form is just a single Haar-like feature and a binary threshold. Weak classifiers are combined by the boosting algorithm to form a strong classifier corresponding to one stage of the cascade. It is possible to allow weak classifiers to learn more complex relationships in the training pattern by letting them be not just a single feature (NS = 1) but a simple CART tree with a limited, small number of splits (NS > 1). For face detection, empirical observations [17] indicate that cascade performance increases when splits in the weak classifiers are allowed.

Maximum false alarm rate / Minimum hit rate: Each cascade stage must comply with a maximum false alarm rate (FA) and a minimum hit rate (HR) on the samples supplied to it in order to be considered trained. A too low false alarm rate requirement may cause the stage to become overly complex, diminishing the benefits of the cascade approach. A very high minimum hit rate restriction may have a similar effect.

Number of training samples: The number of positive training samples (SAMPLES).

Size of training samples: All training samples are resized to a fixed dimension (SIZE). For face detection, 20 × 20 has been observed to be an appropriate size [17].

4.2 Detection evaluation metrics

A classifier trained using the opencv-haartraining utility can be evaluated using the opencv-performance utility. The performance evaluation utility requires test images with ground-truth data (i.e., the positions of the target samples). Such images can be artificially generated by the opencv-createsamples utility, which inserts one target sample into each background image. This, in conjunction with opencv-performance, is useful for automating the performance tests, by providing them a test set of positive samples and negative background images such as the ones provided for training. Figure 14 shows an example of a target sample inserted into a background image.

Fig. 14 A background image with an inserted FIP.

The result of a detector on a set of test images can be measured by the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Given a test set with P positive samples, recall is defined as the fraction of positive samples that are detected, i.e.,

Recall = TP / P.    (3)

Two important parameters in the application of a detector are the scaling factor (SF) and the minimum number of overlapping detections (ND).

Scaling factor: For each position of the image, several samples are obtained by varying the window size from a minimum size to the maximum size possible at that position. Since all samples are resized to the size of the detector, this minimum size should be the SIZE of the detector (i.e., the size of the samples used for training it). The scaling factor (SF) determines how the window size varies during sampling. For instance, SF=1.10 means that the size of the processing window is increased by 10% at each sampling round at a same position. The larger the scaling factor, the fewer samples are considered at each position. Note that the classifier can be considered scale invariant due to this resizing of all samples to a fixed size.

Minimum overlapping detections: Often, more than one sample among those corresponding to small shifts of the window around the target object results in a positive detection. In such cases, it is convenient to establish some criterion to consolidate multiple redundant detections into a single detection representing all of them. Naturally, the larger the number of detections that overlap in a given neighborhood, the stronger the evidence that there is a target object at that position. Therefore, a minimum number of overlaps (ND) can be required: a set of overlapping detections is considered a single detection if and only if the number of detections in the set is at least ND. In the implementation used in this work, two detections are considered overlapping if their x and y coordinates differ by no more than 20% of the width of the smaller detected sample and their widths differ by no more than 20%.

4.3 Estimation of training parameter values

The Viola-Jones framework is widely known and very popular for face detection, and the literature concerning the training of the cascade classifier often recommends training practices, parameters and sets suitable for that application domain [6]. Since FIPs are simpler and

quite different from face patterns, the variation of parameter values for the cascade training of FIP detectors is a subject of study in this work. Due to the large number of parameters in the training process, an exhaustive assessment of all possible combinations of parameter values is not feasible. The approach for determining the parameter values used in this work for the cascade training of a FIP detector is similar to the one used in [17] for face detection. The methodology consists in first establishing a sequential order, as well as initial values, for the parameters to be evaluated. Then, following the established order, the effect of each individual parameter variation is assessed while the values of all other parameters are kept fixed. After the assessment of a parameter, its initial value is replaced by the best value found, and the assessment proceeds to the next parameter. Although this analysis cannot be considered exhaustive, it provides a very good indication of the influence of parameter variation on final detection performance, while avoiding the combinatorial explosion that would arise from experimenting with all possibilities.

Parameter values for training FIP detectors were chosen following the strategy described above. A base set consisting of 380 FIP samples cropped from a variety of different images, and a set of 1500 background images (without QR codes) for the negative samples, were divided into training and test sets. All positive samples for training were generated from 285 FIPs of the base set, applying transformations with the opencv-createsamples utility, with intensity variation limited to ±80 and rotation limited in x and y to ±23 and in z (perspective) to ±34. The test samples were generated from the remaining 95 FIPs, applying the same variation in intensity and in x and y, with the variation in z limited to ±23. The background images were split into two subsets of equal size (750 images each), used respectively for training and testing.

Initial values of the parameters were set to HR=0.998, ST=15, SYM=N (Asymmetric), MTS=Cascade, NS=2, FA=0.5, SAMPLES=4000 (positives and negatives, 4000 each), and SIZE=20 × 20. The tree diagram in Fig. 15 shows, from top to bottom, the order in which the parameters were evaluated, and the results obtained for each parameter value. The recall and FP shown for each training are relative to the test set, computed as described in Section 4.2, using detector application parameters SF=1.10 and ND=1. Dark nodes indicate the values chosen for the respective parameters.

Beyond considering just recall and FP, the choice of the best parameter values took into account the simplicity of the resulting classifier and the balance of its cascade (i.e., the uniformity of feature quantity among consecutive stages). Moreover, recall values were prioritized over FP because, while the FIP aggregation algorithm cannot recover undetected FIPs, it can disregard false FIP detections. Figure 16 shows the performance of the classifiers trained during the parameter determination in terms of recall and FP, and further illustrates the impact of parameter variations on classifier performance.

Fig. 16 Recall and FP for the classifiers trained during parameter evaluation on a test set with 1000 FIPs.

The arrows highlight some of the trained classifiers, and comments regarding them are presented below:

1. c: Classifier with the best relation between recall and FP.
2. s: Best recall achieved by a classifier, but only slightly better than the recall achieved by c, and with considerably higher FP. Its weak classifiers were allowed to have at most 8 splits during training, making it a more complex cascade.

3. a: This classifier was trained with the extended feature set (Fig. 8) in addition to the basic feature set (Fig. 7). Although these features allow a richer representation of the patterns and minimize false negatives during training, on the test set the recall was quite low. The excessive freedom of the classifier

space may be an explanation for its poor performance.

Fig. 15 Parameter choice order: at each level, the chosen value for the respective parameter is in the black node (see text).

4. l: Samples of size 6 × 6 were used to train this classifier, and its very poor performance indicates that such a small pixel area does not contain enough information for appropriate FIP detection.

5. i: This classifier was trained with the same parameters as classifier c, except for the sample size, which in this case was 16 × 16. Its recall is similar to that of classifier c, while its sample size, compared to the sample size of classifier c (which is 20 × 20), is about 36% smaller.

The chosen training parameter values correspond to those of classifier i. In spite of its comparatively large number of FP, classifier i was chosen because it can detect smaller FIPs than other classifiers with similar recall, while its FPs can be treated in the second stage of the detection process. Thus, the chosen training parameter values are:

Feature Set: Basic
Symmetry: Symmetric
Classifier Topology: Cascade
Weak Classifier Number of Splits: 1
Stage Maximum False Alarm Rate: 0.50
Number of Training Samples: 4000
Size of Training Samples: 16 × 16

5 Experimental Results

In this section, experimental results regarding FIP and QR code detection in still images, and the frame rates achieved for QR code detection in video, are reported. Some images that illustrate detection results are presented; additional result images are available online.

5.1 FIP detection

After choosing the training parameter values as described in Section 4.3, a FIP detector was trained with the chosen values, using a set of 462 positive samples

(cropped FIP regions) and 820 background images with no QR codes, from which 4000 negative exemplars were randomly sampled for the training of each stage.

The trained FIP detector was evaluated on a validation set A consisting of 74 images containing 222 positive samples (exactly one complete QR code in each image), totally distinct from those in the training set. During this evaluation, the influence on detection performance of two detector application parameters (see Section 4.2), the scaling factor (SF) and the minimum number of overlapping detections (ND), was analyzed as described below.

Scaling factor: Figure 17 shows that the scaling factor (SF) directly impacts the quality of the results. A wide range of result quality was observed, from SF=1.05 (recall: 0.98 / FP: 628) to SF=1.30 (recall: 0.86 / FP: 93). As the scaling factor is increased, both recall and FP decrease. Considering that high recall and small FP are desirable, a scaling factor of 1.10, as highlighted in the chart (recall: 0.99 / FP: 290), was adopted for the subsequent experiments.

Fig. 17 Recall and FP of the trained FIP detector varying the scaling factor used during FIP detection.

Overlapping detections: The minimum number of overlapping detections (ND) required for a set of overlapping detections to be considered a single detection also affects detection performance. Figure 18 shows a ROC curve comparing recall and FP while ND varies from 1 to 200. At first glance, a good choice for ND would be 53, as highlighted in point A of Fig. 18. However, it should be noted that these recall and FP results correspond to FIP detection. As these results will be passed on to a second stage, where the FIP aggregation algorithm is responsible for distinguishing between true and false detections of the first stage, it is much more important to have a high recall than a low FP in the first stage. Hence, ND=30, as highlighted in point B of Fig. 18, was chosen.

Fig. 18 Recall × FP for the trained FIP detector varying ND from 1 to 200.

With these parameters, i.e., SF=1.10 and ND=30, the trained FIP detector achieved recall=0.986 (219 FIPs detected out of 222) and FP=284 (in the 74 test images) on validation set A. This impressive recall is partly due to the fact that the parameters SF and ND were adjusted on this validation set.

5.2 QR code detection

Another validation set, called validation set B, consisting of 45 images containing exactly one complete QR code each, was used exclusively to tune the parameters for complete QR code detection. The trained FIP detector was applied using SF=1.10 and ND=30 (as sketched below), and variations in two parameters of the second stage of the proposed approach (FIP aggregation) were evaluated, namely the tolerances ε and d for the size and orientation criteria, respectively. In addition, the impact on QR code detection of the minimum number of overlapping detections (ND) used for FIP detection was analyzed, in order to define a complete set of parameter values to be used in the proposed approach.
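For illustration, these application parameters map naturally onto the modern OpenCV Python API (the paper itself used the OpenCV 2.0 utilities; the cascade file name below is hypothetical, and minNeighbors plays a role analogous to ND):

import cv2

cascade = cv2.CascadeClassifier("fip_cascade.xml")  # hypothetical trained FIP cascade
gray = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2GRAY)

boxes = cascade.detectMultiScale(
    gray,
    scaleFactor=1.10,   # SF: the window grows 10% per sampling round
    minNeighbors=30,    # analogous to ND: minimum overlapping detections
    minSize=(16, 16),   # the training sample size of the chosen detector
)
# Each detection (x, y, w, h) becomes a FIP candidate triple (T, xc, yc)
# for the second-stage aggregation.
fips = [(w, x + w / 2.0, y + h / 2.0) for (x, y, w, h) in boxes]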

Note that in this section, recall and FP refer to the detection of QR codes rather than to the detection of FIPs.

5.2.1 Geometrical restriction variations

Figure 19 shows the recall and FP, relative to QR code detection, achieved for different values of the orientation tolerance parameter d on validation set B. A too small tolerance (d = 0.05) results in small recall. As the tolerance increases, so does the recall, but also the FP. The highlighted value, d = 0.25, is the chosen orientation criterion tolerance; in degrees, it corresponds to approximately 14°. See Section 3 for details.

Fig. 19 Recall and FP of the complete QR code detector while varying the orientation criterion tolerance d, with the size criterion tolerance fixed at ε = 0.40.

Figure 20 shows the recall and FP for complete QR codes on validation set B, with the orientation tolerance fixed at d = 0.25. The highlighted value, ε = 0.40, is the chosen size criterion tolerance (see Section 3 for details). As in the case of the orientation criterion, a large tolerance leads to high recall but also to larger FP.

Fig. 20 Recall and FP of the complete QR code detector while varying the size criterion tolerance ε, with the orientation criterion tolerance fixed at d = 0.25.

5.2.2 Overlapping detections

Figure 21 shows the recall and FP, with respect to QR code detection, for different values of the minimum number of overlapping FIP detections, fixing ε = 0.4 and d = 0.25. The highlighted value, ND=30, was the one chosen, and it concludes the parameter determination experiments.

Fig. 21 Recall and FP of the complete QR code detector while varying the minimum number of overlapping detections (ND).

5.2.3 Performance of the QR code detector

After all FIP detection and aggregation parameters were tuned using validation sets A and B as described above, a completely independent test set containing 135 images with exactly one complete QR code each (405 FIPs) was used to assess the performance of the proposed approach. Using the classifier trained as described in Section 5.1 and the detection parameter values determined in Section 5.2 (SF=1.1, d = 0.25, ε = 0.40, ND=30), the proposed approach was able to achieve a recall of 90.4%

(122 detected out of 135), with 75 false positives, on the test set of 135 images. It is interesting to note that, among the 13 undetected QR codes, 9 were missed because one or more of their FIPs were not detected in the first stage. Only four codes in which all three FIPs were detected went themselves undetected. Figures 22 to 26 show examples of correct detections.

Fig. 22 Example of correct detection of a QR code, with one false positive FIP.

Fig. 23 Example of correct detection of a QR code, with several false positive FIPs.

Fig. 24 Example of correct detection of a QR code, with several false positive FIPs.

Fig. 25 Example of correct detection of a QR code.

Figure 27 shows an example of a false positive in QR code detection. The three false positive FIPs satisfy the geometrical restrictions, thus being detected as a QR code. Note that in this image the actual QR code has also been correctly detected.

Figure 28 shows an example in which the QR code was not detected as a consequence of an undetected FIP. Although this kind of obvious false negative was not common in the experimental results, it illustrates that neither the use of the Viola-Jones method for FIP detection nor the understanding of the resulting cascaded classifier is a straightforward task. This result suggests that, even after careful selection of training samples and parameters, there is still room for improvement in FIP detection. A FIP detector trained with a richer set of samples, or with different training and detection parameter values, might be capable of detecting this missing FIP.

Figure 29 shows an example in which the QR code was not detected in the second stage, although all three FIPs were detected in the first stage. A geometrical restriction that takes perspective distortions into account might be able to detect this QR code.

Fig. 26 Example of correct detection of a QR code, with two false positive FIPs.

Fig. 27 Example of a false positive in QR code detection.

Fig. 28 Example of an undetected QR code due to a FIP undetected in the first stage.

Fig. 29 Example of an undetected QR code due to a failure in stage two.

5.3 Video data processing

As the most interesting applications of the proposed approach relate to video processing, it is crucial to achieve a reasonable frame processing speed. It was observed that FIP detection time is by far the largest component of the total processing time, and that it is greatly influenced by the scaling factor and by the minimum size of FIP to be detected. The results in this section reveal the influence of these two parameters on computational performance. They suggest that these parameters can be used to adjust the trade-off between result quality and the frame rate of the processed video, a desirable characteristic when a great variety of devices has to be considered for deployment.

The following experiments were run in a virtual machine on a dual-core computer, and the reported results correspond to mean values over 1000 frames. The video is a stream of 640 × 480 pixels acquired by a simple webcam, in which two QR code symbols are moved in front of the camera.

5.3.1 Minimum detection size in video processing

The smallest FIP size that can be detected in the first stage is bounded below by the size of the samples used during the training of the detector. If all FIPs to be detected are known to be at least of a given minimum size (larger than the size of the detector), there is no need to consider all samples of size equal to or larger than the size of the detector, but only samples of size equal to or larger than this minimum detection size (the sketch below shows where this parameter enters a per-frame detection loop). Figure 30 shows the processing time variation when this minimum detection size is varied, with the scaling factor held fixed.
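A sketch of such a per-frame loop, for illustration only; the cascade file name and the particular parameter values are arbitrary examples, not the paper's settings.

import cv2

cascade = cv2.CascadeClassifier("fip_cascade.xml")  # hypothetical trained cascade
cap = cv2.VideoCapture(0)  # e.g., a 640x480 webcam stream, as in the experiments

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Raising minSize above the detector's training size skips the smallest
    # windows and cuts stage-1 time, at the cost of missing small FIPs;
    # raising scaleFactor has a similar speed/quality effect.
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=30,
                                     minSize=(32, 32))
    # ... pass the FIP candidates to the stage-2 aggregation here ...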

Overall, the mean processing time per frame varies from 22 ms to 103 ms. Note that the processing time of the second stage (FIP aggregation), shown on the right-hand axis, contributes very little to the total frame processing time: it corresponds to only about 0.8% of the processing time of the first stage (FIP detection).

Fig. 30 Processing times of stages 1 and 2 (results in ms, video stream of 640 × 480 pixels, average over 1000 frames).

5.3.2 Scaling factor in video processing

Another critical parameter for the computational performance of the proposed approach is the scaling factor used during detection. The smaller the scaling factor, the larger the number of samples to be processed. Since the detection of small QR code symbols is desirable, to assess the effects of the scaling factor on processing time, the minimum detection size was fixed at the lowest value possible (i.e., the size of the training samples of the FIP detector), and the scaling factor was varied starting from 1.05. Figure 31 shows that the processing time ranges from 45 ms to 255 ms per frame, reaching frame rates between 3.9 fps and 22.0 fps.

Fig. 31 Processing times of stages 1 and 2 (results in ms, video stream of 640 × 480 pixels, average over 1000 frames).

6 Conclusions

Detecting QR codes in arbitrarily acquired images is a necessary step for important applications that cannot assume sighted users. To the best of our knowledge, this problem had not been addressed before without relying on auxiliary cues. To solve this challenging problem, which requires real-time processing and robustness with respect to scale, illumination, rotation and perspective distortions, a two-stage component-based detection approach has been proposed in this work. In the first stage, a novel use of the Viola-Jones rapid object detection framework for FIP detection was proposed and evaluated. For the second stage, a simple algorithm based on geometrical restrictions among the detected FIPs was proposed to decide whether detected FIPs correspond to corners of QR codes or not.

The proposed approach was validated and tested on still images, resulting in QR code detection rates superior to 90% on test images. Part of the undetected codes are a consequence of undetected FIPs (by construction, the aggregation algorithm cannot detect a QR code if one of its FIPs is not detected). Another part is due to geometrical distortions of the QR code beyond those allowed by the FIP aggregation algorithm. Tests on video stream processing show that processing times compatible with acceptable frame rates are possible. Since these results were achieved with a FIP detector trained on a relatively small number of samples, there is an expectation that better FIP detection rates can be achieved.

The current results show robustness with respect to scale and illumination variation. For a full assessment of the technique, invariance to rotation and perspective distortion should be further investigated. For FIP detector training, a more representative set of samples, as well as artificially generated deformed samples, can be used for this purpose. For the aggregation algorithm, more complex criteria, involving perspective distortions, and a smarter processing capable of detecting QR codes in situations where one of the FIPs is not detected, are challenges to be tackled.


Janitor Bot - Detecting Light Switches Jiaqi Guo, Haizi Yu December 10, 2010 1. Introduction Janitor Bot - Detecting Light Switches Jiaqi Guo, Haizi Yu December 10, 2010 The demand for janitorial robots has gone up with the rising affluence and increasingly busy lifestyles of people

More information

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Detection Recognition Sally History Early face recognition systems: based on features and distances

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

A Simple Automated Void Defect Detection for Poor Contrast X-ray Images of BGA

A Simple Automated Void Defect Detection for Poor Contrast X-ray Images of BGA Proceedings of the 3rd International Conference on Industrial Application Engineering 2015 A Simple Automated Void Defect Detection for Poor Contrast X-ray Images of BGA Somchai Nuanprasert a,*, Sueki

More information

Segmentation and Tracking of Partial Planar Templates

Segmentation and Tracking of Partial Planar Templates Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract

More information

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- # Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series by Mario F. Triola Chapter 2 Summarizing and Graphing Data 2-1 Review and Preview 2-2 Frequency Distributions 2-3 Histograms

More information

The Articial Landmark Design for Mobile Robots Localization and Mapping

The Articial Landmark Design for Mobile Robots Localization and Mapping The Articial Landmark Design for Mobile Robots Localization and Mapping Dmitriy Kartashov, Artur Huletski, Kirill Krinkin The Academic University 2015 Articial landmarks 2015 1 / 14 Introduction SLAM simultaneous

More information

Development of an Automated Fingerprint Verification System

Development of an Automated Fingerprint Verification System Development of an Automated Development of an Automated Fingerprint Verification System Fingerprint Verification System Martin Saveski 18 May 2010 Introduction Biometrics the use of distinctive anatomical

More information

Analysis of Image and Video Using Color, Texture and Shape Features for Object Identification

Analysis of Image and Video Using Color, Texture and Shape Features for Object Identification IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VI (Nov Dec. 2014), PP 29-33 Analysis of Image and Video Using Color, Texture and Shape Features

More information

CS443: Digital Imaging and Multimedia Binary Image Analysis. Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University

CS443: Digital Imaging and Multimedia Binary Image Analysis. Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University CS443: Digital Imaging and Multimedia Binary Image Analysis Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University Outlines A Simple Machine Vision System Image segmentation by thresholding

More information

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT SIFT: Scale Invariant Feature Transform; transform image

More information

Learning to Detect Faces. A Large-Scale Application of Machine Learning

Learning to Detect Faces. A Large-Scale Application of Machine Learning Learning to Detect Faces A Large-Scale Application of Machine Learning (This material is not in the text: for further information see the paper by P. Viola and M. Jones, International Journal of Computer

More information

CS 664 Segmentation. Daniel Huttenlocher

CS 664 Segmentation. Daniel Huttenlocher CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical

More information

Face detection, validation and tracking. Océane Esposito, Grazina Laurinaviciute, Alexandre Majetniak

Face detection, validation and tracking. Océane Esposito, Grazina Laurinaviciute, Alexandre Majetniak Face detection, validation and tracking Océane Esposito, Grazina Laurinaviciute, Alexandre Majetniak Agenda Motivation and examples Face detection Face validation Face tracking Conclusion Motivation Goal:

More information

CS4670: Computer Vision

CS4670: Computer Vision CS4670: Computer Vision Noah Snavely Lecture 6: Feature matching and alignment Szeliski: Chapter 6.1 Reading Last time: Corners and blobs Scale-space blob detector: Example Feature descriptors We know

More information

Recognition of a Predefined Landmark Using Optical Flow Sensor/Camera

Recognition of a Predefined Landmark Using Optical Flow Sensor/Camera Recognition of a Predefined Landmark Using Optical Flow Sensor/Camera Galiev Ilfat, Alina Garaeva, Nikita Aslanyan The Department of Computer Science & Automation, TU Ilmenau 98693 Ilmenau ilfat.galiev@tu-ilmenau.de;

More information

Motion Detection Algorithm

Motion Detection Algorithm Volume 1, No. 12, February 2013 ISSN 2278-1080 The International Journal of Computer Science & Applications (TIJCSA) RESEARCH PAPER Available Online at http://www.journalofcomputerscience.com/ Motion Detection

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Lecture # 4 Digital Image Fundamentals - II ALI JAVED Lecturer SOFTWARE ENGINEERING DEPARTMENT U.E.T TAXILA Email:: ali.javed@uettaxila.edu.pk Office Room #:: 7 Presentation Outline

More information

Segmentation algorithm for monochrome images generally are based on one of two basic properties of gray level values: discontinuity and similarity.

Segmentation algorithm for monochrome images generally are based on one of two basic properties of gray level values: discontinuity and similarity. Chapter - 3 : IMAGE SEGMENTATION Segmentation subdivides an image into its constituent s parts or objects. The level to which this subdivision is carried depends on the problem being solved. That means

More information

MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ)

MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ) 5 MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ) Contents 5.1 Introduction.128 5.2 Vector Quantization in MRT Domain Using Isometric Transformations and Scaling.130 5.2.1

More information

Edge and corner detection

Edge and corner detection Edge and corner detection Prof. Stricker Doz. G. Bleser Computer Vision: Object and People Tracking Goals Where is the information in an image? How is an object characterized? How can I find measurements

More information

Designing Applications that See Lecture 7: Object Recognition

Designing Applications that See Lecture 7: Object Recognition stanford hci group / cs377s Designing Applications that See Lecture 7: Object Recognition Dan Maynes-Aminzade 29 January 2008 Designing Applications that See http://cs377s.stanford.edu Reminders Pick up

More information

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions Data In single-program multiple-data (SPMD) parallel programs, global data is partitioned, with a portion of the data assigned to each processing node. Issues relevant to choosing a partitioning strategy

More information

Window based detectors

Window based detectors Window based detectors CS 554 Computer Vision Pinar Duygulu Bilkent University (Source: James Hays, Brown) Today Window-based generic object detection basic pipeline boosting classifiers face detection

More information

Face Detection using Hierarchical SVM

Face Detection using Hierarchical SVM Face Detection using Hierarchical SVM ECE 795 Pattern Recognition Christos Kyrkou Fall Semester 2010 1. Introduction Face detection in video is the process of detecting and classifying small images extracted

More information

Face Tracking in Video

Face Tracking in Video Face Tracking in Video Hamidreza Khazaei and Pegah Tootoonchi Afshar Stanford University 350 Serra Mall Stanford, CA 94305, USA I. INTRODUCTION Object tracking is a hot area of research, and has many practical

More information

Image Matching Using Run-Length Feature

Image Matching Using Run-Length Feature Image Matching Using Run-Length Feature Yung-Kuan Chan and Chin-Chen Chang Department of Computer Science and Information Engineering National Chung Cheng University, Chiayi, Taiwan, 621, R.O.C. E-mail:{chan,

More information

CS 223B Computer Vision Problem Set 3

CS 223B Computer Vision Problem Set 3 CS 223B Computer Vision Problem Set 3 Due: Feb. 22 nd, 2011 1 Probabilistic Recursion for Tracking In this problem you will derive a method for tracking a point of interest through a sequence of images.

More information

An Image Based Approach to Compute Object Distance

An Image Based Approach to Compute Object Distance An Image Based Approach to Compute Object Distance Ashfaqur Rahman * Department of Computer Science, American International University Bangladesh Dhaka 1213, Bangladesh Abdus Salam, Mahfuzul Islam, and

More information

Skin and Face Detection

Skin and Face Detection Skin and Face Detection Linda Shapiro EE/CSE 576 1 What s Coming 1. Review of Bakic flesh detector 2. Fleck and Forsyth flesh detector 3. Details of Rowley face detector 4. Review of the basic AdaBoost

More information

Problem definition Image acquisition Image segmentation Connected component analysis. Machine vision systems - 1

Problem definition Image acquisition Image segmentation Connected component analysis. Machine vision systems - 1 Machine vision systems Problem definition Image acquisition Image segmentation Connected component analysis Machine vision systems - 1 Problem definition Design a vision system to see a flat world Page

More information

Visual object classification by sparse convolutional neural networks

Visual object classification by sparse convolutional neural networks Visual object classification by sparse convolutional neural networks Alexander Gepperth 1 1- Ruhr-Universität Bochum - Institute for Neural Dynamics Universitätsstraße 150, 44801 Bochum - Germany Abstract.

More information

coding of various parts showing different features, the possibility of rotation or of hiding covering parts of the object's surface to gain an insight

coding of various parts showing different features, the possibility of rotation or of hiding covering parts of the object's surface to gain an insight Three-Dimensional Object Reconstruction from Layered Spatial Data Michael Dangl and Robert Sablatnig Vienna University of Technology, Institute of Computer Aided Automation, Pattern Recognition and Image

More information

Image Processing Pipeline for Facial Expression Recognition under Variable Lighting

Image Processing Pipeline for Facial Expression Recognition under Variable Lighting Image Processing Pipeline for Facial Expression Recognition under Variable Lighting Ralph Ma, Amr Mohamed ralphma@stanford.edu, amr1@stanford.edu Abstract Much research has been done in the field of automated

More information

Automatic Initialization of the TLD Object Tracker: Milestone Update

Automatic Initialization of the TLD Object Tracker: Milestone Update Automatic Initialization of the TLD Object Tracker: Milestone Update Louis Buck May 08, 2012 1 Background TLD is a long-term, real-time tracker designed to be robust to partial and complete occlusions

More information

Subject-Oriented Image Classification based on Face Detection and Recognition

Subject-Oriented Image Classification based on Face Detection and Recognition 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Eye Detection by Haar wavelets and cascaded Support Vector Machine

Eye Detection by Haar wavelets and cascaded Support Vector Machine Eye Detection by Haar wavelets and cascaded Support Vector Machine Vishal Agrawal B.Tech 4th Year Guide: Simant Dubey / Amitabha Mukherjee Dept of Computer Science and Engineering IIT Kanpur - 208 016

More information

CHAPTER 4 FACE RECOGNITION DESIGN AND ANALYSIS

CHAPTER 4 FACE RECOGNITION DESIGN AND ANALYSIS CHAPTER 4 FACE RECOGNITION DESIGN AND ANALYSIS As explained previously in the scope, this thesis will also create a prototype about face recognition system. The face recognition system itself has several

More information

Robust PDF Table Locator

Robust PDF Table Locator Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records

More information

Object Category Detection: Sliding Windows

Object Category Detection: Sliding Windows 04/10/12 Object Category Detection: Sliding Windows Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Today s class: Object Category Detection Overview of object category detection Statistical

More information

Symbol Detection Using Region Adjacency Graphs and Integer Linear Programming

Symbol Detection Using Region Adjacency Graphs and Integer Linear Programming 2009 10th International Conference on Document Analysis and Recognition Symbol Detection Using Region Adjacency Graphs and Integer Linear Programming Pierre Le Bodic LRI UMR 8623 Using Université Paris-Sud

More information

CS 231A Computer Vision (Fall 2012) Problem Set 3

CS 231A Computer Vision (Fall 2012) Problem Set 3 CS 231A Computer Vision (Fall 2012) Problem Set 3 Due: Nov. 13 th, 2012 (2:15pm) 1 Probabilistic Recursion for Tracking (20 points) In this problem you will derive a method for tracking a point of interest

More information

Local Features: Detection, Description & Matching

Local Features: Detection, Description & Matching Local Features: Detection, Description & Matching Lecture 08 Computer Vision Material Citations Dr George Stockman Professor Emeritus, Michigan State University Dr David Lowe Professor, University of British

More information

Using surface markings to enhance accuracy and stability of object perception in graphic displays

Using surface markings to enhance accuracy and stability of object perception in graphic displays Using surface markings to enhance accuracy and stability of object perception in graphic displays Roger A. Browse a,b, James C. Rodger a, and Robert A. Adderley a a Department of Computing and Information

More information

2. On classification and related tasks

2. On classification and related tasks 2. On classification and related tasks In this part of the course we take a concise bird s-eye view of different central tasks and concepts involved in machine learning and classification particularly.

More information

Determination of an Unmanned Mobile Object Orientation by Natural Landmarks

Determination of an Unmanned Mobile Object Orientation by Natural Landmarks Determination of an Unmanned Mobile Object Orientation by Natural Landmarks Anton M. Korsakov, Ivan S. Fomin, Dmitry A. Gromoshinsky, Alexandr V. Bakhshiev, Dmitrii N. Stepanov, EkaterinaY. Smirnova 1

More information

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford Image matching Harder case by Diva Sian by Diva Sian by scgbt by swashford Even harder case Harder still? How the Afghan Girl was Identified by Her Iris Patterns Read the story NASA Mars Rover images Answer

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 WRI C225 Lecture 04 130131 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Histogram Equalization Image Filtering Linear

More information

Active learning for visual object recognition

Active learning for visual object recognition Active learning for visual object recognition Written by Yotam Abramson and Yoav Freund Presented by Ben Laxton Outline Motivation and procedure How this works: adaboost and feature details Why this works:

More information

6. Parallel Volume Rendering Algorithms

6. Parallel Volume Rendering Algorithms 6. Parallel Volume Algorithms This chapter introduces a taxonomy of parallel volume rendering algorithms. In the thesis statement we claim that parallel algorithms may be described by "... how the tasks

More information

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation M. Blauth, E. Kraft, F. Hirschenberger, M. Böhm Fraunhofer Institute for Industrial Mathematics, Fraunhofer-Platz 1,

More information

Simultaneous surface texture classification and illumination tilt angle prediction

Simultaneous surface texture classification and illumination tilt angle prediction Simultaneous surface texture classification and illumination tilt angle prediction X. Lladó, A. Oliver, M. Petrou, J. Freixenet, and J. Martí Computer Vision and Robotics Group - IIiA. University of Girona

More information

Machine Learning for Signal Processing Detecting faces (& other objects) in images

Machine Learning for Signal Processing Detecting faces (& other objects) in images Machine Learning for Signal Processing Detecting faces (& other objects) in images Class 8. 27 Sep 2016 11755/18979 1 Last Lecture: How to describe a face The typical face A typical face that captures

More information

Real-Time Model-Based Hand Localization for Unsupervised Palmar Image Acquisition

Real-Time Model-Based Hand Localization for Unsupervised Palmar Image Acquisition Real-Time Model-Based Hand Localization for Unsupervised Palmar Image Acquisition Ivan Fratric 1, Slobodan Ribaric 1 1 University of Zagreb, Faculty of Electrical Engineering and Computing, Unska 3, 10000

More information

[10] Industrial DataMatrix barcodes recognition with a random tilt and rotating the camera

[10] Industrial DataMatrix barcodes recognition with a random tilt and rotating the camera [10] Industrial DataMatrix barcodes recognition with a random tilt and rotating the camera Image processing, pattern recognition 865 Kruchinin A.Yu. Orenburg State University IntBuSoft Ltd Abstract The

More information

6. Dicretization methods 6.1 The purpose of discretization

6. Dicretization methods 6.1 The purpose of discretization 6. Dicretization methods 6.1 The purpose of discretization Often data are given in the form of continuous values. If their number is huge, model building for such data can be difficult. Moreover, many

More information

CAP 6412 Advanced Computer Vision

CAP 6412 Advanced Computer Vision CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha

More information

RGBD Face Detection with Kinect Sensor. ZhongJie Bi

RGBD Face Detection with Kinect Sensor. ZhongJie Bi RGBD Face Detection with Kinect Sensor ZhongJie Bi Outline The Existing State-of-the-art Face Detector Problems with this Face Detector Proposed solution to the problems Result and ongoing tasks The Existing

More information

Use of Shape Deformation to Seamlessly Stitch Historical Document Images

Use of Shape Deformation to Seamlessly Stitch Historical Document Images Use of Shape Deformation to Seamlessly Stitch Historical Document Images Wei Liu Wei Fan Li Chen Jun Sun Satoshi Naoi In China, efforts are being made to preserve historical documents in the form of digital

More information

CHAPTER 3 IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN

CHAPTER 3 IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN CHAPTER 3 IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN CHAPTER 3: IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN Principal objective: to process an image so that the result is more suitable than the original image

More information

Image retrieval based on region shape similarity

Image retrieval based on region shape similarity Image retrieval based on region shape similarity Cheng Chang Liu Wenyin Hongjiang Zhang Microsoft Research China, 49 Zhichun Road, Beijing 8, China {wyliu, hjzhang}@microsoft.com ABSTRACT This paper presents

More information

Chapter 9 Object Tracking an Overview

Chapter 9 Object Tracking an Overview Chapter 9 Object Tracking an Overview The output of the background subtraction algorithm, described in the previous chapter, is a classification (segmentation) of pixels into foreground pixels (those belonging

More information

Adaptive Skin Color Classifier for Face Outline Models

Adaptive Skin Color Classifier for Face Outline Models Adaptive Skin Color Classifier for Face Outline Models M. Wimmer, B. Radig, M. Beetz Informatik IX, Technische Universität München, Germany Boltzmannstr. 3, 87548 Garching, Germany [wimmerm, radig, beetz]@informatik.tu-muenchen.de

More information

Automatic Detection of Change in Address Blocks for Reply Forms Processing

Automatic Detection of Change in Address Blocks for Reply Forms Processing Automatic Detection of Change in Address Blocks for Reply Forms Processing K R Karthick, S Marshall and A J Gray Abstract In this paper, an automatic method to detect the presence of on-line erasures/scribbles/corrections/over-writing

More information

Object and Class Recognition I:

Object and Class Recognition I: Object and Class Recognition I: Object Recognition Lectures 10 Sources ICCV 2005 short courses Li Fei-Fei (UIUC), Rob Fergus (Oxford-MIT), Antonio Torralba (MIT) http://people.csail.mit.edu/torralba/iccv2005

More information

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford Image matching Harder case by Diva Sian by Diva Sian by scgbt by swashford Even harder case Harder still? How the Afghan Girl was Identified by Her Iris Patterns Read the story NASA Mars Rover images Answer

More information

Digital copying involves a process. Developing a raster detector system with the J array processing language SOFTWARE.

Digital copying involves a process. Developing a raster detector system with the J array processing language SOFTWARE. Developing a raster detector system with the J array processing language by Jan Jacobs All digital copying aims to reproduce an original image as faithfully as possible under certain constraints. In the

More information

Maximizing Face Detection Performance

Maximizing Face Detection Performance Maximizing Face Detection Performance Paulius Micikevicius Developer Technology Engineer, NVIDIA GTC 2015 1 Outline Very brief review of cascaded-classifiers Parallelization choices Reducing the amount

More information

Character Recognition

Character Recognition Character Recognition 5.1 INTRODUCTION Recognition is one of the important steps in image processing. There are different methods such as Histogram method, Hough transformation, Neural computing approaches

More information

Extracting Layers and Recognizing Features for Automatic Map Understanding. Yao-Yi Chiang

Extracting Layers and Recognizing Features for Automatic Map Understanding. Yao-Yi Chiang Extracting Layers and Recognizing Features for Automatic Map Understanding Yao-Yi Chiang 0 Outline Introduction/ Problem Motivation Map Processing Overview Map Decomposition Feature Recognition Discussion

More information

Locating 1-D Bar Codes in DCT-Domain

Locating 1-D Bar Codes in DCT-Domain Edith Cowan University Research Online ECU Publications Pre. 2011 2006 Locating 1-D Bar Codes in DCT-Domain Alexander Tropf Edith Cowan University Douglas Chai Edith Cowan University 10.1109/ICASSP.2006.1660449

More information

Generic Face Alignment Using an Improved Active Shape Model

Generic Face Alignment Using an Improved Active Shape Model Generic Face Alignment Using an Improved Active Shape Model Liting Wang, Xiaoqing Ding, Chi Fang Electronic Engineering Department, Tsinghua University, Beijing, China {wanglt, dxq, fangchi} @ocrserv.ee.tsinghua.edu.cn

More information

A Hybrid Face Detection System using combination of Appearance-based and Feature-based methods

A Hybrid Face Detection System using combination of Appearance-based and Feature-based methods IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.5, May 2009 181 A Hybrid Face Detection System using combination of Appearance-based and Feature-based methods Zahra Sadri

More information

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to

More information

Feature Detectors and Descriptors: Corners, Lines, etc.

Feature Detectors and Descriptors: Corners, Lines, etc. Feature Detectors and Descriptors: Corners, Lines, etc. Edges vs. Corners Edges = maxima in intensity gradient Edges vs. Corners Corners = lots of variation in direction of gradient in a small neighborhood

More information

Subpixel Corner Detection Using Spatial Moment 1)

Subpixel Corner Detection Using Spatial Moment 1) Vol.31, No.5 ACTA AUTOMATICA SINICA September, 25 Subpixel Corner Detection Using Spatial Moment 1) WANG She-Yang SONG Shen-Min QIANG Wen-Yi CHEN Xing-Lin (Department of Control Engineering, Harbin Institute

More information