Trademark Matching and Retrieval in Sport Video Databases


Andrew D. Bagdanov, Lamberto Ballan, Marco Bertini and Alberto Del Bimbo {bagdanov, ballan, bertini,
9th ACM SIGMM International Workshop on Multimedia Information Retrieval, September 28-29, 2007, University of Augsburg, Germany

Automatic trademark detection and identification

- Every year sponsors spend millions of euros on sports marketing, a large portion of which goes to the placement of billboards, banners, and other physical advertising media.
- Given these astronomical numbers, sponsors need to verify the level of visibility of their brands; currently this work is done manually.
- Most work on trademark recognition deals with content-based retrieval (CBR) in logo databases, where the images are of good quality.
- We propose a system for automatically detecting and retrieving trademark appearances in sports videos.

Trademark appearances

Trademarks usually contain both text and other high-contrast features such as graphic logos, and they are usually planar objects. The appearance of trademarks in sport videos is often characterized by:

- Perspective deformation, due to the placement of the camera and the vantage point from which it images advertisements in the field;
- Motion blur, due to camera motion or to motion of the trademark itself, for example when it is placed on Formula One cars or on the jerseys of soccer players;
- Occlusion, caused by players or other obstacles between the camera and the trademark; in many sports, soccer for instance, trademarks are occluded more often than not.

Trademark representation

- Since blur is indistinguishable from a change in scale, a scale-invariant representation is essential.
- To cope with partial occlusions, we use local neighborhood descriptors of salient points; by combining the results of local, point-based matching we are able to match entire trademarks.
- Trademarks are represented as a bag of SIFT feature points; each trademark $T_i$ is represented by one or more graphical instances:

$$T_i = \{(x_k^t, y_k^t, s_k^t, d_k^t, O_k^t)\}, \quad k \in \{1, \ldots, N_i\}$$

  where $x_k^t$, $y_k^t$, $s_k^t$ and $d_k^t$ are the x- and y-position, the scale, and the dominant direction of the $k$-th detected feature point, and $O_k^t$ is the 128-dimensional local orientation histogram of the SIFT point.
- Each frame $V_i$ of a video is represented similarly, as a bag of $M_i$ SIFT feature points detected in frame $i$.

Trademark matching: candidate match

Only the local orientation histograms ($O_k^t$) of the feature points are used in the matching procedure. For every point detected in trademark $T_j$ we compute its two nearest neighbors $N_1$ and $N_2$ among the points detected in video frame $V_i$:

$$N_1(T_j^k, V_i) = \min_q \lVert O_q^v - O_k^t \rVert$$

$$N_2(T_j^k, V_i) = \min_{q \neq q^*} \lVert O_q^v - O_k^t \rVert, \quad q^* = \arg\min_q \lVert O_q^v - O_k^t \rVert$$

For every trademark point $T_j^k$ we then compute its match score $M$ against frame $V_i$:

$$M(T_j^k, V_i) = \frac{N_1(T_j^k, V_i)}{N_2(T_j^k, V_i)}$$

Points are selected as good candidate matches if their match score is less than a threshold $\tau_1$, and are collected in a candidate match-set $M_j^i$ (for trademark $T_j$ and frame $V_i$):

$$M_j^i = \{k \mid M(T_j^k, V_i) < \tau_1\}$$

The threshold $\tau_1$ (0.8 in our experiments) gives robust results: a correct match needs to have its closest matching descriptor significantly closer than the closest incorrect match.
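The candidate-match step can be illustrated with a minimal NumPy sketch. It assumes the descriptors are already available as arrays (one row per feature point) and uses a brute-force distance matrix; the function name `candidate_match_set` and the division guard are illustrative choices, not part of the presented system.

```python
import numpy as np

def candidate_match_set(trademark_desc, frame_desc, tau1=0.8):
    """Return (indices, scores): trademark points whose score N1/N2 is below tau1.

    trademark_desc: array of shape (N, D), one descriptor per trademark point.
    frame_desc:     array of shape (M, D), one descriptor per frame point, M >= 2.
    """
    trademark_desc = np.asarray(trademark_desc, dtype=float)
    frame_desc = np.asarray(frame_desc, dtype=float)
    # Pairwise Euclidean distances: rows are trademark points, columns frame points.
    diffs = trademark_desc[:, None, :] - frame_desc[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # The two smallest distances per trademark point, with N1 <= N2.
    two_smallest = np.partition(dists, 1, axis=1)[:, :2]
    n1, n2 = two_smallest[:, 0], two_smallest[:, 1]
    # Assumption: guard against division by zero for duplicated frame descriptors.
    scores = n1 / np.maximum(n2, 1e-12)
    return np.flatnonzero(scores < tau1), scores

# Toy 2-D "descriptors" (real SIFT descriptors are 128-dimensional).
trademark = [[0.0, 0.0], [10.0, 10.0]]
frame = [[0.0, 0.1], [10.0, 0.0], [0.0, 10.0]]
matched, scores = candidate_match_set(trademark, frame)
print(matched)  # indices of trademark points that pass the ratio test
```

The second toy point is equidistant from two frame points, so its score is 1 and it is rejected; this is the ambiguity the ratio test is meant to filter out.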

Trademark matching: final matches

The final determination of whether a frame $V_i$ contains a trademark $T_j$ is made by thresholding the normalized match-score:

$$\frac{|M_j^i|}{|T_j|} > \tau_2 \;\Rightarrow\; \text{trademark } T_j \text{ present in frame } V_i$$

The threshold $\tau_2$ requires that a certain percentage of the trademark feature points be matched in frame $V_i$. Preliminary experiments have shown that a value of about 0.2 is a reasonable choice.
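As a small illustration of the decision rule, the sketch below divides the size of the candidate match-set by the number of trademark feature points and compares the result with the second threshold. The function name `trademark_present` is hypothetical; the default of 0.2 is the value reported above.

```python
def trademark_present(num_candidate_matches, num_trademark_points, tau2=0.2):
    """Return (present, normalized_score) for one trademark and one frame."""
    if num_trademark_points <= 0:
        raise ValueError("the trademark must have at least one feature point")
    # Normalized match-score: fraction of trademark points matched in the frame.
    normalized = num_candidate_matches / num_trademark_points
    return normalized > tau2, normalized

# A trademark with 100 feature points, 25 of which passed the ratio test.
print(trademark_present(25, 100))
```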

Robust trademark localization

In order to localize the trademark in the original frame $V_i$ and to approximate its area, we compute a robust estimate of the matched feature point cloud. The feature point locations are denoted as:

$$F = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$$

The robust centroid estimate is computed by iteratively solving for $(\mu_x, \mu_y)$ in the equations:

$$\sum_{i=1}^{n} \psi(x_i; \mu_x) = 0, \qquad \sum_{i=1}^{n} \psi(y_i; \mu_y) = 0$$

The influence function $\psi(x; m)$ used is the Tukey biweight:

$$\psi(x; m) = \begin{cases} (x - m)\left(1 - \dfrac{(x - m)^2}{c^2}\right)^2 & \text{if } |x - m| < c \\ 0 & \text{otherwise} \end{cases}$$

The scale parameter $c$ is estimated using the median absolute deviation from the median (MAD):

$$\mathrm{MAD}_x = \operatorname{median}_i \left( \left| x_i - \operatorname{median}_j (x_j) \right| \right)$$
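One common way to solve equations of this form is iteratively reweighted averaging, sketched below for illustration. The slides do not say how the scale parameter is derived from the MAD nor which solver was used, so the multiplier `scale_factor`, the median starting point, and the stopping rule are all assumptions of this sketch.

```python
import numpy as np

def mad(values):
    """Median absolute deviation from the median."""
    med = np.median(values)
    return np.median(np.abs(values - med))

def tukey_location(values, scale_factor=6.0, max_iter=50, tol=1e-6):
    """Solve sum_i psi(v_i; mu) = 0 for mu with the Tukey biweight.

    scale_factor is a hypothetical multiplier: c = scale_factor * MAD.
    """
    values = np.asarray(values, dtype=float)
    mu = np.median(values)          # assumed starting point
    c = scale_factor * mad(values)
    if c == 0:                      # degenerate case: most values coincide
        return mu, c
    for _ in range(max_iter):
        r = values - mu
        # psi(v; mu) = (v - mu) * w, so the root is a weighted mean with weights w.
        w = np.where(np.abs(r) < c, (1 - (r / c) ** 2) ** 2, 0.0)
        if w.sum() == 0:
            break
        new_mu = np.sum(w * values) / w.sum()
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, c

def robust_centroid(points, scale_factor=6.0):
    """Robust centroid (mu_x, mu_y), estimated independently per coordinate."""
    points = np.asarray(points, dtype=float)
    mu_x, _ = tukey_location(points[:, 0], scale_factor)
    mu_y, _ = tukey_location(points[:, 1], scale_factor)
    return mu_x, mu_y

# Five clustered matches plus one spurious match far away.
pts = [[10, 10], [11, 9], [9, 11], [10, 11], [11, 10], [200, 300]]
print(robust_centroid(pts))
```

In this toy input the spurious point lies beyond the cutoff and receives zero weight, so the estimate stays near the cluster, whereas the plain mean would be pulled far toward the outlier.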

Trademark area estimation

After the robust centroid is estimated, the distance of each matched point to the robust centroid is computed; points with a low influence are excluded from the final match-set.

[Figure panels: original frame; frame SIFT points; match-set; robust centroid and final match-set]
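A possible reading of this filtering step is sketched below. It assumes a robust centroid and per-axis scale parameters are already available, uses the Tukey biweight weight as a stand-in for "influence", and approximates the trademark area with an axis-aligned bounding box. The cutoff `min_weight` and the bounding-box choice are assumptions of this sketch, not details given in the slides.

```python
import numpy as np

def tukey_weight(residuals, c):
    """Tukey biweight weight: (1 - (r/c)^2)^2 inside the cutoff c > 0, else 0."""
    residuals = np.asarray(residuals, dtype=float)
    return np.where(np.abs(residuals) < c, (1 - (residuals / c) ** 2) ** 2, 0.0)

def final_match_set(points, centroid, scales, min_weight=0.0):
    """Keep points whose weight exceeds min_weight on both axes.

    Returns (kept_points, bounding_box) with bounding_box = (x0, y0, x1, y1),
    or (empty array, None) if no point survives.
    """
    points = np.asarray(points, dtype=float)
    wx = tukey_weight(points[:, 0] - centroid[0], scales[0])
    wy = tukey_weight(points[:, 1] - centroid[1], scales[1])
    kept = points[(wx > min_weight) & (wy > min_weight)]
    if len(kept) == 0:
        return kept, None
    x0, y0 = kept.min(axis=0)
    x1, y1 = kept.max(axis=0)
    return kept, (x0, y0, x1, y1)

pts = [[10, 10], [11, 9], [9, 11], [200, 300]]
kept, box = final_match_set(pts, centroid=(10.0, 10.0), scales=(3.0, 3.0))
print(len(kept), box)
```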

Some examples

Points indicated in cyan are those selected as good matches according to robust trademark localization.

[Figure: two examples, each showing a trademark with its SIFT keypoints]

System architecture

We have implemented the approach described above. The system is organized as a pipeline of four stages:

Video acquisition and feature point detection -> (SIFT features) -> Matching of trademarks -> (matching results) -> Retrieval of trademarks -> (candidate frames) -> Visualization and validation

1) Videos (MPEG2) are preprocessed by de-interlacing each video frame; SIFT features are detected and stored in a MySQL DB. At present 5 fps are processed.
2) Each trademark in the DB is matched against the feature points detected in the previous step.
3) Trademarks are retrieved by supplying the thresholds $\tau_1$ and $\tau_2$ on the match-score and normalized match-score respectively. A list of candidate frames, grouped in candidate intervals, is finally returned.
4) Match results are displayed in a visual application that also allows a user to inspect and correct the automatic results. Final results are stored in MPEG7 format.
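The grouping of candidate frames into candidate intervals in step 3 could look like the following sketch. The slides do not give the grouping rule, so merging frames whose indices are at most `max_gap` apart is an assumption, and `group_into_intervals` is a hypothetical helper name.

```python
def group_into_intervals(frame_indices, max_gap=1):
    """Merge candidate frame indices into (first, last) intervals.

    Two frames fall in the same interval when their indices differ by at
    most max_gap (a hypothetical parameter, e.g. the sampling step).
    """
    intervals = []
    for idx in sorted(set(frame_indices)):
        if intervals and idx - intervals[-1][1] <= max_gap:
            intervals[-1][1] = idx          # extend the current interval
        else:
            intervals.append([idx, idx])    # start a new interval
    return [tuple(interval) for interval in intervals]

# Frames sampled every 5th frame (5 fps from a 25 fps video).
print(group_into_intervals([0, 5, 10, 40, 45, 100], max_gap=5))
# -> [(0, 10), (40, 45), (100, 100)]
```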

Experimental results

Dataset:
- Three videos of different sports (MotoGP, volleyball and soccer) are used for a preliminary evaluation of the proposed techniques;
- Each video is in MPEG2 format and is approximately one hour long; one hour of video at 25 fps contains around 90,000 frames (3600 s × 25 fps);
- Each frame contains around 1000 SIFT feature points on average, and each trademark around 100.

Experiment design:
- Videos were manually annotated for the presence of trademarks. These annotations were performed at the frame level, and each trademark appearance is associated with an interval in the ground truth;
- The performance of the technique is evaluated in terms of precision and recall.
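For reference, frame-level precision and recall can be computed as in the sketch below. It assumes the retrieved frames and the ground-truth frames for one trademark are given as collections of frame indices; the exact evaluation protocol used in the experiments is not described beyond the frame-level annotation.

```python
def precision_recall(retrieved_frames, relevant_frames):
    """Frame-level precision and recall for one trademark.

    retrieved_frames: frames the system reports as containing the trademark.
    relevant_frames:  frames annotated in the ground truth.
    """
    retrieved, relevant = set(retrieved_frames), set(relevant_frames)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# 4 frames retrieved, 5 frames annotated, 3 in common.
print(precision_recall([1, 2, 3, 4], [2, 3, 4, 5, 6]))
```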

Precision-Recall as a function of normalized matches

Precision and recall are shown as a function of the normalized match threshold ($\tau_2$ in the previous slides) and of the frame sampling rate. A recall (red curve) of about 85% can be obtained at a precision (green curve) of around 80% with values of $\tau_2$ varying between

Precision-Recall as a function of frame-sampling rate

Increasing the frame sampling rate predictably impacts the recall of the results. It is interesting that the precision of the retrieved results is not adversely affected; this matching technique has a very low false-positive rate.

[Figure: precision curves and recall curves for frame sampling rates of 1 fps, 5 fps and 10 fps]

Precision-Recall as a function of trademark models

In the case of precision, we can see that, for low values of the normalized match threshold, the synthetic images perform worse than those selected from the video itself. This is because many trademarks consisting mostly of text and graphics are confused with the synthetic trademark models.

Future work

Future work will deal with:

- the use of color features for the detection of trademarks in wide-angle scenes, e.g. soccer and rugby;
- the development of metrics that evaluate the visual quality of the trademarks detected in the videos;
- evaluating the fact that there is a human in the loop: testing the effect on users' expectations when better recall is obtained at the expense of lower precision.
