The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System


1 The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System
Our first participation in the TRECVID workshop
A. F. de Araujo (1), F. Silveira (2), H. Lakshman (3), J. Zepeda (2), A. Sheth (2), P. Pérez (2), B. Girod (1)
(1) Stanford University, (2) Technicolor, (3) Fraunhofer HHI
Guest lecture in Berkeley CS294, Nov. 7th, 2012

2 Outline
- What is TRECVID?
- TRECVID Semantic Indexing task
- Overview of our system and how it differs from others
- Experimental results
- Conclusion

3 What is TRECVID?
- Part of the TREC (Text REtrieval Conference) series, sponsored by NIST (National Institute of Standards and Technology)
- Started as a video track of TREC in 2001/2002 and became an independent evaluation in 2003
- Goal of the conference series: encourage research in information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results

4 What is TRECVID?
- De facto venue for state-of-the-art video analysis research
- TRECVID concludes with a workshop in Nov/Dec; attendance is restricted to teams that submitted a run
- Six tracks/tasks:
  - Semantic Indexing (SIN) (focus of this presentation)
  - Known-Item Search (KIS)
  - Surveillance Event Detection (SED)
  - Instance Search (INS)
  - Multimedia Event Detection (MED)
  - Multimedia Event Recounting (MER)

5 What is TRECVID? Known-Item Search (KIS)
- Use case: you've seen a specific video and want to find it again, but don't know how to go directly to it; you remember some things about it
- System task: given a test collection of short videos and a topic (words and/or phrases describing the target video and the people, places, or things visible in it):
  - Automatically return a list of up to 100 video IDs ranked according to the likelihood that the video is the target one, OR
  - Interactively return a single video ID believed to be the target

6 What is TRECVID? Interactive Surveillance Event Detection (SED)
- Use case: detection of events in large amounts of surveillance video
- System task: given a textual description of an observable event of interest, automatically detect all occurrences of the event in a non-segmented corpus of video

7 What is TRECVID? Instance Search (INS)
- Use case: while browsing a video archive, you find a video of a person, place, or thing of interest to you (known or unknown) and want to find more video containing the same target, but not necessarily in the same context
- System task: given a topic with (a) 2-6 example segmented images of the target and (b) a target type (PERSON, CHARACTER, PLACE, OBJECT), return a list of up to 1000 shots ranked by the likelihood that they contain the topic target

8 What is TRECVID? Multimedia Event Detection (MED)
- Use case: searching for user-defined events over pre-computed metadata
- System task: given an event specified by an event kit (name, textual explanation, video exemplars), search multimedia recordings for the event
- Associated task, Multimedia Event Recounting (MER): produce a textual recounting that summarizes the key evidence of the event

9 What is TRECVID? Semantic Indexing (SIN)
- Use cases: filtering, categorization, browsing, search...
- System task: given the test collection, master shot reference, and concept definitions, return for each concept a list of at most 2000 shot IDs from the test collection ranked according to their likelihood of containing the concept

10 TRECVID SIN task - Motivation
- YouTube reports 72 hours of video uploaded every minute; Flickr reports 1M photo uploads per day; etc.
- Video is increasingly ubiquitous and collections are increasingly large, from broadcasters' archives to your personal collection
- Problems:
  - How can you find a video, or a video segment?
  - How can you organize your video collection?
- Video Semantic Indexing can help: index videos with tags that represent objects, scenes, actions, and events

11 TRECVID SIN task - Background (1/3)
- LSCOM effort [Naphade et al., 2006]: taxonomy of 1000 concepts, realistic use cases, large annotated set of broadcast news. Example use cases:
  - Armed uniformed soldiers walking on city lanes
  - U.S. maps depicting the electoral vote distribution (blue vs. red states)
- Based on these use cases, [Hauptmann et al., 2007] showed that keyword-based search performed poorly, and that including content-based analysis helped a lot

12 TRECVID SIN task - Background (2/3)
- [Hauptmann et al., 2007]
  - Using 320 semantic concepts, performance improved significantly (from 1% to 10% MAP even when the concept detectors have low performance)
  - Extrapolation shows that a few thousand concepts should be enough to take video retrieval's performance to the level of current text-based search engines (65% MAP)
  - Mindset: with a generic framework, detect a large number of concepts with reasonable performance

13 TRECVID SIN task - Background (3/3)
- Video search engine [Snoek et al., 2007]
  - User inputs an information need (query-by-keyword, query-by-image, query-by-concept)
  - System interprets the need, processes it based on metadata, and returns its best estimate of the relevant multimedia documents

14 TRECVID SIN task - how it works (1/5)
- Schedule: roughly 5 months from release of the data to the submission date
- Three submission types:
  - LIGHT: 50 concepts (20 selected for evaluation)
  - FULL: 346 concepts (50 selected for evaluation)
  - PAIR: 10 concept pairs
- Each team can submit up to 4 runs
- Submissions consist of a ranked list of 2000 shots per concept (as if it were the output of a system queried with that concept)

15 TRECVID SIN task - how it works (2/5)
- Videos range from 10 s to 4 min; mostly YouTube-like, user-generated content
- Unit of analysis: the shot (e.g., annotations are given at the shot level); each video is composed of many shots
- Annotations are collected via a collaborative process among the participating teams; initially, keyframes are shown to the annotator, and the video is played only if necessary
- IMPORTANT: not all shots are annotated; an active-learning-based system runs during annotation to select the most useful samples to annotate

16 TRECVID SIN task - how it works (3/5)
- 2012 training data:
  - 19,701 videos (600 hours)
  - 400,289 shots (~20 shots/video)
  - On average, a concept has 1,225 positive and 42,924 negative annotations
- 2012 testing data:
  - 8,263 videos (200 hours)
  - 145,634 shots
- Videos contain some metadata: title, tags, short descriptions, speech-to-text output (but not very consistently)
- Data collected from the Internet Archive and representative of user-generated content

17 TRECVID SIN task - how it works (4/5)
- Example video 1 - Concepts: Indoor, Kitchen, Room
- Example video 2 - Concepts: Trees, Vegetation
- Example video 3 - Concepts: Cityscape, Daytime_Outdoor, Outdoor, Scene_text, Streets, City, Text

18 TRECVID SIN task - how it works (5/5)
- Performance measures (a sketch of plain AP follows this list)
  - Inferred Average Precision (infAP) per concept: a metric conceived to approximate the usual Average Precision without having to judge every single test instance [Yilmaz et al., 2008]
  - Mean infAP (MinfAP) for the final score: simple mean of the individual infAP values per concept
  - Other measures, such as P-R and P@n curves, are provided, but the main one is MinfAP
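As a concrete reference for the metric, here is a minimal Python sketch of standard (non-inferred) Average Precision and its mean over concepts. infAP additionally corrects for the fact that only a sample of the pooled shots is judged (the estimator of [Yilmaz et al., 2008]); that correction is not reproduced here, and the run/ground-truth structures are hypothetical.

```python
from typing import Dict, List, Set

def average_precision(ranked_shots: List[str], relevant: Set[str]) -> float:
    """Standard AP over a ranked list: mean of precision@k at each relevant hit."""
    hits, precision_sum = 0, 0.0
    for k, shot_id in enumerate(ranked_shots, start=1):
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0

def mean_ap(runs: Dict[str, List[str]], ground_truth: Dict[str, Set[str]]) -> float:
    """Mean over concepts, analogous to how MinfAP averages per-concept infAP."""
    aps = [average_precision(runs[c], ground_truth[c]) for c in runs]
    return sum(aps) / len(aps)

# Hypothetical toy example: one concept, 5 ranked shots, 2 of them relevant.
print(average_precision(["s3", "s1", "s7", "s2", "s9"], {"s1", "s2"}))  # 0.5
```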

19 Overview of Semantic Indexing System
- Overall architecture [diagram: annotated shots flow into the system; annotation labels: P = positive, N = negative, M = missing, S = skip]

20 Overview of Semantic Indexing System
- Architecture in more detail [diagram]
  - Each color is a feature channel
  - Input is a shot (in our case, only the keyframe is used)

21 Descriptor extraction
- Modalities:
  - Visual: keyframe-based descriptors shown to provide the most gain
  - Audio: significant gain only for some concepts
  - Tags, short descriptions: sparse, multilingual
  - Speech-to-text transcriptions: English only, even if the videos are not in English
- We used only the keyframe-based visual modality

22 Descriptor extraction
- Local descriptors:
  - Keypoint selection
    - Combination of dense extraction (best) and the Harris-Laplace detector
    - It has been shown that using different keypoint extraction methods provides complementary gains
  - Patch description
    - SIFT descriptor
    - OppSIFT descriptor (SIFT computed on each color component of the Opponent color space)
    - We verified gains when combining these two (see the sketch after this list)
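For illustration, a minimal sketch of dense keypoint extraction followed by SIFT description using OpenCV. The grid step and keypoint size are illustrative choices, not the parameters of the submitted system; OppSIFT is not available in OpenCV and would additionally require converting the image to the opponent color space and describing each channel.

```python
import cv2
import numpy as np

def dense_sift(image_path: str, step: int = 8, size: float = 16.0) -> np.ndarray:
    """Describe a regular grid of keypoints with SIFT (dense sampling)."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    h, w = gray.shape
    # Dense grid of keypoints; 'size' controls the scale of the described patch.
    keypoints = [cv2.KeyPoint(float(x), float(y), size)
                 for y in range(step, h - step, step)
                 for x in range(step, w - step, step)]
    sift = cv2.SIFT_create()
    _, descriptors = sift.compute(gray, keypoints)  # one 128-D vector per keypoint
    return descriptors
```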

23 Descriptor extraction
- Global descriptor: CENTRIST/SPACT (sketched below)
  - Binary pattern for each pixel, based on comparisons with its neighbors (Census Transform)
  - Aggregated into histograms according to spatial location in the image: Spatial Principal component Analysis of Census Transform histograms (SPACT)
  - Shown to provide very good results at very low computational cost
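Below is a minimal NumPy sketch of the Census Transform underlying CENTRIST, using the common convention of an 8-bit code from comparisons with the 8 neighbors. The full SPACT descriptor would further split the image into spatial blocks and apply PCA to the concatenated histograms; that part is omitted here.

```python
import numpy as np

def census_transform(gray: np.ndarray) -> np.ndarray:
    """8-bit census code per pixel: bit i is 1 if the i-th neighbor is <= the center."""
    g = gray.astype(np.int32)
    center = g[1:-1, 1:-1]
    codes = np.zeros_like(center, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= (neighbor <= center).astype(np.uint8) << bit
    return codes

def centrist_histogram(gray: np.ndarray) -> np.ndarray:
    """256-bin histogram of census codes over the whole image (no spatial split, no PCA)."""
    codes = census_transform(gray)
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float32)
    return hist / hist.sum()
```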

24 Local descriptor aggregation
- BoVW + SPM
  - Histogram of visual-word occurrences
  - Pooled over different spatial regions of the image (Spatial Pyramid Matching)
- Residual vectors (sketched below)
  - Mean of the residuals to the centroid in each Voronoi cell
  - Inspired by the Fisher Vector approach [Perronnin et al., 2007]
  - No spatial aggregation due to lack of time
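A minimal sketch of the residual-vector aggregation described above, in the VLAD spirit: assign each local descriptor to its nearest centroid and take the mean residual per Voronoi cell. The codebook would come from k-means on the PCA-reduced descriptors, and the final L2 normalization is an illustrative choice.

```python
import numpy as np

def residual_vector(descriptors: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Aggregate local descriptors into one vector of per-cell mean residuals.

    descriptors: (n, d) local descriptors of one keyframe (e.g., PCA-reduced SIFT)
    centroids:   (k, d) visual dictionary learned with k-means
    returns:     (k * d,) L2-normalized residual vector
    """
    # Nearest centroid for each descriptor (squared Euclidean distance).
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignment = dists.argmin(axis=1)

    k, d = centroids.shape
    agg = np.zeros((k, d), dtype=np.float64)
    for cell in range(k):
        members = descriptors[assignment == cell]
        if len(members):
            agg[cell] = (members - centroids[cell]).mean(axis=0)  # mean residual per cell

    vec = agg.ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```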

25 Classification
- Machine learning: SVMs (see the sketch after this list)
  - One-versus-rest (shown to perform well at large scale in [Perronnin, 2012])
  - HIK and RBF kernels, depending on the feature type
  - Validation experiments based on Average Precision to choose the parameter C
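As a sketch of the classification stage, the following trains a one-vs-rest SVM for a single concept on a precomputed Histogram Intersection Kernel with scikit-learn. The kernel computation shown is the naive dense version and would need to be blocked for hundreds of thousands of shots; the value of C and the label encoding are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def hik(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Histogram Intersection Kernel: K[i, j] = sum_d min(A[i, d], B[j, d]).

    Naive dense version; at TRECVID scale this has to be computed in blocks.
    """
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

def train_concept_svm(train_feats: np.ndarray, labels: np.ndarray, C: float = 1.0) -> SVC:
    """One-vs-rest SVM for a single concept, trained on a precomputed HIK matrix."""
    K_train = hik(train_feats, train_feats)
    clf = SVC(kernel="precomputed", C=C)
    clf.fit(K_train, labels)  # labels: +1 for positive shots, -1 for negative shots
    return clf

def score_concept(clf: SVC, test_feats: np.ndarray, train_feats: np.ndarray) -> np.ndarray:
    """Decision values used to rank the test shots for this concept."""
    K_test = hik(test_feats, train_feats)  # shape: (n_test, n_train)
    return clf.decision_function(K_test)
```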

26 Classification
- Late fusion (a sketch follows)
  - After obtaining classifier scores for each feature channel, they need to be combined
  - We perform a linear combination of the scores, weighted (1) by validation performance or (2) as a simple average
  - Alternative: early fusion, which performed worse in our preliminary experiments
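A minimal sketch of the late-fusion step: per-channel scores for one concept are linearly combined, weighted either by validation AP or uniformly. The z-score normalization of each channel is an assumption for illustration, not a detail reported on the slide.

```python
import numpy as np

def late_fusion(channel_scores: dict, channel_val_ap: dict = None) -> np.ndarray:
    """Linearly combine per-channel classifier scores for one concept.

    channel_scores: {channel_name: (n_test_shots,) decision values}
    channel_val_ap: optional {channel_name: validation AP}; if given, weights are
                    proportional to validation AP, otherwise a simple average is used.
    """
    names = sorted(channel_scores)
    # Z-score each channel so channels with different score ranges are comparable
    # (an illustrative normalization choice, not necessarily the authors').
    normed = []
    for name in names:
        s = np.asarray(channel_scores[name], dtype=np.float64)
        normed.append((s - s.mean()) / (s.std() + 1e-12))

    if channel_val_ap is None:
        weights = np.ones(len(names)) / len(names)
    else:
        ap = np.array([channel_val_ap[name] for name in names], dtype=np.float64)
        weights = ap / ap.sum()

    return sum(w * s for w, s in zip(weights, normed))
```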

27 Classification
- Co-occurrence information
  - Since the SIN task is a multi-label problem, co-occurrence should help
  - Example: the concept News Studio often occurs together with the concept Person
  - Very hard to exploit, since not all shots are fully annotated:
    - Less than half of the shots have more than 10 concept annotations
    - Only 16% of the shots have more than 100 concept annotations
  - Previous work [Qi et al., 2007] shows that training with co-occurrence information is about 25 times more computationally complex

28 Feature channels (summarized in the configuration sketch below)
- BoW channels:
  - Dense keypoint extraction + OppSIFT; Bag-of-Words with Spatial Pyramid pooling in a 1x3 grid and a 4096-word visual dictionary
  - Dense keypoint extraction + SIFT; Bag-of-Words with Spatial Pyramid pooling in a 1x3 grid and a 4096-word visual dictionary
  - Harris-Laplace keypoint extraction + OppSIFT; Bag-of-Words pooling with a 4096-word visual dictionary
  - Harris-Laplace keypoint extraction + SIFT; Bag-of-Words pooling with a 4096-word visual dictionary
- Residual channels:
  - Residual vectors on densely extracted OppSIFT, PCA-reduced to 64 dimensions, with a 256-word visual dictionary
  - Residual vectors on densely extracted SIFT, PCA-reduced to 32 dimensions, with a 512-word visual dictionary
- Global channel:
  - SPACT: Spatial Principal component Analysis of Census Transform histograms (CENTRIST)
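The seven channels above could be captured in a configuration structure along these lines; the field names and values below are a hypothetical mirror of the bullet list, not an actual configuration file of the system.

```python
# Hypothetical configuration mirroring the seven feature channels listed above.
FEATURE_CHANNELS = [
    {"type": "bow", "keypoints": "dense",  "descriptor": "oppsift", "spm_grid": "1x3", "vocab_size": 4096},
    {"type": "bow", "keypoints": "dense",  "descriptor": "sift",    "spm_grid": "1x3", "vocab_size": 4096},
    {"type": "bow", "keypoints": "harlap", "descriptor": "oppsift", "spm_grid": None,  "vocab_size": 4096},
    {"type": "bow", "keypoints": "harlap", "descriptor": "sift",    "spm_grid": None,  "vocab_size": 4096},
    {"type": "residual", "keypoints": "dense", "descriptor": "oppsift", "pca_dim": 64, "vocab_size": 256},
    {"type": "residual", "keypoints": "dense", "descriptor": "sift",    "pca_dim": 32, "vocab_size": 512},
    {"type": "global", "descriptor": "spact"},
]
```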

29 Experimental results
- Inferred Precision-Recall curve [figure]

30 Experimental results
- Inferred curve [figure]

31 Experimental results
- Mean Inferred Average Precision per concept [figure]

32 Timing
- Descriptor extraction and aggregation: 1 to 2 days
- Precomputation of kernel matrices (see the blocking sketch below):
  - Training: 10 hours to 5 days, O(400k^2) kernel entries
  - Testing: 5 hours to 2.5 days, O(400k x 100k) kernel entries
- Classifier training with precomputed kernels was limited by memory loading time: ~1 hour to load into memory, ~10 min to train a classifier
- Late fusion does not take a significant amount of time
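At this scale the kernel matrices cannot be materialized in one pass, so a blocked computation along the lines of the following sketch is typical. The HIK kernel choice, block size, and memory-mapped output file are illustrative assumptions rather than details reported by the authors.

```python
import numpy as np

def precompute_kernel_blocked(A: np.ndarray, B: np.ndarray,
                              out_path: str, block: int = 1000) -> np.memmap:
    """Fill K[i, j] = sum_d min(A[i, d], B[j, d]) tile by tile into a memory-mapped file.

    A: (n_a, d) features of the row set (e.g., training shots)
    B: (n_b, d) features of the column set (training shots again, or test shots)
    """
    n_a, n_b = A.shape[0], B.shape[0]
    K = np.memmap(out_path, dtype=np.float32, mode="w+", shape=(n_a, n_b))
    for i in range(0, n_a, block):
        Ai = A[i:i + block]
        for j in range(0, n_b, block):
            Bj = B[j:j + block]
            tile = np.empty((len(Ai), len(Bj)), dtype=np.float32)
            # One row of A against the whole B tile; keeps intermediates at O(block * d).
            for r, a_row in enumerate(Ai):
                tile[r] = np.minimum(a_row, Bj).sum(axis=1)
            K[i:i + block, j:j + block] = tile
    K.flush()
    return K
```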

33 Visualization [figure]

34 Conclusion
- We built a fairly complex system that performs semantic concept detection in a large-scale multimedia database (training on 400k shots, testing on 100k shots)
- We used advanced computer vision and machine learning tools and had to make them all work together
- In our first participation, we achieved the 6th-best performance in the TRECVID SIN task

35 Thank You
Project website: http://stanford.edu/~afaraujo/trecvid
http://stanford.edu/~afaraujo
