Visual Dictionary: Towards a Higher-level Visual Representation for Object Categorization. CHUA, Tat-Seng School of Computing


1 Visual Dictionary: Towards a Higher-level Visual Representation for Object Categorization
CHUA, Tat-Seng, School of Computing
Outline of Talk: Introduction; Current Approaches; Towards Higher Level Visual Representation; Current Developments; Conclusions

2 The New Information Age
The Internet has revolutionized the way information is created, disseminated and consumed. The mixture of info available has changed from purely text to include multimedia data and live media. Emergence of a huge amount of end-user generated data and managed systems like Wikipedia, and more recently Wikimedia. Greater connectivity leads to a huge amount of live info. The WWW has also rapidly gone mobile, permitting access from anywhere.
How Big is the Internet? Estimates suggest ~22.34 billion indexed pages (Sep. 2007). Studies claimed that the deep (unindexed) web is ~500 times larger than the indexed web. Multimedia contents are increasing at an exponential rate: 31 million hours of video were produced each year (2006), and over 65% of Web traffic is on multimedia contents (2007). Like the deep Web, the amount of info available in the live Web is unknown.

3 Information Bottleneck?
Users retrieve text and multimedia info on a routine basis, but other than text, it is hard to find non-textual media. Semantic gaps: non-textual media are hard to process. How do we efficiently find these images when there is no text annotation? What about live sensor media? Can we process an array of sensors to find out if a missing girl in a blue dress with a red bag has passed by the area in the last few days?
Visual Content Analysis (the focus of this talk): we need to automatically analyze image/video contents to extract semantic information, useful for processing both traditional and live media. A statement of the problem: given an image (or video), what concepts are present? Car, Building, Road, City, Outdoor? Cow, Truck, Road, Village, Outdoor? Pool, Landscape, Indoor?

4 Visual Content Analysis -2
CHALLENGES: How to ensure correct and complete annotation? How to make use of multi-faceted knowledge (like socially annotated keywords, associated text from Web pages, etc.)?
Auto extraction of concepts has many applications: concept annotation and propagation (assign new or additional concepts to an image/video set); image/video retrieval. A simulation study shows that an average concept detection accuracy above 0.1 is useful for general image retrieval.
Outline of Talk: Introduction; Current Approaches; Towards Higher Level Visual Representation; Current Developments; Conclusions

5 Visual Content Representation
Current approaches employ a combination of low-level and high-level visual features.
Low-level global features: color histogram; color correlogram; grid-based color moments; edge/direction histogram; texture.
Low-level local features: local-region descriptors such as SIFT; Bag of Words; motion.
Higher-level features: face detection (and recognition); text (user annotated or extracted from the Web); simple high-level features.
Low-Level Global Features: the most widely used are the color histogram, color correlogram, grid-based color moments, edge/direction histogram and texture. They are provided in most publicly available test datasets; the NUS-WIDE dataset, comprising 270,000 images, pre-computes all of the above features for every image. Advantages: easy and efficient to extract and use, and reasonably effective. Retrieval example based on the TRECVID dataset.
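A minimal sketch of one of these global features, the color histogram (Python with NumPy; the 4-bins-per-channel joint layout is an assumption, chosen to yield the 64-D histogram that NUS-WIDE provides):

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """64-D RGB color histogram: quantize each channel into 4 bins and
    count joint (r, g, b) bin occupancy, L1-normalized."""
    # Map each 0-255 channel value to a bin index in [0, bins_per_channel)
    q = (image.astype(np.int64) * bins_per_channel) // 256
    # Joint bin index over the three channels
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()

# A toy 8x8 "image": top half pure red, bottom half pure blue
img = np.zeros((8, 8, 3), dtype=np.uint8)
img[:4, :, 0] = 255   # red channel
img[4:, :, 2] = 255   # blue channel
h = color_histogram(img)
print(h.shape, round(float(h.max()), 2))  # (64,) 0.5
```

Whether to bin channels jointly (as here) or independently is a design choice; joint binning captures color combinations at the cost of a larger, sparser vector.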

6 Concept Annotation in TRECVID
TRECVID: large-scale video annotation and retrieval evaluation hosted by NIST. o TRECVID-07: 100-hour documentary video corpus of ~40K keyframes. o Annotate 20 concepts.
Concept Annotation in TRECVID -2
Global visual features used: o color histogram (CH) o color moments (CM) o wavelet texture (WT) o texture co-occurrence (TC)
Machine learning techniques: o use SVM for uni-modal training o SVM and averaging for fusion
Performance: o a good baseline MAP, with fusion of CM, CH, WT and TC outperforming the individual features o could achieve further gains using OWA fusion
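The two fusion schemes mentioned above, plain averaging and OWA (Ordered Weighted Averaging), can be illustrated on toy per-modality detector scores (the score values and OWA weights below are invented for the example):

```python
import numpy as np

def average_fusion(scores):
    # scores: (n_features, n_samples) detector outputs in [0, 1]
    return scores.mean(axis=0)

def owa_fusion(scores, weights):
    # Ordered Weighted Averaging: per sample, sort the per-feature
    # scores in descending order, then take a weighted sum.
    ordered = np.sort(scores, axis=0)[::-1]
    return np.tensordot(weights, ordered, axes=1)

# Toy scores for 3 keyframes from 4 feature-specific SVM detectors
scores = np.array([
    [0.9, 0.2, 0.6],   # color moments
    [0.8, 0.1, 0.5],   # color histogram
    [0.7, 0.3, 0.4],   # wavelet texture
    [0.6, 0.2, 0.3],   # texture co-occurrence
])
w = np.array([0.4, 0.3, 0.2, 0.1])  # emphasize the most confident detectors
print(average_fusion(scores))   # approx [0.75, 0.2, 0.45]
print(owa_fusion(scores, w))    # approx [0.80, 0.23, 0.50]
```

OWA rewards keyframes on which at least some detectors fire strongly, whereas plain averaging treats all detectors identically.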

7 Low-Level Local Features -1
SIFT (Scale-Invariant Feature Transform): a popular local-region descriptor; it is effective and robust.
Bag of visual words (or alphabets): a quantized (clustered) vector of SIFT features, on the order of 500-2,000 words; represent the image as a histogram of visual words over a visual codebook.
Low-Level Local Features -2: matching examples (an un-filtered example and filtered examples).
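Once a codebook has been clustered, the bag-of-visual-words histogram described above reduces to nearest-centroid assignment; a minimal sketch (the tiny hand-written 2-D codebook stands in for k-means-clustered 128-D SIFT centroids):

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codebook centre
    (a visual word), then count assignments into a normalized histogram."""
    # Squared Euclidean distance from every descriptor to every visual word
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy codebook: 8 visual words in 2-D (real SIFT words live in 128-D)
codebook = np.array([[0, 0], [5, 0], [0, 5], [5, 5],
                     [10, 0], [0, 10], [10, 10], [5, 10]], dtype=float)
descriptors = codebook[[0, 0, 3, 5]] + 0.1   # 4 "keypoints" near known words
h = bovw_histogram(descriptors, codebook)
print(h)   # word 0 counted twice: h[0] == 0.5, h[3] == h[5] == 0.25
```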

8 Concept Annotation in TRECVID
Testing on the TRECVID 2007 dataset with the use of bag-of-words (SIFT) features o shows vast improvement in MAP over the use of global features (CM, CH, WT, TC vs. bag of words and fusion).
TRECVID Annotation Examples: accuracy is still not sufficient (MAP below 0.15). Success and failure examples: sports, computer screen, boat.

9 Higher Level Features
Face: a commonly used high-level feature. Frontal face detectors are very accurate; multi-view face detectors are reasonable. {face, body, body matching} for people identification: useful for consumer and home photos, where people are key. (Examples: detected face, face/body mask, body.)
Outline of Talk: Introduction; Current Approaches; Towards Higher Level Visual Representation; Current Developments; Conclusions

10 Problems with Current Visual Representation
Low-level local features based on bags of visual words are effective to a certain extent, but they are still too low-level. An ideal visual representation is: discriminative among object classes (large inter-class distance, for categorization); invariant within the class (small intra-class distance, for generalization). The visual word is neither discriminative nor invariant: topological proximity in visual word space ≠ semantic relevance, as objects can have arbitrarily different appearances (the semantic gap).
One visual word may have different semantic meanings in different contexts: o the visual word is a result of vector quantization o objects of different classes may share similar local appearances — a polysemy phenomenon (visual word, patch, descriptor).
Multiple visual words may have the same semantic meaning: o diversity of objects of the same class o consequence: large intra-class variations — a synonymy phenomenon.

11 Towards Higher Level Visual Representation
Several recent efforts aim towards higher level visual representation: o 1st category: explore deeper understanding of how the brain perceives the world to improve computer vision (Ullman, 2007; Serre, 2007; Karklin & Lewicki, 2008) o 2nd category: extend computer vision approaches to extract more distinctive higher level feature configurations (Yuan, 2007; Quack et al., 2007) o 3rd category: borrow ideas from NLP to extract higher level representations (Zheng et al., 2008) o The availability of large scale knowledge bases such as Wikimedia and on-line visual dictionaries helps.
1st Category: Brain-Media Approach -1
Observations: there are still huge gaps between how the human brain and computers view objects; semantic hierarchies have been shown to be useful for object recognition and concept annotation. Ullman et al. (2007) built a feature hierarchy for object recognition based on informative fragments, demonstrated for a few specific classes. (Informative fragment examples; hierarchy of abstract features.)

12 1st Category: Brain-Media Approach -2
To understand which features are extracted and represented by the visual cortex, Serre et al. (2007) provide a hierarchical feature representation with cortex-like mechanisms. o The model consists of four layers (input image → S1 → C1 → S2 → C2) that replicate the way neurons process input and output stimuli: the S layers correspond to simple cells in primary visual cortex (V1), the C layers to cortical complex cells. Selectivity: Gaussian-like function for tuning and specificity. Tolerance: maximum-like operation for invariance over positions and scales. The C1 and C2 standard model features (SMF) are extracted for object recognition and scene understanding.
1st Category: Brain-Media Approach -3
Karklin et al. (2007) presented a computational model by learning how our brain sees natural scenes. o The model neurons encode the common statistical structure most consistent with a given image. o Neuronal activities show a diverse range of properties observed in cortical cells and have strong discriminative power. Linear features are overlapping; their statistical patterns distinguish them.

13 1st Category: Brain-Media Approach -4
Karklin et al. (2007), cont.: activation patterns for model neurons; images are better separated; distribution patterns in image regions; a distribution coding model.
2nd Category: Vision-based Approach -1
Yuan et al. (2007) proposed a method to discover high level visual phrases according to spatial co-occurrence patterns. o Visual phrases are discovered by using frequent itemset mining. o Each visual phrase is composed of meaningful co-located visual words. (Examples of meaningful itemsets from the car category, red rectangles.)

14 2nd Category: Vision-based Approach -2
Quack et al. (2007) apply data-mining techniques to extract discriminant higher-level feature configurations. o Find spatial configurations of local features occurring frequently in a given object class, and rarely on the background. o Features in the background are pruned away. o The meaningful feature configurations often correspond to semantic object parts, such as the motorbike wheels. (Mine frequent configurations; class-specific feature selection; neighborhood based image description; examples of discriminant frequent spatial configurations.)
3rd Category: Language Approach -1, Overview
Recall that BoVW is not discriminative and invariant because: o objects of different classes may share similar local appearances (a polysemy phenomenon) o multiple visual words may have the same semantic meaning (a synonymy phenomenon).
Borrow ideas from linguistics: o Alphabets → Words → Phrases → Synonyms → Word Sense → parse tree, dependency and semantic parsing, etc.

15 3rd Category: Language Approach -2, Visual Phrase
Leverage spatial and contextual info to build more distinctive patterns. Visual phrase: spatially co-occurring visual words in a support region. o Could formulate visual phrase discovery as a task of Frequent Itemset Mining (FIM). o Discover groups of co-occurring visual words in a spatial neighborhood. o Example: visual phrases BC and AB, built from visual words A and B.
3rd Category: Language Approach -3, Delta Visual Phrase
Weakness of the visual phrase: it contains only spatial co-occurrence info. Design the delta visual phrase to leverage contextual info.
Definitions: let R1, R2, ..., Rk, Rk+1, ... be a set of regions with the same centroid and increasing sizes. Minimal Support Region: region Rk is called the minimal support region of visual phrase π if any smaller region Rk-1 is not large enough to discover the visual phrase π.
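The FIM-style phrase discovery and the minimal-support-region idea above can be sketched on toy keypoints (the coordinates, word labels and thresholds are invented; a real system mines far larger transaction sets with a proper FIM algorithm):

```python
from collections import Counter
from itertools import combinations

def frequent_phrases(points, radius, min_support):
    """One transaction per keypoint: the set of visual-word labels within
    `radius` of it. Word pairs occurring in >= min_support transactions
    are kept as visual phrases (a tiny stand-in for full FIM)."""
    counts = Counter()
    for (x, y, _) in points:
        words = {w for (px, py, w) in points
                 if (px - x) ** 2 + (py - y) ** 2 <= radius ** 2}
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return {p for p, c in counts.items() if c >= min_support}

def delta_phrases(points, r_small, r_large, min_support):
    # Phrases whose minimal support region is the larger radius:
    # discovered at r_large but not already present at r_small.
    return (frequent_phrases(points, r_large, min_support)
            - frequent_phrases(points, r_small, min_support))

# Toy keypoints as (x, y, visual word): A and B co-occur at distance 3
pts = [(0, 0, 'A'), (3, 0, 'B'), (10, 0, 'A'), (13, 0, 'B'), (20, 0, 'C')]
print(delta_phrases(pts, r_small=2, r_large=4, min_support=2))
# {('A', 'B')} -- the AB phrase needs the larger support region
```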

16 3rd Category: Language Approach -4, Delta Visual Phrase
Definitions: the delta visual phrases of region Rk are the newly discovered visual phrases when the support region just grows from Rk-1 to Rk, i.e. the visual phrases that have Rk as their minimal support region. Delta visual phrases are mined by changing the size of the support region. As the support region Rk increases: o its delta visual phrases cross over larger regions o they capture both the spatial co-occurrence and the contextual info of their visual words. o Example: AB will be a newly discovered visual phrase as we increase from R1 to R2.
3rd Category: Language Approach -5, Visual Synset: a more invariant visual unit
Though more distinctive, the delta visual phrase suffers from: o topological proximity in feature space = visual similarity ≠ semantic relevance o large intra-class variance.
Visual synset: a higher level visual unit. o Synset (synonymy set): a set of words with similar semantics. o Define the probabilistic semantics of a visual word/phrase w as its class probability distribution P(c|w), its contribution to the classification of the image it belongs to.

17 3rd Category: Language Approach -6, Visual Synset: a more invariant visual unit
Rationale of the visual synset: o many visual words are intrinsic and indicative of certain classes o these visual words tend to share a similar probability distribution P(c|w), which peaks around their classes. Explore the Information Bottleneck principle to guide the clustering process, to optimize data compression in clustering the probabilistic semantics.
3rd Category: Language Approach -7, Information Bottleneck (IB) Principles
Input: joint distribution P(w, c) of visual words/phrases w and image classes c. Goal: construct the optimal compact representation of w, namely visual synset clusters s, such that s preserves as much information as possible about c. Solution: Lagrange optimization of
L[p(s|w)] = I(S; C) − λ · I(W; S)
where I(S; C) is the mutual info of synsets and classes, and I(W; S) reflects the info loss in clustering visual words/phrases into synsets. IB implementation: sequential IB clustering.
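A toy sketch of IB-style clustering of visual words by their class distributions P(c|w). Note this uses a greedy agglomerative merge driven by Jensen-Shannon divergence (the information lost about C by a merge) rather than the sequential IB algorithm named above, and all distributions and priors are invented:

```python
import numpy as np

def js_divergence(p, q, wp, wq):
    """Prior-weighted Jensen-Shannon divergence between two class
    distributions: the information about C lost by merging them."""
    m = (wp * p + wq * q) / (wp + wq)
    def kl(a, b):
        mask = a > 0
        return float((a[mask] * np.log(a[mask] / b[mask])).sum())
    return (wp * kl(p, m) + wq * kl(q, m)) / (wp + wq)

def agglomerative_ib(pcw, priors, n_synsets):
    """Greedily merge the pair of clusters whose merge loses the least
    mutual information I(S; C), until n_synsets clusters remain."""
    clusters = [([i], pcw[i].copy(), priors[i]) for i in range(len(pcw))]
    while len(clusters) > n_synsets:
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: js_divergence(clusters[ab[0]][1], clusters[ab[1]][1],
                                         clusters[ab[0]][2], clusters[ab[1]][2]))
        (mi, pi, wi), (mj, pj, wj) = clusters[i], clusters[j]
        merged = (mi + mj, (wi * pi + wj * pj) / (wi + wj), wi + wj)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c[0]) for c in clusters]

# Toy P(c|w): words 0,1 peak on class 0; words 2,3 peak on class 1
pcw = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
priors = np.full(4, 0.25)
print(agglomerative_ib(pcw, priors, 2))  # [[0, 1], [2, 3]]
```

Words with near-identical class distributions collapse into one synset, which is exactly the synonymy-merging behaviour the slide describes.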

18 3rd Category: Language Approach -8, Experiments
Testing dataset: Caltech-101 (102 categories, 9,233 images). Experimental setup: o for each class, 30 randomly selected images for training, the rest for testing. Evaluation criterion: o image classification, mean classification accuracy. Visual word generation: o keypoints: Difference of Gaussian (DoG) and Hessian-Laplacian o descriptors: SIFT and Spin; 2x2 image grids o vector quantization: k-means, 1,100 clusters. Classifier: SVM with RBF kernel.
3rd Category: Language Approach -9, Experiments
Experiment 1: uses only visual words o accuracy = 57.2%. Experiment 2: incorporate delta visual phrases o setting support region sizes to 4, 8 and 12 o varying the number of delta phrases over 1100, 1200, 1300, 1400, 1500, 1700, 1800 and 2000 o at 1400 delta visual phrases, accuracy = 60.2%. (Examples of delta visual phrases.)

19 3rd Category: Language Approach -10, Experiments
Experiment 3: incorporate visual synsets o generate visual synsets from the codebook of 1400 delta visual phrases o set the cardinality of visual synsets to 50, 100, 200, 400, 600, 800, 1000 and 1200 o best accuracy: 62.6%. (Examples of visual synsets.)
3rd Category: Language Approach -11, Experiments: Benchmark
Runs compared by accuracy (%): visual words, delta visual phrases, visual synsets. Comparison with reported systems: [Grauman 05], [Berg 05], [Zhang 05], [Lazebnik 06], [Bosch 07]. NOTE: [Bosch 07] exploited more features with complex kernel matrix learning.

20 3rd Category: Language Approach -12, Observations
The visual synset can give superior accuracy o 600 visual synsets: 62.6%. The visual synset can give a compact representation o 50 visual synsets: 55.2%. Visual synsets fuse semantically consistent visual words/phrases together, which o reduces the intra-class variations and o renders the image distribution more coherent and manageable. The visual synset is a result of supervised dimensionality reduction o properly reduced dimensionality partially resolves the statistical sparseness problem.
Outline of Talk: Introduction; Current Approaches; Towards Higher Level Visual Representation; Current Developments; Conclusions

21 Moving Forward
What's next for our approach? o Recall our starting point: the major causes of the low representative power are the polysemy phenomenon and the synonymy phenomenon. Borrow ideas from linguistics: o Alphabets → Words → Phrases → Synonyms → Word Sense → parse tree, dependency and semantic parsing, etc. The natural next phase is to explore: o Word Sense, parse tree, dependency and semantic parsing issues o building basic visual units or a visual vocabulary to describe image/video contents.
Modeling Visual Context
As images become easier to obtain, the problem of visual diversity of objects (concepts) becomes worse. o For example, there are 79,695 categories of buildings in Wikimedia Commons, and each category shows unique visual properties. How to deal with this problem? o Need more discriminative feature representations o strong generalization or large training data o context is the key.

22 Modeling Visual Context -2
The foundation of both generative models and classification based methods is the estimation of similarity between images. Two images (e.g. of sky, mountain, sea) may show little visual similarity yet express similar semantics: there is inconsistency between visual and semantic similarity. How to deal with this problem? Context is the KEY.
Modeling Visual Context -3
Training data from, say, Wikimedia: o every image is assigned to a category in a hierarchy o images in the same category share similar semantics, and most also share visual similarity. Advantages: o large scale, with people involved o the images in a category are relatively pure (e.g. blue sky, cloudy sky, sunset sky).

23 Modeling Visual Context -4
Examine visual characteristics at the object level. o Traditional visual features are at the image level. o Alternative: utilize multiple segmentations of images to perform visual categorization at the image region level. o Inference of image semantics does not require accurate boundaries of objects.
Model the visual hierarchy and context of different visual classes/themes. o Rationale: though images from different classes can share the same set of visual themes (parts), the distribution and combination of visual themes tend to differ across image classes. o One possible approach: utilize hierarchical Dirichlet processes to model the distribution and context of visual themes.
Large Datasets Available
Fortunately, large scale datasets such as Wikimedia are available: o clean concept hierarchy o large amount of high quality sample images o explicit visual component relationships. Wikimedia example: the Transport hierarchy. The visual ontology can be used to support object recognition and concept annotation.

24 (Wikimedia Transport hierarchy, excerpt)
Transport → Transport by country; Road transport; Cable transport; Aviation; Bicycle transport; Child transport; Patient transport.
Road transport → Bus transport; Objects on roads; Road accidents; Icons for road; Road vehicles.
Road vehicles → Carriages; Automobiles; Bookmobiles; Buses; Motorcycles.
Automobiles → Car company logos; Classic cars; Automobile parts (Tires, Car seats, Automobile lights, Automobile grills, Car door handles, Vehicle mirrors, Automobile gauges).

25 (Wikimedia Transport hierarchy, zoomed in)
Road vehicles → Carriages; Automobiles; Bookmobiles; Buses; Motorcycles.
Automobiles → Car company logos; Classic cars; Automobile parts.
Automobile parts → Tires; Car seats; Automobile lights; Automobile grills; Car door handles; Vehicle mirrors; Automobile gauges.

26 The built visual ontology can be used to support object recognition and concept annotation (e.g. detecting car, street and motorcycle from road-vehicle objects plus visual features).
NUS-WIDE: A Real World Web Image Dataset
o > 300k images and associated tags crawled from Flickr o preserve 269,648 images and their tags o contain 5,018 unique tags o provide 6 sets of visual features o offer ground truth for 81 concepts.

27 NUS-WIDE: Web Image Dataset -1
(Figures: the frequency distribution of tags; the number of tags per image.)
NUS-WIDE: Web Image Dataset -2
Most frequent tags after noise removal: sky, light, nature, sunset, water, sea, blue, white, clouds, people, bravo, night, landscape, beach, green, architecture, red, art, explore, travel.

28 NUS-WIDE: Web Image Dataset -3
List of visual features extracted. Global features: o 64-D color histogram o 144-D color correlogram o 75-D edge direction histogram o 128-D wavelet texture. Grid-based features: o 225-D block-wise color moments, extracted over 5x5 fixed grid partitions. Bag of visual words: o 500-D bag of words based on SIFT descriptors at keypoints in the images.
NUS-WIDE: Web Image Dataset -4
Ground truth for 81 concepts for evaluation: o they are consistent with the concepts described in other literature, such as COREL, CALTECH101 and LSCOM o they mostly correspond to frequent tags in Flickr o they include both general concepts such as animal and specific concepts such as dog and flowers o they belong to different categories including object, scene, event, people, etc. The overall effort for manual annotation: about 3,000 man-hours.

29 NUS-WIDE: Web Image Dataset -6
Exemplars for some concepts: Airport, Birds, Beach, Bridge, Car, Cityscape, Dancing, Fire, Explosion, Glacier, Map, Mountain, Police, Railroad, Rainbow, Reflection, Sign, Sports, Temple, Waterfall, Wedding.

30 NUS-WIDE: Web Image Dataset -7
(Figure: statistics of relevant images.)
NUS-WIDE: Web Image Dataset -8
Benchmark annotation by learning from the tags with kNN. MAP: according to CMU's simulations, this MAP is effective in helping general image retrieval. Thus effective models can be learned to improve web image annotation and retrieval. Web Link:
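The tag-based kNN benchmark above amounts to voting over the tags of visually similar training images; a minimal sketch with invented features and two hypothetical concepts ('sky', 'car'):

```python
import numpy as np

def knn_tag_scores(query, train_feats, train_tags, k=3):
    """Score each concept for a query image as the fraction of its k
    visually nearest training images (Euclidean distance) carrying it."""
    d = ((train_feats - query) ** 2).sum(axis=1)
    nearest = np.argsort(d)[:k]
    return train_tags[nearest].mean(axis=0)

# Toy setup: 6 images with 4-D features, forming two visual clusters;
# binary tag matrix over the two hypothetical concepts
train_feats = np.array([[0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.0, 0.0],
                        [0.0, 0.1, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0],
                        [5.1, 5.0, 5.0, 5.0], [5.0, 5.1, 5.0, 5.0]])
train_tags = np.array([[1, 0]] * 3 + [[0, 1]] * 3)
scores = knn_tag_scores(np.zeros(4), train_feats, train_tags, k=3)
print(scores)   # all 3 neighbours tagged 'sky': scores == [1.0, 0.0]
```

Because Flickr tags are noisy, thresholding or weighting these neighbour votes by distance is a natural refinement.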

31 Outline of Talk: Introduction; Current Approaches; Towards Higher Level Visual Representation; Current Developments; Conclusions
Looking Into the Future
Recent research: o towards building higher-level representations for visual contents o various vision-oriented approaches o potential in understanding human visual cognition o DON'T forget about text (tags) and related knowledge.
Great opportunity: o the availability of clean and large evolving datasets makes large scale efforts feasible.
Acceleration of efforts towards a visual ontology and vocabulary: o extend to other media types o the idea of a multimedia dictionary and vocabulary o towards making analysis and retrieval of non-text contents as simple as text.

32 THANK YOU


More information

Improving Recognition through Object Sub-categorization

Improving Recognition through Object Sub-categorization Improving Recognition through Object Sub-categorization Al Mansur and Yoshinori Kuno Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Saitama 338-8570,

More information

NUS-WIDE: A Real-World Web Image Database from National University of Singapore

NUS-WIDE: A Real-World Web Image Database from National University of Singapore NUS-WIDE: A Real-World Web Image Database from National University of Singapore Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, Yantao Zheng National University of Singapore Computing

More information

PRISM: Concept-preserving Social Image Search Results Summarization

PRISM: Concept-preserving Social Image Search Results Summarization PRISM: Concept-preserving Social Image Search Results Summarization Boon-Siew Seah Sourav S Bhowmick Aixin Sun Nanyang Technological University Singapore Outline 1 Introduction 2 Related studies 3 Search

More information

Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study

Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study J. Zhang 1 M. Marszałek 1 S. Lazebnik 2 C. Schmid 1 1 INRIA Rhône-Alpes, LEAR - GRAVIR Montbonnot, France

More information

Probabilistic Generative Models for Machine Vision

Probabilistic Generative Models for Machine Vision Probabilistic Generative Models for Machine Vision bacciu@di.unipi.it Dipartimento di Informatica Università di Pisa Corso di Intelligenza Artificiale Prof. Alessandro Sperduti Università degli Studi di

More information

Facial Expression Classification with Random Filters Feature Extraction

Facial Expression Classification with Random Filters Feature Extraction Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle

More information

Tag Recommendation for Photos

Tag Recommendation for Photos Tag Recommendation for Photos Gowtham Kumar Ramani, Rahul Batra, Tripti Assudani December 10, 2009 Abstract. We present a real-time recommendation system for photo annotation that can be used in Flickr.

More information

CS6670: Computer Vision

CS6670: Computer Vision CS6670: Computer Vision Noah Snavely Lecture 16: Bag-of-words models Object Bag of words Announcements Project 3: Eigenfaces due Wednesday, November 11 at 11:59pm solo project Final project presentations:

More information

Using the Forest to See the Trees: Context-based Object Recognition

Using the Forest to See the Trees: Context-based Object Recognition Using the Forest to See the Trees: Context-based Object Recognition Bill Freeman Joint work with Antonio Torralba and Kevin Murphy Computer Science and Artificial Intelligence Laboratory MIT A computer

More information

CS229: Action Recognition in Tennis

CS229: Action Recognition in Tennis CS229: Action Recognition in Tennis Aman Sikka Stanford University Stanford, CA 94305 Rajbir Kataria Stanford University Stanford, CA 94305 asikka@stanford.edu rkataria@stanford.edu 1. Motivation As active

More information

Patch-Based Image Classification Using Image Epitomes

Patch-Based Image Classification Using Image Epitomes Patch-Based Image Classification Using Image Epitomes David Andrzejewski CS 766 - Final Project December 19, 2005 Abstract Automatic image classification has many practical applications, including photo

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Local Features and Bag of Words Models

Local Features and Bag of Words Models 10/14/11 Local Features and Bag of Words Models Computer Vision CS 143, Brown James Hays Slides from Svetlana Lazebnik, Derek Hoiem, Antonio Torralba, David Lowe, Fei Fei Li and others Computer Engineering

More information

Computer Vision. Exercise Session 10 Image Categorization

Computer Vision. Exercise Session 10 Image Categorization Computer Vision Exercise Session 10 Image Categorization Object Categorization Task Description Given a small number of training images of a category, recognize a-priori unknown instances of that category

More information

Grounded Compositional Semantics for Finding and Describing Images with Sentences

Grounded Compositional Semantics for Finding and Describing Images with Sentences Grounded Compositional Semantics for Finding and Describing Images with Sentences R. Socher, A. Karpathy, V. Le,D. Manning, A Y. Ng - 2013 Ali Gharaee 1 Alireza Keshavarzi 2 1 Department of Computational

More information

AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS

AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS Nilam B. Lonkar 1, Dinesh B. Hanchate 2 Student of Computer Engineering, Pune University VPKBIET, Baramati, India Computer Engineering, Pune University VPKBIET,

More information

Contextual priming for artificial visual perception

Contextual priming for artificial visual perception Contextual priming for artificial visual perception Hervé Guillaume 1, Nathalie Denquive 1, Philippe Tarroux 1,2 1 LIMSI-CNRS BP 133 F-91403 Orsay cedex France 2 ENS 45 rue d Ulm F-75230 Paris cedex 05

More information

arxiv: v3 [cs.cv] 3 Oct 2012

arxiv: v3 [cs.cv] 3 Oct 2012 Combined Descriptors in Spatial Pyramid Domain for Image Classification Junlin Hu and Ping Guo arxiv:1210.0386v3 [cs.cv] 3 Oct 2012 Image Processing and Pattern Recognition Laboratory Beijing Normal University,

More information

Exploiting noisy web data for largescale visual recognition

Exploiting noisy web data for largescale visual recognition Exploiting noisy web data for largescale visual recognition Lamberto Ballan University of Padova, Italy CVPRW WebVision - Jul 26, 2017 Datasets drive computer vision progress ImageNet Slide credit: O.

More information

Comparison of Local Feature Descriptors

Comparison of Local Feature Descriptors Department of EECS, University of California, Berkeley. December 13, 26 1 Local Features 2 Mikolajczyk s Dataset Caltech 11 Dataset 3 Evaluation of Feature Detectors Evaluation of Feature Deriptors 4 Applications

More information

Learning Representations for Visual Object Class Recognition

Learning Representations for Visual Object Class Recognition Learning Representations for Visual Object Class Recognition Marcin Marszałek Cordelia Schmid Hedi Harzallah Joost van de Weijer LEAR, INRIA Grenoble, Rhône-Alpes, France October 15th, 2007 Bag-of-Features

More information

VK Multimedia Information Systems

VK Multimedia Information Systems VK Multimedia Information Systems Mathias Lux, mlux@itec.uni-klu.ac.at Dienstags, 16.oo Uhr This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Agenda Evaluations

More information

Landmark Recognition: State-of-the-Art Methods in a Large-Scale Scenario

Landmark Recognition: State-of-the-Art Methods in a Large-Scale Scenario Landmark Recognition: State-of-the-Art Methods in a Large-Scale Scenario Magdalena Rischka and Stefan Conrad Institute of Computer Science Heinrich-Heine-University Duesseldorf D-40225 Duesseldorf, Germany

More information

An Introduction to Content Based Image Retrieval

An Introduction to Content Based Image Retrieval CHAPTER -1 An Introduction to Content Based Image Retrieval 1.1 Introduction With the advancement in internet and multimedia technologies, a huge amount of multimedia data in the form of audio, video and

More information

Action recognition in videos

Action recognition in videos Action recognition in videos Cordelia Schmid INRIA Grenoble Joint work with V. Ferrari, A. Gaidon, Z. Harchaoui, A. Klaeser, A. Prest, H. Wang Action recognition - goal Short actions, i.e. drinking, sit

More information

Outline 7/2/201011/6/

Outline 7/2/201011/6/ Outline Pattern recognition in computer vision Background on the development of SIFT SIFT algorithm and some of its variations Computational considerations (SURF) Potential improvement Summary 01 2 Pattern

More information

Content-Based Image Classification: A Non-Parametric Approach

Content-Based Image Classification: A Non-Parametric Approach 1 Content-Based Image Classification: A Non-Parametric Approach Paulo M. Ferreira, Mário A.T. Figueiredo, Pedro M. Q. Aguiar Abstract The rise of the amount imagery on the Internet, as well as in multimedia

More information

Multimodal Medical Image Retrieval based on Latent Topic Modeling

Multimodal Medical Image Retrieval based on Latent Topic Modeling Multimodal Medical Image Retrieval based on Latent Topic Modeling Mandikal Vikram 15it217.vikram@nitk.edu.in Suhas BS 15it110.suhas@nitk.edu.in Aditya Anantharaman 15it201.aditya.a@nitk.edu.in Sowmya Kamath

More information

A THREE LAYERED MODEL TO PERFORM CHARACTER RECOGNITION FOR NOISY IMAGES

A THREE LAYERED MODEL TO PERFORM CHARACTER RECOGNITION FOR NOISY IMAGES INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONSAND ROBOTICS ISSN 2320-7345 A THREE LAYERED MODEL TO PERFORM CHARACTER RECOGNITION FOR NOISY IMAGES 1 Neha, 2 Anil Saroliya, 3 Varun Sharma 1,

More information

Generic object recognition using graph embedding into a vector space

Generic object recognition using graph embedding into a vector space American Journal of Software Engineering and Applications 2013 ; 2(1) : 13-18 Published online February 20, 2013 (http://www.sciencepublishinggroup.com/j/ajsea) doi: 10.11648/j. ajsea.20130201.13 Generic

More information

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Adding spatial information Forming vocabularies from pairs of nearby features doublets

More information

Multimodal Information Spaces for Content-based Image Retrieval

Multimodal Information Spaces for Content-based Image Retrieval Research Proposal Multimodal Information Spaces for Content-based Image Retrieval Abstract Currently, image retrieval by content is a research problem of great interest in academia and the industry, due

More information

Aggregating Descriptors with Local Gaussian Metrics

Aggregating Descriptors with Local Gaussian Metrics Aggregating Descriptors with Local Gaussian Metrics Hideki Nakayama Grad. School of Information Science and Technology The University of Tokyo Tokyo, JAPAN nakayama@ci.i.u-tokyo.ac.jp Abstract Recently,

More information

Object Classification Problem

Object Classification Problem HIERARCHICAL OBJECT CATEGORIZATION" Gregory Griffin and Pietro Perona. Learning and Using Taxonomies For Fast Visual Categorization. CVPR 2008 Marcin Marszalek and Cordelia Schmid. Constructing Category

More information

Experimenting VIREO-374: Bag-of-Visual-Words and Visual-Based Ontology for Semantic Video Indexing and Search

Experimenting VIREO-374: Bag-of-Visual-Words and Visual-Based Ontology for Semantic Video Indexing and Search Experimenting VIREO-374: Bag-of-Visual-Words and Visual-Based Ontology for Semantic Video Indexing and Search Chong-Wah Ngo, Yu-Gang Jiang, Xiaoyong Wei Feng Wang, Wanlei Zhao, Hung-Khoon Tan and Xiao

More information

By Suren Manvelyan,

By Suren Manvelyan, By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan,

More information

Ensemble of Bayesian Filters for Loop Closure Detection

Ensemble of Bayesian Filters for Loop Closure Detection Ensemble of Bayesian Filters for Loop Closure Detection Mohammad Omar Salameh, Azizi Abdullah, Shahnorbanun Sahran Pattern Recognition Research Group Center for Artificial Intelligence Faculty of Information

More information

Bilevel Sparse Coding

Bilevel Sparse Coding Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional

More information

A Comparison of l 1 Norm and l 2 Norm Multiple Kernel SVMs in Image and Video Classification

A Comparison of l 1 Norm and l 2 Norm Multiple Kernel SVMs in Image and Video Classification A Comparison of l 1 Norm and l 2 Norm Multiple Kernel SVMs in Image and Video Classification Fei Yan Krystian Mikolajczyk Josef Kittler Muhammad Tahir Centre for Vision, Speech and Signal Processing University

More information

Supervised learning. y = f(x) function

Supervised learning. y = f(x) function Supervised learning y = f(x) output prediction function Image feature Training: given a training set of labeled examples {(x 1,y 1 ),, (x N,y N )}, estimate the prediction function f by minimizing the

More information

Discriminative classifiers for image recognition

Discriminative classifiers for image recognition Discriminative classifiers for image recognition May 26 th, 2015 Yong Jae Lee UC Davis Outline Last time: window-based generic object detection basic pipeline face detection with boosting as case study

More information

Preliminary Local Feature Selection by Support Vector Machine for Bag of Features

Preliminary Local Feature Selection by Support Vector Machine for Bag of Features Preliminary Local Feature Selection by Support Vector Machine for Bag of Features Tetsu Matsukawa Koji Suzuki Takio Kurita :University of Tsukuba :National Institute of Advanced Industrial Science and

More information

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Part II: Visual Features and Representations Liangliang Cao, IBM Watson Research Center Evolvement of Visual Features

More information

Rushes Video Segmentation Using Semantic Features

Rushes Video Segmentation Using Semantic Features Rushes Video Segmentation Using Semantic Features Athina Pappa, Vasileios Chasanis, and Antonis Ioannidis Department of Computer Science and Engineering, University of Ioannina, GR 45110, Ioannina, Greece

More information

Automatic Categorization of Image Regions using Dominant Color based Vector Quantization

Automatic Categorization of Image Regions using Dominant Color based Vector Quantization Automatic Categorization of Image Regions using Dominant Color based Vector Quantization Md Monirul Islam, Dengsheng Zhang, Guojun Lu Gippsland School of Information Technology, Monash University Churchill

More information

Content-Based Multimedia Information Retrieval

Content-Based Multimedia Information Retrieval Content-Based Multimedia Information Retrieval Ishwar K. Sethi Intelligent Information Engineering Laboratory Oakland University Rochester, MI 48309 Email: isethi@oakland.edu URL: www.cse.secs.oakland.edu/isethi

More information

Scene Recognition using Bag-of-Words

Scene Recognition using Bag-of-Words Scene Recognition using Bag-of-Words Sarthak Ahuja B.Tech Computer Science Indraprastha Institute of Information Technology Okhla, Delhi 110020 Email: sarthak12088@iiitd.ac.in Anchita Goel B.Tech Computer

More information

Deep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies

Deep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies http://blog.csdn.net/zouxy09/article/details/8775360 Automatic Colorization of Black and White Images Automatically Adding Sounds To Silent Movies Traditionally this was done by hand with human effort

More information

Large scale object/scene recognition

Large scale object/scene recognition Large scale object/scene recognition Image dataset: > 1 million images query Image search system ranked image list Each image described by approximately 2000 descriptors 2 10 9 descriptors to index! Database

More information

EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari

EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION Ing. Lorenzo Seidenari e-mail: seidenari@dsi.unifi.it What is an Event? Dictionary.com definition: something that occurs in a certain place during a particular

More information

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce Object Recognition Computer Vision Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce How many visual object categories are there? Biederman 1987 ANIMALS PLANTS OBJECTS

More information

arxiv: v1 [cs.mm] 12 Jan 2016

arxiv: v1 [cs.mm] 12 Jan 2016 Learning Subclass Representations for Visually-varied Image Classification Xinchao Li, Peng Xu, Yue Shi, Martha Larson, Alan Hanjalic Multimedia Information Retrieval Lab, Delft University of Technology

More information

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi hrazvi@stanford.edu 1 Introduction: We present a method for discovering visual hierarchy in a set of images. Automatically grouping

More information

Comparing Local Feature Descriptors in plsa-based Image Models

Comparing Local Feature Descriptors in plsa-based Image Models Comparing Local Feature Descriptors in plsa-based Image Models Eva Hörster 1,ThomasGreif 1, Rainer Lienhart 1, and Malcolm Slaney 2 1 Multimedia Computing Lab, University of Augsburg, Germany {hoerster,lienhart}@informatik.uni-augsburg.de

More information

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Introduction to object recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Overview Basic recognition tasks A statistical learning approach Traditional or shallow recognition

More information

Structured Models in. Dan Huttenlocher. June 2010

Structured Models in. Dan Huttenlocher. June 2010 Structured Models in Computer Vision i Dan Huttenlocher June 2010 Structured Models Problems where output variables are mutually dependent or constrained E.g., spatial or temporal relations Such dependencies

More information

Consistent Line Clusters for Building Recognition in CBIR

Consistent Line Clusters for Building Recognition in CBIR Consistent Line Clusters for Building Recognition in CBIR Yi Li and Linda G. Shapiro Department of Computer Science and Engineering University of Washington Seattle, WA 98195-250 shapiro,yi @cs.washington.edu

More information

Class 5: Attributes and Semantic Features

Class 5: Attributes and Semantic Features Class 5: Attributes and Semantic Features Rogerio Feris, Feb 21, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Project

More information

Efficient Kernels for Identifying Unbounded-Order Spatial Features

Efficient Kernels for Identifying Unbounded-Order Spatial Features Efficient Kernels for Identifying Unbounded-Order Spatial Features Yimeng Zhang Carnegie Mellon University yimengz@andrew.cmu.edu Tsuhan Chen Cornell University tsuhan@ece.cornell.edu Abstract Higher order

More information

Visual Query Suggestion

Visual Query Suggestion Visual Query Suggestion Zheng-Jun Zha, Linjun Yang, Tao Mei, Meng Wang, Zengfu Wang University of Science and Technology of China Textual Visual Query Suggestion Microsoft Research Asia Motivation Framework

More information

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Akitsugu Noguchi and Keiji Yanai Department of Computer Science, The University of Electro-Communications, 1-5-1 Chofugaoka,

More information

CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES

CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES 188 CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES 6.1 INTRODUCTION Image representation schemes designed for image retrieval systems are categorized into two

More information

CLASSIFICATION Experiments

CLASSIFICATION Experiments CLASSIFICATION Experiments January 27,2015 CS3710: Visual Recognition Bhavin Modi Bag of features Object Bag of words 1. Extract features 2. Learn visual vocabulary Bag of features: outline 3. Quantize

More information

Announcements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14

Announcements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14 Announcements Computer Vision I CSE 152 Lecture 14 Homework 3 is due May 18, 11:59 PM Reading: Chapter 15: Learning to Classify Chapter 16: Classifying Images Chapter 17: Detecting Objects in Images Given

More information