CS89/189 Project Milestone: Differentiating CGI from Photographic Images
Shruti Agarwal and Liane Makatura
February 18

1 Overview

In response to the continuing improvements being made in the realms of modeling, rendering, and image manipulation, this project seeks to use deep convolutional neural networks (CNNs) as a tool to perform, and hopefully improve upon, the task of differentiating between photographs and computer-generated images (CGI). Due to the lack of a suitably large dataset (which limited our ability to directly train a neural net), we set out to test our hypothesis in the following three ways:

1. Extract the visual representations from the penultimate layer of a CNN (AlexNet) that has been trained on an unrelated natural-image dataset, such as ImageNet, then use these as features in an SVM to see whether it can effectively classify CGI and photographic images.

2. Use a relevant dataset (composed of comparable CGI and real image patches) to fine-tune the penultimate layer of the above-mentioned CNN; then repeat the process described in (1) to see whether our results improve.

3. Test the performance of our fine-tuned CNN by feeding it novel CGI and photographic input directly, to see whether it generalizes effectively at inference time.

Figure 1: Left: artist Max Edwin Wahyudi's CGI rendering of Korean actress Song Hae Kyo. Right: a photograph of the same.

Note that we have switched from VGG to AlexNet because AlexNet is faster to fine-tune, which suits our constrained time frame. While VGG might produce better results, and thus would be an interesting future extension, we believe that AlexNet will be sufficient for this project. By this milestone, we anticipated having a complete dataset, along with preliminary results from our SVM on general AlexNet features (without fine-tuning).

2 Dataset

One of our biggest hurdles in this project is the lack of a pre-existing dataset that is both relevant and sufficiently large.
Ultimately, our goal was to amass approximately 100,000 image patches for each class (CGI and photograph), for a final dataset of 200,000 patches. This section outlines our acquisition and processing pipeline.
2.1 Image Collection

To create our binary classifier, we needed to collect a two-part dataset:

Photographs. We were supplied with a dataset of approximately 96 million photographic images (courtesy of Professor Hany Farid). These images spanned a wide range of semantic content, and we aimed to match this diversity in the CGI dataset as well.

CGI. No established CGI dataset currently exists, so we created our own. We initially intended to render our own images using an Autodesk Maya cityscape model that was made available to us (courtesy of Prof. Hany Farid). However, we had two main concerns about the dataset we would amass through such an approach:

Believability. The level of photorealism that we could obtain with the provided architecture, texture maps, and lighting models did not appear convincing enough to generate challenging cases for our network. Ideally, we wanted to train our network on images that were challenging even for a human to categorize.

Homogeneity. The model had a very distinct style and relatively limited diversity of architecture, scenes/objects, textures, lighting conditions, etc. This homogeneity might tempt the network to simply learn these content-specific identifiers, rather than general features specific to the image type (photo vs. CGI).

To avoid these issues, we chose to collect our images from individual artists (namely those in the Dartmouth Digital Arts program), along with various sources around the web. In particular, we used pictures generated by professional rendering companies, such as Render Atelier or Maxwell Render, along with many personal portfolios, university competition results, and creative content-sharing sites such as cgtrader, Adobe Behance, and VRayWorld. Given the nature of these websites, the process-related content of the posts, and the structure of the sites' content tagging, we are reasonably confident that all collected images are in fact examples of CGI.
This process also gave us the benefit of collecting samples produced with varying software (Autodesk Maya, Blender, ZBrush, 3DSMax) and renderers (MentalRay, VRay, Corona, RenderMan). It is worth noting that our process ended up consuming much more time than expected, because it devolved into a manual crawling effort. Small scripts often collected many undesirable images, ranging from advertisements and banners to actual photographs. Given the quality of the renders we were gathering, it was often very difficult and time-consuming to verify the data and pick out any photographs that slipped into the CGI bin; hence, we opted to shift the time-intensive part of the process to the direct gathering. This way, we always saw the images in their respective contexts, which allowed us to be more confident in their classification.

2.2 Image Processing

After collecting sufficiently many CG images, we went through the following steps to finalize our dataset: categorizing full images, extracting patches, and extracting features. These steps were conducted for both CG and photographic images. Then, each CGI patch was matched with its nearest neighbor in the photo set, and both patches were moved to our final dataset. Details for each step are outlined below.

Preliminary Image Categorization. We fed each full image through AlexNet (the CNN we intend to use) and allowed it to be classified into one of the 1000 categories that AlexNet was trained to recognize. We were not concerned with whether or not the classification was correct; this step was simply meant to split the CGI/photo dataset into semantically similar subsets. It was conducted to reduce the time complexity of the last step (nearest-neighbor search), so that we did not have to search through nearly a billion unrelated photo patches in order to find the best possible match for a particular CGI patch.
We intuited that patches from semantically similar images would be more likely to provide good matches, so to reduce time complexity we partitioned our image sets into bins numbered according to AlexNet's classification. Thus, our file structure had folders CGI and Photo. For simplicity, we will consider one pair of corresponding class bins, CGI_i and Photo_i.
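The binning step above can be sketched as follows. This is a minimal illustration rather than the authors' actual code: `classify` stands in for a call that runs AlexNet on one image and returns its top-1 ImageNet class index (0-999).

```python
from pathlib import Path
import shutil

def bin_images_by_class(image_paths, classify, out_root):
    """Partition images into per-class folders named by AlexNet's
    top-1 prediction, e.g. out_root/979/ for class bin 979.

    classify: callable mapping an image path to an int class index.
    """
    out_root = Path(out_root)
    for path in map(Path, image_paths):
        label = classify(path)              # AlexNet top-1 class, 0..999
        dest_dir = out_root / str(label)
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest_dir / path.name)
```

Running the same function over the photo set with the same classifier produces the corresponding Photo_i bins, so each CGI_i bin has a matching pool of candidates.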
Patch Extraction. From each image in CGI_i and Photo_i, we randomly extracted 20 patches of size 227x227 to use in our final dataset. We chose to use patches instead of full images to increase the size of our dataset, reduce the amount of spatial and contextual information that the network can exploit, and increase the probability of finding a reasonably well-matching patch in the complementary image set. The patch size (227x227) was chosen to match the input size expected by AlexNet's architecture.

Feature Extraction. We fed each individual patch through AlexNet and extracted the corresponding features from the first fully connected layer (FC6) of the network. These features are used by our nearest-neighbor pairing search (below) and the SVM.

Nearest-Match Pairing. For each patch in CGI_i, we performed a nearest-neighbor search through all the candidate matches in Photo_i, using the representative feature vectors extracted from AlexNet in the previous step. We mapped in this order (CGI to photo) because we had a far larger photo dataset available to us, and we wanted to ensure that every instance of CGI was matched and included in the final dataset. The idea behind this pairing was to ensure that our dataset had challenging examples in which content was very similarly distributed between the two classes; thus, the network would be forced to learn something specific about CGI vs. photo. A snapshot of our paired datasets is shown in Figure 2.
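The patch-extraction and pairing steps can be sketched as below. This is an illustrative reconstruction, not the authors' code: patches are cropped with NumPy, and the nearest-neighbor search is a brute-force L2 search over the FC6 feature matrices (one 4096-dimensional row per patch).

```python
import numpy as np

def random_patches(image, n=20, size=227, rng=None):
    """Randomly crop n size x size patches from an H x W x C image array."""
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    out = []
    for _ in range(n):
        y = int(rng.integers(0, h - size + 1))
        x = int(rng.integers(0, w - size + 1))
        out.append(image[y:y + size, x:x + size])
    return out

def nearest_photo_match(cgi_feats, photo_feats):
    """For each row of cgi_feats, return the index of its L2-nearest
    row in photo_feats, via ||a-b||^2 = ||a||^2 - 2 a.b + ||b||^2."""
    d2 = (np.sum(cgi_feats ** 2, axis=1, keepdims=True)
          - 2.0 * cgi_feats @ photo_feats.T
          + np.sum(photo_feats ** 2, axis=1))
    return np.argmin(d2, axis=1)
```

In practice the photo side is searched bin by bin (within Photo_i only), which is exactly the time saving the preliminary categorization buys.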
Figure 2: CGI patches (top) and their nearest-neighbor photo matches (bottom, in corresponding order).

2.3 Current Dataset

At the time of SVM training, we had approximately 2,800 CG images contributing to our fully processed database, where each image was the highest available resolution. We also classified and sampled approximately 100,000 random photographs, to ensure that there were enough candidate matches for each CG
image while still maintaining reasonable time complexity for steps like the nearest-neighbor search. Extracting 20 patches from each image gave us roughly 56,000 patches each for CGI and photographs. Our data also includes roughly 800 additional CG images that have not yet been processed (and thus were not included in our initial SVM). Our recent discovery of Behance and VRayWorld, where rendered images are well tagged, will also enable us to collect the remaining data for our target set much more quickly, now that we do not have to manually crawl the web from link to link. Additionally, we found that the images we were able to collect were of very high resolution (usually 1-2K, with some approaching 4K), so we intend to extract more patches from each image to continue enlarging our dataset. We have not yet included these additional patches because we have not finished the processing, and we did not want to over-represent some images in comparison to the others. This process will easily allow us to surpass our original dataset goals, which listed 100,000 patches of size 64x64 for each category.

2.4 Known Limitations & Proposed Solutions

In collecting our dataset, we encountered the following limitations:

Presence of watermarks or signatures in CGI (typically in the lower left-hand corner). We brainstormed ways to discard the affected region of the image (by manual editing, cropping off the bottom/side, or discouraging the selection of the lower left-hand region). Ultimately, we opted not to address this issue explicitly, because 1) manual editing would increase time and destroy the integrity of the render, and 2) we did not want to sacrifice valuable, usable information that also lives in these affected regions. In practice, we expect the random patch selection to prevent this from becoming too much of an issue.
Available photorealistic CGI content is heavily skewed toward architecture and modern interior design. We intend to 1) target more organic CG images (image searches, frames of natural VR/video-game reels), and 2) mimic this bias in our photographic database by adding a number of similarly skewed images. As long as the content is equally skewed on both halves of the dataset, this should prevent our network from using particular content to determine CGI or photo classification, which is the only real concern.

Preliminary classification of the images could result in suboptimal patch matches. If time permits, we may attempt to rerun the nearest-neighbor search without this preliminary filtering to see whether our results improve. However, the purpose of this matching is simply to ensure that our network has challenging examples to learn on, and our visualizations seem to confirm that our matches are satisfactory for this purpose. As such, we are not certain that it would be worth the additional time complexity. Since we had already structured our dataset with this filtering, we have not yet run any experiments without it.

3 Preliminary Results

After generating our initial dataset, we were able to run a few preliminary tests, which are detailed below.

3.1 t-SNE Visualization

As we saw in [2], deep features have the ability to cluster not only similar-looking images, but also images from the same domain. We wanted to see whether the features we extracted from AlexNet (without fine-tuning) exhibited any inherent separation between the two classes, so we created a t-SNE visualization. The results are shown in Figure 3.
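A visualization of this kind can be produced along the following lines. This is a minimal sketch using scikit-learn's t-SNE (the text does not specify which implementation was actually used), taking the (n_patches, 4096) FC6 feature matrix as input.

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_embed(features, seed=0):
    """Project an (n_patches, 4096) FC6 feature matrix down to 2-D
    points for plotting; one input row yields one (x, y) point."""
    tsne = TSNE(n_components=2, init="pca", perplexity=30.0,
                random_state=seed)
    return tsne.fit_transform(features)

# Plotting sketch: scatter the 2-D points, coloring each by its
# (ImageNet class bin, CGI-vs-photo) label to expose both kinds
# of clustering discussed in the text.
```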
Figure 3: A 2-dimensional visualization of our 4096-dimensional feature space; three separate classes (spanning four semantic content categories, detailed in Figure 4) are represented here, as denoted in the legend.

Clearly, the images in our two sets are comparable to one another, as images within distinct ImageNet class categories (including both CGI and photos) cluster very distinctly under the AlexNet representation. We also see some separation between CGI and photos within the class clusters, giving further validation to the idea that our network will be able to differentiate between these image types.
Figure 4: The same visualization as above, but with representative thumbnails flanking each class cluster. Note that class 979 inherited both water and sky images, but they cluster very distinctly in the representation. The thumbnails are an interpretive tool; while they are drawn from the represented data, their placement does not necessarily correspond to the location of each thumbnail's representative dot in the visualization. An effort has been made to ensure that the images do not cover any data points. All images are framed in the appropriate color and labeled for clarity.

As we can see in Figures 3 and 4, the CGI and photo features show some clustering within each class itself. It is possible that AlexNet is capturing very minute appearance differences between real and CGI images. It is also possible that the AlexNet features capture some differences between CGI and real photos that generalize across all classes. To analyze this hypothesis, we used these features to train a linear SVM; the results are discussed in the following section.

3.2 SVM Classification via LIBLINEAR

We used the FC6 features obtained from a non-fine-tuned AlexNet on our CGI & real dataset and trained a linear SVM for the task of binary classification. We used the LIBLINEAR library, available at cjlin/liblinear/. We also tried to train a linear SVM from the LIBSVM package, but it took quite long to train on our dataset. We used an equal number of CGI and real image patches, i.e., the same number of patches from each category. Each feature vector is 4096-dimensional, so the full dataset is a matrix with one 4096-dimensional row per patch. We then scale the dataset so that each feature dimension lies in the [0,1] range; this is done to ensure that the training is not biased toward dimensions with large values. For training we select random instances from our dataset, and we keep the remaining instances for testing.
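One trial of this scale/split/train procedure can be sketched with scikit-learn, whose LinearSVC wraps the same LIBLINEAR solver; the 80/20 split fraction and C value here are illustrative assumptions, since the text does not give the exact split sizes or regularization strength.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import LinearSVC

def run_svm_trial(X, y, train_frac=0.8, seed=0):
    """Scale each feature dimension to [0, 1], split randomly, and fit
    an L2-regularized, L2-loss (squared hinge) linear SVM via the
    LIBLINEAR solver. Returns (train accuracy, test accuracy)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_frac, random_state=seed, stratify=y)
    scaler = MinMaxScaler().fit(X_tr)   # [0,1] scaling, fit on train only
    clf = LinearSVC(penalty="l2", loss="squared_hinge", C=1.0)
    clf.fit(scaler.transform(X_tr), y_tr)
    return (clf.score(scaler.transform(X_tr), y_tr),
            clf.score(scaler.transform(X_te), y_te))

# Averaging (train_acc, test_acc) over 10 different seeds mirrors the
# repeated random splits used for the reported accuracies.
```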
We train the linear SVM classifier using the L2-regularization and L2-loss options. We then compute classification accuracy on both training and test data, and we also analyze the accuracy for each individual class, for both the training and test sets. Due to time constraints, we repeat the training and testing procedure only 10 times, each time selecting the training and testing data randomly from the full dataset. The training and testing accuracies, averaged over the 10 experiments, are given in Figure 5. We achieve 77.7% test accuracy without fine-tuning the CNN features for our task. We can also see that the percentage of error in classifying CGI images is higher than
that of real images. We assume this is because AlexNet is trained on real images only, so the FC6 features could not capture the fine details of a CGI image that are required to differentiate it from a real image. We hope that fine-tuning our network for this task will improve the accuracy.

Figure 5: % Accuracy with linear SVM.

4 Future Work

As our preliminary results suggest, the deep features from AlexNet are able to differentiate between CGI and real images. We expect that the accuracy of our results will improve as we obtain a bigger training dataset and fine-tune AlexNet for our task. We plan to finish acquiring the dataset and start fine-tuning AlexNet; for fine-tuning, we will replace the last 1000-way classification layer with a 2-way classification layer. Currently, our dataset contains only those real image patches that AlexNet considers similar to CGI patches; by constructing it this way, we hypothesize that we are selecting examples that are genuinely difficult for AlexNet to separate into CGI and real classes. We will validate that our method of generating the dataset is better than randomly selecting images: we plan to fine-tune AlexNet separately on the data generated by our method and on a randomly generated dataset, and compare the classification accuracy obtained in each case.

5 References

1. H. Farid and M.J. Bravo. Perceptual Discrimination of Computer Generated and Photographic Faces. Digital Investigation, 8.

2. J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. ICML.
3. K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR.

4. Z. Li, Z. Zhang, and Y. Shi. Distinguishing Computer Graphics from Photographic Images Using a Multiresolution Approach Based on Local Binary Patterns. Security and Communication Networks, vol. 7, no. 11.

5. L. Fei-Fei, R. Fergus, and P. Perona. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories. CVPR.

6. K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting Visual Category Models to New Domains. ECCV.
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationMartian lava field, NASA, Wikipedia
Martian lava field, NASA, Wikipedia Old Man of the Mountain, Franconia, New Hampshire Pareidolia http://smrt.ccel.ca/203/2/6/pareidolia/ Reddit for more : ) https://www.reddit.com/r/pareidolia/top/ Pareidolia
More informationLightweight Unsupervised Domain Adaptation by Convolutional Filter Reconstruction
Lightweight Unsupervised Domain Adaptation by Convolutional Filter Reconstruction Rahaf Aljundi, Tinne Tuytelaars KU Leuven, ESAT-PSI - iminds, Belgium Abstract. Recently proposed domain adaptation methods
More informationCS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016
CS 2750: Machine Learning Neural Networks Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 Plan for today Neural network definition and examples Training neural networks (backprop) Convolutional
More informationSVM Segment Video Machine. Jiaming Song Yankai Zhang
SVM Segment Video Machine Jiaming Song Yankai Zhang Introduction Background When watching a video online, users might need: Detailed video description information Removal of repeating openings and endings
More informationIndustrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ
Stop Line Detection and Distance Measurement for Road Intersection based on Deep Learning Neural Network Guan-Ting Lin 1, Patrisia Sherryl Santoso *1, Che-Tsung Lin *ǂ, Chia-Chi Tsai and Jiun-In Guo National
More informationMeasuring Aristic Similarity of Paintings
Measuring Aristic Similarity of Paintings Jay Whang Stanford SCPD jaywhang@stanford.edu Buhuang Liu Stanford SCPD buhuang@stanford.edu Yancheng Xiao Stanford SCPD ycxiao@stanford.edu Abstract In this project,
More informationFine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task
Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Kyunghee Kim Stanford University 353 Serra Mall Stanford, CA 94305 kyunghee.kim@stanford.edu Abstract We use a
More informationCS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp
CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp Chris Guthrie Abstract In this paper I present my investigation of machine learning as
More informationCS 231A Computer Vision (Fall 2012) Problem Set 4
CS 231A Computer Vision (Fall 2012) Problem Set 4 Master Set Due: Nov. 29 th, 2012 (23:59pm) 1 Part-based models for Object Recognition (50 points) One approach to object recognition is to use a deformable
More informationComparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks
Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks Nikiforos Pittaras 1, Foteini Markatopoulou 1,2, Vasileios Mezaris 1, and Ioannis Patras 2 1 Information Technologies
More informationREGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION
REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological
More informationFinal Report: Smart Trash Net: Waste Localization and Classification
Final Report: Smart Trash Net: Waste Localization and Classification Oluwasanya Awe oawe@stanford.edu Robel Mengistu robel@stanford.edu December 15, 2017 Vikram Sreedhar vsreed@stanford.edu Abstract Given
More informationObject detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation
Object detection using Region Proposals (RCNN) Ernest Cheung COMP790-125 Presentation 1 2 Problem to solve Object detection Input: Image Output: Bounding box of the object 3 Object detection using CNN
More informationFuzzy Set Theory in Computer Vision: Example 3
Fuzzy Set Theory in Computer Vision: Example 3 Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Purpose of these slides are to make you aware of a few of the different CNN architectures
More informationCountermeasure for the Protection of Face Recognition Systems Against Mask Attacks
Countermeasure for the Protection of Face Recognition Systems Against Mask Attacks Neslihan Kose, Jean-Luc Dugelay Multimedia Department EURECOM Sophia-Antipolis, France {neslihan.kose, jean-luc.dugelay}@eurecom.fr
More informationUnsupervised Deep Learning. James Hays slides from Carl Doersch and Richard Zhang
Unsupervised Deep Learning James Hays slides from Carl Doersch and Richard Zhang Recap from Previous Lecture We saw two strategies to get structured output while using deep learning With object detection,
More informationFaster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects
More informationComputer Vision Lecture 16
Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period
More informationAutomatic Detection of Multiple Organs Using Convolutional Neural Networks
Automatic Detection of Multiple Organs Using Convolutional Neural Networks Elizabeth Cole University of Massachusetts Amherst Amherst, MA ekcole@umass.edu Sarfaraz Hussein University of Central Florida
More informationAn Implementation on Histogram of Oriented Gradients for Human Detection
An Implementation on Histogram of Oriented Gradients for Human Detection Cansın Yıldız Dept. of Computer Engineering Bilkent University Ankara,Turkey cansin@cs.bilkent.edu.tr Abstract I implemented a Histogram
More informationLinear combinations of simple classifiers for the PASCAL challenge
Linear combinations of simple classifiers for the PASCAL challenge Nik A. Melchior and David Lee 16 721 Advanced Perception The Robotics Institute Carnegie Mellon University Email: melchior@cmu.edu, dlee1@andrew.cmu.edu
More informationObject Detection. Sanja Fidler CSC420: Intro to Image Understanding 1/ 1
Object Detection Sanja Fidler CSC420: Intro to Image Understanding 1/ 1 Object Detection The goal of object detection is to localize objects in an image and tell their class Localization: place a tight
More informationStudy of Residual Networks for Image Recognition
Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks
More informationSemantic Segmentation
Semantic Segmentation UCLA:https://goo.gl/images/I0VTi2 OUTLINE Semantic Segmentation Why? Paper to talk about: Fully Convolutional Networks for Semantic Segmentation. J. Long, E. Shelhamer, and T. Darrell,
More informationData Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005
Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Abstract Deciding on which algorithm to use, in terms of which is the most effective and accurate
More informationLab 9. Julia Janicki. Introduction
Lab 9 Julia Janicki Introduction My goal for this project is to map a general land cover in the area of Alexandria in Egypt using supervised classification, specifically the Maximum Likelihood and Support
More informationDeep Convolutional Neural Networks. Nov. 20th, 2015 Bruce Draper
Deep Convolutional Neural Networks Nov. 20th, 2015 Bruce Draper Background: Fully-connected single layer neural networks Feed-forward classification Trained through back-propagation Example Computer Vision
More informationFashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for Feature Extraction SUPPLEMENTAL MATERIAL
Fashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for Feature Extraction SUPPLEMENTAL MATERIAL Edgar Simo-Serra Waseda University esimo@aoni.waseda.jp Hiroshi Ishikawa Waseda
More informationFACE DETECTION AND LOCALIZATION USING DATASET OF TINY IMAGES
FACE DETECTION AND LOCALIZATION USING DATASET OF TINY IMAGES Swathi Polamraju and Sricharan Ramagiri Department of Electrical and Computer Engineering Clemson University ABSTRACT: Being motivated by the
More informationSketchable Histograms of Oriented Gradients for Object Detection
Sketchable Histograms of Oriented Gradients for Object Detection No Author Given No Institute Given Abstract. In this paper we investigate a new representation approach for visual object recognition. The
More informationImage Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction by Noh, Hyeonwoo, Paul Hongsuck Seo, and Bohyung Han.[1] Presented : Badri Patro 1 1 Computer Vision Reading
More informationCS230: Lecture 3 Various Deep Learning Topics
CS230: Lecture 3 Various Deep Learning Topics Kian Katanforoosh, Andrew Ng Today s outline We will learn how to: - Analyse a problem from a deep learning approach - Choose an architecture - Choose a loss
More informationCharacter Recognition from Google Street View Images
Character Recognition from Google Street View Images Indian Institute of Technology Course Project Report CS365A By Ritesh Kumar (11602) and Srikant Singh (12729) Under the guidance of Professor Amitabha
More informationDeep Learning in Visual Recognition. Thanks Da Zhang for the slides
Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object
More informationConvolutional Neural Networks + Neural Style Transfer. Justin Johnson 2/1/2017
Convolutional Neural Networks + Neural Style Transfer Justin Johnson 2/1/2017 Outline Convolutional Neural Networks Convolution Pooling Feature Visualization Neural Style Transfer Feature Inversion Texture
More informationA HMAX with LLC for Visual Recognition
A HMAX with LLC for Visual Recognition Kean Hong Lau, Yong Haur Tay, Fook Loong Lo Centre for Computing and Intelligent System Universiti Tunku Abdul Rahman, Kuala Lumpur, Malaysia {laukh,tayyh,lofl}@utar.edu.my
More informationRecognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)
Recognition of Animal Skin Texture Attributes in the Wild Amey Dharwadker (aap2174) Kai Zhang (kz2213) Motivation Patterns and textures are have an important role in object description and understanding
More informationTwo-Stream Convolutional Networks for Action Recognition in Videos
Two-Stream Convolutional Networks for Action Recognition in Videos Karen Simonyan Andrew Zisserman Cemil Zalluhoğlu Introduction Aim Extend deep Convolution Networks to action recognition in video. Motivation
More informationContent-Based Image Recovery
Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose
More informationARE you? CS229 Final Project. Jim Hefner & Roddy Lindsay
ARE you? CS229 Final Project Jim Hefner & Roddy Lindsay 1. Introduction We use machine learning algorithms to predict attractiveness ratings for photos. There is a wealth of psychological evidence indicating
More information