Interpretable Compressed Domain Video Annotation
TU Berlin IC / Fraunhofer HHI, TU Berlin IDA

© 2013 Berlin Big Data Center (BBDC). All Rights Reserved.

Internet traffic by 2019:
- 80% video data (compressed, multimodal)
- 20% other data
- 1.6 zettabyte

- Large databases (YouTube, Netflix)
- New streaming applications (autonomous driving, Industry 4.0)
- Computationally heavy methods (deep learning)

Requirements -> Solutions:
- Compressed domain analysis -> motion vector based models
- Efficient (generic) representation -> Fisher Vectors
- Multimodal integration -> multi-stream networks
- Interpretable analysis -> LRP & Deep Taylor analysis

Projects done

Compressed Domain Video Analysis:
- Motion vector histograms & Fisher Vector representation
- Tracking in compressed domain based on Markov Random Field (MRF)
- Robustness of pixel domain vs. compressed domain methods
- Multi-stream convolutional neural networks + LSTM

Interpretable machine learning:
- Layer-wise Relevance Propagation
- Deep Taylor Decomposition
- Evaluating the visualizations
- Application to images, videos, text, time series

Scalable Retrieval:
- Multi-purpose Locality Sensitive Hashing (mplsh)
- Similarity Search in Compressed Domain

Publications: PLOS15, MMSP16, EUVIP16, JMLR16, ICANN16, ICIP16, ICML16, NIPS16, TNNLS16, CVPR16, ICISA16, ICIP216, GCPR16, ACL16, JNM16

Interpretable Compressed Domain Video Annotation

Requires only partial decoding of motion vectors, transform coefficients, block coding modes, etc.

Direct analysis of compressed videos reduces:
- storage and bandwidth requirements
- computational overhead

Motion vectors as features

[Figure: video sequence, motion vectors, optical flow]

Motion vectors can be extracted from the compressed video, BUT:
- They do not necessarily represent the true motion.
- They are much sparser and have lower resolution than e.g. optical flow.
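
As a rough illustration of using motion vectors as features, the sketch below bins a block-level motion-vector field into a magnitude-weighted orientation histogram, in the spirit of a HOF descriptor. The array layout (one (dx, dy) pair per block), the number of bins and the function name are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def orientation_histogram(mv, n_bins=8, eps=1e-8):
    """Magnitude-weighted histogram of motion-vector directions.

    mv: array of shape (H, W, 2) with one (dx, dy) vector per block
        (hypothetical layout; real codecs store vectors per partition).
    """
    dx = mv[..., 0].ravel()
    dy = mv[..., 1].ravel()
    mag = np.hypot(dx, dy)                      # vector lengths
    ang = np.arctan2(dy, dx) % (2 * np.pi)      # directions in [0, 2*pi)
    # Map each direction to a bin; clamp guards against rounding up to n_bins.
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins, weights=mag, minlength=n_bins)
    return hist / (hist.sum() + eps)            # normalise to (roughly) unit sum
```

Because blocks with zero motion contribute zero weight, a static background does not dominate the descriptor.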

Fisher Vector based video annotation
1. Extract the motion vectors.
2. Compute HOF and MBH features and stack all cubes over all time slices.
3. Cluster the descriptors using a GMM.
4. Compute Fisher Vectors and stack them.
5. Annotate the video with a linear SVM classifier.
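
Step 4 of the pipeline above can be sketched as follows. This is a minimal NumPy version of the usual Fisher Vector encoding under a diagonal-covariance GMM, assuming the GMM parameters have already been fitted in step 3. The function name and the final power/L2 normalisation are assumptions (the slides do not say which normalisation, if any, was used).

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Fisher Vector of descriptors X under a diagonal-covariance GMM.

    X: (N, D) local descriptors; weights: (K,); means, variances: (K, D).
    Returns a vector of length 2*K*D (gradients w.r.t. means and variances).
    """
    N, D = X.shape
    diff = X[:, None, :] - means[None, :, :]             # (N, K, D)
    # Log-density of each descriptor under each weighted Gaussian component.
    log_p = -0.5 * (np.sum(diff**2 / variances, axis=2)
                    + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_p = log_p + np.log(weights)
    # Posteriors (soft assignments), computed stably by shifting the maximum.
    log_p = log_p - log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma = gamma / gamma.sum(axis=1, keepdims=True)     # (N, K)

    z = diff / np.sqrt(variances)                        # whitened residuals
    g_mu = np.einsum("nk,nkd->kd", gamma, z) / (N * np.sqrt(weights)[:, None])
    g_var = (np.einsum("nk,nkd->kd", gamma, z**2 - 1)
             / (N * np.sqrt(2 * weights)[:, None]))
    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    # Power and L2 normalisation: a common post-processing step, assumed here.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)
```

In practice the GMM could be fitted with a library such as scikit-learn (`GaussianMixture` with `covariance_type="diag"`), and the resulting vectors passed to a linear SVM for step 5.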

Interpretable classification
- Black-box classifier: wrong prediction ("cat")
- Famous example: good weather -> no tanks, bad weather -> tanks

Interpretable classification
- The first step in improving ML algorithms is to understand their weaknesses.
- Interpretability also has a legal aspect (EU's "right to explanation" regulation by 2018): a bank has to explain why you don't get a loan, in order to avoid discrimination by algorithms.
- Also important in the sciences: new hypotheses through a better understanding of what's going on (e.g. genetic studies).
- Interpretability helps to retain human responsibility: important in e.g. medical applications, where the ML algorithm is just a helping tool (the medical doctor is responsible).

Interpretable classification
Main idea: [figure: ladybug]

Interpretable classification
Classification [figure: output classes cat, ladybug, dog]

Interpretable classification
Explanation [figure: output classes cat, ladybug, dog]
Initialization: relevance at the output layer = classifier output f(x)

Interpretable classification
Explanation? [figure: output classes cat, ladybug, dog]
Theoretical interpretation: (Deep) Taylor Decomposition (Montavon et al., arXiv 2015)
Relevance of upper layers is redistributed to lower layers proportionally (depending on activations & weights).

Interpretable classification
Explanation [figure: output classes cat, ladybug, dog]
Relevance Conservation Property
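
The slide's formulas did not survive transcription, so the symbols below are assumptions that follow the notation commonly used for LRP: a_i is the activation of neuron i in a lower layer, w_ij the weight connecting it to neuron j in the layer above, and R_j the relevance assigned to neuron j. In its basic form (no bias or stabiliser terms), proportional redistribution and the resulting conservation property can be written as:

```latex
R_i = \sum_j \frac{a_i \, w_{ij}}{\sum_{i'} a_{i'} \, w_{i'j}} \, R_j ,
\qquad
\sum_i R_i = \sum_j R_j = \dots = f(x)
```

Summing the left-hand rule over i cancels the fraction for every j, which is why the total relevance is preserved from layer to layer.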

Interpretable classification
We can explain the compressed domain classifier by LRP:
- Redistribute relevance from output to input in a meaningful manner
- Observe the layer-wise relevance conservation principle
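
The two points above can be illustrated with a minimal sketch of LRP on a tiny bias-free ReLU network. The network, its random weights and the input vector are hypothetical stand-ins (not the authors' compressed domain classifier), and the small stabiliser `eps` is an assumption added to avoid division by zero.

```python
import numpy as np

def lrp_dense(a, W, R_out, eps=1e-9):
    """Redistribute relevance R_out of a dense layer onto its inputs a.

    a: (I,) input activations; W: (I, J) weights; R_out: (J,) relevances.
    Each input receives a share proportional to its contribution a_i * w_ij.
    """
    z = a[:, None] * W                        # contributions z_ij, shape (I, J)
    denom = z.sum(axis=0)                     # total input to each output neuron
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)  # avoid division by zero
    return (z / denom) @ R_out                # R_i = sum_j z_ij / z_j * R_j

# Tiny bias-free ReLU network with hypothetical random weights.
rng = np.random.default_rng(0)
x = rng.random(6)                             # stand-in for compressed-domain features
W1 = rng.normal(size=(6, 4))
W2 = rng.normal(size=(4, 3))

h = np.maximum(0.0, x @ W1)                   # hidden activations
scores = h @ W2                               # class scores
c = int(np.argmax(scores))                    # predicted class

R_top = np.zeros(3)
R_top[c] = scores[c]                          # initialise with the predicted score
R_hidden = lrp_dense(h, W2, R_top)            # output layer -> hidden layer
R_input = lrp_dense(x, W1, R_hidden)          # hidden layer -> input features
```

Under these assumptions, `R_hidden.sum()` and `R_input.sum()` both match `scores[c]` up to the stabiliser, which is the layer-wise conservation principle; the entries of `R_input` indicate how much each input feature contributed to the prediction.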

Demo: Interpretable Compressed Domain Video Annotation