EE 6882 Statistical Methods for Video Indexing and Analysis


1 EE 6882 Statistical Methods for Video Indexing and Analysis
  Fall 2004, Prof. Shih-Fu Chang
  Lecture 1, Part A (9/8/04)

2 EE E6882 SVIA Lecture #1
  Part I
  - Introduction
  - Course syllabus
  - Readings
    - A. Jain et al., "Statistical Pattern Recognition: A Review," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, Jan.
    - Gonzalez and Woods, Digital Image Processing, 2nd edition, Prentice Hall, 2001 (Chapter 12, Object Recognition)
    - Anil K. Jain, Fundamentals of Digital Image Processing, Prentice Hall (Chapter 9.14)
  Part II
  - Introduction of a simple image search system
  - Image feature extraction
  - Similarity matching, performance metrics
  - Readings
    - J. R. Smith and S.-F. Chang, "Visually Searching the Web for Content," IEEE Multimedia Magazine, Summer, vol. 4, no. 3, pp. 12-20
    - John R. Smith and Shih-Fu Chang, "VisualSEEk: A Fully Automated Content-Based Image Query System," in ACM Multimedia, Boston, MA, November

3 Problems in Video Indexing and Analysis
  - Indexing, search, and retrieval for images and videos
    - See Columbia's WebSEEk and EdSearch demos
    - Google image search?
    - Find video clips of a basketball going through the hoop
    - Find images containing the shape shown in a sketch
  - Automatic annotation of visual content (e.g., recognition of text, face, scene, vehicle, location)
  - Automatic parsing of video programs into structures (e.g., breaking videos into shots, scenes, and stories)
  - Event detection (e.g., sports events, human activities, meetings, medical, and other spatio-temporal patterns)
  - Summary (e.g., topic clustering, highlight generation)
    - See Columbia's sports highlight and news topic clustering demos

4 Examples of object recognition and structure parsing problems
  - How to detect and recognize the characters and words? (Demo)
  - How to detect the boundaries of programs, stories, and commercials?
  [Figure: a story divided into shots, including anchor shots]

5 Statistical Paradigm
  - Many problems can be posed as pattern recognition (e.g., Matlab statistical classification demo)
  - Statistical models handle uncertainty and provide flexibility
  - Rich tools for learning and prediction
  - Image processing toolkits available
  - Increasing benchmark data (e.g., NIST TREC Video)

6 A Very High-Level Stat. Pattern Recog. Architecture
  (From Jain, Duin, and Mao, SPR Review, 99)

7 Important issues
  - Image/video preprocessing: quality, resolution, etc.
  - Feature extraction: color, texture, motion, shape, layout, regions, parts, etc.
  - Feature representation
    - Discrete vs. continuous, vectorization, dimension
    - Invariance to scale, rotation, translation
  - Feature selection: PCA, MDS, Kernel PCA, etc.
  - Classification models
    - Generative vs. discriminative
    - Multimodal fusion: early fusion vs. late fusion
  - Size of training/test data and manual supervision efforts
  - Validation and evaluation processes
  - Complexity

8 Some examples of feature representation
  - Features determine the patterns and their separability
  - E.g., angular distance for closed shapes
  - E.g., part features for iris flowers

9 Another example of feature
  - Bankers Association font used on personal checks
  - Uses magnetic ink and a reader to simplify segmentation
  - Feature: the horizontal scan of the rate of increase/decrease of the character area
  - Peaks and zeros are arranged to fall on the vertical grid lines, so they can be sampled accurately
  - Patterns can be easily distinguished
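The scan feature on the slide above can be illustrated with a minimal sketch. It assumes a binary glyph image (1 = ink) and takes the "character area" at each horizontal position to be the ink count in that column; the glyph, the function names, and the blank-column padding are hypothetical choices for illustration, not details of the actual check-reading hardware.

```python
def column_area_profile(glyph):
    """Count ink pixels in each column of a binary glyph (list of equal-length rows)."""
    width = len(glyph[0])
    return [sum(row[c] for row in glyph) for c in range(width)]


def area_rate_of_change(profile):
    """Difference between consecutive column areas, scanning left to right.

    The profile is padded with one blank column on each side so that the
    leading and trailing edges of the character also show up as peaks.
    """
    padded = [0] + profile + [0]
    return [padded[i + 1] - padded[i] for i in range(len(padded) - 1)]


# Hypothetical 5x4 glyph shaped like a box; not a real font character.
GLYPH = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

profile = column_area_profile(GLYPH)   # ink per column
signal = area_rate_of_change(profile)  # positive = area increasing, negative = decreasing
print(profile, signal)
```

Each character then yields a short signed sequence of peaks and zeros, which is the kind of one-dimensional pattern that can be compared directly.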

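10 Classification Paradigms
  - Discriminative: a discriminant function f(x) defines the decision boundary, with f(x) > 0 on one side and f(x) < 0 on the other
    [Figure: two classes separated by a decision boundary over axes x1, x2]
  - Probabilistic: compare the likelihoods P(x | C=1) > or < P(x | C=2) to decide C(x0) = ? for a new sample x0
    [Figure: likelihood curves of Class 1 and Class 2 over x (height, income, ...)]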
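The two paradigms on the slide above can be contrasted in a small sketch. It assumes a linear discriminant f(x) = w . x + b for the discriminative case and one-dimensional Gaussian class likelihoods for the probabilistic case; the weights, means, and standard deviations are hypothetical values chosen only for illustration.

```python
import math


def linear_discriminant(x, w, b):
    """Discriminative view: f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify_discriminative(x, w, b):
    """Class 1 if f(x) > 0, otherwise class 2."""
    return 1 if linear_discriminant(x, w, b) > 0 else 2


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as the class likelihood P(x | C)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def classify_by_likelihood(x, params1, params2):
    """Probabilistic view: pick the class whose likelihood P(x | C) is larger."""
    return 1 if gaussian_pdf(x, *params1) > gaussian_pdf(x, *params2) else 2


# Hypothetical discriminant: f(x) = x1 - x2
W, B = (1.0, -1.0), 0.0
label_a = classify_discriminative((2.0, 1.0), W, B)  # f = 1
label_b = classify_discriminative((1.0, 3.0), W, B)  # f = -2

# Hypothetical (mean, std) of height in cm for each class
CLASS1 = (160.0, 10.0)
CLASS2 = (180.0, 10.0)
label_c = classify_by_likelihood(165.0, CLASS1, CLASS2)
label_d = classify_by_likelihood(178.0, CLASS1, CLASS2)
print(label_a, label_b, label_c, label_d)
```

The discriminative rule never models how each class is distributed; the probabilistic rule does, which is why it needs per-class parameters estimated from training data.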

11 Training / Validation / Testing
  - Training: optimal features, models, parameters
  - Validation: select the optimal hypothesis through validation
  - Testing: evaluate performance over test data
  - Assume the same distribution in the different sets; otherwise the optimal solution from validation may not be optimal on the test data
  [Figure: scatter plots of the training, validation, and testing sets over axes x(1), x(2)]

12 Training / Validation / Testing (cont.)
  - Multiple validation sets can be used for different optimization steps
    - Val 1: optimal classifier using feature 1; optimal classifier using feature 2
    - Val 2: optimal classifier fusing multiple features
  - Cross validation, leave-one-out
    - Partition the data into folds 1, 2, ..., K, split between training and testing
    - Rotate the choice of the test set and average the performance over runs
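The rotation described on the slide above can be sketched as a K-fold loop. This is a minimal illustration, assuming contiguous folds and a toy threshold classifier; `train_threshold`, `accuracy`, and the data values are hypothetical stand-ins for a real model and dataset. Setting K to the number of samples gives leave-one-out.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(samples, labels, k, train_fn, score_fn):
    """Rotate the held-out fold and average the score over the k runs."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        scores.append(score_fn(model,
                               [samples[i] for i in held_out],
                               [labels[i] for i in held_out]))
    return sum(scores) / len(scores)


def train_threshold(xs, ys):
    """Toy model: threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def accuracy(threshold, xs, ys):
    """Fraction of held-out samples on the correct side of the threshold."""
    hits = sum(1 for x, y in zip(xs, ys) if (1 if x > threshold else 0) == y)
    return hits / len(xs)


# Hypothetical 1-D data: class 0 near 0.2, class 1 near 0.8
X = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
Y = [0, 1, 0, 1, 0, 1]
mean_accuracy = cross_validate(X, Y, 3, train_threshold, accuracy)
print(mean_accuracy)
```

Real data should be shuffled (or stratified by class) before folding; the contiguous split is kept here only to make the rotation easy to follow.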

13 Curse of Dimensionality and Overtraining
  - Rule of thumb: (# of training patterns per class) / (# of features) > 10
  [Figure: a case of overtraining, plotted over axes x(1), x(2)]
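As a quick worked example of the rule of thumb above, the sketch below checks the ratio for a given setup and inverts it to get the largest feature count that still satisfies it. The function names and the sample counts are hypothetical; only the threshold of 10 comes from the slide.

```python
def patterns_per_feature(n_per_class, n_features):
    """Ratio used in the rule of thumb."""
    return n_per_class / n_features


def satisfies_rule_of_thumb(n_per_class, n_features, min_ratio=10):
    """True when (# training patterns per class) / (# features) > min_ratio."""
    return patterns_per_feature(n_per_class, n_features) > min_ratio


def max_features(n_per_class, min_ratio=10):
    """Largest feature count that keeps the ratio strictly above min_ratio."""
    limit = n_per_class // min_ratio
    if limit * min_ratio == n_per_class:
        limit -= 1  # an exact ratio of min_ratio does not satisfy the strict inequality
    return max(limit, 0)


# Hypothetical setup: 500 training images per class
ok_small = satisfies_rule_of_thumb(500, 20)   # ratio 25
ok_large = satisfies_rule_of_thumb(500, 64)   # ratio about 7.8
budget = max_features(500)
print(ok_small, ok_large, budget)
```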

14 About the course
  - Objectives
    - Learn how to formulate and solve problems in this field: feature extraction, object/event recognition, structure detection, video search and retrieval
    - Get insights into and experience with recent machine learning techniques: statistical, Bayesian, neural network, PCA, HMM, SVM
    - Have fun experimenting with actual visual classification/indexing problems
  - Intended audience
    - Beginning graduate students or professionals
    - Familiar with signal/image processing
    - Comfortable with probability, statistics, linear algebra, and some machine learning

15 Course Format Overview
  - Lectures + student presentations + final projects
  - I will give several overview lectures at the beginning
  - Student paper presentation
    - One paper assigned to each student
    - Assignments determined 3 weeks in advance
    - CVN students present over the phone
    - Everyone writes comments before and after class on the class wiki site (starting the 3rd week)
  - One written exam after all presentations
    - Tests understanding of concepts discussed throughout the course
  - One term project at the end of the course
  - Grading
    - Paper presentation/demo: 30%
    - Exam: 30%
    - Final project: 40%

16 Paper review and demo
  - Each student discusses the paper and demo with me and the TA 2 weeks before class
    - Week 1: review and research
    - Week 2: simulate a toy problem using available data sets and tools
    - Week 3: prepare presentation
  - Upload the slides and code to the class wiki site before class
  - Presentation: 30 minutes per paper (including demo)
  - I will provide additional materials about the subject

17 Paper Review and Demo (2)
  - Review
    - Background review and examples
    - Problem addressed and main ideas
    - Insights about why it works
    - Limitation, generality, and repeatability
    - Alternatives and comparisons
  - Demo
    - Software and data available and repeatable?
    - Reconstruct the method and try it on a toy data set? (from some available generic toolkit)
    - Analysis of results (not just accuracy numbers; offer explanations and verifiable theories about observations)
    - Demo code archived on the class site and shared with others

18 Resources and Matlab
  - Links on the class web site
    - Tutorials on paper writing, Matlab, etc.
    - Software links on the web site to Matlab, Neural Network, HMM, Netlab, SVM
  - SVIA EE6882 class dataset
    - Benchmark data set: a few thousand images from broadcast news and stock photos
    - Extracted features and labels
    - Will be distributed on a DVD for class project use only
  - Matlab is recommended for programming
    - Accessible in the Mudd 251 Computer Lab
    - Need a CU ACIS account
    - Very brief introduction next week

19 Paper categories
  - Problems
    - Feature extraction and image search
    - Image/video classification
    - Interactive image retrieval
    - Video structure parsing
    - Multimedia information retrieval
  - Statistical techniques
    - Bayesian, factor graph, graphical model
    - SVM and variations
    - Language model, relevance model from IR
    - HMM and variations
    - Others

20 A few papers reviewed last year


21 Maximum Entropy Fusing (Hsu and Chang)
  - Objective: is there a story boundary at time τ_k?
  - Candidate points: τ_k = {shot boundaries or significant pauses}
  - Observations from {video, audio} around τ_k (between τ_{k-1} and τ_{k+1}):
    - A static face?
    - Motion energy changes?
    - {cue words}_i appear? {cue words}_j appear?
    - Change from music to speech?
    - Speech segment?
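For reference, the general conditional maximum entropy model has the exponential form below. This is the textbook form, given as a sketch of how binary cues like those on the slide above could be fused; the symbols are assumptions introduced here (b for the boundary label, x for the observations around τ_k, f_i for binary feature functions such as "a static face appears", λ_i for learned weights) and are not taken from the paper itself.

```latex
P(b \mid x) \;=\; \frac{\exp\!\Big(\sum_i \lambda_i\, f_i(x, b)\Big)}
                       {\sum_{b' \in \{0,1\}} \exp\!\Big(\sum_i \lambda_i\, f_i(x, b')\Big)},
\qquad b \in \{0, 1\}
```

A candidate point τ_k would then be declared a story boundary when P(b = 1 | x) exceeds a chosen threshold, for example 0.5.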

22 Bayesian Image Classification (Vailaya et al., 98 and 01)
  - How to select the categories and tree?
  - How to estimate the distributions of features for each class?

23 Concept (In)Dependence (Naphade et al.)

24 Boosting (Tieu and Viola)
  - Extract > 45K selective, efficient features by multiscale filtering
  - Classifier combination and sample reweighting
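The "classifier combination and sample reweighting" idea above can be sketched with standard discrete AdaBoost over decision stumps. This is a generic textbook version, not necessarily the exact variant or feature set used in the paper; the one-dimensional toy data and all function names are hypothetical.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Weak learner: +1 if polarity * (x[feature] - threshold) > 0, else -1."""
    return 1 if polarity * (x[feature] - threshold) > 0 else -1


def best_stump(samples, labels, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({x[feature] for x in samples}):
            for polarity in (1, -1):
                err = sum(w for x, y, w in zip(samples, labels, weights)
                          if stump_predict(x, feature, threshold, polarity) != y)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost(samples, labels, rounds):
    """Return a list of (alpha, feature, threshold, polarity) weak classifiers."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = best_stump(samples, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        # Sample reweighting: misclassified samples gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, feature, threshold, polarity))
                   for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Classifier combination: sign of the alpha-weighted vote."""
    score = sum(alpha * stump_predict(x, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score > 0 else -1


# Hypothetical toy data that no single stump separates
X = [[0], [1], [2], [3], [4], [5]]
Y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, Y, rounds=2)
print(ensemble)
```

On this toy data the first stump misclassifies x = 4 and x = 5; reweighting raises their weights, so the second stump is chosen to get them right. More rounds add further stumps in the same way.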

25 Boosting retrieval interface
  - Two-class problem: relevant vs. irrelevant
  - User-selected examples
  - 20 retrieval results
  - Real-time evaluation of 20 features over millions of images
  - Negative images in the training set close to the decision boundary
  - Images in the testing set close to the decision boundary

26 Object-Word Correspondence (Duygulu et al.)
  - Model the joint distribution between words and blobs
  - Used in automatic annotation and retrieval

27 Unsupervised Video Structure Discovery: Hierarchical Hidden Markov Model (Xie et al.)
  - Learning multi-level Markovian temporal dependence
  - High-level states represent distinct events
  - Presence of each event produces observations modeled by low-level HMMs
  - Baseball example
    - Top-level states: running, pitching, break
    - Bottom-level states: field, bird view, 1st base, bench, close up, audience, pitcher, batter
  [Figure: top-level and bottom-level states plotted over time]
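To make the state-decoding idea concrete, here is a minimal Viterbi sketch for a flat, single-level HMM. It is a deliberate simplification and not the hierarchical model of the paper: the three top-level events from the slide are used as hidden states, three of the bottom-level labels are treated as directly observed symbols, and every probability is a hypothetical value chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a flat discrete HMM (log domain).

    All probabilities are assumed to be strictly positive.
    """
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev_best, new_score = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            prev_best[s] = best_prev
            new_score[s] = (score[best_prev] + math.log(trans_p[best_prev][s])
                            + math.log(emit_p[s][obs]))
        back.append(prev_best)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for prev_best in reversed(back):
        path.append(prev_best[path[-1]])
    path.reverse()
    return path


STATES = ("pitching", "running", "break")
START = {s: 1.0 / 3.0 for s in STATES}
# Hypothetical transitions: events tend to persist from shot to shot.
TRANS = {a: {b: (0.6 if a == b else 0.2) for b in STATES} for a in STATES}
# Hypothetical emissions: each event favors one shot type.
EMIT = {
    "pitching": {"pitcher": 0.8, "field": 0.1, "audience": 0.1},
    "running": {"pitcher": 0.1, "field": 0.8, "audience": 0.1},
    "break": {"pitcher": 0.1, "field": 0.1, "audience": 0.8},
}

shots = ["pitcher", "pitcher", "field", "field", "audience"]
decoded = viterbi(shots, STATES, START, TRANS, EMIT)
print(decoded)
```

In a hierarchical HMM each top-level state would itself run a low-level HMM over the shot types, and in the unsupervised setting the parameters would be learned from data rather than fixed by hand as they are here.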
