Project: Standard Framework of Video Content Processing

Similar documents
LATIHAN Identify the use of multimedia in various fields.

Parcel QA/QC: Video Script. 1. Introduction 1

SMK SEKSYEN 5,WANGSAMAJU KUALA LUMPUR FORM

Lecture 1: January 22

Principles of Computer Game Design and Implementation. Lecture 3

Module 10A Lecture - 20 What is a function? Why use functions Example: power (base, n)

Lecture 1: January 23

Aim of this paper is to describe the motivation and the concept of the Decibel project and related technologies.

!!!!!! Portfolio Summary!! for more information July, C o n c e r t T e c h n o l o g y

Chapter 5 - Input / Output

Balancing the pressures of a healthcare SQL Server DBA

Question #1: 1. The assigned readings use the phrase "Database Approach." In your own words, what is the essence of a database approach?

The Analysis and Design of the Object-oriented System Li Xin 1, a

Introduction to Personal Computing

Automatic Crediting on Sona

The core logic manages the state of the traffic light, properly sequencing the signals.

Windows XP. A Quick Tour of Windows XP Features

OO Development and Maintenance Complexity. Daniel M. Berry based on a paper by Eliezer Kantorowitz

I m going to be introducing you to ergonomics More specifically ergonomics in terms of designing touch interfaces for mobile devices I m going to be

ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES

Principles of Algorithm Design

Yammer Product Manager Homework: LinkedІn Endorsements

Systems Ph.D. Qualifying Exam

Lecture 04 FUNCTIONS AND ARRAYS

Driven by a passion to develop our customers, SuperOffice has become one of Europes leading providers of CRM solutions.

V6 Programming Fundamentals: Part 1 Stored Procedures and Beyond David Adams & Dan Beckett. All rights reserved.

Big Ideas. Chapter Computational Recipes

turning data into dollars

Library Website Migration and Chat Functionality/Aesthetics Study February 2013

Cold, Hard Cache KV? On the implementation and maintenance of caches. who is

DIABLO VALLEY COLLEGE CATALOG

SOFTWARE REQUIREMENT REUSE MODEL BASED ON LEVENSHTEIN DISTANCES

20 reasons why the Silex PTE adds value to your collaboration environment

Beyond Gnuplot: An introduction to POV-Ray

Note Takers: Chau, Bao Kham (cs162-bb) and Quang Tran (cs162-bc) Topic: Real time system

Research Statement. Yehuda Lindell. Dept. of Computer Science Bar-Ilan University, Israel.

Getting started 7. Setting properties 23

What is SCADA? What is Telemetry? What is Data Acquisition? Differences between SCADA and DCS? Components of SCADA. Field Instrumentation

Midterm Reminders. Design Pattern #6: Facade. General Idea. The Situation

Responsive Web Design Discover, Consider, Decide

Know which Edition is right for you by using this sales tool!

Sennheiser Business Solutions. Visitor guidance & audio streaming

WEBINARS AS AN EDUCATIONAL TOOL

Definition: A data structure is a way of organizing data in a computer so that it can be used efficiently.

Software Design. Introduction. Software Design (Introduction) SERG

Software Architectures

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis

Independent Solution Review AppEnsure for Citrix Monitoring

WELCOME TO OUR PRESENTATION. Erasmus + Project IT CAREER START-UP RO01-KA

Lecture 19: Generative Adversarial Networks

AGIIS Duplicate Prevention

Log into your Participant Centre and you'll find five steps to fundraising success, and all the tools you need.

Let s begin with the very beginning...

Algorithms in Systems Engineering IE172. Midterm Review. Dr. Ted Ralphs

GETTING STARTED WITH. HOW TO CREATE A KAHOOT Step-by-step guide

VERTICAL CONFERENCE Juni 2018

BUYING DECISION CRITERIA WHEN DEVELOPING IOT SENSORS

DEFERRED ACCEPTANCE 2

Interactive Map Project. Sophie 70227

The Software System for controlling the RF Power Plant.

International Journal of Advance Engineering and Research Development

An initial evaluation of ROP-based JIT-compilation

This paper was presented at DVCon-Europe in November It received the conference Best Paper award based on audience voting.

Chapter01.fm Page 1 Monday, August 23, :52 PM. Part I of Change. The Mechanics. of Change

How Your First Program Works

5-1McGraw-Hill/Irwin. Copyright 2007 by The McGraw-Hill Companies, Inc. All rights reserved.

Example of Focus Group Discussion Guide (Caregiver)

Introduction CHAPTER1. Strongly Recommend: Guidelines that, if not followed, could result in an unusable application.

This is an oral history interview conducted on. October 30, 2003, with IBM researcher Chieko Asakawa and IBM

DIABLO VALLEY COLLEGE CATALOG

Academic Reference Standards (ARS) for Electronics and Electrical Communications Engineering, B. Sc. Program

Beginning SQL Server for Developers

15 Minute Traffic Formula. Contents HOW TO GET MORE TRAFFIC IN 15 MINUTES WITH SEO... 3

Data- and Rule-Based Integrated Mechanism for Job Shop Scheduling

How to download/install Windows Live Movie Maker 2011

History of object-oriented approaches

Empty Your Inbox 4 Ways to Take Control of Your

1 More on the Bellman-Ford Algorithm

Nowadays data-intensive applications play a

Service Complex System

BEST PRACTICES FOR PERSONAL Security

Module 6. Campaign Layering

Using X-Particles with Team Render

CSE 403 Spring UDub Mail. Gabriel Maganis Sachin Pradhan. April 04, 2006

Important Encoder Settings for Your Live Stream

Patterns in Software Engineering

The educational uses of pencasts in mathematics education.

INTRODUCTION BACKGROUND DISCOVERER. Dan Vlamis, Vlamis Software Solutions, Inc. DISCOVERER PORTLET

Object Recognition Tools for Educational Robots

Smart 5G networks: enabled by network slicing and tailored to customers needs

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #43. Multidimensional Arrays

FROM A RELATIONAL TO A MULTI-DIMENSIONAL DATA BASE

Su et al. Shape Descriptors - III

Software Engineering Lab Manual

Hands On: Multimedia Methods for Large Scale Video Analysis (Lecture) Dr. Gerald Friedland,

EDGE, MICROSOFT S BROWSER

Redefining Modularity, Re-use in Variants and all that with Object Teams

Lync Training Outline

There we are; that's got the 3D screen and mouse sorted out.

Smart-Voice Invocation of Scenes in Home- Automation Systems

Transcription:

Project: Standard Framework of Video Content Processing Haichuan Yang Summary This is a proposal to launch the project of creating a framework to analyze and process the content of videos. The framework may contain data structure and algorithms. The data structure acts as the basic element to process or analysis the content of video, for example, it could be a set of image features of each frame, and the corresponding audio feature. The algorithm is defined upon this data structure, i.e., how to take use of the feature set. Both the data structure and the algorithm keep the potential to be extended. New kinds of features and processing algorithms can be easily added in this standard framework, and the standard interface will make them work together. Intellectual Merit. The proposed research project will provide a standard way to digitally understand and describe the content of video. It is based on the thriving research on computer vision, acoustics and signal processing, and show its own way to compose these aspects together to define a set of methods which are specifically used in tasks about video content. To the academic community of video understanding and processing, the proposed framework extracts the very commonly used part from similar or different algorithms and tasks, make it easy to develop new algorithms based on it. Broader Impacts. As the data form that takes the most percentage, videos can represent lots of thing. Almost all the aspects of our life involve videos. Video is a very efficient way to represent and send information. Nowadays, as the development of smart phone and digital camera, generating high-quality videos is quite easier. As everyone participating in this activity, the amount of videos is increasing as an incredible speed. However, the analysis tools for video do not catch this increase. For solving a new problem, people usually need to start all the things from scratch. Besides that, different with text or images, video contains much more information, both visual and audial. By decomposing the complexity of video analysis with the proposed framework, solving this problem and developing new algorithms will be much more efficient.

Project Description The main work of this projected is to develop a set of framework to support the design of new video analysis algorithms. We anticipate this framework can unify most of the extant video analysis methods, and have good flexibility. The framework also has the implementations for feature extraction, and processing algorithms, which can be directly used for practical problems. We wish to use this framework to extract the basic component and create a scaffold to serve the researchers and engineers, to free them from the repeated and low-efficient scratch work. Motivation Existing video analysis methods usually face several problems: How to represent the video, what kind of feature should be used, and whether the analysis algorithm can handle the scale of the data. Take near-duplicate video retrieval [1] as an example. Nearduplicate video retrieval is a task to search the video set, and find the sample which is nearly the same to the query. The video set could be very large, e.g., IMDB has 3,446,167 titles [2]. If assume each video length is 10 minutes, and we have 12 frames per second. There are totally 24,812,402,400 images. If we directly represent video with all the image features, there will be pretty large computation in the retrieval process. On the other hand, we all know that retrieval usually require real-time, which means we do not have much time for the whole retrieval process. So it is impossible to accomplish this job with plain image representation. From the example above, we can find that video analysis is usually a complex job. Different with image and text, video contains multimodal data: visual and audial. Compared with one dimensional audio and two dimensional image, video has three dimensions. It is the complex composition which consists of two modalities and three dimensions, so it is nearly impossible to handle it with one single algorithm. For our researchers and engineers, it usually needs many kinds of methods to cope different aspects of the problem, and it will take lots of time to handle them if always start from scratch. Method and Plan Actually, people already have lots of experience to decompose complex problem to be simpler ones, i.e., divide and conquer. In the history of computer science and engineering, we can take some successful cases as example. The first example is network protocols [3]. The different network functions are assigned

to different layers, the whole function of network communication is accomplished by the cooperation of different network layers. On each layer, there may exists different kind of protocol, but they all have some similarity and can work on the base of its lower layer. The whole network model is just a layer-wise framework, and it provide basic function of communication between nodes. It allows people to implement specific protocol, and plug it into the network protocol stack. If a researcher or engineer only want to create or modify a protocol in one layer, he do not need to change anything of other layers. This framework simplifies the process of developing new network algorithm and eventually contributes to our internet world today. Another example is the hardware architecture of modern computer. Because computer is a complex system, it need to complete lots of functions, e.g., input, output, compute, communicate, storage, etc. People divided these features and put them into different components, then gather these components, plug them in the board, then we see the computer. We must admit that the dramatic development speed of computer cannot leave the contribution made by modularization. Taking these successful experience, we can learn something to build our video processing framework. There are several approaches can be considered in the plan of our work. 1. Compare the application background and problems in video processing and computer network, can we use layer-wise framework in our project? This depends on whether the solution of most video processing tasks have the layer-wise structure, which means the problem can be separated to several sub-problems, and each problem only depends on the solution of the previous problem. If we can get layerwise structure in the video processing problem, we can build a layer-wise framework just as network protocol stack. 2. Try to decompose the video processing problem into disjoint modules, different module can have different function. Build a central module to gather all the modules, all the sub-modules are responsible to interact with the central module. It is not hard to see that this approach is just analogy of the computer architecture. Similarly, it is not sure that our video processing task have the modular structure, so we should analysis the most extant video analysis tasks and algorithms, to find the answer. 3. The third way is to combine the approach 1 and 2. Maybe the video processing task is not layer-wise, but it can be divided into several modules and each modules has a layer-wise structure. Or maybe it cannot be modularized directly, but it can be separated into different layers and each layer has modular structure. Video processing is a very complex task, it is definitely possible that our framework has a very complex structure. 4. Maybe neither example is suitable to our domain. In this situation, we need to create our own approach to build the framework. Firstly, we should collect the most typical and common video processing task, e.g., video retrieval, event detection, etc. Then we also should collect the classic methods for them. At last, we try to analyze the common structure or aspect of these methods, and build the framework.

Another useful tools for building our framework is the object-oriented strategy. Objectoriented strategy can hide the complex details of the implementation, and this is just what we need. Experiment Criterion The experiment criterion is nothing but to solve all the known typical video processing task by our framework. This maybe sounds incredible, but it is totally possible. It is not necessary to implement every algorithm in video processing, we only need to get the state-of-the-art level evaluation result for a certain problem. Since the number of typical problems is much smaller than the number of all the video processing algorithms, our criterion is feasible. Broader Impact of the Proposed Work Education, entertainment, commercial, sport, art, news, almost all aspects of our life involve videos. This framework will change the extant process of dealing with new problems about video analysis, and give a clear logic structure of the algorithm. Given this framework, researchers and engineers can design their new algorithm or solve their new task upon it. Maybe one can just find implemented data structure and algorithm to solve his problem, or need to plug his own implementation in the framework. It also contains completed routines to implement common jobs about video content, such as retrieval, indexing, similarity computation, and so on.

Reference [1] IMDb Database Statistics, http://www.imdb.com/stats [2] Liu, Jiajun, et al. "Near-duplicate video retrieval: Current research and future trends." ACM Computing Surveys (CSUR) 45.4 (2013): 44. [3] Communications protocol, https://en.wikipedia.org/wiki/communications_protocol