The LICHEN Framework: A new toolbox for the exploitation of corpora


Lisa Lena Opas-Hänninen, Tapio Seppänen, Ilkka Juuso and Matti Hosio (University of Oulu, Finland)

Background
- Cultural inheritance is increasingly preserved in multiple media: text, images, speech, audio, graphics, animation, etc.
- Digitalization of content creation and storage devices produces an increasing amount of digital data for databases (digital convergence)
- Need for database tools for accessing and analyzing the data

Content-based information retrieval
Basic principle
- The user has a need for information and an illustrating example (text, image, video shot, sound)
- The user formulates a query from the properties of the example
- The retrieval system provides the user with hits that are supposed to be relevant
- The user checks the hits and refines the query to get better hits
Example dialogue
1. Information need
2. Query formulation: "Computer, I want similar-sounding shots that are located indoors and have at least one musician in them"
3. "Master, here are the most similar ones:"
4. "I prefer shots like the first one, could you provide me more like those? And could you also show me the entire video of this"
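The query-and-refine loop described above can be sketched in a few lines. The slide does not say which similarity measure or feedback method LICHEN uses, so this sketch assumes cosine similarity for ranking and a Rocchio-style update for refinement. The feature vectors, shot names and function names are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query, collection, top_k=3):
    """Return the names of the top_k items most similar to the query."""
    ranked = sorted(collection,
                    key=lambda name: cosine(query, collection[name]),
                    reverse=True)
    return ranked[:top_k]

def refine(query, liked, alpha=1.0, beta=0.75):
    """Rocchio-style update (assumed): move the query toward the mean of liked hits."""
    if not liked:
        return list(query)
    mean = [sum(v[i] for v in liked) / len(liked) for i in range(len(query))]
    return [alpha * q + beta * m for q, m in zip(query, mean)]

# Hypothetical 3-dimensional features: (indoor-ness, music-likeness, motion)
shots = {
    "shot_a": [0.9, 0.8, 0.1],
    "shot_b": [0.1, 0.9, 0.5],
    "shot_c": [0.8, 0.1, 0.2],
    "shot_d": [0.2, 0.2, 0.9],
}

query = [0.5, 0.5, 0.5]                      # step 2: query built from an example
hits = search(query, shots)                  # step 3: system returns ranked hits
query2 = refine(query, [shots[hits[0]]])     # step 4: "more like the first one"
hits2 = search(query2, shots)                # refined result list
```

After the feedback step the query vector leans toward the liked shot, so shots sharing its dominant features move up the ranking and dissimilar ones drop out.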

The semantic gap
- Information need: high-level semantic concepts (objects, scenes, persons, actions, events, feelings)
- Retrieval engine: tries to map data-driven low-level features to user-driven concepts
  - feature computation (representation)
  - inference, classifiers
  - machine learning (modeling, feature fusion)
- Data: low-level features (text, texture, shape, color, layout, motion) and (automatically computed) metadata

The problem of annotation
- (Semi)automatic extraction of useful information from video content for the purpose of retrieval, browsing and indexing
- Required for training and testing search engines
- Why (semi)automatic instead of manual annotation?
  - Sheer volume of data may render manual methods impractical
  - Manual methods are subject to personal interpretations
  - Manual methods are subject to human errors
- Still, manual annotation is very important
  - (Semi)automatic methods may not be robust enough

Document image retrieval
- Paper documents have been scanned and stored in databases
- Search for documents with specific layout structures
- Search for documents containing specific text or markings
- Books, emails, poems, articles, etc.

Various document types

Document image retrieval
- Image retrieval is based on: subject, color, texture, date
- Document retrieval is conventionally based on: text, subject, attributes
- In IDIR, documents can also be queried by: layout (position and size of objects in documents)
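A layout query compares the position and size of objects on a page. The slides do not describe how this comparison is done, so the sketch below assumes one common choice: intersection-over-union (IoU) between bounding boxes in normalised page coordinates. The `Box` type and the `layout_score` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """A layout object: top-left corner (x, y), width w, height h, all in 0..1."""
    x: float
    y: float
    w: float
    h: float

def iou(a, b):
    """Intersection-over-union of two boxes (1.0 = identical, 0.0 = disjoint)."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0

def layout_score(query_boxes, doc_boxes):
    """Average, over the query boxes, of the best IoU with any document box."""
    if not query_boxes:
        return 0.0
    best = [max((iou(q, d) for d in doc_boxes), default=0.0) for q in query_boxes]
    return sum(best) / len(best)

# Hypothetical query: "a full-width block at the top of the page"
query_layout = [Box(0.0, 0.0, 1.0, 0.25)]
letter = [Box(0.0, 0.0, 1.0, 0.25), Box(0.0, 0.5, 1.0, 0.5)]
poster = [Box(0.0, 0.75, 1.0, 0.25)]
```

Here `layout_score(query_layout, letter)` is higher than `layout_score(query_layout, poster)`, because only the first document has a block where the query expects one.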

Effect of OCR errors on document retrieval
- Optical Character Recognition (OCR): conversion of document images to text
- The TREC community experiments:
  - OCR accuracy below 80%: not useful
  - OCR accuracy of 80-95%: use enhanced IR (filtering of noise, approximate string matching, fuzzy methods, OCR confusion statistics, n-grams)
  - OCR accuracy of 95-100%: most IR methods work fine
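To illustrate the n-gram idea from the "enhanced IR" tier, here is a minimal sketch of approximate matching against OCR-damaged text. It assumes character bigrams compared with the Dice coefficient. The threshold of 0.6 and the sample text are made up for the example and are not taken from the TREC experiments.

```python
def char_ngrams(word, n=2):
    """Set of character n-grams of a word, padded so word edges count too."""
    padded = f"_{word.lower()}_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def dice(a, b, n=2):
    """Dice coefficient between the n-gram sets of two words (0.0 to 1.0)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

def fuzzy_find(query, tokens, threshold=0.6):
    """Return the tokens whose n-gram similarity to the query reaches the threshold."""
    return [t for t in tokens if dice(query, t) >= threshold]

# Hypothetical OCR output in which "m" was misread as "rn"
ocr_tokens = "the rnodern docurnent retrieval systern".split()

exact = [t for t in ocr_tokens if t == "document"]   # exact matching finds nothing
matches = fuzzy_find("document", ocr_tokens)         # bigram matching recovers the word
```

An exact comparison misses the damaged word, while the bigram overlap stays high because a single OCR confusion only disturbs the few n-grams around it.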

Arbitrary image retrieval
- Digital images of...
  - cultural content: shamans, Lapps, ceilidh, herd of reindeer, whisky stillpots, Santa Claus, fishing, sauna, etc.
  - other tourism-related images
  - ads, illustrations, etc.

Image: color, shape, texture, objects, layout

Image Retrieval System
- Searching with content-based search interfaces, flexible search trees, sketch-based retrieval, example-based search, fast indexing, and similarity metrics
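As one concrete instance of example-based search with a similarity metric, the sketch below compares images by color. It assumes a coarse RGB histogram and histogram intersection as the metric. The slides do not name the features or metrics actually used, and the pixel data here is synthetic.

```python
def color_histogram(pixels, bins=4):
    """Normalised RGB histogram with bins**3 cells from (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # v * bins // 256 maps 0..255 onto 0..bins-1 for any bin count
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def rank_by_color(example, collection):
    """Sort image names by color similarity to the example image, best first."""
    ref = color_histogram(example)
    return sorted(collection,
                  key=lambda name: intersection(ref, color_histogram(collection[name])),
                  reverse=True)

# Synthetic "images": short lists of pixels
red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
mixed = [(250, 10, 10)] * 2 + [(5, 5, 250)] * 2

ranking = rank_by_color(red, {"blue": blue, "mixed": mixed, "red": red})
```

Histogram intersection ignores where colors occur in the image, which is why color is usually combined with shape, texture and layout features.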

Audio and speech retrieval
- Interviews, conversations, TV broadcasts, speeches, etc.
- Search for instances of words, utterances, expressions, ... ("wonderful", "yeah")
- Play the sounds while displaying the accompanying textual transcription or images/video
  - Do smiles always indicate happiness?
  - Does a knotted brow always indicate puzzlement?
- Samples of environmental sounds, such as from nature or animals

Audio and speech
- Audio: speech, music, other sounds
- Related applications: speech-driven UI, simultaneous interpretation, audio material indexing, voice-effect libraries, effect-based video categorization, music classification, music storage and search, voice-sample based queries

Prosodic analysis tools

Video retrieval
- TV broadcasts, political speeches, videoed events, etc.
- Search for specific videos or video shots
- Display hits and their metadata or interpretations
  - gestures, facial expressions (e.g. political speeches)

Video and movies
- Video: auditory information, static and dynamic visual information, spoken information
- Related terms: voice analysis, speech recognition, image analysis, text analysis, fusion techniques, key frames, shots, scenes, time-dependency, video analysis, media asset management, activity recognition
- Example text (translated from Finnish): "Pesäpalloliitto's report on fixed matches"

Data abstraction levels (from raw data sequence to semantic objects)

              Low-level components   Intermediate-level components   High-level components
Unit          atomic                 regional, temporal              meaning
Image         pixel colors           segmented                       objects
Video         frames                 shots                           scenes
Speech        spectrum               phonemes                        words, clauses
Audio                                audio types: speech/music
Extraction    automatic              semi-automatic                  semi-automatic or manual
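The step from the low level (frames) to the intermediate level (shots) in the video row can be illustrated with a simple cut detector. This is only a sketch under stated assumptions: frames are flat lists of grayscale values, and a shot boundary is declared when the histogram difference between consecutive frames exceeds a threshold. The slides do not say how LICHEN segments video, and the threshold value is arbitrary.

```python
def gray_histogram(frame, bins=8):
    """Normalised histogram of a non-empty frame given as grayscale values 0..255."""
    hist = [0] * bins
    for v in frame:
        hist[v * bins // 256] += 1
    return [h / len(frame) for h in hist]

def shot_boundaries(frames, threshold=0.5):
    """Indices of frames that start a new shot (large histogram change from the previous frame)."""
    cuts = []
    prev = None
    for i, frame in enumerate(frames):
        hist = gray_histogram(frame)
        if prev is not None:
            # Half the L1 distance between normalised histograms lies in 0..1
            diff = 0.5 * sum(abs(a - b) for a, b in zip(hist, prev))
            if diff > threshold:
                cuts.append(i)
        prev = hist
    return cuts

def split_into_shots(frames, threshold=0.5):
    """Return shots as (start, end) frame index pairs, end exclusive."""
    if not frames:
        return []
    cuts = shot_boundaries(frames, threshold)
    return list(zip([0] + cuts, cuts + [len(frames)]))

# Synthetic sequence: three dark frames, two bright frames, one dark frame
dark = [10] * 16
bright = [240] * 16
frames = [dark, dark, dark, bright, bright, dark]

cuts = shot_boundaries(frames)
shots = split_into_shots(frames)
```

This fully automatic step matches the "automatic" and "semi-automatic" columns of the table. Grouping shots into scenes requires meaning, which is why the high level is marked semi-automatic or manual.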

Query examples
- Find shots of Condoleezza Rice
- Find shots of Iyad Allawi, the former prime minister of Iraq
- Find shots of Omar Karami, the former prime minister of Lebanon
- Find shots of Hu Jintao, president of the People's Republic of China
- Find shots of Tony Blair
- Find shots of Mahmoud Abbas, also known as Abu Mazen, prime minister of the Palestinian Authority
- Find shots of a graphic map of Iraq, location of Baghdad marked - not a weather map
- Find shots of tennis players on the court - both players visible at the same time
- Find shots of people shaking hands
- Find shots of a helicopter in flight
- Find shots of George Bush entering or leaving a vehicle, e.g., car, van, airplane, helicopter, etc. - he and the vehicle both visible at the same time
- Find shots of something (e.g., vehicle, aircraft, building, etc.) on fire with flames and smoke visible
- Find shots of people with banners or signs
- Find shots of one or more people entering or leaving a building
- Find shots of a meeting with a large table and more than two people
- Find shots of a ship or boat
- Find shots of basketball players on the court
- Find shots of one or more palm trees
- Find shots of an airplane taking off
- Find shots of a road with one or more cars
- Find shots of one or more tanks or other military vehicles
- Find shots of a tall building (with more than 5 floors above the ground)
- Find shots of a goal being made in a soccer match
- Find shots of an office setting, i.e., one or more desks/tables and one or more computers and one or more people

The LICHEN Framework Architecture
- Supports both stand-alone (local) and client-server (over the network) operation modes
[Figure: local operation mode and client-server mode. Labels: Remote UI, Client front-end, Active Pages, Web server, CI]

Digital rights management
- License servers
- Cryptography-based protection of data
- Digital watermarking of images, audio, speech, videos