Multimedia Databases

Similar documents
Multimedia Databases. 0. Organizational Issues. 0. Organizational Issues. 0. Organizational Issues. 0. Organizational Issues. 1.

Multimedia Databases. 9 Video Retrieval. 9.1 Hidden Markov Model. 9.1 Hidden Markov Model. 9.1 Evaluation. 9.1 HMM Example 12/18/2009

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig

Relational Database Systems 2 1. System Architecture

Relational Database Systems 2 1. System Architecture

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week:

Information Retrieval and Web Search Engines

Data Warehousing & Data Mining

Multimedia Databases. 2. Summary. 2 Color-based Retrieval. 2.1 Multimedia Data Retrieval. 2.1 Multimedia Data Retrieval 4/14/2016.

CCRMA MIR Workshop 2014 Evaluating Information Retrieval Systems. Leigh M. Smith Humtap Inc.

Information Retrieval

9/8/2016. Characteristics of multimedia Various media types

Data Warehousing & Mining Techniques

Chapter 2 - Concepts and Definitions

VK Multimedia Information Systems

Data Warehousing & Mining Techniques

2. Summary. 2.1 Basic Architecture. 2. Architecture. 2.1 Staging Area. 2.1 Operational Data Store. Last week: Architecture and Data model

Multimedia Databases

Information Retrieval and Web Search Engines

Data Warehouse and Data Mining

1. Inroduction to Data Mininig

Overview of Web Mining Techniques and its Application towards Web

CSCI 5417 Information Retrieval Systems. Jim Martin!

Hello, I am from the State University of Library Studies and Information Technologies, Bulgaria

Multimedia Systems. Lehrstuhl für Informatik IV RWTH Aachen. Prof. Dr. Otto Spaniol Dr. rer. nat. Dirk Thißen

New Media Production week 3

Multimedia Databases. 8 Audio Retrieval. 8.1 Music Retrieval. 8.1 Statistical Features. 8.1 Music Retrieval. 8.1 Music Retrieval 12/11/2009

Multimedia Information Retrieval

An Introduction to Content Based Image Retrieval

Relational Database Systems 1

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.

Informatica Enterprise Information Catalog

5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator

Lecture 19 Media Formats

Relational Database Systems 2 5. Query Processing

Part 11: Collaborative Filtering. Francesco Ricci

Description and search of multimedia data

Distributed Data Management

INFSCI 2140 Information Storage and Retrieval Lecture 1: Introduction. INFSCI 2140 and your program

Data Mining: Classifier Evaluation. CSCI-B490 Seminar in Computer Science (Data Mining)

Pattern recognition (4)

Distributed Data Management

MPEG-7. Multimedia Content Description Standard

Multi-modal Information Retrieval experiences from Context-Aware Image Management, CAIM

Relational Database Systems 2 5. Query Processing

MULTIMEDIA DATABASES OVERVIEW

Multimedia Database Systems. Retrieval by Content

Multimedia on the Web

Introductory Visualizing Technology

So, what is data compression, and why do we need it?

- Content-based Recommendation -

Feature Selection in Learning Using Privileged Information

Multimedia Systems. Part 1. Mahdi Vasighi

CSI 4107 Image Information Retrieval

Relational Database Systems 1

Introduction to Similarity Search in Multimedia Databases

Data Warehousing & Data Mining

COMP6237 Data Mining Searching and Ranking

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards

doc. RNDr. Tomáš Skopal, Ph.D. Department of Software Engineering, Faculty of Information Technology, Czech Technical University in Prague

Multimedia Information Systems

Information Systems (Informationssysteme)

Outline. Database Theory. Prerequisites and Admission. Classes VU , SS 2018

Relational Database Systems 1

Presentational aids should be used for the right reasons in the right ways

Lecture #3: Digital Music and Sound

Graphics in IT82. Representing Graphical Data. Graphics in IT82. Lectures Overview. Representing Graphical Data. Logical / Physical Representation

CompSci 125 Lecture 02

CISC 7610 Lecture 4 Approaches to multimedia databases

CSE 132A. Database Systems Principles

In this lesson we are going to review some of the most used scanning devices.

0. Organizational Issues. 0. Why should you be here? 0. Why should you be here? 0. Why should you be here? Relational Database Systems

Database System Concepts and Architecture

CS4670: Computer Vision

Information Retrieval Spring Web retrieval

Representing Graphical Data

Introduction to Database Systems. Motivation. Werner Nutt

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

Multimedia Search Engine

LATIHAN Identify the use of multimedia in various fields.

Topic 2 Data and Information. Data Data can be defined as a set of recorded facts, numbers or events that have no meaning.

Lecture Information Multimedia Video Coding & Architectures

Information Retrieval and Knowledge Organisation

Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency

Database System Concepts and Architecture

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Introduction: Databases and. Database Users

Dr. Angelika Reiser Chair for Database Systems (I3)

MINI MOCK (MIMO) Computer Science 2210

Homework: Exercise 19. Homework: Exercise 21. Homework: Exercise 20. Homework: Exercise 22. Detour: Apache Lucene

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 1-1

Different File Types and their Use

Relational Database Systems 1

CC PROCESAMIENTO MASIVO DE DATOS OTOÑO 2018

Your very own movie studio. menu bar

Identifying Important Communications

3.01C Multimedia Elements and Guidelines Explore multimedia systems, elements and presentations.

Relational Database Systems 1 Wolf-Tilo Balke Hermann Kroll

Relational Database Systems 1 Wolf-Tilo Balke Hermann Kroll, Janus Wawrzinek, Stephan Mennicke

Transcription:

Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

0 Organizational Issues Lecture 07.04.2011 14.07.2011 09:45-12:15 (3 lecture hours with a break) Exercises, detours, and homeworks 4 or 5 Credits depending on the examination rules Exams Oral exam Achieving more than 50% in homework points is advised Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 2

0 Organizational Issues Recommended literature Schmitt: Ähnlichkeitssuche in Multimedia-Datenbanken, Oldenbourg, 2005 Steinmetz: Multimedia-Technologie: Grundlagen, Komponenten und Systeme, Springer, 1999 Relational Database Systems 1 Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 3

0 Organizational Issues Castelli/Bergman: Image Databases, Wiley, 2002 Khoshafian/Baker: Multimedia and Imaging Databases, Morgan Kaufmann, 1996 Sometimes: original papers (on our Web page) Relational Database Systems 1 Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 4

0 Organizational Issues Course Web page http://www.ifis.cs.tu-bs.de/teaching/ss-11/mmdb Contains slides, exercises, related papers and a video of the lecture Any questions? Just drop us an email Relational Database Systems 1 Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 5

1 Introduction 1 Introduction 1.1 What are multimedia databases? 1.2 Multimedia database applications 1.3 Evaluation of retrieval techniques Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 6

1.1 Multimedia Databases What are multimedia databases (MMDB)? Databases + multimedia = MMDB Key words: databases and multimedia We already know databases, so what is multimedia? Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 7

1.1 Basic Definitions Multimedia The concept of multimedia expresses the integration of different digital media types The integration is usually performed in a document Basic media types are text, image, vector graphics, audio and video Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 8

1.1 Data Types Text Text data, Spreadsheets, E-Mail, Image Photos (Bitmaps), Vector graphics, CAD, Audio Speech- and music records, annotations, wave files, MIDI, MP3, Video Dynamical image record, frame-sequences, MPEG, AVI, Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 9

1.1 Documents Earliest definition of information retrieval: Documents are logically interdependent digitally encoded texts Extension to multimedia documents allows the additional integration of other media types as images, audio or video Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 10

1.1 Documents Document types Media objects are documents which are of only one type (not necessarily text) Multimedia objects are general documents which allow an arbitrary combination of different types Multimedia data is transferred through the use of a medium Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 11

1.1 Basic Definitions Medium A medium is a carrier of information in a communication connection It is independent of the transported information The used medium can also be changed during information transfer Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 12

1.1 Medium Example Book Communication between author and reader Independent from content Hierarchically built on text and images Reading out loud represents medium change to sound/audio Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 13

1.1 Medium Classification Based on receiver type Visual/optical medium Acoustic mediums Haptical medium through tactile senses Olfactory medium through smell Gustatory medium through taste Based on time Dynamic Static Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 14

1.1 Multimedia Databases We now have seen What multimedia is And how it is transported (through some medium) But why do we need databases? Most important operations of databases are data storage and data retrieval Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 15

1.1 Multimedia Databases Persistent storage of multimedia data, e.g.: Text documents Vector graphics, CAD Images, audio, video Content-based retrieval Efficient content based search Standardization of meta-data (e. g., MPEG-7, MPEG-21) Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 16

1.1 Multimedia Databases Stand-alone vs. database storage model? Special retrieval functionality as well as corresponding optimization can be provided in both cases But in the second case we also get the general advantages of databases Declarative query language Orthogonal combination of the query functionality Query optimization, Index structures Transaction management, recovery... Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 17

1.1 Historical Overview 1960 1970 Retrieval procedures for text documents (Information Retrieval) Relational Databases and SQL 1980 Presence of multimedia objects intensifies 1990 SQL-92 introduces BLOBs First Multimedia-Databases 2000 Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 18

1.1 Commercial Systems Relational Databases use the data type BLOB (binary large object) Un-interpreted data Retrieval through metadata like e.g., file name, size, author, Object-relational extensions feature enhanced retrieval functionality Semantic search IBM DB2 Extenders, Oracle Cartridges, Integration in DB through UDFs, UDTs, Stored Procedures, Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 19

1.1 Requirements Requirements for multimedia databases (Christodoulakis, 1985) Classical database functionality Maintenance of unformatted data Consideration of special storage and presentation devices Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 20

1.1 Requirements To comply with these requirements the following aspects need to be considered Software architecture new or extension of existing databases? Content addressing identification of the objects through content-based features Performance improvements using indexes, optimization, etc. Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 21

1.1 Requirements User interface how should the user interact with the system? Separate structure from content! Information extraction (automatic) generation of content-based features Storage devices very large storage capacity, redundancy control and compression Information retrieval integration of some extended search functionality Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 22

1.1 Retrieval Retrieval: choosing between data objects. Based on a SELECT condition (exact match) or a defined similarity connection (best match) Retrieval may also cover the delivery of the results to the user Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 23

1.1 Retrieval Closer look at the search functionality Semantic search functionality Orthogonal integration of classical and extended functionality Search does not directly access the media objects Extraction, normalization and indexing of contentbased features Meaningful similarity/distance measures Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 24

1.1 Content-based Retrieval Retrieve all images showing a sunset! What exactly do these images have in common? Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 25

1.1 Schematic View Usually 2 main steps Example: image databases Image collection Digitization Image analysis and feature extraction Image database Creating the database Image query Digitization Querying the database Image analysis and feature extraction Similarity search Search result Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 26

1.1 Detailed View Query Result 3. Query preparation 5. Result preparation Query plan & feature values Result data 4. Similarity computation & query processing Feature values Raw & relational data 2. Extraction of features Raw data MM-Database 1. Insert into the database MM-Objects + relational data Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 27

Relevance feedback 1.1 More Detailed View Query Result Query preparation Result preparation Normalization Segmentation Feature extraction Optimization Medium transformation Format transformation Feature values Query plan Result data Similarity computation Query processing Feature values Feature index Feature extraction Feature recognition Feature preparation MM-Database BLOBs/CLOBs Pre-processing Decomposition Normalization Segmentation Relational DB Metadata Profile Structure data MM-Objects Relational data Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 28

1.2 Applications Lots of multimedia content on the Web Social networking e.g., Facebook, MySpace, Hi5, Orkut, etc. Photo sharing e.g., Flickr, Photobucket, Imeem, Picasa, etc. Video sharing e.g., YouTube, Megavideo, Metacafe, blip.tv, Liveleak, etc. Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 29

1.2 Applications Cameras are everywhere In London there are at least 500,000 cameras in the city, and one study showed that in a single day a person could expect to be filmed 300 times Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 30

1.2 Applications Picasa face recognition Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 31

1.2 Applications Picasa, face recognition example Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 32

1.2 Applications Picasa, learning phase Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 33

1.2 Applications Picasa example Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 34

1.2 Sample Scenario Consider a police investigation of a large-scale drug operation Possible generated data: Video data captured by surveillance cameras Audio data captured Image data consisting of still photographs taken by investigators Structured relational data containing background information Geographic information system data Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 35

1.2 Sample Scenario Possible queries Image query by keywords: police officer wants to examine pictures of Tony Soprano Query: retrieve all images from the image library in which Tony Soprano appears" Image query by example: the police officer has a photograph and wants to find the identity of the person in the picture He hopes that someone else has already tagged another photo of this person Query: retrieve all images from the database in which the person appearing in the (currently displayed) photograph appears Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 36

1.2 Sample Scenario Video Query: (Murder case) The police assumes that the killer must have interacted with the victim in the near past Query: Find all video segments from last week in which Jerry appears Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 37

1.2 Sample Scenario Heterogeneous Multimedia Query: Find all individuals who have been photographed with Tony Soprano and who have been convicted of attempted murder in New Jersey and who have recently had electronic fund transfers made into their bank accounts from ABC Corp. Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 38

1.2 Characteristics so there are different types of queries what about the MMDB characteristics? Static: high number of search queries (read access), few modifications of the data Dynamic: often modifications of the data Active: the functionality of the database leads to operations at application level Passive: database reacts only at requests from outside Standard search: queries are answered through the use of metadata e.g., Google-image search Retrieval functionality: content based search on the multimedia repository e.g., Picasa face recognition Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 39

1.2 Example Passive static retrieval Art historical use case Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 40

1.2 Example Possible hit in a multimedia database Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 41

1.2 Example Active dynamic retrieval Wetter warning through evaluation of satellite photos Extraction Typhoon-Warning for the Philippines Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 42

1.2 Example Standard search Queries are answered through the use of metadata e.g., Google-image search Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 43

1.2 Example Retrieval functionality Content based e.g., Picasa face recognition Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 44

1.3 Retrieval Evaluation Basic evaluation of retrieval techniques Efficiency of the system Efficient utilization of system resources Scalable also over big collections Effectivity of the retrieval process High quality of the result Meaningful usage of the system What is more important? An effective retrieval process or an efficient one? Depends on the application! Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 45

1.3 Evaluating Efficiency Characteristic values to measure efficiency are e.g.: Memory usage CPU-time Number of I/O-Operations Response time Depends on the (Hardware-) environment Goal: the system should be efficient enough! Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 46

1.3 Evaluating Effectivity Measuring effectivity is more difficult and always depending on the query We need to define some query-dependent evaluation measures! Objective quality metrics Independent from the querying interface and the retrieval procedure Allows for comparing different systems/algorithms Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 47

1.3 Evaluating Effectivity Effectivity can be measured regarding an explicit query Main focus on evaluating the behavior of the system with respect to a query Relevance of the result set But effectivity also needs to consider implicit information needs Main focus on evaluating the usefulness, usability and user friendliness of the system Not relevant for this lecure! Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 48

1.3 Relevance Relevance as a measure for retrieval: each document will be binary classified as relevant or irrelevant with respect to the query This classification is manually performed by experts The response of the system to the query will be compared to this classification Compare the obtained response with the ideal result Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 49

1.3 Involved Sets Then apply the automatic retrieval system: collection searched for (= relevant) found (= query result) Experts say: this is relevant The automatic retrieval says: this is relevant Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 50

1.3 False Positives False positives: irrelevant documents, classified as collection relevant by the system False alarms fd ca fa searched for found Needlessly increase the result set Usually inevitable (ambiguity) Can be easily eliminated by the user cd Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 51

1.3 False Negatives False negatives: relevant documents classified by collection the system as irrelevant False dismissals searched for Dangerous, since they cd can t be detected easily by the user Are there better documents in the collection which the system didn t return? False alarms are usually not as bad as false dismissals fd ca fa found Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 52

1.3 Remaining Sets Correct positives (correct alarms) All documents correctly classified by the system as relevant Correct negatives (correct dismissals) All documents correctly classified by the system as irrelevant All sets are disjunctive and their reunion is the collection entire document collection fd searched for ca cd fa found Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 53

1.3 Overview Confusion matrix: Userevaluation Systemevaluation relevant relevant ca irrelevant fd irrelevant fa cd Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 54

1.3 Interpretation Relevant results = fd + ca Handpicked by experts! Retrieved results = ca + fa Retrieved by the system collection fd searched for ca fa found cd Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 55

1.3 Precision Precision measures the ratio of correctly returned documents relative to all returned documents P = ca / (ca + fa) Value between [0, 1] (1 representing the best value) High number of false alarms mean worse results fd searched for collection ca cd fa found Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 56

1.3 Recall Recall measures the ratio of correctly returned documents relative to all relevant documents R = ca / (ca + fd) collection Value between [0, 1] (1 representing the best value) High number of false drops mean worse results fd searched for ca cd fa found Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 57

1.3 Precision-Recall Analysis Both measures only make sense, if considered at the same time E.g., get perfect recall by simply returning all documents, but then the precision is extremely low Can be balanced by tuning the system E.g., smaller result sets lead to better precision rates at the cost of recall Usually the average precision-recall of more queries is considered (macro evaluation) Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 58

1.3 Actual Evaluation collection Alarms (returned elements) fd ca fa divided in ca and fa searched for found Precision is easy to calculate cd Dismissals (not returned elements) are not so trivial to divide in cd und fd, because the entire collection has to be classified Recall is difficult to calculate Standardized Benchmarks Provided connections and queries Annotated result sets Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 59

1.3 TREC Text REtrieval Conference De-Facto-Standard since 1992 Establish average precision for 11 fixed recall points (0; 0,1; 0,2; ; 1) according to defined procedures (trec_eval) Different tracks, extended also for video data, Web retrieval and Question Answering Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 60

1.3 Example Query fa ca fd cd P R Q 1 8 2 6 4 0,2 0,25 Q 2 2 8 2 8 0,8 0,8 Average 0,5 0,525 Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 61

1.3 Representation Precision-Recall-Curves Average precision of the system 3 at a recall-level of 0,2 System 1 System 2 System 3 Which system is the best? What is more important: recall or precision? Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 62

Next lecture Retrieval of images by color Introduction to color spaces Color histograms Matching Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 63