Minimal-Impact Personal Audio Archives


Minimal-Impact Personal Audio Archives
  Dan Ellis, Keansub Lee, Jim Ogle
  Laboratory for Recognition and Organization of Speech and Audio
  Dept. Electrical Eng., Columbia Univ., NY USA
  dpwe@ee.columbia.edu
  1. Personal Audio Archives
  2. Segmenting & Clustering
  3. Speech Detection
  4. Repeated Events
  5. Future
Personal Audio Archives - Ellis, Lee, Ogle 2006-07-19 p. 1 /18

1. Personal Audio Archives
  Easy to record everything you hear
    <2 GB / week @ 64 kbps
  Hard to find anything
    how to scan?
    how to visualize?
    how to index?
  Need automatic analysis
  Need minimal impact
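
As a rough check on the storage figure above, the arithmetic can be worked through in a few lines. This is only a sketch: it assumes a constant bitrate, decimal units (1 kbps = 1000 bits/s, 1 GB = 10^9 bytes), and the function names are made up for illustration. Under those assumptions 64 kbps is 28.8 MB per hour, so a 2 GB weekly budget covers about 69 hours of audio, roughly 10 hours a day.

```python
def bytes_per_hour(bitrate_kbps):
    """Bytes produced by one hour of audio at a constant bitrate (1 kbps = 1000 bits/s)."""
    return bitrate_kbps * 1000 / 8 * 3600


def hours_within_budget(budget_bytes, bitrate_kbps):
    """Hours of audio that fit into a storage budget at the given bitrate."""
    return budget_bytes / bytes_per_hour(bitrate_kbps)


per_hour = bytes_per_hour(64)                # storage used per recorded hour
weekly_hours = hours_within_budget(2e9, 64)  # 2 GB taken as 2 * 10^9 bytes (assumption)
print(f"{per_hour / 1e6:.1f} MB/hour")
print(f"{weekly_hours:.1f} hours/week, {weekly_hours / 7:.1f} hours/day")
```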

Information in Audio
  Long-duration recordings contain info on:
    location: type (restaurant, street, ...) and specific
    activity: talking, walking, typing
    people: generic (2 males), specific (Chuck & John)
    spoken content ... maybe
  but not:
    what people and things looked like
    day/night
    gaze, posture, motion, ...

Applications
  Automatic appointment-book history
    fills in when & where of movements
  Life statistics
    how long did I spend in meetings this week?
    most frequent conversations
    favorite phrases?
  Retrieving details
    what exactly did I promise?
    privacy issues...
  Nostalgia... or what?

2. Segmentation & Clustering
  Top-level structure for long recordings: Where are the major boundaries?
    e.g. for diary application
    support for manual browsing
  Length of fundamental time-frame
    60 s rather than 10 ms?
    background more important than foreground
    average out uncharacteristic transients
  Perceptually-motivated features
    .. so results have perceptual relevance
    broad spectrum + some detail

Features
  [Figure: six panels, freq / bark vs. time / min: Average Linear Energy, Normalized Energy Deviation, Average Log Energy (dB), Log Energy Deviation (dB), Average Spectral Entropy (bits), Spectral Entropy Deviation (bits)]
  Capture both average and variation
  Capture a little more detail in subbands...
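
The six panel names suggest that each long time-frame is summarized, per frequency band, by the mean and the deviation of energy, log energy, and spectral entropy. A minimal sketch of such a summary for one band follows. Everything here is an assumption rather than the authors' code: the input layout (a list of short-time frames, each holding the energies of the FFT bins inside the band), the reading of "normalized deviation" as standard deviation divided by mean, and all function names.

```python
import math


def mean_and_deviation(values):
    """Return (mean, population standard deviation) of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)


def spectral_entropy_bits(energies):
    """Entropy in bits of energies normalized to sum to 1 (flat spectra score high)."""
    total = sum(energies)
    if total <= 0:
        return 0.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log2(p) for p in probs)


def band_features(short_frames, floor=1e-10):
    """Summarize one long time-frame (e.g. 60 s) for a single band.

    short_frames: hypothetical layout, one list of non-negative bin
    energies per short-time frame.
    """
    energy = [sum(f) for f in short_frames]
    log_energy = [10 * math.log10(max(e, floor)) for e in energy]
    entropy = [spectral_entropy_bits(f) for f in short_frames]
    mu_lin, sd_lin = mean_and_deviation(energy)
    mu_db, sd_db = mean_and_deviation(log_energy)
    mu_h, sd_h = mean_and_deviation(entropy)
    return {
        "avg_linear_energy": mu_lin,
        "normalized_energy_deviation": sd_lin / mu_lin if mu_lin > 0 else 0.0,
        "avg_log_energy_db": mu_db,
        "log_energy_deviation_db": sd_db,
        "avg_spectral_entropy_bits": mu_h,
        "spectral_entropy_deviation_bits": sd_h,
    }
```

Averaging over many short frames is what lets transients wash out, while the deviation terms keep some record of how much the band fluctuated within the long frame.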

BIC Segmentation Results
  Evaluate: 62 hr hand-marked dataset
    8 days, 139 segments, 16 categories
    measure Correct Accept % @ False Accept = 2%:
      Feature               Correct Accept
      μdB                   80.8%
      μH                    81.1%
      σH/μH                 81.6%
      μdB + σH/μH           84.0%
      μdB + σH/μH + μH      83.6%
      mfcc                  73.6%
  [Figure: ROC curves, Sensitivity vs. 1 - Specificity, for μdB, μH, σH/μH, μdB + σH/μH, and μdB + μH + σH/μH]
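
The slide reports results only, so the following is a generic sketch of a BIC boundary test, not the authors' implementation. It compares modelling a window of feature frames with one Gaussian against two Gaussians split at a candidate point, and charges a penalty for the extra parameters. Diagonal covariances are an assumption made here to keep the example dependency-free; the function names and the `penalty_weight` parameter are hypothetical. A positive score favours a boundary.

```python
import math


def log_det_diag(frames, floor=1e-6):
    """Sum of log variances per dimension (log-determinant of a diagonal covariance)."""
    n, dims = len(frames), len(frames[0])
    total = 0.0
    for d in range(dims):
        column = [f[d] for f in frames]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        total += math.log(max(var, floor))  # floor avoids log(0) on constant data
    return total


def delta_bic(frames, split, penalty_weight=1.0):
    """BIC gain of two Gaussians (split at `split`) over one Gaussian for the window."""
    left, right = frames[:split], frames[split:]
    n, dims = len(frames), len(frames[0])
    gain = 0.5 * (n * log_det_diag(frames)
                  - len(left) * log_det_diag(left)
                  - len(right) * log_det_diag(right))
    extra_params = 2 * dims  # one extra mean and one extra variance per dimension
    return gain - 0.5 * penalty_weight * extra_params * math.log(n)


def best_boundary(frames, min_len=2):
    """Return (score, split index) of the highest-scoring candidate boundary."""
    candidates = range(min_len, len(frames) - min_len + 1)
    return max((delta_bic(frames, s), s) for s in candidates)
```

With 60 s frames as the basic unit, a window of a few dozen frames already spans tens of minutes, which matches the goal of finding only the major boundaries.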

Segment Clustering
  Daily activity has lots of repetition:
    Automatically cluster similar segments
    affinity of segments as KL2 distances
  [Figure: garbled in extraction; labels not recoverable]
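
KL2 is commonly used to mean the symmetrized Kullback-Leibler divergence, KL(p||q) + KL(q||p). The sketch below computes it between two segments that are each modelled by a single diagonal-covariance Gaussian, which is an assumption made for illustration; the slide does not say how segments are modelled. The `exp(-distance / scale)` mapping to an affinity and the `scale` parameter are likewise hypothetical choices.

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for diagonal Gaussians given per-dimension means and variances."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def kl2(mean_p, var_p, mean_q, var_q):
    """Symmetrized KL divergence: KL(p || q) + KL(q || p)."""
    return (kl_diag_gauss(mean_p, var_p, mean_q, var_q)
            + kl_diag_gauss(mean_q, var_q, mean_p, var_p))


def affinity_matrix(segments, scale=1.0):
    """segments: list of (mean, var) pairs. Affinity = exp(-KL2 / scale), 1.0 on the diagonal."""
    return [[math.exp(-kl2(*a, *b) / scale) for b in segments] for a in segments]
```

An affinity matrix of this kind is the usual input to spectral or agglomerative clustering of the segments.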

Clustering Results
  Clustering of automatic segments gives anonymous classes
    BIC criterion to choose number of clusters
    make best correspondence to 16 ground-truth (GT) clusters
  Frame-level scoring gives ~70% correct
    errors when same place has multiple ambiences
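
Because the automatic clusters are anonymous, scoring needs a mapping from cluster IDs to ground-truth labels first. The sketch below uses a greedy one-to-one assignment by frame overlap and then counts agreeing frames. The greedy rule is an assumption chosen for brevity; the slide does not say how the "best correspondence" was found, and an optimal assignment method could give a different mapping. The function name is hypothetical.

```python
from collections import Counter


def frame_accuracy(cluster_ids, truth_labels):
    """Greedily map clusters to labels one-to-one, then return the fraction of agreeing frames."""
    overlap = Counter(zip(cluster_ids, truth_labels))
    used_clusters, used_labels, correct = set(), set(), 0
    # Visit (cluster, label) pairs from largest frame overlap to smallest.
    for (cluster, label), count in overlap.most_common():
        if cluster in used_clusters or label in used_labels:
            continue
        used_clusters.add(cluster)
        used_labels.add(label)
        correct += count
    return correct / len(truth_labels)
```

For example, a single location labelled once in the ground truth but split into two clusters by ambience can earn credit for only one of them under a one-to-one mapping, which is the error mode the slide mentions.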

3. Speech Detection
  Speech emerges as most interesting content
  Just identifying speech would be useful
    goal is speaker identification / labeling
  Lots of background noise
    conventional Voice Activity Detection inadequate
  Insight: Listeners detect pitch track (melody)
    look for voice-like periodicity in noise
  [Figure: spectrogram of a coffeeshop excerpt, Frequency (0 to 4000) vs. Time (0 to 4.5)]

Voice Periodicity Enhancement
  Noise-robust subband autocorrelation
  Subtract local average
    suppresses steady background, e.g. machine noise
  15 min test set; 88% acc (79% w/o enhancement)
  also for enhancing speech (harmonic filtering)
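
The two steps named above can be sketched for a single subband signal: compute a normalized autocorrelation per frame, then subtract, lag by lag, the average over neighbouring frames so that periodicity which stays constant over time cancels out. This is an illustrative sketch only. The filterbank that produces the subbands is omitted, and the window length `half_window` and both function names are assumptions.

```python
import math


def normalized_autocorr(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag, scaled so lag 0 equals 1."""
    energy = sum(x * x for x in frame)
    if energy == 0:
        return [0.0] * (max_lag + 1)
    return [
        sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def subtract_local_average(acf_frames, half_window):
    """For each lag, subtract the mean over frames within +/- half_window in time."""
    out = []
    for t, acf in enumerate(acf_frames):
        lo = max(0, t - half_window)
        hi = min(len(acf_frames), t + half_window + 1)
        neighbours = acf_frames[lo:hi]
        local_mean = [sum(f[k] for f in neighbours) / len(neighbours)
                      for k in range(len(acf))]
        out.append([a - m for a, m in zip(acf, local_mean)])
    return out
```

A steady hum produces the same autocorrelation peaks in every frame, so it is removed by the subtraction, whereas a voice whose pitch moves from frame to frame leaves a residual peak.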

4. Repeating Events
  Recurring sound events can be informative
    indicate similar circumstance...
    but: define event (sound organization)
    define recurring event (how similar?)
    .. and how to find them (tractable?)
  Idea: Use hashing (fingerprints)
    index points to other occurrences of each hash; intersection of hashes points to match - much quicker search
    use a fingerprint insensitive to background?
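
The indexing idea can be sketched with an inverted index from hash value to the times at which it occurred, followed by a vote over time offsets: when several hashes from a query agree on the same offset, that offset points to a repeat. The data layout (pairs of time and hash value) and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_index(hashes):
    """hashes: iterable of (time, hash_value). Returns hash_value -> list of times."""
    index = defaultdict(list)
    for time, value in hashes:
        index[value].append(time)
    return index


def match_offsets(index, query_hashes):
    """Count, per time offset, how many query hashes recur in the archive at that offset."""
    votes = Counter()
    for q_time, value in query_hashes:
        for a_time in index.get(value, ()):
            votes[a_time - q_time] += 1
    return votes


# Toy archive in which the pattern a, b, c occurs at time 0 and again at time 100.
archive = [(0, "a"), (1, "b"), (2, "c"), (50, "x"), (100, "a"), (101, "b"), (102, "c")]
votes = match_offsets(build_index(archive), [(0, "a"), (1, "b"), (2, "c")])
```

Each lookup touches only the archive positions that share a hash with the query, which is why this is much quicker than comparing the query against every position.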

Shazam Fingerprints
  Prominent spectral onsets are landmarks; Use relations {f1, f2, t} as hashes
  [Figure: spectrogram, "Phone ring - Shazam fingerprint", frequency (0 to 4000) vs. time (0 to 3)]
  intrinsically robust to background noise
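
One way to turn the relation {f1, f2, t} into a hash is to pair each landmark with a few later landmarks and pack the two frequencies and the time difference into a single integer. The sketch below does that under stated assumptions: landmark picking is taken as already done, and the bit widths (8 bits per frequency bin, 6 bits for the time difference), the `fan_out` limit, and the function name are illustrative choices, not values from the slides.

```python
def landmark_hashes(peaks, max_dt=63, fan_out=3):
    """Pair each landmark with up to `fan_out` later landmarks.

    peaks: list of (time_frame, freq_bin) with freq_bin in 0..255.
    Returns (anchor_time, hash) pairs; the hash packs f1 (8 bits),
    f2 (8 bits), and the time difference dt (6 bits, so max_dt <= 63).
    """
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # peaks are time-sorted, so later ones are even further away
            if dt == 0:
                continue  # simultaneous peaks carry no time relation
            out.append((t1, (f1 << 14) | (f2 << 6) | dt))
            paired += 1
            if paired == fan_out:
                break
    return out
```

Because the hash stores only the time difference, not absolute time, the same sound yields the same hash values wherever it occurs in the archive. A hash survives added background noise as long as both of its landmarks remain prominent.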

Exhaustive Search for Repeats
  More selective hashes
    few hits required to confirm match (faster; better precision)
    but less robust to background (reduces recall)
  Works well when exact structure repeats
    recorded music, electronic alerts
    no good for organic sounds, e.g. garage door

5. Future: Browsing Tools / Diary interface
  Browsing
    links to other information (diary, email, photos)
    synchronize with note taking? (Stifelman & Arons)
  Release Tools + how-to for capture
  [Figure: diary-style browsing interface; garbled in extraction, labels not recoverable]

Future: Speech Recognition
  Most audio is too noisy for standard ASR
    actually reassuring for privacy issues
  But... similar to Meeting Recordings
    NIST distant microphone conditions
  Speech enhancement - directional filtering
    2 channels a big improvement over one...
    use a more special-purpose directional mic?

Privacy and Security
  Recordings are controversial
    privacy expectations: speech should be ephemeral?
    Oops button, delayed review (Roy)
    subpoenas... (Golubchik)
  Access to recordings is very sensitive
    .. but preservation is important too
  Approaches
    don't store intelligible audio
      .. but lessens utility - maybe store ASR output?
    split and store on multiple machines - tiered, distributed trust/access protocols
  Big issue!

Conclusions
  Personal Audio is easy & cheap to collect
    but is it any use?
  Segmentation/clustering works well
  Voice detection in noise is harder
    prospects for speaker identification
  Hashing to find arbitrary repeating events
  Tools distribution as a goal