Multi-Modal Audio, Video, and Physiological Sensor Learning for Continuous Emotion Prediction
1 Multi-Modal Audio, Video, and Physiological Sensor Learning for Continuous Emotion Prediction
Youngjune Gwon (1), Kevin Brady (1), Pooya Khorrami (2), Elizabeth Godoy (1), William Campbell (1), Charlie Dagli (1), Thomas S. Huang (2)
(1) MIT Lincoln Laboratory, Human Language Technology Group
(2) University of Illinois Urbana-Champaign, Beckman Institute
October 16, 2016
This work was sponsored by the Defense Advanced Research Projects Agency under Air Force Contract FA C. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.
2 System Overview & Technical Contributions
Approach
- Leverage derived low- and high-level features and exploit the time-correlated nature of the emotional state
Novel Features
- Low-level: prosodic-based descriptors
- High-level:
  - Unsupervised sparse coding of audio and video features
  - Supervised deep learning of video and physiological features
Kalman Filtering framework with smoothing
- Bias term compensation for non-zero mean measurement noise
3 Outline
- System Overview & Technical Contributions
- Technical Overview
- System Architecture
- Audio Processing Pipeline
- Unsupervised High-Level Features (Sparse Coding)
- Supervised High-Level Features (Deep Learning)
- Time-Varying Emotional State Estimation
- Results
- Concluding Remarks
4 System Architecture
[Figure: system architecture block diagram. Blocks shown: Sensor Channels (Audio, Video, Physiological); Low-Level Audio Feature Extraction (Baseline, MFCC, Prosodic, SDC); High-Level Unsupervised (Sparse Coding) Framework; High-Level Supervised (Deep-Learning) Framework; Baseline Estimates* (all sensors); Kalman Filtering Framework; Emotional State.]
* Extracted using supplied AVEC baseline code
MFCC: Mel-frequency cepstral coefficients
SDC: Shifted delta cepstrum
5 Audio Feature Processing Pipeline
In addition to the precomputed audio features, we perform the following feature extraction:
- MFCC: 40-dimensional Mel-frequency cepstral coefficients are computed per 10-msec audio frame using a filterbank
- SDC: 56-dimensional shifted delta cepstra are computed from stacked MFCC vectors of multiple frames
- Prosody: 7-dimensional perceptual audio features based on vocal effort, variations in intonation, and speaking rate
We treat MFCC, SDC, and prosody as low-level audio features:
- They are not suitable for direct regression, so they serve as input to high-level feature learning
- High-level features are learned using sparse coding
Pipeline: Audio (*.WAV) -> Speech Activity Detection -> MFCC, SDC, Prosody Feature Extraction -> High-level feature learning (Sparse Coding) -> SVM Regression -> Arousal, valence estimates
SVM: Support vector machine
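The SDC step above can be sketched as follows. The slide gives only the output dimensionality (56), so the N-d-P-k configuration used here (7 static cepstra, d = 1, P = 3, k = 7, with the static coefficients appended, giving 49 + 7 = 56) is an assumption, as are the function names and the random stand-in input. It is a minimal illustration of shifted delta stacking, not the authors' extraction code.

```python
import numpy as np

def shifted_delta_cepstra(cep, d=1, p=3, k=7):
    """Stack k delta blocks per frame, each block shifted by p frames.

    cep: array of shape (num_frames, n_coeffs), one cepstral vector per frame.
    Returns an array of shape (num_frames, n_coeffs * k).
    """
    num_frames = cep.shape[0]
    # Repeat the edge frames so every shifted index stays in range.
    pad = d + (k - 1) * p
    padded = np.pad(cep, ((pad, pad), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        shift = i * p
        ahead = padded[pad + shift + d : pad + shift + d + num_frames]
        behind = padded[pad + shift - d : pad + shift - d + num_frames]
        blocks.append(ahead - behind)  # delta centered at frame t + i*p
    return np.hstack(blocks)

def sdc_with_static(cep, d=1, p=3, k=7):
    """Append the static cepstra to the stacked deltas (assumed layout)."""
    return np.hstack([cep, shifted_delta_cepstra(cep, d, p, k)])

# Stand-in input: 100 frames of 7 cepstral coefficients (hypothetical data).
rng = np.random.default_rng(0)
mfcc7 = rng.standard_normal((100, 7))
features = sdc_with_static(mfcc7)
print(features.shape)
```

Edge padding is one of several reasonable boundary treatments; dropping the incomplete frames at the ends would work equally well for long recordings.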
6 Unsupervised High-Level Features: Sparse Coding (Audio & Video Channels)
Low-level feature aggregation
- MFCC and SDC features are max-pooled over 40 msec before sparse coding
Sparse coding
- We used L1-regularized LARS with hyperparameters trained in K = [...], λ = [...], and average pooling over a 1-2 second window
- Objective: $\min_{D,y} \| x - D y \|_2^2 + \lambda \| y \|_1$
Regression
- L2-regularized, L2-loss linear SVM
LARS: Least angle regression
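The pooling and sparse coding steps above can be illustrated with a small NumPy sketch. Two substitutions are made and should be read as assumptions: the codes are computed with ISTA (iterative soft thresholding) rather than LARS, and the dictionary D is a fixed random matrix rather than a learned one. The objective carries a factor of 0.5 on the reconstruction term, which only rescales λ relative to the slide. Feature dimensions, the dictionary size of 64, and λ = 0.5 are placeholders, not values from the slides.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(x, D, lam, n_iter=200):
    """Approximately solve min_y 0.5*||x - D y||_2^2 + lam*||y||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # inverse Lipschitz constant of the gradient
    y = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ y - x)
        y = soft_threshold(y - step * grad, step * lam)
    return y

def pool(frames, size, reduce):
    """Apply `reduce` (np.max or np.mean) over consecutive groups of `size` rows."""
    usable = (len(frames) // size) * size
    return reduce(frames[:usable].reshape(-1, size, frames.shape[1]), axis=1)

rng = np.random.default_rng(0)
frames = rng.standard_normal((400, 20))   # stand-in low-level features, 10 ms per frame
pooled = pool(frames, 4, np.max)          # max pooling over 4 frames = 40 ms
D = rng.standard_normal((20, 64))         # placeholder dictionary (not learned)
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
codes = np.array([sparse_code_ista(x, D, lam=0.5) for x in pooled])
high_level = pool(codes, 25, np.mean)     # average pooling over 25 x 40 ms = 1 s
print(pooled.shape, codes.shape, high_level.shape)
```

The pooled codes in `high_level` would then be the inputs to the linear SVM regressor that predicts arousal or valence.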
7 Supervised High-Level Features: Deep Learning (Video & Physiological Channels)
Video Appearance Data (CNN+RNN) / Physiological Data (LSTM)
Input: a fixed-size window (W) of video frames / physiological features
1. Video (appearance): pass the faces extracted from the frame sequence through a CNN, then pass the resulting CNN features through a recurrent network (RNN)
2. Physiological (EDA, HRHRV): pass baseline features from the frame sequence through an LSTM
3. Compute the desired output
[Figure: per-time-step inputs from the sensor channel (video appearance through CNNs; HRHRV and EDA features) feeding a chain of RNN units that produce the per-time-step outputs.]
[Table: AVEC Dev Set CCC Results, Baseline vs. MITLL-UIUC, for arousal and valence; the numeric values are missing from this transcript.]
CNN: Convolutional Neural Network
RNN: Recurrent Neural Network
LSTM: Long Short Term Memory
EDA: Electrodermal activity
HRHRV: Heart rate & heart rate variability
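8 Time-Varying Emotional State Estimation
System Equations
- Dynamic system (emotional state): $x(k+1) = A\,x(k) + w(k)$
- Measurement system (sensor measurements): $z(k) = C\,x(k) + \beta + v(k)$
- Process noise: $Q = \mathrm{cov}(w, w)$
- Measurement noise: $R = \mathrm{cov}(v, v)$
- Measurement bias: $\beta$
System Identification (system parameters $\hat{A}, \hat{C}, \hat{Q}, \hat{R}, \hat{\beta}$, estimated on held-out data)
- $\hat{A} = \left( X_{2,N} X_{1,N-1}^{T} \right) \left( X_{1,N-1} X_{1,N-1}^{T} \right)^{-1}$
- $[\hat{C} \;\; \hat{\beta}] = \left( Z_{1,N} \bar{X}_{1,N}^{T} \right) \left( \bar{X}_{1,N} \bar{X}_{1,N}^{T} \right)^{-1}$
- $\hat{Q} = \mathrm{cov}\left( X_{2,N} - \hat{A} X_{1,N-1} \right)$
- $\hat{R} = \mathrm{cov}\left( Z_{1,N} - \hat{C} X_{1,N} - \hat{\beta} \right)$
- where $X_{1,N} = [x_1 \cdots x_N]$, $Z_{1,N} = [z_1 \cdots z_N]$, and $\bar{X}_{1,N} = \begin{bmatrix} X_{1,N} \\ 1 \cdots 1 \end{bmatrix}$
Approach
- Leveraging a Kalman filter estimation model (1st-order polynomial) with smoothing
- Introducing a bias term to compensate for non-zero mean sensor measurement error
Sensor Measurements
- $z(k) = \begin{bmatrix} z_{audio}(k) \\ z_{physiological}(k) \\ z_{video\_appearance}(k) \\ z_{video\_geometric}(k) \end{bmatrix}$
[Figure: Kalman estimation loop. The innovation $\upsilon = z(k) - \hat{z}(k \mid k-1)$ drives the Kalman Estimator, which outputs $\hat{x}(k \mid k)$; Dynamic Prediction gives $\hat{x}(k+1 \mid k)$; Measurement Prediction gives $\hat{z}(k+1 \mid k)$; the Kalman Smoother outputs the emotional state $\hat{x}(k \mid k+T)$.]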
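The system identification and bias-compensated filtering above can be sketched in NumPy as follows. Several parts are assumptions made for illustration: the state is taken to be two-dimensional (arousal, valence), the state sequence used for identification is simulated rather than taken from annotations, and the smoother is a Rauch-Tung-Striebel fixed-interval smoother, whereas the slide's $\hat{x}(k \mid k+T)$ suggests a fixed-lag variant. All function names and numeric settings are hypothetical.

```python
import numpy as np

def identify_system(X, Z):
    """Least-squares estimates of A, C, beta, Q, R.

    X: (n, N) state sequence, one column per time step.
    Z: (m, N) measurement sequence aligned with X.
    """
    X_past, X_next = X[:, :-1], X[:, 1:]
    A = X_next @ X_past.T @ np.linalg.pinv(X_past @ X_past.T)
    # Append a row of ones so the bias is estimated jointly with C.
    X_aug = np.vstack([X, np.ones((1, X.shape[1]))])
    C_beta = Z @ X_aug.T @ np.linalg.pinv(X_aug @ X_aug.T)
    C, beta = C_beta[:, :-1], C_beta[:, -1]
    Q = np.atleast_2d(np.cov(X_next - A @ X_past))
    R = np.atleast_2d(np.cov(Z - C @ X - beta[:, None]))
    return A, C, beta, Q, R

def kalman_filter(Z, A, C, beta, Q, R, x0, P0):
    """Return filtered and one-step-predicted means and covariances."""
    n, N = A.shape[0], Z.shape[1]
    x_pred, P_pred = x0, P0
    xf, Pf, xp, Pp = [], [], [], []
    for k in range(N):
        innovation = Z[:, k] - (C @ x_pred + beta)  # bias removed from the residual
        S = C @ P_pred @ C.T + R
        K = P_pred @ C.T @ np.linalg.inv(S)
        x_filt = x_pred + K @ innovation
        P_filt = (np.eye(n) - K @ C) @ P_pred
        xp.append(x_pred); Pp.append(P_pred)
        xf.append(x_filt); Pf.append(P_filt)
        x_pred = A @ x_filt                          # dynamic prediction
        P_pred = A @ P_filt @ A.T + Q
    return np.array(xf), np.array(Pf), np.array(xp), np.array(Pp)

def rts_smoother(xf, Pf, xp, Pp, A):
    """Backward pass over the filtered means (fixed-interval smoothing)."""
    xs = xf.copy()
    for k in range(len(xf) - 2, -1, -1):
        G = Pf[k] @ A.T @ np.linalg.inv(Pp[k + 1])
        xs[k] = xf[k] + G @ (xs[k + 1] - xp[k + 1])
    return xs

def simulate(A, C, beta, q_std, r_std, N, rng):
    """Generate a synthetic state and measurement sequence."""
    n, m = A.shape[0], C.shape[0]
    X = np.zeros((n, N))
    X[:, 0] = rng.standard_normal(n)
    for k in range(N - 1):
        X[:, k + 1] = A @ X[:, k] + q_std * rng.standard_normal(n)
    Z = C @ X + beta[:, None] + r_std * rng.standard_normal((m, N))
    return X, Z

rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [-0.1, 0.9]])
C_true = rng.standard_normal((4, 2))                # four stand-in sensor channels
beta_true = np.array([0.5, -0.3, 0.2, 0.1])
X_train, Z_train = simulate(A_true, C_true, beta_true, 0.1, 0.2, 500, rng)
A, C, beta, Q, R = identify_system(X_train, Z_train)
X_test, Z_test = simulate(A_true, C_true, beta_true, 0.1, 0.2, 200, rng)
xf, Pf, xp, Pp = kalman_filter(Z_test, A, C, beta, Q, R, np.zeros(2), np.eye(2))
xs = rts_smoother(xf, Pf, xp, Pp, A)
print(xf.shape, xs.shape)
```

Estimating the bias jointly with C through the augmented state matrix keeps the filter itself unchanged apart from subtracting the bias in the innovation, which is why the measurement prediction is written as `C @ x_pred + beta`.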
9 AVEC Emotion State Estimation Results
AVEC Channel and Feature Utilization
(Format: low-level features / high-level features: used for arousal, valence. Rows with a single "Yes" in the source are marked "one target only" because the column is not recoverable from this transcript.)
Audio
- Baseline / none: Yes, Yes
- Baseline / Sparse coding: Yes, Yes
- MFCC / Sparse coding: Yes, Yes
- Prosody / Sparse coding: Yes (one target only)
- SDC / Sparse coding: Yes, Yes
Video (appearance)
- Baseline: Yes, Yes
- CNN+RNN: Yes, Yes
- Sparse coding: Yes (one target only)
Video (geometric)
- Baseline: Yes, Yes
- Sparse coding: Yes (one target only)
ECG
- Baseline: Yes, Yes
HRHRV
- Baseline: Yes, Yes
- Baseline / CNN+RNN: Yes, Yes
EDA
- Baseline: Yes (one target only)
- Baseline / CNN+RNN: Yes, Yes
SCL
- Baseline: Yes (one target only)
SCR
- Baseline: not used
AVEC CCC Results
[Table: CCC for arousal and valence, Baseline vs. MITLL-UIUC, on the Dev Set and Test Set; the numeric values are missing from this transcript.]
Strong Impact
Performance was strongly impacted by:
- Arousal: low-level audio MFCC & SDC features with high-level sparse coding features
- Valence: high-level deeply learned (CNN+RNN) video (appearance) features and sparse coded video (geometric) features
- General: Kalman filtering fusion framework exploiting the time-correlated signal
CCC: Concordance correlation coefficient
ECG: Electrocardiogram
SCL: Skin conductance level
SCR: Skin conductance response
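For reference, the CCC metric reported above can be computed as in the following sketch. It uses the standard definition with population (biased) variances; the function name and the toy sequences are illustrative and not taken from the slides.

```python
import numpy as np

def concordance_cc(pred, gold):
    """Concordance correlation coefficient between two 1-D sequences."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mean_p, mean_g = pred.mean(), gold.mean()
    covariance = np.mean((pred - mean_p) * (gold - mean_g))
    # The squared mean difference penalizes a constant offset between the sequences.
    return 2.0 * covariance / (pred.var() + gold.var() + (mean_p - mean_g) ** 2)

gold = np.array([1.0, 2.0, 3.0])
ccc_exact = concordance_cc(gold, gold)          # identical sequences
ccc_offset = concordance_cc(gold + 1.0, gold)   # same shape, constant offset of 1
print(ccc_exact, ccc_offset)
```

Unlike Pearson correlation, CCC drops when predictions are shifted by a constant, which is one reason an uncompensated measurement bias would hurt this metric.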
10 Concluding Remarks
Provided an overview of the MITLL-UIUC AVEC system
- Novel low-level (prosodic) and high-level (sparse coding and deep-learning) features
- Kalman filtering fusion framework with compensation for non-zero mean sensor measurement noise
Reviewed emotional state recognition results driven by:
- Arousal: high-level sparse coding of low-level MFCC & SDC features
- Valence: high-level deeply learned video (appearance) features and sparse coded video (geometric) features
- General: Kalman filtering framework exploitation of the time-correlated signal
Next steps
- Refine low-level and high-level features and apply to other sensors
- Multiple hypothesis emotional state estimation framework
- Improved train-test data partitioning & sensor channel delay compensation
Questions?
Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW 2018 1 This Talk 1) Node embeddings Map nodes to low-dimensional embeddings. 2) Graph neural networks Deep learning architectures
More informationCCNF FOR CONTINUOUS EMOTION TRACKING IN MUSIC: COMPARISON WITH CCRF AND RELATIVE FEATURE REPRESENTATION
CCNF FOR CONTINUOUS EMOTION TRACKING IN MUSIC: COMPARISON WITH CCRF AND RELATIVE FEATURE REPRESENTATION Vaiva Imbrasaitė, Tadas Baltrušaitis, Peter Robinson Computer Laboratory, University of Cambridge
More informationK Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat
K Nearest Neighbor Wrap Up K- Means Clustering Slides adapted from Prof. Carpuat K Nearest Neighbor classification Classification is based on Test instance with Training Data K: number of neighbors that
More informationMultilayer and Multimodal Fusion of Deep Neural Networks for Video Classification
Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification Xiaodong Yang, Pavlo Molchanov, Jan Kautz INTELLIGENT VIDEO ANALYTICS Surveillance event detection Human-computer interaction
More informationCSE446: Linear Regression. Spring 2017
CSE446: Linear Regression Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin and Luke Zettlemoyer Prediction of continuous variables Billionaire says: Wait, that s not what I meant! You say: Chill
More informationSemantic Segmentation. Zhongang Qi
Semantic Segmentation Zhongang Qi qiz@oregonstate.edu Semantic Segmentation "Two men riding on a bike in front of a building on the road. And there is a car." Idea: recognizing, understanding what's in
More informationLEARNING TO GENERATE CHAIRS WITH CONVOLUTIONAL NEURAL NETWORKS
LEARNING TO GENERATE CHAIRS WITH CONVOLUTIONAL NEURAL NETWORKS Alexey Dosovitskiy, Jost Tobias Springenberg and Thomas Brox University of Freiburg Presented by: Shreyansh Daftry Visual Learning and Recognition
More informationDeep Learning Cook Book
Deep Learning Cook Book Robert Haschke (CITEC) Overview Input Representation Output Layer + Cost Function Hidden Layer Units Initialization Regularization Input representation Choose an input representation
More information