Research at Google: With a Case of YouTube-8M. Joonseok Lee, Google Research



1 Research at Google: With a Case of YouTube-8M Joonseok Lee, Google Research

2 Google Research We tackle the most challenging problems in Computer Science and related fields. Being bold and taking risks allows our embedded teams to make discoveries that affect billions of users every day. Ever since Google was born in Stanford's Computer Science department, the company has valued and maintained strong relations with universities and research institutes.

3 Today's Talk: YouTube-8M
Challenging Problem: Video Understanding Problem
Billions of Users: Large-Scale Challenges
Academic Relationship and Contribution: Kaggle Competition + CVPR 2017 Workshop

4 Video Understanding Problem

5 What is Video Understanding?
From raw signals... to meaning... useful for: Annotation, Classification, Recommendation, Search, Summarization...


6 The Multiple Shades of Video Understanding
Describing the content: what is visible/audible? (kids playing, park, cars honking, sidewalk, protagonist, policeman shouting "stop!")
Inferring the central topics: what is the story about? (Police Chase)
Describing the structure & style: how is the story told? (intro, indoor dialog, poorly lit, outdoor chase, hand-held camera, credits)
Inferring creator / viewer intent: why capture this video? why watch this video?

7 Applications: YouTube Video Discovery
Diagram labels: Content, Metadata, Viewer signals, Fuser
YouTube Auto-generated Channel Topics
Topic annotations describe videos and channels

8 Applications: Personal Media Collections
Lots of videos, no metadata

9 Applications: Cloud Video Intelligence API https://cloud.google.
Insight from Videos: Cloud Video Intelligence API allows developers to extract actionable insights from video files without requiring any machine learning or computer vision knowledge.

10 Large-Scale Challenges

11 Challenges in Creating Video Dataset
File sizes are larger than images: more expensive to download, store, and train from.
Video labels are more expensive to obtain: annotators must watch the video and listen to the audio stream.
Therefore, video datasets tend to be small.


12 YouTube-8M: What is it?
Dataset & open-source TensorFlow code
  research.google.com/youtube8m/
  github.com/google/youtube-8m/
  1 Petabyte of data served so far!
Kaggle Competition (2017/2/ /6/2)
  kaggle.com/c/youtube8m
  $100,000 prize pool (sponsored by Cloud ML)
  $30,000 in Cloud credits for participants
CVPR'17 Workshop (2017/7/26)
  research.google.com/youtube8m/workshop.html
  4 invited talks, 10 oral + 8 poster presentations

13 YouTube-8M Dataset: Vocabulary (research.google.com/youtube8m/)
4,716 Knowledge Graph entities; each entity has 200+ corresponding videos

14 YouTube-8M: Diversity

15 YouTube-8M: TensorFlow Framework Design (github.com/google/youtube-8m/)
[Figure: datasets plotted by Data Size against Computation per example. Points: MNIST, ImageNet, UCF101, HMDB, YT-8M (original videos), YT-8M (pre-computed features). Video/Audio Feature Extraction turns YT-8M (original videos) into YT-8M (pre-computed features).]
Large data size and lower compute intensity

16 Kaggle Competition & CVPR Workshop

17 The YouTube-8M Classification Challenge
Input: A sequence of frame-level visual and audio features, extracted at 1 frame per second
  Each video has:
  Visual: Inception-V3 bottleneck features extracted from pixels (PCA-ed to 1024-d)
  Audio: VGG-style bottleneck features extracted from audio spectrograms (128-d)
Target: Video topics from a 4,716 Knowledge Graph entity vocabulary
  The target topics cover the main themes in the video (vs. object detection, scene parsing)
  Each video has 3.4 ground truth labels on average
Goal: Predict target video topics from the sequence of frame-level features
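The input and target described on this slide can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the starter code from github.com/google/youtube-8m/: the helper names `mean_pool` and `multi_hot`, the toy frame values, the label ids, and the choice of mean pooling to obtain a video-level vector are all assumptions. Only the dimensions (1024-d visual, 128-d audio, 4,716 classes, one frame per second) come from the slide.

```python
VISUAL_DIM = 1024   # Inception-V3 bottleneck features after PCA (from the slide)
AUDIO_DIM = 128     # VGG-style audio features (from the slide)
NUM_CLASSES = 4716  # Knowledge Graph entity vocabulary size (from the slide)


def mean_pool(frames):
    """Average a list of equal-length frame vectors into one video-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def multi_hot(label_ids, num_classes=NUM_CLASSES):
    """Encode a set of label ids as a 0/1 target vector (multi-label)."""
    target = [0] * num_classes
    for label in label_ids:
        target[label] = 1
    return target


# Hypothetical 3-second video: one visual and one audio vector per second.
visual_frames = [[float(t)] * VISUAL_DIM for t in range(3)]
audio_frames = [[float(t) * 2.0] * AUDIO_DIM for t in range(3)]

# Assumed video-level input: pooled visual features followed by pooled audio.
video_features = mean_pool(visual_frames) + mean_pool(audio_frames)

# Made-up label ids; a real video has 3.4 labels on average per the slide.
target = multi_hot({5, 42, 4000})
```

Mean pooling discards temporal order, which is why it is only one possible way to summarize a video; a sequence model would consume `visual_frames` and `audio_frames` directly.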

18 The YouTube-8M Classification Challenge
[Figure: a sequence of per-frame features (Feature, Feature, Feature, Feature) feeds a Machine Learning Model; topic labels shown: Korean Food, Cooking, Meat, Football]

19 Participation Statistics: Overall
Submissions Received: 7,833 (73.2/day on average)
Unique Page Views: 145,863
Downloaders: 3,024
Competing users: 926
Competing teams:
[Chart: Leaderboard top score]

20 Number of Submissions

21 Where are the participants from?
Participants from 56 countries
Number of submissions: USA: 2,810; China: 1,675; Korea: 420; UK: 370
Number of participants: USA: 293; China: 94; India: 48; Russia: 32; Korea, UK: 31

22 Final standing: Top 500
The top-performing team (84.97%, rank 1)
LSTM starter code baseline (80.93%, rank 78)
Audio-visual MoE video-level baseline (~78%)
Audio-visual log-reg baseline (74.71%)
Visual log-reg baseline (69.42%)
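The slide names the log-reg baselines without describing them. As a rough illustration, a multi-label logistic regression scores each class independently with a sigmoid over a linear function of the video-level features. The sketch below assumes that standard formulation; the function names, the 4-d toy features, and the weights are made up, and it is not the actual baseline implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_scores(features, weights, biases):
    """One independent logistic regressor per class: sigmoid(w_c . x + b_c)."""
    return [
        sigmoid(sum(w * x for w, x in zip(class_weights, features)) + b)
        for class_weights, b in zip(weights, biases)
    ]


def top_k(scores, k):
    """Return (class_id, score) pairs for the k highest-scoring classes."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy setup: 4-d "audio-visual" features and 3 classes (all values made up).
features = [1.0, 0.0, 2.0, -1.0]
weights = [
    [0.5, 0.0, 0.25, 0.0],  # class 0: logit = 0.5 + 0.5 = 1.0
    [0.0, 1.0, 0.0, 1.0],   # class 1: logit = -1.0
    [0.0, 0.0, 0.0, 0.0],   # class 2: logit = 0.0
]
biases = [0.0, 0.0, 0.0]

scores = predict_scores(features, weights, biases)
best = top_k(scores, k=2)
```

Because each class gets its own sigmoid rather than sharing a softmax, several topics can score highly for the same video, which matches the multi-label setting of the challenge.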

23 Final standing: Top 20
INRIA, Tsinghua University, Baidu + Tsinghua University, Fudan University, University Pompeu Fabra, Seoul National University

24 Summary
We tackle the most challenging problems in Computer Science and related fields, affecting billions of users every day.
Video Understanding Problem
Large-Scale Challenges
Kaggle Competition + CVPR 2017 Workshop

25 Thanks for your Attention!
