Real Sense: Use Case Scenarios

Real Sense F200: Top-3 Experiences

Experience: Win 10 Facial Login & Authentication
Use Case: Win10 Hello + RealSense = facial login/authentication; Win10 Hello + RealSense + TrueKey = password management
MW: MSFT

Experience: Immersive Collaboration: Interactive Conferencing
Use Case: Virtual background; avatar
MW: Personify BGS; Facial Landmarks / FaceShift MW

Experience: Mainstream Gaming
Use Case: 3D Face Selfie; in-game virtual presence (blogging); share with friends on Facebook & social media; create 3D statues; drop it into games
MW: Personify BGS; 3DMe uses Facial Landmarks (rest N/A); 3D Systems and Mixamo

Real Sense R200: Top-3 Experiences

Experience: Enhanced Viewing and Photography
Use Case: Enhanced photo modes
MW: Hole filling, live focus
Success Criteria: Resulting images and object scans (speed vs. quality), plane fitting, tracking (layer, object, human)

Experience: 3D Scan and Share
Use Case (business): Scan facilities and compare them to pre-existing blueprints, laser scans, or CAD drawings. MW: N/A. Success Criteria: 1 cm of accuracy at 3-4 m of range (fiducial in scene with known calibration for relative measurement)
Use Case (consumer): Share with friends on Facebook. MW: N/A. Success Criteria: Easy and seamless sharing from apps
Use Case (consumer): Print 3D statues. MW: N/A. Success Criteria: Depth data/model accuracy serving as the higher-level mesh representation

Experience: 3D Measure and Design
Use Case: Quick measurement of rooms and furniture (async): take a depth-enabled photo for point-to-point measurement (see the sketch after this slide); capture 2D floor plans
MW: Scene Perception to get 6 DOF, then IMU-based trigonometry
Success Criteria: Consistently estimate relative accuracy based on distance from the device
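The point-to-point measurement in the 3D Measure and Design row comes down to deprojecting two user-selected pixels of a depth-enabled photo into 3D and taking the Euclidean distance between them. Below is a minimal sketch of that math using a plain pinhole model; the intrinsics (fx, fy, cx, cy) and the synthetic depth map are placeholder values for illustration, not R200 calibration data or RSSDK API calls.

```python
import numpy as np

def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in meters to a 3D point
    using a simple pinhole model (lens distortion ignored)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

def point_to_point_distance(p1_px, p2_px, depth_map, intrinsics):
    """Metric distance between two clicked pixels of a depth-enabled photo.
    depth_map holds meters; 0 means 'no depth data'."""
    fx, fy, cx, cy = intrinsics
    d1 = depth_map[p1_px[1], p1_px[0]]
    d2 = depth_map[p2_px[1], p2_px[0]]
    if d1 == 0 or d2 == 0:
        raise ValueError("missing depth at one of the selected pixels")
    p1 = deproject(p1_px[0], p1_px[1], d1, fx, fy, cx, cy)
    p2 = deproject(p2_px[0], p2_px[1], d2, fx, fy, cx, cy)
    return float(np.linalg.norm(p1 - p2))

# Synthetic example: a flat wall 2 m away, hypothetical intrinsics.
depth = np.full((480, 640), 2.0)
intrinsics = (580.0, 580.0, 320.0, 240.0)
print(point_to_point_distance((100, 240), (500, 240), depth, intrinsics))
```

In this model any depth error scales the recovered x, y, and z together, so measurement error grows with distance from the camera, which is why the success criterion above is stated as relative accuracy based on distance from the device.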

Use Cases for F200

BodyScan / Apparel Fitting (F200)
Apparel / fitting / immersive shopping: an at-home, enhanced online shopping experience for online sales: body scan and fit assessment.
MW: BodyLabs, FitAnalytics, Embodee
Success Criteria: Consistent, accurate size recommendations and fit assessment

3D Capture, Share, and Print (F200)
Scan real people and objects and create 3D digital renditions. These can be inserted into virtual-reality worlds; physical renditions can be edited, shared, and printed with a 3D printer.
Use Case: 3D capture (scan objects, people, and places), simplified printing, sharing
Success Criteria: Material type, resolution of detail, accuracy and precision, RGB quality and resolution

Immersive Collaboration (F200)
Enhanced interaction and sharing for teleconferencing: background replacement and the use of avatars to enrich the experience.
Use Case: Enhanced interaction and sharing for video conferencing; add an avatar
Success Criteria: Full-body segmentation, multiple people, distance 3 m (actual) / 5 m (goal), 24 fps (requirement), capture resolution of 720p and 1080p, high-quality RGB in low light for Skype compliance

Interact Naturally (F200)
Natural, intuitive, and immersive interaction with devices using face and body; implicit and explicit interactions. Face recognition to gain authorized access to the device; basic speech command and control.
Use Case: Utilize hands/gestures, face, and speech
MW: Most ISVs use raw depth streams and are not dependent upon Intel's MW capability or timing. SDK: Core, Hand Tracking and Gestures, Touchless Touch; Core, Face 2D+3D Tracking, Landmarks
Success Criteria: Detection (hand detection, time to detection); gesture (accuracy hit rate); hand labeling (hand switch); hand tracking (accuracy 95%, min algorithm FPS >45 fps on 1 hand, >40 fps on 2 hands); min rendering (>45 fps on 1 hand, >40 fps on 2 hands); a KPI-check sketch follows this slide

Gaming (F200)
Natural real-time interaction (select, control, act) within games and education apps (e.g. interactive ebooks): inserting objects into scenes and interacting with objects in scenes. Objects can be real (scanned) or virtual (loaded, constructed); scenes can be real or constructed.
Use Case: Intuitive control, object scanning, 2 avatar usages, Personify IC, biometrics
MW: Speech, face tracking, blob tracking, contour. Additional 3rd-party components required, e.g. Personify, Mixamo, Twitch, and FaceShift
Success Criteria: Gesture tracking: frame rate of at least 60 fps on SKL-Y; latency under 50 ms from gesture to visible update, to be comparable to mouse movement; filter impossible hand poses; take less than 30 ms to acquire tracking. Voice: use MS-SAPI. Face/head tracking: requires a large FOV and the ability to recover quickly
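The hand-tracking success criteria above are stated as minimum algorithm frame rates (>45 fps with one hand, >40 fps with two). A small, SDK-agnostic sketch of checking that KPI from per-frame processing times is shown below; the frame log is an invented structure for illustration, not RSSDK output.

```python
from statistics import mean

# Hypothetical per-frame log: algorithm processing time in ms and the
# number of hands tracked in that frame.
frame_log = [
    {"algo_ms": 18.2, "hands": 1},
    {"algo_ms": 21.7, "hands": 1},
    {"algo_ms": 26.4, "hands": 2},
    {"algo_ms": 24.9, "hands": 2},
]

# KPIs from the slide: >45 fps with one hand, >40 fps with two hands.
KPI_FPS = {1: 45.0, 2: 40.0}

def check_hand_tracking_fps(log):
    """Average algorithm frame rate per hand count, and pass/fail vs. the KPI."""
    results = {}
    for hands, target in KPI_FPS.items():
        times = [f["algo_ms"] for f in log if f["hands"] == hands]
        if times:
            fps = 1000.0 / mean(times)
            results[hands] = (fps, fps > target)
    return results

for hands, (fps, ok) in check_hand_tracking_fps(frame_log).items():
    print(f"{hands}-hand: {fps:.1f} fps -> {'PASS' if ok else 'FAIL'}")
```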

Education F200

Use Case: Gesture-controlled manipulation of 2D/3D objects (STEM, Art)
Students control 3D objects on the screen without needing expensive physical models of molecules or 3D shapes to understand their characteristics. Students control 3D objects in the physical world while the camera captures each object's position and displays projections, helping them understand the connection between 3D and 2D. Students change the point of view or the position of geometrical shapes with gestures, helping them (younger children in particular) better understand the connection between the real world and its reflection on the screen. (A small hand-to-object mapping sketch follows this slide.)
MW: RSSDK; control and manipulate 2D/3D objects on the screen
Success Criteria: Effective range for gestures is 20 to 55 cm
OS/Platform: Win 8/10

Use Case: Constant velocity
Students move their hands and arms at a constant speed in a uniform direction to represent constant velocity and receive immediate multi-modal sonic and visual feedback. In addition, when a pair of students creates a game or tries to produce a particular pattern on the graph display, they need to coordinate how they move as a pair.
MW: RSSDK; track physical objects and movements
Success Criteria: Effective range for gestures is 20 to 55 cm
OS/Platform: Win 8

Use Case: Assessments, tests, examinations
The camera captures emotional responses to test stimuli during baseline assessments and augments test results with emotion maps. The camera tracks eye and head-landmark movements during dyslexia assessments to understand reading patterns during speed-naming tests. In assessment centres, cameras are used to identify pupils and spot fraud patterns during the examination process.
MW: RSSDK; use emotion and face-landmark tracking information to augment the results of assessments and tests
Success Criteria: Effective range for face tracking is 35 to 70 cm
OS/Platform: Win 8/10
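The gesture-controlled manipulation use case is, at bottom, a mapping from a tracked hand position (valid only inside the 20-55 cm effective range quoted above) to a transform on the on-screen object. The sketch below illustrates one such mapping with invented coordinates and sensitivity; it is not RSSDK code.

```python
GESTURE_RANGE_M = (0.20, 0.55)   # effective gesture range from the slide

def hand_to_rotation(hand_xyz_m, sensitivity_deg_per_m=180.0):
    """Map a tracked hand position (camera coordinates, meters) to yaw/pitch
    angles for an on-screen 3D object. Returns None outside the valid range."""
    x, y, z = hand_xyz_m
    if not (GESTURE_RANGE_M[0] <= z <= GESTURE_RANGE_M[1]):
        return None                      # ignore hands outside 20-55 cm
    yaw = x * sensitivity_deg_per_m      # left/right motion spins the model
    pitch = -y * sensitivity_deg_per_m   # up/down motion tilts it
    return yaw, pitch

print(hand_to_rotation((0.10, -0.05, 0.35)))   # in range -> (18.0, 9.0)
print(hand_to_rotation((0.10, -0.05, 0.70)))   # too far  -> None
```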

Interact Naturally Usages / Use Cases

Speech (Kids): Recognition of keywords and short phrases; multi-modality with other usages
KPIs: Simple ability to detect speech; recognition accuracy; voice control/commands interpreted correctly; natural-language detection

Utilize Hands: Broad gestures
KPIs: Detection (hand detection, time to detection); gesture (accuracy hit rate; a hit-rate sketch follows this slide); hand labeling (hand switch); hand tracking (accuracy 95%, min algorithm FPS >45 fps on 1 hand, >40 fps on 2 hands); min rendering (>45 fps on 1 hand, >40 fps on 2 hands)

Utilize Face: Detection and tracking (not emotions yet); detect user engaged/frustrated
KPIs: CPU consumption; face detection; pose; landmarks; face recognition; face expression (not yet); multiple faces; multiple landmarks
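The "accuracy hit rate" KPI for gestures can be computed by comparing recognized gestures against labeled ground truth, as in the small sketch below; the gesture names and event lists are invented for the example.

```python
# Labeled ground truth vs. what the recognizer reported, per attempt.
ground_truth = ["wave", "swipe_left", "grab", "wave", "swipe_right"]
recognized   = ["wave", "swipe_left", None,   "wave", "swipe_left"]

def hit_rate(truth, recog):
    """Fraction of attempted gestures that were recognized correctly."""
    hits = sum(1 for t, r in zip(truth, recog) if t == r)
    return hits / len(truth)

print(f"gesture hit rate: {hit_rate(ground_truth, recognized):.0%}")  # 60%
```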

3D Capture, Share, and Print Usages / Use Cases

3D Capture: Scan objects (small, medium, large, fine detail), people (face, torso, whole body; coming on F200), and places (partial space/room, entire room)
- Material: Common materials need to be scannable regardless of color
- Resolution of detail: For standard scans, consistently resolve features 0.8-1.0 mm in size (a resolvability sketch follows this slide)
- Scanning speed: For smooth operation the scanning software needs to capture at 20 fps (SKL) and 15 fps (CHT), which depends on both the hardware and the depth-data drivers
- Accuracy and precision: The most fundamental issues for 3D capture; model creation and model detail hinge on repeatable data that accurately represents the real world
- RGB quality and resolution: RGB quality and the size of the RGB image matter because the RGB data is mostly used to generate texture maps, which are layered over the generated mesh model to provide a more photorealistic representation than colored vertices alone

Simplified Printing: Locally; send to a service; timeshare a 3D printer
Sharing: Post to social media or other e-commerce sites
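Whether 0.8-1.0 mm features can be resolved depends on how large one depth pixel is on the object at the scanning distance. The sketch below works that footprint out from a pinhole model; the 60-degree field of view and 640-pixel width are placeholder numbers, not a specific camera spec, and a real resolvability check would also have to account for depth noise.

```python
import math

def pixel_footprint_mm(distance_m, h_fov_deg, width_px):
    """Approximate width of one depth pixel on the object (mm) at a given
    scanning distance, assuming a pinhole camera."""
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(h_fov_deg) / 2.0)
    return scene_width_m / width_px * 1000.0

# Placeholder optics: 60-degree horizontal FOV, 640-px-wide depth stream.
for d in (0.3, 0.5, 1.0):
    fp = pixel_footprint_mm(d, 60.0, 640)
    ok = fp <= 0.8          # KPI: resolve 0.8-1.0 mm features
    print(f"{d:.1f} m: {fp:.2f} mm/pixel -> {'fine enough' if ok else 'too coarse'}")
```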

Enhanced Photography Usages / Use Cases

Capture: Picture, live preview, artistic filters, editing effects
KPIs: Entirely dependent upon depth data and RGB; connectedness in depth

Edit: Artistic filters, editing effects, motion effects, consumer point-to-point measurements
KPIs: Motion effects: the foreground should shift more than other directions (KPI TBD); easier than BGS because the requirements are less demanding. Point-to-point measurements: no good end-user metric yet; we want a box measured, and the app needs to help with defining points; reliant upon underlying depth-map quality (holes in RGB, measuring two points is different from measuring dimensions, improve the depth data; a hole-filling sketch follows this slide)

Share: Share captured pictures

Enhanced Videography Usages / Use Cases

Capture: Video, artistic filters, editing effects, tracking
KPIs: Capture videos with the platform camera; must run at 24 fps minimum; needs a video file format to store the captured video

Edit: Artistic filters, editing effects, tracking
KPIs: Load and edit captured videos; requires the new video file format

Share: Share captured videos
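Several of the KPIs above come back to holes in the depth map. A very common minimal strategy is to fill invalid (zero) depth pixels from their valid neighbors; the numpy sketch below illustrates that idea with a neighborhood median and is not the SDK's hole-filling mode.

```python
import numpy as np

def fill_depth_holes(depth, win=3):
    """Replace zero-valued (invalid) depth pixels with the median of the
    valid pixels in a win x win neighborhood (single pass)."""
    filled = depth.copy()
    r = win // 2
    ys, xs = np.where(depth == 0)
    for y, x in zip(ys, xs):
        patch = depth[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
        valid = patch[patch > 0]
        if valid.size:
            filled[y, x] = np.median(valid)
    return filled

# Tiny example: a 2 m wall with two dropped pixels.
d = np.full((5, 5), 2.0)
d[2, 2] = 0.0
d[0, 4] = 0.0
print(fill_depth_holes(d))
```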

Gaming Usages / Use Cases

Intuitive Control: Use your hand to interact with in-game characters or to trigger in-game events; voice commands in games; face/head tracking
KPIs: Gesture tracking: frame rate of at least 60 fps on SKL-Y; latency under 50 ms from gesture to visible update, to be comparable to mouse movement; filter impossible hand poses; take less than 30 ms to acquire tracking. Voice: use MS-SAPI. Face/head tracking: requires a large FOV and the ability to recover quickly. (Additional KPI details in speaker notes; a latency-budget sketch follows this slide.)

Object Scanning: Scan an object and import it into a game
KPIs: Model resolution higher than the actual effective resolution of the scan (voxel); real-time usage resolution; texture mapping

Avatar Usage #1: Put a 3D model of my head on an in-game character to play in game
KPIs: Needs a rigging/blending solution from a 3rd party; the current workflow for the end user is complex

Avatar Usage #2: Let me control an avatar through my emotions and expressions while playing a game
KPIs: <=15% CPU and GPU consumption (each); <=10 ms latency from camera to final buffer; rigging/animation solution required

Personify (Immersive Collaborate): Play a game with my friends where they can see me while I play, using BGS (like Twitch plus a virtual green screen / Personify plus gaming)
KPIs: <=15% CPU and GPU consumption (each); <=10 ms latency from camera to final buffer

Biometric Tracking: Not POR, but an experimental feature
KPIs: Heart rate needs a much higher refresh rate (60-100 Hz); face/avatar sync needs landmark retargeting
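The 50 ms gesture-to-visible-update budget and the 30 ms time-to-track target are per-frame timing sums; one way to track them is to tally a per-stage timing log against the budgets, as in the sketch below (stage names and numbers are illustrative, not SDK terminology).

```python
# Illustrative per-frame timings (ms) between a gesture and the visible update.
frame_timing_ms = {
    "camera_exposure_readout": 16.7,
    "depth_to_hand_tracking": 14.0,
    "game_logic_update": 6.0,
    "render_and_present": 11.0,
}

LATENCY_BUDGET_MS = 50.0   # gesture -> visible update (slide KPI)
TIME_TO_TRACK_MS = 30.0    # time to (re)acquire tracking (slide KPI)

total = sum(frame_timing_ms.values())
print(f"gesture-to-update: {total:.1f} ms "
      f"({'within' if total <= LATENCY_BUDGET_MS else 'over'} budget)")

acquire_ms = 22.5          # example measured time to reacquire a lost hand
print(f"time to track: {acquire_ms:.1f} ms "
      f"({'within' if acquire_ms <= TIME_TO_TRACK_MS else 'over'} budget)")
```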

Education and Learning Usages / Use Cases

Virtual Green Screen: Use background subtraction (i.e. Personify) to "film" a friend in a fantasy environment; virtual classroom (with green screen); see the sketch after this slide
Natural real-time interaction: Special window and lens experiences provide novel or informative content registered with the physical scene
Multi-modality: Broad gestures, voice, face
Camera KPIs / Success Criteria: Depth/color resolution and framerate; min/max range of the depth camera; hand velocity threshold; low-light face detection; face-tracking uniformity (skin tones)
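The virtual green screen use case is, at its core, depth-based background subtraction: keep color pixels whose depth falls within a near-field threshold and replace everything else with the fantasy background. Below is a minimal numpy sketch of that idea under an assumed 1.2 m subject distance; it is not the Personify implementation.

```python
import numpy as np

def virtual_green_screen(color, depth, background, max_depth_m=1.2):
    """Composite the near-field subject over a replacement background.
    color/background: HxWx3 uint8 images; depth: HxW meters (0 = invalid)."""
    subject_mask = (depth > 0) & (depth <= max_depth_m)
    out = background.copy()
    out[subject_mask] = color[subject_mask]
    return out

# Tiny synthetic example: subject at ~0.9 m in the image center, wall at 3 m.
h, w = 120, 160
color = np.full((h, w, 3), 200, dtype=np.uint8)
depth = np.full((h, w), 3.0)
depth[30:90, 50:110] = 0.9
fantasy_bg = np.zeros((h, w, 3), dtype=np.uint8)
fantasy_bg[..., 1] = 255                      # stand-in "fantasy" scene
composite = virtual_green_screen(color, depth, fantasy_bg)
print(composite.shape, composite[60, 80], composite[5, 5])
```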