Real Sense: Use Case Scenarios

Real Sense F200: Top-3 Experiences

Experience: Win 10 Facial Login & Authentication
Use Case: Win10 Hello + RealSense = facial login/authentication; Win10 Hello + RealSense + TrueKey = password management
MW: MSFT

Experience: Immersive Collaboration: Interactive Conferencing
Use Case: Virtual background; avatar
MW: Personify BGS; Facial Landmarks / FaceShift MW

Experience: Mainstream Gaming
Use Case: 3D Face Selfie; in-game virtual presence (blogging); share with friends on Facebook & social media; create 3D statues; drop it into games
MW: Personify BGS; 3DMe uses Facial Landmarks (rest N/A); 3D Systems and Mixamo

Real Sense R200: Top-3 Experiences

Experience: Enhanced Viewing and Photography
Use Case: Enhanced photo modes
MW: Hole filling, live focus
Success Criteria: Resulting images and object scans (speed vs. quality), plane fitting, tracking (layer, object, human)

Experience: 3D Scan and Share
Use Case (business): Scan facilities and compare them to pre-existing blueprints, laser scans, or CAD drawings. MW: N/A. Success Criteria: 1 cm of accuracy at 3-4 m of range (fiducial in scene with known calibration for relative measurement)
Use Case (consumer): Share with friends on Facebook. MW: N/A. Success Criteria: Easy and seamless sharing from apps
Use Case (consumer): Print 3D statues. MW: N/A. Success Criteria: Depth data/model accuracy serving as the higher-level mesh representation

Experience: 3D Measure and Design
Use Case: Quick measurement of rooms and furniture (async): take a depth-enabled photo for point-to-point measurement (see the sketch after this slide); capture 2D floor plans
MW: Scene Perception to get 6 DOF, then IMU-based trigonometry
Success Criteria: Consistently estimate relative accuracy based on distance from the device
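The point-to-point measurement in the 3D Measure and Design row comes down to deprojecting two user-selected pixels of a depth-enabled photo into 3D and taking the Euclidean distance between them. Below is a minimal sketch of that math using a plain pinhole model; the intrinsics (fx, fy, cx, cy) and the synthetic depth map are placeholder values for illustration, not R200 calibration data or RSSDK API calls.

```python
import numpy as np

def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in meters to a 3D point
    using a simple pinhole model (lens distortion ignored)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

def point_to_point_distance(p1_px, p2_px, depth_map, intrinsics):
    """Metric distance between two clicked pixels of a depth-enabled photo.
    depth_map holds meters; 0 means 'no depth data'."""
    fx, fy, cx, cy = intrinsics
    d1 = depth_map[p1_px[1], p1_px[0]]
    d2 = depth_map[p2_px[1], p2_px[0]]
    if d1 == 0 or d2 == 0:
        raise ValueError("missing depth at one of the selected pixels")
    p1 = deproject(p1_px[0], p1_px[1], d1, fx, fy, cx, cy)
    p2 = deproject(p2_px[0], p2_px[1], d2, fx, fy, cx, cy)
    return float(np.linalg.norm(p1 - p2))

# Synthetic example: a flat wall 2 m away, hypothetical intrinsics.
depth = np.full((480, 640), 2.0)
intrinsics = (580.0, 580.0, 320.0, 240.0)
print(point_to_point_distance((100, 240), (500, 240), depth, intrinsics))
```

In this model any depth error scales the recovered x, y, and z together, so measurement error grows with distance from the camera, which is why the success criterion above is stated as relative accuracy based on distance from the device.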

Use Cases for F200

BodyScan / Apparel Fitting (F200)
Apparel / fitting / immersive shopping: an at-home, enhanced online shopping experience for online sales: body scan and fit assessment.
MW: BodyLabs, FitAnalytics, Embodee
Success Criteria: Consistent, accurate size recommendations and fit assessment

3D Capture, Share, and Print (F200)
Scan real people and objects and create 3D digital renditions. These can be inserted into virtual-reality worlds; physical renditions can be edited, shared, and printed with a 3D printer.
Use Case: 3D capture (scan objects, people, and places), simplified printing, sharing
Success Criteria: Material type, resolution of detail, accuracy and precision, RGB quality and resolution

Immersive Collaboration (F200)
Enhanced interaction and sharing for teleconferencing: background replacement and the use of avatars to enrich the experience.
Use Case: Enhanced interaction and sharing for video conferencing; add an avatar
Success Criteria: Full-body segmentation, multiple people, distance 3 m (actual) / 5 m (goal), 24 fps (requirement), capture resolution of 720p and 1080p, high-quality RGB in low light for Skype compliance

Interact Naturally (F200)
Natural, intuitive, and immersive interaction with devices using face and body; implicit and explicit interactions. Face recognition to gain authorized access to the device; basic speech command and control.
Use Case: Utilize hands/gestures, face, and speech
MW: Most ISVs use raw depth streams and are not dependent upon Intel's MW capability or timing. SDK: Core, Hand Tracking and Gestures, Touchless Touch; Core, Face 2D+3D Tracking, Landmarks
Success Criteria: Detection (hand detection, time to detection); gesture (accuracy hit rate); hand labeling (hand switch); hand tracking (accuracy 95%, min algorithm FPS >45 fps on 1 hand, >40 fps on 2 hands); min rendering (>45 fps on 1 hand, >40 fps on 2 hands); a KPI-check sketch follows this slide

Gaming (F200)
Natural real-time interaction (select, control, act) within games and education apps (e.g. interactive ebooks): inserting objects into scenes and interacting with objects in scenes. Objects can be real (scanned) or virtual (loaded, constructed); scenes can be real or constructed.
Use Case: Intuitive control, object scanning, 2 avatar usages, Personify IC, biometrics
MW: Speech, face tracking, blob tracking, contour. Additional 3rd-party components required, e.g. Personify, Mixamo, Twitch, and FaceShift
Success Criteria: Gesture tracking: frame rate of at least 60 fps on SKL-Y; latency under 50 ms from gesture to visible update, to be comparable to mouse movement; filter impossible hand poses; take less than 30 ms to acquire tracking. Voice: use MS-SAPI. Face/head tracking: requires a large FOV and the ability to recover quickly
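The hand-tracking success criteria above are stated as minimum algorithm frame rates (>45 fps with one hand, >40 fps with two). A small, SDK-agnostic sketch of checking that KPI from per-frame processing times is shown below; the frame log is an invented structure for illustration, not RSSDK output.

```python
from statistics import mean

# Hypothetical per-frame log: algorithm processing time in ms and the
# number of hands tracked in that frame.
frame_log = [
    {"algo_ms": 18.2, "hands": 1},
    {"algo_ms": 21.7, "hands": 1},
    {"algo_ms": 26.4, "hands": 2},
    {"algo_ms": 24.9, "hands": 2},
]

# KPIs from the slide: >45 fps with one hand, >40 fps with two hands.
KPI_FPS = {1: 45.0, 2: 40.0}

def check_hand_tracking_fps(log):
    """Average algorithm frame rate per hand count, and pass/fail vs. the KPI."""
    results = {}
    for hands, target in KPI_FPS.items():
        times = [f["algo_ms"] for f in log if f["hands"] == hands]
        if times:
            fps = 1000.0 / mean(times)
            results[hands] = (fps, fps > target)
    return results

for hands, (fps, ok) in check_hand_tracking_fps(frame_log).items():
    print(f"{hands}-hand: {fps:.1f} fps -> {'PASS' if ok else 'FAIL'}")
```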

Education F200

Use Case: Gesture-controlled manipulation of 2D/3D objects (STEM, Art)
Students control 3D objects on the screen without needing expensive physical models of molecules or 3D shapes to understand their characteristics. Students control 3D objects in the physical world while the camera captures each object's position and displays projections, helping them understand the connection between 3D and 2D. Students change the point of view or the position of geometrical shapes with gestures, helping them (younger children in particular) better understand the connection between the real world and its reflection on the screen. (A small hand-to-object mapping sketch follows this slide.)
MW: RSSDK; control and manipulate 2D/3D objects on the screen
Success Criteria: Effective range for gestures is 20 to 55 cm
OS/Platform: Win 8/10

Use Case: Constant velocity
Students move their hands and arms at a constant speed in a uniform direction to represent constant velocity and receive immediate multi-modal sonic and visual feedback. In addition, when a pair of students creates a game or tries to produce a particular pattern on the graph display, they need to coordinate how they move as a pair.
MW: RSSDK; track physical objects and movements
Success Criteria: Effective range for gestures is 20 to 55 cm
OS/Platform: Win 8

Use Case: Assessments, tests, examinations
The camera captures emotional responses to test stimuli during baseline assessments and augments test results with emotion maps. The camera tracks eye and head-landmark movements during dyslexia assessments to understand reading patterns during speed-naming tests. In assessment centres, cameras are used to identify pupils and spot fraud patterns during the examination process.
MW: RSSDK; use emotion and face-landmark tracking information to augment the results of assessments and tests
Success Criteria: Effective range for face tracking is 35 to 70 cm
OS/Platform: Win 8/10
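The gesture-controlled manipulation use case is, at bottom, a mapping from a tracked hand position (valid only inside the 20-55 cm effective range quoted above) to a transform on the on-screen object. The sketch below illustrates one such mapping with invented coordinates and sensitivity; it is not RSSDK code.

```python
GESTURE_RANGE_M = (0.20, 0.55)   # effective gesture range from the slide

def hand_to_rotation(hand_xyz_m, sensitivity_deg_per_m=180.0):
    """Map a tracked hand position (camera coordinates, meters) to yaw/pitch
    angles for an on-screen 3D object. Returns None outside the valid range."""
    x, y, z = hand_xyz_m
    if not (GESTURE_RANGE_M[0] <= z <= GESTURE_RANGE_M[1]):
        return None                      # ignore hands outside 20-55 cm
    yaw = x * sensitivity_deg_per_m      # left/right motion spins the model
    pitch = -y * sensitivity_deg_per_m   # up/down motion tilts it
    return yaw, pitch

print(hand_to_rotation((0.10, -0.05, 0.35)))   # in range -> (18.0, 9.0)
print(hand_to_rotation((0.10, -0.05, 0.70)))   # too far  -> None
```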

Interact Naturally Usages / Use Cases

Speech (Kids): Recognition of keywords and short phrases; multi-modality with other usages
KPIs: Simple ability to detect speech; recognition accuracy; voice control/commands interpreted correctly; natural-language detection

Utilize Hands: Broad gestures
KPIs: Detection (hand detection, time to detection); gesture (accuracy hit rate; a hit-rate sketch follows this slide); hand labeling (hand switch); hand tracking (accuracy 95%, min algorithm FPS >45 fps on 1 hand, >40 fps on 2 hands); min rendering (>45 fps on 1 hand, >40 fps on 2 hands)

Utilize Face: Detection and tracking (not emotions yet); detect user engaged/frustrated
KPIs: CPU consumption; face detection; pose; landmarks; face recognition; face expression (not yet); multiple faces; multiple landmarks
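The "accuracy hit rate" KPI for gestures can be computed by comparing recognized gestures against labeled ground truth, as in the small sketch below; the gesture names and event lists are invented for the example.

```python
# Labeled ground truth vs. what the recognizer reported, per attempt.
ground_truth = ["wave", "swipe_left", "grab", "wave", "swipe_right"]
recognized   = ["wave", "swipe_left", None,   "wave", "swipe_left"]

def hit_rate(truth, recog):
    """Fraction of attempted gestures that were recognized correctly."""
    hits = sum(1 for t, r in zip(truth, recog) if t == r)
    return hits / len(truth)

print(f"gesture hit rate: {hit_rate(ground_truth, recognized):.0%}")  # 60%
```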

3D Capture, Share, and Print Usages / Use Cases

3D Capture: Scan objects (small, medium, large, fine detail), people (face, torso, whole body; coming on F200), and places (partial space/room, entire room)
- Material: Common materials need to be scannable regardless of color
- Resolution of detail: For standard scans, consistently resolve features 0.8-1.0 mm in size (a resolvability sketch follows this slide)
- Scanning speed: For smooth operation the scanning software needs to capture at 20 fps (SKL) and 15 fps (CHT), which depends on both the hardware and the depth-data drivers
- Accuracy and precision: The most fundamental issues for 3D capture; model creation and model detail hinge on repeatable data that accurately represents the real world
- RGB quality and resolution: RGB quality and the size of the RGB image matter because the RGB data is mostly used to generate texture maps, which are layered over the generated mesh model to provide a more photorealistic representation than colored vertices alone

Simplified Printing: Locally; send to a service; timeshare a 3D printer
Sharing: Post to social media or other e-commerce sites
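Whether 0.8-1.0 mm features can be resolved depends on how large one depth pixel is on the object at the scanning distance. The sketch below works that footprint out from a pinhole model; the 60-degree field of view and 640-pixel width are placeholder numbers, not a specific camera spec, and a real resolvability check would also have to account for depth noise.

```python
import math

def pixel_footprint_mm(distance_m, h_fov_deg, width_px):
    """Approximate width of one depth pixel on the object (mm) at a given
    scanning distance, assuming a pinhole camera."""
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(h_fov_deg) / 2.0)
    return scene_width_m / width_px * 1000.0

# Placeholder optics: 60-degree horizontal FOV, 640-px-wide depth stream.
for d in (0.3, 0.5, 1.0):
    fp = pixel_footprint_mm(d, 60.0, 640)
    ok = fp <= 0.8          # KPI: resolve 0.8-1.0 mm features
    print(f"{d:.1f} m: {fp:.2f} mm/pixel -> {'fine enough' if ok else 'too coarse'}")
```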

Enhanced Photography Usages / Use Cases

Capture: Picture, live preview, artistic filters, editing effects
KPIs: Entirely dependent upon depth data and RGB; connectedness in depth

Edit: Artistic filters, editing effects, motion effects, consumer point-to-point measurements
KPIs: Motion effects: the foreground should shift more than other directions (KPI TBD); easier than BGS because the requirements are less demanding. Point-to-point measurements: no good end-user metric yet; we want a box measured, and the app needs to help with defining points; reliant upon underlying depth-map quality (holes in RGB, measuring two points is different from measuring dimensions, improve the depth data; a hole-filling sketch follows this slide)

Share: Share captured pictures

Enhanced Videography Usages / Use Cases

Capture: Video, artistic filters, editing effects, tracking
KPIs: Capture videos with the platform camera; must run at 24 fps minimum; needs a video file format to store the captured video

Edit: Artistic filters, editing effects, tracking
KPIs: Load and edit captured videos; requires the new video file format

Share: Share captured videos
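Several of the KPIs above come back to holes in the depth map. A very common minimal strategy is to fill invalid (zero) depth pixels from their valid neighbors; the numpy sketch below illustrates that idea with a neighborhood median and is not the SDK's hole-filling mode.

```python
import numpy as np

def fill_depth_holes(depth, win=3):
    """Replace zero-valued (invalid) depth pixels with the median of the
    valid pixels in a win x win neighborhood (single pass)."""
    filled = depth.copy()
    r = win // 2
    ys, xs = np.where(depth == 0)
    for y, x in zip(ys, xs):
        patch = depth[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
        valid = patch[patch > 0]
        if valid.size:
            filled[y, x] = np.median(valid)
    return filled

# Tiny example: a 2 m wall with two dropped pixels.
d = np.full((5, 5), 2.0)
d[2, 2] = 0.0
d[0, 4] = 0.0
print(fill_depth_holes(d))
```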

Gaming Usages / Use Cases

Intuitive Control: Use your hand to interact with in-game characters or to trigger in-game events; voice commands in games; face/head tracking
KPIs: Gesture tracking: frame rate of at least 60 fps on SKL-Y; latency under 50 ms from gesture to visible update, to be comparable to mouse movement; filter impossible hand poses; take less than 30 ms to acquire tracking. Voice: use MS-SAPI. Face/head tracking: requires a large FOV and the ability to recover quickly. (Additional KPI details in speaker notes; a latency-budget sketch follows this slide.)

Object Scanning: Scan an object and import it into a game
KPIs: Model resolution higher than the actual effective resolution of the scan (voxel); real-time usage resolution; texture mapping

Avatar Usage #1: Put a 3D model of my head on an in-game character to play in game
KPIs: Needs a rigging/blending solution from a 3rd party; the current workflow for the end user is complex

Avatar Usage #2: Let me control an avatar through my emotions and expressions while playing a game
KPIs: <=15% CPU and GPU consumption (each); <=10 ms latency from camera to final buffer; rigging/animation solution required

Personify (Immersive Collaborate): Play a game with my friends where they can see me while I play, using BGS (like Twitch plus a virtual green screen / Personify plus gaming)
KPIs: <=15% CPU and GPU consumption (each); <=10 ms latency from camera to final buffer

Biometric Tracking: Not POR, but an experimental feature
KPIs: Heart rate needs a much higher refresh rate (60-100 Hz); face/avatar sync needs landmark retargeting
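The 50 ms gesture-to-visible-update budget and the 30 ms time-to-track target are per-frame timing sums; one way to track them is to tally a per-stage timing log against the budgets, as in the sketch below (stage names and numbers are illustrative, not SDK terminology).

```python
# Illustrative per-frame timings (ms) between a gesture and the visible update.
frame_timing_ms = {
    "camera_exposure_readout": 16.7,
    "depth_to_hand_tracking": 14.0,
    "game_logic_update": 6.0,
    "render_and_present": 11.0,
}

LATENCY_BUDGET_MS = 50.0   # gesture -> visible update (slide KPI)
TIME_TO_TRACK_MS = 30.0    # time to (re)acquire tracking (slide KPI)

total = sum(frame_timing_ms.values())
print(f"gesture-to-update: {total:.1f} ms "
      f"({'within' if total <= LATENCY_BUDGET_MS else 'over'} budget)")

acquire_ms = 22.5          # example measured time to reacquire a lost hand
print(f"time to track: {acquire_ms:.1f} ms "
      f"({'within' if acquire_ms <= TIME_TO_TRACK_MS else 'over'} budget)")
```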

Education and Learning Usages / Use Cases

Virtual Green Screen: Use background subtraction (i.e. Personify) to "film" a friend in a fantasy environment; virtual classroom (with green screen); see the sketch after this slide
Natural real-time interaction: Special window and lens experiences provide novel or informative content registered with the physical scene
Multi-modality: Broad gestures, voice, face
Camera KPIs / Success Criteria: Depth/color resolution and framerate; min/max range of the depth camera; hand velocity threshold; low-light face detection; face-tracking uniformity (skin tones)
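The virtual green screen use case is, at its core, depth-based background subtraction: keep color pixels whose depth falls within a near-field threshold and replace everything else with the fantasy background. Below is a minimal numpy sketch of that idea under an assumed 1.2 m subject distance; it is not the Personify implementation.

```python
import numpy as np

def virtual_green_screen(color, depth, background, max_depth_m=1.2):
    """Composite the near-field subject over a replacement background.
    color/background: HxWx3 uint8 images; depth: HxW meters (0 = invalid)."""
    subject_mask = (depth > 0) & (depth <= max_depth_m)
    out = background.copy()
    out[subject_mask] = color[subject_mask]
    return out

# Tiny synthetic example: subject at ~0.9 m in the image center, wall at 3 m.
h, w = 120, 160
color = np.full((h, w, 3), 200, dtype=np.uint8)
depth = np.full((h, w), 3.0)
depth[30:90, 50:110] = 0.9
fantasy_bg = np.zeros((h, w, 3), dtype=np.uint8)
fantasy_bg[..., 1] = 255                      # stand-in "fantasy" scene
composite = virtual_green_screen(color, depth, fantasy_bg)
print(composite.shape, composite[60, 80], composite[5, 5])
```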