Cross-Language Interfacing and Gesture Detection with Microsoft Kinect


Phu Kieu
Department of Computer Science and Mathematics, California State University, Los Angeles
Los Angeles, CA 90032

Lucy Abramyan, Mentor
Section 317F Planning Software Systems, Jet Propulsion Laboratory, California Institute of Technology
Pasadena, CA 91109

ABSTRACT

Stage provides an immersive visualization environment for mission operations, into which we integrate Google Streetview and the Kinect Natural User Interface (NUI). The Kinect is controlled through a C++ dynamic link library, which interfaces it to other environments: Java, Unity, and Teamcenter Visualization. Through the Kinect API, we retrieve the positions of various skeleton points and use them to perform gesture processing and identification. The goal of the project is to expand the uses of Stage beyond mission operations and to explore the degree of seamlessness that can be achieved with the Kinect NUI. This depends heavily on the types of gestures used to perform commands as well as the accuracy of the detection algorithms. We found the learning curve of our system to be low, since the number and complexity of the commands are low. However, other factors such as fatigue limit the feasibility of the NUI as a primary input method.

1. Background

Stage is an experimental visualization environment for mission operations. To serve this purpose, it must handle three functions: input, display, and control. We define these terms as follows:

Input - Collect and process data from a device (robot, spacecraft, etc.)
Display - Present data from the device to the user in a usable format
Control - Process the user's input and convert it to commands recognized by the device

As of now, the display function is fairly complete, and we have mocked up input using Mars ground data and Google Streetview. Control functions are purely client-side for the time being. The distinguishing feature of Stage is its display: a 12 ft. high, 270-degree cylindrical screen. Figure 1 illustrates the physical configuration of Stage. Three projectors are required to illuminate the whole screen. The advantage this set-up has over traditional display methods is that it maintains the relative positions of objects.

The code base of Stage is written in Java. The 3D environment is rendered and managed by the Ardor3D library. In 3D rendering, the maximum view angle allowed for a camera is about 179 degrees, so a piecewise rendering trick is used to achieve the full 270-degree view angle.

Figure 1. Stage Configuration

2. Google Streetview Undocumented API

We have found that the raw Google Streetview image data is not available through the public API. We therefore access it through an undocumented API described by Jamie Thompson, whose article gives the sequence of requests needed to retrieve the raw image data for a given latitude and longitude pair. Note: there is a problem regarding the assumption of image dimensions in Thompson's article (refer to Appendix A). The result is a spherically projected image of the desired location. This spherical image can be displayed correctly by creating a sphere within the 3D scene and mapping the Streetview image onto it as a texture.

3. Google Streetview Integration

We have added support for Google Streetview in Stage. It takes a pair of latitude and longitude coordinates and retrieves the image data from Google's servers. The image is processed and rendered with the 270-degree view geometry. The behavior of the program is similar to Google Earth, but with a wider view angle. This includes the ability to walk around various locations on Earth, with the program appropriately loading and transitioning between images.

4. Cross-Language Kinect Interface

Stage supports keyboard, mouse, and Nintendo Wii Remote inputs, and we would now like to include Microsoft Kinect support. The challenge is that Stage is written in Java, while the Kinect API supports only C/C++ and C#. We solved this problem by writing interface code on each end, Java and C++. Our C++ interface is a dynamic link library (DLL) that uses the Kinect API and exports the Kinect functions. The memory layout of the data structures is known, so any program that can call functions from this DLL can retrieve data coming from the Kinect. We call this the wrapper DLL. The Java side of the interface accesses the wrapper DLL through the Java Native Access (JNA) library, which allows us to load the DLL, call functions through it, and store the output in a memory block. The raw data is then remapped into a proper Java-native data structure that we can use to perform gesture processing.
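As an illustration of this arrangement, the sketch below shows how the Java side of such an interface might look with JNA. The exported function names (initKinect, getSkeletonJoints), the joint count, and the flat float-buffer layout are assumptions made for the example; the paper does not list the wrapper DLL's actual exports.

```java
// Hypothetical sketch of the Java side of the Kinect interface using JNA.
// The DLL export names and data layout are assumed, not taken from Stage.
import com.sun.jna.Library;
import com.sun.jna.Native;

public class KinectBridge {

    // JNA maps this interface onto the exported C functions of the wrapper DLL.
    public interface KinectWrapper extends Library {
        KinectWrapper INSTANCE =
                (KinectWrapper) Native.loadLibrary("KinectWrapper", KinectWrapper.class);

        boolean initKinect();
        // Fills the caller-supplied buffer with x, y, z triples for each tracked joint
        // and returns the number of joints written.
        int getSkeletonJoints(float[] outXyz, int maxJoints);
    }

    private static final int MAX_JOINTS = 20; // the Kinect skeleton exposes 20 joints

    // Remaps the flat native buffer into a Java-native structure
    // suitable for gesture processing.
    public static float[][] readSkeleton() {
        float[] raw = new float[MAX_JOINTS * 3];
        int count = KinectWrapper.INSTANCE.getSkeletonJoints(raw, MAX_JOINTS);
        float[][] joints = new float[count][3];
        for (int i = 0; i < count; i++) {
            joints[i][0] = raw[3 * i];
            joints[i][1] = raw[3 * i + 1];
            joints[i][2] = raw[3 * i + 2];
        }
        return joints;
    }
}
```

Remapping the native buffer into plain Java arrays keeps the gesture-processing code independent of the wrapper's memory layout.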

We use a similar architecture to interface the Kinect to Unity, a 3D game development tool, and to Teamcenter Visualization, a CAD design tool. Here, however, we focus on the implementation details used for Stage. Refer to the figures in Appendix B.

5. Gesture Detection

The method used to detect gestures in Stage is similar to the operation of a deterministic finite automaton (DFA). At each state the system evaluates a condition, which can return -1, 0, or 1. A value of -1 signals the system to return to the beginning state, 0 results in no state change, and 1 causes the system to advance one state. All states are ordered, and there is no skipping of states or backtracking except for returning to the beginning state. A gesture is detected when the final state is reached. Figure 2 shows the DFA of a four-state gesture. In general, a gesture with more states is more complex. For example, our implementation of a wave gesture consists of four states, whereas the gestures used to control Stage are probably the simplest type possible, composed of only two states. A minimal code sketch of this scheme follows.

Figure 2. DFA of Four State Gesture
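The sketch below illustrates the scheme, assuming each frame delivers the skeleton joints as an array of x, y, z coordinates; the class and interface names are illustrative and not the actual Stage implementation.

```java
// Illustrative DFA-style gesture detector as described in Section 5.
// Names and the per-frame joint format are assumptions for the example.
import java.util.List;

public class GestureDetector {

    /** One state's condition: returns -1 (reset), 0 (stay), or 1 (advance). */
    public interface StateCondition {
        int evaluate(float[][] skeletonJoints);
    }

    private final List<StateCondition> states; // ordered, no skipping or backtracking
    private int current = 0;

    public GestureDetector(List<StateCondition> states) {
        this.states = states;
    }

    /** Feed one frame of skeleton data; returns true when the final state is reached. */
    public boolean update(float[][] skeletonJoints) {
        int result = states.get(current).evaluate(skeletonJoints);
        if (result == -1) {
            current = 0;                // return to the beginning state
        } else if (result == 1) {
            current++;                  // advance one state
        }                               // result == 0: no state change
        if (current == states.size()) { // final state reached: gesture detected
            current = 0;
            return true;
        }
        return false;
    }
}
```

Under this reading, a detector is configured with one condition per state transition; more transitions correspond to a more complex gesture.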

6. Challenges

Ardor3D was quite difficult to work with. The documentation was so hard to find that we initially thought it did not exist, and we had to inspect the source code to understand behavior that should not have required knowledge of the implementation details.

The gestures we used in Stage are simple. If we wish to implement more complex gestures, decomposing a gesture into states will be difficult. The wave gesture we implemented as a test did not have a good detection rate, producing primarily false negatives. There is also the possibility of multiple gestures interfering with each other.

7. Discussion

In developing and testing the gestures, we have found that they can allow a user to finely control the camera and perform movements. However, we have had issues with unintended control, which is inherent to this type of input: it requires the user to focus only on controlling Stage. We have also found that the behavior of the Kinect is not consistent across different machines.

Another issue with using the Kinect is user fatigue. The average user becomes tired after about one hour of use, whereas traditional input methods can be used for several hours before fatigue sets in. While this disqualifies the Kinect as a primary input method, it may be a valuable tool as a secondary input device.

8. Future Work

Stage is currently in a dismantled state. For future work, we would like to deploy the software on the actual Stage platform instead of performing tests on standard display devices. This would allow us to test factors such as whether one Kinect sensor is sufficient for the whole environment.

In terms of using Stage for mission operations, there is much work to be done. To begin with, a stitching algorithm for constructing spherically or cylindrically projected images from real-time camera data could be written. Another important function for Stage is the ability to issue RAPID commands; RAPID is a communications protocol for controlling robots. Future performance requirements will likely necessitate that Stage be ported to a faster programming language, such as C/C++ or C#. We are currently having performance difficulties processing and rendering Streetview images in the Java environment, and it seems unlikely that Stage will be able to process and render 30-60 frames per second of panoramic imagery without significant performance enhancements.

Acknowledgments

The work described in this paper was supported by the National Science Foundation under grant number 0852088 to California State University, Los Angeles. It was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. We would also like to acknowledge David Torosyan for collaborating on the work done on Stage, as well as the CURE Program and Section 317F.

References

Ardor3D. Retrieved 15 Aug 2011. <http://www.ardor3d.com>.

Thompson, Jamie. "Google Streetview Static API." 15 May 2010. <http://jamiethompson.co.uk/web/2010/05/15/google-streetview-static-api/>.

UNITY: Game Development Tool. Retrieved 15 Aug 2011. <http://www.unity3d.com>.

Vrsinsky, Jan. 360Cities Panoramic Photography Blog. 10 June 2010. <http://blog.360cities.net/tag/googleearth/>.

Appendix A. Google Streetview Undocumented API

Given the latitude/longitude or panoid of a location, first retrieve the metadata. The panoid is a unique identifier string for a location, used internally by Google. The metadata can be retrieved at the following URLs:

http://cbk0.google.com/cbk?output=json&ll=[latitude],[longitude]
http://cbk0.google.com/cbk?output=json&panoid=[panoid]

The output type can be changed to xml if XML output is desired. The metadata is used to obtain the panoid of the location. With the panoid, it is possible to retrieve the image tiles and construct the spherically projected image. Several tiles put together form the entire image; each tile is 512x512 pixels. Tiles are accessed at the following URL:

http://cbk0.google.com/cbk?output=tile&panoid=[panoid]&zoom=[zoom]&x=[x]&y=[y]

Possible values for zoom range from 1 to 5, but values of 4 and 5 may not be available in all areas. The x and y parameters indicate the coordinates of the tile piece; these values are always non-negative and start at 0,0. The maximum values of these coordinates vary, even for identical zoom levels. For an invalid zoom, x, or y parameter, the server returns a single-channel image that is entirely black, which makes the detection of out-of-bounds values simple. Retrieve the tiles by iterating over x and y until the boundaries are found.

Once the full image is assembled, it may need to be cropped. To correct the image height, start from the bottom and eliminate all consecutive rows containing only black pixels (within a pixel-value tolerance; the current implementation uses a threshold of 10). Correcting the width uses a property of spherical image projections, width = 2 × height: eliminate pixel columns from the right until this property is satisfied.
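The procedure above might be sketched in Java roughly as follows; the class and helper names, the canvas size, and the error handling are assumptions for illustration, not the actual Stage code. The tile URL and the threshold of 10 come from the description above.

```java
// Illustrative sketch of the tile retrieval and cropping procedure.
// URL format and the threshold of 10 follow the text above; everything
// else (names, canvas size) is assumed for the example.
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.net.URL;
import javax.imageio.ImageIO;

public class StreetviewStitcher {
    private static final int TILE = 512;
    private static final int THRESHOLD = 10; // "black" pixel tolerance

    static BufferedImage fetchTile(String panoid, int zoom, int x, int y) throws Exception {
        String url = "http://cbk0.google.com/cbk?output=tile&panoid=" + panoid
                + "&zoom=" + zoom + "&x=" + x + "&y=" + y;
        return ImageIO.read(new URL(url));
    }

    static boolean nearBlack(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return r <= THRESHOLD && g <= THRESHOLD && b <= THRESHOLD;
    }

    /** Out-of-bounds tiles come back entirely black. */
    static boolean isBlack(BufferedImage img) {
        for (int y = 0; y < img.getHeight(); y++)
            for (int x = 0; x < img.getWidth(); x++)
                if (!nearBlack(img.getRGB(x, y))) return false;
        return true;
    }

    /** Stitch tiles by iterating x and y until an out-of-bounds tile is hit. */
    static BufferedImage stitch(String panoid, int zoom) throws Exception {
        // Generous canvas; it is cropped to the real extent afterwards.
        BufferedImage canvas = new BufferedImage(TILE * 32, TILE * 16, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        for (int y = 0; ; y++) {
            BufferedImage tile = fetchTile(panoid, zoom, 0, y);
            if (isBlack(tile)) break;                 // no more rows of tiles
            for (int x = 0; !isBlack(tile); x++) {
                g.drawImage(tile, x * TILE, y * TILE, null);
                tile = fetchTile(panoid, zoom, x + 1, y);
            }
        }
        g.dispose();
        return crop(canvas);
    }

    /** Crop: drop all-black rows from the bottom, then trim width to 2 * height. */
    static BufferedImage crop(BufferedImage img) {
        int h = img.getHeight();
        while (h > 0 && isBlackRow(img, h - 1)) h--;
        int w = Math.min(img.getWidth(), 2 * h);
        return img.getSubimage(0, 0, w, h);
    }

    static boolean isBlackRow(BufferedImage img, int row) {
        for (int x = 0; x < img.getWidth(); x++)
            if (!nearBlack(img.getRGB(x, row))) return false;
        return true;
    }
}
```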

Figure 3 is an example output produced by this method; Figure 4 shows the same image as viewed from Stage. Figure 5 is a mockup image of the Stage environment.

Figure 3. Google Streetview Output

Figure 4. Google Streetview Output Viewed from Stage

Figure 5. Stage Mockup

Appendix B. Architecture Diagrams

Figure 6. Stage Diagram (components: Ardor3D, AbstractStageScene, StageApplication, StageStreetviewScene)

Figure 7. General Kinect Interface Diagram (components: Microsoft Kinect API, Kinect DLL Wrapper, Target Language Interface, Target Language Implementation)

Figure 8. Java Kinect Interface and Implementation Diagram