Project report: Augmented reality with ARToolKit
FMA175 Image Analysis, Project
Mathematical Sciences, Lund Institute of Technology
Supervisor: Petter Strandmark
Fredrik Larsson (dt07fl2@student.lth.se)
December 5, 2011

1 Introduction

Augmented reality (AR) is the concept of enhancing the real physical world with an extra layer of information. Additionally, this should be done in real-time and also provide some means of interaction. In a computer application this can be achieved by analyzing a captured video feed using image analysis and computer vision algorithms and then rendering some object on top of the video image. Determining where and how to render the objects can be done in numerous ways. It is possible to use positioning systems such as GPS, gyroscopic sensors, or image analysis and computer vision algorithms that detect markers in the video feed. The latter is the approach discussed in this report. The main problem, common to all approaches, is to determine where the viewer is positioned and how the viewer is oriented in the real physical world.

The goal of this project is to explore the capabilities and limitations of a software library called ARToolKit. Using this library, a demo application has also been produced. The demo application is written in the C programming language with GNU/Linux as the target platform.

2 ARToolKit

ARToolKit is a software library that aids in the development of AR applications. It is written in C and is free for non-commercial use under the GNU General Public License. A more production-ready and better supported version is also available for non-free use. The software was originally developed by Dr. Hirokazu Kato but is currently maintained by the Human Interface Technology Laboratory at the University of Washington [1]. Since its initial release in the late 1990s it has undergone a rewrite, and the current incarnation of the toolkit was released in 2004. After that, a few sporadic releases have occurred up until its most recent version (2.72.1), released in 2007. At this time, not much seems to be going on in terms of further development of the library, at least judging by the project's official web site.

The library aims to be cross-platform and runs on most common operating systems, including Microsoft Windows, GNU/Linux and Mac OS X. Several ports and bindings exist for other languages and platforms, such as Java and Android [2].

2.1 Detection algorithm

The primary functionality of the ARToolKit library is to detect markers in a captured video frame. These markers typically consist of a black and white pattern with a thick frame. A number of sample patterns are bundled with the library, but it is also possible to create custom patterns. An example pattern is displayed in figure 1. This pattern is also used by the demo application developed during this project. The toolkit supports detecting multiple markers in the same image frame.

Figure 1: An example of a pattern that ARToolKit can detect.

The algorithm used to detect the pattern builds on a few basic concepts of image analysis. As a first step, the captured image is run through a thresholding filter, yielding a binary image; the threshold value is one of the few parameters that can be set by the user of the library (a sketch of this step is given below). The binary image is then passed through a connected-component labeling algorithm. The result of this pass is a labeling of the different regions of the image, and the goal is to find large regions, such as the wide black border shown in figure 1. From the information acquired from the labeling, the algorithm proceeds by detecting the contours of the pattern, from which one can extract the edges and corners of the pattern. This finalizes the detection algorithm, and the obtained information can be used in the next step, which computes the camera transform [3].
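To make the first step concrete, here is a minimal sketch in C of the thresholding pass that turns a grayscale frame into a binary image. This is an illustration only, not ARToolKit's actual implementation; the function name and buffer layout are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Threshold a grayscale frame into a binary image: pixels darker than
 * `thresh` become 1 (candidate marker border pixels), the rest become 0.
 * Hypothetical helper for illustration; in ARToolKit the result of this
 * kind of pass is fed into connected-component labeling. */
static void threshold_frame(const uint8_t *gray, uint8_t *binary,
                            size_t width, size_t height, uint8_t thresh)
{
    for (size_t i = 0; i < width * height; ++i)
        binary[i] = (gray[i] < thresh) ? 1 : 0;
}
```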

2.2 Computer vision

After a pattern has been detected in the video, a number of transformations are performed in order to be able to render a three-dimensional object on top of the frame. The mathematical model provided by the pinhole camera is simple and convenient, but does not correspond fully to the physical camera used to capture the image. It is, however, possible to idealize the camera using an affine transformation. This transformation is the $3 \times 3$ matrix

$$K = \begin{pmatrix} \alpha & -\alpha \cot\theta & u_0 \\ 0 & \dfrac{\beta}{\sin\theta} & v_0 \\ 0 & 0 & 1 \end{pmatrix},$$

which contains what are called the camera's intrinsic parameters [4]. Here $\alpha$ and $\beta$ are the magnification factors in the x and y directions respectively, expressed in pixel units. The parameter $\theta$ is the skew factor, i.e. the angle between the image axes, which should ideally be equal to 90° but may not be. Finally, $u_0$ and $v_0$ give the location of the principal point, in pixel units, which is the point where the optical axis intersects the image plane.

After the normalization, the detected pattern can be matched against a number of templates to determine which pattern has been detected. Next, using the lines and corners from the detection algorithm, a projective transformation is computed. The projective transformation maps the image plane onto itself with the perspective taken into account. An important property of this transformation is that a line maps to a line, with cross-ratios preserved. Finally, at this point the camera transform can be computed, which is a mapping between the camera's coordinate system and the world's. These computations need to be done every frame because the transformations depend on the real-world positions of both the markers and the camera. The intrinsic parameters, however, only change if the focal length of the camera changes, e.g. when zooming.
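As a worked example of how the intrinsic parameters act, the toy C function below applies $K$ to a point given in camera coordinates and performs the perspective division to obtain pixel coordinates. The parameter values in `main` are made up for illustration.

```c
#include <math.h>
#include <stdio.h>

/* Project a point (x, y, z) in camera coordinates to pixel coordinates
 * (u, v) using the intrinsic matrix K described above: row-wise
 * application of K followed by division by the homogeneous coordinate. */
static void project(double alpha, double beta, double theta,
                    double u0, double v0,
                    double x, double y, double z,
                    double *u, double *v)
{
    double uh = alpha * x - alpha * (cos(theta) / sin(theta)) * y + u0 * z;
    double vh = (beta / sin(theta)) * y + v0 * z;
    *u = uh / z;
    *v = vh / z;
}

int main(void)
{
    const double PI = 3.14159265358979323846;
    double u, v;
    /* Ideal skew (theta = 90 degrees), principal point at (320, 240);
     * all values are invented for this toy example. */
    project(800.0, 800.0, PI / 2.0, 320.0, 240.0,
            0.1, 0.2, 1.0, &u, &v);
    printf("pixel: (%.1f, %.1f)\n", u, v);  /* prints: pixel: (400.0, 400.0) */
    return 0;
}
```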

2.3 Computer graphics

ARToolKit is tightly integrated with the OpenGL graphics pipeline, which is used for the actual rendering. OpenGL has, put simply, three different spaces between which transformations are done. An object that is to be rendered to the screen first has its coordinates defined in its own model space. In order to place this object into a scene, the world transformation is applied, so that the coordinates are now in world space. Finally, the object is transformed into view space, which is defined by a camera model. These transformations operate on points in three dimensions given in homogeneous coordinates, and are thus matrices of size $4 \times 4$. They can be combined into a single transformation by multiplying the matrices together, often referred to as the model-view transform. The results of the detection and computer vision algorithms described in the previous section can be used to set up these matrices so that the rendered graphics appear at the right place in the captured video frame.

The rendering of a frame with ARToolKit normally starts with grabbing a frame from the video capture device and rendering it to a frame buffer. The previously described algorithms are then applied to the image in order to detect a pattern. If no marker is detected, the frame buffer is displayed on the screen and the rendering is complete. If a marker is detected, however, the model-view transformation matrix is computed and passed down to the OpenGL pipeline. Next, using the standard OpenGL draw commands, whatever geometry is desired can be rendered to the frame buffer. When the rendering is complete, the frame buffer is displayed on the screen and the next video frame can be grabbed from the camera.

3 Demo application

In order to test and analyze ARToolKit, a simple demonstration application was implemented. This application renders a four-vertex polygon, i.e. a quad, textured with an image, e.g. a photo. Additionally, in order for it to appear more realistic in the video frame, a few adjustments are made. The demo application uses OpenGL shaders to apply these adjustments in an efficient manner; the adjustments are described in the following sections. A screen capture of the application is displayed in figure 2.

3.1 White balance adjustment

In most cases, the white balance of the captured image and the rendered image do not match. In an attempt to overcome this discrepancy, a simple method of manual white balance calibration was implemented. Using the mouse, the user of the program can select a color $w = (R_w, G_w, B_w)$ from the captured video frame, which is then used as the white point. In this case the colors are 8-bit RGB values, i.e. each color component is in the range [0, 255]. To apply the white balance adjustment, a pixel's color $(R', G', B')$ is scaled into the resulting color $(R, G, B)$ by the transformation

$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} = \begin{pmatrix} 255/R_w & 0 & 0 \\ 0 & 255/G_w & 0 \\ 0 & 0 & 255/B_w \end{pmatrix} \begin{pmatrix} R' \\ G' \\ B' \end{pmatrix}.$$

This adjustment gives the rendered image the same tint as the background video frame.
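A minimal CPU-side sketch of this scaling follows, assuming interleaved 8-bit RGB pixels; the demo itself performs the equivalent operation in an OpenGL shader, and the function name and buffer layout here are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Apply the diagonal white-balance matrix above to an interleaved 8-bit
 * RGB buffer of `npixels` pixels, so that the picked white point
 * (rw, gw, bw) maps to (255, 255, 255). Assumes a nonzero white point.
 * Illustrative CPU version of what the demo does on the GPU. */
static void white_balance(uint8_t *rgb, size_t npixels,
                          uint8_t rw, uint8_t gw, uint8_t bw)
{
    for (size_t i = 0; i < npixels; ++i) {
        unsigned r = rgb[3 * i + 0] * 255u / rw;
        unsigned g = rgb[3 * i + 1] * 255u / gw;
        unsigned b = rgb[3 * i + 2] * 255u / bw;
        /* Clamp: components brighter than the white point would overflow. */
        rgb[3 * i + 0] = (uint8_t)(r > 255u ? 255u : r);
        rgb[3 * i + 1] = (uint8_t)(g > 255u ? 255u : g);
        rgb[3 * i + 2] = (uint8_t)(b > 255u ? 255u : b);
    }
}
```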

Figure 2: The demo application in action.

3.2 Anti-aliasing

The discrete nature of a computer screen leads to jagged edges (aliasing) when objects are drawn to it, causing a disturbing transition from the background to the rendered object. This is a common problem in computer graphics that has to be dealt with if decent image quality is desired. There are many solutions to this problem, one of which is supported natively by OpenGL and by recent graphics hardware. This method is based on multisampling and requires the objects to be rendered in the correct order to work. There are also other methods of anti-aliasing; for instance, it is possible to use edge-detection algorithms in a post-processing step to find edges and then remove the jaggedness along them.

Due to the way ARToolKit renders the video feed by default, a way to incorporate the native multisample anti-aliasing described above was not found. However, a very simple anti-aliasing filter based on alpha blending was applied so that the edges of the rendered photo blend better with the background. This method simply makes the rendered image slightly transparent at the edges. The method is not in any way good, and will only work for rectangle-shaped objects, but for the purpose of this project it does the job and slightly improves the rendering quality. The results of the anti-aliasing are displayed in figure 3 (see also the sketch below).

Figure 3: Aliasing between the background image and the rendered image is evident in the left figure. On the right, the result of an attempt to remove these artifacts.
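The sketch below illustrates the general idea of the alpha-blending trick, assuming the quad carries texture coordinates in [0, 1]; the function and the band-width parameter are hypothetical, not the demo's actual code, which evaluates this per fragment on the GPU.

```c
/* Compute an alpha value that fades from 1 (interior) to 0 at the quad's
 * edges, given texture coordinates s, t in [0, 1]. `band` is the width of
 * the fading border in texture units. Only works for rectangular quads,
 * matching the limitation noted above. */
static float edge_alpha(float s, float t, float band)
{
    float ds = s < 1.0f - s ? s : 1.0f - s; /* distance to nearest vertical edge   */
    float dt = t < 1.0f - t ? t : 1.0f - t; /* distance to nearest horizontal edge */
    float d = ds < dt ? ds : dt;            /* distance to nearest edge overall    */
    return d >= band ? 1.0f : d / band;     /* linear fade inside the band         */
}
```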

4 Results

Augmented reality is a concept with many potential uses in many different areas. The marker-based method utilized by ARToolKit is a simple and easy to grasp way of achieving nice effects and interactivity. However, the method also has many drawbacks. For one thing, the pattern must be positioned so that all of it is visible in the video frame. If even the slightest part of it is covered or creased, for just a few frames, the detection will fail. It would of course be possible to use additional algorithms to approximate the pattern's location and orientation, but this is not supported by ARToolKit. Also, the observation angle of a pattern is naturally limited to the hemisphere above it. The image quality produced by the video capture device, along with the lighting conditions, is yet another factor that needs to be taken into account. More recent research in the area has produced new and more involved methods for augmented reality. One such method is Parallel Tracking and Mapping (PTAM), which needs no markers or precomputed maps [5] and therefore offers more flexibility.

ARToolKit is a quite dated and poorly documented piece of software. For making a simple demo application it does the job, but in order to do more advanced rendering a more powerful library is needed. In fact, even during the writing of the simple demo application in this project, its limitations were inhibiting. If one wishes to get involved in the underlying algorithms, digging around in the source code is pretty much the only option. On the other hand, there is a production-grade version of the library that is supposedly better supported and more stable.

Many more techniques than the ones experimented with during this project could be applied to improve the appearance of the final image, although the library is rather limiting when it comes to accessing more modern features of the OpenGL pipeline. One idea for further improvement is to approximate the noise that is present in the video frame and then apply that noise to the rendered image as well. The white balance calibration could also be done automatically by using a known white region in the video frame instead of a manually selected white point.

Another big issue that should be addressed in further work is the jittery appearance of the rendered image. This is caused by approximation errors that differ from one frame to the next. Very often, the difference in the computations is big enough that the position of the rendered object changes even though the camera is stationary. One possible solution would be to keep previous computations and interpolate between them to get smoother movement, as sketched below.
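As a sketch of that interpolation idea, a plain exponential moving average over the translation part of the estimated transform could look as follows. This is entirely an assumption for illustration, not something ARToolKit provides, and a full solution would also need to smooth the rotation (e.g. with quaternion interpolation).

```c
/* Exponentially smooth the translation part of the estimated marker
 * transform: smoothed = (1 - k) * smoothed + k * fresh, applied each
 * frame. Smaller k gives smoother but laggier motion. */
static void smooth_position(double smoothed[3], const double fresh[3], double k)
{
    for (int i = 0; i < 3; ++i)
        smoothed[i] = (1.0 - k) * smoothed[i] + k * fresh[i];
}
```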

Bibliography

[1] HIT Lab. ARToolKit Home Page. [online] Available at: http://www.hitl.washington.edu/artoolkit/ [Accessed 30 November 2011]

[2] nyatla.jp. FrontPage.en - NyARToolKit. [online] Available at: http://nyatla.jp/nyartoolkit/wiki/index.php?frontpage.en [Accessed 30 November 2011]

[3] HIT Lab. ARToolKit Documentation (Computer Vision Algorithm). [online] Available at: http://www.hitl.washington.edu/artoolkit/documentation/vision.htm [Accessed 30 November 2011]

[4] Forsyth, D.A. and Ponce, J., 2003. Computer Vision: A Modern Approach. Upper Saddle River, NJ: Pearson Education.

[5] Klein, G. Parallel Tracking and Mapping for Small AR Workspaces (PTAM). [online] Available at: http://www.robots.ox.ac.uk/~gk/ptam/ [Accessed 30 November 2011]