
The Video to Panoramic Image Converter

Mfundo Bill

Thesis presented in fulfilment of the requirements for the degree of Bachelor of Science Honours (Computer Science) at the University of the Western Cape

Supervisor: Mehrdad Ghaziasgar
Co-supervisor: Reg Dodds

This version: June 12, 2016


Declaration

I, Mfundo Bill, declare that this thesis, The Video to Panoramic Image Converter, is my own work, that it has not been submitted before for any degree or assessment at any other university, and that all the sources I have used or quoted have been indicated and acknowledged by means of complete references.

Signature:
Date:
Mfundo Bill


Abstract

This document contains the user requirements for the Video to 3D panorama system, a project chosen as part of the Honours Program in Computer Science at the University of the Western Cape. These user requirements were established according to Mr Mehrdad Ghaziasgar, the supervisor of this project. The structure of this document was adapted from the one authored by R.P.J. Coset, M.C.G. Leijten, T.J.C. Muller and J.C.J. Mens for the SPINGRID environment system at Eindhoven University of Technology.


Key words

Computer Vision
Video Frames
Panorama
Panoramic Stitching
3D Modelling
Video Summarisation


Acknowledgment

This thesis is a compilation of the efforts of many people who helped me through the years. I would first like to thank my supervisor, Mr Mehrdad Ghaziasgar, for encouraging me during my study. Without our weekly meetings, this work would not have been possible. I would also like to extend a very special thanks to the post-graduate assistants. Without their help I would certainly not be where I am today.


Contents

Declaration
Abstract
Key words
Acknowledgment

1. User Requirement Document
   1.1 Introduction
   Scope
   Definition
   Overview

2. General Description
   2.1 Main goal
   2.2 General capabilities
   2.3 General constraints
   2.4 User characteristics
       Users
   2.5 Environment description
   2.6 Assumption and dependencies

3. Use Cases
   3.1 Introduction
   End User Case
       Use case 1
       Use case 2

4. Requirements
   4.1 Introduction
   Functional Requirements
   Non-Functional Requirements
   4.2 Use Case Diagram

5. User Interface Specification
   5.1 Introduction
   5.2 Description
   5.3 User Interface Behavior

6. High Level Design
   6.1 Introduction
   6.2 Technical Solution Break Down in Subsystems
   6.3 Subsystems descriptions
       Input
       Processing
       Output

7. Low Level Design
   7.1 Introduction
   7.2 Programming language and Libraries
   7.3 Modules Description
       Input
       Processing
       Gaussian smoothing and Differencing
       SIFT
       RANSAC
       Stitching the images
   Object Oriented Design

Bibliography

Chapter 1: User Requirement Document

1.1 Introduction

The purpose of this document is to specify and describe the requirements of the Automatic Video to 3D panorama system.

Scope

The software implements a system that takes in a video, processes it and generates a 3D panoramic image. Given a video, the system must hide all the complicated processing happening and produce the required result.

Definition

Panoramic: (of a view or picture) with a wide view surrounding the observer; sweeping.
Frame: One of many still images that compose a video.
Image Stitching: The process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Commonly performed through the use of computer software, most approaches to image stitching require nearly exact overlaps between images and identical exposures to produce seamless results (Mann and Picard, 1994).
3D: Three Dimensional.

Overview

Chapter 2 gives a general description of the Video to 3D panorama system. It describes the product, the capabilities, the characteristics of the user, and the environments it is going to operate in. Chapter 3 describes a set of use cases. Chapter 4 describes specific requirements for the Video to 3D panorama system and goes on to analyse them.

Chapter 2: General Description

2.1 Main goal

The Video to 3D panorama project aims to devise an efficient, accurate and user-friendly system that converts video into a 3D panoramic image. This can be used for applications like video summarisation and 3D modelling.

2.2 General capabilities

The Video to 3D panorama system is designed to convert video of any compression format. The system uses computer vision techniques to convert video to a panorama. The user must only supply the video file and set the time frame over which the panorama must be generated. The Video to 3D panorama system acts as a black box that the user feeds with video content, and it gives out a 3D panoramic image as its output.

2.3 General constraints

The Video to 3D panorama system's quality will be constrained by video noise, video blur, the lighting in the video and the general quality of the video itself.

2.4 User characteristics

Users

General user: The user that supplies the system with the video file. This user's role is to supply the system with the video that is to be converted and to set the time limit of how much of the video should be processed.

2.5 Environment description

The system is designed to work on any operating system; ultimately it must also work on the cloud. The system will consist of a user interface that the end user must interact with.

2.6 Assumption and dependencies

It is assumed that the input video is one continuous scene and has no breaks.

Chapter 3: Use Cases

3.1 Introduction

A use case is a piece of functionality in the system. Those pieces will return a value to a user, where a user does not have to be a human, but could also be a computer system (Coset et al., 2006).

End User Case

Use case 1: The user uploads a video.
Precondition: true
Postcondition: The video is being processed.
User: End User
Description: The user uploads a video to the system, sets the time interval for conversion and submits the video.

Use case 2: The system processes the video.
Precondition: Use case 1
Postcondition: The final image
User: System
Description: The system receives a video from the user, processes it and produces the resultant panoramic image.


Chapter 4: Requirements

4.1 Introduction

Below are the general user requirements of the system.

Functional Requirements

The system must convert a video into a three-dimensional panoramic image, and it must do this efficiently and accurately. The system must be automated as much as possible. The only part of the system that will not be automated is the one where the user uploads the video and sets the time limits.

Non-Functional Requirements

The system must have a user interface that is easy to use and hides all the complex processes happening in the background.

4.2 Use Case Diagram


Chapter 5: User Interface Specification

5.1 Introduction

This chapter describes the user interface of the video to panorama converter.

5.2 Description

The V2PIC system will be a very simple one in the eyes of the user. Since its goal is to convert a video into a panoramic image, the system is as simple as having one button to convert the specified video to a panorama.

5.3 User Interface Behavior

The user first interacts with the interface by providing the path to the video that is to be converted. This can be done either by typing the path into a field in the interface or by using the mouse to point to where the video is situated. The user then presses the convert button, which triggers the processing and requires the user to wait for some time, preferably a short period. The user can then view the resultant panorama.


Chapter 6: High Level Design

6.1 Introduction

This chapter explains the architecture of the system, identifying all the subsystems that form the complete system.

6.2 Technical Solution Break Down in Subsystems

The system has three main subsystems: the input, processing and output subsystems. This systematic breakdown attempts to give a non-technical explanation of how the system works per component.

6.3 Subsystems descriptions

This section describes each subsystem and explains the function of each.

Input

This is a simple component that accepts the path of the video as input and passes it to the processing component.

Processing

The processing component, on its own, consists of two components. The first component detects whether the video contains a panorama and, if so, picks the key frames that will be stitched together. The second subsystem stitches the frames picked by the detector subsystem.

Output

This subsystem is where the resultant panoramic image ends up. It helps the user view the panorama in an easily navigable way.


Chapter 7: Low Level Design

7.1 Introduction

This chapter describes and explains each module of the system from a developer's point of view. The description goes into technical detail about what each subsystem must do in order to accomplish its task.

7.2 Programming language and Libraries

The programming language used for the system is Python 3, together with the OpenCV and NumPy libraries, which are available for Python 3. OpenCV (Open Source Computer Vision Library) is used for certain components of the system, including reading the video and extracting each frame for analysis. NumPy is used for the computations involved in each process, as it is a powerful library for performing the matrix operations on which the system relies heavily.

7.3 Modules Description

Input

The first module of the system, the input component, is the most trivial one. It accepts the path of the video and validates that the provided path is correct, with the help of a regular expression, and that the video exists at the specified path. When all is done as required, the video is passed to the processing module.

Processing

The processing module is broken down into two subsystems. The first subsystem is responsible for selecting the required frames for the stitching process. The second is where the image stitching algorithms are applied. The picking of these frames is based on the frame rate of the video, because the video will be taken in a controlled manner. The stitching subsystem uses the Gaussian smoothing, SIFT and RANSAC algorithms, described below.
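To make the input and frame-selection steps concrete, the following sketch shows one possible way of doing this with OpenCV in Python 3. The function name extract_key_frames, the regular expression and the fixed step parameter are illustrative assumptions rather than part of the final system, which may select frames differently based on the frame rate.

import os
import re
import cv2  # OpenCV is used to open the video and read individual frames


def extract_key_frames(video_path, step=10):
    # Validate the path with a simple regular expression and check that the
    # file actually exists, as the input module is required to do.
    if not re.match(r'^.+\.(avi|mp4|mov|mkv)$', video_path, re.IGNORECASE):
        raise ValueError('Not a recognised video file: %s' % video_path)
    if not os.path.isfile(video_path):
        raise FileNotFoundError(video_path)

    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:                 # no more frames in the video
            break
        if index % step == 0:      # keep only every step-th frame
            frames.append(frame)
        index += 1
    capture.release()
    return frames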

Gaussian smoothing and Differencing

This algorithm is used for extracting features of interest. Gaussian smoothing works by taking a square matrix of some chosen size (the kernel size) whose elements are numbers drawn from a normal distribution. This matrix (the Gaussian kernel) is convolved with the image (provided that the image is represented by a matrix of 8-bit pixel values) to produce a new image with a smoother texture and a reduced amount of noise compared to the original. The technique is applied repeatedly to each resultant image, storing the result each time, and this is repeated for different scales of the image. Differencing the resultant images and finding the extrema gives key-points for features of interest.
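A minimal sketch of the smoothing-and-differencing idea, using OpenCV and NumPy, is given below: the image is blurred at two scales and the blurred versions are subtracted, so that the extrema of the difference image mark candidate key-points. The two sigma values are arbitrary illustrative choices.

import cv2
import numpy as np


def difference_of_gaussians(gray, sigma1=1.0, sigma2=1.6):
    # Work in floating point so the subtraction can go negative.
    gray = gray.astype(np.float32)
    # With a kernel size of (0, 0), OpenCV derives the kernel size from sigma.
    blur1 = cv2.GaussianBlur(gray, (0, 0), sigma1)
    blur2 = cv2.GaussianBlur(gray, (0, 0), sigma2)
    # Extrema of this difference image are candidate key-points.
    return blur2 - blur1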

SIFT

SIFT, which stands for Scale Invariant Feature Transform, helps to describe features from the key-points obtained from the Gaussian smoothing. It does this by selecting a fixed area of pixels surrounding the key-point. The output of this algorithm is a four-tuple (p, s, r, f), where p is the pixel co-ordinate, s is the scale, r is the orientation/direction and f is the feature descriptor (Ahi, 2016). Below are the steps of the SIFT descriptor computation:

1. Compute image gradients in the local 16x16 area around the key-point at the selected scale.
2. Create an array of orientation/direction histograms from the gradient orientations: 8 orientations x a 4x4 histogram array = 128 dimensions for our feature descriptor.

Below is a figure that illustrates the orientation calculation for the descriptor.

Figure 7.1: Gradient distribution around a key-point.

The feature descriptor f is viewpoint independent, hence it is the same across multiple images if the key-points indicate the same object in the image.
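OpenCV ships a SIFT implementation that produces precisely this kind of key-point and 128-dimensional descriptor. The short sketch below assumes an OpenCV build that exposes cv2.SIFT_create (older builds place it in the xfeatures2d module) and is only meant to illustrate the step described above.

import cv2


def detect_and_describe(gray_a, gray_b):
    # Detect SIFT key-points and compute their 128-dimensional descriptors
    # for two greyscale frames.
    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(gray_a, None)
    kp_b, desc_b = sift.detectAndCompute(gray_b, None)
    return (kp_a, desc_a), (kp_b, desc_b)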

RANSAC

The RANSAC algorithm ensures that the selected features are the ones that belong to the transformation from image A to image B. In this way RANSAC rules out the outliers, so that we have the correct features for an overlap of two images. It does this by creating a model (a line) that fits the maximum number of inliers among the given data points. A threshold for inliers is set, and the distance of each point is computed using the squared distance from the line to the point. If a point falls outside the threshold, the point is labelled an outlier (Fischler and Bolles, 1981). Below are the steps of the RANSAC algorithm:

1. Randomly sample the number of data points required to fit the model.
2. Compute the model parameters using the sample.
3. Score the fraction of inliers within a preset threshold of the model.
4. Repeat until the model is found.

Figure 7.2: RANSAC with inliers and outliers illustration.
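One common way to realise this step is to match the descriptors with a brute-force matcher and then let OpenCV's findHomography run RANSAC over the matched points. The sketch below follows that route; the ratio-test value of 0.75 and the reprojection threshold of 5.0 are illustrative choices, not values taken from the system.

import cv2
import numpy as np


def ransac_homography(kp_a, desc_a, kp_b, desc_b):
    # Brute-force matching of SIFT descriptors, keeping the two best
    # candidates per descriptor for Lowe's ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    raw_matches = matcher.knnMatch(desc_a, desc_b, k=2)

    good = []
    for pair in raw_matches:
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])
    if len(good) < 4:              # at least four correspondences are needed
        return None, good

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC fits the homography to the inliers and flags the outliers.
    H, inlier_mask = cv2.findHomography(pts_b, pts_a, cv2.RANSAC, 5.0)
    return H, inlier_mask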

Stitching the images

Using the RANSAC algorithm, given image A and image B, features extracted from A are compared with those from B. The comparison of these features determines which ones overlap (are inliers) and which ones do not (outliers). The features that overlap belong in our transformation model hypothesis. We repeat this process, with the number of repetitions chosen so that, with high probability, at least one sample is free of outliers. We then stitch the images together by matching the key-points (pixel co-ordinates) at which the features are situated, according to the feature descriptors. The last step is to stitch each selected frame onto the resultant panorama. This is done as an iterative process until all frames are exhausted. Below is a rough overall algorithm for the stitching.

Data: Video
Result: Panorama
prevFrame = Video.Capture();
while Video still playing do
    set frame rate;
    currFrame = frame from video capture;
    keyPointA = GaussianDiff(prevFrame);
    keyPointB = GaussianDiff(currFrame);
    descriptorA = SIFT(keyPointA);
    descriptorB = SIFT(keyPointB);
    matchedSet = Matcher(descriptorA, descriptorB);
    resultantImage = Stitcher(matchedSet);
    prevFrame = resultantImage;
end
Algorithm 1: Algorithm for the stitching process.

Figure 7.3: Key-point matching illustration.
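Putting the pieces together, Algorithm 1 could be realised in Python roughly as follows. The sketch reuses the illustrative helpers from earlier in this chapter (extract_key_frames, detect_and_describe and ransac_homography), warps each new frame into the panorama with the estimated homography, and deliberately omits blending and canvas cropping, so it is an outline rather than the final implementation.

import cv2


def build_panorama(video_path, step=10):
    frames = extract_key_frames(video_path, step)   # illustrative helper above
    panorama = frames[0]

    for frame in frames[1:]:
        gray_pano = cv2.cvtColor(panorama, cv2.COLOR_BGR2GRAY)
        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        (kp_a, desc_a), (kp_b, desc_b) = detect_and_describe(gray_pano, gray_frame)
        H, _ = ransac_homography(kp_a, desc_a, kp_b, desc_b)
        if H is None:              # too few matches; skip this frame
            continue

        # Warp the new frame into the panorama's coordinate frame and paste
        # the existing panorama over it (no blending in this sketch).
        height, width = panorama.shape[:2]
        canvas = cv2.warpPerspective(frame, H, (width + frame.shape[1], height))
        canvas[0:height, 0:width] = panorama
        panorama = canvas

    return panorama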

Object Oriented Design

Below is the object-oriented structure, thus far, of how all the modules of the system will work together.

Figure 7.4: Object Oriented Design.

Bibliography

Ahi, K. (2016). Modeling of terahertz images based on x-ray images: a novel approach for verification of terahertz images and identification of objects with fine details beyond terahertz resolution. Conference paper, page 3.

Coset, R. P. J., Leijten, M. C. G., Muller, T. J. C., and Mens, J. C. J. (2006). User requirements document, SPINGRID. Technical report, Technische Informatica, Eindhoven University of Technology.

Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381-395.

Mann, S. and Picard, R. W. (1994). Virtual bellows: constructing high-quality images from video. Technical report; in Proceedings of the IEEE First International Conference on Image Processing, Austin.
