Institutionen för systemteknik
Department of Electrical Engineering

Examensarbete (Master's thesis)

Image interpolation in firmware for 3D display

Thesis carried out in Electronics Systems at Linköping Institute of Technology
by
Martin Wahlstedt

LiTH-ISY-EX--07/4032--SE
Linköping 2007

Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden
Linköpings tekniska högskola, Linköpings universitet, 581 83 Linköping


Image interpolation in firmware for 3D display

Thesis carried out in Electronics Systems at Linköping Institute of Technology
by
Martin Wahlstedt

LiTH-ISY-EX--07/4032--SE

Handledare (supervisor): Thomas Ericson, Setred AB
Examinator (examiner): Kent Palmkvist, ISY, Linköpings universitet

Linköping, 9 November, 2007


Avdelning, Institution / Division, Department: Division of Electronics Systems, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden
Datum / Date: 9 November, 2007
Språk / Language: Engelska / English
Rapporttyp / Report category: Examensarbete
ISRN: LiTH-ISY-EX--07/4032--SE
URL för elektronisk version: urn:nbn:se:liu:diva
Titel / Title: Bildinterpolation i programmerbar hårdvara för 3D-visning / Image interpolation in firmware for 3D display
Författare / Author: Martin Wahlstedt
Sammanfattning / Abstract: see the Abstract below
Nyckelord / Keywords: interpolation, view interpolation, image processing, scanning slit, 3D, FPGA


Abstract

This thesis investigates possibilities to perform image interpolation on an FPGA instead of on a graphics card. The images will be used for 3D display on Setred AB's screen, and an implementation in firmware will hopefully give two major advantages over the existing rendering methods. First, an FPGA can handle large amounts of data and perform many calculations in parallel. Secondly, the amount of data to transfer increases drastically after the interpolation, and with this a higher bandwidth is required to transfer the data at high speed. By moving the interpolation as close to the projector as possible, the bandwidth requirements can be lowered. Both these points will hopefully be improved, giving a higher frame rate on the screen.

The thesis consists of three major parts. The first handles methods to increase the resolution of images; in particular, nearest neighbour, bilinear and bicubic interpolation are investigated. Bilinear interpolation was considered to give a good trade-off between image quality and calculation cost and was therefore implemented. The second part discusses how a number of perspectives can be interpolated from one or a few captured images and the corresponding depth or disparity maps. Two methods were tested and one was chosen for the final implementation. The last part of the thesis handles Multi Video, a method that can be used to slice the perspectives into the form that the Scanning Slit display needs in order to show them correctly.

The quality of the images scaled with bilinear interpolation is satisfactory if the scale factor is kept reasonably low. The perspectives interpolated in the second part show good quality with lots of details but suffer from some empty areas. Further improvement of this function is not strictly necessary but would increase the image quality further. An acceptable frame rate has been achieved, but further improvements of the speed can be made. The most important continuation of this thesis is to integrate the implemented parts with the existing firmware and with that enable a real test of the performance.


Acknowledgments

I would like to thank Setred AB for the opportunity to write this thesis, especially Joel de Vahl for endless questions, discussions and help with Ruby. Thomas Ericson has also been involved in a lot of discussions and has been of great help with proofreading. The project would not have been viable without their help. Doug Patterson has been of great help with discussions of the existing firmware and ideas for the integration. Finally, I would like to thank those near and dear to me for their constant support.


Contents

1 Introduction
  1.1 Setred
  1.2 Background
  1.3 Hardware target
  1.4 Method
  1.5 Limitations

2 Scanning Slit 3D displays
  2.1 Seeing in 3D
    2.1.1 Monocular depth cues
    2.1.2 Binocular depth cues
  2.2 Understanding the Scanning Slit display
  2.3 Rendering for the Scanning Slit display
    2.3.1 Multi Video
    2.3.2 The Generalised Rendering Method

3 Up sampling basics
  3.1 Known techniques
    3.1.1 Nearest neighbour interpolation
    3.1.2 Bilinear interpolation
    3.1.3 Bicubic interpolation
    3.1.4 Other interpolation techniques
    3.1.5 Comparisons between nearest neighbour, bilinear and bicubic interpolation

4 Implementation of bilinear interpolation
  4.1 Discussion
  4.2 Future work

5 View interpolation
  5.1 Image sequence creation
    5.1.1 Interpolation method 1
    5.1.2 Interpolation method 2
  5.2 Image quality improvement
    5.2.1 Quality improvements for method 1
    5.2.2 Quality improvements for method 2

6 Implementation of view interpolation
  6.1 Discussion
  6.2 Future Work

7 Implementation of Multi Video
  7.1 Discussion
  7.2 Future work

8 Conclusions and final discussion
  8.1 Performance
    8.1.1 Performance of the bilinear up scaler
    8.1.2 Performance of the view interpolation-Multi Video chain
  8.2 DVI load reduction
    8.2.1 DVI load reduction with the up scaler integrated
    8.2.2 DVI load reduction with view interpolation integrated
  8.3 Conclusions

Bibliography

A VHDL code example
  A.1 Block RAM instantiation
  A.2 View interpolation block

Chapter 1
Introduction

One of the first three dimensional projections seen by a big audience was R2D2's projection of Princess Leia in the first Star Wars movie in 1977. Visualizations in three dimensions have always been fascinating, and comics requiring red and green glasses were popular among young people for a long time. Even though a lot of research has been done in the field of 3D visualisation, no perfectly working 3D display or projector exists today.

This thesis is a part of Setred AB's development of a 3D display. Even though the idea of 3D visualization is old, no commercial 3D display or projector with good image quality, frame rate and depth exists on the market. Research in the area has been performed since the beginning of the 20th century and there exist different, more or less working, displays that do not require glasses or other head tracking gear. These displays do, however, all have weaknesses, because the problems that arise when trying to build one are many and the shortcuts few. All known displays today are based on stereopsis, the fact that the eyes are separated on the head and therefore capture different information. The major challenge is how to show a different image to the two eyes.

One of the big problems when it comes to image rendering is the huge amount of data that is needed and the short time available for rendering it to achieve real time behaviour. The rendering stage contains steps like capturing images, depth calculations, intermediate view interpolation and other operations. On Setred's 3D display, most parts of the rendering are done on the graphics card, but even though modern GPUs are powerful, the complete rendering process consists of massive calculations and data handling that take a long time to perform. If all rendering is done on the graphics card, a massive amount of data has to be transferred to the screen, which puts extremely high demands on the transfer channels. The purpose of this thesis is to move some of the operations to an FPGA located on a PCB near the projector. The implementation will be done using VHDL, the Very High Speed Integrated Circuit Hardware Description Language.

The thesis consists of three major parts, which will be treated separately. The first part investigates different methods of changing the resolution of images, and one method is finally implemented. This is done in chapters 3 and 4. The second

part investigates intermediate view interpolation of stereoscopic images; chapter 5 consists of theory on the subject and chapter 6 shows the solution that was implemented and how that was done. The last part treats a method that is used to modify the output images into a form that is suitable to show on the display. An introduction to this method is given in section 2.3 and the implementation is shown in chapter 7. Apart from this, chapter 2 gives a short introduction to 3D visualization and the Scanning Slit display. Image scaling and view interpolation can be used separately or together, depending on the application and/or other requirements.

Web sources that have been of great help during the thesis, and are not referred to in the text, are Ashenden [17] for VHDL, Oetiker et al. [18] for LaTeX and the Ruby Documentation [16] for Ruby.

1.1 Setred

Setred aims to become the leading provider of high-end 3D display solutions. The company's display technology is a result of joint research between the Massachusetts Institute of Technology and Cambridge University. Setred aims to be the leader in enabling more intuitive and realistic interpretation of three dimensional information.

The company's first product is a 3D display that acts as a digital hologram. It works with any software application that uses the computer's 3D graphics card, for example CAD applications and games. The display has a combination of properties that break the current compromises:

- True 3D, allowing the observer to look around objects by moving the head sideways.
- No restriction in head movement or number of viewers.
- Full colour and full resolution.
- Possible to make flat panel.
- No headgear or head tracking equipment.

There is currently a 20 inch colour prototype of the display, shown in Figure 1.1.

Figure 1.1. Setred's 3D display.

1.2 Background

The reason why Setred is interested in this project is that they think that these components can increase the performance and flexibility of the display. If real time behaviour can be achieved, together with independence of the input resolution, a new group of customers can be addressed. One example is ROVs, Remotely Operated Vehicles, where two cameras placed on a vehicle could display a three dimensional projection of the environment. One big advantage of performing

operations in programmable hardware is that a lot of calculations can be performed in parallel. Modern FPGAs have become very powerful and fast and can therefore handle large amounts of data and a lot of calculations. One of the desired consequences of this implementation is that the bandwidth explosion that arises after the views have been interpolated is moved closer to the projector. This decreases the traffic on the connection channels and therefore increases the possibility of achieving a higher frame rate.

The PCB containing the FPGA is connected to a PC through DVI cables, at most eight in number. Each DVI cable has three channels, each with a maximum transfer speed of about pixels per second [21]. Modern graphics cards often have one or two DVI outputs, which means that many graphics cards or many synchronized PCs have to be used to enable use of all eight inputs. Decreasing the amount of data to be sent over the DVIs also decreases the requirements on the PC(s) running the screen.

The projector, which together with a diffuser creates the screen component of Setred's 3D display, runs at a fixed resolution. This means that to show an image on the display, the image must be of this specific resolution. This is not very flexible since different applications often create outputs with different resolutions. The use of smaller images also decreases the amount of data to be rendered by the GPU and sent to the display. To overcome this limitation, a resolution changer is implemented as near the projector as possible.

The major benefits of interpolating intermediate views on the FPGA instead of on the GPU are the same as mentioned above. It is quite obvious that the load on the connection channels between the display and the computer is reduced drastically if the main data to be sent consists of a few images and depth maps instead of a complete set of output frames; the number of output frames is usually 16 or more. It is also advantageous to perform the heavy arithmetic view interpolations on the FPGA due to the possibility of parallel calculations.

The final step, in the signal chain and in this thesis, modifies the images into

a form that is needed for the display to show correct 3D illusions. This operation has to be performed after the views are interpolated and hence, the function must be implemented on the FPGA as well. The simplicity of reading from and writing to the internal Block RAMs, which can be used as internal buffers, is a great benefit for this operation, and a firmware implementation is therefore not only necessary but also suitable.

1.3 Hardware target

The target FPGA is a Virtex-4, XC4VLX25, from Xilinx. This FPGA already exists on the current PCB and can therefore not be changed in this project. The XC4VLX25 is equipped with 72 Block RAMs, each with a maximum storage capacity of 18 kilobit, configurable in a number of ways as described in the Virtex-4 User Guide [20]. It is also equipped with 48 XtremeDSP slices that can be used as 18 x 18 bit multipliers, multiply-accumulators or multiply-adder blocks. Two external RAMs, which can be read from and written to through in and out ports on the FPGA, exist on the PCB. The connection between the PC and the PCB is the DVI(s) mentioned above.

1.4 Method

Before the implementation was started, a study of existing methods and previous work in the area was done. Some time was also spent on trying to understand the Scanning Slit display. The implementation was done using Xilinx WebPack and the simulations were performed in Mentor Graphics ModelSim. To make the simulation results easy to interpret, small scripts were written in Ruby that parse images to text files and vice versa. These text files were read and written by the testbench; a sketch of such a helper script is shown at the end of this chapter.

1.5 Limitations

The functions implemented in this thesis are meant to be integrated with the existing firmware on Setred's 3D display. This integration will not be performed and explained here since the time needed does not exist within the limits of this project. A short discussion of the integration of view interpolation and Multi Video can be found in section 7.2. All operations will be performed on greyscale images. Colour images are interpolated in the same way for each of the RGB channels.
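The Ruby helper scripts mentioned in section 1.4 are not reproduced in this report. The following is only a rough sketch of what such a script could look like, assuming uncompressed ASCII PGM (P2) greyscale images; the file format, function names and the exact text layout are illustrative and not taken from the actual project.

```ruby
# Hypothetical sketch of a Ruby helper in the spirit of section 1.4 (not the
# project's actual script): converts an ASCII PGM image to a text file with one
# pixel value per line, which a VHDL testbench can read with std.textio, and back.
def pgm_to_txt(pgm_path, txt_path)
  tokens = File.read(pgm_path).split              # "P2", width, height, maxval, pixels...
  raise "expected ASCII PGM (P2)" unless tokens.shift == "P2"
  width, height, _maxval = tokens.shift(3).map(&:to_i)
  File.open(txt_path, "w") do |f|
    f.puts "#{width} #{height}"
    tokens.first(width * height).each { |p| f.puts p }
  end
end

def txt_to_pgm(txt_path, pgm_path)
  lines = File.readlines(txt_path).map(&:strip)
  width, height = lines.shift.split.map(&:to_i)
  File.open(pgm_path, "w") do |f|
    f.puts "P2", "#{width} #{height}", "255"      # greyscale, 8 bits per pixel
    f.puts lines.first(width * height)
  end
end
```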

Chapter 2
Scanning Slit 3D displays

Three dimensional projections in mid-air, as in Star Wars, have long been a fascinating dream. Although it is possible in theory to create such an effect, the practical challenges are currently too big to make an attractive commercial product, and unless major breakthroughs in the area are made, other ways to display three dimensional objects have to be used [1]. One system whose basic principles have been known for decades, and which is currently being developed by Setred, is the Scanning Slit 3D display, presented and discussed in [1].

This chapter serves as background information, intended to make the reader understand how Setred's 3D display works and to motivate some of the work done in this thesis, especially chapter 7. Most information is taken from Christian Møller's thesis [1], which led to the founding of Setred. No references will be given unless the information is from another source. First, the reader needs a basic knowledge of how the human eyes and brain work to create three dimensional images; a short explanation of this is given in section 2.1. A short explanation of the Scanning Slit 3D display is given in section 2.2.

2.1 Seeing in 3D

The human eye is a complex creation and details of how the eye is built and captures information will not be given here. The interested reader can find this in any book on anatomy or human vision, for example [8] or [12]. One detail that is of importance for this thesis is that the sample time for an eye capturing an image is finite. This means that the images we see actually are an integral over time, like an image captured by a camera. The sample time is a function of photon density, or more simply the strength of the luminance, and is therefore not constant.

Another interesting field for this thesis is how humans perceive depth, which relies on an advanced feedback system between the eyes and the brain. The huge amount of information that is processed by the brain for this purpose is said to comprise depth cues [12]. There are many depth cues, some more important than others,

and they are often divided into three categories: extraretinal cues, monocular cues and binocular cues. The two latter are described below.

2.1.1 Monocular depth cues

The monocular cues treat information received from one single eye and processed by the brain to give a perception of depth. The most important of these cues are:

- Linear perspective. Objects that appear over a large depth will be smaller the further away a point is located. A good example is a straight railroad that turns into a single point as the road reaches the horizon.
- Interposition. Interposition, or occlusion, is intuitively one of the most important depth cues. When an object is located in front of another object, parts of the object behind will be hidden from the observer.
- Retinal image size. By comparing the relative size of objects of known size, their position in depth can be estimated. For example, if a man and a house have the same height in an image, the man is probably placed in front of the house.

There are several other monocular depth cues that are of importance and the interested reader can read more in [12]. Examples of the depth cues listed above can be seen in Figure 2.1.

Figure 2.1. Examples of three monocular depth cues: (a) linear perspective, (b) interposition/occlusion, (c) retinal image size. Image courtesy of Christian Møller [1].

2.1.2 Binocular depth cues

The depth cues that are of great interest for this thesis are the binocular depth cues, which exist due to the fact that the human eyes are separated on the head. Because of this, the two eyes capture two images that differ slightly from each other, which can be seen as a binocular disparity [12]. This disparity creates an effect known as stereopsis and forms the fundamental principle of all 3D displays that exist today. The idea of showing different perspectives to the two eyes and thereby inducing the stereoscopic effect is old; the first stereoscope was built by Wheatstone in 1838 [10]. The stereoscopic effect can be achieved in many ways,

for example by using the well known red and green glasses or by simply separating the views from the two eyes so that they see different images. Figure 2.2 shows an old stereoscope.

Figure 2.2. Stereoscope.

With this as a background, an explanation of the Scanning Slit 3D display can be given.

2.2 Understanding the Scanning Slit display

The goal of the Scanning Slit 3D display is to create a stereoscopic effect, as described in section 2.1.2. This is achieved by placing a scanning slit device, a shutter, with thin vertical slits in front of a display. The shutter acts like a filter, allowing only a limited region of the screen to reach the observer's eyes. As long as the slit is sufficiently narrow, the areas on the screen that can be seen by the two eyes are separated and therefore, the fundamental requirements for stereoscopy are fulfilled. Figure 2.3 shows the principle of the display viewed from above.

Figure 2.3. Principle of the Scanning Slit display.

Setred's 3D screen has a number of slits open at a time, creating repeated zones on the display. This is done to increase the effective bandwidth. Details of this are not important for the thesis and a motivation will therefore not be given; the interested reader can find a detailed description in Møller [1]. The open slits are evenly spaced with a distance, d_open_slits, given by equation 2.1.

\[
d_{\text{open slits}} = \frac{\text{screen width}}{\text{total number of slits}} \cdot \frac{\text{total number of slits}}{5} = \frac{\text{screen width}}{5} \tag{2.1}
\]

With screen width = 40 cm and 5 open slits, which is a possible setup on the current Setred display, d_open_slits = 40/5 = 8 cm, which means that the risk for an observer to see through two open slits is small.

One important behaviour of the shutter is that it should be invisible to the observer. This is achieved by switching the open slit(s) at a very high rate, synchronously with changing the image on the screen. Further discussions of this can be found in section 2.3.

So far, a short but straightforward explanation of the Scanning Slit display has been given, but nothing has been said about the images to show on the display and how to render them. This will be explained in the following section.

2.3 Rendering for the Scanning Slit display

3D computer graphics rendering is a huge field and this chapter will only give a brief explanation of the theories and concepts needed to understand the work done in this thesis. First, a couple of definitions are needed. The cone through which a camera captures light is referred to as the viewing frustum. The frustum together with the front and back clipping planes defines the viewing volume, which is the volume seen by the cameras. See Figure 2.4 for an illustration.

Figure 2.4. Illustration of viewing volume with frustum, back and front clipping planes.

Stereoscopic systems like the green and red glasses have a great advantage when it comes to displaying images, since they can show complete views or perspectives to each eye all the time. For the Scanning Slit display this is not possible and more advanced rendering methods have to be used, since a viewer looks through an ensemble of slices of several 3D perspectives. When rendering images for the Scanning Slit display, one of the challenges is to ensure that the slices visible through each slit add up to form a decent 3D perspective. This can be done by slicing the perspectives with a method known as Multi Video, described in the following section.

2.3.1 Multi Video

Multi Video was first developed to enable multiple images to be shown on a display at the same time. One possible use of that would be that several persons can watch different TV programs on the same display at the same time. Multi Video can also be used for 3D purposes by using several cameras and assigning each of them a Multi Video channel. In this case, a Multi Video channel is simply an image captured from one camera. It is quite clear that this method is useful for slicing the perspectives in order to synchronize the slices with the slits. With a good slicing method, the stereoscopic effect can be accomplished as described in section 2.2. A US patent exists regarding the use of Multi Video for 3D display, in which the number of Multi Video channels and output frames are the same [11].

Figure 2.5. Illustration of a Multi Video setup: (a) four Multi Video channels and one slit, (b) one Multi Video channel and five open slits.

Figure 2.5(a) shows the frustums for the center open slit of a four channel Multi Video setup. Four channels is a very low number for 3D purposes but is used here for illustration. Tests have shown that increasing the number of Multi Video channels enhances the output quality [9]. Each Multi Video channel is sliced and slices from different channels are merged to form an output frame. Figure 2.5(b) shows one single Multi Video channel. The grey lines show the frustum and the black lines the parts of the channel that will be used for the current output frame. Notice that not all channels have to be seen

through each slit, as in the leftmost or bottom slit in Figure 2.5(b). Following the same idea for the other Multi Video channels, one complete output frame can be built. The next output frame is merged in the same way but with the slits shifted one step.

Two issues with Multi Video are blind zones and tearing. Blind zones can occur if the Multi Video frames are not correctly rendered, for example if the perspectives are separated too much as in Figure 2.6(a). This is especially a problem when the number of Multi Video channels is low. If this problem is taken into consideration during the perspective rendering process, it is easy to overcome. The other issue is, as mentioned above, tearing. This effect occurs when the viewer is positioned between two Multi Video channels and thus sees parts of two perspectives, as in Figure 2.6(b). This problem is not very easy to solve but one solution is quite intuitive: increasing the number of Multi Video channels leads to a shorter distance between the cameras and thus an increased possibility that the observer is positioned on, or near, an optimal position. An optimal position is where a virtual, or physical, camera is placed.

Figure 2.6. Two possible problems with Multi Video: (a) blind zones due to incorrect perspective rendering, (b) tearing due to the observer's position relative to the original perspectives.

Calculating Multi Video frames

To calculate the Multi Video frames a number of parameters have to be taken into account. A couple of these are the distance between the cameras and the shutter, the distance between the cameras, the number of cameras and the width of the shutter slits. This can be understood by looking at Figure 2.5(a). Details of the calculations will not be given here since they will not be performed in this thesis; more details can be found in chapter 7. The interested reader can find more information about Multi Video in Haavik [9].
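As a concrete, heavily simplified illustration of the slicing idea, the Ruby sketch below (Ruby being the scripting language used elsewhere in this project) builds one row of one output frame from a set of Multi Video channel rows. Equally wide slices are assumed, and the slice-to-channel mapping is a placeholder; the real mapping follows from the frustum geometry described above and from the setup parameters discussed in chapter 7.

```ruby
# Simplified sketch of Multi Video slicing for one image row. The slice width
# follows equation 7.1 (chapter 7); the channel chosen for each slice is a
# purely illustrative placeholder for the real, geometry-based mapping.
def multi_video_row(channel_rows, num_zones, frame_index)
  num_channels = channel_rows.length
  width        = channel_rows.first.length
  slice_width  = [width / (num_zones * num_channels), 1].max
  Array.new(width) do |x|
    slice   = x / slice_width                       # which slice this pixel belongs to
    channel = (slice + frame_index) % num_channels  # next frame: shift the mapping one step
    channel_rows[channel][x]
  end
end
```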

2.3.2 The Generalised Rendering Method

A different approach to the rendering problem has been developed by Møller during his PhD, and is patented. This approach is called the Generalised Rendering Method. The major difference from the methods discussed in detail in this thesis is that instead of translating the camera along an axis and calculating perspectives, a virtual camera is placed in the open slit and the image on the film plane is warped. The Generalised Rendering Method is rather demanding to perform and will therefore not be considered further in this thesis. An implementation in firmware would be interesting to investigate, but no time for this exists within this project. The interested reader can find a detailed explanation of the Generalised Rendering Method in Møller [1].


Chapter 3
Up sampling basics

The purpose of all techniques used for image up sampling is the same: to create information that does not exist. The methods for doing this do however vary, in computation time, demand on resources and output quality. Unfortunately, there are no magic techniques, and a trade-off between quality and computation time and/or resource demand basically always has to be considered. It is quite clear that the quality of an up sampled image can never be better than the original, but with the best techniques and with small scale factors the difference in quality between the original and the up sampled image can be reasonably small. Unfortunately, the result always suffers from quality losses when the scale factor increases.

3.1 Known techniques

Various techniques for up and down sampling of images and image sequences are used today and they are all based on mathematical theory from numerical analysis. The most common ones are listed below. All methods have advantages and disadvantages, and to decide which one to use a number of parameters have to be taken into account, for example the maximum scale factor, the available computation resources and the allowed computation time.

3.1.1 Nearest neighbour interpolation

Nearest neighbour interpolation is the most basic of the techniques discussed here. As the name hints, the pixel nearest the sample position is simply copied [4]. This will create an image that looks jagged, but at low computational cost since no arithmetic operations need to be performed. See Figure 3.1 for an example. The crosses in Figure 3.1(a) indicate the new sample positions relative to the original image; Figure 3.1(b) shows the up sampled image.

Figure 3.1. Example of nearest neighbour interpolation: (a) original image with sample positions, (b) up sampled image.

3.1.2 Bilinear interpolation

Bilinear interpolation considers the four pixels surrounding the interpolation point. The interpolated pixel is then calculated as a weighted mean value of these four pixels [4]. The result is a much smoother looking image than after nearest neighbour interpolation. The disadvantages are the higher computation cost and that the image tends to become a little blurry, especially for big scale factors. See Figure 3.2 for the calculation principle.

Figure 3.2. Basic principle of bilinear interpolation.

Assume that the distance between two adjacent pixels is one (arbitrary length unit), both vertically and horizontally. The interpolated pixel, p, is then calculated as

\[
p = (1 - x_v)\big(A(1 - x_h) + Bx_h\big) + x_v\big(C(1 - x_h) + Dx_h\big)
  = A + (B - A)x_h + (C - A)x_v + (D - C + A - B)x_h x_v \tag{3.1}
\]
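To make the calculation concrete, the following Ruby sketch scales a greyscale image (a two-dimensional array of pixel values) with bilinear interpolation, applying equation 3.1 in floating point. The choice of sample positions (mapping image corners to corners) is one of several possible conventions and differs slightly from the fixed-point positions used in chapter 4.

```ruby
# Sketch of bilinear up-sampling of a greyscale image, directly following
# equation 3.1. A, B, C, D are the four pixels surrounding the sample position;
# x_h and x_v are the horizontal and vertical fractional distances.
def bilinear_scale(img, out_h, out_w)
  in_h, in_w = img.length, img.first.length
  Array.new(out_h) do |i|
    Array.new(out_w) do |j|
      y = i * (in_h - 1).to_f / (out_h - 1)          # sample position, input coordinates
      x = j * (in_w - 1).to_f / (out_w - 1)
      r,  c  = y.floor, x.floor
      r2, c2 = [r + 1, in_h - 1].min, [c + 1, in_w - 1].min
      x_v, x_h = y - r, x - c
      a, b, cc, d = img[r][c], img[r][c2], img[r2][c], img[r2][c2]
      ((1 - x_v) * (a * (1 - x_h) + b * x_h) +
             x_v * (cc * (1 - x_h) + d * x_h)).round  # equation 3.1
    end
  end
end
```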

3.1.3 Bicubic interpolation

Bicubic interpolation is based on the same idea as the bilinear but considers the 16 pixels surrounding the interpolation point. This gives a sharper result compared to bilinear interpolation, at the cost of longer computation time and/or higher resource demand. Bicubic interpolation is standard in many image editing programs, printer drivers and in-camera interpolations [15]. The calculation of the interpolated pixel is done in the same way as for bilinear interpolation but is more extensive and will therefore not be shown here. The reader can get an idea of how the calculations are performed by looking at Figure 3.3.

Figure 3.3. Basic principle of bicubic interpolation.

3.1.4 Other interpolation techniques

There are many other ways to interpolate values. It is possible to consider more surrounding pixels or to use spline, sinc or polynomial functions. These techniques do however require long computation times or lots of resources, and an implementation would suffer from non-real time behaviour or too much hardware allocation. These higher order interpolation techniques will therefore not be considered further in this thesis.

3.1.5 Comparisons between nearest neighbour, bilinear and bicubic interpolation

In order to decide which method to use, a number of considerations have to be made, as mentioned above. For a software implementation, the computation time is critical. For a hardware implementation, computation time and resource allocation often go hand in hand due to the possibility of parallelization. Another parameter to consider is the image characteristics; straight lines may for example

often look better if they are scaled using nearest neighbour instead of a higher order interpolation function. Finally, the maximum scale factor is of great interest. For small scale factors, all of the methods may look quite good.

To get an idea of the differences in image quality between nearest neighbour, bilinear and bicubic interpolation, two tests were made using Adobe Photoshop. The inputs were uncompressed images which were scaled up 800 percent using these three methods. The result can be seen in Figures 3.4 and 3.5.

Figure 3.4. Comparison between nearest neighbour, bilinear and bicubic interpolation, photographic image.

Figure 3.5. Comparison between nearest neighbour, bilinear and bicubic interpolation, synthetic image.

As expected, nearest neighbour interpolation gives a sharp image with "visible" pixels which makes the image look a bit jagged. More interesting is the small difference between bicubic and bilinear interpolation in the photographic image; the one scaled through bicubic interpolation shows some more details but the difference is not significant for this scale factor. The synthetic images do however show a bigger difference, as the edges are sharper in the image modified with bicubic interpolation.

One thing to remember is that multiple interpolations of the same object create a final result that might differ from one where the interpolation is done in one step. This is quite easy to understand when considering the mathematics behind the interpolation. Figure 3.6 shows the same content as Figure 3.5 but with the

scaling done in five equal steps instead of one.

Figure 3.6. Comparison between nearest neighbour, bilinear and bicubic interpolation, synthetic image, with the scaling done in five equal steps.

The difference is clearly visible, and most obvious with nearest neighbour and bilinear interpolation. This is not very surprising since these functions only consider pixels in a near surrounding of the interpolation point.


Chapter 4
Implementation of bilinear interpolation

After the study of the known techniques used for image scaling today, bilinear interpolation was chosen for implementation. Even though bicubic interpolation has been implemented on FPGAs before [2], the small improvement in quality is not sufficient to motivate the massive increase in computation cost.

The implementation of bilinear interpolation is quite straightforward; four values have to be multiplied by a weight and then added. The natural way of doing this is to keep the weights, c_i, in the range [0, 1] such that their sum is 1, so that adding the products gives the interpolated value. To accomplish this, a fixed point representation of the fractions has to be used. One solution is to use constants in the range [0, 8], so that each pair of weights sums to 8, which gives a sum of products from which the interpolated value is obtained by a division by 64. This division is easily done by six right shifts, or in this case by reading the eight most significant bits of the 14 bit sum. This three bit fixed point representation limits the accuracy to 1/8, which can be considered enough to obtain a good result.

The distance between the sample points in the original image is calculated as

\[
\text{dist}_{\text{hor}} = \frac{\text{input image width} \cdot \text{fixed point scale factor}}{\text{output image width}} \tag{4.1}
\]

\[
\text{dist}_{\text{vert}} = \frac{\text{input image height} \cdot \text{fixed point scale factor}}{\text{output image height}} \tag{4.2}
\]

respectively, where dist_hor is the horizontal sampling distance, dist_vert the vertical, and fixed point scale factor = 8 as motivated above. If, for example, an image is scaled up by a factor 8/5 = 1.6 in both directions, the horizontal distance dist_hor is 5 and the vertical distance dist_vert is also 5. Figure 4.1 shows an example with these values, where P_{i,j} are the pixels in the original image and X_{i,j} the interpolation points. Note that if the proportions of the image are to be preserved, dist_hor = dist_vert and only one calculation has to be performed.

With this fixed point representation of the coefficients, equation 3.1 needs to be modified. This is done in equation 4.3.

\[
p = (8 - x_v)\big(A(8 - x_h) + Bx_h\big) + x_v\big(C(8 - x_h) + Dx_h\big)
  = 64A + 8(B - A)x_h + 8(C - A)x_v + (D - C + A - B)x_h x_v \tag{4.3}
\]

Figure 4.1. Example of bilinear interpolation.

Figure 4.1 together with equation 4.3 gives the interpolated output, X, partly shown as a matrix in equation 4.4.

\[
X = \begin{pmatrix}
P_{1,1} & \dfrac{3P_{1,1}+5P_{1,2}}{8} & \dfrac{6P_{1,2}+2P_{1,3}}{8} & \cdots \\[1.5ex]
\dfrac{3P_{1,1}+5P_{2,1}}{8} & \dfrac{(9P_{1,1}+15P_{1,2})+(15P_{2,1}+25P_{2,2})}{64} & \dfrac{(18P_{1,2}+6P_{1,3})+(30P_{2,2}+10P_{2,3})}{64} & \cdots \\[1.5ex]
\dfrac{6P_{2,1}+2P_{3,1}}{8} & \dfrac{(18P_{2,1}+30P_{2,2})+(6P_{3,1}+10P_{3,2})}{64} & \dfrac{(36P_{2,2}+12P_{2,3})+(12P_{3,2}+4P_{3,3})}{64} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix} \tag{4.4}
\]

One issue for all up scaling functions is that the amount of data is increased after the scaling has been performed. This means that the scaling function must be able to either control the input data flow or have a good output controller. The latter presupposes that the operations after the up scaling can be performed at the higher data rate. In this implementation, it is assumed that the input data flow can be controlled, for example by reading the input images from an external RAM. See Figure 4.2 for a block diagram of the implementation setup.

Figure 4.2. Block diagram of the bilinear interpolation function.
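The fixed-point arithmetic of equations 4.1-4.3 can be checked with a few lines of Ruby. This is only a bit-accuracy sketch of the arithmetic performed by the hardware, not a model of the actual VHDL blocks.

```ruby
# Sketch of the fixed-point arithmetic in equations 4.1-4.3 for one output pixel.
FIXED_POINT_SCALE = 8                               # 1/8 pixel accuracy, see above

def sample_distance(in_size, out_size)
  (in_size * FIXED_POINT_SCALE) / out_size          # equations 4.1 and 4.2 (truncated)
end

def bilinear_fixed(a, b, c, d, x_h, x_v)            # x_h, x_v are integers in 0..8
  sum = (8 - x_v) * (a * (8 - x_h) + b * x_h) +
        x_v       * (c * (8 - x_h) + d * x_h)       # equation 4.3, fits in 14 bits
  sum >> 6                                          # the 8 MSBs of the sum = division by 64
end

# With dist = 5 as in Figure 4.1, the second sample of the first output row is
# bilinear_fixed(p11, p12, p11, p12, 5, 0), i.e. (3*p11 + 5*p12)/8 as in (4.4).
```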

The interpolation is done as follows:

1. Read the first two rows from the RAM to the input buffers.
2. Calculate the first sample positions according to equations 4.1 and 4.2 and read the needed pixels from the buffers.
3. Interpolate the entire output rows from that set of input rows. Store the result in the output buffers.
4. Empty the output buffers and read in the next row from the RAM. Jump to point 2.

A couple of things need to be said here. Since the rows are interpolated synchronously, there is no need to wait for an entire row to finish before jumping to the next stage. Only a couple of pixels are needed in the input buffers before the interpolation can start, and only a couple of pixels in the output rows need to be stored before the output buffers can be read; the explanation was done in this way for simplicity. In stage 4, only one row needs to be read since the latter of the two already buffered will be used for the next interpolation as well. It is up to the interpolation block to keep the rows in order. Point one will therefore only be performed at the beginning of a new image. The number of output buffers needed for the interpolation is directly dependent on the scale factor and the time demands. To double the size of an image, four buffers are needed for every pair of input rows, and so on. It is also possible to use only two output rows, at the cost of double computation time.

4.1 Discussion

One drawback with the constant representation used in the interpolation is that the sample positions at the rightmost side of each row can be at a distance unequal to the others. This is due to the truncation error that is a natural consequence of the fixed point representation. Figure 4.3 illustrates this problem. Only one row is

shown for simplicity; the vertical interpolation is analogous to the horizontal and is omitted from the example. A small row, five pixels wide, is to be scaled to eight pixels. The sample distance is 5 * 8 / 8 = 5 according to equation 4.1 and the sample positions are marked with x in the figure. The images that are meant to be modified with this scaling function are assumed to be of quite big sizes, at least 100 pixels, and a small discontinuity in the sample positions will therefore hardly be noticeable. Hence, this can barely be seen as a problem.

Figure 4.3. Example of discontinuity in sample distance due to truncation.

The speed of this interpolation function is strongly dependent on the hardware resources available. Each interpolation requires four multiplications according to equation 3.1 and hence, the number of available multipliers determines how many interpolations can be performed at the same time and with that the minimum time required to scale up an image.

4.2 Future work

The interpolation function is not useful unless it is integrated with the existing firmware. This should not be that difficult to do if the current system is fully understood. The integration can be done with the use of an external RAM as assumed above or by direct communication with the preceding blocks, or with the PC if the up scaling is wanted at the beginning of the signal chain. Since the new sample positions are calculated inside the block, the interpolation function itself is quite dynamic. Two things that could be made more dynamic are:

1. The determination of how many interpolations should be performed at the same time, depending on the scale factor and maybe which other processes are currently running on the FPGA. To get as good performance as possible, as many interpolations as possible should be performed in parallel.
2. The allocation of the output buffers depending on the scale factor and/or the existing time demands. If the allowed computation time makes it possible to perform more interpolations sequentially instead of in parallel, hardware resources can be saved.

For the moment, all of the factors above are fixed and can therefore not be changed after the firmware has been synthesized and programmed onto the FPGA.

Chapter 5
View interpolation

Based on information from one image and the corresponding depth map, it is possible to interpolate what a camera placed a small distance aside would capture. If information from two cameras, separated from each other, capturing the same object is known, the number of views that can be interpolated and the quality of the views increase drastically. From stereoscopic images it is also possible to calculate a disparity map, which shows how two corresponding pixels differ in the two images. In this thesis only horizontal disparity is considered, which means that the two cameras both have to share the film plane and be vertically aligned with each other. There are several ways of calculating disparity maps, but none of the methods will be explained here; Thulin's thesis [3] has an investigation of the pros and cons of different methods and will be used as background. In this thesis, both disparity and depth maps will be considered in order to investigate the differences in speed and complexity between the two implementations. Interpolation from a single input, stereoscopic inputs and more than two input images will be discussed. The creation of disparity and depth maps will not be considered; it is proposed that they are created on the graphics card and fed to the FPGA in a known format.

This chapter considers methods for visualizing images on the 3D screen from the disparity or depth maps and the original single or stereoscopic images. It partly reuses work done by Thulin [3] and major parts are based on three dimensional transformations. Since the images are to be shown on the 3D display, which is fed with a sequence of images, the focus will be on the creation of this sequence.

5.1 Image sequence creation

A number of methods for view interpolation are known but no deep investigation of the pros and cons of all of them will be presented here. Two methods that have proven to give good results with relatively small means are chosen for further analysis. The two methods are quite similar but approach the problem in different ways. The following sections explain how the methods work and discuss their weaknesses and strengths. A number of papers on view interpolation exist; two that are of interest are Zitnick et al. [6] and Chen et al. [7].

5.1.1 Interpolation method 1

In this interpolation method the input image is read synchronously and the output position for each pixel is calculated. In other words, the disparity, d_r, or depth, z_r, at position x_r is read and the output position x_i is calculated as

\[
x_i = x_r + \text{warping coefficient} \cdot z_r \cdot x + \text{translation coefficient} \cdot x \tag{5.1}
\]

or

\[
x_i = x_r + d_r \cdot x \tag{5.2}
\]

where x ∈ [0, 1] is the distance between the original image and the wanted view. Figure 5.1 gives an intuitive explanation of the interpolation method for five pixels. Since the interpolation is performed row by row, only a part of one row is shown. The depth, z_r, or disparity, d_r, and pixel value, p_r, are read at position x_r in the input image, as shown in Figure 5.1(a). The output position, x_i, is calculated as shown in equation 5.1 or 5.2 and the pixel value from x_r in the input image is written to this position, as shown in Figure 5.1(b). x_i might be smaller than, equal to or greater than x_r depending on the sign of the disparity or depth. This means that some pixels in the output row might be overwritten and some might not be written at all. The numbers in Figure 5.1(b) indicate the order in which the pixels are treated.

Figure 5.1. Graphical explanation of interpolation method 1: (a) part of depth row, (b) part of output image row.

Equation 5.2 is not very difficult to understand: d_r tells the horizontal disparity between two pixels, and for x = 1 this translation will, ideally for stereoscopic inputs, give the other input. Equation 5.1 is based on three dimensional transformations and is not very intuitive. A proof is therefore shown below.

Proof

Consider Figure 5.2, where the original image is captured from camera r and the wanted perspective is given by camera i. The window coordinates {x_w, y_w, z_w} are transformed into normalized device coordinates {x_nd, y_nd, z_nd} by the inverted viewport matrix V^-1. The image in r is reprojected into camera space by the inverted projection matrix P_r^-1, giving the operator P_r^-1 V^-1. Then, the camera is translated from r to i by the matrix X_r,i, giving a total operator X_r,i P_r^-1 V^-1. The object is then projected to camera i by

the projection matrix P_i, and the coordinates are transformed back to window coordinates by V.

Figure 5.2. Illustration of the transformation.

This gives the final transform

\[
T = V\,\Big(P_i\,\big(X_{r,i}\,(P_r^{-1}\,V^{-1})\big)\Big),
\]

where V is the viewport matrix, concatenated so that the z-values lie in [-1, 1], P_r is the render projection matrix, inverted to get the unprojection matrix, P_i is the reprojection matrix for camera i, and X_{r,i} is the x-translation matrix transforming from r to i. The parameters appearing in these matrices are:

- W/H = width/height of the frustum.
- near/far = distances to the near and far depth clipping planes, > 0.
- top/bottom = coordinates for the top and bottom horizontal clipping planes.
- L/R = coordinates for the left and right vertical clipping planes.
- D = distance from the camera to the projection plane.
- x_i = x position of the interpolated view.
- x_r = x position of the original view.
- right_i = R - x_i * near / D.
- left_i = L - x_i * near / D.

Carrying out the multiplication T = V(P_i(X_{r,i}(P_r^-1 V^-1))) and keeping the row that affects the x coordinate gives

\[
x_i = x_r + \frac{W(\mathrm{near}-\mathrm{far})}{(R-L)\,\mathrm{far}}\,(x_r - x_i)\,z
          + \frac{W\big((L-R)\,\mathrm{near} + 2D(\mathrm{top}-\mathrm{bottom})\big)}{2D(R-L)(\mathrm{top}-\mathrm{bottom})}\,(x_r - x_i)
    = x_r + (\text{warping coefficient} \cdot \text{depth} + \text{translation coefficient}) \cdot x,
\]

where

\[
\text{warping coefficient} = \frac{W(\mathrm{near}-\mathrm{far})}{(R-L)\,\mathrm{far}}, \qquad
\text{translation coefficient} = \frac{W\big((L-R)\,\mathrm{near} + 2D(\mathrm{top}-\mathrm{bottom})\big)}{2D(R-L)(\mathrm{top}-\mathrm{bottom})}, \qquad
x = x_r - x_i.
\]

The quality of the interpolated image decreases as x increases, due to the fact that views far away from the original camera position contain information that does not exist in the original image. The largest problem with this interpolation method is that empty surfaces can occur, especially at positions where the depth varies much between adjacent pixels, for example at sharp edges. To create many perspectives over wide angles, more than one input image can be used. The information from these images can be combined in different ways to create a better interpolation. See the following section for further discussions on quality improvements.

One detail that is of great importance in the interpolation is the direction in which the rows are read. Consider Figure 5.3, where a view to the left of an input image is wanted. Assume that the input image is read from the left. A pixel A behind the image plane will be shifted to the left and written in the output frame. If, later in the interpolation stage, a pixel B is located behind the point A, it will also be shifted to the left and may overwrite pixel A. This is not desired since points further away should be hidden by nearer points. If the input image instead is read from the right this will not be a problem since, to use the same example, pixel A will overwrite the previously written pixel B. It can be shown analogously that the input should be read from the left if the wanted view is to the right of the input image.

Figure 5.3. Illustration of the importance of correct read direction.
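A compact software sketch of interpolation method 1 for a single row is given below, using the disparity form of equation 5.2 for simplicity (the depth form of equation 5.1 only changes how the shift is computed). The sign convention for x and the handling of out-of-range positions are assumptions made for the sketch and are not taken from the actual implementation.

```ruby
# Sketch of interpolation method 1 (forward warping) for one image row.
# The row is traversed from the right when the wanted view lies to the left of
# the input (negative x here) and from the left otherwise, so that nearer
# points overwrite farther ones, as illustrated in Figure 5.3.
def warp_row_method1(pixels, disparity, x)
  width = pixels.length
  out   = Array.new(width)                          # nil marks the empty positions
  order = x < 0 ? (width - 1).downto(0) : 0.upto(width - 1)
  order.each do |x_r|
    x_i = (x_r + disparity[x_r] * x).round          # equation 5.2
    out[x_i] = pixels[x_r] if x_i.between?(0, width - 1)
  end
  out
end
```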

5.1.2 Interpolation method 2

In the method used by Thulin [3], the output row is built synchronously by reading the depth or disparity of a specific pixel and then calculating which pixel to copy from the input row. To do this, unique depth or disparity maps are needed for each perspective. The interpolation is illustrated in Figure 5.4. Since the interpolation is performed row by row, only a part of one row is shown. Figure 5.4(b) shows the image warping. The disparity at position x_i in the wanted view's disparity map is read. Then, the read address in the original image is calculated as shown in equation 5.3, where the ± depends on how the disparity is defined. The pixel value from position x_r in the original image is written to position x_i in the output image. The numbers 1-5 in Figure 5.4(b) indicate the order of the pixels being warped. Note that some pixels in the original image might not be read and some might be read more than once.

Figure 5.4. Graphical explanation of interpolation method 2: (a) part of input and warped disparity row, (b) part of input and output image row.

\[
x_r = x_i \pm d_i \tag{5.3}
\]

The unique disparity maps are created as shown in equation 5.4.

\[
d_i = d_r \cdot x \tag{5.4}
\]

Figure 5.4(a) shows how the disparity map is warped according to itself. This warping is a rough approximation and assumes that the disparity is continuous and strictly limited so that surrounding pixels have approximately the same disparity. If this is not the case, strange artifacts will occur. This explanation was done for disparity maps but the same discussion can be used for depth maps.
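For comparison, a corresponding sketch of interpolation method 2 for one row is shown below. The forward warping of the disparity row "according to itself" and the clamping of read addresses are simplifications chosen for the sketch; they are not meant to reproduce Thulin's implementation exactly.

```ruby
# Sketch of interpolation method 2 (backward warping) for one image row.
def warp_row_method2(pixels, disparity, x)
  width = pixels.length
  d_i = Array.new(width, 0.0)                       # disparity map of the wanted view
  (0...width).each do |x_r|
    shifted = disparity[x_r] * x                    # equation 5.4
    pos = (x_r + shifted).round                     # warp the map "according to itself"
    d_i[pos] = shifted if pos.between?(0, width - 1)
  end
  Array.new(width) do |x_i|
    x_r = (x_i - d_i[x_i]).round                    # equation 5.3 (one choice of sign)
    pixels[x_r.clamp(0, width - 1)]
  end
end
```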

5.2 Image quality improvement

This section presents some ideas that can be used to improve the quality of the interpolated views. Both interpolation methods are considered.

5.2.1 Quality improvements for method 1

One way to improve the image quality of the interpolated views is to use more than one input image. Figure 5.5 shows a camera setup with two input images.

Figure 5.5. Camera setup for view interpolation with two input images.

If a third view at position x_i < 0.5 is wanted, one way to combine the information from the two input images is to first calculate the view from image R and store the result in a buffer, and then calculate the view from image L and store the result in the same buffer. Since L should be the dominating image, |x_i - x_L| < |x_i - x_R|, the interpolated image will basically consist of information from image L, but with gaps filled with information from image R. This will increase the image quality drastically but still not give a perfect result. One major advantage of this padding approach is that no arithmetic operations are needed to build the output images, as no pixel interpolation is performed. Another benefit of not performing pixel interpolation is that the sharpness is maintained. One disadvantage is that more input images imply longer computation time. A small sketch of this padding approach is given at the end of section 5.2.

One way to further improve the interpolation quality is to consider a number of surrounding pixels in the output image. When an output position is calculated, check if the pixels surrounding this position have been written earlier. If they have not, these positions can be padded with the current pixel but without marking the pixel as written. This means that if a "correct" pixel is to be written to the position later in the interpolation process, this padded pixel will be overwritten, and if not, there will not be an empty position. There are two major disadvantages with this surrounding pixel padding. First, knowledge of which pixels have been written and which have not is required. Even though only a single bit per pixel is needed for this, the list grows quite big, but if the space is available it should not be a problem. Another possibility is to mark the pixels in the output buffers in some way, but since the BRAMs need to be addressed and are not directly accessible like a register, such an operation would be too time consuming. The other drawback exists due to the fact that the BRAMs only have one write port and hence, only one

position can be written to at the same time. This means that the interpolation time would increase even more.

The major quality losses occur at depth discontinuities, as mentioned above. If such discontinuities were treated with extra care, big quality improvements could probably be achieved. One way to do this is through segmentation, described in Zitnick et al. [6]. Here, the image is divided into segments, where pixels with similar depth or disparity are placed in the same segment. Different segments could then be treated differently and boundaries could be given special treatment. One big problem with this operation is that the images have to be sorted by depth or disparity, which costs extra computation time. Another way is to smooth and use colour values for segmentation, also discussed in Zitnick et al. [6]. This will however decrease the sharpness of the image and is also time consuming.

5.2.2 Quality improvements for method 2

With this method, as with method 1, the image quality increases with the number of input views. Consider Figure 5.5 again. If a third view at position x_i is wanted, one way to combine the information from the two input images is to calculate the view from both image L and image R and make the output a mean of these two. If the two interpolated views are in addition weighted, depending on the x position, the result might be better. An even better result can be obtained by using more than two input images. This method does not suffer from empty spaces as method 1 does. Instead of empty spaces, fields with the same pixel value may occur, making the image look smeared out. Which one of the artifacts damages the images most depends on the characteristics of the image.

The stage where the disparity map is warped according to itself is, as said above, a rough approximation. One way to enhance this approximation is to make the transformation in many small steps. This could increase the accuracy of the disparity at the cost of longer computation time.
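The padding approach of section 5.2.1 can be sketched by reusing warp_row_method1 from the sketch in section 5.1.1: the wanted view is first built from the more distant input image and then from the dominating one, so that positions the dominating image leaves empty already hold information from the other image. The coordinate conventions are those of the earlier sketch, with the view position x_i measured from image L and assumed to be closer to L than to R.

```ruby
# Sketch of the two-input padding approach for one row (view closer to L).
def padded_view_row(l_pixels, l_disp, r_pixels, r_disp, x_i)
  from_r = warp_row_method1(r_pixels, r_disp, x_i - 1.0)  # non-dominating image first
  from_l = warp_row_method1(l_pixels, l_disp, x_i)        # dominating image
  Array.new(from_l.length) do |k|
    from_l[k] || from_r[k]            # gaps in the dominating view are padded from R
  end
end
```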


Chapter 6
Implementation of view interpolation

Initially, both method 1 and method 2 were implemented and tested, but since method 1 has some major benefits the work was at a later stage focused on this method. The decisive factors for this choice were:

- The same depth maps can be used for all outputs; there is no need to calculate a depth map for each perspective.
- No arithmetic operations need to be performed to combine information from more than one input image.
- The sharpness of the images is maintained.

The crucial limitations to consider are the available number of multipliers and the amount of internal Block RAM (BRAM), which is suitable to use as buffers. The factors mentioned above all demand one or both of these resources.

The implementation uses depth maps to warp the images. This choice was made because the depth maps are already created in the rendering performed on the graphics card, using OpenGL, and the first use of this application will be together with computer synthesized images. If disparity maps were to be used, the depth maps would have to be transformed into disparity maps, which is an unnecessary operation right now. Since one entire sequence is created from the same set of depth maps and images, the different views can be calculated in parallel on the FPGA. The implementation is done for two input images since this is a probable setup and gives good results over a relatively wide viewing angle. The input images and depth maps are assumed to be stored in external RAM on the PCB, as described in chapter 4, so that full control of the input data flow is obtained. Reading from and writing to this RAM will not be considered here. The internal BRAM will be used as buffers on the input, after the view interpolation and on the output after the Multi Video slicing. See Figure 6.1 for a block diagram of the entire chain of view interpolation and Multi Video slicing. The interpolation is, when the theory discussed

The interpolation is, once the theory discussed in chapter 5 is understood, quite straightforward and can be explained as follows:

1. When the input buffers are not empty, the read address creator creates a read address to these. This address is also sent to the input selector together with a signal that tells whether the current transformation is the first or the second of that particular row.
2. The input selector receives the data from the input buffers, the read address and the control signal, and calculates warping and translation factors for all perspectives.
3. The data, address, warping and translation factors are sent to the transformation unit, which calculates the output positions.
4. The data is written into the calculated positions in the view buffers. If the row is finished and two interpolations have been performed, as discussed in section 5.2.1, Multi Video slicing can be performed. Otherwise, return to step 1.

A simplified software sketch of this warping loop is given below.

Figure 6.1. Simplified block diagram for view interpolation and Multi Video. Not all connections are visible.

Figures 6.2 and 6.3 show two input images, the left- and rightmost, and two intermediate perspectives that have been interpolated with the implementation described above. Even though small artifacts can be seen, the quality is astonishingly good. On the right side of the interpolated images of the ant, vertical lines are visible. These lines are positions that have not been written; since the buffers are not emptied between rows, some pixels keep the value from the previous row.
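The sketch below is a simplified, purely software illustration of the warping performed in steps 1 to 4, operating on one image row at a time. The disparity-based pixel shift stands in for the warping and translation factors computed by the input selector; the function name, the rounding to integer positions and the toy data are illustrative assumptions rather than a description of the firmware.

```python
import numpy as np

def warp_row(row_pixels, row_disparity, view_positions, width):
    """Forward-warp one row into several perspective buffers. Each entry in
    view_positions is the relative position of an output perspective between
    the input camera (0.0) and the other input camera (1.0). Later writes
    overwrite earlier ones, as in the view buffers."""
    views = np.zeros((len(view_positions), width), dtype=float)
    written = np.zeros((len(view_positions), width), dtype=bool)
    for x in range(width):
        for v, alpha in enumerate(view_positions):
            # Output position: input position shifted by a fraction of the disparity.
            x_out = int(round(x + alpha * row_disparity[x]))
            if 0 <= x_out < width:
                views[v, x_out] = row_pixels[x]
                written[v, x_out] = True
    return views, written

# Toy example: an 8-pixel row with a constant disparity of 2 pixels.
pixels = np.arange(8, dtype=float)
disparity = np.full(8, 2.0)
views, written = warp_row(pixels, disparity, view_positions=[0.25, 0.5, 0.75], width=8)
print(views)
print(written)  # False marks holes, i.e. the unwritten positions discussed above
```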

Figure 6.2. Input images and interpolated intermediate perspectives, ant.

Figure 6.3. Input images and interpolated intermediate perspectives, space ship.

6.1 Discussion

The result obtained by this view interpolation function is better than the author's original expectation. Further improvements can be made, but the result is already satisfactory. The first part of the interpolation, where the image that is farthest away from the wanted perspective is used, does a lot of work that is later overwritten. This is quite unnecessary, but the author could not come up with a good way to avoid this redundancy. One method that would decrease the computation time, at the cost of more buffering, is to interpolate the same perspective from both input images into two separate buffers at the same time and also keep track of which pixels have been written from the dominating image. The dominating image is the left one for a perspective to the left and vice versa, as discussed in section 5.2.1. The pixels that have not been written from the dominating image would then be filled with the corresponding pixels from the non-dominating image. This method would use one more buffer and one more multiplier per view, but the average computation time would decrease quite drastically.

6.2 Future Work

As discussed in section 5.2, quality improvements lead to longer computation time. Despite this, some extra filling should be done to eliminate the vertical lines that are obvious in Figure 6.2. The surrounding pixel padding is very easy to implement and should be investigated further. The function should, together with the Multi Video slicer, be integrated with the existing firmware. This is briefly discussed in section 7.2.
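The surrounding pixel padding mentioned above could, in software terms, look something like the sketch below, which fills every unwritten position in a row with the nearest written neighbour to its left. This is only an illustration of the idea, assuming that a written-flag per pixel is available; it is not the firmware implementation.

```python
import numpy as np

def pad_holes(view, written):
    """Fill positions that were never written during warping with the value
    of the nearest written pixel to the left (and from the right at the row
    start), removing the vertical-line artefacts."""
    out = view.copy()
    for y in range(out.shape[0]):
        last = None
        for x in range(out.shape[1]):
            if written[y, x]:
                last = out[y, x]
            elif last is not None:
                out[y, x] = last
        # Fill a possible leading hole from the first written pixel of the row.
        idx = np.flatnonzero(written[y])
        if idx.size:
            out[y, :idx[0]] = out[y, idx[0]]
    return out

# Toy example: one row with three holes.
view = np.array([[5.0, 0.0, 7.0, 0.0, 0.0, 9.0]])
written = np.array([[True, False, True, False, False, True]])
print(pad_holes(view, written))  # [[5. 5. 7. 7. 7. 9.]]
```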


Chapter 7
Implementation of Multi Video

The implementation of Multi Video is quite simple; see Figures 7.1 and 7.2 for illustrations of the basic principles of the operation. For a shutter with slits of equal size, the width of each slice should be the same. This means that basically all the information needed before starting to slice the perspectives is which perspective to start with and the width of each slice. Which perspective to start with is determined by the shutter setup, and the slice width can be calculated as

slice width = image width / (number of zones × number of Multi Video channels).   (7.1)

This gives a fairly complete description of the method, but some problems and limitations still exist. One problem is the beginning and end of each row, where the slice is seldom as wide as the others, as illustrated in Figure 7.3. Notice that only a few cameras and frustums are shown and that the scale might not be correct, all for illustrative purposes. To determine the width of the outer slices, more advanced calculations have to be done. One limitation of the approach mentioned above is that all shutter slits have to be of the same width. If different slit widths are wanted, or if varying slice widths are wanted for any other reason, the width of each slice has to be calculated separately. As mentioned in section 2.3.1, the function used to calculate the Multi Video frustums is a function of many variables. Setred has previously developed a good and dynamic software application that calculates the frustums for an arbitrary number of Multi Video channels and output frames, and this application will be used for this thesis. The reason for not using the method mentioned above, or for not calculating the frustums on the FPGA, is that maximal flexibility is wanted. Even though the parameters to the function can be sent to the FPGA, changes in the function are hard to make if it is implemented in hardware. The calculations of the frustums are not that demanding and they only have to be performed once for each camera setup, and therefore do not burden the GPU very much. The amount of data that needs to be sent to the FPGA is relatively small as well, since the frustum values are only sent to the FPGA at start-up or when the camera setup is changed. During operation, the frustum values are stored in BRAM.
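The following is a small sketch of equation (7.1) and the resulting channel-change thresholds for the equal-slit case. The grouping of the denominator follows the reconstruction of the formula above, and the function name and example numbers are illustrative assumptions only.

```python
def slice_thresholds(image_width, number_of_zones, number_of_channels):
    """Equation (7.1) for shutter slits of equal size: all slices have the
    same width, so the positions where the Multi Video channel changes are
    simply multiples of that width."""
    slices_per_row = number_of_zones * number_of_channels
    slice_width = image_width / slices_per_row
    thresholds = [round(i * slice_width) for i in range(1, slices_per_row)]
    return slice_width, thresholds

width, thresholds = slice_thresholds(image_width=1024, number_of_zones=4, number_of_channels=16)
print(width)            # 16.0 pixels per slice
print(thresholds[:4])   # [16, 32, 48, 64]
```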

Figure 7.1. Example of Multi Video output frame.

Figure 7.2. Another example of Multi Video output frame.

The slicing process is very easy and is done as follows:

1. Compare the frustum with the current address. If the address is a threshold value, where the next perspective slice begins, jump to point 2. Otherwise, jump to point 3.
2. At a threshold, the Multi Video channel should be changed, as illustrated in Figure 7.1. Read the pixel from the updated channel and write it to the output frame. Read the new frustum threshold from memory and increase the write address. Jump to point 1.
3. Inside a slice, read the pixel from the current channel and write it to the output buffer. Increase the address and jump to point 1.

A small software sketch of this loop is given below. Notice that no warping or shifting is done in this process, which means that the read and write addresses should always be the same for all pixels.
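Below is a minimal software sketch of the three-step slicing loop for one output row. It assumes that the frustum thresholds for the row are already available as a list, for example computed by Setred's software application or by equation (7.1); the channel-stepping order and the toy data are illustrative assumptions.

```python
def multi_video_slice_row(views, thresholds, start_channel, num_channels):
    """Assemble one output row by copying pixels from the view buffers,
    advancing the Multi Video channel each time the write address reaches
    the next frustum threshold (no warping or shifting is performed)."""
    width = len(views[0])
    out = [0] * width
    channel = start_channel
    next_t = 0  # index of the next threshold to reach
    for x in range(width):
        if next_t < len(thresholds) and x == thresholds[next_t]:
            channel = (channel + 1) % num_channels  # point 2: change channel
            next_t += 1
        out[x] = views[channel][x]                  # points 2 and 3: read and write
    return out

# Toy example: 3 channels, a 9-pixel row and a new slice every 3 pixels.
views = [[c * 10 + x for x in range(9)] for c in range(3)]
print(multi_video_slice_row(views, thresholds=[3, 6], start_channel=0, num_channels=3))
# -> [0, 1, 2, 13, 14, 15, 26, 27, 28]
```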

Figure 7.3. Illustration of different slice widths at the screen edge.

Figure 7.4. Outputs from the Multi Video slicer: (a) Multi Video sliced ant model, (b) Multi Video sliced ship model.

As the view buffers are read synchronously from top to bottom, it is suitable to read the pixels from a certain position in all views at the same time and swap output buffers as each threshold is reached. This ends up in a big multiplexer/decoder structure. Figure 7.4 shows two output frames from the Multi Video unit with 16 Multi Video channels. Figure 7.4(a) shows the ant and Figure 7.4(b) the space ship from chapter 6.

7.1 Discussion

The number of frustum sets is always the same as the number of output frames, and since the frustums are only written in the setup phase, it is suitable to use the same buffers for the frustum values as for the output frames. The output row is no more than 1024 pixels wide, which means that positions 1024 and up are available for frustum values, if the addressing starts at zero. The BRAMs can be implemented with dual read ports, as described in [19], which means that this RAM sharing does not result in reading collisions. The Block RAMs are 18 Kb each and can be
