Attempting a perfect abstraction of two rendering APIs on mobile platforms


Master Thesis in Computer Science
Department of Computer Science, Lund Institute of Technology
Mikael Baros


Abstract

To develop a successful 3D application for mobile handsets, you need more than just a basic knowledge of 3D maths and programming. You also need to be intimately familiar with the API, its benefits and limitations, and with the available tools that shorten and simplify your development phase. The biggest problem developers face today is that different handsets support different APIs, and these APIs may be fundamentally different in both functionality and approach. This report will argue that the optimal approach to designing a mobile 3D application is to focus on application code by first abstracting the 3D rendering functionality from all the different APIs that the application will have to run on. By creating a lightweight intermediate framework for 3D rendering we can have a single interface towards the underlying 3D engine and a unified approach to representing a scene in 3D space.

Preface

This report was initiated by Redikod AB and carried out at the Department of Computer Science at Lund Institute of Technology under the supervision of Assoc. Prof. Lennart Ohlsson. During this work I received a lot of help from people who made it possible. I would especially like to thank Mr. Erik Robertson at Redikod AB and, of course, my supervisor Lennart Ohlsson for their interest and support. I would also like to thank my girlfriend, Maria, for her understanding and support during the work that was involved.

Table of Contents

1. Introduction
   1.1 Purpose
   1.2 J2ME (Java 2 Micro Edition)
   1.3 M3G
   1.4 Mascot Capsule V3
   1.5 Feature comparison
   1.6 Feature reflection
2. Abstraction overview
   2.1 Library requirements and features
       2.1.1 Mesh rendering
       2.1.2 Mesh animation
       2.1.3 Texturing
       2.1.4 Scene graph
       2.1.5 Collision
       2.1.6 Ignored features
       2.1.7 Non-documented features
   2.2 Scene representation
   2.3 Application structure
   2.4 Library structure
   2.5 Library renderer
   2.6 Library objects
   2.7 Design summary
3. Abstraction implementation
   3.1 Meshes
       3.1.1 M3G-specific implementation of Mesh Data
       3.1.2 MCv3-specific implementation of Mesh Data
       3.1.3 Mesh Data conclusion
       3.1.4 Mesh Appearance introduction
       3.1.5 Texture representation
       3.1.6 Textures in M3G
       3.1.7 Textures in MCv3
       3.1.8 Mesh Appearance in M3G
       3.1.9 Mesh Appearance in MCv3
       3.1.10 Mesh conclusion
   3.2 Integer vs. floating point math
   3.3 Transforms (matrices)
       3.3.1 Matrices in both APIs
   3.4 Camera
   3.5 Rendering pipeline
       3.5.1 M3G Renderer
       3.5.2 MCv3 Renderer
       3.5.3 Summary
4. Abstraction additions
   4.1 Collision through ray intersection
       4.1.1 M3G Collision
       4.1.2 MCv3 Collision
   4.2 Scene graph rendering
   4.3 Key frame animation
       4.3.1 M3G animation
       4.3.2 MCv3 animation
5. Validation
6. Abstraction improvements
   6.1 Collision
   6.2 Scene graph
7. Similarities with other projects
   7.1 Comparison with a preprocessor-based solution
   7.2 Comparison with a partial-abstracted solution
   7.3 Additional benefits of our method
   7.4 Possible drawbacks
8. Conclusion
   8.1 Benefits
   8.2 Compromises
   8.3 Effectiveness
References

1. Introduction

Thanks to today's innovations in both hardware and software, it is fairly simple to develop feature-abundant and rich 3D applications on even the smallest mobile handsets. Since a huge number of mobile handsets supporting 3D technology are out today, your application will have a broad potential user base.

When developing for mobile handsets, performance and portability are two big issues. With literally hundreds of available handsets, your code needs to run flawlessly across every one of them. So what happens when you encounter a handset that supports a 3D API different from the one you are using? A normal process is to try to rewrite the rendering part of your application, but this is a time-consuming process that is often followed by dozens of errors, especially if your application code is tightly woven into your rendering code, which can easily be the case if you are exploiting API-specific optimizations or features to shorten your development cycle and enhance usability.

A good approach is to plan ahead. If you wrap the rendering code into an abstract collection of classes and methods, the time to port your application will shorten significantly. However, we can do better. What if we abstract the rendering interface and simultaneously implement it on both of the APIs treated in this report, M3G and Mascot Capsule V3? We would get a 3D graphics API that, when used in your application, will allow it to run on both M3G- and MCv3-enabled handsets with only a simple re-compilation. This is the goal of this report.

1.1 Purpose

The purpose of this project is twofold. First we will research why no commercial solutions exist for the proposed abstraction, and whether it is indeed possible to perform. The second purpose is to deliver the finished product to technology manufacturers that wish to keep a strong foothold within the developer community by offering simpler access to several years of technological development.

It is interesting to consider why an abstraction is needed to allow applications to run on both new and old APIs. The predominant reason is that the market share of old devices running old APIs is much greater than that of the new models. If a technology manufacturer published such technology, it would sway developers to its side by allowing them to construct applications much faster and keep their legacy code running without losing performance.

Bridging technology like this is not unknown, but slightly unorthodox. Technology manufacturers more often than not prefer lobbying for their latest technological advancements. However, in a market dominated by older technology, bridging must be done; otherwise the favour of developers would be lost. A similar case can be seen in Microsoft's new XNA API, which is capable of creating games for the Xbox 360, Windows XP SP2 and Windows Vista, thus bridging over four years of technological development in one API.

1.2 J2ME (Java 2 Micro Edition)

J2ME is a collection of Java classes that enables development of programs for embedded devices. Every J2ME-ready phone holds a Java runtime that executes the byte code you compile using one of the available compilers. In J2ME there are two particularly important parts we want to look at. First is the configuration. A configuration determines the basic set of features available, such as floating-point support. Today it is very easy to develop 3D applications, since most available handsets support the configuration CLDC 1.1, which enables the use of floating-point primitives (one of the more important features).
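Concretely, every application in this report boils down to a MIDlet owning a Canvas, both supplied by the MIDP profile discussed next. A minimal sketch of that scaffolding (the class name is ours):

import javax.microedition.lcdui.Canvas;
import javax.microedition.lcdui.Display;
import javax.microedition.lcdui.Graphics;
import javax.microedition.midlet.MIDlet;

/** Minimal MIDP 2.0 entry point: the MIDlet owns the lifecycle and
 *  hands a Canvas (and thus a Graphics handle) to the 3D code. */
public class Demo3D extends MIDlet {
    private final Canvas canvas = new Canvas() {
        protected void paint(Graphics g) {
            // a 3D API (M3G or MCv3) would bind to g and render here
        }
    };

    protected void startApp() {
        Display.getDisplay(this).setCurrent(canvas);
    }

    protected void pauseApp() { }

    protected void destroyApp(boolean unconditional) { }
}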

Another crucial part is the profile. The profile determines (among other things) the basic classes available for graphical application development. It holds both native GUI components and primitive drawing functions, enabling the application to display coloured shapes and bitmaps on the user's screen. The profile also holds functionality for receiving user-generated input, such as button presses, camera-captured pictures and more. The profile used in this report is MIDP 2.0. All applications developed for Java-enabled handsets will use the J2ME library for application initialization and to get a handle to the display used by the various 3D APIs.

1.3 M3G

M3G is a JSR created to support 3D graphics on a wide variety of 3D handsets. Its roots obviously lie in Java technology, especially given its similarities with Java3D. It is very feature-heavy and relies on a scene graph as its primary scene representation; as with any scene-graph API, it can also render without using the scene graph. M3G is often seen as a little slower than other APIs, mostly because it is a big system that supports a lot of features, where other APIs are more bare-bones.

1.4 Mascot Capsule V3

Mascot Capsule V3 is a Japanese-developed API for 3D graphics on mobile handsets. It is very lightweight and is therefore also considered very fast. It supports some convenient features, but most advanced add-ons are expected to be implemented by the developers themselves. Its approach to scene representation is similar to OpenGL's, but it can even support a simple scene graph structure through skinned meshes.

1.5 Feature comparison

Below is a table of the features implemented by the two APIs.

Feature                                                                    | Mascot Capsule V3                 | M3G
Scene graph                                                                | Partial (only skeleton hierarchy) | Implemented
Camera                                                                     | None                              | Implemented
Simple compositing (shading, blending, culling)                            | Implemented                       | Implemented
Advanced compositing (advanced shading options, per-vertex colours, etc.)  | None                              | Implemented
Materials (shininess, ambient reception, diffuse reception, etc.)          | None                              | Implemented
Mesh representation                                                        | Implemented                       | Implemented
Transformation matrices                                                    | Implemented                       | Implemented
3D sprites                                                                 | None                              | Implemented
Animation                                                                  | Implemented                       | Implemented
Skinning animation                                                         | None                              | Implemented
Key frame animation                                                        | Implemented                       | Implemented
3D vectors                                                                 | Implemented                       | None
Lights                                                                     | Implemented                       | Implemented
Texturing                                                                  | Implemented                       | Implemented

1.6 Feature reflection

As displayed above, to make an abstraction, a lot of features currently not supported by MCv3 will have to be emulated. However, some features implemented by M3G will also have to be re-done for performance and weight reasons. All these changes will be benchmarked and documented.

2. Abstraction overview

Before the abstraction process begins, we have to choose an approach and a scene representation that will be the same across all APIs.

2.1 Library requirements and features

These are the requirements the library must meet and the problems it must solve:

1. Rendering output must be identical regardless of the underlying API, with some exceptions:
   1. Where a richer feature exists in one API but not in the other, the richer feature might be simulated in a way that is not identical.
   2. Where hardware limitations on some devices might make one API produce different results, which is out of our library's control.
2. Input to the library (textures, meshes, etc.) must be consistent regardless of the underlying API.
3. Changing only one include directive, followed by a compilation, is all that is needed to change the underlying API.
4. Working with the library should be no harder, only simpler, than working with any other existing API.
5. The library may not significantly slow down a running application, compared to only using the bare-bones API.*

* This will later be proven by benchmarking.

These are the features the library must implement, regardless of the underlying API:

1. Mesh rendering: the API must be able to render complex meshes flawlessly.
2. Mesh animation: the API must be able to perform simple key-frame animation of meshes.
3. Texturing: the API must be able to both:
   1. texture meshes
   2. perform per-face colouring of meshes
4. Scene graph: the API must provide a scene graph structure to store scene information in, but:
   1. the scene graph must not be a mandatory rendering procedure
   2. the scene graph must be lightweight and easily integrated into the existing M3G scene graph.
5. Collision: the API must provide collision handling with ray-triangle intersection.

The following table illustrates which of the features above are implemented in the two APIs, with information on how it is done.

Feature        | M3G                                                   | MCv3
Mesh rendering | The API can render complex meshes of arbitrary size.  | The API can render up to 255 primitives, so meshes must be split in a memory- and processor-efficient way.
Mesh animation | Has a heavy and cumbersome implementation of key framing. | No key framing supported unless it is loaded in binary form; also a bit too complex.
Texturing      | Fully implemented.                                    | Fully implemented.
Scene graph    | Implemented, however far from lightweight.            | None implemented.
Collision      | Implemented.                                          | None implemented.

As can be seen, all five points bring up conflicts between the APIs. Even in cases where a feature is fully implemented in both APIs, there are still huge gaps to cross in order to make the abstraction flawless. In almost every case, two completely separate implementations will have to be written to make the abstraction seamlessly integrate with the two underlying APIs.

Another observation is that MCv3's feature set is more or less a subset of M3G's. This means that the very first step in creating an improved abstraction of both APIs is to implement only the features that exist in both. The next step is then to emulate features not present in MCv3 and improve existing features in M3G, to attain a feature-rich, optimized and improved abstraction of both APIs. I will describe each feature above in detail and give some insight into what needs to be done.

2.1.1 Mesh rendering

Rendering arbitrarily complex meshes is a hard task, not only because of the huge fundamental differences in mesh representation between the two APIs, but also because we cannot afford just any form of mesh storage. Big optimizations must be made regarding running memory use, processing speed and availability. Availability is especially vital: regardless of the underlying API, it must be easy for the application programmer to extract and alter mesh data.

2.1.2 Mesh animation

Key framing is another problem area, not only because it has to be very processor-efficient but also because we need to keep the information available to the application programmer. A good method of blending key frames must be chosen as well. There is also an interesting aspect here. Development for mobile platforms is still relatively small-scale, meaning that the animation system must be both flexible and configurable from code. It must allow programmers to define animation states and bone translations from code in a very efficient and easy way. However, it should also have an easy interface in case the developers wish to import a third-party animation file into the abstract system. The problem is that few animation suites focus on programmer input, spending all of their power instead on quick imports and performance. This system must do all things at once, which might be a slightly too ambitious goal.

2.1.3 Texturing

Here is another big difference between the APIs. M3G can handle arbitrarily large (power-of-two) textures, while MCv3 can only use full-colour bitmap textures that are 256x256. Many question marks arise here. Should we offer a conversion tool that forces all textures to be 256x256, or should we try to intercept and convert textures in the abstraction while loading? The best route here might be to allow freedom: to let the application developers know of the limitations and then let their artists choose the correct approach. Another answer might be to take the boundaries from MCv3 and enforce them throughout the entire library, thus making the result consistent.

2.1.4 Scene graph

The best way to reduce hassle regarding point 5 is to ignore the already-implemented scene graph in M3G and build a good, robust and fast scene graph from scratch. That way we can make it both memory efficient and fast.

2.1.5 Collision

This is also a very problematic area, especially as mesh information needs to be transparent, and we need to decide whether to use the built-in collision of M3G for M3G-only data or the same software algorithm for both APIs. The perfect collision algorithm here would be one that is both memory efficient (as memory is scarce on handsets) and fast enough for practical use.
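For reference, one classic candidate meeting both constraints is the Möller-Trumbore ray-triangle test, which needs no precomputed per-triangle data. A sketch follows (helper names are ours; this is not presented as the choice made later in this report):

/** Möller-Trumbore ray/triangle intersection.
 *  Returns the distance t along the ray, or -1 if there is no hit. */
static float rayTriangle(float[] o, float[] d,
                         float[] v0, float[] v1, float[] v2) {
    float[] e1 = sub(v1, v0), e2 = sub(v2, v0);
    float[] p = cross(d, e2);
    float det = dot(e1, p);
    if (det > -1e-6f && det < 1e-6f) return -1.0f; // ray parallel to triangle
    float inv = 1.0f / det;
    float[] s = sub(o, v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return -1.0f;
    float[] q = cross(s, e1);
    float v = dot(d, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return -1.0f;
    float t = dot(e2, q) * inv;
    return t >= 0.0f ? t : -1.0f;
}

static float[] sub(float[] a, float[] b) {
    return new float[] { a[0] - b[0], a[1] - b[1], a[2] - b[2] };
}

static float[] cross(float[] a, float[] b) {
    return new float[] { a[1] * b[2] - a[2] * b[1],
                         a[2] * b[0] - a[0] * b[2],
                         a[0] * b[1] - a[1] * b[0] };
}

static float dot(float[] a, float[] b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}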

2.1.6 Ignored features

One particular feature present in both APIs has been ignored: built-in Gouraud lighting. It exists in both APIs but is done in software on most handsets and is thus very slow. It is rarely (if ever) used by mobile developers and is therefore a very low-priority feature to abstract.

2.1.7 Non-documented features

Smaller features will also be added that do not need to be documented in the table above. These are convenience features for the programmer and application developer, ranging from software tessellation of spheres to simple mathematical utilities.

2.2 Scene representation

To represent a scene in M3G, a scene graph would normally be used, whereas in MCv3 we would just render primitives separately with their own transforms. It has already been noted that the scene graph can be bypassed in M3G if needed, but it is a very good way of describing a scene. The problem is that if any sort of scene graph is to be implemented, it will have to be implemented fully in both M3G and MCv3, since it would be impossible to mix the existing implementation with a new one; the code would turn out very bloated. Since a scene graph is a very good way of describing a scene, we have chosen to implement a lightweight scene graph for both APIs that is completely API independent and still gives developers a good scene representation tool. This also complies with requirement 4, where the library should be easy and elegant to work with.

2.3 Application structure

Below follows a sample of the application structure in both M3G and MCv3, without using the scene graph:

1. Bind graphics interface
2. Set current camera
3. Render scene
4. Flush
5. Release graphics interface

As noted above, this structure applies to both APIs, which gives us little freedom in designing our own. Our abstract structure will have to follow a similar pattern, so that the software/hardware synergy is preserved. Our framework will have to be initialized through a bind procedure, where the native graphics handle is bound to our rendering API. Then we can actually enter the rendering loop and render our scene graph or loose meshes. When all rendering is done, we also need a flushing/releasing procedure, where the graphics are flushed onto the display and the handle is released so it can be used by the native system again.
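In M3G, for instance, steps 1 and 5 map directly onto the real Graphics3D calls; this is the native pattern our bind/flush wrappers must preserve (a sketch):

import javax.microedition.lcdui.Graphics;
import javax.microedition.m3g.Graphics3D;

void paint(Graphics g) {
    Graphics3D g3d = Graphics3D.getInstance();
    try {
        g3d.bindTarget(g);   // 1. bind the graphics interface
        // 2-4: set the camera and render the scene in immediate mode
    } finally {
        g3d.releaseTarget(); // 5. flush and release the handle
    }
}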

2.4 Library structure

The abstraction will have to provide a specific interface that is used by all applications and that internally depends on different hardware-specific interfaces. This library will be called Xlib; the symbolic name should stand for cross-platform development, portability and execution. In the Xlib we place all public interfaces that the application interacts with: meshes, matrices, simple physics, scene graph nodes, cameras, etc. The Xlib in turn relies on one of two specific libraries. The specific library is the actual interface to the API used.

There are several ways of implementing the connection between the Xlib and the specific library, some less elegant than others. We have chosen an approach that is both elegant and lightweight and fast. Every specific library class is a subclass (or interface implementor) of an Xlib type. In order to create an abstract Xlib object, the application programmer uses a specific static Factory class. This Factory class is mandatory in every specific library and has to look exactly the same; the entire library structure is based upon it. The class has construction methods for every type that the Xlib supports. Calling a creation method (e.g. xlibMesh) on the Factory will return an object of the Xlib abstract mesh type, wrapped around the native representation. Since this Factory class looks exactly the same across every specific library implementation, the only change needed to switch specific libraries is a recompile. Simply put:

XlibMesh mesh = Factory.xlibMesh(arguments);

will invoke the Factory.xlibMesh method of the currently included specific library. Changing the include changes the entire underlying implementation. It is worth noting that it is impossible to include two different specific libraries simultaneously, as the Factory classes would collide. However, including two different specific libraries is logically moot and should be impossible, so this actually enforces our software architecture rather than weakening it.

An example of the Factory class signature follows (cut short for readability purposes):

public class Factory {
    /** Method for simple mesh creation */
    public static XlibMesh xlibMesh(String texture, int shadingMode,
                                    int[] verts, int[] norms,
                                    byte[] cols, byte[] texels);

    /** Creates the API-specific mesh appearance */
    public static MeshAppearance meshAppearance(int shadingMode, String texture);

    /** Creates the API-specific mesh data */
    public static MeshData meshData(int[] verts, int[] norms,
                                    byte[] cols, byte[] texels);

    /** Creates the API-specific transform */
    public static XlibTransform xlibTransform();

    /** Creates the API-specific Camera representation.
     *  cx is the X coordinate of the screen centre (usually getWidth() / 2)
     *  cy is the Y coordinate of the screen centre (usually getHeight() / 2) */
    public static XlibCamera xlibCamera(int cx, int cy);
}
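In application code this reduces API selection to a single line. A sketch (the package names are illustrative; the report does not fix them):

// Only this import decides which API runs underneath:
import xlib.m3g.Factory;   // swap for: import xlib.mcv3.Factory;

// ...application code stays unchanged:
XlibMesh mesh = Factory.xlibMesh("crate", MeshAppearance.SHADING_FLAT,
                                 verts, norms, cols, texels);
XlibCamera cam = Factory.xlibCamera(getWidth() / 2, getHeight() / 2);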

2.5 Library renderer

Similar to the Factory class described above, every specific implementation also holds a Renderer. The Renderer is in charge of rendering the API-independent data onto the underlying API. Exactly like the Factory, every specific implementation of the Renderer must be identical, so that the application programmer does not notice any difference while coding. This class has basic rendering functionality and applies any transforms to objects prior to rendering. It also tracks the current scene's camera and handles all camera transforms for us. An example of its interface follows:

public interface IRenderer {
    /** Sets the camera to be used by all subsequent renderings */
    public abstract void setCamera(XlibCamera camera);

    /** Begins rendering to the graphics object g */
    public abstract void begin(Graphics g);

    /** Finishes rendering */
    public abstract void finish(Graphics g);

    /** Renders an XlibMesh using the transform trans.
     *  If trans is null, it is treated as the identity matrix. */
    public abstract void renderMesh(XlibMesh mesh, XlibTransform trans);

    /** Renders a full scene graph using the XlibNode as the root node.
     *  The transform in the root node is applied. */
    public void renderGraph(XlibNode root);
}

This means we have explicit support both for rendering single mesh/transform pairs and for rendering entire scene graph hierarchies.

2.6 Library objects

Almost all objects in the library follow a very simple behind-the-scenes scheme in order to render data API-independently. All classes where it makes sense have a method called representation, which returns the ambiguous Object type containing a reference to the actual specific internal representation of the object's data. When objects are processed internally, API-specific code can call the representation method and cast the returned Object to the internal type it knows to expect.

2.7 Design summary

The design can easily be described with the following two figures. Figure A shows the overall structure, where the application only communicates with the Xlib, while the Xlib is responsible for relaying information to the underlying APIs.

(Figure A)

Figure B shows the actual layered structure of the Xlib: the application only interacts with the top (abstract) layer, and the Factory supplies the correct implementation depending on the underlying API. It is also clear that only the specific layer of the Xlib interacts with the underlying APIs.

(Figure B)

3. Abstraction implementation

After setting boundaries and defining how the library interacts with the application programmer, it is time to begin defining specific library functionality. A common approach here is to keep as much of the data as possible free from native implementations. The more data we can handle in the library, the less we must rely on the underlying APIs.

3.1 Meshes

A mesh in our library is simply a structure holding all 3D model geometry, along with details on how to render said geometry. There are four data types we need to hold, along with some flags:

Vertices
Vertex colours
Normals
Texture coordinates

We store vertices as a list of triangles without indices, as the MCv3 API prevents us from using indices into arrays when rendering. Instead, every triple of vertices defines a single triangle to render. This approach is a little problematic, as we could have saved some memory with indexed buffers; however, this is an external restriction that we must follow in order to stay in line with our library's requirements.

The argument for normals might not be obvious at first. Since we have disabled the built-in Gouraud lighting on one API, why do we need normals? Normals are a part of the geometrical definition and are needed for many other things in a game, such as heading, particle collision and more.

Apart from the actual geometry data, we also need some kind of appearance information that details exactly how this geometry should be rendered in our scene: are we using lighting, are we smoothing normals, etc.
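Before turning to the interface, a short loop makes the implicit triangle layout concrete: with three coordinates per vertex and three vertices per face, face f occupies nine consecutive entries of the vertex array (a sketch):

// verts holds x, y, z per vertex, three vertices per face, no indices
int faceCount = verts.length / 9;
for (int f = 0; f < faceCount; f++) {
    int i = f * 9; // first coordinate of face f
    // vertex 0: verts[i],     verts[i + 1], verts[i + 2]
    // vertex 1: verts[i + 3], verts[i + 4], verts[i + 5]
    // vertex 2: verts[i + 6], verts[i + 7], verts[i + 8]
}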

The Xlib mesh interface will look like this:

package xlib;

/**
 * Represents a mesh in the Xlib. Holds data and appearance.
 * It also keeps track of its own transform and allows some rudimentary
 * affine transforms to be performed directly on it.
 */
public abstract class XlibMesh {
    private MeshData data;
    private MeshAppearance app;

    /** Creates an XlibMesh with a data and an appearance part */
    public XlibMesh(MeshData data, MeshAppearance app) {
        this.data = data;
        this.app = app;
    }

    /** Creates an empty XlibMesh */
    public XlibMesh() {
        data = null;
        app = null;
    }

    /** Fetches the mesh data */
    public MeshData getData() { return data; }

    /** Fetches the mesh appearance */
    public MeshAppearance getAppearance() { return app; }

    /** Sets data */
    public void setData(MeshData d) { this.data = d; }

    /** Sets appearance */
    public void setAppearance(MeshAppearance a) { this.app = a; }

    /** Sends a ray through this mesh. The transform trans
     *  is applied to the mesh's vertices to obtain the
     *  world-space coordinates.
     *  Returns < 0.0f if there was no collision. */
    public abstract float rayIntersection(float[] rayOrigin,
                                          float[] rayDirection,
                                          XlibTransform trans);
}

As for the appearance and data that are referenced above:

package xlib;

/**
 * Abstract class holding the rudimentary appearance information
 * for a certain mesh
 */
public abstract class MeshAppearance {
    /** FLAT shading */
    public static final int SHADING_FLAT = 1;
    /** SMOOTH shading */
    public static final int SHADING_SMOOTH = 2;
    /** Texture replace */
    public static final int TEXTURE_REPLACE = 4;
    /** Texture standard blending */
    public static final int TEXTURE_BLEND = 8;
    /** Texture additive blending */
    public static final int TEXTURE_ADD = 16;

    /** Abstract method that is overridden by subclasses to provide functionality */
    protected abstract void storeInternal(int shadingMode, XlibTexture texture);

    /** Abstract method that returns the physical representation,
     *  depending on the underlying API */
    public abstract Object representation();
}

/**
 * Interface holding the rudimentary data for a mesh, including:
 * vertices, normals, colours (per face) and texture coordinates
 */
public interface MeshData {
    /** Abstract method that is overridden by implementors to provide functionality */
    public abstract void storeInternal(int[] verts, int[] norms,
                                       short[] cols, int[] texels);

    /** Abstract method that returns the physical representation,
     *  depending on the underlying API */
    public abstract Object representation();
}

These are three very simple descriptions of what a mesh should contain. It is then up to each specific implementation to extend these interfaces and provide an API representation of these concepts. By appearance we mean the way the model should be rendered with regard to shading, blending, etc.

3.1.1 M3G-specific implementation of Mesh Data

This is the first class to be placed in the M3G specific library. M3G stores all its mesh data in a construct called the VertexBuffer; in reality it is a container for four different vertex arrays of type VertexArray. The problem for us is to convert our native Java arrays into a construct that M3G can read.

The biggest problem here is precision. In M3G all values are expected to be of the short native type (a 16-bit integer), but our API has taken for granted that we use the int type for everything (as that is what the MCv3 API uses). The easiest solution would be to clamp values, but this would introduce huge precision errors and also hard-to-track bugs for the application programmer. Instead, the more correct solution is to scale values. Again, some precision will be lost, but the loss is manageable and hopefully no real information is lost. To scale values, we iterate over the entire vertex array to find the largest magnitude that falls outside the range of a short, construct a scaling factor from it, and apply that factor when doing the actual conversion. Here is a pseudocode snippet that shows the approach (verts is the native array passed into the function):

1. max = Short.MAX
2. For every vertex value V in the array verts do:
   1. If |V| > max
      1. max = |V|
3. If max > Short.MAX
   1. Set scale = Short.MAX / max
   2. For every vertex value V in the array verts do:
      1. V = V * scale
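In Java the same conversion might read as follows (a sketch; tracking the largest magnitude also covers negative values, and the helper shape is ours):

/** Scales verts uniformly so that every value fits in a signed short. */
static short[] toShortRange(int[] verts) {
    int max = Short.MAX_VALUE;
    for (int i = 0; i < verts.length; i++) {
        int abs = verts[i] < 0 ? -verts[i] : verts[i];
        if (abs > max) {
            max = abs;
        }
    }
    float scale = (max > Short.MAX_VALUE)
            ? ((float) Short.MAX_VALUE) / ((float) max)
            : 1.0f;
    short[] out = new short[verts.length];
    for (int i = 0; i < verts.length; i++) {
        out[i] = (short) (verts[i] * scale);
    }
    return out;
}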

This very simple approach guarantees that we will not see any clamping, and all models passed through here retain their original form. Only if the model contains both very large and very small faces may this function produce artefacts, by reducing the small faces to zero-area (degenerate) triangles. However, a model that uses the entire integer range for detail is highly improbable and also poor design, as such a model would be huge and impractical. The rest of the conversion is fairly simple, as converting a native Java array to an M3G VertexArray is actually very easy.

However, there is another problem. MCv3 supports per-face normals, which M3G does not, as it only supports normals per vertex. For the fastest solution, we simply copy the per-face normal to every vertex in a given face when we store it for M3G. This way we retain the mathematical property of a per-face normal without compromising M3G's rendering pipeline.

One last thing to note is that we only support triangles in our library, and more specifically only triangles that are implicitly defined by each triple in our vertex array. That is: vert[i], vert[i + 1] and vert[i + 2] define a face for every i where i % 3 = 0. This fits well with both MCv3 and M3G, as both libraries have methods of defining meshes where faces are described implicitly as above.

3.1.2 MCv3-specific implementation of Mesh Data

For MCv3 the conversion is much simpler, as its internal representation corresponds much more closely to ours. We use the following function in MCv3 to render primitives:

public final void renderPrimitives(Texture texture, int x, int y,
                                   FigureLayout layout, Effect3D effect,
                                   int command, int numPrimitives,
                                   int[] vertexCoords, int[] normals,
                                   int[] textureCoords, int[] colors)

As is apparent, we can use our internal primitive Java arrays directly as the mesh data, so no conversion needs to be done here. All we need to do is set flags depending on the array sizes. Our internal data structure therefore looks like this:

public static class MeshArrays {
    // Rudimentary mesh data
    public int[] verts;
    public int[] norms;
    public int[] cols;
    public int[] texels;

    // Holds the MCv3-specific command field
    public int command;

    // Number of triangles
    public int numTris;

    MeshArrays next = null;
}

The flags we need to set indicate whether we are using normals, colors, etc. All this information can be parsed by looking at the various arrays passed into our class and their sizes/components.
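The next field exists because of a draw-call limit discussed just below: large meshes are stored as linked chunks of at most 255 triangles each. A sketch of such a split, shown here for the vertex array only (the other arrays are split the same way; the method name is ours):

/** Splits a flat triangle list into linked chunks of at most 255 triangles. */
static MeshArrays split(int[] verts, int numTris) {
    final int MAX_TRIS = 255;
    MeshArrays head = null, tail = null;
    for (int first = 0; first < numTris; first += MAX_TRIS) {
        int count = Math.min(MAX_TRIS, numTris - first);
        MeshArrays chunk = new MeshArrays();
        chunk.numTris = count;
        chunk.verts = new int[count * 9]; // 3 vertices x 3 coordinates
        System.arraycopy(verts, first * 9, chunk.verts, 0, count * 9);
        if (head == null) {
            head = chunk;
        } else {
            tail.next = chunk;
        }
        tail = chunk;
    }
    return head;
}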

The biggest problem here is the size of the geometry. A known limitation of the MCv3 API is that it can only render up to 255 primitives in one draw call. This means that any geometry with more than 255 triangles must be split up in the creation step. This is where the next member of MeshArrays is used. By storing data as a simple linked list, we can easily traverse and render it. Every time a mesh with over 255 triangles is created, we split the data into chunks of 255 (or fewer) triangles and link them together in the MeshArrays linked list. Later on, in the Renderer, we just follow the linked list, rendering all nodes in sequence.

The second biggest problem here is the colour data. Just as we had some problems with normals in the M3G case, we have a problem with colors here. In MCv3 only per-face colors are allowed, while M3G supports both per-face and per-vertex colors. We have kept this functionality and thus need to emulate it for MCv3. This is done in a simple brute-force manner, where we calculate the mean of all colors in a face and assign that to the entire face.

We also need a data type conversion. The colour data representation we have chosen holds one byte per channel, forming a colour out of three consecutive bytes in the colour array:

[R1, G1, B1, R2, G2, B2, R3, G3, B3]

MCv3, however, uses the very widely-used approach of storing these bytes as parts of an integer: instead of having a separate byte per colour channel, they are packed into a single integer (typically 0x00RRGGBB, i.e. (R << 16) | (G << 8) | B). We have to take this into account when converting as well. This leads us to the following function:

1. Set colors as Array of Size(numTriangles)
2. For every triangle:
   1. Set c = Mean(ColorForVertex1, ColorForVertex2, ColorForVertex3)
   2. Set colors[currentTriangle] = c
3. Return colors

It is easy to see how the new array is constructed per face as the mean of the per-vertex colors. Note that the integer byte packing is not shown in the pseudocode.

3.1.3 Mesh Data conclusion

The specific implementations were fairly easy to do, thanks to the flexibility of the Xlib. There were no difficulties in converting data to fit either API's internal format. The classes are fundamentally different due to the APIs' differences regarding storage of mesh data, and we also had to make some compromises regarding the colour data, also due to limitations in the APIs. However, the resulting implementation is very robust and can fill almost anyone's needs. The choice of not allowing primitives other than triangles was made because in most cases end-user models are triangulated, or at least triangulated before the render call. It felt only natural to make this a restriction rather than a choice.

3.1.4 Mesh Appearance introduction

The design choice in the Xlib is backed by the way mesh appearances are handled in M3G: through an external appearance class. This allows appearance sharing between similar models, but also simplifies render state sorting, should the developer implement it later on. I have chosen not to implement render state sorting, as that is an engine-specific feature and this work only handles the API calls.

As seen above, we only support two shading options, flat and smooth, even though MCv3 only supports flat shading. Including both is a conscious choice in order to cover all aspects of shading. In MCv3 there are two shading types: normal flat shading and toon shading. The choice we made is to define the smooth shading constant in the Xlib as smooth shading in M3G and flat shading (with our per-face mean colour calculation) in MCv3. Flat shading is then defined as flat shading in M3G and toon shading in MCv3. This enables the application programmer to choose exactly which approach to use. To get flat shading on both platforms, choose SHADING_SMOOTH with per-vertex colors that are exactly the same across a face. Otherwise one can choose between toon shading and our own mean-colour smoothing.

The biggest difference, and the biggest problem, between the two APIs regarding appearance is the texture data. The APIs use fundamentally different file formats and loaders. M3G uses the J2ME built-in PNG format, which is easily loaded through the various methods available. MCv3, however, has chosen its own format: BMP. Not only do the formats differ, but also the restrictions. In MCv3 the BMP to be used must be 256x256 pixels. One must therefore manipulate the UV sets of a mesh in order to cram several smaller textures (which is often the case in mobile 3D development) into one 256x256 texture. This is a limitation we cannot circumvent; the restriction is set natively in the MCv3 API and there is no way to supply other kinds of texture data. This leads to a single conclusion: we must adopt the exact same restriction in our own library.

The format issue is also a problem. We could implement our own loader for both formats, but that would be both slow and tedious, since we have no documentation whatsoever on how to load raw texture data into an MCv3 class. So we choose a dynamic approach that benefits the developer's production pipeline: we use the PNG loader in M3G and the BMP loader in MCv3. This means that the developer must keep a separate graphics set for each API, which is probably desirable in most cases anyway, as the two APIs can still render items differently due to hardware/software synergy limitations.

To summarize our choices:

1. Textures are always expected to be 256x256.
2. Textures are loaded as PNG in M3G and as BMP in MCv3.
3. Texture names are sent without extension to the Xlib, allowing it to determine the underlying API and attach the correct extension (.png or .bmp).

This leads us to create a specific texture class on each platform.

3.1.5 Texture representation

Textures are another resource that is bound to the Mesh Appearance and should also be shareable between meshes. This is why we use an abstract texture class. Here follows an overview of the texture class:

public abstract class XlibTexture {
    /** Loads the texture. If data already exists, it is replaced */
    public abstract boolean loadTexture(String texture);

    /** Abstract method that returns the physical representation,
     *  depending on the underlying API */
    public abstract Object representation();
}

As you can see it is very simple, and it should not be more complicated than that. The renderer can extract the native information through the representation method, and the loadTexture method is overridden by specific API classes to load the texture.

We also choose to separate the texture from its blending information, because the same texture might be shared across different meshes with different blending information (e.g. additive blending in one case and colour replacement in another). Thus we reduce the texture representation to raw data that is later processed and assigned correctly by the native renderer.

3.1.6 Textures in M3G

In M3G a texture is represented by a single texture class, just as in MCv3. The texture is very simple to load, as we are using the MIDP API. As mentioned earlier, we load PNG textures for M3G and ignore anything that is not 256x256 pixels. Here is a snippet showing the M3G loadTexture function:

public boolean loadTexture(String texture) {
    // Check for texture
    if (texture != null) {
        // Create and set the texture
        try {
            // Fix suffix
            texture = texture + ".png";

            // Open image
            Image texImage = Image.createImage(texture);
            if (texImage == null) {
                System.out.println("Could not open image: [" + texture + "]");
            }

            // Make sure it's the right size
            if (texImage != null && texImage.getWidth() == texImage.getHeight()
                    && texImage.getWidth() == 256) {
                // Create texture object
                theTexture = new Texture2D(new Image2D(Image2D.RGBA, texImage));

                // Replace the mesh's original colors (no blending)
                theTexture.setBlending(Texture2D.FUNC_REPLACE);

                // Set wrapping and filtering
                int wrapping = Texture2D.WRAP_CLAMP;
                theTexture.setWrapping(wrapping, wrapping);
                theTexture.setFiltering(Texture2D.FILTER_BASE_LEVEL,
                                        Texture2D.FILTER_NEAREST);

                return true;
            } else {
                System.out.println("WARNING: DID NOT LOAD TEXTURE " + texture
                        + " AS ITS DIMENSIONS WERE NOT 256x256.");
            }
        } catch (Exception e) {
            // Something went wrong
            System.out.println("Failed to create texture. Error: " + e);
            e.printStackTrace();
        }
    }
    return false;
}

Something to note is that our library uses a very logical approach where texture blending is separated from the texture itself and instead represented in the appearance. This is not the case in M3G, so here we must set a bogus value (FUNC_REPLACE). Later, when the model is rendered, we have access to the appearance information and the native renderer can choose the correct texture blending mode.

3.1.7 Textures in MCv3

Loading a texture in MCv3 is very easy, as all we need to do is set the correct extension and load it through the MCv3 native loader. The code became very small and amounts to this:

public boolean loadTexture(String texture) {
    if (texture != null) {
        // Set right suffix
        texture = texture + ".bmp";
        try {
            theTexture = new Texture(texture, true);
            return true;
        } catch (Exception e) {
            System.out.println("Could not create texture. Error: " + e);
            e.printStackTrace();
            theTexture = null;
        }
    }
    return false;
}

As you can see, MCv3 does not associate texture blending data with the texture itself, which is exactly how we do it.

3.1.8 Mesh Appearance in M3G

As mentioned earlier, M3G already uses an approach similar to ours, encapsulating all appearance data in a single class. Conversion of appearances is therefore very pain-free in the M3G case. As we are using PNG here, we use the Image.createImage method to load the desired texture. Finally we translate our own constants to M3G constants before composing the Appearance class. Here is a code snippet:

/** Stores internal details regarding M3G's rendering */
protected void storeInternal(int shadingMode, XlibTexture texture) {
    // Allocate storage
    app = new Appearance();

    // Check for texture
    if (texture != null) {
        // Add texture to the appearance
        app.setTexture(0, (Texture2D) texture.representation());
    }

    // Polygon mode
    PolygonMode pm = new PolygonMode();
    pm.setPerspectiveCorrectionEnable(true);
    pm.setCulling(PolygonMode.CULL_NONE);

    // Check the shading mode (keeping it simple)
    if (shadingMode == MeshAppearance.SHADING_SMOOTH) {
        pm.setShading(PolygonMode.SHADE_SMOOTH);
    } else {

        pm.setShading(PolygonMode.SHADE_FLAT);
    }

    // Add to the appearance
    app.setPolygonMode(pm);

    // Check the blend mode
    if ((shadingMode & MeshAppearance.TEXTURE_BLEND) != 0) {
        CompositingMode cm = new CompositingMode();
        cm.setBlending(CompositingMode.ALPHA);
        app.setCompositingMode(cm);
        app.getTexture(0).setBlending(Texture2D.FUNC_BLEND);
    } else if ((shadingMode & MeshAppearance.TEXTURE_ADD) != 0) {
        CompositingMode cm = new CompositingMode();
        cm.setBlending(CompositingMode.ALPHA_ADD);
        app.setCompositingMode(cm);
        app.getTexture(0).setBlending(Texture2D.FUNC_ADD);
    }
}

A very simple class, now that we have an internal texture representation.

3.1.9 Mesh Appearance in MCv3

In MCv3 the code became minimal, as we only need to store two separate items: first the texture, which is already finished, and then the actual shading data, represented in MCv3 by the Effect3D class. The code amounts to the following:

/** Stores internal details regarding MCv3 rendering */
protected void storeInternal(int shadingMode, XlibTexture texture) {
    // Allocate storage
    if (eff == null) {
        eff = new Effects();
    }

    // Check if we are using a texture
    if (texture != null) {
        eff.texture = (Texture) texture.representation();
    }

    // Check the shading mode (keeping it simple)
    if ((shadingMode & MeshAppearance.SHADING_SMOOTH) != 0) {
        eff.shading = new Effect3D(null, Effect3D.NORMAL_SHADING, true, null);
    } else {
        eff.shading = new Effect3D(null, Effect3D.TOON_SHADING, true, null);
    }

    // Check the blend mode
    if ((shadingMode & MeshAppearance.TEXTURE_BLEND) != 0) {
        eff.texBlend = Graphics3D.PATTR_BLEND_NORMAL;
    } else if ((shadingMode & MeshAppearance.TEXTURE_ADD) != 0) {
        eff.texBlend = Graphics3D.PATTR_BLEND_ADD;
    }
}

Very simple, as we only translate our own constants into their MCv3 versions.

3.1.10 Mesh conclusion

The finished mesh class became very robust but also very easy to use. Resource sharing between different meshes is also very simple, which is important in memory-sparse environments such as mobile phones. We solved all differences between the APIs with minimal fuss and attained a rendering result that is more or less perfectly equal. The biggest differences are:

1. Colour management, where smooth shading is emulated on MCv3.
2. Texture handling, where one needs separate data sets for MCv3 and M3G (BMP vs. PNG).
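Putting the pieces of 3.1 together, creating a shared appearance and a mesh might look like this (a sketch; the argument values are illustrative):

// One appearance, shareable between meshes (valuable on memory-sparse handsets).
// "crate" resolves to crate.png under M3G and crate.bmp under MCv3.
MeshAppearance app = Factory.meshAppearance(
        MeshAppearance.SHADING_SMOOTH | MeshAppearance.TEXTURE_BLEND, "crate");

XlibMesh mesh = Factory.xlibMesh("crate", MeshAppearance.SHADING_SMOOTH,
                                 verts, norms, cols, texels);
mesh.setAppearance(app); // reuse the same appearance on other meshes too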

3.2 Integer vs. floating point math

One of the bigger problems in abstracting the two APIs has not been discussed yet. M3G, being the heavier and more modern API, relies on floating-point capability, while MCv3 uses integer math for its computations. We can either perform all computations with integers and convert them to floating point where needed (which would be for most M3G methods), or perform all computations in floating point and convert to integers when dealing with MCv3.

This was not an easy choice, as there are benefits and drawbacks to each. Performing all mathematical operations with integers has a big performance benefit on older handsets, where floating-point math is done in software. However, allowing the application programmer to use only integers is not very practical; it makes for a hard-to-use API and makes it hard to port existing floating-point applications to it. Using floating point has three big advantages: it is very easy to work with, it has much higher precision than integer math, and it makes porting existing 3D applications from PC and other 3D-enabled devices much easier, as their math is already done in floating point. However, this approach will always be slower on older handsets.

The choice in the end was to use floating point. The amount of implementation work was the same for either, but floating point's benefits outweighed its drawbacks. We still needed a conversion approach that works for both APIs, which turned out to be rather easy. In MCv3 a floating-point one (1.0) is represented by the integer 4096. As we are creating this API for modern handsets, we can rely on floating point being available; so, in the end, all we needed to do was multiply by 4096 in every operation that converts between floating point and integer. This turned out to work incredibly well, with very little precision loss. Of course, we did not solve the overflow issues that can occur, but we leave this as a documented restriction of the API, so that the application programmer is aware of it.

3.3 Transforms (matrices)

To represent all kinds of transforms in our library we will of course be using matrices. Both APIs have native matrix classes that we can use, so the problem is only choosing a matrix representation that is flexible enough but also lightweight. To this end we have chosen an approach similar to that of OpenGL and MCv3 (as there are significant similarities). We need to support rotation, translation and scaling matrices. This is the abstraction we arrived at:

public interface XlibTransform {
    /** Sets this matrix to a translation matrix holding dx, dy and dz */
    public abstract void translate(float dx, float dy, float dz);

    /** Sets this matrix to a rotation matrix rotating deg degrees around the X axis */
    public abstract void rotateX(float deg);

    /** Sets this matrix to a rotation matrix rotating deg degrees around the Y axis */
    public abstract void rotateY(float deg);

    /** Sets this matrix to a rotation matrix rotating deg degrees around the Z axis */
    public abstract void rotateZ(float deg);

    /** Sets this matrix to a scaling matrix */
    public abstract void scale(float xs, float ys, float zs);

    /** Multiplies this matrix with another */
    public abstract void mul(XlibTransform t);

    /** Multiplies t1 with t2 and stores the result in this matrix */
    public abstract void mul(XlibTransform t1, XlibTransform t2);

    /** Sets the matrix explicitly */
    public abstract void set(float[] matrix);

    /** Makes the matrix into the standard identity matrix */
    public abstract void identity();

    /** Abstract method that returns the physical representation,
     *  depending on the underlying API */
    public abstract Object representation();
}

As you see, we have taken a step back from the heavy M3G implementation, where matrix operations can be performed on the matrix itself instead of using multiplications. With this approach we advocate that developers be conscious of the choices they make, knowing what ends up as a matrix multiplication and what does not. So, to create a combined rotation/translation matrix one would, in our library, create two separate matrices and then combine them with multiplication.

3.3.1 Matrices in both APIs

Representing our abstraction was easy for both APIs, as all the methods defined above were supported natively. It was a simple task of writing two filters that delegate responsibility downwards. This is what was created:

M3G

public class M3GTransform implements XlibTransform {
    // Internal transform
    private Transform trans = null;

    /** Constructs internal components */
    public M3GTransform() {
        trans = new Transform();
    }

    /** Fetches the internal Transform. Used by the Renderer */
    public Object representation() {
        return trans;
    }

    /** Sets this matrix to a translation matrix holding dx, dy and dz */
    public void translate(float dx, float dy, float dz) {
        // Revert matrix back to identity and translate it
        trans.setIdentity();
        trans.postTranslate(dx, dy, dz);
    }

    /** Sets this matrix to a rotation matrix rotating deg degrees around the X axis */
    public void rotateX(float deg) {
        trans.setIdentity();
        trans.postRotate(deg, 1.0f, 0.0f, 0.0f);
    }

    /** Sets this matrix to a rotation matrix rotating deg degrees around the Y axis */
    public void rotateY(float deg) {
        trans.setIdentity();
        trans.postRotate(deg, 0.0f, 1.0f, 0.0f);
    }

    /** Sets this matrix to a rotation matrix rotating deg degrees around the Z axis */

    public void rotateZ(float deg) {
        trans.setIdentity();
        trans.postRotate(deg, 0.0f, 0.0f, 1.0f);
    }

    /** Sets this matrix to a scaling matrix */
    public void scale(float xs, float ys, float zs) {
        trans.setIdentity();
        trans.postScale(xs, ys, zs);
    }

    /** Multiplies this matrix with another */
    public void mul(XlibTransform t) {
        trans.postMultiply(((M3GTransform) t).trans);
    }

    /** Multiplies t1 with t2 and stores the result in this matrix */
    public void mul(XlibTransform t1, XlibTransform t2) {
        trans.set(((M3GTransform) t1).trans);
        trans.postMultiply(((M3GTransform) t2).trans);
    }

    /** Sets the matrix explicitly */
    public void set(float[] matrix) {
        trans.set(matrix);
    }

    /** Makes the matrix into the standard identity matrix */
    public void identity() {
        trans.setIdentity();
    }
}

MCv3

public class MCTransform implements XlibTransform {
    // Used for degree conversion
    protected static final int ONE = 4096;
    protected static final int degScale = ONE / 360;

    // Internal transform
    private AffineTrans trans = null;

    /** Constructs internal components */
    public MCTransform() {
        trans = new AffineTrans();
        trans.setIdentity();
    }

    /** Fetches the internal AffineTrans. Used by the Renderer */
    public Object representation() {
        return trans;
    }

    /** Sets this matrix to a translation matrix holding dx, dy and dz */
    public void translate(float dx, float dy, float dz) {
        trans.set(ONE, 0, 0, (int) dx,
                  0, ONE, 0, (int) dy,
                  0, 0, ONE, (int) dz);
    }

    /** Sets this matrix to a rotation matrix rotating deg degrees around the X axis */
    public void rotateX(float deg) {
        trans.setIdentity();
        trans.rotationX((int) (deg * degScale));
    }

    /** Sets this matrix to a rotation matrix rotating deg degrees around the Y axis */
    public void rotateY(float deg) {
        trans.setIdentity();
        trans.rotationY((int) (deg * degScale));
    }

    /** Sets this matrix to a rotation matrix rotating deg degrees around the Z axis */
    public void rotateZ(float deg) {
        trans.setIdentity();
        trans.rotationZ((int) (deg * degScale));
    }

    /** Sets this matrix to a scaling matrix */
    public void scale(float xs, float ys, float zs) {
        trans.setIdentity();
        trans.set((int) (ONE * xs), 0, 0, 0,
                  0, (int) (ONE * ys), 0, 0,
                  0, 0, (int) (ONE * zs), 0);
    }

    /** Multiplies this matrix with another */
    public void mul(XlibTransform t) {
        trans.mul(((MCTransform) t).trans);
    }

    /** Multiplies t1 with t2 and stores the result in this matrix */
    public void mul(XlibTransform t1, XlibTransform t2) {
        trans.setIdentity();
        trans.mul((AffineTrans) t1.representation(), (AffineTrans) t2.representation());
    }

    /** Sets the matrix explicitly */
    public void set(float[] matrix) {
        int[] iMat = new int[12];
        for (int i = 0; i < 12; ++i) {
            iMat[i] = (int) (matrix[i] * ONE);
        }
        trans.set(iMat);
    }

    /** Makes the matrix into the standard identity matrix */
    public void identity() {
        trans.set(ONE, 0, 0, 0,
                  0, ONE, 0, 0,
                  0, 0, ONE, 0);
    }
}

Initially, the matrix class was a general mathematical matrix class written from scratch, performing multiplications, additions, rotations, etc. in software. This worked very well and in some cases even gained some memory performance over the APIs' already-implemented matrix classes; however, it was always beaten in execution performance. We realized this was due to hardware optimizations that the existing classes were using and that were not available to us through the J2ME interface. We concluded that even if some things could be optimized with a custom matrix implementation, the benefits were greater if we used the existing matrix classes and wrapped around them. Also, as discussed in 3.2, it is apparent above how we convert between integer and floating point in the MCv3 implementation.
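That conversion rule from 3.2 reduces to two one-liners; a sketch (the constant matches MCTransform.ONE above, the helper class is ours):

/** Fixed-point helpers for the MCv3 side: 4096 represents 1.0. */
final class Fixed {
    static final int ONE = 4096;

    static int toFixed(float f) { return (int) (f * ONE); }

    static float toFloat(int i) { return i / (float) ONE; }
}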

Another problem with the matrix class was choosing method semantics that would be uniform across APIs and easy to use. Since the two APIs treat a transformation matrix differently, in terms of the methods and operations that can be performed on it, we had to find a middle ground. For the method semantics we chose an OpenGL-like style, where each modification method replaces the original matrix with a new one. That is, calling rotateX on an existing matrix replaces it with a matrix that rotates around the X axis. All combinations of rotations, translations and scalings are then done by multiplying matrices. This kept the API simple, memory efficient and easy to use.

3.4 Camera

Choosing whether or not to support a camera class was another decision that had to be made. In M3G there is a fully implemented camera class that is used when rendering any model, whereas MCv3 keeps the lazy OpenGL approach of treating the camera for what it is, a simple transform matrix, and letting the API programmer perform the model-view transformations themselves. Since one of the goals of this API was to make it powerful and easy to use, the choice became obvious: a camera class was needed. The camera has a few extra methods that separate it from a transform, and we can then use the M3G approach of allowing a single camera to be applied to all objects pre-rendering. The choice was made to keep the interface very simple and support only two methods: the de facto lookAt method and a perspective-altering method.

public interface XlibCamera {
    /** Sets this camera to stand at the position detailed by
     *  pos, looking at the point look, with up as the up-vector */
    public void lookAt(float[] pos, float[] look, float[] up);

    /**
     * Sets the camera perspective.
     * znear is the near clipping plane
     * zfar is the far clipping plane
     * deg is the FOV in degrees
     */
    public abstract void setPerspective(float znear, float zfar, float deg);

    /** Abstract method that returns the physical representation,
     *  depending on the underlying API */
    public abstract Object representation();
}

This approach was chosen for two reasons:

1. It is very simple to use, as most application programmers have come in contact with both methods.
2. Since MCv3 did not support a camera, it was very simple to implement these two methods without breaking its internal rendering pipeline.

There were problems with both implementations of the camera class, detailed below each of the code blocks.

M3G

public class M3GCamera implements XlibCamera {
    // Internal camera representation
    public static class CamPair {
        public Camera camera;
        public Transform transform;
    }

    CamPair cp = null;

    // Width and height
    int w, h;

    /** Constructs an empty camera with no perspective.
     *  WARNING: assumes that cx and cy are the true CENTER of the screen,
     *  so width and height are calculated as cx * 2 and cy * 2.
     *  For some, the call Factory.xlibCamera(cx, cy) might need to change
     *  to Factory.xlibCamera(getWidth() / 2, getHeight() / 2) when
     *  shifting from MCv3 to M3G. */
    public M3GCamera(int cx, int cy) {
        // Simple creation
        cp = new CamPair();
        cp.camera = new Camera();
        cp.transform = new Transform();

        // Store width and height (needed in setPerspective)
        w = cx << 1;
        h = cy << 1;
    }

    /** Sets this camera to stand at the position detailed by
     *  pos, looking at the point look, with up as the up-vector */
    public void lookAt(float[] pos, float[] look, float[] up) {
        // Cross product to get the side vector
        float[] side = {
            (look[1] * up[2]) - (look[2] * up[1]),
            (look[2] * up[0]) - (look[0] * up[2]),
            (look[0] * up[1]) - (look[1] * up[0])
        };

        // Normalize the side vector
        float invLen = 1.0f / (float) java.lang.Math.sqrt(
                side[0] * side[0] + side[1] * side[1] + side[2] * side[2]);
        side[0] *= invLen;
        side[1] *= invLen;
        side[2] *= invLen;

        // Another cross product to make the UP vector perpendicular
        float[] up2 = {
            (side[1] * look[2]) - (side[2] * look[1]),
            (side[2] * look[0]) - (side[0] * look[2]),
            (side[0] * look[1]) - (side[1] * look[0])
        };

        // Finally store the matrix elements
        float[] mat = new float[16];
        mat[0] = side[0]; mat[1] = up2[0]; mat[2]  = -look[0]; mat[3]  = pos[0];
        mat[4] = side[1]; mat[5] = up2[1]; mat[6]  = -look[1]; mat[7]  = pos[1];
        mat[8] = side[2]; mat[9] = up2[2]; mat[10] = -look[2]; mat[11] = pos[2];
        mat[12] = 0.0f; mat[13] = 0.0f; mat[14] = 0.0f; mat[15] = 1.0f;

        // Assign it to our transform

    /**
     * Sets the camera perspective.
     * znear is the near clipping plane,
     * zfar is the far clipping plane,
     * deg is the FOV in degrees
     */
    public void setPerspective(float znear, float zfar, float deg) {
        // Use the Camera.setPerspective method. Calculate the aspect ratio
        // and pass all values in their true floating-point representation
        cp.camera.setPerspective(deg, ((float)w) / ((float)h), znear, zfar);
    }

    /** Returns the physical representation in the M3G API */
    public Object representation() {
        return cp;
    }
}

The M3G implementation was the one with the fewest problems design-wise; the only difficulty was implementing the lookAt method, as it doesn't exist natively. It had to be a very memory efficient implementation and also as fast as possible. The initial implementation used static storage for all arrays used internally, but that made it thread-unsafe. As we wanted to keep the implementation thread safe we moved the variables into the method. The downside is that every call to lookAt performs a sizeable allocation, which might decrease performance, so it should only be called a restricted number of times.

MCv3

The problems encountered in the MCv3 implementation were quite different in nature from the M3G one. First of all, the design was simple, as all the methods that our class supports are natively supported by MCv3, so we didn't need to do any extra computations. However, there were some integer-math issues, specifically with the plane boundaries in the setPerspective method. The method is displayed below:

public void setPerspective(float znear, float zfar, float deg) {
    // Use the FigureLayout.setPerspective method, use MCTransform.degScale
    // to correctly input degrees and scale znear and zfar.
    // Also, to maintain a good range of floating-point numbers,
    // the values are divided by 10.0f.
    // This allows the user to input a znear/zfar from [0 - 100.0f],
    // which is much better than without the division ([0 - 10.0f])
    int zn = (int)((znear / 10.0f) * MCTransform.one);
    zn = Math.max(1, zn);
    int zf = (int)((zfar / 10.0f) * MCTransform.one);
    zf = Math.max(zn + 1, Math.min(zf, 32760));
    int d = (int)(deg * MCTransform.degScale);

    // Debug output
    System.out.println("setPerspective(" + zn + ", " + zf + ", " + d + ");");

    camera.setPerspective(zn, zf, d);
}

The problem is that in Mascot Capsule v3 the far plane is limited to a small integer range (we clamp it to 32760), while the lowest value the near plane may have is 1. This does not fit our floating-point model particularly well, so some conversions had to be made, in such a way that perspective changes in one API remain identical in the other.
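To make the clamping concrete, here is a worked example of the conversion. It is a sketch that assumes MCTransform.one is the fixed-point representation of 1.0, for instance 4096 in a 12-bit fixed-point format; the actual constant is not listed in this report.

// Worked example of the near/far conversion above, assuming
// MCTransform.one == 4096 (an assumption; the constant is not listed here).
//
// setPerspective(1.0f, 100.0f, 60.0f) yields:
//   zn = (int)((1.0f   / 10.0f) * 4096) =   409
//   zf = (int)((100.0f / 10.0f) * 4096) = 40960  -> clamped to 32760
//
// Without the division by 10.0f, zfar = 100.0f would map to 409600,
// far outside the integer range the API accepts. With it, the usable
// range ends up just short of 100.0f - the "soft cap" discussed below.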

A soft cap had to be chosen between 1.f and 100.f, to be applied in both M3G and MCv3. This lets us keep consistency while not having a huge impact on application development, as mobile applications seldom need huge far planes in their cameras.

3.5 Rendering pipeline

After setting up the primitive classes that our library consists of, we needed to define the rendering pipeline for each API and keep the results consistent. To begin with, all differences need to be identified and then eliminated through our abstraction. After analysis it is clear that both APIs can use the exact same approach with a little modification. M3G, which is normally a scene graph based renderer, can be shifted to render meshes primitively with only a transform attached to them and a global camera set. This is exactly how MCv3 works, which means this is what the abstract pipeline must look like. The chosen approach was:

1. Bind a Graphics resource to render to
2. Clear the screen with a certain color
3. Set the global camera
4. Render primitives
5. Flush the back buffer

An ideal rendering loop, used to determine the above approach, is displayed here:

void paint(Graphics g) {
    IRenderer rend = getRenderer();
    rend.begin(g);
    rend.setCamera(getCamera());
    rend.renderMesh(getMesh(), getTransform());
    rend.finish(g);
}

When implementation began it was clear that both APIs had very different problems associated with them.

3.5.1 M3G Renderer

M3G supports the notion of a global camera, and thus that piece of the rendering pipeline was easy. The only modification to M3G's normal pipeline was to use it exclusively in immediate mode, meaning we circumvent M3G's scene graph. The method used for drawing primitives is the following:

void render(VertexBuffer vertices, IndexBuffer triangles, Appearance appearance, Transform transform)

This fits perfectly, as those are the primitives we already have stored for each of the meshes. The actual render method was very simple, as the M3G pipeline could mimic our abstract pipeline perfectly:

public void renderMesh(XlibMesh mesh, XlibTransform trans) {
    // Set the camera
    M3GCamera.CamPair cp = (M3GCamera.CamPair)cam.representation();
    g3d.setCamera(cp.camera, cp.transform);

    // Fetch components
    Appearance app = (Appearance)mesh.getAppearance().representation();
    M3GMeshData.MeshArrays ma =
        (M3GMeshData.MeshArrays)mesh.getData().representation();

    // Render
    g3d.render(ma.vbuf, ma.ibuf, app, (Transform)trans.representation());
}

3.5.2 MCv3 Renderer

In MCv3 the approach was a bit more involved, as the notion of a global camera doesn't exist. Thus we have to transform each object with the global camera before feeding it to the MCv3 system. The problem here was performance, as we needed an extra matrix to store the rendering information in. Again the threading issue arose: we could either have a global variable that we use once per renderer instance and sacrifice thread safety (as we do not want any expensive locks), or we could allocate a big matrix for every object that is rendered. Here a middle ground was chosen. Every renderer instance holds its own transform buffer, making it possible for a thread to own a renderer and thus keep thread safety at the cost of one extra matrix per thread. This was the ideal solution and solved all our problems regarding both thread safety and performance. In the end, we ended up with this method:

public void renderMesh(XlibMesh mesh, XlibTransform trans) {
    // If there's no camera, don't render
    if(cam == null)
        return;

    // Fetch the camera transform (a FigureLayout in MCv3)
    FigureLayout camLayout = (FigureLayout)cam.representation();
    AffineTrans camTrans = camLayout.getAffineTrans();

    // Get the data
    MCMeshData.MeshArrays arr =
        (MCMeshData.MeshArrays)mesh.getData().representation();
    MCMeshAppearance.Effects eff =
        (MCMeshAppearance.Effects)mesh.getAppearance().representation();

    // Combine the camera transform with the local-to-world transform
    if(trans != null) {
        transBuffer.setIdentity();
        transBuffer.mul(camTrans, (AffineTrans)trans.representation());
        camLayout.setAffineTrans(transBuffer);
    }

    // Render it
    g3d.renderPrimitives(eff.texture, 0, 0, camLayout, eff.shading,
        arr.command, arr.numTris, arr.verts, arr.norms,
        arr.texels == null ? arr.verts : arr.texels, arr.cols);

    // Restore the camera transform
    camLayout.setAffineTrans(camTrans);
}

Every rendered object is thus transformed first by its local-to-world matrix and then by the camera matrix; the combined matrix is applied to the transformation parameter of MCv3's render method.

3.6 Summary

The initial implementation of the abstraction resulted in a capable and easy to use API. We implemented the subset of functionality that is common to both M3G and MCv3 and thus got a useful rendering environment. We can now render textured objects with transforms in a full 3D environment. To continue work on this API we now need to expand our feature set to include all features present in M3G and emulate them as needed in MCv3.
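To summarize how the pieces fit together, here is a minimal end-to-end sketch. It is a sketch only: loadMesh() stands in for application-specific loading code, rotateY is assumed analogous to the rotateX method mentioned in the matrix discussion, and Factory.xlibTransform is the creation method used later in the scene graph code.

// One-time setup
XlibCamera cam = Factory.xlibCamera(getWidth() / 2, getHeight() / 2);
XlibTransform trans = Factory.xlibTransform();
XlibMesh mesh = loadMesh(); // hypothetical application loader

cam.setPerspective(1.0f, 50.0f, 60.0f);
cam.lookAt(new float[] { 0.0f, 0.0f, 30.0f },
           new float[] { 0.0f, 0.0f, -1.0f },
           new float[] { 0.0f, 1.0f, 0.0f });

// Per frame: accumulate a small rotation, following the replace-then-
// multiply matrix semantics, and render through the abstract pipeline
void paint(Graphics g) {
    XlibTransform spin = Factory.xlibTransform();
    spin.rotateY(0.05f);       // replaces spin with a pure Y rotation
    trans.mul(trans, spin);    // accumulate into the object's transform

    IRenderer rend = getRenderer();
    rend.begin(g);
    rend.setCamera(cam);
    rend.renderMesh(mesh, trans);
    rend.finish(g);
}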

4. Abstraction additions

The emulation step of the abstraction begins here, as we will be implementing features that are missing from one of the APIs. In some steps we might even have to emulate them in both APIs due to performance issues. To begin with, we will implement basic collision detection through ray intersection.

4.1 Collision through ray intersection

In M3G we already have an implemented collision routine. It can cast rays through groups of objects and report which one you have collided with and what the distance is. This is a very handy method to have for any 3D application. In our interface we have decided to place the collision within every mesh. That is, a XlibMesh holds a method, rayIntersection, that reports whether a given ray collides with that specific mesh. In order to cast a ray against every mesh in your scene you just iterate over all meshes and perform the collision, storing the mesh whose collision distance is the shortest (a sketch of this loop follows the M3G implementation below). The abstract method follows:

/** Sends a ray through this mesh. The transform trans
 *  is applied to the mesh's vertices to obtain the
 *  world-space coordinates.
 *  Returns < 0.0f if there was no collision. */
public abstract float rayIntersection(float[] rayOrigin, float[] rayDirection, XlibTransform trans);

The biggest problem here is performance. Not only do we need top-notch speed so it can be used in real time, we also need a tight memory limit, as memory is very scarce. For M3G this will not be an issue, as M3G already has a native ray-casting method.

4.1.1 M3G Collision

As previously mentioned, M3G already has a built-in system for ray collision, which makes our M3G implementation simple. In M3G a ray can be cast through a Group of objects, and it reports the object it collided with and the distance to it. The actual method was small and simple:

public float rayIntersection(float[] rayOrigin, float[] rayDirection, XlibTransform trans) {
    RayIntersection ri = new RayIntersection();
    Group g = new Group();

    // Set the transform on the wrapped M3G mesh
    m.setTransform((Transform)trans.representation());

    // Add to group and perform picking
    float dist = -10.0f;
    g.addChild(m);
    if(g.pick(-1, rayOrigin[0], rayOrigin[1], rayOrigin[2],
              rayDirection[0], rayDirection[1], rayDirection[2], ri))
        dist = ri.getDistance();

    // Remove mesh
    g.removeChild(m);
    return dist;
}
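The scene-wide iteration mentioned above then becomes a simple loop (a sketch; the meshes and transforms arrays are hypothetical application storage):

// Cast a ray against every mesh in the scene and keep the closest hit
XlibMesh closest = null;
float closestDist = Float.MAX_VALUE;
for(int i = 0; i < meshes.length; i++) {
    float d = meshes[i].rayIntersection(rayOrigin, rayDirection, transforms[i]);
    if(d >= 0.0f && d < closestDist) {
        closestDist = d;
        closest = meshes[i];
    }
}
// closest is now the nearest hit mesh, or null if the ray missed everything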

4.1.2 MCv3 Collision

Earlier it was stated that performance and memory would be an issue, so we had to find an algorithm that fits handheld devices in both processing speed and memory footprint. During the research phase we came across an interesting paper by Tomas Akenine-Möller and Ben Trumbore [MT97] that deals with ray-triangle intersection in a memory-efficient manner. This method has a clear memory advantage, as it needs no precomputed per-triangle data; its drawback is that it is slightly slower than the standard methods. After further research and testing the choice became clear: memory was the far bigger factor. Collision detection is slow on handsets in any case, memory is the much scarcer resource, and as handsets get faster the speed difference between the two methods will shrink towards zero.

First we transform all vertices to world space by multiplying them with the transform supplied to the method. After that we run the intersection algorithm on every triangle the mesh consists of and report the collision (if any) with the smallest distance. There are also plenty of optimizations to be made here for different cases of collision. For instance, if we only need to detect whether a collision occurred, we can break out of the triangle loop as soon as we find the first intersection. In the end we chose the more practical solution of always reporting the smallest distance. The final method with the implemented algorithm:

public float rayIntersection(float[] rayOrigin, float[] rayDirection, XlibTransform trans) {
    // Fetch internal arrays
    MCMeshData.MeshArrays arrays = (MCMeshData.MeshArrays)getData().representation();

    // Storage for transformed vertices
    int[] tVerts = new int[arrays.verts.length];

    // Transform with trans
    AffineTrans at = (AffineTrans)trans.representation();
    Vector3D tv = new Vector3D();
    for(int i = 0; i < tVerts.length; i += 3) {
        tv.x = arrays.verts[i];
        tv.y = arrays.verts[i + 1];
        tv.z = arrays.verts[i + 2];
        Vector3D v = at.transform(tv);
        tVerts[i]     = v.x;
        tVerts[i + 1] = v.y;
        tVerts[i + 2] = v.z;
    }

    // Iterate over all triangles in the mesh, keeping the smallest distance
    float minD = Float.MAX_VALUE;
    for(int i = 0; i < tVerts.length; i += 9) {
        float[] res = triIntersect(rayOrigin, rayDirection, tVerts, i);
        if(res != null)
            minD = Math.min(minD, Math.abs(res[0]));
    }

    // Return the result (< 0.0f means no triangle was hit)
    if(minD == Float.MAX_VALUE)
        return -10.0f;
    else
        return minD;
}
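The early-out optimization mentioned above, had we needed it, could look like this (a hypothetical boolean variant that is not part of the final library):

// Hypothetical variant that stops at the first hit, for when only a
// yes/no answer is needed rather than the exact closest distance
public boolean rayHits(float[] rayOrigin, float[] rayDirection, XlibTransform trans) {
    // (transform the vertices into tVerts exactly as in rayIntersection)
    for(int i = 0; i < tVerts.length; i += 9) {
        if(triIntersect(rayOrigin, rayDirection, tVerts, i) != null)
            return true; // bail out on the first intersection
    }
    return false;
}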

In tests this turned out to be an excellent method. We got collision detection that behaved identically on both APIs and kept our results consistent.

4.2 Scene graph rendering

The second feature we need to emulate on MCv3, and that is already supported in M3G, is scene graph rendering. M3G is originally a scene graph API and already has support for this, while MCv3 has no support whatsoever for scene graphs. An issue had to be resolved first: should we write a scene graph from scratch and use it for both M3G and MCv3, or should we try to emulate scene graphs on MCv3 only and keep the existing implementation on M3G? Keeping the existing implementation would mean implementing the entire M3G scene graph structure in MCv3 so it could be used with our classes, which would also mean implementing the bloated and unnecessary parts of that API's scene graph. Implementing a scene graph from scratch was the right way to go, both memory- and computation-wise. This solution also meant that our scene graph could be implemented entirely in our abstract package, and did not need to be localized for each API in the specific packages. Memory and computational performance were considered during the entire development of the scene graph classes. The resulting class is small, easy to use and very functional:

public class XlibNode {
    // Internal vector holding all children
    private Vector children = new Vector();

    // Internal transform
    private XlibTransform transform;

    // Internal mesh
    private XlibMesh mesh;

    // Used for rendering
    private static XlibTransform trans;

    public XlibNode() {
        mesh = null;
        transform = null;
    }

    public XlibNode(XlibMesh m, XlibTransform t) {
        mesh = m;
        transform = t;
    }

    public void addChild(XlibNode n)    { children.addElement(n); }
    public void removeChild(XlibNode n) { children.removeElement(n); }

    public XlibTransform getTransform()           { return transform; }
    public void setTransform(XlibTransform trans) { transform = trans; }

    public XlibMesh getMesh()       { return mesh; }
    public void setMesh(XlibMesh m) { mesh = m; }

    public Vector getChildren()     { return children; }
}
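Building a hierarchy with this class takes only a few lines (a usage sketch; the meshes and transforms are assumed to come from application code):

// A root node with a child that in turn carries its own child; the
// turret inherits the body's transform, the body inherits the root's
XlibNode root = new XlibNode();
root.setTransform(worldTransform);

XlibNode body = new XlibNode(bodyMesh, bodyTransform);
XlibNode turret = new XlibNode(turretMesh, turretTransform);

root.addChild(body);
body.addChild(turret);

// Drawn in a single call by the renderer, see renderGraph below
rend.renderGraph(root);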

Rendering a scene graph is now easy, and since the graph structure only uses our abstract classes it is the same across all APIs. We just need to keep a simple matrix stack so that we can preserve transformation information as we traverse the graph. This introduces some more memory usage, especially in a threaded application that might use several renderers. However, it was impossible to circumvent, and in a case where memory is critical developers may opt not to use scene graphs and thus not use the extra allocated memory. The common implementation:

public void renderGraph(XlibNode root) {
    XlibTransform t = null;
    boolean pushed = false;

    // Push onto the matrix stack (if needed)
    if(mStack.isEmpty() && root.getTransform() != null) {
        t = Factory.xlibTransform();
        t.mul(ident, root.getTransform());
        mStack.addElement(t);
        pushed = true;
    } else if(root.getTransform() != null) {
        t = Factory.xlibTransform();
        t.mul((XlibTransform)mStack.lastElement(), root.getTransform());
        mStack.addElement(t);
        pushed = true;
    }

    // Render (if needed)
    if(root.getMesh() != null)
        renderMesh(root.getMesh(), t);

    // Render children
    for(int i = 0; i < root.getChildren().size(); ++i)
        renderGraph((XlibNode)root.getChildren().elementAt(i));

    // Pop the matrix stack
    if(pushed)
        mStack.removeElementAt(mStack.size() - 1);
}

The matrix stack could have been allocated upon the very first use of the graph functions, but after consideration we chose to allocate it when the renderer is created. This is mostly for debugging purposes, as a call to renderGraph (in the case of on-demand allocation) might otherwise deplete memory at a very late stage in an application.

4.3 Key frame animation

Key frame animation exists in both APIs, but in different ways. M3G has a full implementation of it, while MCv3 only allows key frame animation on pre-baked models loaded through its own closed binary format. Again we chose to write it from scratch, as the M3G implementation is heavy and cumbersome. We need a very lightweight solution that mirrors the needs of mobile 3D developers. The first step towards streamlining performance and memory use was to eliminate time-based animation altogether. This allows developers to choose their own method of animation, whether it be per frame, per event or per tick. It also lowers the memory usage considerably. Our system boils down to a simple blending interface, where several meshes are added to an animation set and given unique identifiers (animation state names) that can be blended with each other. The application developer assigns the current state and the strength of that state, where 1.f means rendering the mesh that represents that state, 0.f renders the previous mesh state, and anything in between blends the previous and current states.
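In use, the blending interface looks like this (a sketch; the keyframe meshes are assumed to be two exported keyframes of the same model with identical vertex counts, and the xlibKeyframe creation method is assumed analogous to the other Factory methods):

// Build an animation set with two named states
XlibKeyframe anim = Factory.xlibKeyframe(); // assumed factory method
anim.addMesh(walkMesh, "walk");
anim.addMesh(runMesh, "run");

// Somewhere in the game loop: blend 30% of the way towards "run"
anim.animate("run", 0.3f);
rend.renderMesh(anim.renderObject(), trans);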

Note that only two states can be blended, so if the application developer is at a 0.5f blend of two states and switches to a third state, only the second and third will be blended and the blending information of the first state is lost. This is also a memory optimization that places some restrictions on the developer, restrictions that we feel are more than reasonable. Another restriction we impose is that you can't blend meshes that don't have the exact same number of vertices. This is in order to speed up the actual blending process; it also matches the most common use of blending, where the same mesh in different states (keyframes) is animated. For this an abstract base class was needed, as we could provide plenty of implementation details in the abstract package and only needed the actual physical vertex blending to be placed in the specific packages.

public abstract class XlibKeyframe {
    /** The Hashtable that holds all meshes with unique identifiers */
    private Hashtable meshes;

    /** Last mesh */
    private String lastMesh = "";

    /** Current mesh */
    private String currentMesh = "";

    /** Current blending strength (0.0 - 1.0) */
    private float blend = 1.0f;

    /** The internal render object */
    private XlibMesh robj = null;

    public XlibKeyframe() {
        meshes = new Hashtable();
    }

    /** Adds a mesh to this keyframe animation set and gives it a
     *  unique name that is later used when animating the model. If the
     *  name already exists in the set it will be replaced.
     *  Returns true if the add was successful, otherwise
     *  returns false. Subclasses are responsible for checking
     *  meshes prior to the add. */
    public boolean addMesh(XlibMesh mesh, String name) {
        if(mesh != null && validate(mesh))
            meshes.put(name, mesh);
        else
            return false;

        if(meshes.size() == 1) {
            lastMesh = currentMesh = name;
            blend = 1.0f;
            robj = mesh;
        }
        return true;
    }

    /** Removes a mesh from this keyframe animation set */
    public void removeMesh(String name) {
        meshes.remove(name);
        if(meshes.size() == 1) {
            lastMesh = currentMesh = (String)meshes.keys().nextElement();
            blend = 1.0f;
        }
    }

    /** Gets the current mesh for rendering */
    public XlibMesh renderObject() {

        return robj;
    }

    /** Sets the current animation state */
    public void animate(String frame, float strength) {
        boolean exists = meshes.containsKey(frame);
        int size = meshes.size();

        if(exists && size > 1) {
            // Check if we're just blending the current animation
            if(!currentMesh.equals(frame)) {
                lastMesh = currentMesh;
                currentMesh = frame;
            }
            blend = strength;

            // Check for endpoints
            if(strength <= 0.01f)
                robj = (XlibMesh)meshes.get(lastMesh);
            else if(strength >= 0.99f)
                robj = (XlibMesh)meshes.get(currentMesh);
            else
                // Blend and return
                robj = blend((XlibMesh)meshes.get(lastMesh),
                             (XlibMesh)meshes.get(currentMesh), blend);
        } else if(size == 1 && exists) {
            // Only one mesh in the set, set the render object
            robj = (XlibMesh)getMeshes().nextElement();
        } else {
            // Either it doesn't exist, or the set is empty
            robj = null;
        }
    }

    /** Gets an enumeration of the meshes */
    public Enumeration getMeshes() {
        return meshes.elements();
    }

    /** Overridden by subclasses, performs the actual blending */
    protected abstract XlibMesh blend(XlibMesh from, XlibMesh to, float strength);

    /** Overridden by subclasses, returns true if the mesh is valid to add
     *  to this set. All current meshes (XlibMesh) can be fetched
     *  through the public getMeshes() method. */
    protected abstract boolean validate(XlibMesh mesh);
}

The abstract implementation is quite simple and relies on only two specific methods: one for validation, where a mesh is checked against our constraints (the same number of vertices as the other meshes), and one for the actual blending of vertices.

4.3.1 M3G animation

In M3G the performance suffers the most, as we need to perform array copying operations to get hold of the mesh vertices, and then put them back into the mesh through M3G's vertex buffer classes.

protected XlibMesh blend(XlibMesh from, XlibMesh to, float strength) {
    if(internal == null || from == null || to == null)
        return null;

    // From
    M3GMeshData.MeshArrays f =
        (M3GMeshData.MeshArrays)((M3GMeshData)((M3GMesh)from).getData()).representation();

    // To
    M3GMeshData.MeshArrays t =
        (M3GMeshData.MeshArrays)((M3GMeshData)((M3GMesh)to).getData()).representation();

    // Temporary buffers
    short[] buf1 = new short[3];
    short[] buf2 = new short[3];
    short tmp;

    // Use linear interpolation
    for(int i = 0; i < f.vbuf.getVertexCount(); i++) {
        // Get vertices
        System.arraycopy(f.verts, i * 3, buf1, 0, 3);
        System.arraycopy(t.verts, i * 3, buf2, 0, 3);

        // Interpolate
        for(int j = 0; j < 3; ++j) {
            tmp = (short)((buf2[j] - buf1[j]) * strength);
            buf2[j] = (short)(buf1[j] + tmp);
        }

        // Set vertices
        M3GMeshData.MeshArrays arr =
            (M3GMeshData.MeshArrays)((M3GMeshData)((M3GMesh)internal).getData()).representation();
        arr.vbuf.getPositions(null).set(i, 1, buf2);
    }

    // Check which appearance to use
    if(strength > 0.5f)
        internal.setAppearance(to.getAppearance());
    else
        internal.setAppearance(from.getAppearance());

    // All done
    return internal;
}

We use an internal mesh that holds all interpolated data at a given time. Also, we can only state-shift appearances (shading information), which is done at the 0.5f mark.

4.3.2 MCv3 animation

In MCv3 the animation was simpler to implement and less performance-intensive. Here we already have the vertex data as arrays and can manipulate it in a very easy fashion.

protected XlibMesh blend(XlibMesh from, XlibMesh to, float strength) {
    if(internal == null || from == null || to == null)
        return null;

    // From
    MCMeshData.MeshArrays f =
        (MCMeshData.MeshArrays)((MCMeshData)((MCMesh)from).getData()).representation();

    // To
    MCMeshData.MeshArrays t =
        (MCMeshData.MeshArrays)((MCMeshData)((MCMesh)to).getData()).representation();

    // Local
    MCMeshData.MeshArrays l =
        (MCMeshData.MeshArrays)((MCMeshData)((MCMesh)internal).getData()).representation();

    // Use linear interpolation
    for(int i = 0; i < f.verts.length; i++)
        l.verts[i] = f.verts[i] + (int)((t.verts[i] - f.verts[i]) * strength);

    // Check which appearance to use
    if(strength > 0.5f)
        internal.setAppearance(to.getAppearance());

    else
        internal.setAppearance(from.getAppearance());

    // All done
    return internal;
}

Here we also use an internal mesh to store the blending data and swap shading data at the 0.5f mark.

4.3.3 Validation

Validation code is omitted as it is fairly simple. For both cases we simply check if the list of animations is empty, in which case we always validate to true and create our internal buffer of mesh data. If the buffer isn't empty, we compare the number of vertices with the internal buffer element.

5. Abstraction improvements

The last step in the development of the API is to improve existing functionality, both in the abstract API and in the abstracted APIs. This will be done by benchmarking specific parts of an application's execution. If the base APIs perform better, then the abstract API needs to be improved. If it's the other way around, we'll still look at the abstract API to see if anything else can be optimized.

5.1 Collision

To compare collision we needed to devise a benchmark that wouldn't just test the collisions, but test them in a realistic case (such as a game). The benchmark tests 50 collisions per frame, with two different origins; that is, 25 collisions per unique origin. The first comparison of the collision (our [MT97] implementation vs. the native implementation in M3G) yielded the following results (milliseconds are averages):

    M3G (native picking)    500 ms
    MCv3 [MT97]             100 ms

Although clearly faster than M3G's native picking, our [MT97] implementation was still fairly slow in absolute terms, meaning the method needed improving. The first approach was to cache the transformation of an object, as it is static in most cases. This means we don't have to re-calculate the transformed vertices for each ray collision call. With this improvement we got the following results:

    M3G (native picking)    500 ms
    MCv3 [MT97]              60 ms

Only a slight improvement. What we also needed was a way to reject the entire collision test when the cached parameters matched. Apart from storing the last transform, we now chose to store the last direction and origin as well; as soon as they are within acceptable boundaries we can reject the entire collision test and return the cached value.

The results of all the benchmarks above are clearly in favour of our [MT97] implementation. It should still be noted that there is no early rejection in our implementation of [MT97]; it could be added through bounding spheres. Since we are content with the results achieved by these optimizations, we will not optimize further unless it is needed later on.

Obviously, we should exchange the inherent M3G collision method for our own, as ours performs much better. There are some issues to consider before starting:

1. We will have to settle for slightly worse performance on M3G, as vertices aren't stored in native arrays as in MCv3.
2. The result of a VertexArray transform is produced in floats, meaning one extra float array and a conversion loop.

With these things in mind, we re-wrote the old collision method to this form:

// Some statics used by this method to minimize memory allocation
// while running
static float[] vert0 = new float[3];
static float[] vert1 = new float[3];
static float[] vert2 = new float[3];
static float[] edge1 = new float[3];
static float[] edge2 = new float[3];
static float[] tvec = new float[3];
static float[] pvec = new float[3];
static float[] qvec = new float[3];
static float det = 1.0f;
static float inv_det = 1.0f;
static float EPSILON = 0.000001f;
static float tmp0 = 0.0f;
static float tmp1 = 0.0f;

// Ray intersection cache, per mesh
short[] lastVertArray = null;
XlibTransform lastTransform = null;
float[] lastOrig = null;
float[] lastDir = null;
float lastRes = 0.0f;

/** Sends a ray through this mesh. The transform trans
 *  is applied to the mesh's vertices to obtain the
 *  world-space coordinates.
 *  Returns < 0.0f if there was no collision. */
public float rayIntersection(float[] rayOrigin, float[] rayDirection, XlibTransform trans) {

    if(rayOrigin == null || rayDirection == null)
        return -10.0f;

    // Check if we have cached the vertices for this transform
    if(lastTransform == trans && lastVertArray != null) {
        // Bail out if the result for this exact ray is cached
        if(lastOrig != null && XMathUtil.vectorCompare(lastOrig, rayOrigin)
                            && XMathUtil.vectorCompare(lastDir, rayDirection))
            return lastRes;
    } else {
        // Fetch internal arrays
        VertexBuffer vbuf = m.getVertexBuffer();
        VertexArray vPos = vbuf.getPositions(null);
        lastVertArray = new short[vPos.getVertexCount() * 3];

        if(trans != null) {
            // Temporary floating-point store
            float[] tmpFArr = new float[vPos.getVertexCount() * 4];
            ((Transform)trans.representation()).transform(vPos, tmpFArr, true);

            // Convert to shorts
            for(int i = 0; i < tmpFArr.length; i += 4) {
                int j = i / 4;
                lastVertArray[j * 3]     = (short)tmpFArr[i];
                lastVertArray[j * 3 + 1] = (short)tmpFArr[i + 1];
                lastVertArray[j * 3 + 2] = (short)tmpFArr[i + 2];
            }
        } else {
            vPos.get(0, vPos.getVertexCount(), lastVertArray);
        }

        lastTransform = trans;
    }

    lastOrig = rayOrigin;
    lastDir = rayDirection;

    // Iterate over all triangles in the mesh, keeping the smallest distance
    float minD = Float.MAX_VALUE;
    for(int i = 0; i < lastVertArray.length; i += 9) {
        float res = triIntersect(rayOrigin, rayDirection, lastVertArray, i);
        if(res >= 0.0f)
            minD = Math.min(minD, res);
    }

    // Cache and return the result (< 0.0f means no triangle was hit)
    lastRes = (minD == Float.MAX_VALUE) ? -10.0f : minD;
    return lastRes;
}

/** Private. Subtracts two vectors and stores the result in
 *  result. */
private void subVectors(float[] result, float[] v1, float[] v2) {
    result[0] = v1[0] - v2[0];
    result[1] = v1[1] - v2[1];
    result[2] = v1[2] - v2[2];
}
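// Note: the XMathUtil.vectorCompare helper used in the caching test above
// is not listed in this report. A minimal sketch of what it could look
// like (the tolerance constant is an assumption, standing in for the
// "acceptable boundaries" mentioned earlier):
public static boolean vectorCompare(float[] a, float[] b) {
    final float TOL = 0.0001f; // assumed tolerance
    return Math.abs(a[0] - b[0]) < TOL
        && Math.abs(a[1] - b[1]) < TOL
        && Math.abs(a[2] - b[2]) < TOL;
}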

/** Private. Calculates the dot product of two vectors. */
private float dotVectors(float[] v1, float[] v2) {
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}

/** Private. Calculates the cross product of two vectors and stores
 *  it in result. */
private void crossVectors(float[] result, float[] v1, float[] v2) {
    result[0] = v1[1] * v2[2] - v1[2] * v2[1];
    result[1] = v1[2] * v2[0] - v1[0] * v2[2];
    result[2] = v1[0] * v2[1] - v1[1] * v2[0];
}

/** Private. Calculates the distance, u and v coordinates
 *  of the collision between a triangle and the ray.
 *  The algorithm is from Möller and Trumbore 97 [MT97] and is
 *  an excellent and fast method of doing ray intersection
 *  with less memory than most standard methods.
 *  See [MT97] for more information. */
private float triIntersect(float[] orig, float[] dir, short[] verts, int firstIndex) {
    // Sanity check: a triangle needs nine components
    if(firstIndex + 9 > verts.length)
        return -10.0f;

    // Extract the vertices
    vert0[0] = (float)verts[firstIndex++];
    vert0[1] = (float)verts[firstIndex++];
    vert0[2] = (float)verts[firstIndex++];
    vert1[0] = (float)verts[firstIndex++];
    vert1[1] = (float)verts[firstIndex++];
    vert1[2] = (float)verts[firstIndex++];
    vert2[0] = (float)verts[firstIndex++];
    vert2[1] = (float)verts[firstIndex++];
    vert2[2] = (float)verts[firstIndex++];

    /* find vectors for two edges sharing vert0 */
    subVectors(edge1, vert1, vert0);
    subVectors(edge2, vert2, vert0);

    /* begin calculating determinant - also used to calculate U parameter */
    crossVectors(pvec, dir, edge2);

    /* if determinant is near zero, ray lies in plane of triangle */
    det = dotVectors(edge1, pvec);
    if(det > -EPSILON && det < EPSILON)
        return -10.0f;
    inv_det = 1.0f / det;

    /* calculate distance from vert0 to ray origin */
    subVectors(tvec, orig, vert0);

    /* calculate U parameter and test bounds */
    tmp0 = dotVectors(tvec, pvec) * inv_det;
    if(tmp0 < 0.0f || tmp0 > 1.0f)
        return -10.0f;

    /* prepare to test V parameter */
    crossVectors(qvec, tvec, edge1);

    /* calculate V parameter and test bounds */
    tmp1 = dotVectors(dir, qvec) * inv_det;
    if(tmp1 < 0.0f || tmp1 + tmp0 > 1.0f)
        return -10.0f;

    /* calculate t, ray intersects triangle */
    return dotVectors(edge2, qvec) * inv_det;
}

After this quite large change we did another comparison with the collision in MCv3 and obtained the following results:

    M3G (native picking)    500 ms
    M3G [MT97]               80 ms
    MCv3 [MT97]              60 ms

The results were satisfying. By using our implementation of [MT97] we increased the performance of collisions in M3G (compared to the built-in system) by a factor of 6.25. As a result of this benchmark we chose to keep our [MT97] implementation in both M3G and MCv3, as it performed much better.

5.2 Scene graph

We wish to test the power of our scene graph versus the scene graph in M3G. We benchmark rendering a scene (application [B]) on our xlib scene graph and then on the inherent M3G scene graph, and compare which method gets the best average FPS. This choice will allow us to choose which implementation to use for our xlib. Writing an application for this was challenging, as the main cost of every frame was the actual rendering. By reducing object complexity and maximizing the number of scene graph objects we came to the following benchmark results (milliseconds are averages):

    xlib scene graph (on M3G)    70 ms
    M3G internal scene graph     80 ms

From the very first test we see that our scene graph is in fact quite capable as-is, as it outperformed M3G's scene graph. Our graph is a lighter version with slightly less functionality, but perfect for the goal of this project. The conclusion from the above test was to keep our own scene graph as the scene graph used on M3G as well.

6. Similarities with other projects

During the writing of this report we could find no other attempts at a similar abstraction between M3G and MCv3. There were several abstractions for the development of 2D applications on mobile phones, but nothing that actually abstracted both of these widespread rendering APIs. Looking beyond commercially available solutions, some companies known to us have in-house abstractions of certain parts of both APIs, such as camera handling or matrices. However, it seems most chose to leave some parts unabstracted, to be implemented on a per-application basis. The Finnish-based company Digital Chocolate showed in several articles how they used preprocessor defines in their projects to generate code for the right API before the actual compilation. It is unclear whether they abstracted the two APIs entirely or just select parts.

6.1 Comparison with a preprocessor-based solution

Having a preprocessor-based solution is convenient, as the implementation of every class in the code base can differ depending on the underlying API. The biggest gain is that large parts can be left untouched, with platform-specific code inserted only where needed. However, we feel that our approach of separating the code entirely into two packages has many advantages over this method. First of all, we improve compile times by not having a separate preprocessor step. We would also argue that without preprocessor blocks, code is much easier to read and understand. Finally, functionality that should exist on both APIs is achieved through object orientation and abstraction. A good example of this is our xutils library, which exists for both APIs. On the other hand, too much abstraction and hierarchy can hamper performance heavily through cache invalidation and memory load. Common sense must be used with the approach we chose, so that performance isn't lost.

6.2 Comparison with a partially-abstracted solution

A partial abstraction means that some parts of the rendering pipeline are fully abstracted, while others are left to be implemented on a per-application basis. This approach has the strength of letting application developers design and optimize some parts of the pipeline to suit the given application, instead of using a generic do-all be-all pipeline. In our solution we have abstracted every single part of the pipeline and let the application programmer interface only with our libraries. The advantage is that, since all operation is done through our libraries, adding another API does not affect the application code at all. Similarly, if some part of the pipeline needs to be optimized for a given application, it is always easy to create an application-specific version of the API libraries and rewrite them to suit the application. A partially-abstracted solution is more of a one-shot abstraction done for a single project, or a series of concurrent projects, and thus has no real future-proofing.

6.3 Additional benefits of our method

One powerful benefit is being able to choose the faster API on a given handset for your application, not to mention all the performance improvements our abstraction has brought. On handsets that implement all, or a subset of, the supported APIs, the choice of API can be left up to the user, through perhaps a settings dialog. This is very similar to the situation on PCs, where there are so many conflicting hardware configurations that some choices are best left to the PC owner. Also, by developing against our abstract interface, even new APIs (such as GLES) will not be a problem, as adding another specific library is a quick task. This leaves application programmers free to actually design and develop the application, while the low-level portability is handled by our abstract API.

6.4 Possible drawbacks

Application development on mobile handsets is dependent on both performance and memory footprint. By adding an abstraction layer we add a small overhead on library calls and also add extra memory usage. Cutting-edge performance can only be gained through pure library calls. In projects where this is the main concern (technology demonstrations, etc.) this API is not a suitable solution. However, one can argue that since we provide so many algorithmic performance increases, our solution can still be viable.

7. Conclusion

The produced abstraction has both its differences from and its similarities to the two abstracted APIs. We have kept a low-level profile but expanded it with functionality such as a scene graph, key frame animation support and a camera implementation. Size-wise our API turned out to be quite lightweight, at approximately 27 kB (Figure C). This is to be compared with the very heavy M3G at over 50 kB and the feature-slim MCv3 at 21 kB. Our API is thus a bit heavier than MCv3 but provides a lot more functionality. A downside to using an abstracted API is of course the heavier memory load, as without our abstraction one would save the 27 kB of memory. However, with today's handsets memory limits are less of a problem, and an extra 27 kB is no longer a concern on modern 3D-enabled handsets.

(Figure C)


More information

Computer Science 426 Midterm 3/11/04, 1:30PM-2:50PM

Computer Science 426 Midterm 3/11/04, 1:30PM-2:50PM NAME: Login name: Computer Science 46 Midterm 3//4, :3PM-:5PM This test is 5 questions, of equal weight. Do all of your work on these pages (use the back for scratch space), giving the answer in the space

More information

6.837 Introduction to Computer Graphics Assignment 5: OpenGL and Solid Textures Due Wednesday October 22, 2003 at 11:59pm

6.837 Introduction to Computer Graphics Assignment 5: OpenGL and Solid Textures Due Wednesday October 22, 2003 at 11:59pm 6.837 Introduction to Computer Graphics Assignment 5: OpenGL and Solid Textures Due Wednesday October 22, 2003 at 11:59pm In this assignment, you will add an interactive preview of the scene and solid

More information

CHAPTER 1 Graphics Systems and Models 3

CHAPTER 1 Graphics Systems and Models 3 ?????? 1 CHAPTER 1 Graphics Systems and Models 3 1.1 Applications of Computer Graphics 4 1.1.1 Display of Information............. 4 1.1.2 Design.................... 5 1.1.3 Simulation and Animation...........

More information

Ray Tracing Acceleration. CS 4620 Lecture 20

Ray Tracing Acceleration. CS 4620 Lecture 20 Ray Tracing Acceleration CS 4620 Lecture 20 2013 Steve Marschner 1 Will this be on the exam? or, Prelim 2 syllabus You can expect emphasis on topics related to the assignment (Shaders 1&2) and homework

More information

Graphics Programming. Computer Graphics, VT 2016 Lecture 2, Chapter 2. Fredrik Nysjö Centre for Image analysis Uppsala University

Graphics Programming. Computer Graphics, VT 2016 Lecture 2, Chapter 2. Fredrik Nysjö Centre for Image analysis Uppsala University Graphics Programming Computer Graphics, VT 2016 Lecture 2, Chapter 2 Fredrik Nysjö Centre for Image analysis Uppsala University Graphics programming Typically deals with How to define a 3D scene with a

More information

Clipping. CSC 7443: Scientific Information Visualization

Clipping. CSC 7443: Scientific Information Visualization Clipping Clipping to See Inside Obscuring critical information contained in a volume data Contour displays show only exterior visible surfaces Isosurfaces can hide other isosurfaces Other displays can

More information

galileo Design Document Solomon Boulos

galileo Design Document Solomon Boulos galileo Design Document Solomon Boulos 1 Contents 1 Introduction 3 2 Overview 3 3 Code Organization 4 3.1 Core.................................................. 4 3.1.1 API..............................................

More information

4: Polygons and pixels

4: Polygons and pixels COMP711 Computer Graphics and Image Processing 4: Polygons and pixels Toby.Howard@manchester.ac.uk 1 Introduction We ll look at Properties of polygons: convexity, winding, faces, normals Scan conversion

More information

Hands-On Workshop: 3D Automotive Graphics on Connected Radios Using Rayleigh and OpenGL ES 2.0

Hands-On Workshop: 3D Automotive Graphics on Connected Radios Using Rayleigh and OpenGL ES 2.0 Hands-On Workshop: 3D Automotive Graphics on Connected Radios Using Rayleigh and OpenGL ES 2.0 FTF-AUT-F0348 Hugo Osornio Luis Olea A P R. 2 0 1 4 TM External Use Agenda Back to the Basics! What is a GPU?

More information

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog RSX Best Practices Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog RSX Best Practices About libgcm Using the SPUs with the RSX Brief overview of GCM Replay December 7 th, 2004

More information

Modeling the Virtual World

Modeling the Virtual World Modeling the Virtual World Joaquim Madeira November, 2013 RVA - 2013/2014 1 A VR system architecture Modeling the Virtual World Geometry Physics Haptics VR Toolkits RVA - 2013/2014 2 VR object modeling

More information

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

Building scalable 3D applications. Ville Miettinen Hybrid Graphics Building scalable 3D applications Ville Miettinen Hybrid Graphics What s going to happen... (1/2) Mass market: 3D apps will become a huge success on low-end and mid-tier cell phones Retro-gaming New game

More information

Pace University. Fundamental Concepts of CS121 1

Pace University. Fundamental Concepts of CS121 1 Pace University Fundamental Concepts of CS121 1 Dr. Lixin Tao http://csis.pace.edu/~lixin Computer Science Department Pace University October 12, 2005 This document complements my tutorial Introduction

More information

There we are; that's got the 3D screen and mouse sorted out.

There we are; that's got the 3D screen and mouse sorted out. Introduction to 3D To all intents and purposes, the world we live in is three dimensional. Therefore, if we want to construct a realistic computer model of it, the model should be three dimensional as

More information

Today. Rendering pipeline. Rendering pipeline. Object vs. Image order. Rendering engine Rendering engine (jtrt) Computergrafik. Rendering pipeline

Today. Rendering pipeline. Rendering pipeline. Object vs. Image order. Rendering engine Rendering engine (jtrt) Computergrafik. Rendering pipeline Computergrafik Today Rendering pipeline s View volumes, clipping Viewport Matthias Zwicker Universität Bern Herbst 2008 Rendering pipeline Rendering pipeline Hardware & software that draws 3D scenes on

More information

Drawing in 3D (viewing, projection, and the rest of the pipeline)

Drawing in 3D (viewing, projection, and the rest of the pipeline) Drawing in 3D (viewing, projection, and the rest of the pipeline) CS559 Spring 2017 Lecture 6 February 2, 2017 The first 4 Key Ideas 1. Work in convenient coordinate systems. Use transformations to get

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

Computer Graphics I Lecture 11

Computer Graphics I Lecture 11 15-462 Computer Graphics I Lecture 11 Midterm Review Assignment 3 Movie Midterm Review Midterm Preview February 26, 2002 Frank Pfenning Carnegie Mellon University http://www.cs.cmu.edu/~fp/courses/graphics/

More information

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer Executive Summary The NVIDIA Quadro2 line of workstation graphics solutions is the first of its kind to feature hardware support for

More information

FiberMesh. Due: 4/12/10, 11:59 PM

FiberMesh. Due: 4/12/10, 11:59 PM CS224: Interactive Computer Graphics FiberMesh Due: 4/12/10, 11:59 PM Least Squares Solver 5 Mesh triangulation 10 Optimization Problem 1: Curve Dragging 25 Optimization Problem 2: Surface Optimization

More information

Point based global illumination is now a standard tool for film quality renderers. Since it started out as a real time technique it is only natural

Point based global illumination is now a standard tool for film quality renderers. Since it started out as a real time technique it is only natural 1 Point based global illumination is now a standard tool for film quality renderers. Since it started out as a real time technique it is only natural to consider using it in video games too. 2 I hope that

More information

COMP 175: Computer Graphics April 11, 2018

COMP 175: Computer Graphics April 11, 2018 Lecture n+1: Recursive Ray Tracer2: Advanced Techniques and Data Structures COMP 175: Computer Graphics April 11, 2018 1/49 Review } Ray Intersect (Assignment 4): questions / comments? } Review of Recursive

More information

CS 360 Programming Languages Interpreters

CS 360 Programming Languages Interpreters CS 360 Programming Languages Interpreters Implementing PLs Most of the course is learning fundamental concepts for using and understanding PLs. Syntax vs. semantics vs. idioms. Powerful constructs like

More information

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Pavel Petroshenko, Sun Microsystems, Inc. Ashmi Bhanushali, NVIDIA Corporation Jerry Evans, Sun Microsystems, Inc. Nandini

More information

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies How to Work on Next Gen Effects Now: Bridging DX10 and DX9 Guennadi Riguer ATI Technologies Overview New pipeline and new cool things Simulating some DX10 features in DX9 Experimental techniques Why This

More information

Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express

Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express Level: Intermediate Area: Graphics Programming Summary This document is an introduction to the series of samples,

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

Computer Graphics. Prof. Feng Liu. Fall /14/2016

Computer Graphics. Prof. Feng Liu. Fall /14/2016 Computer Graphics Prof. Feng Liu Fall 2016 http://www.cs.pdx.edu/~fliu/courses/cs447/ 11/14/2016 Last time Texture Mapping 2 Mid-term 3 Today Mesh and Modeling 4 The Story So Far We ve looked at images

More information

Buffers, Textures, Compositing, and Blending. Overview. Buffers. David Carr Virtual Environments, Fundamentals Spring 2005 Based on Slides by E.

Buffers, Textures, Compositing, and Blending. Overview. Buffers. David Carr Virtual Environments, Fundamentals Spring 2005 Based on Slides by E. INSTITUTIONEN FÖR SYSTEMTEKNIK LULEÅ TEKNISKA UNIVERSITET Buffers, Textures, Compositing, and Blending David Carr Virtual Environments, Fundamentals Spring 2005 Based on Slides by E. Angel Compositing,

More information

Optimizing Direct3D for the GeForce 256 Douglas H. Rogers Please send me your comments/questions/suggestions

Optimizing Direct3D for the GeForce 256 Douglas H. Rogers Please send me your comments/questions/suggestions Optimizing Direct3D for the GeForce 256 Douglas H. Rogers Please send me your comments/questions/suggestions drogers@nvidia.com Transform and Lighting (T&L) Acceleration under Direct3D To enable hardware

More information

CS451Real-time Rendering Pipeline

CS451Real-time Rendering Pipeline 1 CS451Real-time Rendering Pipeline JYH-MING LIEN DEPARTMENT OF COMPUTER SCIENCE GEORGE MASON UNIVERSITY Based on Tomas Akenine-Möller s lecture note You say that you render a 3D 2 scene, but what does

More information

CPS122 Lecture: From Python to Java last revised January 4, Objectives:

CPS122 Lecture: From Python to Java last revised January 4, Objectives: Objectives: CPS122 Lecture: From Python to Java last revised January 4, 2017 1. To introduce the notion of a compiled language 2. To introduce the notions of data type and a statically typed language 3.

More information

6.837 Introduction to Computer Graphics Final Exam Tuesday, December 20, :05-12pm Two hand-written sheet of notes (4 pages) allowed 1 SSD [ /17]

6.837 Introduction to Computer Graphics Final Exam Tuesday, December 20, :05-12pm Two hand-written sheet of notes (4 pages) allowed 1 SSD [ /17] 6.837 Introduction to Computer Graphics Final Exam Tuesday, December 20, 2011 9:05-12pm Two hand-written sheet of notes (4 pages) allowed NAME: 1 / 17 2 / 12 3 / 35 4 / 8 5 / 18 Total / 90 1 SSD [ /17]

More information

Chapter Answers. Appendix A. Chapter 1. This appendix provides answers to all of the book s chapter review questions.

Chapter Answers. Appendix A. Chapter 1. This appendix provides answers to all of the book s chapter review questions. Appendix A Chapter Answers This appendix provides answers to all of the book s chapter review questions. Chapter 1 1. What was the original name for the first version of DirectX? B. Games SDK 2. Which

More information

Casting in C++ (intermediate level)

Casting in C++ (intermediate level) 1 of 5 10/5/2009 1:14 PM Casting in C++ (intermediate level) Casting isn't usually necessary in student-level C++ code, but understanding why it's needed and the restrictions involved can help widen one's

More information

Design Pattern: Composite

Design Pattern: Composite Design Pattern: Composite Intent Compose objects into tree structures to represent part-whole hierarchies. Composite lets clients treat individual objects and compositions of objects uniformly. Motivation

More information